Commit 93aa3c4
Use MmapUseMlockIgnoreErrors to pin model pages in RAM
On memory-constrained machines (e.g. 32 GB with a 21 GB model), plain
Mmap causes constant page eviction and re-faulting: 0.2 tok/s decode.
MmapUseMlockIgnoreErrors calls mlock() to pin pages after mmap, keeping
weights resident once loaded. It falls back silently if mlock exceeds
the system limit. Measured 4.7 tok/s decode on M3 Pro 32 GB (roughly a
24x speedup).
Co-authored-by: Claude <noreply@anthropic.com>
1 parent 9c23685 commit 93aa3c4
1 file changed
Lines changed: 1 addition & 1 deletion
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 155 | 155 | |
| 156 | 156 | |
| 157 | 157 | |
| 158 | | - |
| | 158 | + |
| 159 | 159 | |
| 160 | 160 | |
| 161 | 161 | |