Commit b5285d5
Restore error handling for CUDA weight_sharing_across_methods set_option
The error checks were dropped during refactoring. Without them, a failed
set_option silently disables weight sharing, causing prefill and decode
to allocate separate KV-cache buffers (OOM at runtime with no diagnostic).
Also resolve <turn|> EOS token ID from the tokenizer at startup instead
of hardcoding token 106.
Co-authored-by: Claude <noreply@anthropic.com>
1 parent 379a22d commit b5285d5
1 file changed
Lines changed: 18 additions & 2 deletions
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 190-192 | 190-192 | unchanged context |
| 193-194 | | removed (2 lines) |
| | 193-210 | added (18 lines) |
| 195-197 | 211-213 | unchanged context |
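The commit message describes two behaviors: checking the result of the weight-sharing `set_option` call instead of ignoring it, and looking up the end-of-sequence token ID from the tokenizer rather than hardcoding 106. The following is a minimal C++ sketch of both ideas, not the code from this commit. `Status`, `set_backend_option`, `enable_weight_sharing`, `ToyTokenizer`, and `resolve_eos_id` are hypothetical stand-ins invented for illustration; the real runtime and tokenizer APIs may look quite different.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical status code; a stand-in for whatever the real runtime returns.
enum class Status { Ok, InvalidArgument };

// Hypothetical option setter. It rejects unknown keys so that the
// failure path can be exercised in isolation.
inline Status set_backend_option(std::unordered_map<std::string, bool>& opts,
                                 const std::string& key, bool value) {
  if (key != "weight_sharing_across_methods") {
    return Status::InvalidArgument;
  }
  opts[key] = value;
  return Status::Ok;
}

// Enable an option and fail loudly if the backend refuses it, instead of
// silently continuing with weight sharing disabled.
inline void enable_weight_sharing(std::unordered_map<std::string, bool>& opts,
                                  const std::string& key) {
  const Status status = set_backend_option(opts, key, true);
  if (status != Status::Ok) {
    throw std::runtime_error("Failed to set option '" + key +
                             "'; weight sharing would be disabled");
  }
  assert(opts.count(key) == 1);  // Postcondition: the option is recorded.
}

// Hypothetical tokenizer exposing a token-string-to-ID lookup.
struct ToyTokenizer {
  std::unordered_map<std::string, std::uint64_t> vocab;

  // Writes the ID to *out and returns true if the piece is in the vocabulary.
  bool token_to_id(const std::string& piece, std::uint64_t* out) const {
    const auto it = vocab.find(piece);
    if (it == vocab.end()) {
      return false;
    }
    *out = it->second;
    return true;
  }
};

// Resolve the EOS token ID at startup from the tokenizer, and report a
// clear error if the token is missing rather than assuming a fixed ID.
inline std::uint64_t resolve_eos_id(const ToyTokenizer& tokenizer,
                                    const std::string& eos_piece) {
  std::uint64_t id = 0;
  if (!tokenizer.token_to_id(eos_piece, &id)) {
    throw std::runtime_error("EOS token '" + eos_piece +
                             "' not found in tokenizer vocabulary");
  }
  return id;
}
```

In this sketch both failure modes surface as exceptions at startup, so a rejected option or a missing EOS token is reported immediately. Whether the real code throws, returns an error code, or logs and aborts is not visible from this page.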