Commit dfb709b
fix(stream): preserve multi-byte UTF-8 split across chunk boundaries (#2091)
## Summary
`BraintrustStream` corrupted multi-byte UTF-8 characters when an SSE
chunk boundary landed mid-character (e.g. Cloudflare `workerd` flushing
a small first chunk), emitting `U+FFFD` instead of the decoded
character.
Root cause: `btStreamParser` called `decoder.decode(chunk)` without
`{ stream: true }`, so the `TextDecoder` flushed its internal state on
every chunk and replaced any trailing partial UTF-8 sequence with
`U+FFFD` instead of holding it for the next chunk.
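The failure mode can be reproduced with the standard `TextDecoder` alone. The snippet below is a minimal sketch, not code from this commit: the sample string and variable names are illustrative assumptions, and it only contrasts decoding with and without `{ stream: true }`.

```typescript
// U+201C (left double quotation mark) encodes to three bytes: E2 80 9C.
const bytes = new TextEncoder().encode("\u201Chi");

// Split inside the 3-byte sequence, as a chunk boundary might.
const first = bytes.slice(0, 2); // E2 80
const second = bytes.slice(2); // 9C 68 69

// Without { stream: true } every call is treated as end of input, so the
// incomplete sequence is replaced with U+FFFD instead of being buffered.
const oneShot = new TextDecoder();
const broken = oneShot.decode(first) + oneShot.decode(second);

// With { stream: true } incomplete trailing bytes are held for the next call.
const streaming = new TextDecoder();
const intact =
  streaming.decode(first, { stream: true }) +
  streaming.decode(second, { stream: true }) +
  streaming.decode(); // final flush once the input has ended

console.log(broken.includes("\uFFFD")); // true
console.log(intact === "\u201Chi"); // true
```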
## Fix
Pass `{ stream: true }` when decoding each `Uint8Array` chunk so partial
sequences are held until the next chunk arrives, and flush the decoder
once in the transform's `flush` handler to handle a truly truncated
tail. No behavior change for `string` or `BraintrustStreamChunk` inputs,
and output for single-chunk UTF-8 input is byte-identical to before.
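The shape of this fix can be sketched with a standalone `TransformStream`. This is an illustration under stated assumptions, not the actual `btStreamParser`: `utf8DecodeStream` and `collect` are hypothetical names, and `BraintrustStreamChunk` inputs are left out for brevity.

```typescript
// Hypothetical helper (not the SDK's implementation): decodes Uint8Array
// chunks with a streaming TextDecoder and passes string chunks through.
function utf8DecodeStream(): TransformStream<Uint8Array | string, string> {
  const decoder = new TextDecoder();
  return new TransformStream<Uint8Array | string, string>({
    transform(chunk, controller) {
      if (typeof chunk === "string") {
        controller.enqueue(chunk);
        return;
      }
      // stream: true keeps an incomplete trailing sequence buffered.
      const text = decoder.decode(chunk, { stream: true });
      if (text.length > 0) controller.enqueue(text);
    },
    flush(controller) {
      // A final decode() with no input emits U+FFFD only if the stream
      // really ended in the middle of a character.
      const tail = decoder.decode();
      if (tail.length > 0) controller.enqueue(tail);
    },
  });
}

// Hypothetical driver: pipes byte chunks through the transform and
// concatenates whatever text comes out.
async function collect(chunks: Uint8Array[]): Promise<string> {
  const stream = utf8DecodeStream();
  const writer = stream.writable.getWriter();
  const reader = stream.readable.getReader();
  const writing = (async () => {
    for (const chunk of chunks) await writer.write(chunk);
    await writer.close();
  })();
  let out = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    out += value;
  }
  await writing;
  return out;
}
```

With this sketch, `collect([bytes.slice(0, 2), bytes.slice(2)])` for the UTF-8 bytes of `"\u201Chi"` should resolve to the original string, while a stream that ends after only the first two bytes should resolve to a single `U+FFFD` produced by `flush`.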
## Tests
Added a regression test in `js/src/functions/stream.test.ts` that splits
an SSE `text_delta` event inside the 3-byte sequence for `U+201C` and
asserts a clean round-trip with no `U+FFFD`. Fails on `main`, passes
here.
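One way to build such a split input is sketched below. It is not the test added in `js/src/functions/stream.test.ts`: `splitMidCharacter` is a hypothetical helper, the SSE payload is an assumed example, and the round-trip check uses a bare `TextDecoder` rather than `BraintrustStream`.

```typescript
// Hypothetical helper: cut the UTF-8 encoding of `text` one byte into the
// first occurrence of the multi-byte character `char`.
function splitMidCharacter(text: string, char: string): [Uint8Array, Uint8Array] {
  const encoder = new TextEncoder();
  const index = text.indexOf(char);
  if (index < 0) throw new Error("char not found in text");
  if (encoder.encode(char).length < 2) {
    throw new Error("char must be multi-byte in UTF-8");
  }
  // The byte offset of `char` is the byte length of everything before it.
  const offset = encoder.encode(text.slice(0, index)).length;
  const bytes = encoder.encode(text);
  const cut = offset + 1; // one byte into the multi-byte sequence
  return [bytes.slice(0, cut), bytes.slice(cut)];
}

// Assumed example payload; the real test's event body may differ.
const sseEvent = 'event: text_delta\ndata: "\u201Chello"\n\n';
const [head, tail] = splitMidCharacter(sseEvent, "\u201C");

// Decode the two halves the way the fixed parser does.
const decoder = new TextDecoder();
const roundTrip =
  decoder.decode(head, { stream: true }) +
  decoder.decode(tail, { stream: true }) +
  decoder.decode();

console.log(roundTrip === sseEvent); // true
console.log(roundTrip.includes("\uFFFD")); // false
```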
---------
Co-authored-by: Luca Forstner <luca.forstner@gmail.com>

1 parent 8a4eb2f commit dfb709b
3 files changed
Lines changed: 42 additions & 1 deletion