Commit 08a3780
Fix TextChunker orphan chunk gluing using word count instead of token count
In ProcessParagraphs, the orphan chunk merging logic was splitting
strings by spaces to get word counts and comparing those against the
token limit. This caused merged chunks to exceed the target token count
when the token counter produces different counts than naive word splitting.
Use GetTokenCount() consistently, matching how the rest of the method
already measures chunk sizes.
Fixes #137131
parent 5f282a9, commit 08a3780
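The failure mode described in the commit message can be reproduced in a few lines. The following is a minimal Python sketch, not the code changed by this commit: `glue_orphan`, `word_count`, the one-token-per-four-characters `token_count`, and the "orphan is under a quarter of the limit" rule are all assumptions made for illustration. It only shows why checking the fit with word counts can let a merged chunk exceed the token limit, while checking it with the token counter does not.

```python
import math
from typing import Callable, List


def word_count(text: str) -> int:
    """Naive size measure: number of whitespace-separated words."""
    return len(text.split())


def token_count(text: str) -> int:
    """Hypothetical token counter: roughly one token per four characters."""
    return math.ceil(len(text) / 4)


def glue_orphan(
    chunks: List[str],
    max_tokens: int,
    size_for_fit: Callable[[str], int],
) -> List[str]:
    """Merge a short trailing chunk into the previous one if the pair "fits".

    size_for_fit decides whether the pair fits under max_tokens; passing
    word_count mimics the behaviour the commit message describes as buggy,
    passing token_count mimics the consistent behaviour.
    """
    if len(chunks) < 2:
        return list(chunks)
    prev, last = chunks[-2], chunks[-1]
    # Assumed orphan rule: the last chunk is under a quarter of the limit.
    is_orphan = token_count(last) < max_tokens / 4
    fits = size_for_fit(prev) + size_for_fit(last) <= max_tokens
    if is_orphan and fits:
        return chunks[:-2] + [f"{prev} {last}"]
    return list(chunks)


chunks = ["internationalization standardization", "ok then"]
limit = 10

by_words = glue_orphan(chunks, limit, word_count)
by_tokens = glue_orphan(chunks, limit, token_count)

# Word-based check merges the chunks even though the result is over the limit.
print(len(by_words), token_count(by_words[-1]))
# Token-based check leaves the two chunks separate.
print(len(by_tokens))
```

With long words, four words look small against a limit of 10, but the merged text measures 11 tokens under the assumed counter, so only the token-based check keeps every chunk within the limit.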
1 file changed: 4 additions, 10 deletions
[Diff body not preserved. The hunk covers original lines 192-209 (new lines 192-203): original lines 195-196, 198-201, and 203-206 are removed; new lines 195-196, 198, and 200 are added.]