Skip to content

fix: bounds check prefixes index in convert_to_unicode#289

Draft
toddr-bot wants to merge 1 commit into
mainfrom
koan.toddr.bot/fix-issue-284
Draft

fix: bounds check prefixes index in convert_to_unicode#289
toddr-bot wants to merge 1 commit into
mainfrom
koan.toddr.bot/fix-issue-284

Conversation

@toddr-bot

@toddr-bot toddr-bot commented May 22, 2026

Copy link
Copy Markdown
Collaborator

Summary

Adds a missing bounds check in convert_to_unicode() where the index variable (set from enc->bytemap[]) is used to access enc->prefixes[index] without validating that index < enc->prefixes_size. A corrupt or malicious .enc file could trigger an out-of-bounds read.

Fixes #284

Changes

  • Add if (index >= enc->prefixes_size) break; after bytemap lookup in convert_to_unicode() (Expat/Expat.xs:1145)
  • Add test crafting an encoding with an out-of-bounds bytemap entry to verify no crash

Test plan

  • make test passes (728 tests, all successful)
  • New test exercises the exact code path with a crafted encoding whose bytemap points beyond prefixes_size
  • Without the fix, the test would trigger an out-of-bounds read; with the fix, convert_to_unicode safely returns -1

Generated by Kōan /fix


Quality Report

Changes: 2 files changed, 84 insertions(+), 1 deletion(-)

Code scan: clean

Tests: passed (OK)

Branch hygiene: clean

Generated by Kōan post-mission quality pipeline

After setting index from enc->bytemap[], validate that it is less than
enc->prefixes_size before using it to access enc->prefixes[] on the next
loop iteration. A corrupt or malicious .enc file could contain a bytemap
entry pointing beyond the prefixes array, causing an out-of-bounds read.

The fix mirrors the validation pattern already used during loading in
XML_LoadEncoding and safely breaks out of the conversion loop when an
invalid index is encountered (returning -1 for unmappable character).

Adds test exercising the code path with a crafted encoding file whose
bytemap contains an index beyond prefixes_size.

Fixes #284

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@codecov

codecov Bot commented May 22, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.68%. Comparing base (ab2416c) to head (5623d28).
⚠️ Report is 7 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #289      +/-   ##
==========================================
+ Coverage   76.40%   76.68%   +0.27%     
==========================================
  Files           1        1              
  Lines        1102     1115      +13     
  Branches      346      352       +6     
==========================================
+ Hits          842      855      +13     
+ Misses         52       51       -1     
- Partials      208      209       +1     
Flag Coverage Δ
perl 76.68% <100.00%> (+0.27%) ⬆️
xs 76.68% <100.00%> (+0.27%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5a28ad1...5623d28. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

robustness: convert_to_unicode missing bounds check on prefixes array index

1 participant