Skip to content

fix: fix memory bugs and last-char truncation in deHTMLxs XS#55

Merged
toddr merged 1 commit into
mainfrom
koan.toddr.bot/fix-xs-memory-bugs
May 20, 2026
Merged

fix: fix memory bugs and last-char truncation in deHTMLxs XS#55
toddr merged 1 commit into
mainfrom
koan.toddr.bot/fix-xs-memory-bugs

Conversation

@toddr-bot

@toddr-bot toddr-bot commented Apr 15, 2026

Copy link
Copy Markdown
Contributor

What

Fix three bugs in the deHTMLxs XS extension: memory leak, input truncation, and unsafe memory operations.

Why

doit() overwrote the last character of input with \0 before HTML stripping — silently losing one byte of content on every call. It also leaked an intermediate SV via newSVpv + sv_setsv (the temporary was never freed), and used C malloc/free instead of Perl's Newx/Safefree memory API (bypassing Perl's memory debugging and tracking). Additionally, doit() had no guard against empty strings — calling it with size==0 would write to raw[-1] (buffer underflow).

How

  • Truncation: Removed *(raw + size - 1) = '\0' — Perl SVs from SvPV are already null-terminated at raw[size], making this destructive overwrite unnecessary.
  • Memory leak: Replaced newSVpv + sv_setsv with sv_setpv, which sets the value directly without creating a temporary SV.
  • Memory API: Replaced malloc/free with Newx/Safefree. Also replaced deprecated Newz with Newxz in new().
  • Empty string: Added size == 0 guard before buffer operations.
  • isit() mutation: Removed sv_catpv(text, &mynull) hack that unnecessarily touched the input SV.

Testing

All 182 tests pass. Golden .deHTMLxs files regenerated — 5 files grew by exactly 1 byte each (the previously truncated character).

🤖 Generated with Claude Code


Quality Report

Changes: 6 files changed, 15 insertions(+), 34 deletions(-)

Code scan: clean

Tests: passed (OK)

Branch hygiene: clean

Generated by Kōan post-mission quality pipeline

doit() had three bugs: (1) overwriting the last character of input with
'\0' before HTML stripping, silently losing one byte of content;
(2) leaking the intermediate SV created by newSVpv — replaced with
sv_setpv which sets the value directly; (3) using malloc/free instead
of Perl's Newx/Safefree memory API. Also adds an empty-string guard
to prevent buffer underflow when size==0, replaces deprecated Newz
with Newxz in new(), and removes unnecessary input mutation in isit().

Golden test files regenerated to match corrected output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@toddr-bot

Copy link
Copy Markdown
Contributor Author

CPAN Testers Evidence

I analyzed the CPAN Testers failure reports for v2.86 (API data). The results confirm this PR fixes the root cause of all 45 FAIL reports.

Summary

Root Cause Chain

  1. The truncation bug (*(raw + size - 1) = '\0') in doit() drops the final \n from CRLF files, leaving a bare \r at the end of the XS output
  2. The golden .deHTMLxs files correctly encode this truncated output (ending with bare \r)
  3. On Windows, CPAN clients extract tarballs via Archive::Tar in text mode, converting \n\r\n
  4. This turns the input's \r\n\r\r\n, but the golden file's bare trailing \r is unaffected
  5. After truncation of the expanded input, output ends with \r\r, but golden file ends with \rmismatch

Why only html.4, html.5, html.8

These three input files end with </html>\r\n — the truncation creates a trailing bare \r that survives outside tags. The passing CRLF files (html.6, html.7) end with </html> (no trailing newline), so the truncation only removes the tag's closing >, which is swallowed by tag processing.

Why this PR fixes it

With truncation removed, the golden files end with proper \r\n instead of bare \r. On Windows, text-mode extraction consistently converts all \r\n �� \r\r\n in both input and expected files. The XS output from the expanded input matches the expanded expected file at every position, including the end.

🤖 Analysis by Kōan

@toddr toddr marked this pull request as ready for review May 20, 2026 07:25
@toddr toddr merged commit b07fc13 into main May 20, 2026
24 checks passed
@toddr toddr deleted the koan.toddr.bot/fix-xs-memory-bugs branch May 20, 2026 07:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants