Project
cortex
Description
The strip_ansi_codes function incorrectly assumes that CSI (Control Sequence Introducer) escape sequences always terminate with an ASCII alphabetic character (a-z, A-Z). According to ECMA-48, a CSI sequence terminates with any byte in the range 0x40-0x7E (@, A-Z, [, \, ], ^, _, `, a-z, {, |, }, ~). The code uses c.is_ascii_alphabetic() as the termination condition, which misses final bytes such as ~ (0x7E), @ (0x40), {, |, and }. As a result, the function consumes valid text that follows such a sequence until it encounters an alphabetic character or the string ends. For example, the common terminal sequence \x1b[3~ (Delete key) ends with ~, so stripping \x1b[3~Text results in ext instead of Text, corrupting the data.
Error Message
None (silent data corruption)
Debug Logs
Input: '\x1b[3~Text', Output: 'ext', Expected: 'Text'
System Information
Bounty Challenge v0.1.0
Cortex: 0.0.7
OS: Ubuntu 24.04
Steps to Reproduce
Call strip_ansi_codes with a string containing a CSI sequence that ends with a non-alphabetic character, such as '\x1b[3~Text' (common in terminal input sequences for function keys like Delete, Insert, Page Up/Down)
Observe that the output is missing the first alphabetic character after the escape sequence
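The steps above can be sketched as a small Rust reproduction. The actual source of strip_ansi_codes is not included in this report, so strip_ansi_codes_buggy below is a hypothetical reconstruction that only mirrors the described termination condition (c.is_ascii_alphabetic()); the name and loop structure are assumptions.

```rust
/// Hypothetical reconstruction of the reported behaviour, NOT the real
/// cortex source. It ends a CSI sequence only on an ASCII letter.
fn strip_ansi_codes_buggy(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();

    while let Some(c) = chars.next() {
        if c == '\x1b' && chars.peek() == Some(&'[') {
            chars.next(); // consume '['
            // Assumed buggy condition: only letters terminate the sequence,
            // so '~' is skipped and the following 'T' is swallowed instead.
            for c in chars.by_ref() {
                if c.is_ascii_alphabetic() {
                    break;
                }
            }
        } else {
            out.push(c);
        }
    }
    out
}
```

Under these assumptions, calling strip_ansi_codes_buggy("\x1b[3~Text") yields "ext", matching the output in the debug logs.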
Expected Behavior
The function should strip the entire escape sequence \x1b[3~ and return Text
Actual Behavior
The function strips \x1b[3~T (consuming the T because it is the first alphabetic character encountered after the [) and returns ext
Additional Context
Affected code is in the strip_ansi_codes function, specifically the while loop at lines 45-51 that searches for the terminator. The condition if c.is_ascii_alphabetic() should instead check for the full valid range of final bytes (0x40-0x7E).
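One possible shape for the fix is sketched below. This is not the project's actual code, which is not shown in this report; the loop structure is an assumption, and only the termination check (any final byte in 0x40-0x7E) is taken from the description above.

```rust
/// Sketch of a corrected strip_ansi_codes; the surrounding loop is assumed.
/// Removes CSI sequences of the form ESC '[' ... <final byte 0x40-0x7E>.
fn strip_ansi_codes(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();

    while let Some(c) = chars.next() {
        if c == '\x1b' && chars.peek() == Some(&'[') {
            chars.next(); // consume '['
            // Skip parameter and intermediate bytes; stop after the first
            // final byte in the ECMA-48 range 0x40-0x7E ('@' through '~').
            for c in chars.by_ref() {
                if ('\x40'..='\x7e').contains(&c) {
                    break;
                }
            }
        } else {
            // Anything that is not part of a CSI sequence is kept as-is.
            out.push(c);
        }
    }
    out
}
```

With this check, "\x1b[3~Text" is reduced to "Text", and sequences that end in a letter, such as "\x1b[31m", are still stripped. The sketch handles only CSI sequences; an ESC not followed by [ is passed through unchanged, which may or may not match what the real function intends.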