Previously: #360, now reopened with more data
Refs: jsdom/whatwg-encoding#22
> [!NOTE]
> If interop with WHATWG Encoding is a non-target, feel free to close this.
> Documenting the discrepancies would be helpful though.
In this image, `whatwg-encoding` is what `iconv-lite` does (as that's a wrapper on top of iconv-lite, I did not create a separate column).

Spec used: https://encoding.spec.whatwg.org/
Half of single-byte encodings including `windows-1252` don't match the spec and decode differently.

E.g., even for the most basic `windows-1252` encoding:

The latter behavior is correct, see the mapping from the Encoding spec.
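As a hedged illustration of what the spec mapping means in practice, here is a minimal sketch that decodes a few single bytes with the global `TextDecoder`. The byte values are an assumption chosen for this example (the comparison above may have used different input), and the sketch assumes a runtime whose `TextDecoder` supports the `windows-1252` label, such as a browser or Node.js built with full ICU. It only shows the spec side and does not call `iconv-lite`.

```javascript
// Sketch: windows-1252 decoding as described by the WHATWG Encoding spec.
// Assumption: the global TextDecoder supports the "windows-1252" label.
const decoder = new TextDecoder('windows-1252');

// These five bytes are unassigned in Microsoft's code page, but the spec's
// index-windows-1252 maps each one to the C1 control with the same value.
const unassigned = [0x81, 0x8d, 0x8f, 0x90, 0x9d];

// Decode each byte on its own and record the resulting code point.
const decoded = unassigned.map((byte) => {
  const text = decoder.decode(Uint8Array.of(byte));
  return text.codePointAt(0);
});

// Format as U+XXXX for readability.
const formatted = decoded.map(
  (cp) => 'U+' + cp.toString(16).toUpperCase().padStart(4, '0')
);
console.log(formatted.join(' ')); // U+0081 U+008D U+008F U+0090 U+009D
```

A decoder that returns U+FFFD for any of these bytes is following the vendor code page rather than the Encoding spec.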
`utf-8` is wrong when bundled, because the https://npmjs.com/buffer polyfill is wrong and `iconv-lite` uses that instead of a clean impl.

`utf-16` is wrong because it doesn't produce well-formed strings.

All of the multi-byte encodings don't match the decoders in the WHATWG Encoding spec.
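To make "well-formed" concrete, here is a small sketch of what a spec-compliant decoder does with malformed input in non-fatal mode. The input bytes and the `isWellFormed` helper are hypothetical, picked for this example only, and the sketch assumes a compliant global `TextDecoder` (for `utf-16le` in Node.js, a build with ICU). It does not reproduce the specific failures of the `buffer` polyfill.

```javascript
// Sketch: spec-mandated replacement of malformed input (non-fatal mode).
// Assumption: a runtime with a compliant global TextDecoder.

// Hypothetical helper: true if the string contains no unpaired surrogates.
function isWellFormed(str) {
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // High surrogate: must be followed by a low surrogate.
      const next = str.charCodeAt(i + 1); // NaN past the end of the string
      if (!(next >= 0xdc00 && next <= 0xdfff)) return false;
      i++; // skip the low surrogate that completes the pair
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      return false; // low surrogate without a preceding high surrogate
    }
  }
  return true;
}

// utf-8: 0xC0 0x80 is never valid; each byte becomes its own U+FFFD.
const utf8 = new TextDecoder('utf-8').decode(
  Uint8Array.of(0x61, 0xc0, 0x80, 0x62)
);

// utf-16le: "a", a lone low surrogate (0xDC00), "b".
const utf16 = new TextDecoder('utf-16le').decode(
  Uint8Array.of(0x61, 0x00, 0x00, 0xdc, 0x62, 0x00)
);

console.log(JSON.stringify(utf8), isWellFormed(utf8));
console.log(JSON.stringify(utf16), isWellFormed(utf16));
```

A decoder that passes the lone surrogate through unchanged would return a string for which `isWellFormed` is false, which is the kind of output the spec rules out.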
I can test iconv-lite separately further, but I confirmed that all those discrepancies also happen on pure `iconv-lite`.

If interop is desired:

- Replace the `utf-8` decoder with a compliant one (global TextDecoder with ignoreBOM is usually fine unless you are using streaming)
- Replace the `utf-16` decoder with a compliant one (global TextDecoder with ignoreBOM is fine unless you are running in Node.js without ICU, where utf16-le is exposed but broken and utf-16be does not exist)
- Adjust legacy multi-byte decoders to follow the Encoding spec (and likely the encoders too, as the spec describes those as well)

For some of that, you could check how I did it in https://github.com/ExodusOSS/bytes 🙃
It also exposes utf8/utf16 encoders/decoders and single-byte/legacy multi-byte decoders, but I doubt you want to depend on that, as it would increase the tables size 1.5x.
Improving the approach here based on that impl could be nice though.
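For reference, a minimal sketch of what "global TextDecoder with ignoreBOM" does. The input bytes are made up for this example. The reading that `ignoreBOM: true` is wanted so the wrapping library keeps control over BOM handling is an assumption, not something stated above.

```javascript
// Sketch: effect of the ignoreBOM option on the global TextDecoder.
const bytes = Uint8Array.of(0xef, 0xbb, 0xbf, 0x68, 0x69); // UTF-8 BOM + "hi"

// Default: the decoder consumes a leading BOM.
const stripping = new TextDecoder('utf-8');

// ignoreBOM: true leaves the BOM in the output as U+FEFF, so the caller
// can decide what to do with it.
const keeping = new TextDecoder('utf-8', { ignoreBOM: true });

const stripped = stripping.decode(bytes);
const kept = keeping.decode(bytes);

console.log(stripped.length); // 2
console.log(kept.length); // 3
console.log(kept.charCodeAt(0).toString(16)); // feff
```

Both decoders above are one-shot. Streaming use (`decode(chunk, { stream: true })`) is the case the list flags as needing extra care.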