Draft: experimental UTF-8 support #2417
Draft
Kondrashka177 wants to merge 4 commits into cc-tweaked:mc-1.20.x
Contributor
@Kondrashka177 yk you don't have to comment on every single review wojbie made, right? The comments don't add any useful information, they just bomb our emails.
Author
Sorry :(
Summary
This draft PR is an experiment around UTF-8 support in CC: Tweaked.
It improves several common text-handling paths so that non-ASCII text behaves more naturally in normal terminal-based workflows. I am opening this as a draft because this is not a finished solution, and it is not fully backwards-compatible.
What currently works
This branch improves UTF-8 behaviour in several user-facing places, including:
- term.write / term.blit
- window
- read()
- edit
Known problems
This approach breaks compatibility with parts of CC's legacy byte-based text model.
In particular:
- term.blit can now fail on inputs that previously worked if a byte sequence decodes to fewer Unicode code points than its original byte length
So while this makes UTF-8 text work better in many common cases, it is not a drop-in replacement for the current behaviour.
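To make the term.blit problem concrete, here is a minimal sketch in Python (used only for illustration; it is not code from this branch). It relies on the fact that term.blit requires its text, text colour and background colour strings to have the same length. The two helper functions, `legacy_blit_ok` and `utf8_blit_ok`, are hypothetical names, and the idea that the new length check counts decoded code points is an assumption about how the failure arises.

```python
def legacy_blit_ok(text: bytes, fg: bytes, bg: bytes) -> bool:
    """Byte-based model: every byte of text is one terminal cell."""
    return len(text) == len(fg) == len(bg)


def utf8_blit_ok(text: bytes, fg: bytes, bg: bytes) -> bool:
    """Hypothetical code-point model: text is decoded as UTF-8 first,
    so its length is the number of code points, not bytes.
    Assumes text is valid UTF-8."""
    cells = len(text.decode("utf-8"))
    return cells == len(fg) == len(bg)


# "é!" encoded as UTF-8: 3 bytes, but only 2 code points.
text = b"\xc3\xa9!"
# Colour strings sized by byte length, as a legacy program would build them.
fg = b"000"
bg = b"fff"

print(legacy_blit_ok(text, fg, bg))  # lengths 3, 3, 3
print(utf8_blit_ok(text, fg, bg))    # lengths 2, 3, 3
```

Under these assumptions the byte-based check accepts the call while the code-point-based check rejects it, which is the kind of input that "previously worked" and can now fail. Pure ASCII input is unaffected, since its byte length and code point count are equal.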
Limitations
This is not full Unicode support.
It does not properly handle things like:
The model is still effectively closer to 1 code point = 1 cell.
Why I'm opening this
I do not expect this to be merged as-is. I am opening it as a concrete prototype so the tradeoffs are easier to discuss with real code and test results.
I still hope UTF-8 support in some form remains on the table, even if this specific approach is not suitable for upstream.