Draft: Add practical UTF-8 support across terminal input/output paths by Kondrashka177 · Pull Request #2416 · cc-tweaked/CC-Tweaked

Kondrashka177 · 2026-04-17T01:07:27Z

Summary

This PR is an attempt to improve practical UTF-8 support in CC: Tweaked's terminal-related paths.

The goal here is not to claim that full Unicode support is solved, but to provide a working implementation for the most common user-facing cases inside CC itself: terminal output, terminal input, text editing, monitor rendering, and related ROM paths.

I am opening this as a draft because I do not consider the design questions around backwards compatibility fully resolved, and I want to present the implementation and tested behaviour clearly rather than oversell it as a finished universal solution.

What this changes

This branch updates multiple terminal and ROM text paths so that UTF-8 text can be handled correctly in common scenarios.

In practice, this includes work around:

terminal text input/output
term.write
term.blit
terminal/window text storage and rendering
read()
edit.lua
monitor rendering
Lua REPL output/error handling paths
cc.pretty
cc.strings
pastebin put/get

Tested behaviour

The following scenarios have been tested and are working in this branch:

UTF-8 output in the normal terminal
UTF-8 input through read()
term.write
term.blit
window.write
window.blit
editing UTF-8 text in edit
rendering UTF-8 text on monitors
Lua REPL no longer fails with "Invalid UTF-8 text" in the tested cases
pastebin put/get with UTF-8 content
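
For readers unfamiliar with what an "Invalid UTF-8 text" style failure usually means, here is a generic illustration in Python. It is not code from this branch, and it does not claim to reproduce the cause of the REPL error; the helper names `is_valid_utf8` and `decode_lossy` are hypothetical. It only shows the difference between strict validation (which rejects malformed byte sequences) and lossy decoding (which substitutes U+FFFD).

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is a well-formed UTF-8 byte sequence."""
    try:
        data.decode("utf-8")  # strict decoding raises on malformed input
    except UnicodeDecodeError:
        return False
    return True


def decode_lossy(data: bytes) -> str:
    """Decode data, replacing malformed bytes with U+FFFD instead of failing."""
    return data.decode("utf-8", errors="replace")


good = "Привет".encode("utf-8")
truncated = good[:1]        # first byte of a two-byte sequence only
stray = b"ok\xff"           # 0xFF never appears in valid UTF-8

print(is_valid_utf8(good))       # True
print(is_valid_utf8(truncated))  # False
print(is_valid_utf8(stray))      # False
print(decode_lossy(stray))       # ok followed by U+FFFD
```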

Important limitations

This branch does not claim to solve all Unicode issues.

Known limitations include:

complex emoji sequences
flags
ZWJ-based grapheme clusters
the broader legacy charset/backwards-compatibility problem
cases where existing byte-oriented behaviour may be relied upon by older programs

In other words, this branch is best described as a practical UTF-8 implementation for common CC text workflows, not a complete and compatibility-perfect Unicode redesign.
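
To make the emoji and flag limitations concrete, the following Python sketch (an illustration only, not part of this branch; `describe` is a hypothetical helper) shows that a single visible glyph can consist of several code points, so handling UTF-8 code points correctly is still not the same as handling grapheme clusters.

```python
# A family emoji built from three emoji joined by ZERO WIDTH JOINER (U+200D).
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
# A flag is a pair of regional indicator symbols (here: F + R).
flag = "\U0001F1EB\U0001F1F7"


def describe(text: str) -> tuple:
    """Return (code point count, UTF-8 byte count) for text."""
    return len(text), len(text.encode("utf-8"))


print(describe(family))  # (5, 18): one visible glyph, five code points
print(describe(flag))    # (2, 8): one visible glyph, two code points
```

A renderer that assigns one terminal cell per code point would spread each of these glyphs over several cells, which is why they are listed as limitations above.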

Backwards compatibility

I understand the main concern here is not just "can UTF-8 be made to work", but whether it can be introduced without breaking long-standing byte-oriented assumptions in CC programs and internal behaviour.
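
As a minimal sketch of the kind of byte-oriented assumption in question (written in Python for illustration, not taken from CC: Tweaked; both helper names are hypothetical), consider a program that truncates text to a fixed width by counting bytes. With ASCII this is harmless, but with UTF-8 the byte count and the character count diverge, and a byte-level cut can split a character.

```python
def truncate_bytes(text: str, width: int) -> str:
    """Byte-oriented truncation: cut the UTF-8 encoding after width bytes."""
    raw = text.encode("utf-8")[:width]
    # A cut in the middle of a multi-byte character leaves a malformed tail,
    # shown here as U+FFFD.
    return raw.decode("utf-8", errors="replace")


def truncate_chars(text: str, width: int) -> str:
    """Code-point-oriented truncation: cut after width code points."""
    return text[:width]


word = "Привет"
print(len(word))                  # 6 code points
print(len(word.encode("utf-8")))  # 12 bytes
print(truncate_bytes(word, 5))    # "Пр" followed by U+FFFD
print(truncate_chars(word, 5))    # "Приве"
```

Lua's `#` operator on strings counts bytes, so existing programs that use it to measure on-screen width would see the first kind of behaviour once text is stored as UTF-8.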

I do not want to pretend this draft fully solves that design problem.

This PR is therefore intended as:

a working implementation of common user-facing cases
a concrete basis for discussion
a demonstration of what currently works well in practice

If this direction is considered fundamentally incompatible with the project's compatibility goals, that is understandable. In that case, this branch may still be useful as a reference implementation or experiment.

Why submit this anyway

Even with the compatibility concerns, I think there is still value in showing the implementation and the practical results.

The issue affects real users in day-to-day use, especially in non-English environments, and this branch demonstrates that a substantial part of the user-facing experience can be improved inside CC itself.

Notes

I am very open to feedback on scope, structure, or whether this is better treated as an experiment/prototype rather than a mergeable change in its current form.

Kondrashka177 added 5 commits April 11, 2026 22:59

unicode-support(wip) (a3b3c30)
Cyrillic input fix (cce2a18)
monitor fix (774ff03)
Invalid UTF-8 text fix1 (959b11a)
pastebin fix1 (6c808cb)

Kondrashka177 force-pushed the wip/unicode-support branch from 4e7d56a to 6c808cb on April 17, 2026 01:50

Kondrashka177 closed this Apr 17, 2026

Kondrashka177 wants to merge 5 commits into cc-tweaked:mc-1.20.x from Kondrashka177:wip/unicode-support
