perf: rotate line slices in Insert/DeleteLineArea full-width fast path #119

Open: LXXero wants to merge 1 commit into charmbracelet:main from LXXero:scroll-fastpath

LXXero commented Jun 6, 2026

Problem

Buffer.DeleteLineArea is the screen-scroll path: it runs on every newline at the bottom of the scroll region. It shifts cells one struct assignment at a time, which costs O(rows×cols) Cell copies per scrolled line. In a perf profile (Linux) of a real terminal emulator (xerotty) flooding seq 1 2000000 into an 80×24 screen, this single function plus the memmoves it generates accounted for ~80% of total process CPU. At ~100 bytes per Cell, that is hundreds of gigabytes moved to scroll two million lines.
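
For a rough sense of scale, the estimate above can be reproduced with a few lines of Go. This is only a back-of-the-envelope sketch: `bytesMoved` is a hypothetical helper, the 100-byte cell size is the approximate figure quoted above, and it assumes each scrolled line shifts `rows-1` rows of cells.

```go
package main

import "fmt"

// bytesMoved estimates the bytes copied when each scroll shifts
// (rows-1) rows of cols cells, each cellSize bytes wide.
// All inputs are approximations, not measured values.
func bytesMoved(rows, cols, cellSize, scrolls int64) int64 {
	return (rows - 1) * cols * cellSize * scrolls
}

func main() {
	total := bytesMoved(24, 80, 100, 2_000_000)
	fmt.Printf("%.0f GB\n", float64(total)/1e9) // prints "368 GB"
}
```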

Change

  • Full-width areas (the overwhelmingly common case — whole-screen scroll regions): rotate the line slice headers instead of copying cells — O(rows) slice-header moves, with the displaced lines' storage recycled as the cleared lines. Same trick for InsertLineArea (scroll down).
  • Cleared lines fill by direct assignment — Line.Set's wide-cell repair has nothing to repair when every cell in the line is replaced.
  • Partial-width areas (DECSLRM margins): keep the shift semantics but use copy() per row instead of per-cell assignment.
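
The full-width fast path can be illustrated with a minimal sketch. It is not the code in this PR: `Cell`, `Line`, `deleteLines`, `insertLines`, and `fill` are simplified, hypothetical stand-ins, and the temporary `recycled` slice is an illustrative shortcut that a real implementation might avoid.

```go
package main

import "fmt"

// Cell and Line are simplified stand-ins for the library's types.
type Cell struct{ Content string }
type Line []Cell

// fill overwrites every cell of each line by direct assignment.
func fill(lines []Line, blank Cell) {
	for _, l := range lines {
		for i := range l {
			l[i] = blank
		}
	}
}

// deleteLines removes n lines at row y within rows [y, bottom) and
// scrolls the rest up. Only slice headers move; the removed lines'
// storage is reused for the n blank lines at the bottom.
func deleteLines(lines []Line, y, n, bottom int, blank Cell) {
	if n <= 0 || y < 0 || bottom > len(lines) || y >= bottom {
		return
	}
	n = min(n, bottom-y)
	recycled := make([]Line, n)
	copy(recycled, lines[y:y+n])               // save displaced headers
	copy(lines[y:bottom-n], lines[y+n:bottom]) // shift headers up
	copy(lines[bottom-n:bottom], recycled)     // recycle storage
	fill(lines[bottom-n:bottom], blank)
}

// insertLines is the mirror image: it scrolls rows [y, bottom) down
// by n and reuses the lines pushed off the bottom as blanks at y.
func insertLines(lines []Line, y, n, bottom int, blank Cell) {
	if n <= 0 || y < 0 || bottom > len(lines) || y >= bottom {
		return
	}
	n = min(n, bottom-y)
	recycled := make([]Line, n)
	copy(recycled, lines[bottom-n:bottom])
	copy(lines[y+n:bottom], lines[y:bottom-n]) // shift headers down
	copy(lines[y:y+n], recycled)
	fill(lines[y:y+n], blank)
}

func main() {
	lines := []Line{
		{{Content: "a"}, {Content: "a"}},
		{{Content: "b"}, {Content: "b"}},
		{{Content: "c"}, {Content: "c"}},
	}
	deleteLines(lines, 0, 1, len(lines), Cell{Content: " "})
	for _, l := range lines {
		fmt.Printf("%q\n", l[0].Content)
	}
}
```

The per-scroll cost is then proportional to the number of rows plus the cells in the n cleared lines, rather than to every cell in the area.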

Results

End-to-end wall time for seq 1 2000000 in xerotty: 31s → 13.5s from this change alone, and ~6–8s when combined with a scrollback ring buffer in x/vt (PR incoming there).

Existing tests pass; behavior is identical — lines land in the same places with the same contents, only the storage shuffling changed.

Scrolling (DeleteLineArea at the top of the scroll region) shifted
cells one by one — O(rows*cols) Cell struct copies per scrolled
line. During bulk output every newline pays this, and it profiled
at ~80% of total process CPU in a real terminal emulator (xerotty)
flooding 2M lines.

Full-width areas (the common case — whole-screen scroll regions)
now rotate the line slice headers instead: O(rows) slice-header
moves plus clearing the n recycled lines, reusing their storage.
Cleared lines are filled by direct assignment — Line.Set's
wide-cell repair has nothing to repair when every cell is replaced.
Partial-width areas keep the shift but use copy() per row instead
of per-cell assignment.

Measured end-to-end (seq 1 2000000 into an 80x24 emulator): 31s ->
13.5s wall just from this change; combined with a scrollback ring
buffer in x/vt it reaches ~6-8s.
LXXero added a commit to LXXero/xerotty that referenced this pull request Jun 6, 2026
seq 1 2000000 into an 80x24 window took ~31s — 15-25x behind
ghostty (~1.5s) and xfce4-terminal (~2.7s). perf blamed upstream,
layer by layer:

- ultraviolet Buffer.DeleteLineArea shifted cells one struct
  assignment at a time on every scroll (~80% of CPU)
- vt Scrollback.Push evicted via slices.Delete — an O(10k) memmove
  per line of output at the default cap (~47% of the remainder)
- the trailing-blank trim paid interface-unwrapping color compares
  per cell, and whole-line clears ran wide-cell repair per cell
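
To illustrate the scrollback point: evicting the oldest entry with slices.Delete(s, 0, 1) shifts every remaining element, while a ring buffer overwrites the oldest slot in O(1). The sketch below is a generic illustration with hypothetical names (`ring`, `newRing`, `Push`, `At`, `Len`), not the implementation proposed for x/vt.

```go
package main

import "fmt"

// ring is a fixed-capacity FIFO. Once full, Push overwrites the
// oldest element instead of shifting the rest of the slice.
type ring[T any] struct {
	buf   []T
	start int // index of the oldest element
	n     int // number of stored elements
}

func newRing[T any](capacity int) *ring[T] {
	return &ring[T]{buf: make([]T, capacity)}
}

// Push appends v, evicting the oldest element when at capacity.
func (r *ring[T]) Push(v T) {
	if len(r.buf) == 0 {
		return
	}
	if r.n < len(r.buf) {
		r.buf[(r.start+r.n)%len(r.buf)] = v
		r.n++
		return
	}
	r.buf[r.start] = v
	r.start = (r.start + 1) % len(r.buf)
}

// At returns the i-th element counting from the oldest (0 <= i < Len()).
func (r *ring[T]) At(i int) T { return r.buf[(r.start+i)%len(r.buf)] }

// Len reports how many elements are stored.
func (r *ring[T]) Len() int { return r.n }

func main() {
	r := newRing[string](3)
	for _, s := range []string{"l1", "l2", "l3", "l4", "l5"} {
		r.Push(s)
	}
	for i := 0; i < r.Len(); i++ {
		fmt.Println(r.At(i))
	}
}
```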

Fixes are upstream PRs (charmbracelet/ultraviolet#119,
charmbracelet/x#888; see also #887) — go.mod pins both modules to
the LXXero fork commits carrying them until they merge, then the
replaces drop. Result: ~6-10s wall, 3-5x faster, within ~3x of the
purpose-built terminals from ~20x behind.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>