LangQuant

LPCI: Statefulness Through Language for Stateless Models

langquant is a scaffold-as-state research artifact that tests whether a refreshing language scaffold can serve as the sole working state for a stateless LLM.

Single A/B run, 2026-03-28 — Hermes Labs

In one A/B run (n=1 per condition), a stateless LLM held conversational coherence across 20 turns with zero conversation history. The model never saw any prior messages — a structured language scaffold, refreshed every turn, was the sole state representation.

Transfer entropy analysis is consistent with the scaffold approximating a Markov state: conditioning on the current scaffold left little measurable information flow from prior turns (TE dropped from 0.608 naked to 0.085 compressed — a substantial reduction, not zero). This is a single observation, not a proof; see Caveats.

input is output is input is output

The Problem

Every LLM conversation works the same way:

Turn 1:  [system prompt] + [message 1]
Turn 2:  [system prompt] + [message 1] + [response 1] + [message 2]
Turn 10: [system prompt] + [all 9 prior exchanges] + [message 10]
Turn N:  [system prompt] + [entire history] + [message N]  ← grows without bound

Context grows linearly. Eventually the model chokes, truncates, or loses coherence. Every provider's solution: make the context window bigger. 128k. 200k. 1M tokens.

That's not a solution. That's a bigger bucket for the same leak.

The Thesis

The model is stateless. It has no memory. It has no continuity.

The hypothesis: the text can serve as the state.

Instead of feeding the model a growing conversation, feed it a fixed-budget structured scaffold that encodes the cognitive state of the session — goals, decisions, facts, constraints, vocabulary, open questions — and refreshes every turn.

Every turn:  [scaffold: K tokens, refreshed] + [current message]

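By design, the scaffold doesn't grow; it compresses. Turn 20 and turn 2,000 present the same prompt size and structure to the model. (In the run reported here the scaffold did grow, only more slowly than the conversation; see Caveats.)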
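To make the contrast concrete, here is a minimal sketch of prompt size per turn under both regimes. Every constant in it is an illustrative assumption rather than a measurement, and `history_prompt_tokens` / `scaffold_prompt_tokens` are hypothetical helper names that do not come from the repo.

```python
# Illustrative only: all token counts below are assumed constants.
TOKENS_PER_EXCHANGE = 97   # assumed size of one message + response pair
SYSTEM_PROMPT = 50         # assumed system prompt size
MESSAGE = 40               # assumed size of the current user message
K = 800                    # assumed fixed scaffold budget


def history_prompt_tokens(turn: int) -> int:
    """Prompt size when the full history is replayed every turn."""
    return SYSTEM_PROMPT + (turn - 1) * TOKENS_PER_EXCHANGE + MESSAGE


def scaffold_prompt_tokens(turn: int) -> int:
    """Prompt size when only a fixed-budget scaffold is sent."""
    return K + MESSAGE  # does not depend on the turn number


for turn in (1, 20, 2000):
    print(turn, history_prompt_tokens(turn), scaffold_prompt_tokens(turn))
```

Under these assumptions the history prompt is smaller for the first few turns and then overtakes the scaffold prompt, which stays constant.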

What We Observed

Setup

- Main model: qwen3.5:9b (Ollama, local)
- State extractor: qwen3.5:4b (extracts state changes as JSON deltas after each turn)
- A/B test: 20-turn conversation, two conditions:
  - Naked: zero framing, pure state extraction
  - Compressed: contrastive IS/NOT markers guiding extraction
- Probes at turns 4, 8, 12, 16, 20: recall tests, contradiction injection, false claim detection

Results

The model had amnesia every turn — it only saw the scaffold + current message. Despite this:

Probe	Turn	What happened
Early recall	4	Model correctly recalled all prior decisions (1.0 recall)
Contradiction	8	Model rejected "switch to GPT-4" — scaffold said "state extractor is qwen3.5:4b"
Deep recall	12	Model listed decisions from turns 1–11 accurately (0.93 recall, compressed)
Topic pivot	16	Model recalled turn 1's topic and connected it to turn 15's discussion
Final exam	20	Model listed all decisions in order, caught a false claim

Compression

The scaffold grew slower than the conversation it represented:

Turn	Scaffold (tokens)	Conversation (tokens)	Compression
1	343	114	0.3x (scaffold larger)
5	444	456	1.0x (break-even)
10	613	873	1.4x
15	662	1,363	2.1x
20	789	1,945	2.5x

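Over these 20 turns the scaffold grew at ~23 tokens/turn and the conversation at ~97 tokens/turn, so the compression ratio rose at every checkpoint. Extrapolating linearly (an assumption, untested beyond turn 20), the scaffold at turn 100 would stand in for ~10,000 tokens of conversation. If both trends stayed linear, however, the ratio would level off near 4x (97/23) rather than keep climbing. A larger ratio requires the scaffold to stop growing, which is what a hard fixed budget is meant to enforce.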
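The arithmetic behind these figures can be rechecked from the table. The sketch below uses only the five rows shown above; the endpoint-slope method and the linear extrapolation are assumptions made here for illustration, not necessarily how the repo computed its rates.

```python
# (turn, scaffold_tokens, conversation_tokens), copied from the table above.
ROWS = [(1, 343, 114), (5, 444, 456), (10, 613, 873),
        (15, 662, 1363), (20, 789, 1945)]

first, last = ROWS[0], ROWS[-1]
turn_span = last[0] - first[0]

# Endpoint slopes in tokens per turn.
scaffold_rate = (last[1] - first[1]) / turn_span
conversation_rate = (last[2] - first[2]) / turn_span

# Compression ratio at each checkpoint: conversation / scaffold.
ratios = {turn: conv / scaf for turn, scaf, conv in ROWS}


def extrapolate(turn: int):
    """Linear extrapolation from turn 1 (an assumption, not a measurement)."""
    scaf = first[1] + scaffold_rate * (turn - first[0])
    conv = first[2] + conversation_rate * (turn - first[0])
    return scaf, conv, conv / scaf


print(round(scaffold_rate, 1), round(conversation_rate, 1))
print({turn: round(r, 1) for turn, r in ratios.items()})
print([round(extrapolate(t)[2], 2) for t in (100, 1000)])
```

Under linear growth the extrapolated ratio approaches `conversation_rate / scaffold_rate`, so the ratio is bounded unless the scaffold's growth rate falls.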

Information-Theoretic Verification

Using Shannon entropy, mutual information, KL divergence, and transfer entropy (via pyitlib + scipy):

Metric	Naked	Compressed	Reading
Transfer entropy	0.608 bits	0.085 bits	Compressed scaffold left little measurable flow from prior turns
Scaffold entropy	7.30	7.78	Compressed carries more information per token
KL divergence (t1→t20)	—	0.20 → 0.48	Conditions diverge over time
Scaffold-response MI	0.49 bits	0.24 bits	Different information coupling

The transfer entropy drop is the central finding. Conditioning on the current compressed scaffold cut transfer entropy from 0.608 to 0.085 bits — a substantial reduction (not zero), consistent with the scaffold carrying most of the per-turn state in this run. This is the direction the LPCI hypothesis predicts; with n=1 per condition over 20 turns it is suggestive, not a proof. See Caveats.
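For readers unfamiliar with the quantity, here is a minimal plug-in estimator of transfer entropy for discrete sequences with history length 1. It uses only the standard library and is a sketch for illustration: the repo's analysis uses pyitlib + scipy, and this README does not say how scaffolds and responses were discretized, so the toy sequences below are assumptions.

```python
from collections import Counter
from math import log2


def transfer_entropy(source, target):
    """Plug-in estimate of TE(source -> target) in bits, history length 1.

    TE = sum over (y1, y0, x0) of
         p(y1, y0, x0) * log2( p(y1 | y0, x0) / p(y1 | y0) )
    where y1 = target[t+1], y0 = target[t], x0 = source[t].
    """
    triples = list(zip(target[1:], target[:-1], source[:-1]))
    n = len(triples)
    c_full = Counter(triples)
    c_y0x0 = Counter((y0, x0) for _, y0, x0 in triples)
    c_y1y0 = Counter((y1, y0) for y1, y0, _ in triples)
    c_y0 = Counter(y0 for _, y0, _ in triples)

    te = 0.0
    for (y1, y0, x0), count in c_full.items():
        p_joint = count / n
        p_given_both = count / c_y0x0[(y0, x0)]
        p_given_self = c_y1y0[(y1, y0)] / c_y0[y0]
        te += p_joint * log2(p_given_both / p_given_self)
    return te


# Toy sequences (assumed for illustration, not taken from the run).
source = [0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1]
driven = [0] + source[:-1]        # target copies the source with a one-step lag
constant = [0] * len(source)      # target ignores the source entirely

print(transfer_entropy(source, driven))    # positive: the source predicts the target
print(transfer_entropy(source, constant))  # 0.0: nothing flows from the source
```

Plug-in estimates like this are biased upward on short sequences, which is one more reason to read a 20-turn result as suggestive.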

Architecture

┌─────────────┐
│ User message │
└──────┬──────┘
       ↓
┌──────────────────────────────────────────────┐
│  [Scaffold: K tokens]  +  [Current message]  │  ← Only thing the model sees
└──────────────────────┬───────────────────────┘
                       ↓
              ┌─────────────────┐
              │   Main model    │  (qwen3.5:9b)
              │   (stateless)   │
              └────────┬────────┘
                       ↓
              ┌─────────────────┐
              │    Response     │
              └────────┬────────┘
                       ↓
┌──────────────────────────────────────────────┐
│  State extractor (qwen3.5:4b)               │
│  Input: scaffold + user msg + response       │
│  Output: JSON delta (add/remove decisions,   │
│          facts, constraints, vocabulary...)   │
└──────────────────────┬───────────────────────┘
                       ↓
              ┌─────────────────┐
              │  Apply delta    │
              │  to scaffold    │
              └────────┬────────┘
                       ↓
              ┌─────────────────┐
              │ Refreshed       │  ← Same structure, updated content
              │ scaffold        │     Ready for next turn
              └─────────────────┘

The model is a pure function. The scaffold is the program. The output of one pass feeds the compression of the next.
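The loop in the diagram can be sketched as a single function. This is a minimal illustration under stated assumptions: `run_turn` and the three stub callables are hypothetical names, the stubs stand in for the Ollama calls, and none of it is the actual code in `lpci.py`.

```python
from typing import Callable, Tuple

# Hypothetical type aliases; the callables stand in for model calls.
ModelFn = Callable[[str], str]
ExtractFn = Callable[[str, str, str], dict]
ApplyFn = Callable[[str, dict], str]


def run_turn(scaffold: str, message: str, main_model: ModelFn,
             extractor: ExtractFn, apply_delta: ApplyFn) -> Tuple[str, str]:
    """One stateless pass: (scaffold, message) -> (response, refreshed scaffold)."""
    prompt = f"{scaffold}\n\n{message}"   # the only thing the main model sees
    response = main_model(prompt)
    delta = extractor(scaffold, message, response)
    return response, apply_delta(scaffold, delta)


# Stub components so the loop runs without a model server.
seen_prompts = []


def fake_model(prompt: str) -> str:
    seen_prompts.append(prompt)
    return "ok"


def fake_extractor(scaffold: str, message: str, response: str) -> dict:
    # A real extractor would compress; this stub just records the message.
    return {"add_facts": [f"user said: {message}"]}


def fake_apply(scaffold: str, delta: dict) -> str:
    return "\n".join([scaffold, *delta["add_facts"]])


scaffold = "FACTS:"
for message in ["use qwen3.5:4b as extractor", "what did we decide?"]:
    response, scaffold = run_turn(scaffold, message, fake_model,
                                  fake_extractor, fake_apply)

print(seen_prompts[1])
```

No message list is carried between iterations: the second prompt knows about the first turn only through what the extractor wrote into the scaffold.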

Scaffold Schema

The scaffold is a structured state object with typed fields:

SessionState:
  role         — who the model is
  style        — communication constraints
  goal         — current objective
  subgoals     — active sub-tasks
  decisions    — things decided (irreversible)
  facts        — established truths
  artifacts    — things produced
  constraints  — hard boundaries (NOTs)
  open_threads — unresolved questions
  uncertainties — things we're unsure about
  vocabulary   — domain terms (term → meaning)
  turn         — counter

Each field maps to a section in the scaffold text. The state extractor outputs JSON deltas (add_decisions, remove_open_threads, add_vocabulary, etc.) that are applied to the state object, which is then re-rendered as the scaffold for the next turn.
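A minimal sketch of such a state object follows. The field names and the `add_decisions` / `remove_open_threads` / `add_vocabulary` delta keys come from the description above; the method bodies, the `[SECTION]` rendering format, and the choice to advance `turn` inside `apply_delta` are assumptions for illustration and may differ from `lpci.py`.

```python
from dataclasses import dataclass, field, fields


@dataclass
class SessionState:
    role: str = ""
    style: str = ""
    goal: str = ""
    subgoals: list = field(default_factory=list)
    decisions: list = field(default_factory=list)
    facts: list = field(default_factory=list)
    artifacts: list = field(default_factory=list)
    constraints: list = field(default_factory=list)
    open_threads: list = field(default_factory=list)
    uncertainties: list = field(default_factory=list)
    vocabulary: dict = field(default_factory=dict)
    turn: int = 0

    def apply_delta(self, delta: dict) -> None:
        """Apply add_<field> / remove_<field> operations, then advance the turn."""
        for key, items in delta.items():
            op, _, name = key.partition("_")
            current = getattr(self, name, None)
            if isinstance(current, dict):
                if op == "add":
                    current.update(items)            # items: {term: meaning}
                elif op == "remove":
                    for term in items:
                        current.pop(term, None)
            elif isinstance(current, list):
                if op == "add":
                    for item in items:
                        if item not in current:      # skip duplicates
                            current.append(item)
                elif op == "remove":
                    current[:] = [i for i in current if i not in items]
            # Unknown keys are ignored in this sketch.
        self.turn += 1

    def render(self) -> str:
        """Re-render the state as scaffold text, one section per field."""
        sections = []
        for spec in fields(self):
            value = getattr(self, spec.name)
            if isinstance(value, dict):
                body = "\n".join(f"- {k}: {v}" for k, v in value.items())
            elif isinstance(value, list):
                body = "\n".join(f"- {item}" for item in value)
            else:
                body = str(value)
            sections.append(f"[{spec.name.upper()}]\n{body}")
        return "\n\n".join(sections)
```

Keeping the state as typed data and rendering it fresh each turn means the scaffold text is always derived, never edited in place.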

Caveats (Honest)

- The scaffold grew (343 → 789 tokens) rather than holding a fixed budget. The budget ceiling triggered only once. A hard-clamped experiment (exactly K tokens every turn) is needed to test true fixed-budget compression.
- n=1 per condition. Replications are needed.
- 20 turns, not 1,000. Scale testing is needed.
- The state extractor corrupts content. It generates paraphrased strings rather than verbatim text. Classification drift was observed: the same conversation produced 71 facts / 4 decisions (naked) vs. 3 facts / 23 decisions (compressed). To guarantee fidelity, the extractor needs an index-based selection mechanism (output integer pointers, not generated text).
- Scaffold framing affects extraction, not just model behavior. The contrastive markers helped the state extractor classify information; they did not necessarily improve the main model's coherence. Both conditions maintained continuity, and the difference was in what the scaffold contained.
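The index-based selection idea can be sketched in a few lines: the extractor is asked for integer pointers into a list of source sentences, and the scaffold stores the pointed-to text verbatim. The names `split_sentences` and `select_by_index` and the regex-based sentence splitter are assumptions for illustration; this mechanism is proposed above, not implemented in the run reported here.

```python
import re


def split_sentences(text: str) -> list:
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def select_by_index(sentences: list, indices: list) -> list:
    """Return verbatim sentences for valid integer pointers.

    Out-of-range, non-integer, and repeated pointers are dropped, so a
    malformed extractor output can omit content but cannot paraphrase it.
    """
    seen = set()
    picked = []
    for i in indices:
        if isinstance(i, int) and 0 <= i < len(sentences) and i not in seen:
            seen.add(i)
            picked.append(sentences[i])
    return picked


turn_text = ("We will use qwen3.5:4b as the extractor. "
             "The budget is K tokens. What about scale?")
sentences = split_sentences(turn_text)

# Pretend the extractor returned these pointers (7 is out of range).
print(select_by_index(sentences, [0, 7, 0, 2]))
```

The trade-off is granularity: the scaffold can only hold text that already appeared in the turn, so any compression has to come from selection rather than rewriting.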

Additional Experiment: Scaffold Amplification (619 trials)

Separate from LPCI, we ran a single-shot experiment testing 5 scaffold conditions across 4 model sizes (qwen3.5: 0.8b, 2b, 4b, 9b) on 12 tasks:

- Scaffold condition significantly affects score overall (Kruskal-Wallis p=0.0007).
- The effect is confined to the small models (0.8b: p=0.0008, 2b: p=0.005); it is not significant at 4b (p=0.92) or 9b (p=0.94).
- Condition explains 4.2% of score variation; model size explains 4.7%.
- Dense scaffolds (QuickThink compressed grammar) break small models: the 0.8b score drops from 0.78 to 0.40, which suggests a model needs enough capacity to "decompress" the scaffold.
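For reference, the Kruskal-Wallis H statistic behind these p-values compares rank sums across groups. The sketch below computes H in pure Python on made-up toy scores (not data from the run); in practice `scipy.stats.kruskal` returns both H and the p-value. The helper names `average_ranks` and `kruskal_h` are hypothetical.

```python
from collections import Counter


def average_ranks(values: list) -> list:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_h(groups: list) -> float:
    """Tie-corrected Kruskal-Wallis H for a list of score groups."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)

    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - tie_term / (n ** 3 - n)
    return h / correction if correction else 0.0  # all values tied: no signal


# Toy scores for three hypothetical conditions (synthetic, for illustration).
toy = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(kruskal_h(toy))
```

Because the test works on ranks, it makes no normality assumption about the scores, which fits bounded task scores like these.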

Repo Contents

File	Description
`lpci.py`	Core prototype: SessionState, LPCISession, state extraction, scaffold refresh, interactive CLI
`lpci_test.py`	A/B continuity test: 20 turns × 2 conditions, probes, scaffold evaluation, delta tracing
`analyze_results.py`	Information-theoretic analysis: MI, KL divergence, transfer entropy, significance tests
`run_experiment.py`	Single-shot scaffold amplification harness (matrix run)
`results/lpci_ab_test.jsonl`	LPCI A/B run data: 40 rows, full scaffold snapshots, delta traces, probe evaluations
`results/full_run_v1.jsonl`	619-trial matrix run: 4 models × 5 conditions × 12 tasks × 3 runs (a full matrix would be 720 trials; 619 are recorded)
`LOG.md`	Complete project log
`TODO.md`	Future work

What's Next

- Hard-clamped budget test: exactly K tokens every turn, real compression every turn
- Scale test: 100+ turns, then 1,000+
- Per-token ablation: which scaffold tokens carry the signal (semantic curvature measurement)
- Minimum viable scaffold: progressive compression curve (at what token count does behavior degrade?)
- Cross-model scaffold transfer: does a scaffold built by one model work when injected into another?
- Index-based extraction: eliminate content generation from the extraction pipeline
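As a starting point for the hard-clamped budget test, the sketch below enforces a token ceiling by evicting the oldest items first. This is the simplest possible placeholder policy and an assumption of this example: it drops content rather than compressing it, it guarantees "at most K" rather than "exactly K", and it counts whitespace-separated words instead of real tokenizer output. `count_tokens` and `clamp` are hypothetical names.

```python
def count_tokens(text: str) -> int:
    """Crude proxy; a real run would use the model's own tokenizer."""
    return len(text.split())


def clamp(items: list, budget: int) -> list:
    """Keep the newest items whose combined token count fits the budget."""
    kept = []
    used = 0
    for item in reversed(items):          # walk from newest to oldest
        cost = count_tokens(item)
        if used + cost > budget:
            break                         # everything older is evicted
        kept.append(item)
        used += cost
    kept.reverse()                        # restore chronological order
    return kept


facts = ["a b c", "d e", "f g h i"]
print(clamp(facts, 6))
```

A real compression step would rewrite or merge items to fit the budget; eviction only shows where the ceiling would be enforced in the refresh cycle.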

LPCI

Formulated ~summer 2025 as Linguistically Persistent Cognitive Interface:

- Linguistically: the medium is language, not tensors
- Persistent: survives across the stateless inference boundary
- Cognitive: does thinking-work (attention steering, probability reshaping), not just storage
- Interface: sits between sessions and the stateless model

About Hermes Labs

Hermes Labs is an independent AI-reliability lab building open-source tools that catch silent failure modes in production AI. More at hermes-labs.ai.

If LangQuant is useful to you, please star the repo — it helps others find it.

Citation

If you use this work, please cite:

@misc{langquant2026,
  author = {Hermes Labs},
  title = {LangQuant: Language State Compression and the Linguistically Persistent Cognitive Interface},
  year = {2026},
  url = {https://github.com/hermes-labs-ai/langquant}
}

License

Apache 2.0

Hermes Labs, 2026

"Take a bunch of empty words and make them mean something."
