A geometry-first language model lab for building generative AI without attention, without Q/K/V, and without Transformer blocks.
DRM Language Emitter turns language generation into controlled motion through a learned relational manifold: active directions choose where the model can move, a learned metric shapes how expensive that movement is, and an emitter decodes the resulting state into tokens.
This is a research scaffold, not a production model and not a claim of superiority over Transformers or general world models.
- Architecture
- Mathematical Notes
- Competition Protocol
- Tiny World-Model Competition
- Benchmark Artifacts
- Model Card
- Limitations
- Commercial License
DRM Language Emitter does not use:
- Transformer blocks;
- self-attention;
- Q/K/V attention;
nn.MultiheadAttention;- KV cache.
Its central computation is a latent trajectory:
token e_t
-> latent state z_t in M
-> active directions D(z_t)
-> gates a_i(z_t), effective dimD(z_t)
-> relational metric g_z = diag + U U^T
-> velocity dz in span(D(z_t))
-> z_{t+1}
-> token logits
The working hypothesis is that language generation can be modeled as motion through a relational state space, where geometry is measurable through action, condition, active dimension, recurrence, stability, and low-action path diagnostics.
pip install -e .Optional dev tools:
pip install -e ".[dev]"The project is CPU-runnable. CUDA is optional; the latest local benchmark was CPU-only because the environment reported torch=2.12.0+cpu and cuda_available=False.
Train a tiny DRM model:
python scripts/train_tiny.py --config configs/tiny.yaml --text data/tiny.txtGenerate text:
python scripts/generate.py --checkpoint runs/tiny/drm_tiny.pt --prompt "DRM "Run geometry diagnostics:
python scripts/eval_geometry.py --checkpoint runs/tiny/drm_tiny.pt
python scripts/eval_geodesic_paths.py --checkpoint runs/tiny/drm_tiny.ptIf data/tiny.txt is missing, the training script creates a tiny fallback corpus. The default tokenizer is byte-level, so mixed case, digits, punctuation, and prompts such as DRM are representable.
input_ids
|
TokenEmbedding
|
for each time step:
z_t
|
DirectionField(z_t) -> directions V(z_t), gates a(z_t), dimD
|
RelationalMetric(z_t) -> diag + U U^T
|
DRMFlow(z_t, e_t, V, a) -> dz in active directional span
|
metric action g_z(dz, dz)
|
StateUpdater -> z_{t+1}
|
LanguageEmitter(z_{t+1}) -> logits
The model is autoregressive, but its memory is the evolving latent state rather than attention over a token sequence.
Read the full design in ARCHITECTURE.md.
src/drm_language_emitter/config.py:DRMConfigsrc/drm_language_emitter/direction_field.py: active directional fields and gatessrc/drm_language_emitter/metric.py: relational metricdiag + U U^Tsrc/drm_language_emitter/dynamics.py: DRM flow and metric naturalizationsrc/drm_language_emitter/model.py: causal language emittertransformer/: tiny Transformer baselineworld_model/: tiny symbolic seq2seq world-model baseline
The code logs and exports:
- cross entropy and approximate perplexity;
- metric action;
- effective active dimension
dimD; - gate entropy and hard/soft active fractions;
- metric low-rank norm and condition proxy;
- recurrence and stability proxies;
- learned low-action path diagnostics;
- symbolic world-modeling metrics in the gridworld benchmark.
Important caveat: scripts/eval_geodesic_paths.py evaluates learned low-action trajectories. It is not an exact geodesic solver.
Benchmark outputs that are small enough to keep are copied to docs/benchmarks/. Large run directories remain under runs/ and are ignored by git.
Versioned dashboard:
docs/benchmarks/drm_transformer_full_1k_3k/dashboard.html
Run the sweep:
python scripts/sweep_drm_transformer.py --steps 1000 2000 3000 --seeds 1 2 3 --output-root runs/sweep_drm_transformer
python scripts/make_competition_dashboard.py --root runs/sweep_drm_transformer --title "DRM vs Transformer Sweep"Current interpretation: DRM showed strong step-matched and parameter-matched results in the tiny regime, while Transformer throughput remains much higher. This does not establish broad superiority.
Versioned dashboard:
docs/benchmarks/world_model_competition/dashboard.html
Run the benchmark:
python scripts/make_tiny_world_dataset.py --output-root data/tiny_world --seed 1 --grid-size 5 --num-train 20000 --num-val 2000 --max-rollout-len 8
python scripts/sweep_world_model_competition.py --steps 1000 2000 3000 --seeds 1 2 3 --dataset-root data/tiny_world --output-root runs/world_model_competition
python scripts/make_world_model_dashboard.py --root runs/world_model_competition --title "DRM vs Transformer vs Tiny Symbolic World Model"Latest local result:
- 72 runs, 24 aggregate rows.
- Best next-state exact match:
drm_tiny @ 2000with0.0751. - Lowest invalid-state rate among top next-state rows:
transformer_tiny_220k @ 3000with0.0026. - Best supervised world-model CE among top rows:
world_model_tiny @ 3000around0.2497, but exact-match metrics remained low.
Interpretation: DRM had the best next-state exact-match score in this tiny symbolic text-world benchmark, but absolute symbolic accuracy is still low. This is a diagnostic result, not evidence that DRM is broadly better than Transformers or general world models.
See docs/report/002_world_model_competition_2026-06-18.md.
Quick DRM vs Transformer comparison:
python scripts/compare_drm_transformer.py --steps 50 --batch-size 4 --output-root runs/quick_compareRobustness:
python scripts/eval_robustness.py --drm-checkpoint runs/quick_compare/drm/drm_tiny.pt --drm-tokenizer runs/quick_compare/drm/tokenizer.json --transformer-checkpoint runs/quick_compare/transformer/tiny_transformer.ptBridge diagnostic:
python scripts/eval_bridge_task.py --checkpoint runs/quick_compare/drm/drm_tiny.pt --tokenizer runs/quick_compare/drm/tokenizer.jsonSequence stability:
python scripts/eval_sequence_stability.py --drm-checkpoint runs/quick_compare/drm/drm_tiny.pt --drm-tokenizer runs/quick_compare/drm/tokenizer.json --transformer-checkpoint runs/quick_compare/transformer/tiny_transformer.ptDRM profile:
python scripts/profile_drm.py --checkpoint runs/quick_compare/drm/drm_tiny.ptpython -m pytest -qIf pytest is not installed:
pip install -e ".[dev]"CUDA tests are conditional. They run only when torch.cuda.is_available() is true.
configs/ DRM and benchmark configs
docs/ math, limitations, competition notes, benchmark artifacts
scripts/ training, generation, evaluation, sweeps, dashboards
src/drm_language_emitter/ DRM model package
tests/ smoke and invariant tests
transformer/ tiny Transformer baseline
world_model/ tiny symbolic world-model baseline
Allowed claims:
- DRM Language Emitter is a functional non-Transformer language model prototype.
- Its geometry is explicit, measurable, and trainable in small experiments.
- The repository includes controlled tiny comparisons against Transformer and a tiny symbolic world model.
Not allowed:
- DRM is better than Transformers in general.
- DRM is better than world models in general.
- The model has proven emergent geodesics.
- The model has proven toroidal topology.
- The model is production-ready or safety-evaluated.
- The temporal loop is slow compared with optimized Transformer kernels.
- Benchmarks are tiny and diagnostic.
- Low-action path evaluation is not a formal geodesic solver.
- Symbolic world-modeling exact match is still low.
- No large-scale benchmark, RLHF, alignment evaluation, or safety validation is included.
- Toroidal convergence is not guaranteed; it is only a possible diagnostic under boundedness, recurrence, and stability assumptions.
- Add stronger trajectory integrators and variational path objectives.
- Improve constrained symbolic decoding for the world benchmark.
- Add time-matched CUDA comparisons.
- Broaden ablations around metric, gates, and active dimension.
- Study pullback/Fisher-style metrics as future work.
This project is released under AGPL-3.0. For commercial licensing, see LICENCE-COMMERCIAL.md or contact felupe@truthagi.ai.