LLM Engineering Projects

This repository collects my implementations of the projects on an LLM engineering project list composed by Ahmad M. Osman.

Roadmap

The list below is an incomplete excerpt of that project list.

Tokenization & embeddings (README)
- Build a byte-pair encoder to train your own subword vocabulary
- Implement a token visualizer to map chunks to IDs
- Compare one-hot encoding vs learned embeddings, plot cosine distances
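
As a rough illustration of the first item, here is a minimal sketch of the merge-learning loop behind byte-pair encoding. It is a toy, not the code in this repository: it works on characters rather than raw bytes, splits on whitespace, and the function name `train_bpe` and its parameters are hypothetical.

```python
from collections import Counter


def train_bpe(text: str, num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` merge rules from `text` (toy, character-level)."""
    # Each whitespace-separated word becomes a tuple of symbols with a frequency.
    words = Counter(tuple(word) for word in text.split())
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often the word occurs.
        pairs: Counter = Counter()
        for symbols, freq in words.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Pick the most frequent pair (ties go to the pair seen first).
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the best pair.
        new_words: Counter = Counter()
        for symbols, freq in words.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_words[tuple(out)] += freq
        words = new_words
    return merges


merges = train_bpe("low low lower lowest", num_merges=2)
```

The learned merges, applied in order, define the subword vocabulary; a real byte-level encoder would start from the 256 byte values instead of characters.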

Positional embeddings (in progress)
- Implement four demos: classic sinusoidal vs learned vs RoPE vs ALiBi
- Animate a toy sequence being position-encoded in 3D
- Ablate positions to see the attention collapse
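
For the classic sinusoidal variant mentioned above, a minimal pure-Python sketch could look like the following. It assumes the usual formulation with base 10000, sine on even dimensions and cosine on odd ones; the function name `sinusoidal_positions` is hypothetical and not taken from this repository.

```python
import math


def sinusoidal_positions(seq_len: int, d_model: int) -> list[list[float]]:
    """Return a seq_len x d_model table of sinusoidal position encodings."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share the same frequency.
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            # Even dimensions use sine, odd dimensions use cosine.
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


pe = sinusoidal_positions(seq_len=4, d_model=8)
```

Each row would be added to the token embedding at that position; the learned, RoPE, and ALiBi variants inject position differently and are not covered by this sketch.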

Self-attention & multi-head attention
- Hand-wire dot-product attention for one token
- Scale to multi-head, plot per-head weight heatmaps
- Mask out future tokens, verify causal property
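
The masking item can be sketched with a single attention head in pure Python. This is an assumption-laden toy (no learned projections, no batching, hypothetical function name `causal_attention`), meant only to show how the causal property can be checked: changing a later token must not change earlier outputs.

```python
import math


def causal_attention(q, k, v):
    """Scaled dot-product attention with a causal mask.

    q, k, v are lists of vectors (one per position). Returns (outputs, weights).
    """
    d = len(q[0])
    weights, outputs = [], []
    for i, qi in enumerate(q):
        # Score only positions 0..i; future positions are masked out entirely.
        scores = [
            sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        # Numerically stable softmax over the visible positions.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        # Masked (future) positions get exactly zero weight.
        row = [e / total for e in exps] + [0.0] * (len(k) - i - 1)
        weights.append(row)
        outputs.append(
            [sum(w * vj[c] for w, vj in zip(row, v)) for c in range(len(v[0]))]
        )
    return outputs, weights


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, w = causal_attention(x, x, x)

# Causal check: perturb the last token and recompute.
x_changed = x[:2] + [[5.0, -3.0]]
out_changed, _ = causal_attention(x_changed, x_changed, x_changed)
```

If the mask is correct, `out[:2]` and `out_changed[:2]` are identical, and every weight above the diagonal of `w` is zero.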
