Rearchitecting LLMs.

Structural techniques for efficient models

Rearchitecting LLMs addresses the growing need for AI professionals who understand how LLMs work at a fundamental level—professionals who can create hyper-efficient models tailored to specific data and tasks rather than relying on one-size-fits-all solutions.

This is the official repository for the book Rearchitecting LLMs - Structural techniques for efficient models. Although the notebooks include explanations so they can be understood and run individually, the best experience is through the book, which provides more details on the experiments, decisions, papers, and technologies used.

The industry is shifting away from generic, closed-source models toward open-source alternatives that offer better stability, data privacy, lower operational costs, and competitive differentiation that proprietary APIs cannot provide. However, this transition faces a critical bottleneck: a shortage of engineers equipped with the deep architectural knowledge required to optimize these models effectively.

The book teaches optimization techniques for transforming large pre-trained models into efficient Small Language Models (SLMs). These methodologies—including depth pruning, width pruning in GLU architectures, and knowledge distillation—are similar to approaches used by companies like Nvidia (Minitron family) and Mistral (Ministral family) to create production-ready model families.
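
As a rough illustration of the idea behind depth pruning, the sketch below scores each transformer block by how much it changes its input (1 minus the cosine similarity between the block's input and output) and marks the lowest-scoring blocks for removal. The scoring heuristic, the function names, and the toy hidden states are assumptions made for this example and are not necessarily the metric used in the book; in a real model the hidden states would be captured on calibration data rather than typed by hand.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def block_importance(block_input, block_output):
    """Assumed heuristic: a block that barely changes its input scores near 0."""
    return 1.0 - cosine_similarity(block_input, block_output)


def select_blocks_to_remove(hidden_states, num_to_remove):
    """hidden_states[i] is the input of block i and hidden_states[i + 1] its output."""
    scores = [
        block_importance(hidden_states[i], hidden_states[i + 1])
        for i in range(len(hidden_states) - 1)
    ]
    # Rank blocks from least to most important and keep the first num_to_remove.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(ranked[:num_to_remove]), scores


# Hypothetical hidden states for a toy 3-block model (one vector per boundary).
toy_hidden_states = [
    [1.0, 0.0, 0.0],  # input to block 0
    [0.0, 1.0, 0.0],  # output of block 0: large change
    [0.0, 1.0, 0.1],  # output of block 1: almost unchanged
    [1.0, 1.0, 0.0],  # output of block 2: moderate change
]

removed, scores = select_blocks_to_remove(toy_hidden_states, num_to_remove=1)
print("Blocks to remove:", removed)
print("Importance scores:", [round(s, 4) for s in scores])
```

In this toy setup block 1 is selected because its output is nearly identical to its input. Any pruning decision made this way still needs to be validated by evaluating the pruned model.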

Beyond these foundational techniques, the book introduces original methodologies like Fair Pruning (bias-aware optimization) and Adaptive Attention Bypass (dynamic inference), combining industry best practices with cutting-edge research. You'll learn to apply all these techniques to open-source models like Llama, Gemma, and Qwen, with hands-on notebooks that run on Google Colab's free tier.

Surgically optimize model architectures through depth and width pruning
Recover lost knowledge using targeted distillation techniques
Specialize models for your specific domain and use case
Measure and validate every optimization decision

The Rearchitecting Pipeline. The domain-specific dataset guides the calibration of the base model, informs structural optimization decisions, and drives the final specialization through LoRA fine-tuning. A general dataset supports Knowledge Recovery, ensuring the pruned model retains broad capabilities before domain-specific specialization. This dual approach optimizes each phase for the project's specific objectives.
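
To make the Knowledge Recovery step more concrete, here is a minimal sketch of a common distillation loss: the KL divergence between temperature-softened teacher and student output distributions. It is a generic formulation, assumed here for illustration; the function names, the temperature value, and the toy logits are hypothetical, and the book may combine this term with other losses.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    max_scaled = max(scaled)
    exps = [math.exp(z - max_scaled) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        p * math.log(p / q)
        for p, q in zip(p_teacher, p_student)
        if p > 0.0
    )
    # The T^2 factor is a common convention that keeps gradient magnitudes
    # comparable across different temperatures.
    return kl * temperature ** 2


# Hypothetical next-token logits over a 4-token vocabulary.
teacher_logits = [4.0, 1.0, 0.5, -2.0]  # original (unpruned) model
student_logits = [2.5, 1.5, 0.0, -1.0]  # pruned model before recovery

loss_before = distillation_loss(student_logits, teacher_logits)
loss_matched = distillation_loss(teacher_logits, teacher_logits)
print("Loss with mismatched logits:", round(loss_before, 4))
print("Loss when student matches teacher:", round(loss_matched, 4))
```

The loss falls to zero when the pruned student reproduces the teacher's distribution, which is the signal a recovery run on the general dataset would minimize.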

📚 Book Contents

	Chapter	What you'll learn
PART 1: FOUNDATIONS
✅	1 · Why rearchitecting LLMs matters	The case for specialized models over generic LLMs
✅	2 · An end-to-end rearchitecting project	Full pipeline: prune and recover
✅	3 · A blueprint to modern transformers	GLU architectures, attention, and model internals
PART 2: HANDS-ON OPTIMIZATION
✅	4 · Building smaller and faster LLMs with depth pruning	Block removal, capturing block importance with Python hooks, evaluating pruning
✅	5 · Shaping model architectures via width pruning	GLU neuron selection, data-driven pruning strategies
✅	6 · Knowledge recovery through distillation	Recovering capability after structural compression
✅	7 · Model specialization	LoRA / DoRA fine-tuning and quantization for domain tasks
✅	8 · Attention Optimization	KV cache, attention bypass, inference acceleration
🔜	9 · Dynamic Pruning for Adaptive Inference	Early exit mechanisms that stop inference before the final layer
PART 3: BEYOND THE BLACK BOX
🔜	10 · Exploring the Black Box	Activation analysis and behavioral interpretability
🔜	11 · Optimizing While Eliminating Biases	Fair pruning: removing demographic bias at neuron level
🔜	12 · Capstone Project	End-to-end: replacing API calls with a specialized SLM
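
The width pruning covered in chapter 5 can be sketched with plain Python lists. In a GLU MLP, intermediate neuron i owns row i of the gate and up projections and column i of the down projection, so all three matrices must be sliced with the same indices. The importance score used here (largest absolute weight in the gate row plus largest absolute weight in the up row) is an assumption chosen for this example, as are the function names and the toy weights; the book's data-driven strategies may rank neurons differently.

```python
def neuron_importance(gate_proj, up_proj):
    """Assumed static score: max |w| in the gate row plus max |w| in the up row."""
    return [
        max(abs(w) for w in gate_row) + max(abs(w) for w in up_row)
        for gate_row, up_row in zip(gate_proj, up_proj)
    ]


def prune_glu(gate_proj, up_proj, down_proj, keep):
    """Keep the `keep` highest-scoring neurons, slicing all three matrices together."""
    scores = neuron_importance(gate_proj, up_proj)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:keep])  # preserve the original neuron order
    new_gate = [gate_proj[i] for i in kept]                   # rows of gate
    new_up = [up_proj[i] for i in kept]                       # rows of up
    new_down = [[row[i] for i in kept] for row in down_proj]  # columns of down
    return new_gate, new_up, new_down, kept


# Hypothetical toy layer: hidden size 2, intermediate size 4.
gate = [[0.9, -0.1], [0.01, 0.02], [-0.5, 0.4], [0.03, -0.01]]
up = [[0.2, 0.8], [0.02, -0.01], [0.6, -0.3], [0.01, 0.02]]
down = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]

new_gate, new_up, new_down, kept = prune_glu(gate, up, down, keep=2)
print("Kept neurons:", kept)
print("Pruned down projection:", new_down)
```

Halving the intermediate size this way reduces the GLU expansion ratio while the hidden size, and therefore the interface to the rest of the model, stays unchanged.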

🧠 Your Interactive Technical Companion: NotebookLM Space

Start experimenting interactively.

This NotebookLM space contains all the research papers, chapter notebooks, and optiPfair guides in a conversational format. Think of it as an AI-powered technical assistant for the book that helps you become an LLM architect.

What you can do:

Ask specific questions: "How does depth pruning work?" or "How many layers can I remove from a 70B model?"
Get code snippets: "Show me the code to reduce the GLU expansion of Llama3"
Explore techniques: Query any pruning, distillation, or optimization method
Troubleshoot: Get help understanding implementation details from the notebooks

Perfect for:

Quick reference while coding
Understanding paper implementations
Exploring techniques before diving into chapters
Clarifying concepts on the go

→ Launch NotebookLM Space

💡 Pro tip: Use NotebookLM for quick queries and experimentation. For structured, in-depth learning, the book remains your best companion.

Stop being a mere user. It's time to become an architect.

🧪 Join the Hands-on Labs Discussions

Every chapter in this book includes a specific Hands-on Lab designed to push your understanding of model architecture. We use GitHub Discussions as our active laboratory to share metrics, architectural insights, questions, and even Out-of-Memory (OOM) errors.

→ Explore all Hands-on Labs Discussions

🌟 Support This Project

If you find these techniques useful, consider:

⭐ Starring this repo to stay updated
🔄 Sharing it with your team
💬 Opening Discussions with your questions

Citation

If you find this repository or the techniques described in the book useful, please cite it as follows:

APA: Martra, P. (2026). Rearchitecting LLMs: Structural techniques for efficient models. Manning Publications. ISBN 9781633434332.

BibTeX:

@book{martra2026rearchitecting,
  title={Rearchitecting LLMs: Structural techniques for efficient models},
  author={Martra, Pere},
  isbn={9781633434332},
  year={2026},
  publisher={Manning Publications},
  url={https://www.manning.com/books/rearchitecting-llms},
  note={Manning Early Access Program (MEAP)}
}

