sunishbharat/llm-from-the-ground-up

Acknowledgment

This project was inspired by Sebastian Raschka and his book “Build a Large Language Model (From Scratch)”, which provided the foundational guidance for this implementation.

Motivation

This project was created as a personal learning journey to understand how Large Language Models (LLMs) are built from the ground up. Inspired by Sebastian Raschka’s book “Build a Large Language Model (From Scratch)”, it re-implements the key concepts and architecture of GPT-2, following a step-by-step, hands-on approach. Each module is designed to deepen understanding of transformer components, such as tokenization, attention, and training loops, making it a practical guide for anyone who wants to learn how LLMs work internally.

Project Goals:

Each module can be run on its own, so you can inspect its output and see how it works.

  • Understand the inner workings of transformer-based models like GPT-2
  • Implement key components step-by-step (tokenization, attention, training loop, etc.)
  • Learn the fundamentals of how modern LLMs are built and optimized

Installation

  • Python 3.x required
  • Dependencies are listed in requirements.txt

Set up the VS Code environment (optional)

  • conda env create -f colab/environment.yml

Clone the repository

git clone https://github.com/sunishbharat/llm-from-the-ground-up.git
cd llm-from-the-ground-up

Install dependencies

pip install -r requirements.txt
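
After installing, a quick check can confirm that the packages listed in requirements.txt are actually present in the active environment. The following is a minimal sketch and is not part of the repository: the helper names `parse_requirement_names` and `missing_packages` are hypothetical, and it assumes a simple requirements.txt with one package per line (no URLs or editable installs).

```python
import re
from importlib import metadata
from pathlib import Path


def parse_requirement_names(text):
    """Extract bare package names from requirements.txt-style text."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r
            continue
        # The name ends at the first version specifier, extras bracket, or marker.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    req_file = Path("requirements.txt")
    if req_file.exists():
        names = parse_requirement_names(req_file.read_text())
        print("Missing packages:", missing_packages(names) or "none")
    else:
        print("requirements.txt not found; run this from the repository root.")
```

Run it from the repository root; any package it reports as missing can be installed by re-running the pip command above.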

Run an individual module to explore

python test.py

Usage

To run in Google Colab

Pretraining_llm is the main script for training and inference:

  • Run the entire script to evaluate the default training loop, which is set to 100 epochs and should complete in a few minutes.
  • The script displays the training loss with respect to the number of batches processed.
  • After training, it runs inference with the trained model; adjust the number of epochs to see how the output improves.
  • Hyperparameters are set in the config file.
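
The flow described above (read hyperparameters from a config, train for a fixed number of epochs, record the loss after each batch) can be sketched in a few lines. This is a toy illustration only, not the repository's code: it fits a single weight with plain Python instead of training GPT-2, and the `CONFIG` keys and the `train` and `make_batches` functions are hypothetical names chosen for this example.

```python
import random

# Hypothetical config; the repository's actual config file and key names may differ.
CONFIG = {"num_epochs": 100, "batch_size": 4, "learning_rate": 0.05, "seed": 0}


def make_batches(data, batch_size):
    """Split a list of (x, y) pairs into consecutive batches."""
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]


def train(config):
    """Fit y = w * x with mini-batch gradient descent and log the loss per batch."""
    rng = random.Random(config["seed"])
    # Toy data from y = 3x; a stand-in for batches of tokenized text.
    data = [(x, 3.0 * x) for x in (rng.uniform(-1, 1) for _ in range(16))]
    w = 0.0
    batch_losses = []
    for _ in range(config["num_epochs"]):
        rng.shuffle(data)
        for batch in make_batches(data, config["batch_size"]):
            # Mean squared error and its gradient with respect to w.
            loss = sum((w * x - y) ** 2 for x, y in batch) / len(batch)
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= config["learning_rate"] * grad
            batch_losses.append(loss)
    return w, batch_losses


if __name__ == "__main__":
    w, losses = train(CONFIG)
    print(f"batches processed: {len(losses)}")
    print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.6f}, w: {w:.3f}")
```

Changing `num_epochs` in the config and re-running shows the same effect the script lets you explore: fewer epochs leave a higher final loss, more epochs bring the learned weight closer to the target.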

License

(Suggested by ChatGPT:)
This project is licensed under the MIT License.
✔️ Free to use, modify, and distribute, provided the copyright and license notice are retained.

About

This project offers a hands-on path to building a large language model (LLM) from the ground up. It starts with foundational concepts and builds the architecture brick by brick into a functional model. Thanks to Build a Large Language Model (From Scratch) by Sebastian Raschka.
