CodeSage is a Python framework that combines compiler principles, tree-walk interpretation, and AI-based summarization to not just run your code but also explain it in plain English.
It reads Python source code, tokenizes it, builds an Abstract Syntax Tree, interprets it live, and produces a structured plain-English summary of what the code does, all available via a Tkinter GUI or a terminal mode.
- Scanner / Lexer: Tokenizes raw source character by character; catches unrecognized symbols early
- Recursive Descent Parser: Produces a full AST with meaningful syntax error messages
- AST Summarizer: Converts loops, conditionals, and assignments into readable plain English
- Tree-Walk Interpreter: Evaluates expressions and executes code live from the AST
- NLP / GPT Integration (optional): AI-powered line-by-line explanations via GPT-4o-mini
- GUI Mode: Tkinter interface with code editor, output console, AST summary panel, and colored AST tree
- Terminal Mode: Scanner output, parser AST, plain English summary, and execution result in the terminal
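
To make the scanner stage concrete, here is a minimal sketch of a character-by-character lexer for a single line of source. It is not CodeSage's actual `scanner.py`; the names `Token`, `scan_line`, `KEYWORDS`, and the token kinds are assumptions chosen for illustration.

```python
from typing import NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# Hypothetical keyword and operator sets; the real scanner may differ.
KEYWORDS = {"while", "for", "in", "if", "elif", "else", "def", "return"}
TWO_CHAR_OPS = {"==", "!=", "<=", ">="}
ONE_CHAR_OPS = set("+-*/<>=():,[]")


def scan_line(source: str) -> list[Token]:
    """Scan one line of source, one character at a time, into tokens."""
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch in " \t":
            i += 1  # skip whitespace
        elif ch.isdigit():
            start = i
            # Simplification: any run of digits and dots counts as a number.
            while i < len(source) and (source[i].isdigit() or source[i] == "."):
                i += 1
            tokens.append(Token("NUMBER", source[start:i]))
        elif ch.isalpha() or ch == "_":
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            word = source[start:i]
            kind = "KEYWORD" if word in KEYWORDS else "IDENTIFIER"
            tokens.append(Token(kind, word))
        elif source[i:i + 2] in TWO_CHAR_OPS:
            # Two-character operators are checked before one-character ones.
            tokens.append(Token("OPERATOR", source[i:i + 2]))
            i += 2
        elif ch in ONE_CHAR_OPS:
            tokens.append(Token("OPERATOR", ch))
            i += 1
        else:
            # Unrecognized symbols are rejected before parsing starts.
            raise SyntaxError(f"Unrecognized symbol {ch!r} at column {i}")
    return tokens
```

Calling `scan_line("while i < 5:")` yields a keyword, an identifier, an operator, a number, and a final operator token, while a line containing `$` raises `SyntaxError` at the offending column.
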
| Metric | Value |
|---|---|
| Pipeline stages | 6 |
| Run modes | 2 (GUI + Terminal) |
| Python constructs supported | 5+ |

GUI mode: a full Tkinter interface with code editor, interpreter output, AST summary, and colored AST tree, all in one window. It uses the local tree-walk interpreter; no API key is required.

    python -m codesage.gui

Terminal mode: type code directly in the terminal to get scanner output, the parser AST, a plain-English summary, and the execution result.

    python main.py

Uncomment the GPT block in `main.py` to enable AI-powered summaries.

    git clone https://github.com/Mokshii46/CODESAGE.git
    cd CODESAGE
    python3 -m venv venv
    source venv/bin/activate   # Windows: venv\Scripts\activate
    pip install -r requirements.txt

| Stage | Name | Description | Tag |
|---|---|---|---|
| 1 | Scanner / Lexer | Reads raw source character by character; converts to tokens (keywords, operators, literals, identifiers); catches unrecognized symbols early | lexical analysis |
| 2 | Recursive Descent Parser | Transforms the token stream into an AST capturing the logical, hierarchical structure; generates meaningful syntax error messages | syntax analysis |
| 3 | AST Summarizer | Traverses the AST node by node, converting constructs (loops, conditionals, assignments) into structured, readable plain English | code summarization |
| 4 | Tree-Walk Interpreter | Recursively executes the AST: evaluates expressions, runs loops and functions, and handles conditionals, producing live runtime output | execution |
| 5 | NLP / GPT Integration (optional) | Uncomment the GPT block in `main.py` to enable AI-powered line-by-line explanations via GPT-4o-mini. Requires an OpenAI API key in `.env` | natural language |
| 6 | GUI / IDE | Built with Tkinter: code editor, output console, AST summary panel, and colored AST tree visualization all in one window | tkinter |
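
Stages 2 and 4 in the table can be sketched together for arithmetic expressions. The block below is only an illustration of recursive descent parsing followed by tree-walk evaluation, not the code in `parser.py` or `interpreter.py`; the tuple-based node shapes and the names `Parser`, `tokenize`, and `evaluate` are assumptions.

```python
import operator
import re


def tokenize(text: str) -> list[str]:
    # Simplified stand-in for the scanner; unknown characters are silently dropped.
    return re.findall(r"\d+(?:\.\d+)?|[A-Za-z_]\w*|[-+*/()]", text)


class Parser:
    """Grammar: expr := term (('+'|'-') term)*
                term := factor (('*'|'/') factor)*
                factor := NUMBER | NAME | '(' expr ')'"""

    def __init__(self, tokens: list[str]):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse(self):
        node = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"Unexpected token {self.peek()!r}")
        return node

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = ("binop", op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = ("binop", op, node, self.factor())
        return node

    def factor(self):
        token = self.advance()
        if token is None:
            raise SyntaxError("Unexpected end of input")
        if token == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("Expected ')'")
            return node
        if token[0].isdigit():
            return ("num", float(token))
        if token[0].isalpha() or token[0] == "_":
            return ("name", token)
        raise SyntaxError(f"Unexpected token {token!r}")


OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def evaluate(node, env: dict):
    """Walk the tree recursively, looking variables up in env."""
    if node[0] == "num":
        return node[1]
    if node[0] == "name":
        return env[node[1]]
    _, op, left, right = node
    return OPS[op](evaluate(left, env), evaluate(right, env))
```

Each grammar rule maps to one method, which is what makes syntax errors easy to report at the point of failure. With `tree = Parser(tokenize("i + 1 * 2")).parse()`, the multiplication nests below the addition, and `evaluate(tree, {"i": 3})` walks that tree to produce the result.
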

Input code:

    i = 0
    while i < 5:
        print(i)
        i = i + 1

AST Summary:

- Assigning '0' to variable 'i'
- While loop: runs while i < 5
- Print value of i each iteration
- Increment i by 1

Interpreter output: 0 1 2 3 4
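
A summary like the one above can be approximated with Python's built-in `ast` module. The sketch below is an assumption about how such a summarizer might be structured, not CodeSage's implementation; the function name `summarize` and the exact wording of each line are illustrative, and `ast.unparse` requires Python 3.9+.

```python
import ast

SAMPLE = """\
i = 0
while i < 5:
    print(i)
    i = i + 1
"""


def summarize(node: ast.AST, depth: int = 0) -> list[str]:
    """Return plain-English lines for a node, indenting nested statements."""
    pad = "  " * depth
    if isinstance(node, ast.Module):
        return [line for stmt in node.body for line in summarize(stmt, depth)]
    if isinstance(node, ast.Assign):
        target = ast.unparse(node.targets[0])
        value = ast.unparse(node.value)
        return [f"{pad}Assigning '{value}' to variable '{target}'"]
    if isinstance(node, ast.While):
        lines = [f"{pad}While loop: runs while {ast.unparse(node.test)}"]
        for stmt in node.body:
            lines.extend(summarize(stmt, depth + 1))
        return lines
    if (isinstance(node, ast.Expr)
            and isinstance(node.value, ast.Call)
            and ast.unparse(node.value.func) == "print"):
        args = ", ".join(ast.unparse(arg) for arg in node.value.args)
        return [f"{pad}Print value of {args}"]
    # Fallback for constructs this sketch does not cover.
    return [f"{pad}{type(node).__name__} statement"]


if __name__ == "__main__":
    for line in summarize(ast.parse(SAMPLE)):
        print(line)
```

Unlike the hand-written summary, this sketch describes `i = i + 1` as a plain assignment; recognizing it as an increment would need an extra pattern check on the assigned expression.
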

GUI Panel Output:

    ── Code input ──────────────────────────
    i = 0
    while i < 5:
        print(i)
        i = i + 1
    ── Interpreter Output ──────────────────
    0 · 1 · 2 · 3 · 4
    ── AST Summary ─────────────────────────
    Assigning '0.0' to variable 'i'
    While loop: repeatedly executes body while condition is true
    Print statement printing the value of expression
    ── AST Tree ────────────────────────────
    ├── Expression
    ├── Assign
    └── While

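
The AST tree panel is essentially an outline of node types. As a rough illustration, the sketch below renders a similar outline from Python's built-in `ast` module; the `render` function is hypothetical, and because it relies on `ast` rather than CodeSage's own parser, the node names it prints differ from those in the panel.

```python
import ast

SAMPLE = """\
i = 0
while i < 5:
    print(i)
    i = i + 1
"""


def render(nodes: list, prefix: str = "") -> list[str]:
    """Draw statement nodes as an outline, descending into nested bodies."""
    lines = []
    for index, node in enumerate(nodes):
        last = index == len(nodes) - 1
        lines.append(prefix + ("└── " if last else "├── ") + type(node).__name__)
        children = getattr(node, "body", [])
        if isinstance(children, list):
            # Continue the vertical guide only while siblings remain.
            lines.extend(render(children, prefix + ("    " if last else "│   ")))
    return lines


if __name__ == "__main__":
    print("\n".join(render(ast.parse(SAMPLE).body)))
```

Only statement bodies are followed here; a fuller tree would also descend into expressions, `else` branches, and function arguments.
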
| Construct | Details |
|---|---|
| Variables | Declarations, assignments, arithmetic & logical operations |
| Loops | for and while with full iteration support |
| Conditionals | if, elif, else branching |
| Functions | Return statements & built-ins like len, range |
| Lists | Index-based access and list operations |

    CODESAGE/
    ├── main.py                 # Terminal entry point
    ├── README.md
    ├── requirements.txt        # added
    ├── .gitignore              # added
    ├── .env                    # gitignored
    ├── codesage/               # core package
    │   ├── __init__.py
    │   ├── scanner.py
    │   ├── parser.py
    │   ├── interpreter.py
    │   ├── resolver.py
    │   ├── nlp.py
    │   └── gui.py              # GUI entry point
    ├── nlp/                    # training pipeline
    │   ├── train.py
    │   ├── train_gpt.py
    │   ├── decoder.py
    │   ├── filter.py
    │   ├── generate_datasets.py
    │   └── prepare_embeddings.py
    ├── models/                 # gitignored
    ├── data/                   # datasets
    └── assets/                 # images

- No suitable NLP training dataset was available initially
- Built a custom template-based dataset filtered to interpreter capabilities
- NLP accuracy gaps led to a pivot toward AST-based summarization as the primary explanation method
- Integrate CodeT5 / LLaMA for richer, more nuanced code explanations
- Add support for classes, modules, and advanced Python constructs
- Replace Tkinter with a modern web-based IDE
- Real-time explanation as users type
| Area | Technology |
|---|---|
| Language | Python 3.9+ |
| GUI | Tkinter |
| AST & Parsing | Python ast module + custom recursive descent parser |
| NLP / AI | OpenAI GPT-4o-mini (optional) |
| NLP Training | Custom template-based dataset |

Yadnyesh Patil, Mentor, Project X · VJTI

Rupak Gupta, Mentor, Project X · VJTI

- No API key is required to run the core interpreter or GUI
- GPT-4o-mini integration requires an OpenAI API key stored in `.env` (gitignored)
- The `.env` file and `models/` directory are both excluded from version control
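
One way to keep the GPT path optional is to check for the key at startup and fall back to AST summaries when it is absent. The sketch below assumes the key is stored under the name `OPENAI_API_KEY`, which this README does not specify; `parse_env` and `pick_summarizer` are hypothetical helpers, not part of CodeSage.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines from .env-style text, skipping blanks and comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and optional quotes from the value.
        settings[key.strip()] = value.strip().strip("'\"")
    return settings


def pick_summarizer(settings: dict[str, str]) -> str:
    """Return "gpt" when a key is configured, otherwise "ast"."""
    # OPENAI_API_KEY is an assumed variable name.
    return "gpt" if settings.get("OPENAI_API_KEY") else "ast"


# Typical use (assumes a .env file in the working directory):
#     from pathlib import Path
#     mode = pick_summarizer(parse_env(Path(".env").read_text()))
```

Because the fallback is the AST summarizer, the core interpreter and GUI keep working without any key.
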

If you find CodeSage useful, consider starring the repo!