A compiler for a compact Go-inspired language that lowers source code into TAC and MIPS assembly.
PO is a small programming language and compiler implemented in Java. The language uses a compact, Go-inspired syntax with func, var, if, switch, while, for, and ret, while the compiler implements the full path from source text to generated MIPS assembly.
The interesting part of PO is the compiler pipeline. It includes lexical analysis, grammar-driven parsing, semantic validation, three-address-code generation, a TAC optimization pass, and a MIPS backend with function calls, recursion, syscall support, stack context handling, and register-aware code generation.
This is not positioned as a production language or a Go replacement. PO is a systems project: it demonstrates how to parse a language, validate it, lower it into an intermediate representation, optimize it, and emit target-specific assembly.
- A complete source-to-assembly compiler pipeline.
- A grammar-driven LL(1)-style parser built from
grammar.txt. - Static semantic checks before code generation.
- Type inference for variables from first assignment, with later mismatch detection.
- Symbol table and scope tracking for globals,
main, functions, params, and internal string labels. - Three-address-code IR generation for assignments, control flow, calls, returns, and syscalls.
- A TAC optimization pass for simple algebraic reductions.
- MIPS assembly generation with
.dataand.textsections. - Function calls, recursive calls, parameter passing, return values, and context save/restore.
- Register assignment based on variable usage, spilling lower-priority values into memory.
PO has a deliberately small language surface:
var value = 10
func double(n) {
ret n * 2
}
main() {
prints("Result: ")
var result = double(value)
print(result)
}
The syntax is intentionally familiar if you have used Go, C-like languages, or simple scripting languages:
funcfor functions.varfor variable declarations.main()as the program entrypoint.- Braced blocks.
if,elsif,else.switch,opt,default.whileandfor.retfor returning values.
The language is compact, but it is not dynamically typed at runtime. Variables are declared without explicit type annotations, then the compiler assigns and checks their type during semantic analysis.
PO source
|
v
Preprocessor
|
v
Lexer
|
v
Grammar-driven parser
|
v
Parse tree cleanup + symbol table population
|
v
Semantic analysis
|
v
TAC generation
|
v
TAC optimization
|
v
MIPS code generation
|
v
.asm output
Main implementation areas:
src/preprocessor: removes comments and normalizes source before lexing.src/lexical_analysis: converts source text into a token stream.src/syntax_analysis: loadsgrammar.txt, computes first/follow sets, builds the parsing table, and produces a parse tree.src/entities: compiler data structures such as tokens, parse trees, symbol table entries, TAC blocks, and errors.src/semantic_analysis: validates identifiers, expressions, assignment types, function calls, arguments, conditions, and returns.src/intermediate_code_generator: lowers the parse tree into TAC.src/intermediate_code_optimizer: simplifies TAC before backend generation.src/mips_generator: emits MIPS assembly from TAC.src/testing: lexer, parser, preprocessor, TAC, MIPS, and integration examples.
PO should be described as statically checked with inferred variable types.
Variables can be declared without an initial value:
var aux
When a value is assigned, the compiler records the variable type:
aux = 1
Later assignments are checked against the recorded type. Invalid operations, undeclared identifiers, invalid function calls, argument-count mismatches, invalid conditional expressions, and invalid return values are reported before MIPS generation.
The current type system is intentionally small. The compiler supports integer-heavy programs, strings for syscall-backed printing, and a compact set of expression checks. This keeps the language small while still demonstrating a real semantic-analysis phase.
| Area | Support |
|---|---|
| Variables | var declarations with optional assignment |
| Functions | func name(args) { ... } |
| Entrypoint | main() |
| Recursion | Supported through normal function calls |
| Arithmetic | +, -, *, / through TAC/MIPS codegen; % appears at grammar/token level |
| Comparisons | <, >, <=, >=, ==, != |
| Logic | and, or through TAC/MIPS codegen; not appears at grammar level |
| Conditionals | if, elsif, else |
| Switch | switch, opt, default |
| Loops | while, for |
| Control flow | break, continue, ret |
| Syscalls | print, prints, read |
| Intermediate representation | TAC blocks and entries |
| Backend | MIPS assembly |
func recFactorial(num) {
var aux
if num >= 1 {
aux = recFactorial(num - 1)
ret num * aux
}
else {
ret 1
}
}
main() {
prints(" *** Factorial Calculator *** \n")
prints("Enter a number: ")
var num = read()
if num <= 12 {
prints("Calculating the factorial of ")
print(num)
var factorial = recFactorial(num)
prints("\nThe factorial result is: ")
print(factorial)
}
else {
prints("The number is too big, please enter a number between 0 and 12")
}
}
The compiler lowers recursive calls into TAC with explicit context save/restore steps:
recFactorial:
E0:
if num < 1 goto E2
E1:
savec
s0 = num - 1
s4 = s0
addp 0 s4
call recFactorial
loadc
aux = v0
s0 = num * aux
s4 = s0
ret s4
goto E3
E2:
ret 1
The MIPS backend then emits assembly with labels, jumps, argument registers, return values, syscalls, and stack context management.
PO generates MIPS assembly using a simple but real backend model:
- String literals are emitted into
.dataas.asciizvalues. - Code is emitted into
.text. - Global setup jumps into the generated
globalblock or directly intomain. - Function calls use
jal. - Return values use
$v0. - Arguments are passed through
$a0to$a3. - Registers are saved and restored around calls.
- High-use variables are assigned to
$tregisters when possible. - Variables that do not receive registers are emitted as memory-backed
.wordvalues.
Example generated assembly fragment:
la $a0, $z1101
jal $PRINT_STR
jal $READ_INT
move $t0, $v0
li $s1, 2
mul $s0, $t0, $s1
move $t0, $s0The language grammar lives in grammar.txt.
The parser reads the grammar file, computes first and follow sets, fills a parsing table, and uses that table to produce a parse tree. This makes the grammar an explicit part of the compiler rather than hardcoding every parse rule directly into Java control flow.
Small excerpt:
<declaracioFuncio&> ::= func ID ( <parametres> ) <sentenciaComposta> \n
<sentenciaCondicional&> ::= if <expressio> <sentenciaComposta> \n <llistaElsif> <blocElse&>
<sentenciaWhile&> ::= while <expressio> <sentenciaComposta> \n
<sentenciaFor&> ::= for <assignacioFor> , <expressio> , <assignacioForPrima> <sentenciaComposta> \n
<sentenciaRetorn&> ::= ret <expressio> \n
The original workflow is source-first and IDE-oriented.
Requirements:
- Java 17.
- JetBrains annotations available on the classpath.
- An IDE such as IntelliJ IDEA, or an equivalent
javacsetup with the annotations dependency.
To run a PO program:
- Open the project with Java 17.
- Make sure
org.jetbrains.annotationsis available. - Set the input file in
src/Main.java:
private final static String file = "src/testing/integration/po_mips/inputs/recursive-factorial.po";- Run
Main. - The compiler writes:
- a generated
.asmfile next to the input.pofile. output.tacat the repository root when TAC output is enabled.
This project contains the compiler implementation, grammar, examples, and tests, but it does not currently ship with a Maven, Gradle, or Make build wrapper.
Good next cleanup items:
- Add a Gradle or Maven build file.
- Add a CLI entrypoint that accepts a
.popath instead of editingMain.java. - Publish a small
examples/directory with expected TAC and MIPS output. - Split README examples from internal test fixtures.
- Tighten unsupported or partially supported language claims as the grammar evolves.
PO is a compiler project, not a production programming language.
It is strongest as a demonstration of:
- parsing and compiler frontend design,
- static validation,
- IR generation,
- target-specific backend output,
- deterministic transformation pipelines,
- and low-level execution constraints.
That makes it useful as a systems-engineering project: the same thinking applies to protocol parsers, static analyzers, smart-contract tooling, virtual machines, indexers, backend validators, and any infrastructure where inputs must be transformed into correct outputs through explicit intermediate stages.
This project is licensed under the BSD 3-Clause License. See LICENSE for details.
