Skip to content

bielcarpi/po

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

216 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PO logo

PO Compiler

A compiler for a compact Go-inspired language that lowers source code into TAC and MIPS assembly.

Java 17 Compiler pipeline BSD 3-Clause

Overview

PO is a small programming language and compiler implemented in Java. The language uses a compact, Go-inspired syntax with func, var, if, switch, while, for, and ret, while the compiler implements the full path from source text to generated MIPS assembly.

The interesting part of PO is the compiler pipeline. It includes lexical analysis, grammar-driven parsing, semantic validation, three-address-code generation, a TAC optimization pass, and a MIPS backend with function calls, recursion, syscall support, stack context handling, and register-aware code generation.

This is not positioned as a production language or a Go replacement. PO is a systems project: it demonstrates how to parse a language, validate it, lower it into an intermediate representation, optimize it, and emit target-specific assembly.

What It Demonstrates

  • A complete source-to-assembly compiler pipeline.
  • A grammar-driven LL(1)-style parser built from grammar.txt.
  • Static semantic checks before code generation.
  • Type inference for variables from first assignment, with later mismatch detection.
  • Symbol table and scope tracking for globals, main, functions, params, and internal string labels.
  • Three-address-code IR generation for assignments, control flow, calls, returns, and syscalls.
  • A TAC optimization pass for simple algebraic reductions.
  • MIPS assembly generation with .data and .text sections.
  • Function calls, recursive calls, parameter passing, return values, and context save/restore.
  • Register assignment based on variable usage, spilling lower-priority values into memory.

Language Snapshot

PO has a deliberately small language surface:

var value = 10

func double(n) {
    ret n * 2
}

main() {
    prints("Result: ")
    var result = double(value)
    print(result)
}

The syntax is intentionally familiar if you have used Go, C-like languages, or simple scripting languages:

  • func for functions.
  • var for variable declarations.
  • main() as the program entrypoint.
  • Braced blocks.
  • if, elsif, else.
  • switch, opt, default.
  • while and for.
  • ret for returning values.

The language is compact, but it is not dynamically typed at runtime. Variables are declared without explicit type annotations, then the compiler assigns and checks their type during semantic analysis.

Compiler Pipeline

PO source
   |
   v
Preprocessor
   |
   v
Lexer
   |
   v
Grammar-driven parser
   |
   v
Parse tree cleanup + symbol table population
   |
   v
Semantic analysis
   |
   v
TAC generation
   |
   v
TAC optimization
   |
   v
MIPS code generation
   |
   v
.asm output

Main implementation areas:

  • src/preprocessor: removes comments and normalizes source before lexing.
  • src/lexical_analysis: converts source text into a token stream.
  • src/syntax_analysis: loads grammar.txt, computes first/follow sets, builds the parsing table, and produces a parse tree.
  • src/entities: compiler data structures such as tokens, parse trees, symbol table entries, TAC blocks, and errors.
  • src/semantic_analysis: validates identifiers, expressions, assignment types, function calls, arguments, conditions, and returns.
  • src/intermediate_code_generator: lowers the parse tree into TAC.
  • src/intermediate_code_optimizer: simplifies TAC before backend generation.
  • src/mips_generator: emits MIPS assembly from TAC.
  • src/testing: lexer, parser, preprocessor, TAC, MIPS, and integration examples.

Static Semantics

PO should be described as statically checked with inferred variable types.

Variables can be declared without an initial value:

var aux

When a value is assigned, the compiler records the variable type:

aux = 1

Later assignments are checked against the recorded type. Invalid operations, undeclared identifiers, invalid function calls, argument-count mismatches, invalid conditional expressions, and invalid return values are reported before MIPS generation.

The current type system is intentionally small. The compiler supports integer-heavy programs, strings for syscall-backed printing, and a compact set of expression checks. This keeps the language small while still demonstrating a real semantic-analysis phase.

Supported Features

Area Support
Variables var declarations with optional assignment
Functions func name(args) { ... }
Entrypoint main()
Recursion Supported through normal function calls
Arithmetic +, -, *, / through TAC/MIPS codegen; % appears at grammar/token level
Comparisons <, >, <=, >=, ==, !=
Logic and, or through TAC/MIPS codegen; not appears at grammar level
Conditionals if, elsif, else
Switch switch, opt, default
Loops while, for
Control flow break, continue, ret
Syscalls print, prints, read
Intermediate representation TAC blocks and entries
Backend MIPS assembly

Example: Recursive Factorial

func recFactorial(num) {
    var aux

    if num >= 1 {
        aux = recFactorial(num - 1)
        ret num * aux
    }
    else {
        ret 1
    }
}

main() {
    prints(" *** Factorial Calculator *** \n")
    prints("Enter a number: ")
    var num = read()

    if num <= 12 {
        prints("Calculating the factorial of ")
        print(num)

        var factorial = recFactorial(num)

        prints("\nThe factorial result is: ")
        print(factorial)
    }
    else {
        prints("The number is too big, please enter a number between 0 and 12")
    }
}

The compiler lowers recursive calls into TAC with explicit context save/restore steps:

recFactorial:
E0:
    if num < 1 goto E2
E1:
    savec
    s0 = num - 1
    s4 = s0
    addp 0 s4
    call recFactorial
    loadc
    aux = v0
    s0 = num * aux
    s4 = s0
    ret s4
    goto E3
E2:
    ret 1

The MIPS backend then emits assembly with labels, jumps, argument registers, return values, syscalls, and stack context management.

MIPS Backend

PO generates MIPS assembly using a simple but real backend model:

  • String literals are emitted into .data as .asciiz values.
  • Code is emitted into .text.
  • Global setup jumps into the generated global block or directly into main.
  • Function calls use jal.
  • Return values use $v0.
  • Arguments are passed through $a0 to $a3.
  • Registers are saved and restored around calls.
  • High-use variables are assigned to $t registers when possible.
  • Variables that do not receive registers are emitted as memory-backed .word values.

Example generated assembly fragment:

la $a0, $z1101
jal $PRINT_STR

jal $READ_INT
move $t0, $v0

li $s1, 2
mul $s0, $t0, $s1
move $t0, $s0

Grammar

The language grammar lives in grammar.txt.

The parser reads the grammar file, computes first and follow sets, fills a parsing table, and uses that table to produce a parse tree. This makes the grammar an explicit part of the compiler rather than hardcoding every parse rule directly into Java control flow.

Small excerpt:

<declaracioFuncio&> ::= func ID ( <parametres> ) <sentenciaComposta> \n
<sentenciaCondicional&> ::= if <expressio> <sentenciaComposta> \n <llistaElsif> <blocElse&>
<sentenciaWhile&> ::= while <expressio> <sentenciaComposta> \n
<sentenciaFor&> ::= for <assignacioFor> , <expressio> , <assignacioForPrima> <sentenciaComposta> \n
<sentenciaRetorn&> ::= ret <expressio> \n

Running The Compiler

The original workflow is source-first and IDE-oriented.

Requirements:

  • Java 17.
  • JetBrains annotations available on the classpath.
  • An IDE such as IntelliJ IDEA, or an equivalent javac setup with the annotations dependency.

To run a PO program:

  1. Open the project with Java 17.
  2. Make sure org.jetbrains.annotations is available.
  3. Set the input file in src/Main.java:
private final static String file = "src/testing/integration/po_mips/inputs/recursive-factorial.po";
  1. Run Main.
  2. The compiler writes:
  • a generated .asm file next to the input .po file.
  • output.tac at the repository root when TAC output is enabled.

Repository Status

This project contains the compiler implementation, grammar, examples, and tests, but it does not currently ship with a Maven, Gradle, or Make build wrapper.

Good next cleanup items:

  • Add a Gradle or Maven build file.
  • Add a CLI entrypoint that accepts a .po path instead of editing Main.java.
  • Publish a small examples/ directory with expected TAC and MIPS output.
  • Split README examples from internal test fixtures.
  • Tighten unsupported or partially supported language claims as the grammar evolves.

Project Boundaries

PO is a compiler project, not a production programming language.

It is strongest as a demonstration of:

  • parsing and compiler frontend design,
  • static validation,
  • IR generation,
  • target-specific backend output,
  • deterministic transformation pipelines,
  • and low-level execution constraints.

That makes it useful as a systems-engineering project: the same thinking applies to protocol parsers, static analyzers, smart-contract tooling, virtual machines, indexers, backend validators, and any infrastructure where inputs must be transformed into correct outputs through explicit intermediate stages.

License

This project is licensed under the BSD 3-Clause License. See LICENSE for details.

About

Compiler for a compact Go-inspired language with LL(1) parsing, TAC generation, optimization, and MIPS assembly output.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors