| 1 | +# RGC-BASIC → C output (cc65 / z88dk) — design notes |
| 2 | + |
| 3 | +This document sketches how the **RGC-BASIC** front end could be extended with a **C code generator** so programs can be compiled with **cc65** (6502 targets) or **z88dk** (Z80 and others), instead of (or in addition to) interpretation. |
| 4 | + |
| 5 | +It is **not** implemented today; it is a roadmap for a **subset compiler / transpiler** with a **portable C** intermediate step. |
| 6 | + |
| 7 | +--- |
| 8 | + |
| 9 | +## Goals |
| 10 | + |
| 11 | +1. **Emit C source** that compiles with a standard C toolchain aimed at 8-bit hosts. |
| 12 | +2. **Share parsing and semantic analysis** with the interpreter where possible (same token/AST/IR story). |
| 13 | +3. **Document explicitly** what must be **excluded** or **restricted** so generated code is realistic on cc65/z88dk. |
| 14 | + |
| 15 | +--- |
| 16 | + |
| 17 | +## Why C first (not raw 6502/Z80 asm) |
| 18 | + |
| 19 | +- **One backend** for many CPUs: cc65 (6502), SDCC/z88dk (Z80), etc. |
| 20 | +- **Runtime in C**: `printf`, string helpers, and math can live in `basrtl.c` and be tuned per platform. |
| 21 | +- **Asm later**: once a stable IR exists, a direct assembler backend is optional. |
| 22 | + |
| 23 | +--- |
| 24 | + |
| 25 | +## Architecture (high level) |
| 26 | + |
| 27 | +| Stage | Role | |
| 28 | +|--------|------| |
| 29 | +| Lex / parse | Same as today; optionally build an **AST** or **IR** instead of immediate execution. | |
| 30 | +| Semantic pass | Type/size rules for the **compilable subset**; reject unsupported constructs with clear errors. | |
| 31 | +| C codegen | Walk the IR and emit C functions, `static` tables for `DATA`, `switch`/labels for `GOTO`, etc. | |
| 32 | +| `basrtl.h` / `basrtl.c` | **Runtime**: string pool or heap policy, `INPUT`/`PRINT` stubs, optional fixed-point math. | |
| 33 | + |
| 34 | +The interpreter’s execution functions (`statement_*`, `eval_*`) are **not** reused as-is for codegen; they inform **what** to emit for each construct. |
| 35 | + |
| 36 | +--- |
| 37 | + |
| 38 | +## Recommended compilable subset (phase 1) |
| 39 | + |
| 40 | +Aim for programs that look like **structured BASIC** with **integers** and **simple strings**. |
| 41 | + |
| 42 | +**Statements (initial)** |
| 43 | + |
| 44 | +- `REM` / `'` |
| 45 | +- Assignment (`LET` optional) |
| 46 | +- `PRINT` with `;` and `,` (map `,` to tab logic or simplified spacing) |
| 47 | +- `IF … THEN` (single-line and block `IF` / `ELSE` / `END IF`); the `IF … THEN <line number>` form is accepted only if it is lowered to a C `goto` |
| 48 | +- `FOR` / `NEXT` with numeric loop variable and `STEP` |
| 49 | +- `WHILE` / `WEND`, `DO` / `LOOP` / `LOOP UNTIL` / `EXIT` |
| 50 | +- `GOTO` / `GOSUB` / `RETURN` with **labels** (emit C `goto` / functions — see below) |
| 51 | +- `END` / `STOP` |
| 52 | +- `DIM` for **fixed** 1D arrays (numeric and string), bounds known at compile time |
| 53 | +- `READ` / `DATA` / `RESTORE` (static data tables in C) |
| 54 | +- `DEF FN` for **single-line** numeric functions only (phase 1) |
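The `READ` / `DATA` / `RESTORE` item above could be lowered roughly as follows. This is a minimal sketch, assuming integer-only `DATA` and a single program-wide read cursor; the names `bas_data`, `bas_read_int`, and `bas_restore` are hypothetical and not part of any existing runtime.

```c
#include <stddef.h>

/* Hypothetical output for:
 *   10 DATA 3, 14, 15
 *   20 READ A : READ B : RESTORE : READ C
 * All identifiers below are illustrative only. */

/* Every DATA item in source order, gathered into one static table. */
static const int bas_data[] = { 3, 14, 15 };
#define BAS_DATA_COUNT (sizeof bas_data / sizeof bas_data[0])

/* Index of the item the next READ will return. */
static size_t bas_data_pos = 0;

/* READ: store the next item in *out.
 * Returns 0 on success, -1 when the table is exhausted
 * (where an interpreter would report an out-of-data error). */
int bas_read_int(int *out)
{
    if (bas_data_pos >= BAS_DATA_COUNT) {
        return -1;
    }
    *out = bas_data[bas_data_pos++];
    return 0;
}

/* RESTORE: rewind the cursor to the first DATA item. */
void bas_restore(void)
{
    bas_data_pos = 0;
}
```

Keeping the table `const` lets the toolchain place it in ROM or a read-only segment where the target supports that; only the cursor needs RAM.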
| 55 | + |
| 56 | +**Expressions (initial)** |
| 57 | + |
| 58 | +- Integer and float **literals**; prefer **16-bit `int`** or **`long`** policy per target |
| 59 | +- `+ - * /`, parentheses |
| 60 | +- Relational operators in `IF` |
| 61 | +- **Intrinsic functions (numeric, phase 1):** `ABS`, `INT`, `SGN`, `SQR` (via `sqrt` if lib linked), `RND` (runtime LCG) |
| 62 | + |
| 63 | +**Strings (phase 1, restricted)** |
| 64 | + |
| 65 | +- Fixed maximum length (e.g. 40 or 80) → `char name[N+1]` or pooled buffers |
| 66 | +- `LEFT$`, `RIGHT$`, `MID$`, `LEN`, `CHR$`, `ASC`, `STR$`, `VAL` with **bounded** temporaries |
| 67 | +- Concatenation `A$ + B$` only if result fits static policy |
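The restricted string policy above can be illustrated with a fixed-capacity string type. This is a sketch under stated assumptions: a maximum length of 40, and hypothetical helper names (`bas_str`, `bas_str_set`, `bas_str_left`, `bas_str_concat`). Here concatenation that does not fit is reported as an error rather than truncated; a real runtime might choose differently.

```c
#include <string.h>

#define BAS_STR_MAX 40                 /* assumed fixed maximum string length */

typedef struct {
    char text[BAS_STR_MAX + 1];        /* always NUL-terminated */
} bas_str;

/* Assignment from a C string literal; silently truncates to BAS_STR_MAX. */
void bas_str_set(bas_str *dst, const char *src)
{
    size_t n = strlen(src);
    if (n > BAS_STR_MAX) {
        n = BAS_STR_MAX;
    }
    memcpy(dst->text, src, n);
    dst->text[n] = '\0';
}

/* LEFT$(src, count). dst may be the same object as src. */
void bas_str_left(bas_str *dst, const bas_str *src, size_t count)
{
    size_t n = strlen(src->text);
    if (count < n) {
        n = count;
    }
    memmove(dst->text, src->text, n);
    dst->text[n] = '\0';
}

/* a + b. Returns 0 on success, or -1 (leaving dst untouched) when the
 * result would exceed BAS_STR_MAX. dst may alias a, but must not alias b. */
int bas_str_concat(bas_str *dst, const bas_str *a, const bas_str *b)
{
    size_t la = strlen(a->text);
    size_t lb = strlen(b->text);
    if (la + lb > BAS_STR_MAX) {
        return -1;
    }
    memmove(dst->text, a->text, la);
    memcpy(dst->text + la, b->text, lb);
    dst->text[la + lb] = '\0';
    return 0;
}
```

Because every buffer has the same compile-time size, no heap is needed, and the semantic pass can decide statically how many temporaries an expression requires.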
| 68 | + |
| 69 | +--- |
| 70 | + |
| 71 | +## Features to **exclude** or **defer** (explicit) |
| 72 | + |
| 73 | +These are either **host-specific**, **dynamic**, or **too heavy** for a first cc65/z88dk pipeline. The transpiler should **reject** them with a clear diagnostic (not silently miscompile). |
| 74 | + |
| 75 | +### Meta and loader |
| 76 | + |
| 77 | +| Feature | Reason to exclude (v1) | |
| 78 | +|--------|-------------------------| |
| 79 | +| `#INCLUDE` | Needs a compile-time file merge step; use a preprocessor or single concatenated source for the demo. | |
| 80 | +| `#OPTION …` | Maps to CLI/cc65 pragmas; handle as **comments** or **separate build flags**, not generated C. | |
| 81 | +| Shebang | Strip or ignore. | |
| 82 | + |
| 83 | +### Host / OS / shell (no portable C equivalent on 8-bit without a full OS) |
| 84 | + |
| 85 | +| Feature | Reason | |
| 86 | +|--------|--------| |
| 87 | +| `SYSTEM`, `EXEC$` | Subprocess/shell — exclude. | |
| 88 | +| `ENV$` | Environment variables — exclude unless stubbed empty. | |
| 89 | +| `ARGC` / `ARG$` | Could map to `main(argc, argv)` **only** if the emitted program defines `main` and the harness passes args — document as **optional** phase 2. | |
| 90 | + |
| 91 | +### Dynamic execution |
| 92 | + |
| 93 | +| Feature | Reason | |
| 94 | +|--------|--------| |
| 95 | +| `EVAL(expr$)` | Runtime parse/eval — exclude unless embedding a tiny interpreter (out of scope for v1). | |
| 96 | + |
| 97 | +### Advanced strings and data formats |
| 98 | + |
| 99 | +| Feature | Reason | |
| 100 | +|--------|--------| |
| 101 | +| `JSON$` | Exclude; large runtime. | |
| 102 | +| `REPLACE`, `TRIM$`, `LTRIM$`, `RTRIM$`, `FIELD$` | Can add to `basrtl` later; exclude v1 or implement as C helpers with fixed buffers. | |
| 103 | +| `SPLIT` / `JOIN` | Heap/variable field counts — defer or static-max only. | |
| 104 | +| `STRING$(n, c)` | OK if `n` is **constant**; exclude if `n` is dynamic beyond max. | |
| 105 | + |
| 106 | +### Arrays and advanced control |
| 107 | + |
| 108 | +| Feature | Reason | |
| 109 | +|--------|--------| |
| 110 | +| `SORT` | Needs qsort-style helper or typed sort in runtime — defer or single-type v1. | |
| 111 | +| `INDEXOF` / `LASTINDEXOF` | Linear search in `basrtl` — optional phase 2. | |
| 112 | +| Multi-line `FUNCTION` / `END FUNCTION`, recursion | Needs calling convention and stack — **defer**; allow **DEF FN** one-liners first. | |
| 113 | +| `ON expr GOTO` / `GOSUB` | Lower to `switch`/`if` chain — possible in phase 2. | |
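The `switch` lowering mentioned for `ON expr GOTO` might look like the sketch below. The function name and the choice to fall through to the next statement for out-of-range values are assumptions made for illustration; the interpreter's actual out-of-range behaviour should be checked before committing to this.

```c
/* Hypothetical lowering of:   ON K GOTO 100, 200, 300
 * The function returns the line number it reached (or 0 for the
 * fall-through case) purely so the behaviour can be checked on a host. */
int on_goto_demo(int k)
{
    switch (k) {
    case 1: goto line_100;
    case 2: goto line_200;
    case 3: goto line_300;
    default: break;        /* assumed: out of range continues below */
    }
    return 0;              /* the statement following ON ... GOTO */

line_100:
    return 100;
line_200:
    return 200;
line_300:
    return 300;
}
```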
| 114 | + |
| 115 | +### Graphics, video memory, sprites (basic-gfx / canvas) |
| 116 | + |
| 117 | +| Feature | Reason | |
| 118 | +|--------|--------| |
| 119 | +| `SCREEN`, `PSET`, `PRESET`, `LINE`, `COLOR`, `BACKGROUND`, `LOCATE`, `TEXTAT`, `DRAWSPRITE`, `LOADSPRITE`, etc. | Tied to **GfxVideoState** / platform — **out of scope** for generic C unless targeting a **specific** library (e.g. one SDL or one console API) as a **separate** backend. | |
| 120 | +| `POKE` / `PEEK` (screen or fake RAM) | Exclude or map to a **named** `uint8_t mem[]` region in generated C if you add a **memory model** later. | |
| 121 | +| `LOAD` … `INTO`, `MEMSET`, `MEMCPY` | Binary/graphics memory operations; exclude for generic C. | |
| 122 | + |
| 123 | +### File I/O |
| 124 | + |
| 125 | +| Feature | Reason | |
| 126 | +|--------|--------| |
| 127 | +| `OPEN` / `PRINT#` / `INPUT#` / `GET#` / `CLOSE` / `ST` | Standard C `fopen` may exist on some hosted targets; on **bare** cc65, I/O is platform-specific. **Exclude** in v1 or emit **stubs** (`putc`/`getc` to serial only) behind `#ifdef`. | |
| 128 | + |
| 129 | +### Interpreter-only or platform-variable semantics |
| 130 | + |
| 131 | +| Feature | Notes | |
| 132 | +|--------|--------| |
| 133 | +| `TI` / `TI$` | **Jiffy clock** — exclude or replace with a **`unsigned long` tick** updated by a **platform timer** stub (`basrtl_tick()` in IRQ). | |
| 134 | +| `PLATFORM$()` | Emit a **string constant** at compile time or stub `"c-generated"`. | |
| 135 | +| `RND` | Emit **LCG** in runtime; document seed policy. | |
| 136 | +| `SLEEP` | Map to **busy-wait** or **timer** stub per target. | |
| 137 | +| `CURSOR ON/OFF` | Terminal/ANSI only — exclude for bare metal or wrap in `#ifdef CONSOLE`. | |
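For the `RND` row, a runtime LCG could be as small as the following sketch. The constants are the ones used in the C standard's example `rand()` implementation; the function names, the default seed of 1, and returning an integer in 0..32767 (rather than a BASIC-style fraction) are assumptions for an integer-only phase 1.

```c
#include <stdint.h>

/* Generator state; the default seed of 1 is an assumed policy. */
static uint32_t bas_rnd_state = 1UL;

/* Reseed the generator (hypothetical hook for a seed policy). */
void bas_rnd_seed(uint32_t seed)
{
    bas_rnd_state = seed;
}

/* Advance the LCG and return a value in the range 0..32767.
 * The upper bits are used because the low bits of a power-of-two
 * modulus LCG have short periods. */
uint16_t bas_rnd_next(void)
{
    bas_rnd_state = (uint32_t)(bas_rnd_state * 1103515245UL + 12345UL);
    return (uint16_t)((bas_rnd_state >> 16) & 0x7FFFu);
}
```

A fixed default seed makes golden tests reproducible; a target that wants variation could reseed from a timer tick at startup.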
| 138 | + |
| 139 | +### Floating point |
| 140 | + |
| 141 | +- **cc65** does not implement the `float` and `double` types (per its documentation); **z88dk** has floating-point libraries, but at a code size and speed cost. |
| 142 | +- **Recommendation:** Phase 1 = **integers only** or **fixed-point** (`typedef int16_t fixed_t` with a fixed shift). Exclude `SIN`/`COS`/… unless the target toolchain ships a math library and the code size is acceptable, or provide **integer trig tables** for demos only. |
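The fixed-point option can be sketched as below. Only the `fixed_t` typedef comes from the text; the 8.8 split (`FX_SHIFT` of 8) and the helper names are assumptions. Division is used instead of a right shift so that negative values round toward zero in a well-defined way.

```c
#include <stdint.h>

typedef int16_t fixed_t;      /* from the text; 8 integer bits, 8 fraction bits assumed */
#define FX_SHIFT 8
#define FX_ONE   ((int32_t)1 << FX_SHIFT)

/* Convert a small integer (-128..127) to fixed point. */
fixed_t fx_from_int(int i)
{
    return (fixed_t)((int32_t)i * FX_ONE);
}

/* Convert back, truncating the fraction toward zero. */
int fx_to_int(fixed_t x)
{
    return (int)(x / FX_ONE);
}

/* Multiply: widen to 32 bits so the intermediate product cannot overflow. */
fixed_t fx_mul(fixed_t a, fixed_t b)
{
    return (fixed_t)(((int32_t)a * (int32_t)b) / FX_ONE);
}

/* Divide: pre-scale the numerator to keep the fraction bits.
 * The caller must ensure b is non-zero. */
fixed_t fx_div(fixed_t a, fixed_t b)
{
    return (fixed_t)(((int32_t)a * FX_ONE) / (int32_t)b);
}
```

With this layout, 1.5 is stored as 384, so `fx_mul(384, fx_from_int(2))` yields 768, which is 3.0. Results outside roughly -128..127.996 do not fit and would need an overflow policy.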
| 143 | + |
| 144 | +### PETSCII tokens in strings `{RED}` etc. |
| 145 | + |
| 146 | +- **Source transform** exists in the interpreter; for C output either: |
| 147 | + - **pre-expand** to `CHR$(n)` / concatenation in the emitter, or |
| 148 | + - exclude and require **ASCII** literals in v1. |
| 149 | + |
| 150 | +--- |
| 151 | + |
| 152 | +## `GOTO` / `GOSUB` lowering |
| 153 | + |
| 154 | +- **Labels → `goto`** in C: valid for **one function** if all labels live in the same generated `main` or one mega-function (cc65 supports `goto` within a function). |
| 155 | +- **GOSUB → `void` functions** or **inline jump + return label**: the second form needs an explicit return stack. **Phase 1** can either require **structured** code (no `GOSUB`) or accept only **forward** `GOSUB` to named subs that are lowered to C functions manually. |
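The "inline jump + return label" form of `GOSUB` could be sketched like this, with everything in one C function. The return-site numbering, the stack depth of 8, and the function name are illustrative assumptions; a real emitter would also guard against stack overflow and a `RETURN` without a matching `GOSUB`.

```c
/* Hypothetical lowering of:
 *   10 T = 0
 *   20 GOSUB 100
 *   30 GOSUB 100
 *   40 END
 *   100 T = T + 5
 *   110 RETURN
 * The function returns T so the result can be checked on a host. */
#define GOSUB_DEPTH 8          /* assumed maximum nesting depth */

int gosub_demo(void)
{
    unsigned char ret_stack[GOSUB_DEPTH];  /* ids of pending return sites */
    unsigned char sp = 0;                  /* next free stack slot */
    int t = 0;                             /* 10 T = 0 */

    ret_stack[sp++] = 1;                   /* 20 GOSUB 100 */
    goto line_100;
ret_1:
    ret_stack[sp++] = 2;                   /* 30 GOSUB 100 */
    goto line_100;
ret_2:
    return t;                              /* 40 END */

line_100:
    t = t + 5;                             /* 100 T = T + 5 */
    switch (ret_stack[--sp]) {             /* 110 RETURN: pop and dispatch */
    case 1: goto ret_1;
    case 2: goto ret_2;
    default: return -1;                    /* unknown return site */
    }
}
```

Each `GOSUB` site gets a small integer id instead of a code address, because standard C has no portable way to store a label; the `switch` at `RETURN` turns the id back into a jump.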
| 156 | + |
| 157 | +Document a **style guide** for “compilable BASIC” (e.g. prefer `WHILE`/`FOR` over `GOTO`). |
| 158 | + |
| 159 | +--- |
| 160 | + |
| 161 | +## Output layout |
| 162 | + |
| 163 | +Suggested emitted files: |
| 164 | + |
| 165 | +- `program.c` — generated code |
| 166 | +- `basrtl.c` / `basrtl.h` — runtime (optional static link) |
| 167 | +- `Makefile` or documented **cc65** / **z88dk** invocations |
| 168 | + |
| 169 | +--- |
| 170 | + |
| 171 | +## Testing strategy |
| 172 | + |
| 173 | +1. **Golden tests**: small `.bas` files → run the transpiler → compile with cc65 (simulator target) or, as a first step, with host `gcc -Wall -Werror`. |
| 174 | +2. **Compare** numeric output with **interpreter** on the same inputs for accepted programs. |
| 175 | +3. Expand subset only when tests exist. |
| 176 | + |
| 177 | +--- |
| 178 | + |
| 179 | +## Relation to this repository |
| 180 | + |
| 181 | +- **Parser / AST**: future work; today execution is fused with parsing. |
| 182 | +- **This doc** fixes **expectations** and **exclusions** for anyone implementing `basic2c` or similar. |
| 183 | + |
| 184 | +--- |
| 185 | + |
| 186 | +## References |
| 187 | + |
| 188 | +- [cc65](https://cc65.github.io/) — C compiler and assembler for 6502. |
| 189 | +- [z88dk](https://www.z88dk.org/) — C and asm for Z80 family. |
| 190 | +- `docs/user-functions-plan.md` — full **FUNCTION** semantics in the interpreter (heavy for codegen v1). |
| 191 | +- `docs/meta-directives-plan.md` — `#OPTION` / `#INCLUDE` (treat as build-time, not runtime C). |