Commit a2d4c86

cursoragent and omiq committed
Add basic-to-C transpiler plan doc (cc65/z88dk subset and exclusions)
Document architecture, phase-1 subset, explicit feature exclusions, GOTO lowering notes, and testing. Link from README; changelog Unreleased. Co-authored-by: Chris Garrett <chris@chrisg.com>
1 parent 3a78bf1 commit a2d4c86

3 files changed

Lines changed: 195 additions & 0 deletions

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -2,6 +2,8 @@

 ### Unreleased

+- **Documentation**: **`docs/basic-to-c-transpiler-plan.md`** — design notes for a future **BASIC → C** backend (**cc65** / **z88dk**), recommended subset, and **explicit exclusions** (host I/O, `EVAL`, graphics, etc.).
+
 - **IF conditions with parentheses**: **`IF (J=F OR A$=" ") AND SF=0 THEN`** failed with **Missing THEN** because **`eval_simple_condition`** used **`eval_expr`** inside **`(`**, which does not parse relational **`=`**. Leading **`(`** now distinguishes **`(expr) relop rhs`** (e.g. **`(1+1)=2`**) from boolean groups **`(… OR …)`** via **`eval_condition`**. Regression: **`tests/if_paren_condition_test.bas`**.

 - **Canvas WASM `PEEK(56320+n)` keyboard**: **`key_state[]`** was never updated in the browser (only **basic-gfx** / Raylib did). **`canvas.html`** now drives **`wasm_gfx_key_state_set`** / **`wasm_gfx_key_state_clear`** on key up/down (A–Z, 0–9, arrows, Escape, Space, Enter, Tab) so **`examples/gfx_jiffy_game_demo.bas`**-style **`PEEK(KEYBASE+…)`** works.

README.md

Lines changed: 2 additions & 0 deletions
@@ -452,6 +452,8 @@ The `examples` folder (included in release archives) contains:

 This README describes the current feature set of the interpreter as implemented in `basic.c` and is subject to change without notice.

+**Future / tooling (not implemented):** a possible **BASIC → C** transpiler for **cc65** / **z88dk** is outlined in [`docs/basic-to-c-transpiler-plan.md`](docs/basic-to-c-transpiler-plan.md), including which language features should be **excluded** from a first version.
+
 ---

 ## 🛠️ Building from Source

docs/basic-to-c-transpiler-plan.md

Lines changed: 191 additions & 0 deletions
@@ -0,0 +1,191 @@
# RGC-BASIC → C output (cc65 / z88dk) — design notes

This document sketches how the **RGC-BASIC** front end could be extended with a **C code generator** so programs can be compiled with **cc65** (6502 targets) or **z88dk** (Z80 and others), instead of (or in addition to) interpretation.

It is **not** implemented today; it is a roadmap for a **subset compiler / transpiler** with a **portable C** intermediate step.

---

## Goals

1. **Emit C source** that compiles with a standard C toolchain aimed at 8-bit hosts.
2. **Share parsing and semantic analysis** with the interpreter where possible (same token/AST/IR story).
3. **Document explicitly** what must be **excluded** or **restricted** so generated code is realistic on cc65/z88dk.

---

## Why C first (not raw 6502/Z80 asm)

- **One backend** for many CPUs: cc65 (6502), SDCC/z88dk (Z80), etc.
- **Runtime in C**: `printf`, string helpers, and math can live in `basrtl.c` and be tuned per platform.
- **Asm later**: once a stable IR exists, a direct assembler backend is optional.

---

## Architecture (high level)

| Stage | Role |
|--------|------|
| Lex / parse | Same as today; optionally build an **AST** or **IR** instead of immediate execution. |
| Semantic pass | Type/size rules for the **compilable subset**; reject unsupported constructs with clear errors. |
| C codegen | Walk IR → `functions`, `static` data for `DATA`, `switch`/labels for `GOTO`, etc. |
| `basrtl.h` / `basrtl.c` | **Runtime**: string pool or heap policy, `INPUT`/`PRINT` stubs, optional fixed-point math. |

The interpreter’s execution functions (`statement_*`, `eval_*`) are **not** reused as-is for codegen; they inform **what** to emit for each construct.
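
To make the codegen stage concrete, here is a minimal sketch of what the generator might emit for a three-line `FOR` loop, together with a stand-in for one runtime routine. The names `bas_print_int`, `bas_out`, and `bas_program` are hypothetical and not part of any existing `basrtl`; the output buffer exists only so the result can be inspected on a host build.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical runtime piece (would live in basrtl.c). PRINT output is
   collected in a buffer here so it can be inspected; a real target would
   write to the console instead. */
static char bas_out[64];
static size_t bas_out_len = 0;

static void bas_print_int(int v)
{
    size_t room = sizeof bas_out - bas_out_len;
    int n = snprintf(bas_out + bas_out_len, room, "%d\n", v);
    if (n > 0 && (size_t)n < room) {
        bas_out_len += (size_t)n;      /* keep only output that fit completely */
    } else {
        bas_out[bas_out_len] = '\0';   /* drop a truncated line */
    }
}

/* Possible generated code for:
     10 FOR I = 1 TO 3
     20 PRINT I * 2
     30 NEXT I                        */
static int bas_program(void)
{
    int I;
    for (I = 1; I <= 3; ++I) {
        bas_print_int(I * 2);
    }
    return 0;
}
```

Number formatting is deliberately simplified here (no leading space or column handling); matching the interpreter's `PRINT` layout would be a runtime concern.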

---

## Recommended compilable subset (phase 1)

Aim for programs that look like **structured BASIC** with **integers** and **simple strings**.

**Statements (initial)**

- `REM` / `'`
- Assignment (`LET` optional)
- `PRINT` with `;` and `,` (map `,` to tab logic or simplified spacing)
- `IF … THEN` (line and block `IF` / `ELSE` / `END IF`) — **no** `THEN` line-number-only goto unless lowered to `goto`
- `FOR` / `NEXT` with numeric loop variable and `STEP`
- `WHILE` / `WEND`, `DO` / `LOOP` / `LOOP UNTIL` / `EXIT`
- `GOTO` / `GOSUB` / `RETURN` with **labels** (emit C `goto` / functions — see below)
- `END` / `STOP`
- `DIM` for **fixed** 1D arrays (numeric and string), bounds known at compile time
- `READ` / `DATA` / `RESTORE` (static data tables in C)
- `DEF FN` for **single-line** numeric functions only (phase 1)
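
As an illustration of the "static data tables" idea for `READ` / `DATA` / `RESTORE`, the sketch below assumes all `DATA` items are integers and are gathered into one table at compile time. The identifiers (`bas_data`, `bas_read_int`, `bas_restore`) are hypothetical.

```c
#include <stddef.h>

/* DATA 10, 20, 30 gathered by the generator into one constant table. */
static const int bas_data[] = { 10, 20, 30 };
#define BAS_DATA_COUNT (sizeof bas_data / sizeof bas_data[0])

/* Read cursor shared by every READ statement in the program. */
static size_t bas_data_pos = 0;

/* READ X: stores the next item in *out and returns 1,
   or returns 0 when the table is exhausted (out of data). */
static int bas_read_int(int *out)
{
    if (bas_data_pos >= BAS_DATA_COUNT) {
        return 0;
    }
    *out = bas_data[bas_data_pos++];
    return 1;
}

/* RESTORE: rewind the cursor to the first DATA item. */
static void bas_restore(void)
{
    bas_data_pos = 0;
}
```

Mixed numeric and string `DATA` would need a tagged table or two parallel tables; that choice is left open here.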

**Expressions (initial)**

- Integer and float **literals**; prefer **16-bit `int`** or **`long`** policy per target
- `+ - * /`, parentheses
- Relational operators in `IF`
- **Intrinsic functions (numeric, phase 1):** `ABS`, `INT`, `SGN`, `SQR` (via `sqrt` if lib linked), `RND` (runtime LCG)
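
A runtime LCG for `RND` could look like the sketch below. It uses the multiplier and increment from the sample `rand` implementation in the C standard, which is an assumption made here for illustration and not a statement about what the interpreter uses. The function names and the fixed default seed are likewise hypothetical.

```c
#include <stdint.h>

/* Generator state; a fixed default seed makes runs repeatable. */
static uint32_t bas_rnd_state = 1u;

/* Seed policy hook: call with a timer value for varying sequences. */
static void bas_randomize(uint32_t seed)
{
    bas_rnd_state = seed;
}

/* Next pseudo-random value in 0..32767 (upper state bits are used
   because the low bits of an LCG have short periods). */
static uint16_t bas_rnd15(void)
{
    bas_rnd_state = bas_rnd_state * 1103515245UL + 12345UL;
    return (uint16_t)((bas_rnd_state >> 16) & 0x7FFFu);
}

/* Integer in [0, n) for n > 0; has a slight modulo bias. */
static uint16_t bas_rnd_below(uint16_t n)
{
    return (uint16_t)(bas_rnd15() % n);
}
```

An integer-returning helper like `bas_rnd_below` fits an integers-only phase 1; a float-valued `RND` would depend on the float policy chosen for the target.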

**Strings (phase 1, restricted)**

- Fixed maximum length (e.g. 40 or 80) → `char name[N+1]` or pooled buffers
- `LEFT$`, `RIGHT$`, `MID$`, `LEN`, `CHR$`, `ASC`, `STR$`, `VAL` with **bounded** temporaries
- Concatenation `A$ + B$` only if result fits static policy
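
One way to realise the fixed-length string policy is a small struct wrapping a `char[N+1]` buffer, with helpers that truncate and report whether the result fit. This is only a sketch: `BAS_STR_MAX`, `bas_str`, and the helper names are hypothetical, and truncation (rather than a runtime error) is an assumed policy.

```c
#include <string.h>

#define BAS_STR_MAX 40   /* assumed maximum string length */

typedef struct {
    char s[BAS_STR_MAX + 1];   /* always NUL-terminated */
} bas_str;

/* A$ = "literal": copy with truncation to BAS_STR_MAX characters. */
static void bas_str_set(bas_str *dst, const char *src)
{
    size_t n = strlen(src);
    if (n > BAS_STR_MAX) n = BAS_STR_MAX;
    memcpy(dst->s, src, n);
    dst->s[n] = '\0';
}

/* C$ = A$ + B$: returns 1 if the result fit, 0 if it was truncated.
   A temporary is used so dst may alias a or b. */
static int bas_str_concat(bas_str *dst, const bas_str *a, const bas_str *b)
{
    char tmp[BAS_STR_MAX + 1];
    size_t la = strlen(a->s);
    size_t lb = strlen(b->s);
    int fits = (la + lb <= BAS_STR_MAX);
    if (!fits) lb = BAS_STR_MAX - la;
    memcpy(tmp, a->s, la);
    memcpy(tmp + la, b->s, lb);
    tmp[la + lb] = '\0';
    memcpy(dst->s, tmp, la + lb + 1);
    return fits;
}

/* LEFT$(src, n): first n characters; dst may alias src. */
static void bas_left(bas_str *dst, const bas_str *src, size_t n)
{
    size_t len = strlen(src->s);
    if (n > len) n = len;
    memmove(dst->s, src->s, n);
    dst->s[n] = '\0';
}
```

The return value of `bas_str_concat` lets generated code choose between silent truncation and raising a runtime error, whichever the static policy specifies.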

---

## Features to **exclude** or **defer** (explicit)

These are either **host-specific**, **dynamic**, or **too heavy** for a first cc65/z88dk pipeline. The transpiler should **reject** them with a clear diagnostic (not silently miscompile).

### Meta and loader

| Feature | Reason to exclude (v1) |
|--------|-------------------------|
| `#INCLUDE` | Needs a compile-time file merge step; use a preprocessor or single concatenated source for the demo. |
| `#OPTION …` | Maps to CLI/cc65 pragmas; handle as **comments** or **separate build flags**, not generated C. |
| Shebang | Strip or ignore. |

### Host / OS / shell (no portable C equivalent on 8-bit without a full OS)

| Feature | Reason |
|--------|--------|
| `SYSTEM`, `EXEC$` | Subprocess/shell — exclude. |
| `ENV$` | Environment variables — exclude unless stubbed empty. |
| `ARGC` / `ARG$` | Could map to `main(argc, argv)` **only** if the emitted program defines `main` and the harness passes args — document as **optional** phase 2. |

### Dynamic execution

| Feature | Reason |
|--------|--------|
| `EVAL(expr$)` | Runtime parse/eval — exclude unless embedding a tiny interpreter (out of scope for v1). |

### Advanced strings and data formats

| Feature | Reason |
|--------|--------|
| `JSON$` | Exclude; large runtime. |
| `REPLACE`, `TRIM$`, `LTRIM$`, `RTRIM$`, `FIELD$` | Can add to `basrtl` later; exclude v1 or implement as C helpers with fixed buffers. |
| `SPLIT` / `JOIN` | Heap/variable field counts — defer or static-max only. |
| `STRING$(n, c)` | OK if `n` is **constant**; exclude if `n` is dynamic beyond max. |

### Arrays and advanced control

| Feature | Reason |
|--------|--------|
| `SORT` | Needs qsort-style helper or typed sort in runtime — defer or single-type v1. |
| `INDEXOF` / `LASTINDEXOF` | Linear search in `basrtl` — optional phase 2. |
| Multi-line `FUNCTION` / `END FUNCTION`, recursion | Needs calling convention and stack — **defer**; allow **DEF FN** one-liners first. |
| `ON expr GOTO` / `GOSUB` | Lower to `switch`/`if` chain — possible in phase 2. |
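
For the `ON expr GOTO` row, a `switch` lowering might look like the following sketch. It assumes an out-of-range selector falls through to the next statement, as in many classic BASICs; RGC-BASIC's actual behaviour would need to be checked. The function name and the `result` variable are illustrative only.

```c
/* Possible lowering of:
     10 ON K GOTO 100, 200, 300
     20 R = -1 : END
     100 R = 100 : END
     200 R = 200 : END
     300 R = 300 : END                */
static int bas_on_goto_demo(int k)
{
    int result = 0;

    switch (k) {
    case 1: goto L100;
    case 2: goto L200;
    case 3: goto L300;
    default: break;     /* assumed: out of range falls through */
    }

    result = -1;        /* line 20: statement after ON ... GOTO */
    goto done;

L100:
    result = 100;
    goto done;
L200:
    result = 200;
    goto done;
L300:
    result = 300;
    goto done;

done:
    return result;
}
```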

### Graphics, video memory, sprites (basic-gfx / canvas)

| Feature | Reason |
|--------|--------|
| `SCREEN`, `PSET`, `PRESET`, `LINE`, `COLOR`, `BACKGROUND`, `LOCATE`, `TEXTAT`, `DRAWSPRITE`, `LOADSPRITE`, etc. | Tied to **GfxVideoState** / platform — **out of scope** for generic C unless targeting a **specific** library (e.g. one SDL or one console API) as a **separate** backend. |
| `POKE` / `PEEK` (screen or fake RAM) | Exclude or map to a **named** `uint8_t mem[]` region in generated C if you add a **memory model** later. |
| `LOAD … INTO`, `MEMSET`, `MEMCPY` | Binary/GFX — exclude for generic C. |
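
If a memory model is added later, `POKE` / `PEEK` could be mapped onto a bounds-checked array as sketched below. The base address, window size, and function names are assumptions chosen for illustration (a 1000-byte window starting at `0x0400`), not values taken from the interpreter.

```c
#include <stdint.h>

#define BAS_MEM_BASE 0x0400u   /* assumed start of the emulated window */
#define BAS_MEM_SIZE 1000u     /* assumed window size in bytes */

/* Named region standing in for "fake RAM" in generated C. */
static uint8_t bas_mem[BAS_MEM_SIZE];

/* Returns 1 if addr lies inside the emulated window. */
static int bas_mem_in_range(uint16_t addr)
{
    return addr >= BAS_MEM_BASE &&
           (uint32_t)addr - BAS_MEM_BASE < BAS_MEM_SIZE;
}

/* POKE addr, v: returns 1 on success, 0 if addr is outside the window. */
static int bas_poke(uint16_t addr, uint8_t v)
{
    if (!bas_mem_in_range(addr)) return 0;
    bas_mem[addr - BAS_MEM_BASE] = v;
    return 1;
}

/* PEEK(addr): reads 0 for addresses outside the window. */
static uint8_t bas_peek(uint16_t addr)
{
    if (!bas_mem_in_range(addr)) return 0;
    return bas_mem[addr - BAS_MEM_BASE];
}
```

Whether an out-of-window access should be ignored, as here, or rejected at compile time when the address is constant is a policy decision for the semantic pass.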

### File I/O

| Feature | Reason |
|--------|--------|
| `OPEN` / `PRINT#` / `INPUT#` / `GET#` / `CLOSE` / `ST` | Standard C `fopen` may exist on some hosted targets; on **bare** cc65, I/O is platform-specific — **exclude** in v1 or emit **stubs** (`putc`/`getc` to serial only) behind `#ifdef`. |

### Interpreter-only or platform-variable semantics

| Feature | Notes |
|--------|--------|
| `TI` / `TI$` | **Jiffy clock** — exclude or replace with a **`unsigned long` tick** updated by a **platform timer** stub (`basrtl_tick()` in IRQ). |
| `PLATFORM$()` | Emit a **string constant** at compile time or stub `"c-generated"`. |
| `RND` | Emit **LCG** in runtime; document seed policy. |
| `SLEEP` | Map to **busy-wait** or **timer** stub per target. |
| `CURSOR ON/OFF` | Terminal/ANSI only — exclude for bare metal or wrap in `#ifdef CONSOLE`. |
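
The `TI` replacement could be as small as the sketch below: `basrtl_tick()` is the hook named in the table, while `bas_ticks` and `bas_ti` are hypothetical names. Because a multi-byte counter cannot be read atomically on an 8-bit CPU, the reader loops until two consecutive reads agree; disabling interrupts around the read would be an alternative.

```c
/* Tick counter incremented by the platform timer interrupt. */
static volatile unsigned long bas_ticks = 0;

/* To be called from the target's timer IRQ handler (e.g. once per jiffy). */
void basrtl_tick(void)
{
    ++bas_ticks;
}

/* TI: returns a consistent snapshot of the counter. The double read
   guards against an interrupt updating the value mid-read. */
unsigned long bas_ti(void)
{
    unsigned long a, b;
    do {
        a = bas_ticks;
        b = bas_ticks;
    } while (a != b);
    return a;
}
```

A `SLEEP` stub could then wait until `bas_ti()` has advanced by the requested number of ticks, provided the target actually drives `basrtl_tick()`.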

### Floating point

- **cc65** does not implement the `float` / `double` types; **z88dk** has floats, at a cost in code size and speed.
- **Recommendation:** Phase 1 = **integers only** or **fixed-point** (`typedef int16_t fixed_t` with a fixed shift). Exclude `SIN`/`COS`/… unless the target toolchain provides a float math library and the code size is acceptable, or provide **integer trig tables** for demos only.
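
A fixed-point policy built on the `fixed_t` typedef above might be sketched as follows. The 8.8 split (`FIXED_SHIFT` of 8) and the helper names are assumptions; division by `FIXED_ONE` is used instead of a right shift so negative values truncate toward zero in a well-defined way.

```c
#include <stdint.h>

typedef int16_t fixed_t;            /* typedef suggested in the text */

#define FIXED_SHIFT 8               /* assumed 8.8 format */
#define FIXED_ONE   (1 << FIXED_SHIFT)

/* Integer -> fixed; valid for i in -128..127 with an 8.8 format. */
static fixed_t fixed_from_int(int16_t i)
{
    return (fixed_t)(i * FIXED_ONE);
}

/* Fixed -> integer, truncating toward zero. */
static int16_t fixed_to_int(fixed_t f)
{
    return (int16_t)(f / FIXED_ONE);
}

/* Multiply in 32 bits, then scale back down. */
static fixed_t fixed_mul(fixed_t a, fixed_t b)
{
    return (fixed_t)(((int32_t)a * (int32_t)b) / FIXED_ONE);
}

/* Scale the dividend up first so the quotient keeps its fraction.
   The caller must ensure b != 0. */
static fixed_t fixed_div(fixed_t a, fixed_t b)
{
    return (fixed_t)(((int32_t)a * FIXED_ONE) / b);
}
```

With 8 fractional bits, 1.5 is represented as 384, so `fixed_mul(384, fixed_from_int(2))` yields 768, which is 3.0. Overflow is not checked here.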

### PETSCII tokens in strings `{RED}` etc.

- **Source transform** exists in the interpreter; for C output either:
  - **pre-expand** to `CHR$(n)` / concatenation in the emitter, or
  - exclude and require **ASCII** literals in v1.
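
The pre-expansion option could be handled on the host side by a helper like the one below, which replaces `{NAME}` tokens with a single byte before the string literal is emitted. It is a variation on the `CHR$(n)` approach (raw bytes instead of concatenation). The token table is a small hypothetical sample using standard PETSCII control codes; the interpreter's real token list should be the source of truth.

```c
#include <stddef.h>
#include <string.h>

struct petscii_tok {
    const char *name;
    unsigned char code;
};

/* Sample table only; codes are standard PETSCII control codes. */
static const struct petscii_tok bas_toks[] = {
    { "WHITE", 5 }, { "HOME", 19 }, { "RED", 28 }, { "CLR", 147 },
};

/* Expand {NAME} tokens from src into dst (capacity cap, including NUL).
   Returns the number of bytes written (excluding NUL), or -1 on an
   unknown token, an unterminated '{', or insufficient space. */
static int bas_expand_petscii(const char *src, char *dst, size_t cap)
{
    size_t out = 0;

    if (cap == 0) return -1;
    while (*src) {
        unsigned char ch = (unsigned char)*src;
        if (*src == '{') {
            const char *end = strchr(src, '}');
            size_t i, len;
            int found = 0;
            if (!end) return -1;
            len = (size_t)(end - src - 1);
            for (i = 0; i < sizeof bas_toks / sizeof bas_toks[0]; ++i) {
                if (strlen(bas_toks[i].name) == len &&
                    memcmp(bas_toks[i].name, src + 1, len) == 0) {
                    ch = bas_toks[i].code;
                    found = 1;
                    break;
                }
            }
            if (!found) return -1;
            src = end;              /* now at '}', skipped below */
        }
        if (out + 1 >= cap) return -1;
        dst[out++] = (char)ch;
        ++src;
    }
    dst[out] = '\0';
    return (int)out;
}
```

Rejecting unknown tokens with an error matches the document's stance that unsupported input should produce a clear diagnostic instead of being passed through.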

---

## `GOTO` / `GOSUB` lowering

- **Labels → `goto`** in C: valid for **one function** if all labels live in the same generated `main` or one mega-function (cc65 supports `goto` within a function).
- **GOSUB → `void` functions** or **inline call + return label** — needs stack discipline; **phase 1** can require **structured** code (no `GOSUB`) or only **forward** `GOSUB` to named subs lowered to C functions manually.
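
The "inline call + return label" option can be sketched inside a single function: each `GOSUB` pushes a small return-site id and jumps, and `RETURN` pops the id and dispatches through a `switch`. The stack depth, id numbering, and names below are assumptions made for the example.

```c
#define BAS_GOSUB_DEPTH 8   /* assumed maximum GOSUB nesting */

/* Possible lowering of:
     10 GOSUB 100
     20 GOSUB 100
     30 END
     100 N = N + 1 : RETURN
   Returns N, or -1 on return-stack overflow or corruption. */
static int bas_gosub_demo(void)
{
    unsigned char ret_stack[BAS_GOSUB_DEPTH];
    unsigned char sp = 0;
    int n = 0;

    /* 10 GOSUB 100 */
    if (sp >= BAS_GOSUB_DEPTH) return -1;
    ret_stack[sp++] = 1;
    goto L100;
R1:
    /* 20 GOSUB 100 */
    if (sp >= BAS_GOSUB_DEPTH) return -1;
    ret_stack[sp++] = 2;
    goto L100;
R2:
    /* 30 END */
    return n;

L100:
    n = n + 1;
    /* RETURN: pop the return-site id and jump back to it */
    if (sp == 0) return -1;
    switch (ret_stack[--sp]) {
    case 1: goto R1;
    case 2: goto R2;
    default: return -1;
    }
}
```

This keeps everything in one C function, so it composes with plain `goto` lowering, at the price of one byte of return stack per active `GOSUB`.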

Document a **style guide** for “compilable BASIC” (e.g. prefer `WHILE`/`FOR` over `GOTO`).

---

## Output layout

Suggested emitted files:

- `program.c` — generated code
- `basrtl.c` / `basrtl.h` — runtime (optional static link)
- `Makefile` or documented **cc65** / **z88dk** invocations

---

## Testing strategy

1. **Golden tests**: small `.bas` files → run transpiler → compile with host `gcc -Wall -Werror` first, then with cc65 (running the result under its `sim65` simulator).
2. **Compare** numeric output with the **interpreter** on the same inputs for accepted programs.
3. Expand the subset only when tests exist.

---

## Relation to this repository

- **Parser / AST**: future work; today execution is fused with parsing.
- **This doc** fixes **expectations** and **exclusions** for anyone implementing `basic2c` or similar.

---

## References

- [cc65](https://cc65.github.io/) — C compiler and assembler for 6502.
- [z88dk](https://www.z88dk.org/) — C and asm for Z80 family.
- `docs/user-functions-plan.md` — full **FUNCTION** semantics in the interpreter (heavy for codegen v1).
- `docs/meta-directives-plan.md` — `#OPTION` / `#INCLUDE` (treat as build-time, not runtime C).
