
Commit 1a02c35

Merge pull request #20 from protegrity/feat/pg-acceptance-and-readme-sync
Improve PostgreSQL acceptance and sync README benchmark snapshot
2 parents 19f2473 + ba8e31a commit 1a02c35

4 files changed

Lines changed: 6877 additions & 206 deletions


README.md

Lines changed: 53 additions & 7 deletions
@@ -95,13 +95,57 @@ The transpiler applies dialect-specific rewrite rules when converting between di
| Data type mapping | `TEXT` → `STRING`, `INT` → `BIGINT` (BigQuery) |
| `BYTEA` ↔ `BLOB` | Postgres `BYTEA` ↔ MySQL `BLOB` |

## Recent Parser Improvements and Benchmark Snapshot

This repository has recently added parser-tolerance and compatibility fixes across multiple dialects, including:

- ClickHouse: richer dotted/typed access parsing (`expr.:Type`, `expr.^field`, `.null` field paths), plus broader support for alias-heavy query forms.
- DuckDB: alias-first projection support (`alias: expr`) and DESCRIBE-in-subquery tolerance.
- PostgreSQL: `BIT VARYING(n)` cast tolerance, JSON key/value argument parsing (`'k' : v`), richer `ON CONFLICT (...)` target parsing, and broader tolerance for geometric operators and operator sequences.
- MySQL: improved UPDATE/DELETE tolerance around ORDER BY, LIMIT, and PARTITION clauses, plus INSERT alias edge cases.
- T-SQL / ANSI extensions: tolerance for `WITH XMLNAMESPACES (...)` and additional parser guardrails for mixed corpora.
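
To make the DuckDB alias-first form concrete, and assuming DuckDB's prefix-alias semantics, `SELECT total: price * qty FROM orders` means the same as `SELECT price * qty AS total FROM orders`. The Rust sketch below is a toy, string-level illustration of that equivalence for a single projection item. It is not how sqlglot-rust's parser handles the syntax, and the helper name `alias_first_to_as` is hypothetical.

```rust
/// Rewrites one DuckDB-style alias-first projection item (`alias: expr`)
/// into the conventional `expr AS alias` form.
///
/// Hypothetical, string-level toy for illustration only; a real parser
/// works on tokens rather than raw strings.
fn alias_first_to_as(item: &str) -> Option<String> {
    // Split at the first ':'; items without one are not alias-first.
    let (alias, expr) = item.split_once(':')?;
    let alias = alias.trim();

    // A second ':' right after the first means `::`, i.e. a cast such as
    // `x::INT`, not an alias separator.
    if expr.starts_with(':') {
        return None;
    }

    // Accept only a plain identifier as the alias: a letter or '_' first,
    // then letters, digits, or '_'. This rejects quoted keys like `'k'`.
    let mut chars = alias.chars();
    let first_ok = chars
        .next()
        .map_or(false, |c| c.is_ascii_alphabetic() || c == '_');
    if !first_ok || !chars.all(|c| c.is_ascii_alphanumeric() || c == '_') {
        return None;
    }

    let expr = expr.trim();
    if expr.is_empty() {
        return None;
    }
    Some(format!("{expr} AS {alias}"))
}

fn main() {
    for item in ["total: price * qty", "n: x::INT", "x::INT", "a + b"] {
        match alias_first_to_as(item) {
            Some(rewritten) => println!("{item:<20} => {rewritten}"),
            None => println!("{item:<20} => (not alias-first)"),
        }
    }
}
```

Returning `None` for anything other than a bare identifier followed by a single colon keeps casts (`x::INT`) and quoted keys, such as the PostgreSQL JSON form `'k' : v`, out of the rewrite.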

Benchmark reference:

- [SQL AST Benchmark](https://sql-ast-benchmark.luca.phd/)

### Acceptance Report (latest run)

| dialect | total | accepted | rejected | accept% | panics |
| --- | ---: | ---: | ---: | ---: | ---: |
| postgresql | 29402 | 28089 | 1313 | 95.53% | 0 |
| sqlite | 12119 | 12103 | 16 | 99.87% | 0 |
| mysql | 30220 | 29925 | 295 | 99.02% | 0 |
| clickhouse | 92268 | 91148 | 1120 | 98.79% | 0 |
| duckdb | 41148 | 40029 | 1119 | 97.28% | 0 |
| hive | 41294 | 40774 | 520 | 98.74% | 0 |
| spark_sql | 14464 | 14180 | 284 | 98.04% | 0 |
| trino | 71 | 70 | 1 | 98.59% | 0 |
| tsql | 14782 | 13521 | 1261 | 91.47% | 0 |
| oracle | 21648 | 21608 | 40 | 99.82% | 0 |
| bigquery | 224 | 222 | 2 | 99.11% | 0 |
| redshift | 2992 | 2964 | 28 | 99.06% | 0 |
| multi | 10962 | 9637 | 1325 | 87.91% | 0 |
| **TOTAL** | **311594** | **304270** | **7324** | **97.65%** | **0** |
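
Each `accept%` value in the table equals `accepted / total × 100`, rounded to two decimals, and `accepted + rejected = total` holds for every row. For example, PostgreSQL gives 28089 / 29402 ≈ 95.53%. The Rust sketch below recomputes the TOTAL row from the per-dialect counts as a consistency check. It is not the project's report generator; the `Row` type and the `report_rows` and `summarize` functions are hypothetical.

```rust
/// One row of the acceptance report (hypothetical type for illustration).
struct Row {
    dialect: &'static str,
    total: u64,
    accepted: u64,
}

impl Row {
    fn rejected(&self) -> u64 {
        self.total - self.accepted
    }

    /// Acceptance rate as a percentage, e.g. about 95.53 for PostgreSQL.
    fn accept_pct(&self) -> f64 {
        100.0 * self.accepted as f64 / self.total as f64
    }
}

/// Per-dialect (dialect, total, accepted) counts copied from the table.
fn report_rows() -> Vec<Row> {
    [
        ("postgresql", 29402, 28089),
        ("sqlite", 12119, 12103),
        ("mysql", 30220, 29925),
        ("clickhouse", 92268, 91148),
        ("duckdb", 41148, 40029),
        ("hive", 41294, 40774),
        ("spark_sql", 14464, 14180),
        ("trino", 71, 70),
        ("tsql", 14782, 13521),
        ("oracle", 21648, 21608),
        ("bigquery", 224, 222),
        ("redshift", 2992, 2964),
        ("multi", 10962, 9637),
    ]
    .into_iter()
    .map(|(dialect, total, accepted)| Row { dialect, total, accepted })
    .collect()
}

/// Sums the per-dialect rows into (total, accepted, rejected).
fn summarize(rows: &[Row]) -> (u64, u64, u64) {
    let total: u64 = rows.iter().map(|r| r.total).sum();
    let accepted: u64 = rows.iter().map(|r| r.accepted).sum();
    (total, accepted, total - accepted)
}

fn main() {
    let rows = report_rows();
    for r in &rows {
        println!(
            "{:<11} {:>6} {:>6} {:>5} {:>6.2}%",
            r.dialect, r.total, r.accepted, r.rejected(), r.accept_pct()
        );
    }
    // The TOTAL percentage is pooled over all statements, so larger
    // corpora (e.g. clickhouse) weigh more than small ones (e.g. trino).
    let (total, accepted, rejected) = summarize(&rows);
    let pct = 100.0 * accepted as f64 / total as f64;
    println!("{:<11} {:>6} {:>6} {:>5} {:>6.2}%", "TOTAL", total, accepted, rejected, pct);
}
```

Because the TOTAL row pools all statements, its 97.65% differs from the unweighted mean of the thirteen per-dialect percentages, which is about 97.17%.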

### Brief Summary of Not-Yet-Supported Query Kinds

The remaining rejects are concentrated in a few recurring categories:

- Non-SQL or shell-like corpus lines mixed into SQL datasets (for example raw text, script fragments, malformed separators).
- Dialect-specific meta commands or client commands (for example backslash-style command lines in PostgreSQL corpora).
- Template or parameterized placeholders that are not concrete SQL syntax until preprocessed.
- Highly dialect-specific operator families and parser corner-cases in advanced analytical or geometric expressions.
- Intentionally malformed or truncated statements in test corpora (unterminated strings/comments, unexpected EOF).
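
Several of these categories can be recognized before a line ever reaches the parser, which helps separate corpus noise from genuine parser gaps. The Rust sketch below is a hypothetical pre-classification pass, not part of sqlglot-rust. Its heuristics are illustrative assumptions and will misfire on some inputs: a leading backslash marks a psql-style meta-command, `{{` or `${` marks a template placeholder, and an odd number of single quotes or an unclosed `/*` marks a truncated statement.

```rust
/// Coarse buckets for corpus lines that a SQL parser is expected to reject
/// (hypothetical categories mirroring the list above).
#[derive(Debug, PartialEq)]
enum RejectHint {
    /// psql-style client command such as `\dt` or `\copy`.
    MetaCommand,
    /// Template placeholder that needs preprocessing, e.g. `{{ table }}`.
    TemplatePlaceholder,
    /// Unterminated string literal or block comment.
    Truncated,
    /// Nothing obvious; a rejection here is more likely a real parser gap.
    Unclassified,
}

/// Heuristic classifier (hypothetical helper, not a sqlglot-rust API).
fn classify(line: &str) -> RejectHint {
    let s = line.trim();
    if s.starts_with('\\') {
        return RejectHint::MetaCommand;
    }
    if s.contains("{{") || s.contains("${") {
        return RejectHint::TemplatePlaceholder;
    }
    // SQL escapes a quote inside a literal as '', so an odd count suggests
    // an unterminated literal. For brevity, quotes inside comments are
    // counted too.
    let odd_quotes = s.matches('\'').count() % 2 == 1;
    // A block comment is unclosed if the last "/*" comes after the last "*/".
    let unclosed_comment = match (s.rfind("/*"), s.rfind("*/")) {
        (Some(open), Some(close)) => open > close,
        (Some(_), None) => true,
        _ => false,
    };
    if odd_quotes || unclosed_comment {
        return RejectHint::Truncated;
    }
    RejectHint::Unclassified
}

fn main() {
    let corpus = [
        r"\copy users TO 'users.csv' CSV",
        "SELECT * FROM {{ source_table }}",
        "SELECT 'unterminated",
        "SELECT 1 /* trailing comment",
        "SELECT point '(1,2)' <-> point '(3,4)'",
    ];
    for line in corpus {
        println!("{:?}: {line}", classify(line));
    }
}
```

The last sample is syntactically complete, so it falls through to `Unclassified`; a rejection there belongs to the operator-family category rather than to corpus noise.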

## Quick Start

Add to your `Cargo.toml`:

```diff
 [dependencies]
-sqlglot-rust = "0.9.37"
+sqlglot-rust = "0.10.0"
```

### Parse and generate SQL
@@ -358,17 +402,19 @@ LD_LIBRARY_PATH=target/release ./example
```diff
 src/
 ├── ast/ # AST node definitions (~40 expression types, 15 statement types)
 ├── bin/ # CLI binary (sqlglot) — feature-gated behind "cli"
+├── builder/ # Expression builder API (fluent SQL construction)
+├── dialects/ # 30 dialect definitions with transform rules
+├── diff/ # AST diff / semantic SQL comparison
+├── errors/ # Error types
+├── executor/ # In-memory SQL execution engine
 ├── ffi.rs # C-compatible FFI bindings (extern "C" API)
-├── tokens/ # Token types (~200+ variants) and tokenizer
-├── parser/ # Recursive-descent SQL parser
 ├── generator/ # SQL code generator
-├── dialects/ # 30 dialect definitions with transform rules
+├── lib.rs # Public API (parse, generate, transpile)
 ├── optimizer/ # Query optimization and scope analysis
+├── parser/ # Recursive-descent SQL parser
 ├── planner/ # Logical query planner (execution plan DAG)
-├── executor/ # In-memory SQL execution engine
 ├── schema/ # Schema management (MappingSchema, dialect-aware lookups)
-├── errors/ # Error types
-└── lib.rs # Public API (parse, generate, transpile)
+└── tokens/ # Token types (~200+ variants) and tokenizer
```

## Development
