@@ -95,13 +95,57 @@ The transpiler applies dialect-specific rewrite rules when converting between di
9595| Data type mapping | `TEXT` ↔ `STRING`, `INT` → `BIGINT` (BigQuery) |
9696| `BYTEA` ↔ `BLOB` | Postgres `BYTEA` ↔ MySQL `BLOB` |
9797
98+ ## Recent Parser Improvements and Benchmark Snapshot
99+
100+ Recent changes add parser tolerance and compatibility fixes across
101+ several dialects, including:
102+
103+ - ClickHouse: richer dotted/typed access parsing (`expr.:Type`, `expr.^field`, `.null` field paths), plus broader support for alias-heavy query forms.
104+ - DuckDB: alias-first projection support (`alias: expr`) and DESCRIBE-in-subquery tolerance.
105+ - PostgreSQL: `BIT VARYING(n)` cast tolerance, JSON key/value argument parsing (`'k' : v`), richer `ON CONFLICT (...)` target parsing, and broader geometric/operator-sequence tolerance.
106+ - MySQL: improved UPDATE/DELETE tolerance around ORDER BY, LIMIT, PARTITION, and insert alias edge cases.
107+ - T-SQL / ANSI extensions: tolerance for `WITH XMLNAMESPACES (...)` and additional parser guardrails for mixed corpora.
108+
109+ Benchmark reference:
110+
111+ - [SQL AST Benchmark](https://sql-ast-benchmark.luca.phd/)
112+
113+ ### Acceptance Report (latest run)
114+
115+ | dialect | total | accepted | rejected | accept% | panics |
116+ | --- | ---: | ---: | ---: | ---: | ---: |
117+ | postgresql | 29402 | 28089 | 1313 | 95.53% | 0 |
118+ | sqlite | 12119 | 12103 | 16 | 99.87% | 0 |
119+ | mysql | 30220 | 29925 | 295 | 99.02% | 0 |
120+ | clickhouse | 92268 | 91148 | 1120 | 98.79% | 0 |
121+ | duckdb | 41148 | 40029 | 1119 | 97.28% | 0 |
122+ | hive | 41294 | 40774 | 520 | 98.74% | 0 |
123+ | spark_sql | 14464 | 14180 | 284 | 98.04% | 0 |
124+ | trino | 71 | 70 | 1 | 98.59% | 0 |
125+ | tsql | 14782 | 13521 | 1261 | 91.47% | 0 |
126+ | oracle | 21648 | 21608 | 40 | 99.82% | 0 |
127+ | bigquery | 224 | 222 | 2 | 99.11% | 0 |
128+ | redshift | 2992 | 2964 | 28 | 99.06% | 0 |
129+ | multi | 10962 | 9637 | 1325 | 87.91% | 0 |
130+ | **TOTAL** | **311594** | **304270** | **7324** | **97.65%** | **0** |
131+
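The derived columns in the report can be reproduced from `total` and `accepted` alone. The sketch below assumes `rejected = total - accepted` and `accept% = 100 * accepted / total` rounded to two decimals, which matches the rows shown above; the `Row` type and `format_row` helper are illustrative names and not part of this crate's API.

```rust
/// One row of the acceptance report (hypothetical type, for illustration only).
struct Row {
    dialect: &'static str,
    total: u64,
    accepted: u64,
    panics: u64,
}

impl Row {
    /// Statements the parser did not accept.
    fn rejected(&self) -> u64 {
        self.total - self.accepted
    }

    /// Acceptance rate as a percentage of all statements in the corpus.
    fn accept_pct(&self) -> f64 {
        100.0 * self.accepted as f64 / self.total as f64
    }
}

/// Render a row in the same Markdown layout as the report table.
fn format_row(r: &Row) -> String {
    format!(
        "| {} | {} | {} | {} | {:.2}% | {} |",
        r.dialect,
        r.total,
        r.accepted,
        r.rejected(),
        r.accept_pct(),
        r.panics
    )
}

fn main() {
    // Counts copied from the report above.
    let rows = [
        Row { dialect: "postgresql", total: 29402, accepted: 28089, panics: 0 },
        Row { dialect: "tsql", total: 14782, accepted: 13521, panics: 0 },
        Row { dialect: "TOTAL", total: 311594, accepted: 304270, panics: 0 },
    ];
    for r in &rows {
        println!("{}", format_row(r));
    }
}
```

Recomputing the percentages this way is a quick consistency check when the table is regenerated by hand.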
132+ ### Brief Summary of Not-Yet-Supported Query Kinds
133+
134+ The remaining rejects are concentrated in a few recurring categories:
135+
136+ - Non-SQL or shell-like corpus lines mixed into SQL datasets (for example, raw text, script fragments, or malformed separators).
137+ - Dialect-specific meta commands or client commands (for example, backslash-style command lines in PostgreSQL corpora).
138+ - Template or parameterized placeholders that are not concrete SQL syntax until preprocessed.
139+ - Highly dialect-specific operator families and parser corner cases in advanced analytical or geometric expressions.
140+ - Intentionally malformed or truncated statements in test corpora (unterminated strings or comments, unexpected EOF).
141+
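Several of the categories above can be recognized before a statement ever reaches the parser. The following is a minimal sketch of such a pre-filter, not something this crate provides: the `SkipReason` enum, the `prefilter` function, and the specific markers (`\` prefix, `{{`, `${`, odd single-quote count) are all assumptions chosen for illustration, and the quote check deliberately ignores comments, dollar quoting, and backslash escapes.

```rust
/// Why a corpus line might be skipped before parsing (hypothetical categories).
#[derive(Debug, PartialEq)]
enum SkipReason {
    /// Client meta command, e.g. a backslash-style line in PostgreSQL corpora.
    MetaCommand,
    /// Template placeholder that is not SQL until preprocessed.
    TemplatePlaceholder,
    /// Odd number of single quotes, suggesting a truncated string literal.
    UnterminatedString,
}

/// Naive heuristic classifier; returns `None` when the line looks parseable.
fn prefilter(line: &str) -> Option<SkipReason> {
    let trimmed = line.trim();
    if trimmed.starts_with('\\') {
        return Some(SkipReason::MetaCommand);
    }
    if trimmed.contains("{{") || trimmed.contains("${") {
        return Some(SkipReason::TemplatePlaceholder);
    }
    // A doubled quote ('') escapes a quote and keeps the count even.
    let quotes = trimmed.chars().filter(|&c| c == '\'').count();
    if quotes % 2 == 1 {
        return Some(SkipReason::UnterminatedString);
    }
    None
}

fn main() {
    let corpus = [
        "\\d+ users",
        "SELECT * FROM {{ table }}",
        "SELECT 'abc",
        "SELECT 'it''s' AS s",
    ];
    for line in corpus {
        println!("{:?} <- {}", prefilter(line), line);
    }
}
```

Filtering like this changes what an acceptance rate measures, so any report built on a pre-filtered corpus should state which lines were excluded.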
98142## Quick Start
99143
100144Add to your `Cargo.toml`:
101145
102146```toml
103147[dependencies]
104- sqlglot-rust = "0.9.37"
148+ sqlglot-rust = "0.10.0"
105149```
106150
107151### Parse and generate SQL
@@ -358,17 +402,19 @@ LD_LIBRARY_PATH=target/release ./example
358402src/
359403├── ast/ # AST node definitions (~40 expression types, 15 statement types)
360404├── bin/ # CLI binary (sqlglot) — feature-gated behind "cli"
405+ ├── builder/ # Expression builder API (fluent SQL construction)
406+ ├── dialects/ # 30 dialect definitions with transform rules
407+ ├── diff/ # AST diff / semantic SQL comparison
408+ ├── errors/ # Error types
409+ ├── executor/ # In-memory SQL execution engine
361410├── ffi.rs # C-compatible FFI bindings (extern "C" API)
362- ├── tokens/ # Token types (~200+ variants) and tokenizer
363- ├── parser/ # Recursive-descent SQL parser
364411├── generator/ # SQL code generator
365- ├── dialects/ # 30 dialect definitions with transform rules
412+ ├── lib.rs # Public API (parse, generate, transpile)
366413├── optimizer/ # Query optimization and scope analysis
414+ ├── parser/ # Recursive-descent SQL parser
367415├── planner/ # Logical query planner (execution plan DAG)
368- ├── executor/ # In-memory SQL execution engine
369416├── schema/ # Schema management (MappingSchema, dialect-aware lookups)
370- ├── errors/ # Error types
371- └── lib.rs # Public API (parse, generate, transpile)
417+ └── tokens/ # Token types (~200+ variants) and tokenizer
372418```
373419
374420## Development