SQL4Json

Query JSON data using SQL — filter, aggregate, sort, and project without a database.

SQL4Json lets you query in-memory JSON data with familiar SQL SELECT syntax. If you know SQL, you already know how to use it. No database, no schema, no setup.

SELECT dept,
       COUNT(*)    AS headcount,
       AVG(salary) AS avgSalary,
       MAX(salary) AS topSalary
FROM $r
WHERE active = true
  AND hire_date BETWEEN '2023-01-01' AND '2024-12-31'
GROUP BY dept
HAVING headcount > 2
ORDER BY avgSalary DESC LIMIT 10

Requirements
Installation
Quick Start
API Reference
SQL Syntax
- SELECT
- FROM
- WHERE
- GROUP BY & HAVING
- ORDER BY
- LIMIT & OFFSET
- DISTINCT
- Subqueries
- CASE Expressions
- JOINs
Window Functions
Functions
Type System
Pipeline Stages
NULL Handling
Error Handling
Security Defaults
Thread Safety
Limitations
License

Requirements

Java 21 or above

Installation

Maven

<dependency>
    <groupId>io.github.mnesimiyilmaz</groupId>
    <artifactId>sql4json</artifactId>
    <version>1.0.0</version>
</dependency>

Gradle

implementation 'io.github.mnesimiyilmaz:sql4json:1.0.0'

Quick Start

import io.github.mnesimiyilmaz.sql4json.SQL4Json;

String json = """
        [
            {"name": "Alice", "age": 30, "dept": "Engineering"},
            {"name": "Bob",   "age": 25, "dept": "Marketing"},
            {"name": "Carol", "age": 35, "dept": "Engineering"}
        ]
        """;

String result = SQL4Json.query(
        "SELECT name, age FROM $r WHERE age > 28 ORDER BY age DESC",
        json
);
// [{"name":"Carol","age":35},{"name":"Alice","age":30}]

API Reference

Simple Query

Parse and execute in a single call.

// Returns JSON string
String result = SQL4Json.query("SELECT * FROM $r WHERE active = true", jsonString);

// Returns JsonValue (library's own JSON abstraction)
JsonValue result = SQL4Json.queryAsJsonValue("SELECT * FROM $r", jsonString);

// With custom limits
Sql4jsonSettings settings = Sql4jsonSettings.builder()
        .limits(l -> l.maxRowsPerQuery(10_000))
        .build();
String result = SQL4Json.query(sql, jsonString, settings);

Prepared Query

Parse once, execute many times against different data. Analogous to JDBC PreparedStatement.

PreparedQuery query = SQL4Json.prepare("SELECT name FROM $r WHERE age > 25");

for(String json :dataList){
    String result = query.execute(json);  // no re-parsing
}

Engine with Data Binding

Bind JSON data once at build time and run multiple queries against it. Optionally cache query results with an LRU cache.

// Minimal — no cache
SQL4JsonEngine engine = SQL4Json.engine()
        .data(jsonString)
        .build();

engine.query("SELECT * FROM $r WHERE active = true");
engine.query("SELECT dept, COUNT(*) AS cnt FROM $r GROUP BY dept");

// With default LRU cache (64 entries)
SQL4JsonEngine cached = SQL4Json.engine()
        .settings(Sql4jsonSettings.builder()
                .cache(c -> c.queryResultCacheEnabled(true))
                .build())
        .data(jsonString)
        .build();

cached.query(sql);  // cache miss — executes and caches result
cached.query(sql);  // cache hit — returns cached result instantly

// With custom-size cache
SQL4JsonEngine engine = SQL4Json.engine()
        .settings(Sql4jsonSettings.builder()
                .cache(c -> c.queryResultCacheEnabled(true).queryResultCacheSize(256))
                .build())
        .data(jsonString)
        .build();

The engine is designed to be long-lived (e.g., one per dataset). It holds no external resources and needs no cleanup.

You can also plug in your own cache implementation via the QueryResultCache interface:

SQL4JsonEngine engine = SQL4Json.engine()
        .settings(Sql4jsonSettings.builder()
                .cache(c -> c.customCache(myCaffeineCache))
                .build())
        .data(jsonString)
        .build();

Multi-Source Queries (JOINs)

For queries that JOIN across multiple JSON sources, bind named data sources:

// Static API — one-off multi-source query
String result = SQL4Json.query(
        "SELECT u.name AS name, o.amount AS amount FROM users u JOIN orders o ON u.id = o.user_id",
        Map.of("users", usersJson, "orders", ordersJson));

// Engine API — bind named sources for repeated queries
SQL4JsonEngine engine = SQL4Json.engine()
        .data("users", usersJson)
        .data("orders", ordersJson)
        .build();

String result = engine.query(
        "SELECT u.name AS name, o.amount AS amount FROM users u JOIN orders o ON u.id = o.user_id");

Named sources and unnamed data can coexist in the same engine — unnamed for $r queries, named for JOIN queries.

Custom JSON Codec

SQL4Json ships with a built-in JSON parser and serializer (zero external dependencies). If you prefer to use your own JSON library (Jackson, Gson, etc.), implement the JsonCodec interface:

import io.github.mnesimiyilmaz.sql4json.types.JsonCodec;

public class GsonJsonCodec implements JsonCodec {
    private final Gson gson = new Gson();

    @Override
    public JsonValue parse(String json) { /* parse with Gson, convert to JsonValue */ }

    @Override
    public String serialize(JsonValue value) { /* convert JsonValue, serialize with Gson */ }
}

// Wrap the codec in settings and pass to any API entry point
Sql4jsonSettings settings = Sql4jsonSettings.builder()
        .codec(new GsonJsonCodec())
        .build();

String result = SQL4Json.query(sql, json, settings);
PreparedQuery q = SQL4Json.prepare(sql, settings);
SQL4JsonEngine engine = SQL4Json.engine().settings(settings).data(json).build();

Example: Jackson-based JsonCodec

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.*;
import io.github.mnesimiyilmaz.sql4json.types.JsonCodec;
import io.github.mnesimiyilmaz.sql4json.types.JsonValue;
import io.github.mnesimiyilmaz.sql4json.types.JsonValue.*;

import java.util.ArrayList;
import java.util.LinkedHashMap;

public class JacksonJsonCodec implements JsonCodec {

    private final ObjectMapper mapper = new ObjectMapper();

    @Override
    public JsonValue parse(String json) {
        try {
            return convertNode(mapper.readTree(json));
        } catch (Exception e) {
            throw new RuntimeException("Failed to parse JSON", e);
        }
    }

    @Override
    public String serialize(JsonValue value) {
        try {
            return mapper.writeValueAsString(toJsonNode(value));
        } catch (Exception e) {
            throw new RuntimeException("Failed to serialize JSON", e);
        }
    }

    private JsonValue convertNode(JsonNode node) {
        return switch (node) {
            case ObjectNode obj -> {
                var map = new LinkedHashMap<String, JsonValue>();
                obj.fields().forEachRemaining(e -> map.put(e.getKey(), convertNode(e.getValue())));
                yield new JsonObjectValue(map);
            }
            case ArrayNode arr -> {
                var list = new ArrayList<JsonValue>(arr.size());
                arr.forEach(n -> list.add(convertNode(n)));
                yield new JsonArrayValue(list);
            }
            case TextNode t -> new JsonStringValue(t.asText());
            case NumericNode n -> new JsonNumberValue(n.numberValue());
            case BooleanNode b -> new JsonBooleanValue(b.asBoolean());
            case NullNode ignored -> JsonNullValue.INSTANCE;
            default -> JsonNullValue.INSTANCE;
        };
    }

    private JsonNode toJsonNode(JsonValue value) {
        return switch (value) {
            case JsonObjectValue(var map) -> {
                var obj = mapper.createObjectNode();
                map.forEach((k, v) -> obj.set(k, toJsonNode(v)));
                yield obj;
            }
            case JsonArrayValue(var list) -> {
                var arr = mapper.createArrayNode();
                list.forEach(v -> arr.add(toJsonNode(v)));
                yield arr;
            }
            case JsonStringValue(var s) -> new TextNode(s);
            case JsonNumberValue(var n) -> mapper.getNodeFactory().numberNode(n.doubleValue());
            case JsonBooleanValue(var b) -> BooleanNode.valueOf(b);
            case JsonNullValue ignored -> NullNode.getInstance();
        };
    }
}

Usage:

String result = SQL4Json.query(sql, json,
    Sql4jsonSettings.builder()
        .codec(new JacksonJsonCodec())
        .build());

This is an example implementation. Adapt ObjectMapper configuration, error handling, and security limits to your requirements.

SQL Syntax

SQL4Json supports a subset of SQL SELECT. All keywords are case-insensitive.

SELECT

Select all fields, specific fields, or use aliases with dot-notation for structured output:

SELECT * FROM $r

SELECT name, age FROM $r

SELECT fullname,
       id AS userId,
       username AS user.username
FROM $r

Dotted aliases create nested JSON in the output:

[
  {
    "fullname": "Alice",
    "userId": 1,
    "user": {
      "username": "alice"
    }
  }
]

FROM

$r references the root JSON data. Use dot-notation to drill into nested structures:

-- Query the root array
SELECT * FROM $r

-- Query a nested array
SELECT * FROM $r.response.data.items

If the input is a single JSON object (not an array), it is automatically wrapped in an array.

WHERE

Filter rows with comparison operators, logical connectives, and parenthesized groups:

SELECT *
FROM $r
WHERE age >= 18
  AND status IN ('active', 'pending')
  AND name LIKE 'A%'
  AND score BETWEEN 50 AND 100
  AND email IS NOT NULL

Operators:

Operator	Example
`=`, `!=`	`status = 'active'`
`>`, `<`, `>=`, `<=`	`age >= 18`
`LIKE`, `NOT LIKE`	`name LIKE '%son'` (`%` = any chars, `_` = one char)
`IN`, `NOT IN`	`dept IN ('IT', 'HR')`
`BETWEEN`, `NOT BETWEEN`	`age BETWEEN 18 AND 65`
`IS NULL`, `IS NOT NULL`	`email IS NOT NULL`
`AND`, `OR`	`a > 1 AND (b = 2 OR c = 3)`

AND has higher precedence than OR. Use parentheses to override.

Functions can be used on the left-hand side of any comparison, including nested calls:

WHERE LOWER(name) = 'alice'
WHERE CAST(age AS STRING) = '25'
WHERE TRIM(NULLIF(name, '')) = 'Alice'
WHERE LENGTH(TRIM(name)) > 5

GROUP BY & HAVING

Aggregate data with GROUP BY and filter groups with HAVING:

SELECT dept,
       COUNT(*)              AS headcount,
       SUM(salary)           AS totalPay,
       ROUND(AVG(salary), 2) AS avgPay,
       MAX(salary)           AS topPay,
       MIN(salary)           AS lowPay
FROM $r
GROUP BY dept
HAVING headcount > 5
   AND avgPay > 50000

HAVING conditions must reference aliases defined in the SELECT clause.

All aggregate functions except COUNT(*) operate on non-null values only.

Aggregate functions accept nested expressions — for example, AVG(NULLIF(salary, 0)) excludes zeros from the average.

ORDER BY

Sort by one or more fields. Default direction is ASC. Expressions are supported:

SELECT * FROM $r
ORDER BY age DESC, name ASC

SELECT * FROM $r
ORDER BY LENGTH(TRIM(name)) ASC

For grouped results, use SELECT aliases:

SELECT dept, AVG(salary) AS avgPay
FROM $r
GROUP BY dept
ORDER BY avgPay DESC

LIMIT & OFFSET

Restrict result size and paginate:

-- First 10 results
SELECT *
FROM $r
ORDER BY created_at DESC LIMIT 10

-- Page 3 (20 items per page)
SELECT *
FROM $r
ORDER BY id LIMIT 20
OFFSET 40

DISTINCT

Remove duplicate rows from results:

SELECT DISTINCT dept FROM $r

SELECT DISTINCT dept, status FROM $r ORDER BY dept

Subqueries

Use standard SQL FROM subqueries. The inner query executes first, and its result becomes the input for the outer query.

SELECT dept, COUNT(*) AS cnt
FROM (SELECT *
      FROM $r
      WHERE active = true)
GROUP BY dept
ORDER BY cnt DESC

CASE Expressions

SQL4Json supports both simple and searched CASE expressions, usable in SELECT, WHERE, ORDER BY, GROUP BY, and HAVING.

Simple CASE — compares an expression against values:

SELECT name,
       CASE dept
         WHEN 'eng' THEN 'Engineering'
         WHEN 'hr'  THEN 'Human Resources'
         ELSE 'Other'
       END AS department
FROM $r

Searched CASE — evaluates boolean conditions:

SELECT name,
       CASE
         WHEN salary > 100000 THEN 'senior'
         WHEN salary > 60000  THEN 'mid'
         ELSE 'junior'
       END AS level
FROM $r

CASE expressions nest freely with functions and other CASE expressions:

-- Function inside CASE
SELECT CASE WHEN LENGTH(name) > 4 THEN 'long' ELSE 'short' END AS label FROM $r

-- CASE inside function
SELECT UPPER(CASE WHEN active THEN 'yes' ELSE 'no' END) AS flag FROM $r

-- Nested CASE
SELECT CASE WHEN dept = 'eng'
         THEN CASE WHEN salary > 100000 THEN 'senior eng' ELSE 'eng' END
         ELSE 'other'
       END AS category
FROM $r

If no WHEN clause matches and no ELSE is provided, the result is NULL. Simple CASE uses SQL NULL semantics: CASE x WHEN NULL never matches — use CASE WHEN x IS NULL instead.

JOINs

Join across multiple JSON sources using INNER JOIN, LEFT JOIN, or RIGHT JOIN with equality-based ON conditions:

-- INNER JOIN (both spellings)
SELECT u.name AS name, o.amount AS amount
FROM users u JOIN orders o ON u.id = o.user_id

SELECT u.name AS name, o.amount AS amount
FROM users u INNER JOIN orders o ON u.id = o.user_id

-- LEFT JOIN — preserves all left-side rows, NULLs for unmatched right side
SELECT u.name AS name, o.amount AS amount
FROM users u LEFT JOIN orders o ON u.id = o.user_id

-- RIGHT JOIN — preserves all right-side rows, NULLs for unmatched left side
SELECT u.name AS name, o.amount AS amount
FROM users u RIGHT JOIN orders o ON u.id = o.user_id

-- Chained JOINs
SELECT u.name AS user_name, o.amount AS amount, p.name AS product
FROM users u
JOIN orders o ON u.id = o.user_id
LEFT JOIN products p ON o.product_id = p.id

-- Multi-column ON
SELECT * FROM a JOIN b ON a.x = b.x AND a.y = b.y

-- JOIN with all clauses
SELECT u.dept AS dept, COUNT(*) AS cnt, AVG(o.amount) AS avg_amount
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE o.status = 'completed'
GROUP BY u.dept
HAVING cnt >= 5
ORDER BY avg_amount DESC
LIMIT 10

Table aliases are recommended. Without aliases, table names are used as prefixes (e.g., users.name).

ON conditions support only equality (=) with AND:

ON a.col = b.col — single column
ON a.x = b.x AND a.y = b.y — multiple columns

Output format: Use AS aliases for flat output keys. Without aliases, dotted column refs like u.name produce the key u.name in the output.

Data binding: JOIN queries require named data sources — see Multi-Source Queries in the API Reference.

Window Functions

Window functions perform calculations across a set of rows related to the current row, without collapsing rows like GROUP BY. Use the OVER clause with optional PARTITION BY and ORDER BY:

-- Ranking within departments
SELECT name, dept, salary,
    ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS dept_rank,
    RANK()       OVER (ORDER BY salary DESC)                  AS global_rank
FROM $r

-- Running context: previous and next salary
SELECT name, salary,
    LAG(salary)  OVER (ORDER BY salary ASC) AS prev_salary,
    LEAD(salary) OVER (ORDER BY salary ASC) AS next_salary
FROM $r

-- Aggregate windows — same value for every row in the partition
SELECT name, dept, salary,
    SUM(salary) OVER (PARTITION BY dept) AS dept_total,
    AVG(salary) OVER (PARTITION BY dept) AS dept_avg,
    COUNT(*)    OVER (PARTITION BY dept) AS dept_count
FROM $r

-- NTILE — divide into buckets
SELECT name, salary,
    NTILE(4) OVER (ORDER BY salary DESC) AS quartile
FROM $r

-- Combine with WHERE, ORDER BY, LIMIT
SELECT name, salary,
    RANK() OVER (ORDER BY salary DESC) AS rnk
FROM $r
WHERE dept = 'Engineering'
ORDER BY rnk
LIMIT 10

Ranking Functions

Function	Description
`ROW_NUMBER()`	Sequential number (1, 2, 3, ...) per partition
`RANK()`	Rank with gaps on ties (1, 1, 3, ...)
`DENSE_RANK()`	Rank without gaps on ties (1, 1, 2, ...)
`NTILE(n)`	Divide rows into `n` roughly equal buckets

Offset Functions

Function	Description
`LAG(col [, offset])`	Value from `offset` rows before (default 1), NULL at boundary
`LEAD(col [, offset])`	Value from `offset` rows after (default 1), NULL at boundary

Aggregate Window Functions

Aggregate functions (SUM, AVG, COUNT, MIN, MAX) can be used with OVER to compute values across the full partition without collapsing rows. COUNT(*) counts all rows; COUNT(col) counts non-null values.

SELECT name, dept, salary,
    SUM(salary) OVER (PARTITION BY dept) AS dept_total,
    COUNT(*)    OVER ()                  AS total_employees
FROM $r

OVER Clause

OVER () — entire result set is one partition
OVER (PARTITION BY col1, col2) — group rows into partitions
OVER (ORDER BY col ASC/DESC) — order within partition (required for ranking/offset functions)
OVER (PARTITION BY dept ORDER BY salary DESC) — both

Note: Window functions are only valid in SELECT. They cannot appear in WHERE, HAVING, or GROUP BY.

Functions

String Functions

Function	Description	Example
`LOWER(str)`	Lowercase (optional locale: `LOWER(str, 'tr-TR')`)	`LOWER(name)`
`UPPER(str)`	Uppercase (optional locale)	`UPPER(name, 'en-US')`
`CONCAT(a, b, ...)`	Concatenate strings	`CONCAT(first, ' ', last)`
`SUBSTRING(str, start, len)`	Extract substring (1-based start, length optional)	`SUBSTRING(name, 1, 3)`
`TRIM(str)`	Remove leading/trailing whitespace	`TRIM(name)`
`LENGTH(str)`	String length	`LENGTH(name)`
`REPLACE(str, search, replacement)`	Replace all occurrences	`REPLACE(name, ' ', '_')`
`LEFT(str, n)`	First n characters	`LEFT(name, 3)`
`RIGHT(str, n)`	Last n characters	`RIGHT(name, 3)`
`LPAD(str, len, pad)`	Left-pad to length	`LPAD(id, 5, '0')`
`RPAD(str, len, pad)`	Right-pad to length	`RPAD(name, 20, '.')`
`REVERSE(str)`	Reverse string	`REVERSE(name)`
`POSITION(substr, str)`	Find position (1-based, 0 if not found)	`POSITION('a', name)`

Math Functions

Function	Description	Example
`ABS(n)`	Absolute value	`ABS(delta)`
`ROUND(n, decimals)`	Round (decimals optional, default 0)	`ROUND(price, 2)`
`CEIL(n)`	Ceiling (round up)	`CEIL(score)`
`FLOOR(n)`	Floor (round down)	`FLOOR(score)`
`MOD(a, b)`	Modulo	`MOD(id, 10)`
`POWER(base, exp)`	Exponentiation	`POWER(n, 2)`
`SQRT(n)`	Square root	`SQRT(area)`
`SIGN(n)`	Returns -1, 0, or 1	`SIGN(balance)`

Date & Time Functions

Function	Description	Example
`TO_DATE(str, pattern?)`	Parse date string (ISO format if no pattern)	`TO_DATE(d, 'yyyy-MM-dd')`
`NOW()`	Current date and time	`NOW()`
`YEAR(d)`	Extract year	`YEAR(created_at)`
`MONTH(d)`	Extract month (1-12)	`MONTH(created_at)`
`DAY(d)`	Extract day of month	`DAY(created_at)`
`HOUR(dt)`	Extract hour (0 for date-only)	`HOUR(timestamp)`
`MINUTE(dt)`	Extract minute	`MINUTE(timestamp)`
`SECOND(dt)`	Extract second	`SECOND(timestamp)`
`DATE_ADD(d, amount, unit)`	Add to date (units: DAYS, MONTHS, HOURS, etc.)	`DATE_ADD(d, 30, 'DAYS')`
`DATE_DIFF(d1, d2, unit)`	Difference between dates	`DATE_DIFF(end, start, 'DAYS')`

TO_DATE without a pattern tries ISO datetime (yyyy-MM-dd'T'HH:mm:ss) first, then ISO date (yyyy-MM-dd).

Conversion Functions

Function	Description	Example
`CAST(expr AS type)`	Type conversion	`CAST(age AS STRING)`
`NULLIF(a, b)`	Returns NULL if `a = b`, otherwise `a`	`NULLIF(status, 'N/A')`
`COALESCE(a, b, ...)`	First non-null argument	`COALESCE(nickname, name)`

CAST target types: STRING, NUMBER, INTEGER, DECIMAL, BOOLEAN, DATE, DATETIME

Aggregate Functions

Used with GROUP BY. All operate on non-null values except COUNT(*).

Function	Description
`COUNT(field)`	Count of non-null values
`COUNT(*)`	Count of all rows including nulls
`SUM(field)`	Sum of numeric values
`AVG(field)`	Average of numeric values
`MIN(field)`	Minimum value
`MAX(field)`	Maximum value

Nested Function Calls

Functions nest arbitrarily — in SELECT, WHERE, GROUP BY, ORDER BY, and HAVING:

-- Exclude zeros from average, then round
SELECT dept, ROUND(AVG(NULLIF(salary, 0)), 2) AS avg_salary
FROM $r
GROUP BY dept

-- Pad trimmed values
SELECT LPAD(TRIM(NULLIF(name, '')), 10, '*') AS padded
FROM $r

-- Nested functions in WHERE and ORDER BY
SELECT *
FROM $r
WHERE TRIM(NULLIF(name, '')) = 'Alice'
ORDER BY LENGTH(TRIM(name)) ASC

Type System

SQL4Json maps JSON values to a sealed type hierarchy for type-safe processing:

JSON Type	SQL4Json Type	Notes
String	`SqlString`
Number	`SqlNumber`	Integer and floating-point
Boolean	`SqlBoolean`	`true` / `false` literals
Null	`SqlNull`	Singleton
Date string	`SqlDate`	Via `TO_DATE()` or `CAST(x AS DATE)`
DateTime string	`SqlDateTime`	Via `TO_DATE()`, `NOW()`, or `CAST(x AS DATETIME)`

The JsonValue sealed interface provides the library's own JSON abstraction. The public API accepts and returnsString or JsonValue with zero external dependencies. You can plug in your own JSON library via the JsonCodecinterface.

Pipeline Stages

SQL4Json executes queries through a staged pipeline. Each stage transforms the row stream and is either lazy (processes rows one at a time, O(1) memory) or materializing (buffers the entire input, O(n) memory).

WHERE → GROUP BY → HAVING → WINDOW → ORDER BY → LIMIT → SELECT → DISTINCT
(lazy)  (mat.)    (lazy)   (mat.)    (mat.)     (mat.)  (lazy)   (mat.)

Stage	SQL Clause	Behavior	Row-count Enforced	Notes
WHERE	`WHERE`	Lazy	STREAMING	Filters rows as they stream through
GROUP BY	`GROUP BY`	Materializing	GROUP_BY	Collects all rows, groups by key, applies aggregates
HAVING	`HAVING`	Lazy	—	Filters groups after aggregation
WINDOW	`OVER (...)`	Materializing	WINDOW	Partitions, sorts, computes window function values
ORDER BY	`ORDER BY`	Materializing	ORDER_BY	Sorts result set; fused with LIMIT into a bounded heap (TopN optimization) when both are present
LIMIT	`LIMIT` / `OFFSET`	Materializing	—	Truncates result set
SELECT	`SELECT`	Lazy	—	Projects columns, evaluates expressions and aliases
DISTINCT	`DISTINCT`	Materializing	DISTINCT	Eliminates duplicate rows

Exceeding maxRowsPerQuery at any materializing stage throws SQL4JsonExecutionException. Configure via:

Sql4jsonSettings.builder()
    .limits(l -> l.maxRowsPerQuery(50_000))
    .build();

NULL Handling

SQL4Json follows these rules for NULL values:

Non-existent fields are treated as NULL
Any comparison involving NULL returns false — matches standard SQL semantics where NULL represents an unknown value. NULL = NULL, NULL != x, NULL > x, etc. all evaluate to false.
Use IS NULL / IS NOT NULL to test for NULL values
Aggregate functions (SUM, AVG, MIN, MAX, COUNT(field)) ignore NULL values
COUNT(*) counts all rows including those with NULL fields
COALESCE returns the first non-null argument

Error Handling

SQL4Json uses a sealed exception hierarchy:

SQL4JsonException (sealed base)
  +-- SQL4JsonParseException      -- syntax errors (includes line & column position)
  +-- SQL4JsonExecutionException  -- runtime errors during query execution

try{
    String result = SQL4Json.query(sql, json);
}catch(SQL4JsonParseException e){
        // e.getLine() is 1-based, e.getCharPosition() is 0-based
        System.err.println("Syntax error at line "+e.getLine()+", col "+e.getCharPosition() +": "+e.getMessage());
}catch(SQL4JsonExecutionException e){
        System.err.println("Execution error: "+e.getMessage());
}

Security Defaults

SQL4Json enforces conservative limits by default to protect against denial-of-service and information-disclosure risks. All limits are enforced by default via SQL4Json.query(sql, json); pass a custom Sql4jsonSettings to relax or tighten them.

Setting	Default	Protects against	When to loosen
`security.maxLikeWildcards`	16	Soft ReDoS via `LIKE '%a%b%c%...'` patterns	Queries with many literal wildcards
`security.redactErrorDetails`	true	Leaking data source names, format strings, or user input into error messages (multi-tenant info disclosure)	Single-tenant dev/debug where verbose errors help
`limits.maxSqlLength`	65 536 (64 KiB)	Unbounded SQL input consuming parser memory	Generated SQL longer than 64 KiB
`limits.maxSubqueryDepth`	16	Recursion bomb via deeply nested FROM subqueries	Programmatically generated deep nestings
`limits.maxInListSize`	1 024	Parse-time amplification attacks via huge `IN (...)` lists	Bulk filter UIs that pass large allow-lists
`limits.maxRowsPerQuery`	1 000 000	Out-of-memory at materializing pipeline stages (GROUP BY, ORDER BY, WINDOW, JOIN, DISTINCT, and the final pipeline / streaming sink)	Batch queries over datasets larger than 1M rows
`cache.likePatternCacheSize`	1 024	Unbounded growth of compiled-LIKE-pattern cache (cache-poisoning DoS)	High-cardinality hot-pattern workloads
`cache.queryResultCacheEnabled`	false	Unbounded result memory on engines that don't need caching	When you want repeated identical queries to hit a result cache
`cache.queryResultCacheSize`	64	Cache oversize when enabled	Workloads with more than 64 hot queries
`codec.maxInputLength`	10 MiB	Parser memory exhaustion via oversize JSON input	Known-large payloads
`codec.maxNestingDepth`	64	Stack/recursion blowup on deeply nested JSON	Legitimately deep JSON documents
`codec.maxStringLength`	1 MiB	Memory exhaustion via a single huge string value	JSON contains embedded base64 blobs, etc.
`codec.maxNumberLength`	64	DoS via excessively long number literals	N/A — almost never needs to be loosened
`codec.maxPropertyNameLength`	1 024	DoS via excessively long object keys	N/A — almost never needs to be loosened
`codec.maxArrayElements`	1 000 000	Memory exhaustion via a single massive array	Datasets with >1M elements per JSON array
`codec.duplicateKeyPolicy`	REJECT	HTTP-Parameter-Pollution-style desync between systems that disagree on which duplicate wins (RFC 8259 ambiguity)	Legacy data sources that rely on last-wins or first-wins behaviour

To override specific limits, build a custom Sql4jsonSettings and pass it to any entry point:

Sql4jsonSettings strict = Sql4jsonSettings.builder()
        .limits(l -> l.maxRowsPerQuery(10_000).maxSqlLength(8 * 1024))
        .codec(new DefaultJsonCodec(
                DefaultJsonCodecSettings.builder()
                        .maxInputLength(50 * 1024 * 1024)
                        .build()))
        .build();

String result = SQL4Json.query(sql, json, strict);

For the Javadoc on each setting, see Sql4jsonSettings, SecuritySettings, LimitsSettings, CacheSettings, and DefaultJsonCodecSettings.

Thread Safety

PreparedQuery is immutable and safe to share across threads.
SQL4JsonEngine binds immutable data and uses a synchronized cache — safe for concurrent access from multiple threads.
Static SQL4Json methods are stateless and thread-safe.
Each query execution creates its own scoped state (rows, interner, pipeline) — no cross-query interference.

Limitations

In-memory only. The entire JSON dataset must fit in the JVM heap.
SELECT only. No INSERT, UPDATE, DELETE, CREATE, or DDL operations.
No FULL OUTER JOIN or CROSS JOIN. Only INNER, LEFT, and RIGHT JOINs with equality ON conditions.
No UNION / INTERSECT / EXCEPT. One query, one result set.
No window frame clause. Window functions operate over the full partition. ROWS BETWEEN N PRECEDING AND M FOLLOWING is not supported.

License

Apache License 2.0

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
.github		.github
.mvn/wrapper		.mvn/wrapper
src		src
.claudeignore		.claudeignore
.editorconfig		.editorconfig
.gitattributes		.gitattributes
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
README.md		README.md
mvnw		mvnw
mvnw.cmd		mvnw.cmd
pom.xml		pom.xml

Folders and files

Latest commit

History

Repository files navigation

SQL4Json

Table of Contents

Requirements

Installation

Maven

Gradle

Quick Start

API Reference

Simple Query

Prepared Query

Engine with Data Binding

Multi-Source Queries (JOINs)

Custom JSON Codec

SQL Syntax

SELECT

FROM

WHERE

GROUP BY & HAVING

ORDER BY

LIMIT & OFFSET

DISTINCT

Subqueries

CASE Expressions

JOINs

Window Functions

Ranking Functions

Offset Functions

Aggregate Window Functions

OVER Clause

Functions

String Functions

Math Functions

Date & Time Functions

Conversion Functions

Aggregate Functions

Nested Function Calls

Type System

Pipeline Stages

NULL Handling

Error Handling

Security Defaults

Thread Safety

Limitations

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages