Query JSON data using SQL — filter, aggregate, sort, and project without a database.
SQL4Json lets you query in-memory JSON data with familiar SQL SELECT syntax. If you know SQL, you already know how to use it. No database, no schema, no setup.
SELECT dept,
COUNT(*) AS headcount,
AVG(salary) AS avgSalary,
MAX(salary) AS topSalary
FROM $r
WHERE active = true
AND hire_date BETWEEN '2023-01-01' AND '2024-12-31'
GROUP BY dept
HAVING headcount > 2
ORDER BY avgSalary DESC LIMIT 10- Requirements
- Installation
- Quick Start
- API Reference
- SQL Syntax
- Window Functions
- Functions
- Type System
- Pipeline Stages
- NULL Handling
- Error Handling
- Security Defaults
- Thread Safety
- Limitations
- License
- Java 21 or above
<dependency>
<groupId>io.github.mnesimiyilmaz</groupId>
<artifactId>sql4json</artifactId>
<version>1.0.0</version>
</dependency>implementation 'io.github.mnesimiyilmaz:sql4json:1.0.0'import io.github.mnesimiyilmaz.sql4json.SQL4Json;
String json = """
[
{"name": "Alice", "age": 30, "dept": "Engineering"},
{"name": "Bob", "age": 25, "dept": "Marketing"},
{"name": "Carol", "age": 35, "dept": "Engineering"}
]
""";
String result = SQL4Json.query(
"SELECT name, age FROM $r WHERE age > 28 ORDER BY age DESC",
json
);
// [{"name":"Carol","age":35},{"name":"Alice","age":30}]Parse and execute in a single call.
// Returns JSON string
String result = SQL4Json.query("SELECT * FROM $r WHERE active = true", jsonString);
// Returns JsonValue (library's own JSON abstraction)
JsonValue result = SQL4Json.queryAsJsonValue("SELECT * FROM $r", jsonString);
// With custom limits
Sql4jsonSettings settings = Sql4jsonSettings.builder()
.limits(l -> l.maxRowsPerQuery(10_000))
.build();
String result = SQL4Json.query(sql, jsonString, settings);Parse once, execute many times against different data. Analogous to JDBC PreparedStatement.
PreparedQuery query = SQL4Json.prepare("SELECT name FROM $r WHERE age > 25");
for(String json :dataList){
String result = query.execute(json); // no re-parsing
}Bind JSON data once at build time and run multiple queries against it. Optionally cache query results with an LRU cache.
// Minimal — no cache
SQL4JsonEngine engine = SQL4Json.engine()
.data(jsonString)
.build();
engine.query("SELECT * FROM $r WHERE active = true");
engine.query("SELECT dept, COUNT(*) AS cnt FROM $r GROUP BY dept");
// With default LRU cache (64 entries)
SQL4JsonEngine cached = SQL4Json.engine()
.settings(Sql4jsonSettings.builder()
.cache(c -> c.queryResultCacheEnabled(true))
.build())
.data(jsonString)
.build();
cached.query(sql); // cache miss — executes and caches result
cached.query(sql); // cache hit — returns cached result instantly
// With custom-size cache
SQL4JsonEngine engine = SQL4Json.engine()
.settings(Sql4jsonSettings.builder()
.cache(c -> c.queryResultCacheEnabled(true).queryResultCacheSize(256))
.build())
.data(jsonString)
.build();The engine is designed to be long-lived (e.g., one per dataset). It holds no external resources and needs no cleanup.
You can also plug in your own cache implementation via the QueryResultCache interface:
SQL4JsonEngine engine = SQL4Json.engine()
.settings(Sql4jsonSettings.builder()
.cache(c -> c.customCache(myCaffeineCache))
.build())
.data(jsonString)
.build();For queries that JOIN across multiple JSON sources, bind named data sources:
// Static API — one-off multi-source query
String result = SQL4Json.query(
"SELECT u.name AS name, o.amount AS amount FROM users u JOIN orders o ON u.id = o.user_id",
Map.of("users", usersJson, "orders", ordersJson));
// Engine API — bind named sources for repeated queries
SQL4JsonEngine engine = SQL4Json.engine()
.data("users", usersJson)
.data("orders", ordersJson)
.build();
String result = engine.query(
"SELECT u.name AS name, o.amount AS amount FROM users u JOIN orders o ON u.id = o.user_id");Named sources and unnamed data can coexist in the same engine — unnamed for $r queries, named for JOIN queries.
SQL4Json ships with a built-in JSON parser and serializer (zero external dependencies). If you prefer to use your own
JSON library (Jackson, Gson, etc.), implement the JsonCodec interface:
import io.github.mnesimiyilmaz.sql4json.types.JsonCodec;
public class GsonJsonCodec implements JsonCodec {
private final Gson gson = new Gson();
@Override
public JsonValue parse(String json) { /* parse with Gson, convert to JsonValue */ }
@Override
public String serialize(JsonValue value) { /* convert JsonValue, serialize with Gson */ }
}
// Wrap the codec in settings and pass to any API entry point
Sql4jsonSettings settings = Sql4jsonSettings.builder()
.codec(new GsonJsonCodec())
.build();
String result = SQL4Json.query(sql, json, settings);
PreparedQuery q = SQL4Json.prepare(sql, settings);
SQL4JsonEngine engine = SQL4Json.engine().settings(settings).data(json).build();Example: Jackson-based JsonCodec
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.*;
import io.github.mnesimiyilmaz.sql4json.types.JsonCodec;
import io.github.mnesimiyilmaz.sql4json.types.JsonValue;
import io.github.mnesimiyilmaz.sql4json.types.JsonValue.*;
import java.util.ArrayList;
import java.util.LinkedHashMap;
public class JacksonJsonCodec implements JsonCodec {
private final ObjectMapper mapper = new ObjectMapper();
@Override
public JsonValue parse(String json) {
try {
return convertNode(mapper.readTree(json));
} catch (Exception e) {
throw new RuntimeException("Failed to parse JSON", e);
}
}
@Override
public String serialize(JsonValue value) {
try {
return mapper.writeValueAsString(toJsonNode(value));
} catch (Exception e) {
throw new RuntimeException("Failed to serialize JSON", e);
}
}
private JsonValue convertNode(JsonNode node) {
return switch (node) {
case ObjectNode obj -> {
var map = new LinkedHashMap<String, JsonValue>();
obj.fields().forEachRemaining(e -> map.put(e.getKey(), convertNode(e.getValue())));
yield new JsonObjectValue(map);
}
case ArrayNode arr -> {
var list = new ArrayList<JsonValue>(arr.size());
arr.forEach(n -> list.add(convertNode(n)));
yield new JsonArrayValue(list);
}
case TextNode t -> new JsonStringValue(t.asText());
case NumericNode n -> new JsonNumberValue(n.numberValue());
case BooleanNode b -> new JsonBooleanValue(b.asBoolean());
case NullNode ignored -> JsonNullValue.INSTANCE;
default -> JsonNullValue.INSTANCE;
};
}
private JsonNode toJsonNode(JsonValue value) {
return switch (value) {
case JsonObjectValue(var map) -> {
var obj = mapper.createObjectNode();
map.forEach((k, v) -> obj.set(k, toJsonNode(v)));
yield obj;
}
case JsonArrayValue(var list) -> {
var arr = mapper.createArrayNode();
list.forEach(v -> arr.add(toJsonNode(v)));
yield arr;
}
case JsonStringValue(var s) -> new TextNode(s);
case JsonNumberValue(var n) -> mapper.getNodeFactory().numberNode(n.doubleValue());
case JsonBooleanValue(var b) -> BooleanNode.valueOf(b);
case JsonNullValue ignored -> NullNode.getInstance();
};
}
}Usage:
String result = SQL4Json.query(sql, json,
Sql4jsonSettings.builder()
.codec(new JacksonJsonCodec())
.build());This is an example implementation. Adapt
ObjectMapperconfiguration, error handling, and security limits to your requirements.
SQL4Json supports a subset of SQL SELECT. All keywords are case-insensitive.
Select all fields, specific fields, or use aliases with dot-notation for structured output:
SELECT * FROM $r
SELECT name, age FROM $r
SELECT fullname,
id AS userId,
username AS user.username
FROM $rDotted aliases create nested JSON in the output:
[
{
"fullname": "Alice",
"userId": 1,
"user": {
"username": "alice"
}
}
]$r references the root JSON data. Use dot-notation to drill into nested structures:
-- Query the root array
SELECT * FROM $r
-- Query a nested array
SELECT * FROM $r.response.data.itemsIf the input is a single JSON object (not an array), it is automatically wrapped in an array.
Filter rows with comparison operators, logical connectives, and parenthesized groups:
SELECT *
FROM $r
WHERE age >= 18
AND status IN ('active', 'pending')
AND name LIKE 'A%'
AND score BETWEEN 50 AND 100
AND email IS NOT NULLOperators:
| Operator | Example |
|---|---|
=, != |
status = 'active' |
>, <, >=, <= |
age >= 18 |
LIKE, NOT LIKE |
name LIKE '%son' (% = any chars, _ = one char) |
IN, NOT IN |
dept IN ('IT', 'HR') |
BETWEEN, NOT BETWEEN |
age BETWEEN 18 AND 65 |
IS NULL, IS NOT NULL |
email IS NOT NULL |
AND, OR |
a > 1 AND (b = 2 OR c = 3) |
AND has higher precedence than OR. Use parentheses to override.
Functions can be used on the left-hand side of any comparison, including nested calls:
WHERE LOWER(name) = 'alice'
WHERE CAST(age AS STRING) = '25'
WHERE TRIM(NULLIF(name, '')) = 'Alice'
WHERE LENGTH(TRIM(name)) > 5Aggregate data with GROUP BY and filter groups with HAVING:
SELECT dept,
COUNT(*) AS headcount,
SUM(salary) AS totalPay,
ROUND(AVG(salary), 2) AS avgPay,
MAX(salary) AS topPay,
MIN(salary) AS lowPay
FROM $r
GROUP BY dept
HAVING headcount > 5
AND avgPay > 50000HAVING conditions must reference aliases defined in the SELECT clause.
All aggregate functions except COUNT(*) operate on non-null values only.
Aggregate functions accept nested expressions — for example, AVG(NULLIF(salary, 0)) excludes zeros from the average.
Sort by one or more fields. Default direction is ASC. Expressions are supported:
SELECT * FROM $r
ORDER BY age DESC, name ASC
SELECT * FROM $r
ORDER BY LENGTH(TRIM(name)) ASCFor grouped results, use SELECT aliases:
SELECT dept, AVG(salary) AS avgPay
FROM $r
GROUP BY dept
ORDER BY avgPay DESCRestrict result size and paginate:
-- First 10 results
SELECT *
FROM $r
ORDER BY created_at DESC LIMIT 10
-- Page 3 (20 items per page)
SELECT *
FROM $r
ORDER BY id LIMIT 20
OFFSET 40Remove duplicate rows from results:
SELECT DISTINCT dept FROM $r
SELECT DISTINCT dept, status FROM $r ORDER BY deptUse standard SQL FROM subqueries. The inner query executes first, and its result becomes the input for the outer query.
SELECT dept, COUNT(*) AS cnt
FROM (SELECT *
FROM $r
WHERE active = true)
GROUP BY dept
ORDER BY cnt DESCSQL4Json supports both simple and searched CASE expressions, usable in SELECT, WHERE, ORDER BY, GROUP BY, and HAVING.
Simple CASE — compares an expression against values:
SELECT name,
CASE dept
WHEN 'eng' THEN 'Engineering'
WHEN 'hr' THEN 'Human Resources'
ELSE 'Other'
END AS department
FROM $rSearched CASE — evaluates boolean conditions:
SELECT name,
CASE
WHEN salary > 100000 THEN 'senior'
WHEN salary > 60000 THEN 'mid'
ELSE 'junior'
END AS level
FROM $rCASE expressions nest freely with functions and other CASE expressions:
-- Function inside CASE
SELECT CASE WHEN LENGTH(name) > 4 THEN 'long' ELSE 'short' END AS label FROM $r
-- CASE inside function
SELECT UPPER(CASE WHEN active THEN 'yes' ELSE 'no' END) AS flag FROM $r
-- Nested CASE
SELECT CASE WHEN dept = 'eng'
THEN CASE WHEN salary > 100000 THEN 'senior eng' ELSE 'eng' END
ELSE 'other'
END AS category
FROM $rIf no WHEN clause matches and no ELSE is provided, the result is
NULL. Simple CASE uses SQL NULL semantics:CASE x WHEN NULLnever matches — useCASE WHEN x IS NULLinstead.
Join across multiple JSON sources using INNER JOIN, LEFT JOIN, or RIGHT JOIN with equality-based ON conditions:
-- INNER JOIN (both spellings)
SELECT u.name AS name, o.amount AS amount
FROM users u JOIN orders o ON u.id = o.user_id
SELECT u.name AS name, o.amount AS amount
FROM users u INNER JOIN orders o ON u.id = o.user_id
-- LEFT JOIN — preserves all left-side rows, NULLs for unmatched right side
SELECT u.name AS name, o.amount AS amount
FROM users u LEFT JOIN orders o ON u.id = o.user_id
-- RIGHT JOIN — preserves all right-side rows, NULLs for unmatched left side
SELECT u.name AS name, o.amount AS amount
FROM users u RIGHT JOIN orders o ON u.id = o.user_id
-- Chained JOINs
SELECT u.name AS user_name, o.amount AS amount, p.name AS product
FROM users u
JOIN orders o ON u.id = o.user_id
LEFT JOIN products p ON o.product_id = p.id
-- Multi-column ON
SELECT * FROM a JOIN b ON a.x = b.x AND a.y = b.y
-- JOIN with all clauses
SELECT u.dept AS dept, COUNT(*) AS cnt, AVG(o.amount) AS avg_amount
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE o.status = 'completed'
GROUP BY u.dept
HAVING cnt >= 5
ORDER BY avg_amount DESC
LIMIT 10Table aliases are recommended. Without aliases, table names are used as prefixes (e.g., users.name).
ON conditions support only equality (=) with AND:
ON a.col = b.col— single columnON a.x = b.x AND a.y = b.y— multiple columns
Output format: Use AS aliases for flat output keys. Without aliases, dotted column refs like u.name produce the key u.name in the output.
Data binding: JOIN queries require named data sources — see Multi-Source Queries in the API Reference.
Window functions perform calculations across a set of rows related to the current row, without collapsing rows like GROUP BY. Use the OVER clause with optional PARTITION BY and ORDER BY:
-- Ranking within departments
SELECT name, dept, salary,
ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS dept_rank,
RANK() OVER (ORDER BY salary DESC) AS global_rank
FROM $r
-- Running context: previous and next salary
SELECT name, salary,
LAG(salary) OVER (ORDER BY salary ASC) AS prev_salary,
LEAD(salary) OVER (ORDER BY salary ASC) AS next_salary
FROM $r
-- Aggregate windows — same value for every row in the partition
SELECT name, dept, salary,
SUM(salary) OVER (PARTITION BY dept) AS dept_total,
AVG(salary) OVER (PARTITION BY dept) AS dept_avg,
COUNT(*) OVER (PARTITION BY dept) AS dept_count
FROM $r
-- NTILE — divide into buckets
SELECT name, salary,
NTILE(4) OVER (ORDER BY salary DESC) AS quartile
FROM $r
-- Combine with WHERE, ORDER BY, LIMIT
SELECT name, salary,
RANK() OVER (ORDER BY salary DESC) AS rnk
FROM $r
WHERE dept = 'Engineering'
ORDER BY rnk
LIMIT 10| Function | Description |
|---|---|
ROW_NUMBER() |
Sequential number (1, 2, 3, ...) per partition |
RANK() |
Rank with gaps on ties (1, 1, 3, ...) |
DENSE_RANK() |
Rank without gaps on ties (1, 1, 2, ...) |
NTILE(n) |
Divide rows into n roughly equal buckets |
| Function | Description |
|---|---|
LAG(col [, offset]) |
Value from offset rows before (default 1), NULL at boundary |
LEAD(col [, offset]) |
Value from offset rows after (default 1), NULL at boundary |
Aggregate functions (SUM, AVG, COUNT, MIN, MAX) can be used with OVER to compute values across the full partition without collapsing rows. COUNT(*) counts all rows; COUNT(col) counts non-null values.
SELECT name, dept, salary,
SUM(salary) OVER (PARTITION BY dept) AS dept_total,
COUNT(*) OVER () AS total_employees
FROM $rOVER ()— entire result set is one partitionOVER (PARTITION BY col1, col2)— group rows into partitionsOVER (ORDER BY col ASC/DESC)— order within partition (required for ranking/offset functions)OVER (PARTITION BY dept ORDER BY salary DESC)— both
Note: Window functions are only valid in SELECT. They cannot appear in WHERE, HAVING, or GROUP BY.
| Function | Description | Example |
|---|---|---|
LOWER(str) |
Lowercase (optional locale: LOWER(str, 'tr-TR')) |
LOWER(name) |
UPPER(str) |
Uppercase (optional locale) | UPPER(name, 'en-US') |
CONCAT(a, b, ...) |
Concatenate strings | CONCAT(first, ' ', last) |
SUBSTRING(str, start, len) |
Extract substring (1-based start, length optional) | SUBSTRING(name, 1, 3) |
TRIM(str) |
Remove leading/trailing whitespace | TRIM(name) |
LENGTH(str) |
String length | LENGTH(name) |
REPLACE(str, search, replacement) |
Replace all occurrences | REPLACE(name, ' ', '_') |
LEFT(str, n) |
First n characters | LEFT(name, 3) |
RIGHT(str, n) |
Last n characters | RIGHT(name, 3) |
LPAD(str, len, pad) |
Left-pad to length | LPAD(id, 5, '0') |
RPAD(str, len, pad) |
Right-pad to length | RPAD(name, 20, '.') |
REVERSE(str) |
Reverse string | REVERSE(name) |
POSITION(substr, str) |
Find position (1-based, 0 if not found) | POSITION('a', name) |
| Function | Description | Example |
|---|---|---|
ABS(n) |
Absolute value | ABS(delta) |
ROUND(n, decimals) |
Round (decimals optional, default 0) | ROUND(price, 2) |
CEIL(n) |
Ceiling (round up) | CEIL(score) |
FLOOR(n) |
Floor (round down) | FLOOR(score) |
MOD(a, b) |
Modulo | MOD(id, 10) |
POWER(base, exp) |
Exponentiation | POWER(n, 2) |
SQRT(n) |
Square root | SQRT(area) |
SIGN(n) |
Returns -1, 0, or 1 | SIGN(balance) |
| Function | Description | Example |
|---|---|---|
TO_DATE(str, pattern?) |
Parse date string (ISO format if no pattern) | TO_DATE(d, 'yyyy-MM-dd') |
NOW() |
Current date and time | NOW() |
YEAR(d) |
Extract year | YEAR(created_at) |
MONTH(d) |
Extract month (1-12) | MONTH(created_at) |
DAY(d) |
Extract day of month | DAY(created_at) |
HOUR(dt) |
Extract hour (0 for date-only) | HOUR(timestamp) |
MINUTE(dt) |
Extract minute | MINUTE(timestamp) |
SECOND(dt) |
Extract second | SECOND(timestamp) |
DATE_ADD(d, amount, unit) |
Add to date (units: DAYS, MONTHS, HOURS, etc.) | DATE_ADD(d, 30, 'DAYS') |
DATE_DIFF(d1, d2, unit) |
Difference between dates | DATE_DIFF(end, start, 'DAYS') |
TO_DATE without a pattern tries ISO datetime (yyyy-MM-dd'T'HH:mm:ss) first, then ISO date (yyyy-MM-dd).
| Function | Description | Example |
|---|---|---|
CAST(expr AS type) |
Type conversion | CAST(age AS STRING) |
NULLIF(a, b) |
Returns NULL if a = b, otherwise a |
NULLIF(status, 'N/A') |
COALESCE(a, b, ...) |
First non-null argument | COALESCE(nickname, name) |
CAST target types: STRING, NUMBER, INTEGER, DECIMAL, BOOLEAN, DATE, DATETIME
Used with GROUP BY. All operate on non-null values except COUNT(*).
| Function | Description |
|---|---|
COUNT(field) |
Count of non-null values |
COUNT(*) |
Count of all rows including nulls |
SUM(field) |
Sum of numeric values |
AVG(field) |
Average of numeric values |
MIN(field) |
Minimum value |
MAX(field) |
Maximum value |
Functions nest arbitrarily — in SELECT, WHERE, GROUP BY, ORDER BY, and HAVING:
-- Exclude zeros from average, then round
SELECT dept, ROUND(AVG(NULLIF(salary, 0)), 2) AS avg_salary
FROM $r
GROUP BY dept
-- Pad trimmed values
SELECT LPAD(TRIM(NULLIF(name, '')), 10, '*') AS padded
FROM $r
-- Nested functions in WHERE and ORDER BY
SELECT *
FROM $r
WHERE TRIM(NULLIF(name, '')) = 'Alice'
ORDER BY LENGTH(TRIM(name)) ASCSQL4Json maps JSON values to a sealed type hierarchy for type-safe processing:
| JSON Type | SQL4Json Type | Notes |
|---|---|---|
| String | SqlString |
|
| Number | SqlNumber |
Integer and floating-point |
| Boolean | SqlBoolean |
true / false literals |
| Null | SqlNull |
Singleton |
| Date string | SqlDate |
Via TO_DATE() or CAST(x AS DATE) |
| DateTime string | SqlDateTime |
Via TO_DATE(), NOW(), or CAST(x AS DATETIME) |
The JsonValue sealed interface provides the library's own JSON abstraction. The public API accepts and returnsString
or JsonValue with zero external dependencies. You can plug in your own JSON library via the JsonCodecinterface.
SQL4Json executes queries through a staged pipeline. Each stage transforms the row stream and is either lazy (processes rows one at a time, O(1) memory) or materializing (buffers the entire input, O(n) memory).
WHERE → GROUP BY → HAVING → WINDOW → ORDER BY → LIMIT → SELECT → DISTINCT
(lazy) (mat.) (lazy) (mat.) (mat.) (mat.) (lazy) (mat.)
| Stage | SQL Clause | Behavior | Row-count Enforced | Notes |
|---|---|---|---|---|
| WHERE | WHERE |
Lazy | STREAMING | Filters rows as they stream through |
| GROUP BY | GROUP BY |
Materializing | GROUP_BY | Collects all rows, groups by key, applies aggregates |
| HAVING | HAVING |
Lazy | — | Filters groups after aggregation |
| WINDOW | OVER (...) |
Materializing | WINDOW | Partitions, sorts, computes window function values |
| ORDER BY | ORDER BY |
Materializing | ORDER_BY | Sorts result set; fused with LIMIT into a bounded heap (TopN optimization) when both are present |
| LIMIT | LIMIT / OFFSET |
Materializing | — | Truncates result set |
| SELECT | SELECT |
Lazy | — | Projects columns, evaluates expressions and aliases |
| DISTINCT | DISTINCT |
Materializing | DISTINCT | Eliminates duplicate rows |
Exceeding maxRowsPerQuery at any materializing stage throws SQL4JsonExecutionException. Configure via:
Sql4jsonSettings.builder()
.limits(l -> l.maxRowsPerQuery(50_000))
.build();SQL4Json follows these rules for NULL values:
- Non-existent fields are treated as NULL
- Any comparison involving NULL returns
false— matches standard SQL semantics where NULL represents an unknown value.NULL = NULL,NULL != x,NULL > x, etc. all evaluate tofalse. - Use
IS NULL/IS NOT NULLto test for NULL values - Aggregate functions (
SUM,AVG,MIN,MAX,COUNT(field)) ignore NULL values COUNT(*)counts all rows including those with NULL fieldsCOALESCEreturns the first non-null argument
SQL4Json uses a sealed exception hierarchy:
SQL4JsonException (sealed base)
+-- SQL4JsonParseException -- syntax errors (includes line & column position)
+-- SQL4JsonExecutionException -- runtime errors during query execution
try{
String result = SQL4Json.query(sql, json);
}catch(SQL4JsonParseException e){
// e.getLine() is 1-based, e.getCharPosition() is 0-based
System.err.println("Syntax error at line "+e.getLine()+", col "+e.getCharPosition() +": "+e.getMessage());
}catch(SQL4JsonExecutionException e){
System.err.println("Execution error: "+e.getMessage());
}SQL4Json enforces conservative limits by default to protect against denial-of-service and information-disclosure risks.
All limits are enforced by default via SQL4Json.query(sql, json); pass a custom Sql4jsonSettings to relax or tighten them.
| Setting | Default | Protects against | When to loosen |
|---|---|---|---|
security.maxLikeWildcards |
16 | Soft ReDoS via LIKE '%a%b%c%...' patterns |
Queries with many literal wildcards |
security.redactErrorDetails |
true | Leaking data source names, format strings, or user input into error messages (multi-tenant info disclosure) | Single-tenant dev/debug where verbose errors help |
limits.maxSqlLength |
65 536 (64 KiB) | Unbounded SQL input consuming parser memory | Generated SQL longer than 64 KiB |
limits.maxSubqueryDepth |
16 | Recursion bomb via deeply nested FROM subqueries | Programmatically generated deep nestings |
limits.maxInListSize |
1 024 | Parse-time amplification attacks via huge IN (...) lists |
Bulk filter UIs that pass large allow-lists |
limits.maxRowsPerQuery |
1 000 000 | Out-of-memory at materializing pipeline stages (GROUP BY, ORDER BY, WINDOW, JOIN, DISTINCT, and the final pipeline / streaming sink) | Batch queries over datasets larger than 1M rows |
cache.likePatternCacheSize |
1 024 | Unbounded growth of compiled-LIKE-pattern cache (cache-poisoning DoS) | High-cardinality hot-pattern workloads |
cache.queryResultCacheEnabled |
false | Unbounded result memory on engines that don't need caching | When you want repeated identical queries to hit a result cache |
cache.queryResultCacheSize |
64 | Cache oversize when enabled | Workloads with more than 64 hot queries |
codec.maxInputLength |
10 MiB | Parser memory exhaustion via oversize JSON input | Known-large payloads |
codec.maxNestingDepth |
64 | Stack/recursion blowup on deeply nested JSON | Legitimately deep JSON documents |
codec.maxStringLength |
1 MiB | Memory exhaustion via a single huge string value | JSON contains embedded base64 blobs, etc. |
codec.maxNumberLength |
64 | DoS via excessively long number literals | N/A — almost never needs to be loosened |
codec.maxPropertyNameLength |
1 024 | DoS via excessively long object keys | N/A — almost never needs to be loosened |
codec.maxArrayElements |
1 000 000 | Memory exhaustion via a single massive array | Datasets with >1M elements per JSON array |
codec.duplicateKeyPolicy |
REJECT | HTTP-Parameter-Pollution-style desync between systems that disagree on which duplicate wins (RFC 8259 ambiguity) | Legacy data sources that rely on last-wins or first-wins behaviour |
To override specific limits, build a custom Sql4jsonSettings and pass it to any entry point:
Sql4jsonSettings strict = Sql4jsonSettings.builder()
.limits(l -> l.maxRowsPerQuery(10_000).maxSqlLength(8 * 1024))
.codec(new DefaultJsonCodec(
DefaultJsonCodecSettings.builder()
.maxInputLength(50 * 1024 * 1024)
.build()))
.build();
String result = SQL4Json.query(sql, json, strict);For the Javadoc on each setting, see Sql4jsonSettings, SecuritySettings, LimitsSettings, CacheSettings, and DefaultJsonCodecSettings.
PreparedQueryis immutable and safe to share across threads.SQL4JsonEnginebinds immutable data and uses a synchronized cache — safe for concurrent access from multiple threads.- Static
SQL4Jsonmethods are stateless and thread-safe. - Each query execution creates its own scoped state (rows, interner, pipeline) — no cross-query interference.
- In-memory only. The entire JSON dataset must fit in the JVM heap.
- SELECT only. No
INSERT,UPDATE,DELETE,CREATE, or DDL operations. - No FULL OUTER JOIN or CROSS JOIN. Only INNER, LEFT, and RIGHT JOINs with equality ON conditions.
- No UNION / INTERSECT / EXCEPT. One query, one result set.
- No window frame clause. Window functions operate over the full partition.
ROWS BETWEEN N PRECEDING AND M FOLLOWINGis not supported.