Skip to content

Experimental support for the JSON data type#490

Open
slabko wants to merge 1 commit intomasterfrom
json-string
Open

Experimental support for the JSON data type#490
slabko wants to merge 1 commit intomasterfrom
json-string

Conversation

@slabko
Copy link
Copy Markdown
Contributor

@slabko slabko commented Apr 26, 2026

Experimental support for the JSON data type

Special thanks to @theory for the initial research.

Warning

This is an experimental implementation, and the API may change as we continue working toward full support for JSON columns..

This PR adds basic support for the JSON data type, but only when the data is stringified using
the output_format_native_write_json_as_string setting. Inserting JSON data does not require this setting.

Caution

ClickHouse validates JSON syntax even when values are passed as strings. Empty strings are not allowed and will cause a server error; use an empty object ({}) instead. This is especially important for nullable columns, where the server still expects a value for NULL cells.

Examples

Reading JSON values

// CREATE TABLE default.test_json (
//     `id` UInt64,
//     `data` JSON
// )
// ENGINE = MergeTree
// ORDER BY id

clickhouse::Query query{R"(
    SELECT * FROM test_json
)"};
query.SetSetting("output_format_native_write_json_as_string", {.value = "1"});

client.BeginSelect(query);

while(auto block = client.NextBlock()) {
    auto col_id = block->At(0)->AsStrict<clickhouse::ColumnUInt64>();
    auto col_json = block->At(1)->AsStrict<clickhouse::ColumnJSON>();

    for (size_t i = 0; i < block->GetRowCount(); ++i) {
        auto id = col_id->At(i); 
        auto json = col_json->At(i);
        std::cout << std::format("{}\t{}\n", id, json);
    }
}

Inserting simple non-nullable columns

// CREATE TABLE default.test_json (
//     `id` UInt64,
//     `data` JSON
// )
// ENGINE = MergeTree
// ORDER BY id

auto block = client.BeginInsert(R"(
    INSERT INTO test_json VALUES
)");

auto col_id = block.At(0)->AsStrict<clickhouse::ColumnUInt64>();
auto col_json = block.At(1)->AsStrict<clickhouse::ColumnJSON>();

col_id->Append(1);
col_json->Append(R"({"key": "value"})");

block.RefreshRowCount();
client.SendInsertBlock(block);
client.EndInsert();

Inserting Nullable columns using ColumnNullableT

// CREATE TABLE default.test_json (
//     `id` UInt64,
//     `data` Nullable(JSON)
// )
// ENGINE = MergeTree
// ORDER BY id

auto block = client.BeginInsert(R"(
    INSERT INTO test_json VALUES
)");

auto col_id = block.At(0)->AsStrict<clickhouse::ColumnUInt64>();
auto col_null_json = clickhouse::ColumnNullableT<clickhouse::ColumnJSON>::Wrap(block.At(1));

col_id->Append(1);
col_null_json->Append(R"({"key": "value"})");

col_id->Append(2);
col_null_json->Append(std::nullopt);

block.RefreshRowCount();
client.SendInsertBlock(block);
client.EndInsert();

@slabko slabko force-pushed the json-string branch 2 times, most recently from 32a9b0e to 824ea57 Compare April 26, 2026 08:53
@slabko slabko marked this pull request as ready for review April 26, 2026 09:20
@slabko slabko requested a review from mzitnik as a code owner April 26, 2026 09:20
@mshustov mshustov requested a review from Copilot April 26, 2026 18:59
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds experimental JSON type support to the clickhouse-cpp client by introducing a ColumnJSON that represents JSON values as strings in the Native protocol (requiring output_format_native_write_json_as_string=1 when reading).

Changes:

  • Add Type::JSON, parsing support, and ColumnJSON implementation (string-backed with a JSON serialization prefix check).
  • Wire JSON into factory/client headers and extend ItemView/printing support.
  • Add unit tests and generators covering JSON columns (including arrays) and roundtrip behavior.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
clickhouse/types/types.h Adds Type::JSON and CreateJSON() API.
clickhouse/types/types.cpp Maps JSON type name/unique-id behavior and implements CreateJSON().
clickhouse/types/type_parser.cpp Enables parsing "JSON" into Type::JSON.
clickhouse/columns/json.h Introduces ColumnJSON and a ColumnNullableT<ColumnJSON> null-handling specialization.
clickhouse/columns/json.cpp Implements JSON column serialization/deserialization using string encoding (versioned prefix).
clickhouse/columns/itemview.cpp Treats JSON as a variable-size (string-like) ItemView payload.
clickhouse/columns/factory.cpp Allows creating ColumnJSON from parsed terminal type.
clickhouse/client.h Exposes JSON column to users by including the header.
clickhouse/CMakeLists.txt Adds JSON sources/headers to build inputs.
README.md Documents experimental JSON support and the required read setting.
ut/value_generators.h / ut/value_generators.cpp Adds JSON sample value generator for tests.
ut/columns_ut.cpp Adds ColumnJSON init/append unit tests.
ut/column_array_ut.cpp Adds ColumnArrayT<ColumnJSON> unit test.
ut/itemview_ut.cpp Adds ItemView storage tests for JSON type.
ut/utils.cpp / ut/utils_ut.cpp Adds printing/serialization coverage for JSON.
ut/type_parser_ut.cpp Adds JSON parsing unit test.
ut/types_ut.cpp Extends type equality test coverage to include JSON.
ut/roundtrip_column.cpp Modifies server roundtrip helper to read JSON as string via settings.
ut/Column_ut.cpp Adds JSON to the generic server roundtrip test matrix.
ut/CreateColumnByType_ut.cpp Includes JSON header (supports compilation with JSON type).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ut/Column_ut.cpp

GenericColumnTestCase<ColumnString, &makeColumn<ColumnString>, std::string, &MakeStrings>,
GenericColumnTestCase<ColumnFixedString, &makeColumn<ColumnFixedString, 12>, std::string, &MakeFixedStrings<12>>,
GenericColumnTestCase<ColumnJSON, &makeColumn<ColumnJSON>, std::string, &MakeJSONs>,
Copy link

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ColumnJSON is added to the generic server roundtrip test matrix, but CheckIfShouldSkipTest() doesn't currently have a capability/version gate for JSON. This will cause failures against ClickHouse versions without the JSON type (again, .travis.yml uses 21.3.x). Consider adding a skip similar to Date32/Int128, either via a known minimum server version for JSON or by probing server support (e.g. attempt a trivial SELECT CAST('{}' AS JSON) and skip on error).

Suggested change
GenericColumnTestCase<ColumnJSON, &makeColumn<ColumnJSON>, std::string, &MakeJSONs>,

Copilot uses AI. Check for mistakes.
Comment thread clickhouse/columns/json.h Outdated
Comment on lines +11 to +12
* Works only when ClickHouse outputs JSON as strings and requires the setting
* output_format_native_write_json_as_string to be set to 1.
Copy link

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The header comment says ColumnJSON "requires" output_format_native_write_json_as_string=1, but this setting only affects reading (server output) and the PR description notes inserts don't require it. Consider rewording the comment to clarify when the setting is needed (e.g., required for decoding JSON coming from the server via Native protocol) to avoid misleading users.

Suggested change
* Works only when ClickHouse outputs JSON as strings and requires the setting
* output_format_native_write_json_as_string to be set to 1.
* When decoding JSON returned by ClickHouse over the Native protocol, this
* requires ClickHouse to output JSON as strings by setting
* output_format_native_write_json_as_string to 1.

Copilot uses AI. Check for mistakes.
Comment thread ut/roundtrip_column.cpp
Comment on lines +50 to +51
std::string query = "SELECT col FROM temporary_roundtrip_table ORDER BY id "
"SETTINGS output_format_native_write_json_as_string=1";
Copy link

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RoundtripColumnValues() now unconditionally adds SETTINGS output_format_native_write_json_as_string=1 to every SELECT. This breaks the unit tests on older ClickHouse versions that don't recognize this setting (e.g. the repo's .travis.yml pins 21.3.x), even for non-JSON columns. Consider only adding this setting when the tested column type contains JSON (or otherwise gating it based on server capability), and keep the original query for all other types.

Suggested change
std::string query = "SELECT col FROM temporary_roundtrip_table ORDER BY id "
"SETTINGS output_format_native_write_json_as_string=1";
std::string query = "SELECT col FROM temporary_roundtrip_table ORDER BY id";
if (type_name.find("JSON") != std::string::npos) {
query += " SETTINGS output_format_native_write_json_as_string=1";
}

Copilot uses AI. Check for mistakes.
Comment thread clickhouse/CMakeLists.txt
Comment on lines 60 to 66
columns/factory.h
columns/geo.h
columns/ip4.h
columns/ip6.h
columns/json.h
columns/itemview.h
columns/lowcardinality.h
Copy link

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

columns/json.h is added to the library sources/headers list, and client.h now includes it, but clickhouse/CMakeLists.txt's install section doesn't install columns/json.h. This will break consumers using make install (installed <clickhouse/client.h> will include a header that isn't installed). Please add INSTALL(FILES columns/json.h DESTINATION include/clickhouse/columns/) alongside the other column headers.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants