feat: add strict JSON parsing with String/CharBuffer support#974
feat: add strict JSON parsing with String/CharBuffer support#974
Conversation
81086df to
92aed8b
Compare
demo.mp4 |
|
Hi @jdegoes, I just wanted to clear up the confusion regarding the bounty. This work fulfills the 'Strict JSON Parsing' bounty. The active Algora claim is here: https://algora.io/claims/qd2AdUcS1Scxy7df I originally opened this in zio-blocks #571, but moved the implementation here to zio-schema following @plokhotnyuk's suggestion that it was a better architectural fit for JsonBinaryCodec. The logic is complete and there are no merge conflicts. It should be ready to merge or update directly from your end. Let me know if everything looks good to get this settled. |
987Nabil
left a comment
There was a problem hiding this comment.
New decode methods have nothing to do with the issue and are not needed
faeddf9 to
c10d722
Compare
|
|
Hi @987Nabil , this code has been live for some time, please can this ticket be closed? |
There was a problem hiding this comment.
Pull request overview
This PR aims to fix #712 by making JSON decoding reject inputs that contain non-whitespace trailing characters after an otherwise-valid JSON value.
Changes:
- Added a new decoder test suite that asserts trailing characters cause decode failures for several JSON shapes.
- Updated
zioJsonBinaryCodec.decodeto decode via aFastStringReaderand explicitly check for leftover non-whitespace content.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
zio-schema-json/shared/src/main/scala/zio/schema/codec/JsonCodec.scala |
Implements a post-decode “no trailing content” check for zioJsonBinaryCodec.decode. |
zio-schema-json/shared/src/test/scala/zio/schema/codec/JsonCodecSpec.scala |
Adds regression tests covering trailing-character rejection for the zio-json-based binary codec. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| override def decode(whole: Chunk[Byte]): Either[DecodeError, A] = { | ||
| val reader = new FastStringReader(new String(whole.toArray, JsonEncoder.CHARSET)) | ||
| try { | ||
| val result = jsonCodec.decoder.unsafeDecode(Nil, reader) | ||
| // Verify no trailing content after valid JSON | ||
| try { | ||
| val next = reader.nextNonWhitespace() | ||
| if (next != -1) { | ||
| Left(DecodeError.ReadError(Cause.empty, s"Invalid JSON: found '${next.toChar}' after valid content")) | ||
| } else { | ||
| Right(result) | ||
| } | ||
| } catch { | ||
| case _: UnexpectedEnd => Right(result) | ||
| } | ||
| } catch { | ||
| case UnsafeJson(trace) => Left(DecodeError.ReadError(Cause.empty, JsonError.render(trace))) | ||
| } | ||
| } |
There was a problem hiding this comment.
This strict trailing-character check is only applied in zioJsonBinaryCodec.decode. The schema-based entrypoints (schemaBasedBinaryCodec.decode and JsonCodec.JsonDecoder.decode, which still delegate to decodeJson) appear unchanged and may therefore continue accepting inputs like {}} / "foo"". To fully address #712, consider centralizing the strict parse logic and using it for all decode paths (including schemaBasedBinaryCodec and any String/CharBuffer decoders).
| case _: UnexpectedEnd => Right(result) | ||
| } | ||
| } catch { | ||
| case UnsafeJson(trace) => Left(DecodeError.ReadError(Cause.empty, JsonError.render(trace))) |
There was a problem hiding this comment.
unsafeDecode is called directly here, but the catch block only handles UnsafeJson. If unsafeDecode (or nextNonWhitespace) throws another NonFatal exception, it will escape the codec and potentially fail the fiber rather than returning a DecodeError. Consider catching NonFatal as well and mapping it to DecodeError.ReadError (while still letting fatal errors propagate).
| case UnsafeJson(trace) => Left(DecodeError.ReadError(Cause.empty, JsonError.render(trace))) | |
| case UnsafeJson(trace) => | |
| Left(DecodeError.ReadError(Cause.empty, JsonError.render(trace))) | |
| case NonFatal(e) => | |
| Left(DecodeError.ReadError(Cause.fail(e), Option(e.getMessage).getOrElse(e.toString))) |
| if (next != -1) { | ||
| Left(DecodeError.ReadError(Cause.empty, s"Invalid JSON: found '${next.toChar}' after valid content")) | ||
| } else { |
There was a problem hiding this comment.
The trailing-content error message interpolates next.toChar directly. For whitespace/control characters this can be hard to read (or misleading) in logs. Consider rendering a safe/escaped representation (and/or the numeric code point) so the user can identify the offending character reliably.
| assertTrue( | ||
| bc.decode(Chunk.fromArray("""{"name":"Alice","age":30}}""".getBytes(java.nio.charset.StandardCharsets.UTF_8))) | ||
| .isLeft, | ||
| bc.decode(Chunk.fromArray("""{"name":"Alice","age":30}""".getBytes(java.nio.charset.StandardCharsets.UTF_8))) == Right( | ||
| Person("Alice", 30) | ||
| ) | ||
| ) | ||
| }, | ||
| test("string") { | ||
| implicit val jc: zio.json.JsonCodec[String] = JsonCodec.jsonCodec(Schema[String]) | ||
| val bc = JsonCodec.zioJsonBinaryCodec[String] | ||
| assertTrue( | ||
| bc.decode(Chunk.fromArray("\"foo\"\"".getBytes(java.nio.charset.StandardCharsets.UTF_8))).isLeft, | ||
| bc.decode(Chunk.fromArray("\"foo\"".getBytes(java.nio.charset.StandardCharsets.UTF_8))) == Right("foo") | ||
| ) | ||
| }, | ||
| test("array") { | ||
| implicit val jc: zio.json.JsonCodec[List[Int]] = JsonCodec.jsonCodec(Schema[List[Int]]) | ||
| val bc = JsonCodec.zioJsonBinaryCodec[List[Int]] | ||
| assertTrue( | ||
| bc.decode(Chunk.fromArray("[1,2]]".getBytes(java.nio.charset.StandardCharsets.UTF_8))).isLeft, | ||
| bc.decode(Chunk.fromArray("[1,2]".getBytes(java.nio.charset.StandardCharsets.UTF_8))) == Right(List(1, 2)) | ||
| ) | ||
| }, | ||
| test("whitespace then garbage") { | ||
| implicit val jc: zio.json.JsonCodec[Int] = JsonCodec.jsonCodec(Schema[Int]) | ||
| val bc = JsonCodec.zioJsonBinaryCodec[Int] | ||
| assertTrue( | ||
| bc.decode(Chunk.fromArray("42 x".getBytes(java.nio.charset.StandardCharsets.UTF_8))).isLeft, | ||
| bc.decode(Chunk.fromArray("42 ".getBytes(java.nio.charset.StandardCharsets.UTF_8))) == Right(42) | ||
| ) | ||
| }, | ||
| test("multiple trailing") { | ||
| implicit val jc: zio.json.JsonCodec[Boolean] = JsonCodec.jsonCodec(Schema[Boolean]) | ||
| val bc = JsonCodec.zioJsonBinaryCodec[Boolean] | ||
| assertTrue( | ||
| bc.decode(Chunk.fromArray("true}}".getBytes(java.nio.charset.StandardCharsets.UTF_8))).isLeft | ||
| ) | ||
| }, | ||
| test("empty with trailing") { | ||
| implicit val jc: zio.json.JsonCodec[Map[String, String]] = JsonCodec.jsonCodec(Schema[Map[String, String]]) | ||
| val bc = JsonCodec.zioJsonBinaryCodec[Map[String, String]] | ||
| assertTrue( | ||
| bc.decode(Chunk.fromArray("{}}}".getBytes(java.nio.charset.StandardCharsets.UTF_8))).isLeft, | ||
| bc.decode(Chunk.fromArray("{}".getBytes(java.nio.charset.StandardCharsets.UTF_8))) == Right( | ||
| Map.empty[String, String] |
There was a problem hiding this comment.
These new trailing-character tests exercise zioJsonBinaryCodec, but the main public API in this module is often schemaBasedBinaryCodec (and JsonDecoder.decode). Adding equivalent assertions for the schema-based decoder would ensure the regression is covered for all supported decode entrypoints.
| assertTrue( | |
| bc.decode(Chunk.fromArray("""{"name":"Alice","age":30}}""".getBytes(java.nio.charset.StandardCharsets.UTF_8))) | |
| .isLeft, | |
| bc.decode(Chunk.fromArray("""{"name":"Alice","age":30}""".getBytes(java.nio.charset.StandardCharsets.UTF_8))) == Right( | |
| Person("Alice", 30) | |
| ) | |
| ) | |
| }, | |
| test("string") { | |
| implicit val jc: zio.json.JsonCodec[String] = JsonCodec.jsonCodec(Schema[String]) | |
| val bc = JsonCodec.zioJsonBinaryCodec[String] | |
| assertTrue( | |
| bc.decode(Chunk.fromArray("\"foo\"\"".getBytes(java.nio.charset.StandardCharsets.UTF_8))).isLeft, | |
| bc.decode(Chunk.fromArray("\"foo\"".getBytes(java.nio.charset.StandardCharsets.UTF_8))) == Right("foo") | |
| ) | |
| }, | |
| test("array") { | |
| implicit val jc: zio.json.JsonCodec[List[Int]] = JsonCodec.jsonCodec(Schema[List[Int]]) | |
| val bc = JsonCodec.zioJsonBinaryCodec[List[Int]] | |
| assertTrue( | |
| bc.decode(Chunk.fromArray("[1,2]]".getBytes(java.nio.charset.StandardCharsets.UTF_8))).isLeft, | |
| bc.decode(Chunk.fromArray("[1,2]".getBytes(java.nio.charset.StandardCharsets.UTF_8))) == Right(List(1, 2)) | |
| ) | |
| }, | |
| test("whitespace then garbage") { | |
| implicit val jc: zio.json.JsonCodec[Int] = JsonCodec.jsonCodec(Schema[Int]) | |
| val bc = JsonCodec.zioJsonBinaryCodec[Int] | |
| assertTrue( | |
| bc.decode(Chunk.fromArray("42 x".getBytes(java.nio.charset.StandardCharsets.UTF_8))).isLeft, | |
| bc.decode(Chunk.fromArray("42 ".getBytes(java.nio.charset.StandardCharsets.UTF_8))) == Right(42) | |
| ) | |
| }, | |
| test("multiple trailing") { | |
| implicit val jc: zio.json.JsonCodec[Boolean] = JsonCodec.jsonCodec(Schema[Boolean]) | |
| val bc = JsonCodec.zioJsonBinaryCodec[Boolean] | |
| assertTrue( | |
| bc.decode(Chunk.fromArray("true}}".getBytes(java.nio.charset.StandardCharsets.UTF_8))).isLeft | |
| ) | |
| }, | |
| test("empty with trailing") { | |
| implicit val jc: zio.json.JsonCodec[Map[String, String]] = JsonCodec.jsonCodec(Schema[Map[String, String]]) | |
| val bc = JsonCodec.zioJsonBinaryCodec[Map[String, String]] | |
| assertTrue( | |
| bc.decode(Chunk.fromArray("{}}}".getBytes(java.nio.charset.StandardCharsets.UTF_8))).isLeft, | |
| bc.decode(Chunk.fromArray("{}".getBytes(java.nio.charset.StandardCharsets.UTF_8))) == Right( | |
| Map.empty[String, String] | |
| val sbc = JsonCodec.schemaBasedBinaryCodec[Person](personSchema) | |
| assertTrue( | |
| bc.decode(Chunk.fromArray("""{"name":"Alice","age":30}}""".getBytes(java.nio.charset.StandardCharsets.UTF_8))) | |
| .isLeft, | |
| sbc.decode(Chunk.fromArray("""{"name":"Alice","age":30}}""".getBytes(java.nio.charset.StandardCharsets.UTF_8))) | |
| .isLeft, | |
| bc.decode(Chunk.fromArray("""{"name":"Alice","age":30}""".getBytes(java.nio.charset.StandardCharsets.UTF_8))) == Right( | |
| Person("Alice", 30) | |
| ), | |
| sbc.decode(Chunk.fromArray("""{"name":"Alice","age":30}""".getBytes(java.nio.charset.StandardCharsets.UTF_8))) == Right( | |
| Person("Alice", 30) | |
| ) | |
| ) | |
| }, | |
| test("string") { | |
| implicit val jc: zio.json.JsonCodec[String] = JsonCodec.jsonCodec(Schema[String]) | |
| val bc = JsonCodec.zioJsonBinaryCodec[String] | |
| val sbc = JsonCodec.schemaBasedBinaryCodec[String](Schema[String]) | |
| assertTrue( | |
| bc.decode(Chunk.fromArray("\"foo\"\"".getBytes(java.nio.charset.StandardCharsets.UTF_8))).isLeft, | |
| sbc.decode(Chunk.fromArray("\"foo\"\"".getBytes(java.nio.charset.StandardCharsets.UTF_8))).isLeft, | |
| bc.decode(Chunk.fromArray("\"foo\"".getBytes(java.nio.charset.StandardCharsets.UTF_8))) == Right("foo"), | |
| sbc.decode(Chunk.fromArray("\"foo\"".getBytes(java.nio.charset.StandardCharsets.UTF_8))) == Right("foo") | |
| ) | |
| }, | |
| test("array") { | |
| implicit val jc: zio.json.JsonCodec[List[Int]] = JsonCodec.jsonCodec(Schema[List[Int]]) | |
| val bc = JsonCodec.zioJsonBinaryCodec[List[Int]] | |
| val sbc = JsonCodec.schemaBasedBinaryCodec[List[Int]](Schema[List[Int]]) | |
| assertTrue( | |
| bc.decode(Chunk.fromArray("[1,2]]".getBytes(java.nio.charset.StandardCharsets.UTF_8))).isLeft, | |
| sbc.decode(Chunk.fromArray("[1,2]]".getBytes(java.nio.charset.StandardCharsets.UTF_8))).isLeft, | |
| bc.decode(Chunk.fromArray("[1,2]".getBytes(java.nio.charset.StandardCharsets.UTF_8))) == Right(List(1, 2)), | |
| sbc.decode(Chunk.fromArray("[1,2]".getBytes(java.nio.charset.StandardCharsets.UTF_8))) == Right(List(1, 2)) | |
| ) | |
| }, | |
| test("whitespace then garbage") { | |
| implicit val jc: zio.json.JsonCodec[Int] = JsonCodec.jsonCodec(Schema[Int]) | |
| val bc = JsonCodec.zioJsonBinaryCodec[Int] | |
| val sbc = JsonCodec.schemaBasedBinaryCodec[Int](Schema[Int]) | |
| assertTrue( | |
| bc.decode(Chunk.fromArray("42 x".getBytes(java.nio.charset.StandardCharsets.UTF_8))).isLeft, | |
| sbc.decode(Chunk.fromArray("42 x".getBytes(java.nio.charset.StandardCharsets.UTF_8))).isLeft, | |
| bc.decode(Chunk.fromArray("42 ".getBytes(java.nio.charset.StandardCharsets.UTF_8))) == Right(42), | |
| sbc.decode(Chunk.fromArray("42 ".getBytes(java.nio.charset.StandardCharsets.UTF_8))) == Right(42) | |
| ) | |
| }, | |
| test("multiple trailing") { | |
| implicit val jc: zio.json.JsonCodec[Boolean] = JsonCodec.jsonCodec(Schema[Boolean]) | |
| val bc = JsonCodec.zioJsonBinaryCodec[Boolean] | |
| val sbc = JsonCodec.schemaBasedBinaryCodec[Boolean](Schema[Boolean]) | |
| assertTrue( | |
| bc.decode(Chunk.fromArray("true}}".getBytes(java.nio.charset.StandardCharsets.UTF_8))).isLeft, | |
| sbc.decode(Chunk.fromArray("true}}".getBytes(java.nio.charset.StandardCharsets.UTF_8))).isLeft | |
| ) | |
| }, | |
| test("empty with trailing") { | |
| implicit val jc: zio.json.JsonCodec[Map[String, String]] = JsonCodec.jsonCodec(Schema[Map[String, String]]) | |
| val bc = JsonCodec.zioJsonBinaryCodec[Map[String, String]] | |
| val sbc = JsonCodec.schemaBasedBinaryCodec[Map[String, String]](Schema[Map[String, String]]) | |
| assertTrue( | |
| bc.decode(Chunk.fromArray("{}}}".getBytes(java.nio.charset.StandardCharsets.UTF_8))).isLeft, | |
| sbc.decode(Chunk.fromArray("{}}}".getBytes(java.nio.charset.StandardCharsets.UTF_8))).isLeft, | |
| bc.decode(Chunk.fromArray("{}".getBytes(java.nio.charset.StandardCharsets.UTF_8))) == Right( | |
| Map.empty[String, String] | |
| ), | |
| sbc.decode(Chunk.fromArray("{}".getBytes(java.nio.charset.StandardCharsets_UTF_8))) == Right( | |
| Map.empty[String, String] |
| assertTrue( | ||
| bc.decode(Chunk.fromArray("""{"name":"Alice","age":30}}""".getBytes(java.nio.charset.StandardCharsets.UTF_8))) | ||
| .isLeft, | ||
| bc.decode(Chunk.fromArray("""{"name":"Alice","age":30}""".getBytes(java.nio.charset.StandardCharsets.UTF_8))) == Right( | ||
| Person("Alice", 30) | ||
| ) | ||
| ) | ||
| }, | ||
| test("string") { | ||
| implicit val jc: zio.json.JsonCodec[String] = JsonCodec.jsonCodec(Schema[String]) | ||
| val bc = JsonCodec.zioJsonBinaryCodec[String] | ||
| assertTrue( | ||
| bc.decode(Chunk.fromArray("\"foo\"\"".getBytes(java.nio.charset.StandardCharsets.UTF_8))).isLeft, | ||
| bc.decode(Chunk.fromArray("\"foo\"".getBytes(java.nio.charset.StandardCharsets.UTF_8))) == Right("foo") | ||
| ) |
There was a problem hiding this comment.
The new tests repeat Chunk.fromArray(...getBytes(StandardCharsets.UTF_8)) in multiple places. This spec already uses helpers like charSequenceToByteChunk(...) for readability and consistency; consider using the existing helper here to reduce noise and avoid repeating charset plumbing in each assertion.
Reject JSON input with trailing characters after valid JSON.
Fixes #712
/claim #712