JSON parsing fails with "lone leading surrogate in hex escape" while plain json.loads doesn't #120

@lindycoder

Hello,

In our migration to pydantic 2, we found a JSON document that pydantic 1 could load but pydantic 2 cannot; it fails with the error:

Invalid JSON: lone leading surrogate in hex escape at line...

Here's a simple way of reproducing:

import json

from pydantic_core import from_json

data = b'{"test": "text\udce2\udc80\udc99text"}'

print(json.loads(data))
print(from_json(data))

The first print, using Python's json module, works:

{'test': 'text\udce2\udc80\udc99text'}

The second one, using pydantic_core (which pydantic 2 uses), raises:

Traceback (most recent call last):
  File "check.py", line 7, in <module>
    print(from_json(data))
          ^^^^^^^^^^^^^^^
ValueError: lone leading surrogate in hex escape at line 1 column 20
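
A side note on the payload itself, offered as an assumption and not something confirmed above: the three code points U+DCE2, U+DC80 and U+DC99 look like the bytes e2 80 99 (the UTF-8 encoding of U+2019, a right single quotation mark) after passing through Python's surrogateescape error handler. If that is where they came from, a stdlib-only sketch along these lines could parse with json.loads and then restore the original text. The helper name repair_surrogateescape is hypothetical, and the sketch does not use pydantic_core at all.

```python
import json

# Same payload as in the report. A raw bytes literal keeps the \u escapes
# as literal backslash sequences for the JSON parser to interpret.
data = rb'{"test": "text\udce2\udc80\udc99text"}'


def repair_surrogateescape(value):
    """Hypothetical helper: undo surrogateescape in every string of a parsed document."""
    if isinstance(value, str):
        try:
            # Map U+DC80..U+DCFF back to bytes 0x80..0xFF, then decode as UTF-8.
            return value.encode("utf-8", "surrogateescape").decode("utf-8")
        except UnicodeError:
            # Not surrogate-escaped UTF-8 (assumption was wrong); leave unchanged.
            return value
    if isinstance(value, list):
        return [repair_surrogateescape(item) for item in value]
    if isinstance(value, dict):
        return {
            repair_surrogateescape(key): repair_surrogateescape(item)
            for key, item in value.items()
        }
    return value


parsed = json.loads(data)  # the stdlib parser accepts lone surrogates
repaired = repair_surrogateescape(parsed)
print(repaired)
```

Strings that do not fit the assumption, such as one containing a lone U+D800, are returned untouched, so the helper should be safe to run over a whole document.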

Here are the versions involved:

Python 3.12.2
pydantic 2.8.2
pydantic-core 2.20.1

Thank you!
