Suggestions for "UnicodeEncodeError: 'utf-8' codec can't encode character" #116

@boechat107

I would be glad if someone could give me a suggestion.

I want to encode a large dictionary that contains text in an encoding other than UTF-8. Does the library offer an option to handle this situation, or must I change the data before trying to serialize it?

  File "/home/user/.local/lib/python3.7/site-packages/bson/codec.py", line 201, in encode_value
    buf.write(encode_string_element(name, value))
  File "/home/user/.local/lib/python3.7/site-packages/bson/codec.py", line 170, in encode_string_element
    return b"\x02" + encode_cstring(name) + encode_string(value)
  File "/home/user/.local/lib/python3.7/site-packages/bson/codec.py", line 125, in encode_string
    value = value.encode("utf-8")
UnicodeEncodeError: 'utf-8' codec can't encode character '\udce1' in position 13: surrogates not allowed

I read the source code, and it does not seem to offer any quick fix (something like encode(errors="ignore")).
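
For reference, one possible workaround is to clean the data before serializing it. The sketch below is not part of the bson library: recover_text and its source_encoding parameter are hypothetical names. It assumes the bad characters are lone surrogates in the range U+DC80 to U+DCFF (the kind the "surrogateescape" error handler produces, such as the '\udce1' in the traceback) and that the original bytes were Latin-1, which is only a guess. Surrogates outside that range are left untouched and would still fail.

```python
import re

# Lone surrogates in the range that the "surrogateescape" handler produces
# when undecodable bytes 0x80-0xFF are smuggled into a str.
_ESCAPED_BYTES = re.compile("[\udc80-\udcff]+")


def recover_text(value, source_encoding="latin-1"):
    """Hypothetical helper: return a copy of `value` whose strings no longer
    contain surrogate-escaped bytes.

    `source_encoding` is an assumption about what the original bytes were.
    """
    if isinstance(value, str):
        def _redecode(match):
            # Turn each run of escaped surrogates back into the raw bytes,
            # then decode those bytes with the assumed source encoding.
            raw = match.group().encode("utf-8", errors="surrogateescape")
            return raw.decode(source_encoding, errors="replace")

        return _ESCAPED_BYTES.sub(_redecode, value)
    if isinstance(value, dict):
        return {
            recover_text(k, source_encoding): recover_text(v, source_encoding)
            for k, v in value.items()
        }
    if isinstance(value, (list, tuple)):
        # Tuples come back as lists in this sketch.
        return [recover_text(v, source_encoding) for v in value]
    return value


doc = {"name": "ol\udce1", "tags": ["ok", "ni\udcf1o"], "count": 3}
clean = recover_text(doc)

# Every string in `clean` can now be encoded strictly as UTF-8.
encoded_name = clean["name"].encode("utf-8")
```

If the original encoding is unknown, dropping or replacing the characters (for example with value.encode("utf-8", errors="replace")) loses information but still lets the document serialize.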

Could the text be passing this condition?

   if isinstance(value, text_type)
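
A small check that seems to confirm this, assuming text_type is an alias for str on Python 3 and that the data was decoded somewhere with the "surrogateescape" error handler (both are assumptions, not something taken from the library): a str holding a lone surrogate passes the isinstance test, yet a strict UTF-8 encode still fails.

```python
raw = b"ol\xe1"  # Latin-1 bytes; not valid UTF-8

# Decoding with "surrogateescape" maps the bad byte 0xE1 to the lone
# surrogate U+DCE1 instead of raising an error.
value = raw.decode("utf-8", errors="surrogateescape")

is_text = isinstance(value, str)      # the check still passes
has_escape = value == "ol\udce1"      # same character as in the traceback

try:
    value.encode("utf-8")             # what the strict encode step does
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# The original bytes are recoverable with the same error handler.
round_trip = value.encode("utf-8", errors="surrogateescape")
```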
