multibyte UTF-8 characters #6

@necroahegyrol

All algorithms range over the bytes of the strings, which gives different results for multibyte and single-byte UTF-8 characters.

Example using WagnerFischer(a, b, 1, 1, 1):

"szellemhaj" - "szellemhajo" distance is 1
"szellemhaj" - "szellemhajó" distance is 2
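The extra unit of distance most likely comes from "ó" being encoded as two bytes in UTF-8. A minimal sketch of that difference, using only the standard library (`byteAndRuneLen` is a hypothetical helper, not part of this package):

```go
package main

import "unicode/utf8"

// byteAndRuneLen is a hypothetical helper for illustration only.
// It returns the length of s in bytes and in runes (Unicode code points).
func byteAndRuneLen(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}
```

For "szellemhajo" both counts are 11, while for "szellemhajó" the byte count is 12 and the rune count is 11, so a byte-wise comparison sees two extra bytes after "szellemhaj" instead of one extra character.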

Ranging over the runes of the strings would give a distance of 1 in both cases.
It is slower, but more accurate for non-ASCII characters.
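A rough sketch of what a rune-based variant could look like. `runeWagnerFischer` and `min3` are hypothetical names, and the parameter order (insert, delete, substitute cost) is assumed to mirror the `WagnerFischer(a, b, 1, 1, 1)` call above; this is not the package's actual implementation:

```go
package main

// min3 returns the smallest of three ints.
func min3(x, y, z int) int {
	if y < x {
		x = y
	}
	if z < x {
		x = z
	}
	return x
}

// runeWagnerFischer is a hypothetical rune-based edit distance.
// Assumption: icost, dcost and scost are the insertion, deletion and
// substitution costs, in the same order as the existing WagnerFischer call.
func runeWagnerFischer(a, b string, icost, dcost, scost int) int {
	// Convert once to rune slices so each index is one code point,
	// not one byte. This costs an extra allocation per string.
	ra, rb := []rune(a), []rune(b)

	// Two rolling rows of the dynamic-programming table.
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)

	// First row: build each prefix of b from the empty string by insertions.
	for j := 1; j <= len(rb); j++ {
		prev[j] = prev[j-1] + icost
	}

	for i := 1; i <= len(ra); i++ {
		// First column: delete every rune of a's prefix.
		curr[0] = prev[0] + dcost
		for j := 1; j <= len(rb); j++ {
			sub := prev[j-1]
			if ra[i-1] != rb[j-1] {
				sub += scost
			}
			curr[j] = min3(sub, prev[j]+dcost, curr[j-1]+icost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}
```

With unit costs, this sketch gives a distance of 1 for both "szellemhaj" - "szellemhajo" and "szellemhaj" - "szellemhajó". The `[]rune` conversion is where the extra cost comes from; a fast path for pure-ASCII input could avoid it, but that is left out here.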
