Skip to content

Make unidecode optional in sanitize methods#67

Open
VMRuiz wants to merge 2 commits intomasterfrom
sanitize_utf8
Open

Make unidecode optional in sanitize methods#67
VMRuiz wants to merge 2 commits intomasterfrom
sanitize_utf8

Conversation

@VMRuiz
Copy link
Copy Markdown

@VMRuiz VMRuiz commented Feb 10, 2021

Allow to sanitize text from non English websites without losing data.

This is not backward compatibility as I believe this should be the default behavior in most cases.

Copy link
Copy Markdown

@Gallaecio Gallaecio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe unidecode could be a pipe of its own?

I think sanitize | unidecode (or sanitize | ascii_safe) is more readable than sanitize(ascii_safe=True)

@VMRuiz
Copy link
Copy Markdown
Author

VMRuiz commented Feb 10, 2021

Maybe unidecode could be a pipe of its own?

I think sanitize | unidecode (or sanitize | ascii_safe) is more readable than sanitize(ascii_safe=False)

Yes, I think your approach is actually better.

@VMRuiz
Copy link
Copy Markdown
Author

VMRuiz commented Feb 10, 2021

Maybe unidecode could be a pipe of its own?

I think sanitize | unidecode (or sanitize | ascii_safe) is more readable than sanitize(ascii_safe=True)

I have implemented the method ascii_safe. I tried implementing it with unidecode but it looks like there was some name collision issue between the shublang name method and the unidecode method itself.

If you are able to fix it we could use unidecode instead. I don't really have a strong opinion on which one is better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants