We sometimes just want some basic splitting for really small datasets like us_fincen_enforcements. Do we want something like rigour.names.split_phrases.split(text: str) -> list[str]?
We also sometimes want to split on some additional strings, e.g. "and". Do we want an easy way to add crawler-specific phrases until we're convinced we want to move them into rigour? E.g.:
def contains_split_phrase(string: str, extra_split_phrases: list[str] = []) -> bool:
    """Check if the string contains name split phrases e.g. a.k.a."""
    return re_split_phrases(NAME_SPLIT_PHRASES + extra_split_phrases).search(string) is not None

contains_split_phrase(names, ["and"])
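
To make the first question concrete, here is a rough sketch of what a basic split helper with an optional per-crawler phrase list could look like. It does not use rigour at all: EXAMPLE_SPLIT_PHRASES is a made-up stand-in for NAME_SPLIT_PHRASES, and build_split_regex and the extra_split_phrases parameter on split are hypothetical names for illustration, not an existing API.

```python
import re
from typing import Optional

# Stand-in list for illustration only; the real NAME_SPLIT_PHRASES may differ.
EXAMPLE_SPLIT_PHRASES = ["a.k.a.", "aka", "f.k.a.", "formerly known as"]


def build_split_regex(phrases: list[str]) -> re.Pattern:
    """Compile a case-insensitive pattern that matches any of the phrases."""
    # Try longer phrases first so they win over shorter overlapping ones.
    ordered = sorted(phrases, key=len, reverse=True)
    alternation = "|".join(re.escape(p) for p in ordered)
    # Lookarounds rather than \b, since phrases like "a.k.a." end in punctuation.
    return re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)


def split(text: str, extra_split_phrases: Optional[list[str]] = None) -> list[str]:
    """Split text on the split phrases and return the non-empty parts."""
    phrases = EXAMPLE_SPLIT_PHRASES + (extra_split_phrases or [])
    parts = build_split_regex(phrases).split(text)
    # Trim whitespace and stray separators left around each part.
    cleaned = [part.strip(" ,;") for part in parts]
    return [part for part in cleaned if part]
```

A crawler would then opt in to extra phrases per call, e.g. split(names, ["and"]), while the default call only uses the shared list. The lookarounds keep "and" from matching inside a name like "Anderson".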