simple splitting and more common phrases #170

@jbothma

We sometimes just want basic splitting for really small datasets like us_fincen_enforcements. Do we want something like rigour.names.split_phrases.split(text: str) -> list[str]?
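
To make the question concrete, here is a rough sketch of what such a helper might look like. Everything in it is an assumption for discussion: `EXAMPLE_SPLIT_PHRASES` is a made-up stand-in for whatever phrase list rigour would ship, and the `phrases` parameter and the whole-phrase matching rule are illustrative choices, not existing rigour behaviour.

```python
import re

# Hypothetical phrase list, for illustration only.
EXAMPLE_SPLIT_PHRASES = ("a.k.a.", "f.k.a.", "also known as")


def split(text: str, phrases: tuple[str, ...] = EXAMPLE_SPLIT_PHRASES) -> list[str]:
    """Split a name string on any of the given phrases (case-insensitive)."""
    # Try longer phrases first so they win over shorter overlapping ones.
    ordered = sorted(phrases, key=len, reverse=True)
    alternatives = "|".join(re.escape(phrase) for phrase in ordered)
    # Only match a phrase that is not glued to a word character on either side.
    pattern = re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)", re.IGNORECASE)
    # Trim whitespace and dangling separators, then drop empty parts.
    parts = (part.strip(" ,;") for part in pattern.split(text))
    return [part for part in parts if part]


print(split("John Smith a.k.a. Johnny Smith"))
```

A string without any split phrase comes back as a single-element list, so callers could use the same code path for both cases.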

We also sometimes want to split on additional strings, e.g. "and". Do we want an easy way to add crawler-specific phrases until we're convinced they belong in rigour? E.g.

def contains_split_phrase(string: str, extra_split_phrases: list[str] | None = None) -> bool:
    """Check if the string contains a name split phrase, e.g. "a.k.a."."""
    # Use None rather than a mutable [] default; build a fresh list on each call.
    phrases = NAME_SPLIT_PHRASES + (extra_split_phrases or [])
    return re_split_phrases(phrases).search(string) is not None

contains_split_phrase(names, ["and"])
