Enforce locale for fake entities #123

Description

@julcsii

How can I make sure that the generated fake entities correspond to the language/locale I want to use?

Small example:

import pprint

from presidio_evaluator.data_generator import PresidioSentenceFaker

# Placeholder values; the original report does not state them.
lower_case_ratio = 0.05
number_of_samples = 10

# German sentence templates with entity placeholders.
sentence_templates = [
    "{{name}} ist in München geboren.",
    "Bitte die Rechnung an {{address}} schicken!"
]
sentence_faker = PresidioSentenceFaker(
    locale="de_DE",
    lower_case_ratio=lower_case_ratio,
    sentence_templates=sentence_templates,
    random_seed=42,
    provider_aliases=PresidioSentenceFaker.PROVIDER_ALIASES
)

fake_records = sentence_faker.generate_new_fake_sentences(num_samples=number_of_samples)
# Print the second generated record.
pprint.pprint(fake_records[1])

The result:

Full text: Bitte die Rechnung an 32181 Tuulimyllyntie 27 Suite 001 HIRVENSALMI Southern Savonia schicken!
Spans: [Span(type: STREET_ADDRESS, value: 32181 Tuulimyllyntie 27 Suite 001 HIRVENSALMI Southern Savonia, char_span: [22: 84])]

The output often contains German or Austrian names and addresses, but not exclusively: the address above is Finnish even though the locale is "de_DE". Is there a way to force the generator to produce only entities that match the locale I specify?
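
I have not confirmed where the off-locale values come from. If the entities are sampled from a pre-built table of person records rather than from locale-specific providers, one possible workaround would be to filter that table by country before sampling. Below is a minimal sketch of that idea in plain Python. It makes no presidio_evaluator calls; the `country` field, the `LOCALE_TO_COUNTRIES` mapping, the `filter_records_by_locale` helper and the sample records are all hypothetical and only illustrate the filtering step.

```python
from typing import Dict, List

# Hypothetical mapping from a locale code to the country names
# that would appear in a records table.
LOCALE_TO_COUNTRIES = {
    "de_DE": {"Germany"},
    "de_AT": {"Austria"},
    "fi_FI": {"Finland"},
}


def filter_records_by_locale(
    records: List[Dict[str, str]],
    locale: str,
    country_field: str = "country",
) -> List[Dict[str, str]]:
    """Keep only the records whose country matches the given locale."""
    allowed = LOCALE_TO_COUNTRIES.get(locale)
    if allowed is None:
        raise ValueError(f"No country mapping for locale {locale!r}")
    return [r for r in records if r.get(country_field) in allowed]


# Made-up records, for illustration only.
sample_records = [
    {"name": "Anna Schmidt", "address": "Hauptstraße 5, 80331 München", "country": "Germany"},
    {"name": "Mikko Virtanen", "address": "Tuulimyllyntie 27, 52550 Hirvensalmi", "country": "Finland"},
    {"name": "Lukas Gruber", "address": "Ringstraße 12, 1010 Wien", "country": "Austria"},
]

german_only = filter_records_by_locale(sample_records, "de_DE")
print([r["name"] for r in german_only])
```

Whether the filtered records can then be passed to the sentence generator depends on how it loads its data, which I have not checked.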

Labels: bug, good first issue
