How can I make sure that the generated fake entities correspond to the language/locale I want to use?
Small example:
import pprint

from presidio_evaluator.data_generator import PresidioSentenceFaker

# German sentence templates with placeholders for the fake entities
sentence_templates = [
    "{{name}} ist in München geboren.",
    "Bitte die Rechnung an {{address}} schicken!",
]

# lower_case_ratio and number_of_samples are defined elsewhere (values not shown)
sentence_faker = PresidioSentenceFaker(
    locale="de_DE",
    lower_case_ratio=lower_case_ratio,
    sentence_templates=sentence_templates,
    random_seed=42,
    provider_aliases=PresidioSentenceFaker.PROVIDER_ALIASES,
)
fake_records = sentence_faker.generate_new_fake_sentences(num_samples=number_of_samples)
pprint.pprint(fake_records[1])
The result:
Full text: Bitte die Rechnung an 32181 Tuulimyllyntie 27 Suite 001 HIRVENSALMI Southern Savonia schicken!
Spans: [Span(type: STREET_ADDRESS, value: 32181 Tuulimyllyntie 27 Suite 001 HIRVENSALMI Southern Savonia, char_span: [22: 84])]
German and Austrian names and addresses often appear as well, but not exclusively, as the Finnish address above shows. Is there a way to force the generator to produce only entities that match the locale I specify?