two ADRs for a streamlined evaluation process by Copilot · Pull Request #168 · data-privacy-stack/presidio-research

Copilot · 2026-03-21T14:08:07Z

Two ADRs to simplify the evaluation process and support other applications that use it.


omri374

@copilot please fix the ADR according to the comments.


Copilot · 2026-03-21T14:52:27Z

> @copilot please fix the ADR according to the comments.

Done in 3e900aa. Here's a summary of the changes made to ADR-002:

Context: Reduced to two focused pain points. The first now leads with the core problem — that cross-model comparison requires mapping but the current approach is too simplistic, causing biased results. Removed the SemanticEntityMapper, interactive fallback, and audit trail pain points (not present in main / specific to new implementation).
Decision: Added an explanation of what a canonical entity is and what "mapping to a canonical entity" means.
Consequences: Removed "Lighter dependency footprint", "Breaking change for SemanticEntityMapper users", and "Stateful object" bullets (not applicable to the current state).
Alternatives Considered: Replaced the single "Keep SemanticEntityMapper" entry with a comprehensive numbered list: (1) Score against own labels, (2) Manual mapping only, (3) Semantic similarity/embedding-based matching, (4) Stateless mapping function, (5) Embed mapping inside model prediction.


Copilot

Pull request overview

Adds two proposed Architecture Decision Records (ADRs) describing a future streamlined evaluation pipeline and a new canonical entity-mapping approach for presidio_evaluator.

Changes:

Introduces ADR-001 proposing a DataFrame-first evaluation interface boundary between models and evaluators.
Introduces ADR-002 proposing a CanonicalMapper-based entity mapping/resolution pipeline plus a draft hierarchy/alias dictionary.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 13 comments.

| File | Description |
| --- | --- |
| docs/adr/ADR-001-simplified-evaluation-pipeline.md | Documents a proposed DataFrame-centric evaluation pipeline and migration plan. |
| docs/adr/ADR-002-entity-mapping.md | Documents a proposed canonical entity mapping approach and provides a draft hierarchy/alias set. |



omri374 · 2026-03-22T20:00:29Z

@RonShakutai @negruber1 I created two ADRs that I think will greatly simplify the evaluation process. I would appreciate your honest feedback on whether this is the right path. We can also consider the alternative of writing things from scratch with a new interface (similar to other evaluation frameworks we know).


negruber1 · 2026-03-23T21:42:15Z

+
+3. **Make `model` optional in `BaseEvaluator`** — change `BaseEvaluator.__init__(self, model=None, ...)` so that `model` defaults to `None`, relying on the existing runtime check in `evaluate_all()` that raises a clear error when `model is None`.
+
+4. **Update `evaluate_all()` to delegate to `predict_dataset` + `calculate_score_on_df`** — refactor `SpanEvaluator.evaluate_all()` and `TokenEvaluator.evaluate_all()` to call `self.model.predict_dataset(dataset)` and then pass the result to `calculate_score_on_df()`. This ensures a single code path for both old and new usage.


Isn't evaluate_all() only part of BaseEvaluator?

Yes, it is. Good catch. I'll update this comment, but the principle is the same: we keep the existing interface for backward compatibility, but it would essentially call the new flow.

So technically evaluate_all() returns the dataframe?

Not sure, maybe we can just deprecate it. WDYT?

If we call predict_dataset and then calculate_score_on_df, wouldn't evaluate_all be redundant? If we keep it for backward compatibility, I think we should leave it as is (return EvaluationResult)
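As a rough illustration of the backward-compatible option discussed in this thread, the sketch below keeps `evaluate_all()` on the base class, has it return an `EvaluationResult`, and routes it through `predict_dataset` and `calculate_score_on_df`. Everything here is a stand-in and an assumption: `EchoModel`, the stub `EvaluationResult` with a single `accuracy` field, and the use of a list of dicts in place of the DataFrame are hypothetical and only serve to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Row = Dict[str, str]


@dataclass
class EvaluationResult:
    """Stand-in for the real EvaluationResult; holds only token accuracy."""
    accuracy: float


class EchoModel:
    """Hypothetical model: predict_dataset returns one row per token."""

    def predict_dataset(self, dataset: List[Row]) -> List[Row]:
        # A real model would run inference; this stub predicts "O" everywhere.
        return [{**row, "prediction": "O"} for row in dataset]


class BaseEvaluator:
    def __init__(self, model: Optional[EchoModel] = None):
        # model is optional: scoring a precomputed table needs no model.
        self.model = model

    def calculate_score_on_df(self, results_df: List[Row]) -> EvaluationResult:
        hits = sum(r["annotation"] == r["prediction"] for r in results_df)
        return EvaluationResult(accuracy=hits / len(results_df) if results_df else 0.0)

    def evaluate_all(self, dataset: List[Row]) -> EvaluationResult:
        # Kept for backward compatibility; delegates to the new flow.
        if self.model is None:
            raise ValueError("evaluate_all() needs a model; use calculate_score_on_df()")
        return self.calculate_score_on_df(self.model.predict_dataset(dataset))


dataset = [
    {"token": "Call", "annotation": "O"},
    {"token": "Dana", "annotation": "PERSON"},
]
result = BaseEvaluator(model=EchoModel()).evaluate_all(dataset)
```

With this shape, old callers keep getting an `EvaluationResult`, while new callers can skip the model entirely and call `calculate_score_on_df()` on a table they already have.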

negruber1 · 2026-03-23T23:33:16Z

+
+1. **Add `BaseModel.predict_dataset()`** — implement the method as sketched above in `presidio_evaluator/models/base_model.py`. Add a unit test in `tests/` that verifies the 5-column schema and correct row count for a small synthetic dataset.
+
+2. **Add `map_entities()` utility** — add the function (and `Dict` import) to `presidio_evaluator/evaluation/` (e.g., in a new `utils.py` or alongside `get_results_dataframe`). Add a unit test verifying that both `annotation` and `prediction` columns are remapped.


I think we should add the mapping to results dataframe. If we allow users to define their own mapping, I think that we need to make sure that both predicted and annotated entities are mapped to the same canonical mapping (if they really are a match).

Yes that's what I was thinking too. Essentially the only place where we change entities to their canonical form is this dataframe. This would simplify the flow today which does mapping in different parts of the flow.

In any case, the entity mapper generates one mapping for both the annotation and the prediction, so they would have to be mapped the same way.
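To make the "one mapping for both columns" point concrete, here is a minimal sketch of a `map_entities()` helper. It is an assumption, not the ADR's implementation: rows are plain dicts standing in for DataFrame rows (so the example has no dependencies), the label names are invented for illustration, and labels missing from the mapping are simply left unchanged.

```python
from typing import Dict, List

Row = Dict[str, str]


def map_entities(rows: List[Row], mapping: Dict[str, str]) -> List[Row]:
    """Return new rows with `annotation` and `prediction` remapped.

    The same mapping is applied to both columns, so two raw labels can only
    match after mapping if they resolve to the same canonical entity.
    """
    return [
        {
            **row,
            "annotation": mapping.get(row["annotation"], row["annotation"]),
            "prediction": mapping.get(row["prediction"], row["prediction"]),
        }
        for row in rows
    ]


# Hypothetical dataset labels (NAME, CITY) and model labels (PER, GPE).
rows = [
    {"token": "Dana", "annotation": "NAME", "prediction": "PER"},
    {"token": "lives", "annotation": "O", "prediction": "O"},
    {"token": "Haifa", "annotation": "CITY", "prediction": "GPE"},
]
mapping = {"NAME": "PERSON", "PER": "PERSON", "CITY": "LOCATION", "GPE": "LOCATION"}
mapped = map_entities(rows, mapping)
```

Because the function returns new rows instead of mutating its input, the raw labels stay available for error analysis after scoring.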

How does this look?

    # 1. Load dataset
    dataset = InputSample.read_dataset_json("data/dataset.json")

    # 2. Choose model and run predictions → get DataFrame directly
    model = PresidioAnalyzerWrapper(analyzer_engine=AnalyzerEngine())
    results_df = model.predict_dataset(dataset)  # NEW: returns the DataFrame directly

    # 3. Map entities (transforms both predictions and annotations into canonical entities)
    mapper = CanonicalMapper()

    # 4. Map to hierarchy (PII, High level, canonical, specific) and evaluate
    evaluator = SpanEvaluator()
    results_per_hierarchy = []
    for hierarchy in [1, 2, 3]:
        results_df_hierarchy = mapper.map_entities(results_df, hierarchy=hierarchy)
        # Collect one result per hierarchy level instead of overwriting the list
        results_per_hierarchy.append(
            evaluator.calculate_score_on_df(results_df=results_df_hierarchy)
        )

    # 5. Analyze/plot (here: the first collected result, i.e. hierarchy level 1)
    plotter = Plotter(results=results_per_hierarchy[0])
    plotter.plot_scores()

Alternatively, one can just use the default (hierarchy=3) and run the experiment, which is simpler:

    # 1. Load dataset
    dataset = InputSample.read_dataset_json("data/dataset.json")

    # 2. Choose model and run predictions
    model = PresidioAnalyzerWrapper(analyzer_engine=AnalyzerEngine())
    results_df = model.predict_dataset(dataset)

    # 3. Map entities (transforms both predictions and annotations into canonical entities)
    mapper = CanonicalMapper()

    # 4. Map to hierarchy (PII, High level, canonical, specific) and evaluate
    evaluator = SpanEvaluator()
    results_df_mapped = mapper.map_entities(results_df)
    results = evaluator.calculate_score_on_df(results_df=results_df_mapped)

    # 5. Analyze/plot
    plotter = Plotter(results=results)
    plotter.plot_scores()

Looks good to me :) BTW, what if a label is too general for the chosen canonical depth? Should we handle this case?

That's a good question! So if the user has a model with ["PERSON", "LOCATION"] and the dataset has ["STREET_ADDRESS", "NAME"], how should we map the two? PERSON and LOCATION are level 2, but we want to map to level 3. In this case, maybe if one of the entities is level 2, we should map everything to level 2? Or is this too naive?
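The "map everything to level 2" idea above could look roughly like the following sketch. It is only an illustration under stated assumptions: `PATHS` is a hypothetical four-label slice of the hierarchy (each label stored with its path from the level 1 root), and `common_depth` and `to_level` are invented helper names, not part of the proposed `CanonicalMapper` API.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical slice of the hierarchy: label -> path from the root (level 1) down.
PATHS: Dict[str, Tuple[str, ...]] = {
    "PERSON": ("PII", "PERSON"),
    "NAME": ("PII", "PERSON", "NAME"),
    "LOCATION": ("PII", "LOCATION"),
    "STREET_ADDRESS": ("PII", "LOCATION", "STREET_ADDRESS"),
}


def common_depth(labels: Iterable[str], requested: int = 3) -> int:
    """Deepest level every label can be expressed at, capped at `requested`."""
    return min([requested] + [len(PATHS[label]) for label in labels])


def to_level(label: str, level: int) -> str:
    """Ancestor of `label` at the given 1-based level."""
    return PATHS[label][level - 1]


model_labels = ["PERSON", "LOCATION"]
dataset_labels = ["STREET_ADDRESS", "NAME"]
all_labels = model_labels + dataset_labels

# PERSON and LOCATION only exist at level 2, so the whole run drops to level 2.
depth = common_depth(all_labels)
mapping = {label: to_level(label, depth) for label in all_labels}
```

The trade-off is that a single coarse label anywhere in the model or dataset lowers the resolution of the whole evaluation, including branches that have no mismatch.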

Alternatively, we can choose one level 3 entity that a level 2 entity would be mapped to (like "NAME" for "PERSON" or "ADDRESS" to "LOCATION")

In this case, where the model's and dataset's levels of depth are different, should both mappers be coordinated when deciding on the depth? Assuming we have a mapper for the model and another one for the dataset and there is some auto-downgrade done. Also, what if the model includes both level 2 and level 3 entities? Does it make sense to downgrade per category only where there's a depth mismatch? It might complicate things

So the proposed approach here is to have one mapper for everything. If there are both level 2 and level 3 entities, we would go with level 2. Another option is to downgrade only the branch that contains level 2 entities and keep everything else at level 3. So if the dataset has PERSON (level 2), ADDRESS (level 3), and some other level 3 entities, and the model has FIRST_NAME, PREFIX, and LAST_NAME (level 3 under PERSON), then these would be mapped to PERSON, while the level 3 entities under other branches (like the LOCATION branch) would stay at level 3.
Sorry... this is becoming too complicated :)
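For comparison, the per-branch option from the example above can be sketched as follows. This is a hedged illustration, not the ADR's design: `PATHS` is again a hypothetical slice of the hierarchy, `per_branch_mapping` is an invented name, and the sketch assumes every label sits at level 2 or deeper (a bare level 1 `PII` label is not handled).

```python
from typing import Dict, Iterable, Tuple

# Hypothetical slice of the hierarchy: label -> path from the root (level 1) down.
PATHS: Dict[str, Tuple[str, ...]] = {
    "PERSON": ("PII", "PERSON"),
    "FIRST_NAME": ("PII", "PERSON", "FIRST_NAME"),
    "LAST_NAME": ("PII", "PERSON", "LAST_NAME"),
    "PREFIX": ("PII", "PERSON", "PREFIX"),
    "LOCATION": ("PII", "LOCATION"),
    "ADDRESS": ("PII", "LOCATION", "ADDRESS"),
}


def per_branch_mapping(labels: Iterable[str], requested: int = 3) -> Dict[str, str]:
    """Downgrade only the level 2 branches that contain a shallower label."""
    labels = list(labels)
    # Shallowest depth seen in each level 2 branch, capped at `requested`.
    branch_depth: Dict[str, int] = {}
    for label in labels:
        path = PATHS[label]
        branch = path[1]
        branch_depth[branch] = min(branch_depth.get(branch, requested), len(path))
    # Map each label to its ancestor at the depth chosen for its branch.
    return {
        label: PATHS[label][branch_depth[PATHS[label][1]] - 1]
        for label in labels
    }


dataset_labels = ["PERSON", "ADDRESS"]
model_labels = ["FIRST_NAME", "PREFIX", "LAST_NAME"]
mapping = per_branch_mapping(dataset_labels + model_labels)
```

In this example the PERSON branch collapses to level 2 because the dataset only has `PERSON`, while `ADDRESS` keeps its level 3 label since nothing in the LOCATION branch is coarser.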


Initial plan

fcfec9d

Copilot AI assigned Copilot and omri374 Mar 21, 2026

Copilot started work on behalf of omri374 March 21, 2026 14:08

Add ADR-001: Simplified Evaluation Pipeline Using DataFrame as Clean …

95f230c

…Interface Co-authored-by: omri374 <3776619+omri374@users.noreply.github.com> Agent-Logs-Url: https://github.com/microsoft/presidio-research/sessions/eb50120c-0b18-41ca-9db2-55c071434576

Copilot AI changed the title ~~[WIP] Add ADR for simplified evaluation pipeline using Pandas~~ Add ADR-001: Simplified Evaluation Pipeline via DataFrame Interface Mar 21, 2026

Copilot AI requested a review from omri374 March 21, 2026 14:10

Copilot finished work on behalf of omri374 March 21, 2026 14:11

Simplify evaluation pipeline and update model prediction

b59cded

Updated the evaluation pipeline to simplify the model prediction process and entity mapping. Removed redundant code and clarified comments for better understanding.

omri374 changed the title ~~Add ADR-001: Simplified Evaluation Pipeline via DataFrame Interface~~ Add two ADRs for simplifying the evaluation process Mar 21, 2026

Copilot started work on behalf of omri374 March 21, 2026 14:23

Copilot AI changed the title ~~Add two ADRs for simplifying the evaluation process~~ docs: Add ADR-001 for simplified evaluation pipeline via DataFrame interface Mar 21, 2026

Copilot finished work on behalf of omri374 March 21, 2026 14:24

Copilot started work on behalf of omri374 March 21, 2026 14:25

docs: add ADR-002 for entity mapping via CanonicalMapper

1df517c

Co-authored-by: omri374 <3776619+omri374@users.noreply.github.com> Agent-Logs-Url: https://github.com/microsoft/presidio-research/sessions/f2aad060-8eac-4214-aba8-01de85cf7536

Copilot AI changed the title ~~docs: Add ADR-001 for simplified evaluation pipeline via DataFrame interface~~ docs: ADR-002 — Entity Mapping via CanonicalMapper Mar 21, 2026

Copilot finished work on behalf of omri374 March 21, 2026 14:28

omri374 changed the title ~~docs: ADR-002 — Entity Mapping via CanonicalMapper~~ two ADRs for a streamlined evaluation process Mar 21, 2026

omri374 reviewed Mar 21, 2026


Copilot started work on behalf of omri374 March 21, 2026 14:50

docs: revise ADR-002 per review feedback

3e900aa

Co-authored-by: omri374 <3776619+omri374@users.noreply.github.com> Agent-Logs-Url: https://github.com/microsoft/presidio-research/sessions/7e819255-deb4-44b4-88fe-6ef94a3afcdc

Copilot AI requested a review from omri374 March 21, 2026 14:52

Copilot finished work on behalf of omri374 March 21, 2026 14:52

omri374 added 3 commits March 21, 2026 17:06

Add hierarchical entity mapping dictionary

6ca1cac

Added a proposed hierarchical entity mapping dictionary with detailed classifications for various entity types and their corresponding attributes.

Update entity mapping details in ADR-002

e1b5884

Clarified the hierarchical entity mapping dictionary with specific examples for 2nd and 3rd levels.

Refactor CanonicalMapper usage examples

ec19dc7

Removed interactive resolution and clarified mapping retrieval process.

omri374 requested a review from Copilot March 21, 2026 21:04

Copilot started reviewing on behalf of omri374 March 21, 2026 21:05

Copilot AI reviewed Mar 21, 2026


Update docs/adr/ADR-001-simplified-evaluation-pipeline.md

87a9d8e

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot started work on behalf of omri374 March 22, 2026 06:37

docs: move DOMAIN, WEB, URL, WEBSITE to NETWORK_IDENTIFIER

cddbdda

Co-authored-by: omri374 <3776619+omri374@users.noreply.github.com> Agent-Logs-Url: https://github.com/microsoft/presidio-research/sessions/0cb7a127-de0a-4167-a2cb-a4b98a8f4028

Copilot finished work on behalf of omri374 March 22, 2026 06:38

Copilot started work on behalf of omri374 March 22, 2026 07:24

omri374 and others added 2 commits March 22, 2026 09:25

Update docs/adr/ADR-001-simplified-evaluation-pipeline.md

e0daa3c

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

docs: update CanonicalMapper API and map_entities interface per revie…

d7d95e9

…w feedback Co-authored-by: omri374 <3776619+omri374@users.noreply.github.com> Agent-Logs-Url: https://github.com/microsoft/presidio-research/sessions/8617ff64-a48a-4bc1-96cb-e83ac8d52192

Copilot finished work on behalf of omri374 March 22, 2026 07:26

Copilot started work on behalf of omri374 March 22, 2026 07:41

Copilot AI and others added 2 commits March 22, 2026 07:42

docs: remove HOSPITAL alias from ORGANIZATION.COMPANY

6d6866f

Co-authored-by: omri374 <3776619+omri374@users.noreply.github.com> Agent-Logs-Url: https://github.com/microsoft/presidio-research/sessions/753f1074-cc0d-445e-b838-50b11eb55356

Apply suggestions from code review

151acad

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot finished work on behalf of omri374 March 22, 2026 07:42

Copilot started work on behalf of omri374 March 22, 2026 07:42

docs: remove MRN alias from PHI.PATIENT_ID

44adbc0

Co-authored-by: omri374 <3776619+omri374@users.noreply.github.com> Agent-Logs-Url: https://github.com/microsoft/presidio-research/sessions/6abd899f-aafd-4d21-bbde-c535cfc20d6e

Copilot finished work on behalf of omri374 March 22, 2026 07:44

omri374 marked this pull request as ready for review March 22, 2026 19:58

omri374 requested review from RonShakutai and negruber1 March 22, 2026 19:59

Add example mappings for entity labels

a56c6ac

Added example mappings of raw entity labels to canonical entities in the EntityHierarchy, enhancing clarity on how different labels correspond to standardized entities.

RonShakutai reviewed Mar 23, 2026


Comment thread docs/adr/ADR-001-simplified-evaluation-pipeline.md

Comment thread docs/adr/ADR-001-simplified-evaluation-pipeline.md

Comment thread docs/adr/ADR-001-simplified-evaluation-pipeline.md

negruber1 reviewed Mar 23, 2026


omri374 added 6 commits March 24, 2026 10:58

Enhance ADR-001 with additional evaluation details

c74e418

Added details about EvaluationResult and Error Analysis.

Enhance evaluation pipeline with hierarchy scoring

5fdf500

Refactor evaluation pipeline to include entity mapping and scoring per hierarchy.

Update ADR-001-simplified-evaluation-pipeline.md

b400004

Modify prediction method to return DataFrame

7c600b6

Updated the prediction method to return a DataFrame directly.

Enhance ADR-002 with input format and usage updates

191f330

Added input format details for token comparisons and updated usage examples for CanonicalMapper.

Enhance ADR-001 with multi-hierarchical evaluation example

6725db7

Added example code for multi-hierarchical evaluations in the ADR document.

