Conversation
Pull request overview
This PR introduces a hybrid “keywords-first” contract classification approach by generating a new semantic anchor registry (keywords per CCS category) and using a heuristic classifier with fallback to the existing LLM/RAG approach when confidence is low.
Changes:
- Added semantic-anchor extraction tooling (LLM-driven) and committed a new registry (semantic_anchors4.json) used by a heuristic classifier.
- Added heuristic classifier (v5) and "mix" wrappers to route between heuristic vs LLM/RAG classification.
- Expanded prompts and updated dependency management to a pip-compile-generated, fully pinned requirements.txt.
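The keywords-first routing described above can be sketched as follows. The names here (the heuristic callable, llm_classify, and the score threshold) are illustrative stand-ins for this overview, not the PR's actual API:

```python
# Minimal sketch of the "keywords-first, LLM fallback" routing.
# All names and the threshold value are illustrative, not the PR's code.

def classify_hybrid(description, heuristic, llm_classify, threshold=15):
    """Try the cheap keyword heuristic first; fall back to the LLM."""
    category, score = heuristic(description)
    if category is not None and score >= threshold:
        return category, "heuristic"
    return llm_classify(description), "llm"


# Toy stand-ins to show the control flow:
def toy_heuristic(text):
    return ("Cloud", 20) if "cloud" in text.lower() else (None, 0)

def toy_llm(text):
    return "Consultancy"  # pretend this is an LLM call

print(classify_hybrid("Cloud hosting framework", toy_heuristic, toy_llm))
print(classify_hybrid("Advisory services", toy_heuristic, toy_llm))
```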
Reviewed changes
Copilot reviewed 20 out of 24 changed files in this pull request and generated 18 comments.
Show a summary per file
| File | Description |
|---|---|
| src/utils/weight_keywords_tool_v3.py | Generates semantic anchors per category and writes semantic_anchors4.json |
| src/utils/RAG_system_v2.py | Builds/embeds category “rule cards” and uploads to Azure Search |
| src/utils/llm_keywords_finder.py | LLM keyword extraction helper used by the anchor generator |
| src/core/classification_v5.py | Heuristic keyword-based classifier over semantic_anchors4.json |
| src/core/classification_v3.py | RAG-backed mapper (Azure Search retrieval + LLM) |
| src/core/classification_v3_mix_v5.py | Hybrid wrapper combining v5 heuristic + v3 RAG fallback |
| src/core/classification_v2_mix_v5.py | Hybrid wrapper combining v5 heuristic + v2 LLM fallback |
| src/core/classification_v2.py | Minor import/path cleanup |
| src/core/classification_v1.py | Updated LangChain message imports + import path |
| semantic_anchors4.json | New semantic anchor registry used by v5 heuristic classifier |
| requirements.in | New pip-compile input set for dependencies |
| requirements.txt | New fully pinned dependency lockfile |
| prompts/keyword_system_prompt_v2.md | New keyword extraction prompt template |
| prompts/summariser_system_prompt.md | New rule-card summarisation prompt template |
| prompts/system_prompt_v3.md | New “rules-driven” classification prompt template |
| prompts/system_prompts.py | Prompt tweaks (incl. removing “Low Value” references) |
| prompts/system_prompt_v2.md | Removed “Low Value” category references |
| prompts/system_prompt.md | Removed “Low Value” category references |
| prompts/new_system_prompt.md | Removed “Low Value” category references |
| prompts/contractmap_prompt_with_descriptions.md | Removed “Low Value” category reference |
| investigate_model_accuracy.py | Local evaluation script for the new approach |
| async_investigate_model_accuracy.py | Async evaluation script for the hybrid approach |
| .gitignore | Ignore xlsx and other local experimentation artifacts |
```python
load_dotenv()


DEFAULT_SYSTEM_PROMPT_FILE = (
    Path(__file__).parent.parent.parent / "prompts/system_prompt_v6.md"
```

DEFAULT_SYSTEM_PROMPT_FILE points to prompts/system_prompt_v6.md, but that file does not exist in the repo (only up to system_prompt_v3.md). This will raise FileNotFoundError when contract_mapper_v3 runs. Update the path to an existing prompt file or add the missing prompt file to prompts/ (and keep naming consistent).

Suggested change:

```diff
-    Path(__file__).parent.parent.parent / "prompts/system_prompt_v6.md"
+    Path(__file__).parent.parent.parent / "prompts/system_prompt_v3.md"
```
```python
def keywords_and_llm(description):
    classifier = HeuristicClassifier()
    result, score = classifier.classify(description)

    if score >= 15:
```

keywords_and_llm decides based on score >= 15 only. When HeuristicClassifier.classify() returns (None, best_score) for ambiguous matches, this function will still return None (no fallback) as long as best_score >= 15. Gate on result is not None (and consider reusing a module-level classifier like classification_v2_mix_v5 does, to avoid reloading the JSON on every call).

Suggested change:

```diff
-def keywords_and_llm(description):
-    classifier = HeuristicClassifier()
-    result, score = classifier.classify(description)
-    if score >= 15:
+classifier = HeuristicClassifier()
+
+def keywords_and_llm(description):
+    result, score = classifier.classify(description)
+    if result is not None and score >= 15:
```
```python
# \b ensures we match exact words only
if re.search(rf"\b{re.escape(word.lower())}\b", desc_lower):
    if word.lower().startswith("rm"):
        score += 50  # Instant win
```

word.lower().startswith("rm") treats many non-framework tokens as RM codes (e.g., anchors like "RM Codes not present" / "RM Migration" in semantic_anchors4.json) and gives them an "instant win" +50 score. This will cause systematic misclassification. Consider only applying the +50 boost for true RM codes (e.g., ^rm\d{4}\b) and treating other "rm..." phrases as normal keywords.

Suggested change:

```diff
-# \b ensures we match exact words only
-if re.search(rf"\b{re.escape(word.lower())}\b", desc_lower):
-    if word.lower().startswith("rm"):
-        score += 50  # Instant win
+word_lower = word.lower()
+# \b ensures we match exact words only
+if re.search(rf"\b{re.escape(word_lower)}\b", desc_lower):
+    if re.fullmatch(r"rm\d{4}", word_lower):
+        score += 50  # Instant win for true RM codes only
```
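To illustrate the tightened pattern suggested in the comment above, a quick check (assuming CCS RM codes take the form "RM" followed by four digits; the helper name is hypothetical):

```python
import re

# Hypothetical helper: boost only genuine RM framework codes (e.g. RM6187),
# assuming CCS codes are "RM" followed by exactly four digits.
RM_CODE = re.compile(r"rm\d{4}", re.IGNORECASE)

def is_rm_code(token):
    # fullmatch requires the whole token to be a code, unlike startswith("rm")
    return RM_CODE.fullmatch(token) is not None

print(is_rm_code("RM6187"))        # True: genuine framework code
print(is_rm_code("RM Migration"))  # False: ordinary "rm..." phrase
print(is_rm_code("rm codes"))      # False
```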
```python
Primary match = 3 points
Secondary match = 1 point
```

The scoring described in the docstring ("Primary match = 3 points") doesn't match the implementation (primary is +5 and RM-prefixed is +50). Update the docstring (or adjust the scoring) so callers can tune threshold/margin correctly.

Suggested change:

```diff
-Primary match = 3 points
-Secondary match = 1 point
+Primary match = 5 points
+RM-prefixed primary match = 50 points
+Secondary match = 1 point
+
+The threshold and margin parameters should be tuned against this scoring scale.
```
```python
if (best_score - second_score) < margin:
    # We return None so the async loop falls back to the LLM
    print(f"Ambiguity detected ({best_cat} {best_score} vs {second_score}). Forcing LLM.")
    return None, best_score
```

classify() prints on ambiguity, which is noisy for library code and makes it hard to integrate in services/tests. Prefer returning an explicit status/reason or using the project's logging setup (and keep stdout clean by default).
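One way to do this, as a sketch: the ClassifyResult type and resolve helper below are hypothetical illustrations, not part of the PR.

```python
import logging
from dataclasses import dataclass
from typing import Optional

# Library code: attach no handler, so nothing reaches stdout by default.
logger = logging.getLogger(__name__)

@dataclass
class ClassifyResult:
    category: Optional[str]
    score: int
    reason: str  # e.g. "ok" or "ambiguous", so callers see *why* it's None

def resolve(best_cat, best_score, second_score, margin=5):
    """Return an explicit status instead of printing on ambiguity."""
    if (best_score - second_score) < margin:
        logger.debug("Ambiguity: %s %s vs %s", best_cat, best_score, second_score)
        return ClassifyResult(None, best_score, "ambiguous")
    return ClassifyResult(best_cat, best_score, "ok")
```

The caller can then fall back to the LLM whenever reason != "ok", and tests can assert on the reason field rather than capturing stdout.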
```python
registry[category] = {"primary": list(primary_store), "secondary": list(secondary_store)}
```

registry starts with a dict shape ({"primary": ..., "secondary": ...}) but is later mutated to a list shape ([final_primary, final_secondary]). This shape-shifting makes the code fragile and harder to extend. Consider keeping a consistent structure throughout (e.g., always dicts or a small typed object) and only transforming once at the final serialization step.
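A minimal sketch of that approach, with illustrative names: keep one dict-of-sets shape while building, and convert to lists in a single place when writing JSON.

```python
import json

def new_entry():
    # One canonical in-memory shape: dict of sets (cheap dedup while building).
    return {"primary": set(), "secondary": set()}

def serialize(registry):
    """Single transformation point: sets -> sorted lists for JSON output."""
    return {
        cat: {"primary": sorted(entry["primary"]),
              "secondary": sorted(entry["secondary"])}
        for cat, entry in registry.items()
    }

registry = {"Cloud": new_entry()}
registry["Cloud"]["primary"].update({"hosting", "iaas"})
registry["Cloud"]["secondary"].add("compute")
print(json.dumps(serialize(registry), indent=2))
```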
```diff
@@ -0,0 +1,31 @@
+# Base Tools (Add these!)
+pip
```

Including pip in requirements.in is unusual (pip-compile treats it as "unsafe" for requirements files). Unless you have a specific reason to pin/upgrade pip via your runtime dependencies, it's safer to remove it and manage the pip version via your build image/tooling instead.

Suggested change:

```diff
-pip
```
```diff
 import os
 from pathlib import Path
-from utils.file_to_string import file_to_string_processor
+from src.utils.file_to_string import file_to_string_processor
```

These modules now import src.utils..., but the packaging/test setup treats src/ as the source root (see pyproject.toml [tool.setuptools.packages.find] where = ["src"] and pytest pythonpath = ["src"]), which makes the install-time packages utils, core, etc. In an installed environment (or when only src/ is on PYTHONPATH), src.utils may not exist. Prefer importing utils.file_to_string (consistent with tests/test_file_to_string.py) or update the packaging layout so src.* is a real distributable package.

Suggested change:

```diff
-from src.utils.file_to_string import file_to_string_processor
+from utils.file_to_string import file_to_string_processor
```
```diff
 from dotenv import load_dotenv
 import os
-from utils.file_to_string import file_to_string_processor
+from src.utils.file_to_string import file_to_string_processor
```

This file now imports src.utils..., but the repo's packaging/test setup uses src/ as the source root (so packages resolve as utils, core, etc.). In environments where only src/ is on PYTHONPATH (common for tests/packaging), src.utils may not be importable. Prefer from utils.file_to_string import ... (as in tests/test_file_to_string.py) or adjust the packaging layout to make src.* imports valid after installation.

Suggested change:

```diff
-from src.utils.file_to_string import file_to_string_processor
+from utils.file_to_string import file_to_string_processor
```
```python
for text in group['clean_description'].tolist():
    # Using the LLM to find keywords for each individual contract
    primary, secondary = keywords_finder_llm(text)
    primary_store.update(primary)
    secondary_store.update(secondary)
```

keywords_finder_llm() is called on each individual contract description (text) here, but the prompt template (keyword_system_prompt_v2.md) is written to analyze a category-level collection of framework titles/summaries for {{category_name}}. This mismatch makes the extracted anchors less reliable and also leaves {{category_name}} unresolved unless you pass it in. Consider either (a) batching/concatenating per-category text and passing category_name, or (b) rewriting the prompt/template to match per-contract extraction.
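Option (a) could look roughly like this. The extract_category_keywords helper and the (text, category_name) signature for the finder are assumptions for illustration, not the PR's actual interface:

```python
# Sketch of batching per-category text for one extractor call, passing
# category_name so a {{category_name}} placeholder can be resolved.
# keywords_finder_llm's real signature may differ from this assumption.

def extract_category_keywords(group_texts, category_name, finder):
    # Concatenate the per-contract descriptions into one category-level document.
    batched = "\n\n".join(group_texts)
    return finder(batched, category_name)


# Toy finder standing in for the LLM call, to show the call shape:
def toy_finder(text, category_name):
    return ({category_name.lower()}, {"framework"})

primary, secondary = extract_category_keywords(
    ["Cloud hosting services", "IaaS compute framework"],
    "Cloud Compute",
    toy_finder,
)
```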
I created a new contract map model that uses keywords generated from the framework/contracts data per CCS category, but switches to the original LLM contract map if there are not enough keyword matches to prove that a description belongs to the category. The results are below:

--- Test Results ---
Total Contract Descriptions Analysed: 781
New model (Contract Map Version 3) Correct Matches: 639
New model (Contract Map Version 3) Accuracy: 81.82%
Old model (Contract Map Version 2) Correct Matches: 621
Old model (Contract Map Version 2) Accuracy: 79.51%