Skip to content

Improve accuracy#18

Open
DavisAgyemang wants to merge 9 commits intodevelopfrom
improve_accuracy
Open

Improve accuracy#18
DavisAgyemang wants to merge 9 commits intodevelopfrom
improve_accuracy

Conversation

@DavisAgyemang
Copy link
Copy Markdown
Collaborator

I created a new contract map model that uses keywords that generated from the framework/contracts data per ccs category but also switches to the original LLM contract map, if there is not enough keywords to prove that it belongs to the category. The results are below:

--- Test Results ---
Total Contract Descriptions Analysed: 781
New model(Contract Map Version3) Correct Matches: 639
New model(Contract Map Version3) Accuracy: 81.82%
Old model(Contract Map Version2) Correct Matches: 621
Old model(Contract Map Version2) Accuracy%: 79.5134443021767%

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a hybrid “keywords-first” contract classification approach by generating a new semantic anchor registry (keywords per CCS category) and using a heuristic classifier with fallback to the existing LLM/RAG approach when confidence is low.

Changes:

  • Added semantic-anchor extraction tooling (LLM-driven) and committed a new registry (semantic_anchors4.json) used by a heuristic classifier.
  • Added heuristic classifier (v5) and “mix” wrappers to route between heuristic vs LLM/RAG classification.
  • Expanded prompts and updated dependency management to a pip-compile-generated, fully pinned requirements.txt.

Reviewed changes

Copilot reviewed 20 out of 24 changed files in this pull request and generated 18 comments.

Show a summary per file
File Description
src/utils/weight_keywords_tool_v3.py Generates semantic anchors per category and writes semantic_anchors4.json
src/utils/RAG_system_v2.py Builds/embeds category “rule cards” and uploads to Azure Search
src/utils/llm_keywords_finder.py LLM keyword extraction helper used by the anchor generator
src/core/classification_v5.py Heuristic keyword-based classifier over semantic_anchors4.json
src/core/classification_v3.py RAG-backed mapper (Azure Search retrieval + LLM)
src/core/classification_v3_mix_v5.py Hybrid wrapper combining v5 heuristic + v3 RAG fallback
src/core/classification_v2_mix_v5.py Hybrid wrapper combining v5 heuristic + v2 LLM fallback
src/core/classification_v2.py Minor import/path cleanup
src/core/classification_v1.py Updated LangChain message imports + import path
semantic_anchors4.json New semantic anchor registry used by v5 heuristic classifier
requirements.in New pip-compile input set for dependencies
requirements.txt New fully pinned dependency lockfile
prompts/keyword_system_prompt_v2.md New keyword extraction prompt template
prompts/summariser_system_prompt.md New rule-card summarisation prompt template
prompts/system_prompt_v3.md New “rules-driven” classification prompt template
prompts/system_prompts.py Prompt tweaks (incl. removing “Low Value” references)
prompts/system_prompt_v2.md Removed “Low Value” category references
prompts/system_prompt.md Removed “Low Value” category references
prompts/new_system_prompt.md Removed “Low Value” category references
prompts/contractmap_prompt_with_descriptions.md Removed “Low Value” category reference
investigate_model_accuracy.py Local evaluation script for the new approach
async_investigate_model_accuracy.py Async evaluation script for the hybrid approach
.gitignore Ignore xlsx and other local experimentation artifacts

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

load_dotenv()

DEFAULT_SYSTEM_PROMPT_FILE = (
Path(__file__).parent.parent.parent / "prompts/system_prompt_v6.md"
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DEFAULT_SYSTEM_PROMPT_FILE points to prompts/system_prompt_v6.md, but that file does not exist in the repo (only up to system_prompt_v3.md). This will raise FileNotFoundError when contract_mapper_v3 runs. Update the path to an existing prompt file or add the missing prompt file to prompts/ (and keep naming consistent).

Suggested change
Path(__file__).parent.parent.parent / "prompts/system_prompt_v6.md"
Path(__file__).parent.parent.parent / "prompts/system_prompt_v3.md"

Copilot uses AI. Check for mistakes.
Comment on lines +5 to +9
def keywords_and_llm(description):
classifier = HeuristicClassifier()
result, score = classifier.classify(description)

if score >= 15:
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keywords_and_llm decides based on score >= 15 only. When HeuristicClassifier.classify() returns (None, best_score) for ambiguous matches, this function will still return None (no fallback) as long as best_score >= 15. Gate on result is not None (and consider reusing a module-level classifier like classification_v2_mix_v5 to avoid reloading the JSON every call).

Suggested change
def keywords_and_llm(description):
classifier = HeuristicClassifier()
result, score = classifier.classify(description)
if score >= 15:
classifier = HeuristicClassifier()
def keywords_and_llm(description):
result, score = classifier.classify(description)
if result is not None and score >= 15:

Copilot uses AI. Check for mistakes.
Comment on lines +37 to +40
# \b ensures we match exact words only
if re.search(rf"\b{re.escape(word.lower())}\b", desc_lower):
if word.lower().startswith("rm"):
score += 50 # Instant win
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

word.lower().startswith("rm") treats many non-framework tokens as RM codes (e.g., anchors like "RM Codes not present" / "RM Migration" in semantic_anchors4.json) and gives them an "instant win" +50 score. This will cause systematic misclassification. Consider only applying the +50 boost for true RM codes (e.g., ^rm\d{4}\b) and treat other "rm..." phrases as normal keywords.

Suggested change
# \b ensures we match exact words only
if re.search(rf"\b{re.escape(word.lower())}\b", desc_lower):
if word.lower().startswith("rm"):
score += 50 # Instant win
word_lower = word.lower()
# \b ensures we match exact words only
if re.search(rf"\b{re.escape(word_lower)}\b", desc_lower):
if re.fullmatch(r"rm\d{4}", word_lower):
score += 50 # Instant win for true RM codes only

Copilot uses AI. Check for mistakes.
Comment on lines +20 to +21
Primary match = 3 points
Secondary match = 1 point
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scoring described in the docstring ("Primary match = 3 points") doesn't match the implementation (primary is +5 and RM-prefixed is +50). Update the docstring (or adjust the scoring) so callers can tune threshold/margin correctly.

Suggested change
Primary match = 3 points
Secondary match = 1 point
Primary match = 5 points
RM-prefixed primary match = 50 points
Secondary match = 1 point
The ``threshold`` and ``margin`` parameters should be tuned against
this scoring scale.

Copilot uses AI. Check for mistakes.
Comment on lines +70 to +73
if (best_score - second_score) < margin:
# We return None so the async loop falls back to the LLM
print(f"Ambiguity detected ({best_cat} {best_score} vs {second_score}). Forcing LLM.")
return None, best_score
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

classify() prints on ambiguity, which is noisy for library code and makes it hard to integrate in services/tests. Prefer returning an explicit status/reason or using the project's logging setup (and keep stdout clean by default).

Copilot uses AI. Check for mistakes.
Comment on lines +52 to +53
registry[category] = {"primary": list(primary_store), "secondary": list(secondary_store)}

Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

registry starts with a dict shape ({"primary": ..., "secondary": ...}) but later is mutated to a list shape ([final_primary, final_secondary]). This shape-shifting makes the code fragile and harder to extend. Consider keeping a consistent structure throughout (e.g., always dicts or a small typed object) and only transform once at the final serialization step.

Copilot uses AI. Check for mistakes.
Comment thread requirements.in
@@ -0,0 +1,31 @@
# Base Tools (Add these!)
pip
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Including pip in requirements.in is unusual (and pip-compile treats it as “unsafe” for requirements files). Unless you have a specific reason to pin/upgrade pip via your runtime dependencies, it’s safer to remove it and manage pip version via your build image/tooling instead.

Suggested change
pip

Copilot uses AI. Check for mistakes.
import os
from pathlib import Path
from utils.file_to_string import file_to_string_processor
from src.utils.file_to_string import file_to_string_processor
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These modules now import src.utils..., but the packaging/test setup treats src/ as the source root (see pyproject.toml [tool.setuptools.packages.find] where = ["src"] and pytest pythonpath = ["src"]), which makes the install-time packages utils, core, etc. In an installed environment (or when only src/ is on PYTHONPATH), src.utils may not exist. Prefer importing utils.file_to_string (consistent with tests/test_file_to_string.py) or update the packaging layout so src.* is a real distributable package.

Suggested change
from src.utils.file_to_string import file_to_string_processor
from utils.file_to_string import file_to_string_processor

Copilot uses AI. Check for mistakes.
from dotenv import load_dotenv
import os
from utils.file_to_string import file_to_string_processor
from src.utils.file_to_string import file_to_string_processor
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file now imports src.utils..., but the repo’s packaging/test setup uses src/ as the source root (so packages resolve as utils, core, etc.). In environments where only src/ is on PYTHONPATH (common for tests/packaging), src.utils may not be importable. Prefer from utils.file_to_string import ... (as in tests/test_file_to_string.py) or adjust the packaging layout to make src.* imports valid after installation.

Suggested change
from src.utils.file_to_string import file_to_string_processor
from utils.file_to_string import file_to_string_processor

Copilot uses AI. Check for mistakes.
Comment on lines +46 to +50
for text in group['clean_description'].tolist():
# Using the LLM to find keywords for each individual contract
primary, secondary = keywords_finder_llm(text)
primary_store.update(primary)
secondary_store.update(secondary)
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keywords_finder_llm() is called on each individual contract description (text) here, but the prompt template (keyword_system_prompt_v2.md) is written to analyze a category-level collection of framework titles/summaries for {{category_name}}. This mismatch makes the extracted anchors less reliable and also leaves {{category_name}} unresolved unless you pass it in. Consider either (a) batching/concatenating per-category text and passing category_name, or (b) rewriting the prompt/template to match per-contract extraction.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants