
RAG-to-MCP Workshop Submission: participant/sarvesh-pune#13

Open

SnowKingSandy wants to merge 3 commits into nasscomAI:master from SnowKingSandy:participant/sarvesh-pune

Conversation

@SnowKingSandy

RAG-to-MCP — Submission PR

Name: Sarvesh
City / Group: Pune
Date: 18 April 2026
AI tool(s) used: GitHub Copilot (Claude Haiku 4.5)


Submission Checklist

  • uc-0a/agents.md — present and updated
  • uc-0a/skills.md — present and updated
  • uc-0a/classifier.py — runs without crash
  • uc-0a/results_pune.csv — output present
  • uc-0a/results_hyderabad.csv — output present
  • uc-0a/results_kolkata.csv — output present
  • uc-0a/results_ahmedabad.csv — output present
  • uc-rag/agents.md — present and updated
  • uc-rag/skills.md — present and updated
  • uc-rag/rag_server.py — full implementation with RICE enforcement
  • uc-mcp/agents.md — present and updated
  • uc-mcp/skills.md — present and updated
  • uc-mcp/mcp_server.py — passes all test_client.py tests
  • 3 commits with meaningful CRAFT messages, one per UC
  • All sections below filled

UC-0A — Complaint Classifier

Which failure mode did you encounter first?

Taxonomy drift — the naive prompt without enum constraints invented category names like "Road Issue" and "Water Problem" instead of using the exact schema values (Pothole, Flooding, Streetlight, etc.).

Which enforcement rule fixed it?

"Category must be exactly one value from the allowed list: Pothole, Flooding, Streetlight, Waste, Noise, Road Damage, Heritage Damage, Heat Hazard, Drain Blockage, Other. No variations or invented names."

Commit message:

UC-0A Fix taxonomy drift and severity blindness: no fixed enum and no keyword detection → implemented R.I.C.E framework with strict category enum, severity keyword enforcement, reason justification, ambiguity detection, and classifier results for all cities

UC-RAG — RAG Server

Verification checkpoints:

  • Chunking produces multiple chunks per document (6 chunks from 3 documents)
  • Sentence boundaries respected — no clauses split mid-sentence
  • Out-of-scope queries return refusal template (not hallucinated answers)
  • Retrieved chunks scored and cited with document name and chunk index
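The first two checkpoints can be sketched as a sentence-aware chunker. This is a simplified stand-in: the real uc-rag implementation uses NLTK's punkt_tab tokenizer, while here a naive regex split and a whitespace token count keep the example self-contained:

```python
import re

def split_sentences(text: str) -> list[str]:
    # Naive stand-in for nltk.sent_tokenize (which the real server uses).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def chunk_document(text: str, max_tokens: int = 400) -> list[str]:
    """Pack whole sentences into chunks; never split mid-sentence."""
    chunks, current, count = [], [], 0
    for sentence in split_sentences(text):
        n = len(sentence.split())  # crude whitespace token count
        if current and count + n > max_tokens:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Because sentences are only ever appended whole, a chunk boundary can never fall mid-sentence, which is the property the second checkpoint verifies.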

UC-MCP — MCP Server

Paste your tool description from mcp_server.py TOOL_DEFINITION:

"Answers questions about City Municipal Corporation (CMC) policy documents: HR Leave Policy, IT Acceptable Use Policy, and Finance Reimbursement Policy. Returns answers grounded in retrieved document chunks with cited sources. Questions outside these three documents return a refusal message — this tool does not answer general knowledge questions, budget forecasts, or topics not covered by the indexed CMC policy documents."

Does it state the document scope explicitly?

Yes — names all three policy documents and explicitly states what the tool will not answer.
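For context, a description like the one above would typically sit inside an MCP tool definition of roughly this shape (the tool name and input schema here are assumptions, not copied from mcp_server.py, and the description string is abbreviated):

```python
# Hypothetical shape of TOOL_DEFINITION, following the MCP tools/list
# format (name / description / inputSchema).
TOOL_DEFINITION = {
    "name": "ask_cmc_policy",  # illustrative name, not from mcp_server.py
    "description": (
        "Answers questions about City Municipal Corporation (CMC) policy "
        "documents: HR Leave Policy, IT Acceptable Use Policy, and Finance "
        "Reimbursement Policy. Questions outside these documents return a "
        "refusal message."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {"question": {"type": "string"}},
        "required": ["question"],
    },
}
```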

Run result: python test_client.py --port 8765 --run-all

Did the budget forecast question return isError: true?

Yes — no chunk scored above 0.3 for this query. The refusal template was returned with isError: true and no LLM call was made.
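A minimal sketch of that refusal path, assuming the server filters scored chunks before any LLM call (function and field layout are illustrative, not taken from mcp_server.py, apart from the MCP-standard isError flag):

```python
REFUSAL = ("I can only answer questions about the indexed CMC policy "
           "documents. This question is out of scope.")
THRESHOLD = 0.3  # empirically calibrated (see CRAFT Reflection below)

def answer_or_refuse(scored_chunks: list[tuple[str, float]]) -> dict:
    """Return an MCP-style tools/call result; refuse with isError: true
    when no chunk scores above the threshold, without calling the LLM."""
    relevant = [(c, s) for c, s in scored_chunks if s > THRESHOLD]
    if not relevant:
        return {"content": [{"type": "text", "text": REFUSAL}],
                "isError": True}
    # Only here would the LLM be called, grounded on the retrieved context.
    context = "\n".join(c for c, _ in relevant)
    return {"content": [{"type": "text", "text": context}], "isError": False}
```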

In one sentence — why is the tool description the enforcement?

The agent reads the tool description to decide when to call the tool, so a precisely scoped description denies it implicit permission to invoke the tool for out-of-scope questions.

Commit message:

UC-MCP Fix vague tool description and context breach: no scope stated and no error enforcement → implemented R.I.C.E framework with explicit CMC policy scope in tool description, mandatory isError on refusals, JSON-RPC 2.0 compliant error handling, tools/list and tools/call implementation

Verification checkpoints:

  • Tool description explicitly states document scope (CMC policies)
  • Tool description states refusal behavior for out-of-scope queries
  • python test_client.py --run-all executes all tests successfully
  • Budget forecast question returns isError: true
  • JSON-RPC -32601 error for unknown methods
  • All responses use HTTP 200
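The last two checkpoints can be sketched as a small JSON-RPC dispatcher (a hypothetical helper, not the actual mcp_server.py code): transport always answers HTTP 200, and protocol failures travel in the JSON-RPC error object with the standard -32601 code for unknown methods.

```python
def handle_rpc(request: dict, methods: dict) -> tuple[int, dict]:
    """Dispatch a JSON-RPC 2.0 request. Always HTTP 200; protocol errors
    go in the JSON-RPC error object, never the HTTP status."""
    method = request.get("method")
    req_id = request.get("id")
    if method not in methods:
        return 200, {"jsonrpc": "2.0", "id": req_id,
                     "error": {"code": -32601, "message": "Method not found"}}
    result = methods[method](request.get("params", {}))
    return 200, {"jsonrpc": "2.0", "id": req_id, "result": result}
```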

CRAFT Reflection

Which step of the CRAFT loop was hardest across all three UCs?

Constrain — specifically, calibrating the similarity threshold in UC-RAG. The README specified 0.6, but empirical testing showed that SentenceTransformer (all-MiniLM-L6-v2) produces scores of 0.2–0.5 for semantically related policy text. Lowering the threshold to 0.3 while still refusing truly out-of-scope queries (budget forecasts) required end-to-end testing and observation of the actual distance values.

What did you add to agents.md manually that the AI did not generate?

In UC-RAG agents.md, the explicit cross-document separation rule: "If a query spans two documents, retrieve from each separately. Never merge retrieved chunks from different documents into a single blended answer." The AI generated a generic grounding rule but did not restrict per-document retrieval, which is the specific enforcement needed to prevent IT+HR policy blending.
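That separation rule could be enforced with a small grouping step before answer composition (a sketch; the "source" field name is an assumption, not taken from rag_server.py):

```python
from collections import defaultdict

def group_by_document(chunks: list[dict]) -> dict[str, list[dict]]:
    """Bucket retrieved chunks per source document, so answers are
    composed per document and never blended across documents."""
    by_doc: defaultdict = defaultdict(list)
    for chunk in chunks:
        by_doc[chunk["source"]].append(chunk)
    return dict(by_doc)
```

A query spanning the HR and IT policies then yields two separate buckets, each answered on its own, instead of one merged context that invites IT+HR blending.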

One specific task in your real work where you will use R.I.C.E in the next 7 days:

Building an internal complaint routing bot for a civic platform — complaints are currently routed manually to 12 different departments. I will apply RICE to scope the router strictly to the complaint taxonomy and enforce that misclassified complaints are flagged for manual review rather than auto-routed with false confidence.


Technical Notes

  • Dependencies installed: sentence-transformers, chromadb, nltk
  • NLTK punkt_tab downloaded: Required for sentence tokenization
  • RAG similarity threshold: 0.3 (empirically calibrated to balance recall vs. hallucination)
  • MCP JSON-RPC compliance: All responses HTTP 200, errors in JSON-RPC error object
  • All 4 cities tested: Pune, Hyderabad, Kolkata, Ahmedabad

UC-RAG commit message (truncated in the page capture): …ts and no enforcement → implemented sentence-aware chunking (max 400 tokens), 0.6 similarity threshold, mandatory citation, context grounding from retrieved chunks only, and cross-document separation
@github-actions

Hi there, participant! Thanks for joining our RAG-to-MCP Workshop!

We're reviewing your PR for the 3 Use Cases (UC-0A, UC-RAG, UC-MCP). Once your submission is validated and merged, you'll be awarded your completion badge!

Next Steps:

  • Make sure all 3 UCs are finished.
  • Ensure your commit messages match the required format.
  • Fill out every section of the PR template.
  • Good luck!

