32 changes: 21 additions & 11 deletions uc-0a/agents.md
Original file line number Diff line number Diff line change
@@ -1,18 +1,28 @@
# agents.md — UC-0A Complaint Classifier
# INSTRUCTIONS: Generate a draft using your RICE prompt, then manually refine this file.
# Delete these comments before committing.

role: >
[FILL IN: Who is this agent? What is its operational boundary?]
You are a Civic Complaint Classification Agent for the GHMC (Greater Hyderabad Municipal Corporation).
Your sole operational boundary is: given a single citizen complaint row, output a structured
classification. You do not answer questions, make recommendations, or perform any action outside
of classifying complaints.

intent: >
[FILL IN: What does a correct output look like — make it verifiable]
A correct output is a structured record with exactly five fields:
- complaint_id: copied verbatim from input
- category: exactly one value from the allowed taxonomy (no variations, no synonyms)
- priority: exactly one of Urgent / Standard / Low — determined by keyword rules, not sentiment
- reason: one sentence citing specific words from the complaint description that justify the category and priority
- flag: either "NEEDS_REVIEW" or blank — set only when the category is genuinely ambiguous
The output is verifiable: a reviewer can check each field against the source description
without any additional judgment.

context: >
[FILL IN: What information is the agent allowed to use? State exclusions explicitly.]
Allowed information: the complaint description field only.
The agent must NOT use: location names, ward numbers, reporter type, days_open, or external
knowledge about Hyderabad geography or infrastructure to infer category.
The agent must NOT hallucinate sub-categories or create category names outside the allowed list.

enforcement:
- "[FILL IN: Specific testable rule 1 — e.g. Category must be exactly one of: Pothole, Flooding, ...]"
- "[FILL IN: Specific testable rule 2 — e.g. Priority must be Urgent if description contains: injury, child, school, ...]"
- "[FILL IN: Specific testable rule 3 — e.g. Every output row must include a reason field citing specific words from the description]"
- "[FILL IN: Refusal condition — e.g. If category cannot be determined from description alone, output category: Other and flag: NEEDS_REVIEW]"
- "Category must be exactly one of: Pothole · Flooding · Streetlight · Waste · Noise · Road Damage · Heritage Damage · Heat Hazard · Drain Blockage · Other — no spelling variations, no synonyms, no combined categories."
- "Priority must be set to Urgent if and only if the description contains at least one of these keywords: injury, child, school, hospital, ambulance, fire, hazard, fell, collapse, hospitalised, crater, collapsed, lives — case-insensitive match."
- "Every output row must include a reason field with one sentence that quotes or closely paraphrases specific words from the complaint description."
- "If the category cannot be determined from the description alone with confidence, set category to Other and flag to NEEDS_REVIEW — never guess a specific category."
- "Priority is Urgent, Standard, or Low only — never any other value. Default to Standard unless severity keywords are present (Urgent) or the complaint is minor and contains no inconvenience indicators (Low)."
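The enforcement rules above are written to be testable, so they can be checked mechanically. The following is a minimal sketch of such a check, not part of this PR: `validate_record` is a hypothetical helper, the keyword list is copied from the Urgent rule above, and whole-word matching is an assumption (the rule only says "contains").

```python
import re

ALLOWED_CATEGORIES = {
    "Pothole", "Flooding", "Streetlight", "Waste", "Noise",
    "Road Damage", "Heritage Damage", "Heat Hazard", "Drain Blockage", "Other",
}
ALLOWED_PRIORITIES = {"Urgent", "Standard", "Low"}

# Keyword list taken from the Urgent enforcement rule in agents.md.
URGENT_KEYWORDS = [
    "injury", "child", "school", "hospital", "ambulance", "fire", "hazard",
    "fell", "collapse", "hospitalised", "crater", "collapsed", "lives",
]
# Assumption: keywords match as whole words, case-insensitively.
URGENT_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, URGENT_KEYWORDS)) + r")\b",
    re.IGNORECASE,
)


def validate_record(record: dict, description: str) -> list:
    """Return a list of rule violations for one output record (empty if none)."""
    problems = []
    if record.get("category") not in ALLOWED_CATEGORIES:
        problems.append("category not in allowed taxonomy")
    priority = record.get("priority")
    if priority not in ALLOWED_PRIORITIES:
        problems.append("priority not one of Urgent/Standard/Low")
    # Urgent if and only if the description contains an urgent keyword.
    has_keyword = URGENT_RE.search(description) is not None
    if (priority == "Urgent") != has_keyword:
        problems.append("priority disagrees with urgent-keyword rule")
    if not (record.get("reason") or "").strip():
        problems.append("reason is empty")
    if record.get("flag", "") not in ("", "NEEDS_REVIEW"):
        problems.append("flag must be blank or NEEDS_REVIEW")
    return problems
```

An empty list means the record passes every rule; each string in a non-empty list names one violated rule.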
192 changes: 183 additions & 9 deletions uc-0a/classifier.py
@@ -1,31 +1,205 @@
"""
UC-0A — Complaint Classifier
Starter file. Build this using the RICE → agents.md → skills.md → CRAFT workflow.
CRAFT-tested implementation guided by agents.md (RICE) and skills.md.
City: Hyderabad

Run:
python classifier.py --input ../data/city-test-files/test_hyderabad.csv --output results_hyderabad.csv
"""

import argparse
import csv
import re

# ── Taxonomy ──────────────────────────────────────────────────────────────────
ALLOWED_CATEGORIES = [
"Pothole", "Flooding", "Streetlight", "Waste", "Noise",
"Road Damage", "Heritage Damage", "Heat Hazard", "Drain Blockage", "Other"
]

# Keywords that MUST trigger Urgent priority (case-insensitive)
URGENT_KEYWORDS = [
"injury", "child", "school", "hospital", "ambulance", "fire",
"hazard", "fell", "collapse", "collapsed", "hospitalised",
"crater", "lives", "life at risk"
]

# ── Classification rules ──────────────────────────────────────────────────────
# Each rule: (regex pattern on description, category)
# Order matters — first match wins.
CATEGORY_RULES = [
# Flooding
(r"\bflood(ed|ing|s)?\b|\bunderpass flood|\bwater log", "Flooding"),
# Drain Blockage
(r"\bdrain\b.*\bblock(ed)?\b|\bblock(ed)?\b.*\bdrain\b|\bstormwater drain|mosquito breeding", "Drain Blockage"),
# Pothole
(r"\bpotholes?\b", "Pothole"),
# Road Damage
(r"\broad collaps(e|ed)\b|\bcrater\b|\broad.*damage|\bdamage.*road", "Road Damage"),
# Waste
(r"\bwaste\b|\bgarbage\b|\boverflow\b", "Waste"),
# Noise
(r"\bnoise\b|\bdrilling\b|\bidling\b|\bengines? on\b|\bsound\b", "Noise"),
# Heritage Damage
(r"\bheritage\b", "Heritage Damage"),
# Streetlight
(r"\bstreetlight(s)?\b|\blight(s)? (not working|off|broken|out)\b", "Streetlight"),
# Heat Hazard
(r"\bheat\b|\btemperature\b|\bsun(stroke)?\b", "Heat Hazard"),
]

# Low-priority signal words (when no urgent keyword and category is not critical)
LOW_PRIORITY_SIGNALS = [r"\bslow(ed)?\b", r"\bminor\b", r"\bsmall\b"]


def _has_urgent_keyword(text: str) -> bool:
text_lower = text.lower()
for kw in URGENT_KEYWORDS:
if re.search(r"\b" + re.escape(kw) + r"\b", text_lower):
return True
return False


def _determine_category(description: str):
"""
Returns (category, needs_review).
Matches description against CATEGORY_RULES in order.
Falls back to Other + NEEDS_REVIEW if no rule matches.
"""
text_lower = description.lower()
for pattern, category in CATEGORY_RULES:
if re.search(pattern, text_lower):
return category, False
return "Other", True


def _determine_priority(description: str, category: str) -> str:
if _has_urgent_keyword(description):
return "Urgent"
# Only Streetlight, Heat Hazard, and Noise can drop to Low; every other category stays at least Standard
low_category = category in ("Streetlight", "Heat Hazard", "Noise")
if low_category:
for pattern in LOW_PRIORITY_SIGNALS:
if re.search(pattern, description.lower()):
return "Low"
return "Standard"


def _build_reason(description: str, category: str, priority: str) -> str:
"""
Construct a one-sentence reason citing words from the description.
"""
# Find the urgent trigger word if any
urgent_word = None
for kw in URGENT_KEYWORDS:
if re.search(r"\b" + re.escape(kw) + r"\b", description.lower()):
urgent_word = kw
break

# Grab a short excerpt from the description (first 80 chars)
excerpt = description.strip()
if len(excerpt) > 80:
excerpt = excerpt[:77] + "..."

reason = f"Classified as '{category}' based on description: \"{excerpt}\""
if urgent_word:
reason += f" — priority set to Urgent due to keyword '{urgent_word}'."
else:
reason += f" — priority set to {priority}."
return reason


# ── Skill: classify_complaint ─────────────────────────────────────────────────
def classify_complaint(row: dict) -> dict:
"""
Classify a single complaint row.
Returns: dict with keys: complaint_id, category, priority, reason, flag

TODO: Build this using your AI tool guided by your agents.md and skills.md.
Your RICE enforcement rules must be reflected in this function's behaviour.
Implements agents.md RICE enforcement rules.
"""
raise NotImplementedError("Build this using your AI tool + RICE prompt")
complaint_id = row.get("complaint_id", "UNKNOWN")
# csv.DictReader yields None for a missing column, so coerce None to "" before stripping
description = (row.get("description") or "").strip()

# Handle empty description
if not description:
return {
"complaint_id": complaint_id,
"category": "Other",
"priority": "Low",
"reason": "No description provided — cannot classify.",
"flag": "NEEDS_REVIEW",
}

category, needs_review = _determine_category(description)
priority = _determine_priority(description, category)
reason = _build_reason(description, category, priority)
flag = "NEEDS_REVIEW" if needs_review else ""

return {
"complaint_id": complaint_id,
"category": category,
"priority": priority,
"reason": reason,
"flag": flag,
}


# ── Skill: batch_classify ─────────────────────────────────────────────────────
def batch_classify(input_path: str, output_path: str):
"""
Read input CSV, classify each row, write results CSV.

TODO: Build this using your AI tool.
Must: flag nulls, not crash on bad rows, produce output even if some rows fail.
Never aborts on a bad row — logs and continues.
"""
raise NotImplementedError("Build this using your AI tool + RICE prompt")
results = []
errors = []
urgent_count = 0
review_count = 0

with open(input_path, newline="", encoding="utf-8") as infile:
reader = csv.DictReader(infile)
rows = list(reader)

for row in rows:
try:
result = classify_complaint(row)
except Exception as exc:
cid = row.get("complaint_id", "UNKNOWN")
errors.append((cid, str(exc)))
result = {
"complaint_id": cid,
"category": "Other",
"priority": "Low",
"reason": "Classification error.",
"flag": "NEEDS_REVIEW",
}

results.append(result)
if result["priority"] == "Urgent":
urgent_count += 1
if result["flag"] == "NEEDS_REVIEW":
review_count += 1

fieldnames = ["complaint_id", "category", "priority", "reason", "flag"]
with open(output_path, "w", newline="", encoding="utf-8") as outfile:
writer = csv.DictWriter(outfile, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(results)

# Summary
print(f"\n{'='*55}")
print(f" UC-0A Complaint Classifier — Results Summary")
print(f"{'='*55}")
print(f" Total rows processed : {len(rows)}")
print(f" Urgent : {urgent_count}")
print(f" NEEDS_REVIEW : {review_count}")
print(f" Classification errors: {len(errors)}")
if errors:
for cid, err in errors:
print(f" ⚠ {cid}: {err}")
print(f" Output written to : {output_path}")
print(f"{'='*55}\n")


# ── Entry point ───────────────────────────────────────────────────────────────
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="UC-0A Complaint Classifier")
parser.add_argument("--input", required=True, help="Path to test_[city].csv")
16 changes: 16 additions & 0 deletions uc-0a/results_hyderabad.csv
@@ -0,0 +1,16 @@
complaint_id,category,priority,reason,flag
GH-202401,Flooding,Urgent,"Classified as 'Flooding' based on description: ""Underpass flooded after 1hr rain. Ambulance diverted. Lives at risk."" — priority set to Urgent due to keyword 'ambulance'.",
GH-202402,Flooding,Standard,"Classified as 'Flooding' based on description: ""Market area flooded. Traders suffering losses. Drain completely blocked."" — priority set to Standard.",
GH-202406,Drain Blockage,Standard,"Classified as 'Drain Blockage' based on description: ""Main stormwater drain 100% blocked with construction debris."" — priority set to Standard.",
GH-202407,Drain Blockage,Standard,"Classified as 'Drain Blockage' based on description: ""Drain blocked and mosquito breeding. Dengue concern."" — priority set to Standard.",
GH-202410,Pothole,Standard,"Classified as 'Pothole' based on description: ""Potholes causing vehicles to slow to 20kmph on fast road."" — priority set to Standard.",
GH-202411,Pothole,Urgent,"Classified as 'Pothole' based on description: ""Pothole swallowed entire motorcycle wheel. Rider hospitalised."" — priority set to Urgent due to keyword 'hospitalised'.",
GH-202412,Pothole,Urgent,"Classified as 'Pothole' based on description: ""School bus struggling to navigate 6 potholes in 200m stretch."" — priority set to Urgent due to keyword 'school'.",
GH-202417,Waste,Standard,"Classified as 'Waste' based on description: ""Heritage zone garbage overflow. Tourist photographs showing piles of waste."" — priority set to Standard.",
GH-202420,Noise,Standard,"Classified as 'Noise' based on description: ""Construction drilling from 5am daily near residential towers."" — priority set to Standard.",
GH-202422,Road Damage,Urgent,"Classified as 'Road Damage' based on description: ""Road collapsed partially. Crater 1m deep near residential gate."" — priority set to Urgent due to keyword 'collapsed'.",
GH-202424,Flooding,Standard,"Classified as 'Flooding' based on description: ""Underpass floods in light rain. Cars regularly abandoned."" — priority set to Standard.",
GH-202428,Waste,Standard,"Classified as 'Waste' based on description: ""Post-market waste not cleared. Area unusable by Sunday morning."" — priority set to Standard.",
GH-202432,Noise,Standard,"Classified as 'Noise' based on description: ""24hr supermarket delivery trucks idling with engines on."" — priority set to Standard.",
GH-202448,Flooding,Standard,"Classified as 'Flooding' based on description: ""Main drain blocked — entire locality at flooding risk this week."" — priority set to Standard.",
GH-202438,Other,Standard,"Classified as 'Other' based on description: ""Colony surrounded by fields that channel rainwater through main road."" — priority set to Standard.",NEEDS_REVIEW
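A results file like the one above can be tallied to cross-check the summary that `batch_classify` prints. The following is a rough sketch under that assumption; `summarise_results` is a hypothetical helper, not part of this PR, and it accepts any iterable of dict rows such as those produced by `csv.DictReader`.

```python
from collections import Counter


def summarise_results(rows):
    """Tally classifier output rows by category, priority, and review flag."""
    by_category = Counter()
    by_priority = Counter()
    needs_review = 0
    for row in rows:
        by_category[row["category"]] += 1
        by_priority[row["priority"]] += 1
        # A blank flag column is read from the CSV as an empty string.
        if row.get("flag") == "NEEDS_REVIEW":
            needs_review += 1
    return by_category, by_priority, needs_review
```

For the 15 rows shown above, the tally should come to 4 Urgent, 11 Standard, and 1 NEEDS_REVIEW.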
45 changes: 31 additions & 14 deletions uc-0a/skills.md
@@ -1,16 +1,33 @@
# skills.md
# INSTRUCTIONS: Generate a draft by prompting AI, then manually refine this file.
# Delete these comments before committing.

skills:
- name: [skill_name]
description: [One sentence — what does this skill do?]
input: [What does it receive? Type and format.]
output: [What does it return? Type and format.]
error_handling: [What does it do when input is invalid or ambiguous?]
- name: classify_complaint
description: Classifies a single civic complaint row into category, priority, reason, and flag based on RICE enforcement rules.
input: >
A dict representing one CSV row with keys:
complaint_id (str), description (str).
All other fields (ward, location, etc.) are available but must NOT be used for classification.
output: >
A dict with keys:
complaint_id (str) — copied verbatim from input,
category (str) — exactly one of: Pothole · Flooding · Streetlight · Waste · Noise · Road Damage · Heritage Damage · Heat Hazard · Drain Blockage · Other,
priority (str) — one of: Urgent / Standard / Low,
reason (str) — one sentence citing specific words from the description,
flag (str) — "NEEDS_REVIEW" or blank string.
error_handling: >
If description is empty or None, set category to Other, priority to Low,
reason to "No description provided — cannot classify", flag to "NEEDS_REVIEW".
Never raise an exception; always return a valid output dict.

- name: [second_skill_name]
description: [One sentence]
input: [Type and format]
output: [Type and format]
error_handling: [What does it do when input is invalid or ambiguous?]
- name: batch_classify
description: Reads an input CSV of complaints, applies classify_complaint to each row, and writes a results CSV.
input: >
input_path (str) — path to test_[city].csv,
output_path (str) — path to write results_[city].csv.
Input CSV must have columns: complaint_id, description (others may be present but are ignored and not written to the output).
output: >
Writes a CSV at output_path with columns:
complaint_id, category, priority, reason, flag.
Also prints a summary: total rows processed, Urgent count, NEEDS_REVIEW count, any rows that failed.
error_handling: >
If a row fails classification (unexpected exception), log the complaint_id and error,
write that row with category=Other, priority=Low, reason="Classification error", flag="NEEDS_REVIEW",
and continue processing remaining rows. Never abort the entire batch for a single bad row.
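The "empty or None" case in the `classify_complaint` error handling matters in practice because `csv.DictReader` fills a missing column with `None` when a row is shorter than the header. The following is a small sketch of one way to normalise it, with a hypothetical helper `safe_description` and made-up complaint IDs used only for illustration.

```python
import csv
import io


def safe_description(row: dict) -> str:
    """Return the description as a stripped string, treating None as empty."""
    return (row.get("description") or "").strip()


# Hypothetical two-row input: the first row is missing its description column.
sample = io.StringIO("complaint_id,description\nX-1\nX-2,  Drain blocked  \n")
rows = list(csv.DictReader(sample))
descriptions = [safe_description(r) for r in rows]
```

With this normalisation the short row yields an empty string instead of raising `AttributeError` on `None.strip()`, so it can fall through to the Other / Low / NEEDS_REVIEW path.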
20 changes: 10 additions & 10 deletions uc-0b/agents.md
@@ -1,18 +1,18 @@
# agents.md
# INSTRUCTIONS: Generate a draft using your RICE prompt, then manually refine this file.
# Delete these comments before committing.

role: >
[FILL IN: Who is this agent? What is its operational boundary?]
Policy Compliance Summarizer. Produces accurate, clause-complete summaries of HR policy documents.
Must preserve every numbered obligation exactly as written in the source.

intent: >
[FILL IN: What does a correct output look like — make it verifiable]
Generate summary_hr_leave.txt containing all 10 mandatory clauses (2.3, 2.4, 2.5, 2.6, 2.7, 3.2, 3.4, 5.2, 5.3, 7.2)
with exact wording for multi-condition obligations. Output is verifiable against source document.

context: >
[FILL IN: What information is the agent allowed to use? State exclusions explicitly.]
Use ONLY policy_hr_leave.txt as source. Do NOT add any information not present in the source.
Exclusions: No assumptions about "standard practice", no generic policy language.

enforcement:
- "[FILL IN: Specific testable rule 1]"
- "[FILL IN: Specific testable rule 2]"
- "[FILL IN: Specific testable rule 3]"
- "[FILL IN: Refusal condition — when should the system refuse rather than guess?]"
- "Every numbered clause (2.3, 2.4, 2.5, 2.6, 2.7, 3.2, 3.4, 5.2, 5.3, 7.2) must be present in output"
- "Multi-condition obligations must preserve ALL conditions — clause 5.2 requires BOTH Department Head AND HR Director"
- "Never add information absent from source document — no scope bleed"
- "If clause cannot be summarised without meaning loss, quote verbatim and flag with [QUOTE] tag"
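The first enforcement rule above (every numbered clause must be present) lends itself to an automated check. The following is a minimal sketch, assuming the summary cites clauses by number in plain text; `missing_clauses` is a hypothetical helper and not part of this PR.

```python
import re

# Clause numbers listed in the enforcement rule above.
REQUIRED_CLAUSES = ["2.3", "2.4", "2.5", "2.6", "2.7", "3.2", "3.4", "5.2", "5.3", "7.2"]


def missing_clauses(summary: str) -> list:
    """Return the required clause numbers that never appear in the summary."""
    missing = []
    for clause in REQUIRED_CLAUSES:
        # Lookarounds stop "2.3" from matching inside "12.3" or "2.31".
        pattern = r"(?<![\d.])" + re.escape(clause) + r"(?!\d)"
        if not re.search(pattern, summary):
            missing.append(clause)
    return missing
```

This only confirms that each clause number is mentioned. Whether a multi-condition obligation such as clause 5.2 keeps all of its conditions still needs a reviewer or a more specific test.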