11 changes: 11 additions & 0 deletions uc-0a/TODO.md
@@ -0,0 +1,11 @@
# UC-0A Implementation TODO

## Plan Steps:
1. [x] Update uc-0a/agents.md with RICE definition
2. [x] Update uc-0a/skills.md with two skills definitions
3. [x] Implement uc-0a/classifier.py fully
4. [x] Test: cd uc-0a && python classifier.py --input ../data/city-test-files/test_pune.csv --output results_pune.csv
5. [x] Validate updated classifier.py fixes and retest
6. [x] Mark complete and attempt_completion

Current progress: all steps above are checked off; classifier.py is implemented and has been run against test_pune.csv.
19 changes: 8 additions & 11 deletions uc-0a/agents.md
@@ -1,18 +1,15 @@
# agents.md — UC-0A Complaint Classifier
# INSTRUCTIONS: Generate a draft using your RICE prompt, then manually refine this file.
# Delete these comments before committing.

role: >
  [FILL IN: Who is this agent? What is its operational boundary?]
  You are a municipal complaint classifier agent that operates solely on the provided complaint description text to produce structured classification output.

intent: >
  [FILL IN: What does a correct output look like — make it verifiable]
  Produce a dictionary with exactly these keys: category (one exact string), priority (Urgent/Standard), reason (one sentence citing specific words), flag (NEEDS_REVIEW or empty string). Output is verifiable by matching allowed values and presence of citations.

context: >
  [FILL IN: What information is the agent allowed to use? State exclusions explicitly.]
  Use only the complaint description text provided. Do not use external knowledge, location data, statistics, or assumptions beyond keyword presence. Exclusions: no web search, no city context, no prior complaints.

enforcement:
  - "[FILL IN: Specific testable rule 1 — e.g. Category must be exactly one of: Pothole, Flooding, ...]"
  - "[FILL IN: Specific testable rule 2 — e.g. Priority must be Urgent if description contains: injury, child, school, ...]"
  - "[FILL IN: Specific testable rule 3 — e.g. Every output row must include a reason field citing specific words from the description]"
  - "[FILL IN: Refusal condition — e.g. If category cannot be determined from description alone, output category: Other and flag: NEEDS_REVIEW]"
  - "Category must be exactly one of: Pothole, Flooding, Streetlight, Waste, Noise, Road Damage, Heritage Damage, Heat Hazard, Drain Blockage, Other — no variations, subcategories, or inventions."
  - "Priority must be Urgent if description contains any of: injury, child, school, hospital, ambulance, fire, hazard, fell, collapse (case-insensitive); otherwise Standard."
  - "Reason must be present as one sentence citing 2-3 specific words/phrases directly from the description."
  - "If description is ambiguous (no clear category match) or invalid, set category: Other and flag: NEEDS_REVIEW; do not guess or force-fit."
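
Review note: the enforcement rules above are written as testable checks, so they can be verified mechanically on each output row. A minimal sketch, assuming output rows are plain dicts with the four keys named in `intent`; `validate_row` and the three `ALLOWED_*` constants are hypothetical names and are not part of this PR.

```python
# Hypothetical validator for one classifier output row (not part of the PR).
ALLOWED_CATEGORIES = {
    "Pothole", "Flooding", "Streetlight", "Waste", "Noise",
    "Road Damage", "Heritage Damage", "Heat Hazard", "Drain Blockage", "Other",
}
ALLOWED_PRIORITIES = {"Urgent", "Standard"}
ALLOWED_FLAGS = {"", "NEEDS_REVIEW"}


def validate_row(row: dict) -> list:
    """Return a list of rule violations for one output row (empty if valid)."""
    problems = []
    if row.get("category") not in ALLOWED_CATEGORIES:
        problems.append("category not in allowed list")
    if row.get("priority") not in ALLOWED_PRIORITIES:
        problems.append("priority must be Urgent or Standard")
    if not (row.get("reason") or "").strip():
        problems.append("reason is missing")
    if row.get("flag") not in ALLOWED_FLAGS:
        problems.append("flag must be NEEDS_REVIEW or empty")
    return problems
```

This only covers the value constraints; whether the reason really cites words from the description would need the original description as a second input.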

127 changes: 112 additions & 15 deletions uc-0a/classifier.py
@@ -1,34 +1,131 @@
"""
UC-0A — Complaint Classifier
Starter file. Build this using the RICE → agents.md → skills.md → CRAFT workflow.
Full implementation following agents.md RICE rules, skills.md specs, and README schema.
Processes input CSV (complaint_id, description) → output CSV (category, priority, reason, flag).
Enforces exact categories, Urgent keywords, citations, ambiguity handling.
"""
import argparse
import csv
import re
import os

# Exact categories from README
CATEGORIES = {
    'Pothole', 'Flooding', 'Streetlight', 'Waste', 'Noise',
    'Road Damage', 'Heritage Damage', 'Heat Hazard', 'Drain Blockage', 'Other'
}

# Urgent keywords (case-insensitive)
URGENT_KEYWORDS = {
    'injury', 'child', 'school', 'hospital', 'ambulance',
    'fire', 'hazard', 'fell', 'collapse'
}

def classify_complaint(row: dict) -> dict:
    """
    Classify a single complaint row.
    Returns: dict with keys: complaint_id, category, priority, reason, flag

    TODO: Build this using your AI tool guided by your agents.md and skills.md.
    Your RICE enforcement rules must be reflected in this function's behaviour.
    Classify a single complaint row per RICE enforcement.
    Input: {'complaint_id': str, 'description': str}
    Output: {'category': str, 'priority': str, 'reason': str, 'flag': str}
    """
    raise NotImplementedError("Build this using your AI tool + RICE prompt")

    description = (row.get('description', '') or '').strip().lower()

    if not description:
        return {
            'category': 'Other',
            'priority': 'Standard',
            'reason': 'Invalid or empty description.',
            'flag': 'NEEDS_REVIEW'
        }

    # Check for Urgent
    is_urgent = any(re.search(r'\b' + re.escape(kw) + r'\b', description, re.IGNORECASE) for kw in URGENT_KEYWORDS)
    priority = 'Urgent' if is_urgent else 'Standard'

    # Category matching (keyword-based, strict to list)
    category_matches = {
        'Pothole': ['pothole', 'crater', 'road hole', 'pot hole', 'hole in road'],
        'Flooding': ['flood', 'waterlogging', 'flooding'],
        'Streetlight': ['streetlight', 'street light', 'light not working', 'light', 'lamp'],
        'Waste': ['garbage', 'waste', 'trash', 'rubbish'],
        'Noise': ['noise', 'loud', 'loud music', 'barking', 'sound'],
        'Road Damage': ['road damage', 'broken road', 'cracks', 'crack'],
        'Heritage Damage': ['heritage', 'monument', 'monument damage'],
        'Heat Hazard': ['heat', 'sunstroke', 'hot road', 'melting'],
        'Drain Blockage': ['drain', 'sewage', 'sewer', 'blockage', 'clogged']
    }

    matched_keys = []
    for cat, keywords in category_matches.items():
        if any(re.search(r'\b' + re.escape(kw) + r'\b', description, re.IGNORECASE) for kw in keywords):
            matched_keys.append(cat)

    flag = ''
    if len(matched_keys) == 1:
        category = matched_keys[0]
    else:
        category = 'Other'
        flag = 'NEEDS_REVIEW'

    # Reason always cites words
    all_keywords = [kw for kw_list in category_matches.values() for kw in kw_list] + list(URGENT_KEYWORDS)
    cited_words_list = re.findall(r'\b(?:' + '|'.join(re.escape(kw) for kw in all_keywords) + r')\b', description, re.IGNORECASE)
    cited_words = ', '.join(cited_words_list[:3])
    urgent_words = [kw for kw in URGENT_KEYWORDS if re.search(r'\b' + re.escape(kw) + r'\b', description, re.IGNORECASE)]
    urgency_note = f", urgent: {', '.join(urgent_words[:2])}" if urgent_words else ''
    reason = f"Detected keywords: {cited_words}{urgency_note}" if cited_words else "No clear matching keywords found in description"

    return {
        'category': category,
        'priority': priority,
        'reason': reason,
        'flag': flag
    }

def batch_classify(input_path: str, output_path: str):
    """
    Read input CSV, classify each row, write results CSV.

    TODO: Build this using your AI tool.
    Must: flag nulls, not crash on bad rows, produce output even if some rows fail.
    Batch process CSV per skills.md.
    Handles errors per row, writes output always.
    """
    raise NotImplementedError("Build this using your AI tool + RICE prompt")

    results = []
    errors = []

    if not os.path.exists(input_path):
        print(f"Error: Input file {input_path} not found.")
        return

    try:
        with open(input_path, 'r', newline='', encoding='utf-8') as infile:
            reader = csv.DictReader(infile)
            for row_num, row in enumerate(reader, 1):
                try:
                    clean_row = {'complaint_id': row.get('complaint_id', f'Row{row_num}'), 'description': row.get('description', '')}
                    result = classify_complaint(clean_row)
                    results.append(result)
                except Exception as e:
                    errors.append(f"Row {row_num}: {e}")
                    results.append({
                        'category': 'Other',
                        'priority': 'Standard',
                        'reason': f'Processing error: {str(e)}',
                        'flag': 'NEEDS_REVIEW'
                    })

        fieldnames = ['category', 'priority', 'reason', 'flag']
        with open(output_path, 'w', newline='', encoding='utf-8') as outfile:
            writer = csv.DictWriter(outfile, fieldnames=fieldnames, delimiter=',', quoting=csv.QUOTE_MINIMAL)
            writer.writeheader()
            writer.writerows(results)

        print(f"Processed {len(results)} rows. Output: {output_path}")
        if errors:
            print("Errors:", errors)

    except Exception as e:
        print(f"Batch error: {e}")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="UC-0A Complaint Classifier")
    parser.add_argument("--input", required=True, help="Path to test_[city].csv")
    parser.add_argument("--output", required=True, help="Path to write results CSV")
    args = parser.parse_args()
    batch_classify(args.input, args.output)
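
Review note: `classify_complaint` matches every keyword with `\b` word boundaries, so inflected forms are not matched (`flood` does not match `flooded`, `child` does not match `children`), while the short keyword `light` matches any standalone use of the word. A small sketch of that behaviour follows; `matches` is a hypothetical helper that mirrors the `re.search` call in the classifier, and the sample sentences are invented for illustration.

```python
import re


def matches(keyword: str, text: str) -> bool:
    """Mirror of the classifier's whole-word, case-insensitive keyword test."""
    pattern = r'\b' + re.escape(keyword) + r'\b'
    return re.search(pattern, text, re.IGNORECASE) is not None


# Invented sample descriptions, not taken from the test data.
print(matches("flood", "street flooded after rain"))      # False: 'flooded' is a longer word
print(matches("flooding", "flooding near the market"))    # True
print(matches("child", "children walk here daily"))       # False: 'children' is a longer word
print(matches("light", "light rain caused waterlogging")) # True: would count toward Streetlight
```

Because a description that hits keywords from two categories is sent to `Other` with `NEEDS_REVIEW`, broad keywords such as `light` or `sound` could push otherwise clear complaints into the review bucket. Whether that happens on the test files would need to be checked against the input descriptions.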
16 changes: 16 additions & 0 deletions uc-0a/results_pune.csv
@@ -0,0 +1,16 @@
category,priority,reason,flag
Pothole,Standard,Detected keywords: pothole,
Pothole,Urgent,"Detected keywords: pothole, school, urgent: school",
Other,Standard,No clear matching keywords found in description,NEEDS_REVIEW
Drain Blockage,Standard,Detected keywords: drain,
Other,Standard,No clear matching keywords found in description,NEEDS_REVIEW
Streetlight,Urgent,"Detected keywords: streetlight, hazard, urgent: hazard",
Waste,Standard,Detected keywords: garbage,
Other,Standard,No clear matching keywords found in description,NEEDS_REVIEW
Other,Standard,No clear matching keywords found in description,NEEDS_REVIEW
Other,Urgent,"Detected keywords: injury, urgent: injury",NEEDS_REVIEW
Other,Standard,No clear matching keywords found in description,NEEDS_REVIEW
Other,Standard,No clear matching keywords found in description,NEEDS_REVIEW
Heritage Damage,Standard,Detected keywords: heritage,
Waste,Standard,Detected keywords: waste,
Other,Urgent,"Detected keywords: fell, urgent: fell",NEEDS_REVIEW
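
Review note: tallying the `flag` column is a quick way to see how often the classifier falls back to `Other` with `NEEDS_REVIEW`. A minimal sketch using the standard `csv` module; `flag_rate` is a hypothetical helper, and `SAMPLE` holds three rows copied from the results above.

```python
import csv
import io

# Three rows copied from results_pune.csv above, header included.
SAMPLE = """category,priority,reason,flag
Pothole,Standard,Detected keywords: pothole,
Other,Standard,No clear matching keywords found in description,NEEDS_REVIEW
Other,Urgent,"Detected keywords: injury, urgent: injury",NEEDS_REVIEW
"""


def flag_rate(csv_text: str) -> float:
    """Fraction of rows whose flag column is NEEDS_REVIEW (0.0 for no rows)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return 0.0
    flagged = sum(1 for r in rows if r["flag"] == "NEEDS_REVIEW")
    return flagged / len(rows)
```

In the 15 result rows listed above, 8 carry `NEEDS_REVIEW`.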
25 changes: 11 additions & 14 deletions uc-0a/skills.md
@@ -1,16 +1,13 @@
# skills.md
# INSTRUCTIONS: Generate a draft by prompting AI, then manually refine this file.
# Delete these comments before committing.

skills:
  - name: [skill_name]
    description: [One sentence — what does this skill do?]
    input: [What does it receive? Type and format.]
    output: [What does it return? Type and format.]
    error_handling: [What does it do when input is invalid or ambiguous?]
  - name: classify_complaint
    description: Classifies a single municipal complaint description into category, priority, reason, and flag following a strict schema.
    input: "Dictionary with keys 'complaint_id' (str), 'description' (str). Example: {'complaint_id': 'C1', 'description': 'Pothole on main road caused injury.'}"
    output: "Dictionary with keys 'category' (str exact match), 'priority' (str: Urgent/Standard), 'reason' (str one sentence), 'flag' (str or empty)."
    error_handling: "If description empty/invalid, return {'category': 'Other', 'priority': 'Standard', 'reason': 'Invalid or empty description.', 'flag': 'NEEDS_REVIEW'}; ambiguous cases also get Other + NEEDS_REVIEW."

  - name: batch_classify
    description: Reads input CSV (complaint_id, description), classifies each row using classify_complaint, writes output CSV (category, priority, reason, flag).
    input: Two strings - input_path (str to CSV), output_path (str to write CSV).
    output: None (writes file); prints success message.
    error_handling: Skips bad rows logging error, continues processing others; ensures output CSV created even if some rows fail.

  - name: [second_skill_name]
    description: [One sentence]
    input: [Type and format]
    output: [Type and format]
    error_handling: [What does it do when input is invalid or ambiguous?]
9 changes: 9 additions & 0 deletions uc-0b/TODO.md
@@ -0,0 +1,9 @@
# UC-0B Implementation TODO

## Approved Plan Steps
- [x] Step 1: Replace uc-0b/agents.md with RICE YAML agent definition
- [x] Step 2: Replace uc-0b/skills.md with YAML for retrieve_policy and summarize_policy
- [x] Step 3: Replace uc-0b/app.py with complete CLI implementation (syntax fixed)
- [x] Step 4: Test run `python uc-0b/app.py --input data/policy-documents/policy_hr_leave.txt --output uc-0b/summary_hr_leave.txt`
- [x] Step 5: Verify output (clauses >20, multiline preserved, sections correct)
- [x] Plan approved by user
25 changes: 14 additions & 11 deletions uc-0b/agents.md
@@ -1,18 +1,21 @@
# agents.md
# INSTRUCTIONS: Generate a draft using your RICE prompt, then manually refine this file.
# Delete these comments before committing.

role: >
  [FILL IN: Who is this agent? What is its operational boundary?]
  HR Policy Summarizer Agent. Specialized in parsing CMC Employee Leave Policy documents into structured Markdown summaries. Operational boundary limited to clause extraction, section grouping, and verbatim preservation from input policy text only.

intent: >
  [FILL IN: What does a correct output look like — make it verifiable]
  Produce verifiable Markdown output with:
  - Title from policy header
  - Effective date
  - Sections grouped by clause prefix (1.x → Purpose, 2.x → Annual Leave, etc.)
  - All clauses preserved exactly as multiline text
  - >20 clauses detected
  Success = 100% clause coverage without omission, condition drop, or added text.

context: >
  [FILL IN: What information is the agent allowed to use? State exclusions explicitly.]
  Allowed: Raw policy text from retrieve_policy skill.
  Excluded: External knowledge, other policies, standard practices, assumptions about CMC operations. Use only content between policy header and end.

enforcement:
  - "[FILL IN: Specific testable rule 1]"
  - "[FILL IN: Specific testable rule 2]"
  - "[FILL IN: Specific testable rule 3]"
  - "[FILL IN: Refusal condition — when should the system refuse rather than guess?]"
  - "Every numbered clause (pattern \\d+\\.\\d+:) must appear in output with full multiline text"
  - "Preserve ALL conditions (e.g., 5.2 requires BOTH Department Head AND HR Director approval)"
  - 'No hallucinated text, softening (e.g., "must" → "may"), or scope bleed (no "typically" phrases)'
  - 'If parsing fails or <20 clauses: refuse with error message "Insufficient clauses detected"'
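
Review note: the clause-coverage and refusal rules above can be sketched as a small extractor. This is an illustration under stated assumptions, not the `app.py` from this PR: it assumes each clause starts on its own line with a number such as `5.2`, and that non-empty lines before the next clause number belong to the current clause. `extract_clauses`, `summarize_or_refuse`, `CLAUSE_START`, and `MIN_CLAUSES` are hypothetical names.

```python
import re

# Assumed clause-number format at the start of a line, e.g. "5.2" or "5.2:".
CLAUSE_START = re.compile(r'^\s*(\d+\.\d+)\b')
MIN_CLAUSES = 20  # threshold taken from the refusal rule above


def extract_clauses(text: str) -> dict:
    """Map clause number -> clause text, appending continuation lines.

    Lines are stripped of surrounding whitespace; any non-empty line that
    does not start a new clause is treated as a continuation (an assumption).
    """
    clauses = {}
    current = None
    for line in text.splitlines():
        match = CLAUSE_START.match(line)
        if match:
            current = match.group(1)
            clauses[current] = line.strip()
        elif current is not None and line.strip():
            clauses[current] += "\n" + line.strip()
    return clauses


def summarize_or_refuse(text: str) -> dict:
    """Return the clauses, or refuse when fewer than MIN_CLAUSES are found."""
    clauses = extract_clauses(text)
    if len(clauses) < MIN_CLAUSES:
        raise ValueError("Insufficient clauses detected")
    return clauses
```

A section heading that follows a clause would be absorbed as a continuation line under this sketch, so a real parser would need an explicit rule for headings.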