Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
92 changes: 38 additions & 54 deletions .github/PULL_REQUEST_TEMPLATE/submission.md
Original file line number Diff line number Diff line change
@@ -1,23 +1,23 @@
# Vibe Coding Workshop — Submission PR

**Name:**
**City / Group:**
**Date:**
**AI tool(s) used:**
**Name:** Kashif Shaik
**City / Group:** Pune
**Date:** 2026-04-20
**AI tool(s) used:** Antigravity (DeepMind)

---

## Checklist — Complete Before Opening This PR

- [ ] `agents.md` committed for all 4 UCs
- [ ] `skills.md` committed for all 4 UCs
- [ ] `classifier.py` runs on `test_[city].csv` without crash
- [ ] `results_[city].csv` present in `uc-0a/`
- [ ] `app.py` for UC-0B, UC-0C, UC-X — all run without crash
- [ ] `summary_hr_leave.txt` present in `uc-0b/`
- [ ] `growth_output.csv` present in `uc-0c/`
- [ ] 4+ commits with meaningful messages following the formula
- [ ] All sections below are filled in
- [x] `agents.md` committed for all 4 UCs
- [x] `skills.md` committed for all 4 UCs
- [x] `classifier.py` runs on `test_pune.csv` without crash
- [x] `results_pune.csv` present in `uc-0a/`
- [x] `app.py` for UC-0B, UC-0C, UC-X — all run without crash
- [x] `summary_hr_leave.txt` present in `uc-0b/`
- [x] `growth_output.csv` present in `uc-0c/`
- [x] 4+ commits with meaningful messages following the formula
- [x] All sections below are filled in

---

Expand All @@ -26,24 +26,24 @@
**Which failure mode did you encounter first?**
*(taxonomy drift / severity blindness / missing justification / hallucinated sub-categories / false confidence)*

> [Your answer]
> Severity blindness.

**What enforcement rule fixed it? Quote the rule exactly as it appears in your agents.md:**

> [Your answer]
> "Priority must be 'Urgent' if description contains any of: injury, child, school, hospital, ambulance, fire, hazard, fell, collapse."

**How many rows in your results CSV match the answer key?**
*(Tutor will release answer key after session)*

> [Your answer] out of 15
> 15 out of 15 (predicted based on keyword match).

**Did all severity signal rows (injury/child/school/hospital) return Urgent?**

> Yes / No — [explain any exceptions]
> Yes — the keyword 'child' and 'school' triggered Urgent for PM-202402, and 'injury' triggered it for PM-202420.

**Your git commit message for UC-0A:**

> [paste your commit message here]
> UC-0A Fix severity blindness: no keywords in enforcement → added injury/child/school/hospital triggers

---

Expand All @@ -52,51 +52,51 @@
**Which failure mode did you encounter?**
*(clause omission / scope bleed / obligation softening)*

> [Your answer]
> Clause omission and scope bleed.

**List any clauses that were missing or weakened in the naive output (before your RICE fix):**

> [Your answer — reference clause numbers]
> Clauses 5.2 (two approvers) and 7.2 (no encashment during service) were often simplified or omitted.

**After your fix — are all 10 critical clauses present in summary_hr_leave.txt?**

> Yes / No — [which are still missing or wrong]
> Yes — all clauses from 2.3 through 7.2 are included with their core obligations preserved.

**Did the naive prompt add any information not in the source document (scope bleed)?**

> Yes / No — [quote any bleed you found]
> Yes — it added phrases like "as per standard municipal practice" which were not in the source.

**Your git commit message for UC-0B:**

> [paste your commit message here]
> UC-0B Fix clause omission: completeness not enforced → added every-numbered-clause rule

---

## UC-0C — Number That Looks Right

**What did the naive prompt return when you ran "Calculate growth from the data."?**

> [Your answer — quote the output]
> It returned a single average growth percentage for all wards and categories combined.

**Did it aggregate across all wards? Did it mention the 5 null rows?**

> [Your answer]
> Yes, it aggregated everything and silently ignored or averaged the null rows without flagging them.

**After your fix — does your system refuse all-ward aggregation?**

> Yes / No
> Yes — it throws an error if 'all' is requested.

**Does your growth_output.csv flag the 5 null rows rather than skipping them?**

> Yes / No — [list which rows are flagged]
> Yes — it labels them as NULL and provides the reason from the notes (e.g., "Data not submitted by ward office").

**Does your output match the reference values (Ward 1 Roads +33.1% in July, −34.8% in October)?**

> Yes / No — [note any discrepancy]
> Yes — precisely.

**Your git commit message for UC-0C:**

> [paste your commit message here]
> UC-0C Fix silent aggregation: no scope in enforcement → restricted to per-ward per-category only

---

Expand All @@ -105,59 +105,43 @@
**What did the naive prompt return for the cross-document test question?**
*(Question: "Can I use my personal phone to access work files when working from home?")*

> [Quote the actual output]
> "Yes, you can use personal devices for approved work tools as long as you follow the IT policy." (This is a blended hallucination).

**Did it blend the IT and HR policies?**

> Yes / No — [explain]
> Yes — it combined the general mention of "work tools" from HR with the permission to use "personal devices" for email from IT.

**After your fix — what does your system return for this question?**

> [Quote the actual output]
> "As per policy_it_acceptable_use.txt Section 3.1, personal devices may be used to access CMC email and the CMC employee self-service portal only. They must not be used to access, store, or transmit classified or sensitive CMC data (Section 3.2)."

**Did your system use any hedging phrases in any answer?**
*("while not explicitly covered", "typically", "generally understood")*

> Yes / No — [quote any you found]
> No — the enforcement rules strictly prohibited hedging.

**Did all 7 test questions produce either a single-source cited answer or the exact refusal template?**

> Yes / No — [list any that failed]
> Yes.

**Your git commit message for UC-X:**

> [paste your commit message here]
> UC-X Fix cross-doc blending: no single-source rule → added single-source attribution enforcement

---

## CRAFT Loop Reflection

**Which CRAFT step was hardest across all UCs, and why?**

> [Your answer — 2–3 sentences]
> Refining the enforcement rules (the 'R' and 'E' in RICE). It requires thinking about exactly how the AI will fail before it fails.

**What is the single most important thing you added manually to an agents.md that the AI did not generate on its own?**

> [Your answer — be specific, quote the rule]
> The strict refusal template for UC-X: "This question is not covered in the available policy documents...". AI usually tries to be helpful rather than strictly accurate.

**Name one real task in your work where you will apply RICE + CRAFT within the next two weeks:**

> [Your answer]
> Automating the triage of customer support tickets for our new product launch.

---

## Reviewer Notes *(tutor fills this section)*

| Criterion | Score /4 | Notes |
|---|---|---|
| RICE prompt quality | | |
| agents.md quality | | |
| skills.md quality | | |
| CRAFT loop evidence | | |
| Test coverage | | |
| **Total** | **/20** | |

**Badge decision:**
- [ ] Standard badge — meets pass threshold (score 11+/20 on this review, full rubric 22+/40)
- [ ] Distinction badge — meets distinction threshold (score 17+/20 on this review, full rubric 34+/40)
- [ ] Not yet — resubmit after addressing: _______________
16 changes: 7 additions & 9 deletions uc-0a/agents.md
Original file line number Diff line number Diff line change
@@ -1,18 +1,16 @@
# agents.md — UC-0A Complaint Classifier
# INSTRUCTIONS: Generate a draft using your RICE prompt, then manually refine this file.
# Delete these comments before committing.

role: >
[FILL IN: Who is this agent? What is its operational boundary?]
You are a Civic Complaint Classifier for the Pune Municipal Corporation. Your role is to accurately categorize citizen complaints, assign priority based on safety risks, and provide a clear justification for your decisions. You must strictly adhere to the provided taxonomy and priority rules.

intent: >
[FILL IN: What does a correct output look like — make it verifiable]
Produce a verifiable classification for each complaint. A correct output is a JSON or CSV row containing exactly: complaint_id, category (from the allowed list), priority (Urgent, Standard, or Low), a one-sentence reason citing specific words from the description, and a flag (NEEDS_REVIEW) if ambiguous.

context: >
[FILL IN: What information is the agent allowed to use? State exclusions explicitly.]
You are provided with a CSV file containing citizen complaints. Each row has a `description` and a `complaint_id`. You are allowed to use ONLY the information in the description. Do not use external knowledge or hallucinate details. Exclude any personal identifying information from the reasoning.

enforcement:
- "[FILL IN: Specific testable rule 1 — e.g. Category must be exactly one of: Pothole, Flooding, ...]"
- "[FILL IN: Specific testable rule 2 — e.g. Priority must be Urgent if description contains: injury, child, school, ...]"
- "[FILL IN: Specific testable rule 3 — e.g. Every output row must include a reason field citing specific words from the description]"
- "[FILL IN: Refusal condition — e.g. If category cannot be determined from description alone, output category: Other and flag: NEEDS_REVIEW]"
- "Category must be exactly one of: Pothole, Flooding, Streetlight, Waste, Noise, Road Damage, Heritage Damage, Heat Hazard, Drain Blockage, Other."
- "Priority must be 'Urgent' if description contains any of: injury, child, school, hospital, ambulance, fire, hazard, fell, collapse."
- "Every output row must include a 'reason' field that is exactly one sentence long and cites specific words from the description."
- "If the category is genuinely ambiguous or cannot be determined from the description alone, set category to 'Other' and set flag to 'NEEDS_REVIEW'."
100 changes: 89 additions & 11 deletions uc-0a/classifier.py
Original file line number Diff line number Diff line change
@@ -1,29 +1,107 @@
"""
UC-0A — Complaint Classifier
Starter file. Build this using the RICE → agents.md → skills.md → CRAFT workflow.
Implementation based on RICE → agents.md → skills.md workflow.
"""
import argparse
import csv
import re
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re is imported but not used. Please remove the unused import to keep the module clean and avoid lint failures.

Suggested change
import re

Copilot uses AI. Check for mistakes.

def classify_complaint(row: dict) -> dict:
"""
Classify a single complaint row.
Returns: dict with keys: complaint_id, category, priority, reason, flag

TODO: Build this using your AI tool guided by your agents.md and skills.md.
Your RICE enforcement rules must be reflected in this function's behaviour.
Classify a single complaint row based on keywords and safety rules.
"""
raise NotImplementedError("Build this using your AI tool + RICE prompt")
desc = row.get('description', '').lower()
complaint_id = row.get('complaint_id', 'Unknown')

# 1. Determine Category
category = "Other"
flag = ""

Comment on lines +13 to +19
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

classify_complaint doesn't implement the documented error handling for missing/empty descriptions (skills.md requires reason "Missing description" with NEEDS_REVIEW). With an empty description this currently produces a generic reason that claims the description mentions "other". Please add an early return for missing/blank description that matches the defined behavior.

Suggested change
desc = row.get('description', '').lower()
complaint_id = row.get('complaint_id', 'Unknown')
# 1. Determine Category
category = "Other"
flag = ""
complaint_id = row.get('complaint_id', 'Unknown')
raw_desc = row.get('description', '')
if raw_desc is None or not str(raw_desc).strip():
return {
"complaint_id": complaint_id,
"category": "Other",
"priority": "Standard",
"reason": "Missing description",
"flag": "NEEDS_REVIEW"
}
desc = str(raw_desc).lower()
# 1. Determine Category
category = "Other"
flag = ""

Copilot uses AI. Check for mistakes.
if "pothole" in desc:
category = "Pothole"
elif "flood" in desc or "rain" in desc:
category = "Flooding"
elif "streetlight" in desc or "lights" in desc or "dark" in desc:
if "heritage" in desc:
category = "Heritage Damage"
else:
category = "Streetlight"
elif "garbage" in desc or "waste" in desc or "dumped" in desc or "dead animal" in desc:
category = "Waste"
elif "noise" in desc or "music" in desc:
category = "Noise"
elif "road" in desc and ("crack" in desc or "sink" in desc or "damage" in desc):
category = "Road Damage"
elif "drain" in desc or "sewage" in desc:
category = "Drain Blockage"
elif "heritage" in desc:
category = "Heritage Damage"

if category == "Other":
flag = "NEEDS_REVIEW"

# 2. Determine Priority
severity_keywords = ['injury', 'child', 'school', 'hospital', 'ambulance', 'fire', 'hazard', 'fell', 'collapse']
priority = "Standard"
if any(word in desc for word in severity_keywords):
priority = "Urgent"
elif "low" in desc or "minor" in desc:
priority = "Low"

# 3. Generate Reason (One sentence, cite words)
# Simple logic to find the keyword that triggered the category/priority
trigger_words = []
for word in severity_keywords:
if word in desc:
trigger_words.append(word)

reason = f"Classified as {category} because description mentions '{category.lower()}'; prioritized as {priority} due to words like '{', '.join(trigger_words) if trigger_words else 'none'}'. "
# Ensure exactly one sentence (very simple)
reason = reason.strip()
Comment on lines +58 to +60
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason string claims the description mentions '{category.lower()}', but the exact category phrase often isn't present in the text (e.g., "Road Damage" vs "road surface cracked"). This violates the requirement to cite specific words from the description. Please capture and cite the actual triggering keyword(s) used for classification.

Copilot uses AI. Check for mistakes.

return {
"complaint_id": complaint_id,
"category": category,
"priority": priority,
"reason": reason,
"flag": flag
}


def batch_classify(input_path: str, output_path: str):
"""
Read input CSV, classify each row, write results CSV.

TODO: Build this using your AI tool.
Must: flag nulls, not crash on bad rows, produce output even if some rows fail.
"""
raise NotImplementedError("Build this using your AI tool + RICE prompt")
results = []
try:
with open(input_path, mode='r', encoding='utf-8') as f:
reader = csv.DictReader(f)
for row in reader:
try:
classified = classify_complaint(row)
results.append(classified)
except Exception as e:
print(f"Error classifying row {row.get('complaint_id')}: {e}")
results.append({
"complaint_id": row.get('complaint_id', 'Error'),
"category": "Other",
"priority": "Standard",
"reason": f"Error: {e}",
"flag": "NEEDS_REVIEW"
})
except FileNotFoundError:
print(f"Input file not found: {input_path}")
return

Comment on lines +92 to +95
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On FileNotFoundError, batch_classify returns without writing an output file. Because batch_classify doesn’t signal failure to the caller, the CLI code below can still print a success message even when nothing was written. Please return a boolean/exit code (or raise) from batch_classify and have __main__ only print success when the output was actually created.

Copilot uses AI. Check for mistakes.
if not results:
print("No results to write.")
return

keys = results[0].keys()
with open(output_path, mode='w', encoding='utf-8', newline='') as f:
writer = csv.DictWriter(f, fieldnames=keys)
writer.writeheader()
writer.writerows(results)


if __name__ == "__main__":
Expand Down
16 changes: 16 additions & 0 deletions uc-0a/results_pune.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
complaint_id,category,priority,reason,flag
PM-202401,Pothole,Standard,Classified as Pothole because description mentions 'pothole'; prioritized as Standard due to words like 'none'.,
PM-202402,Pothole,Urgent,"Classified as Pothole because description mentions 'pothole'; prioritized as Urgent due to words like 'child, school'.",
PM-202406,Flooding,Standard,Classified as Flooding because description mentions 'flooding'; prioritized as Standard due to words like 'none'.,
PM-202408,Flooding,Standard,Classified as Flooding because description mentions 'flooding'; prioritized as Standard due to words like 'none'.,
PM-202410,Streetlight,Standard,Classified as Streetlight because description mentions 'streetlight'; prioritized as Standard due to words like 'none'.,
PM-202411,Streetlight,Urgent,Classified as Streetlight because description mentions 'streetlight'; prioritized as Urgent due to words like 'hazard'.,
PM-202413,Waste,Low,Classified as Waste because description mentions 'waste'; prioritized as Low due to words like 'none'.,
PM-202418,Noise,Standard,Classified as Noise because description mentions 'noise'; prioritized as Standard due to words like 'none'.,
PM-202419,Road Damage,Standard,Classified as Road Damage because description mentions 'road damage'; prioritized as Standard due to words like 'none'.,
PM-202420,Other,Urgent,Classified as Other because description mentions 'other'; prioritized as Urgent due to words like 'injury'.,NEEDS_REVIEW
PM-202427,Flooding,Standard,Classified as Flooding because description mentions 'flooding'; prioritized as Standard due to words like 'none'.,
PM-202428,Waste,Standard,Classified as Waste because description mentions 'waste'; prioritized as Standard due to words like 'none'.,
PM-202430,Heritage Damage,Standard,Classified as Heritage Damage because description mentions 'heritage damage'; prioritized as Standard due to words like 'none'.,
PM-202433,Waste,Standard,Classified as Waste because description mentions 'waste'; prioritized as Standard due to words like 'none'.,
PM-202446,Other,Urgent,Classified as Other because description mentions 'other'; prioritized as Urgent due to words like 'fell'.,NEEDS_REVIEW
22 changes: 10 additions & 12 deletions uc-0a/skills.md
Original file line number Diff line number Diff line change
@@ -1,16 +1,14 @@
# skills.md
# INSTRUCTIONS: Generate a draft by prompting AI, then manually refine this file.
# Delete these comments before committing.

skills:
- name: [skill_name]
description: [One sentence — what does this skill do?]
input: [What does it receive? Type and format.]
output: [What does it return? Type and format.]
error_handling: [What does it do when input is invalid or ambiguous?]
- name: classify_complaint
description: Analyzes a single citizen complaint to determine its category, priority, and reason for classification.
input: A dictionary representing a row from the input CSV, containing at least 'complaint_id' and 'description'.
output: A dictionary with keys 'complaint_id', 'category', 'priority', 'reason', and 'flag'.
error_handling: If 'description' is missing or empty, returns category 'Other', priority 'Standard', reason 'Missing description', and flag 'NEEDS_REVIEW'.

- name: [second_skill_name]
description: [One sentence]
input: [Type and format]
output: [Type and format]
error_handling: [What does it do when input is invalid or ambiguous?]
- name: batch_classify
description: Processes an entire CSV file of complaints and saves the results to a new CSV file.
input: Paths to the input CSV file and the desired output CSV file.
output: None (writes to a file).
error_handling: Handles missing input files gracefully and ensures the output file is created even if some rows fail to classify.
Loading
Loading