Version: 1.0 · Date: 2026-01-17 · Status: APPROVED
Display what exists, not what we assume.
All OpenAdapt viewers MUST adhere to strict data fidelity standards to maintain user trust and scientific reproducibility.
❌ PROHIBITED:

```python
# WRONG: Making up descriptions
description = "Click System Settings icon in dock"  # ← WHERE DID THIS COME FROM?

# WRONG: Assuming intent without evidence
action = "User navigates to settings"  # ← DID THEY? HOW DO YOU KNOW?

# WRONG: Filling gaps with plausible values
if not event.description:
    event.description = "Unknown action"  # ← JUST LEAVE IT EMPTY!
```

✅ REQUIRED:

```python
# RIGHT: Use actual data from source
description = episode["steps"][i]  # ← From episodes.json

# RIGHT: Use raw events if no semantic description
description = f"{event.type} at ({event.x}, {event.y})"  # ← From capture.db

# RIGHT: Be explicit about missing data
description = None  # or ""  # ← Honest about what we don't have
```

Every piece of displayed data MUST indicate its source:
| Data Type | Provenance Label | Example |
|---|---|---|
| Hardware Event | RAW | `mouse.down at (1248, 701)` |
| ML-Inferred | ML-INFERRED (model, confidence) | `ML-INFERRED (GPT-4o, 0.92): "Click Settings icon"` |
| Human-Labeled | HUMAN-LABELED | `HUMAN-LABELED: "Turn off Night Shift"` |
| Derived | DERIVED (from: X) | `DERIVED (from: 13 mouse events): "13 clicks"` |
- **Data Source** = where the data comes from (file path, database, API)
- **Data Content** = what the data says (values, descriptions, metadata)
✅ Correct Understanding:
- Source: `turn-off-nightshift/episodes.json` (REAL file from actual recording)
- Content: "Click System Settings icon in dock" (ML-generated by GPT-4o)
- Provenance: ML-INFERRED (GPT-4o, confidence: 0.92)

❌ Incorrect Understanding:
- Source: Real episodes.json ✓
- Content: Synthetic/made-up ✗ (It's ML-inferred, not invented!)
If uncertain about the semantic meaning, default to displaying raw event data:

```python
# If we have a semantic description from ML
if episode.get("steps"):
    display = episode["steps"][i]
    provenance = f"ML-INFERRED ({episode['llm_model']}, {episode['boundary_confidence']:.2f})"
# If we only have raw events
elif event.type == "mouse.down":
    display = f"Mouse click at ({event.x}, {event.y})"
    provenance = "RAW"
# If we have neither
else:
    display = None  # Don't display anything
    provenance = None
```

All data MUST preserve its provenance metadata:
Required Metadata Fields:
- `source`: where the data came from (file, DB table, API)
- `provenance`: how the data was created (raw, ML, human, derived)
- `timestamp`: when the data was created/captured
- `confidence`: for ML-inferred data, include model confidence
- `model`: for ML-inferred data, include model name/version
Example:

```python
step = ExecutionStep(
    action_details={
        "description": "Click System Settings icon in dock",
        "source": "episodes.json",
        "provenance": "ml_inferred",
        "model": "gpt-4o",
        "confidence": 0.92,
        "timestamp": "2026-01-17T12:00:00.000000",
    }
)
```

```python
def load_real_capture_data(capture_path: Path) -> BenchmarkRun:
    """Load REAL data with proper provenance labeling."""
    # Load episodes (ML-generated semantic data)
    with open(capture_path / "episodes.json") as f:
        episodes_data = json.load(f)

    # Extract ML metadata
    ml_model = episodes_data.get("llm_model", "unknown")
    processing_timestamp = episodes_data.get("processing_timestamp", "unknown")

    for episode in episodes_data["episodes"]:
        for i, step_text in enumerate(episode["steps"]):
            step = ExecutionStep(
                action_type="ml_inferred",  # ← Honest provenance
                action_details={
                    "description": step_text,
                    "provenance": "ml_inferred",
                    "model": ml_model,
                    "confidence": episode["boundary_confidence"],
                    "processing_timestamp": processing_timestamp,
                },
                reasoning=f"ML interpretation ({ml_model}): {step_text}",
            )

    # Also provide raw event access for transparency
    conn = sqlite3.connect(capture_path / "capture.db")
    raw_events = load_raw_events(conn)  # Make raw data available

    return BenchmarkRun(
        tasks=tasks,
        executions=executions,
        config={
            "data_provenance": {
                "episodes_source": str(capture_path / "episodes.json"),
                "episodes_provenance": "ml_inferred",
                "episodes_model": ml_model,
                "raw_events_source": str(capture_path / "capture.db"),
                "raw_events_count": len(raw_events),
            }
        },
    )
```

```html
<!-- Show provenance badges -->
<div class="oa-action">
  <span class="oa-badge oa-badge-ml" title="Generated by GPT-4o with 92% confidence">
    ML-INFERRED
  </span>
  <span class="oa-action-details">
    Click System Settings icon in dock
  </span>
</div>

<!-- Provide raw data in expandable section -->
<details class="oa-raw-data">
  <summary>View Raw Event Data</summary>
  <pre>
Event Type: mouse.down
Coordinates: (1248.32, 701.73)
Timestamp: 1765672655.397
Button: left
  </pre>
</details>

<!-- Show metadata -->
<div class="oa-metadata">
  <div class="oa-metadata-item">
    <span class="oa-label">Model:</span>
    <span class="oa-value">GPT-4o</span>
  </div>
  <div class="oa-metadata-item">
    <span class="oa-label">Confidence:</span>
    <span class="oa-value">0.92</span>
  </div>
  <div class="oa-metadata-item">
    <span class="oa-label">Processed:</span>
    <span class="oa-value">2026-01-17 12:00:00</span>
  </div>
</div>
```

```css
/* Provenance badges */
.oa-badge-raw {
  background: var(--oa-info-bg);
  color: var(--oa-info);
}

.oa-badge-ml {
  background: var(--oa-accent-dim);
  color: var(--oa-accent);
}

.oa-badge-human {
  background: var(--oa-success-bg);
  color: var(--oa-success);
}

.oa-badge-derived {
  background: var(--oa-warning-bg);
  color: var(--oa-warning);
}
```

DO:
- Show `null`, `None`, or empty string
- Display "No data available"
- Hide the section entirely
DON'T:
- Fill with placeholder text like "Unknown"
- Make assumptions like "Probably clicked"
- Show "N/A" (implies data should exist but doesn't)
DO:
- Show all possible interpretations
- Display confidence scores
- Provide raw event data
DON'T:
- Pick the "most likely" option without indicating uncertainty
- Average or merge ambiguous values
- Hide low-confidence interpretations
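One way to honor the ambiguity rules above is to render every candidate interpretation with its confidence score instead of silently picking a winner. A sketch, with hypothetical names (`render_interpretations` and its tuple format are assumptions for illustration):

```python
def render_interpretations(candidates: list[tuple[str, float]]) -> list[str]:
    """Show every candidate interpretation with its confidence score.

    Never silently picks the "most likely" option, and never drops
    low-confidence candidates — the user decides what to trust.
    """
    # Sorting is for readability only; every candidate is still displayed.
    ordered = sorted(candidates, key=lambda c: c[1], reverse=True)
    return [f"{text} (confidence: {conf:.2f})" for text, conf in ordered]
```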
Every viewer MUST pass these tests:
```python
def test_data_lineage():
    """Verify every displayed value can be traced to source."""
    viewer = load_viewer("benchmark_viewer.html")
    for i, step in enumerate(viewer.steps):
        description = step.action_details["description"]
        # Can we find this in the source data?
        assert description in episodes_json["steps"] or \
            description == format_raw_event(capture_db_events[i])


def test_no_invented_data():
    """Verify no data was created by the viewer code."""
    viewer_data = extract_displayed_data("benchmark_viewer.html")
    source_data = load_all_source_data()
    for value in viewer_data:
        assert value in source_data.values() or \
            is_derived_from(value, source_data), \
            f"Invented data detected: {value}"


def test_provenance_labels():
    """Verify all data has provenance labels."""
    viewer = load_viewer("benchmark_viewer.html")
    for step in viewer.steps:
        assert "provenance" in step.action_details, \
            f"Missing provenance for step {step.step_number}"
        assert step.action_details["provenance"] in [
            "raw", "ml_inferred", "human_labeled", "derived"
        ], f"Invalid provenance: {step.action_details['provenance']}"
```

Every data loader MUST document:
- What data it loads (files, tables, APIs)
- How it transforms data (raw → semantic)
- What it DOESN'T invent (explicit list)
- Provenance labels used (raw, ML, human, derived)
```python
def load_real_capture_data(capture_path: Path) -> BenchmarkRun:
    """Load real capture data from openadapt-capture recording.

    DATA SOURCES:
    - capture.db: Raw hardware events (mouse, keyboard, screen)
    - episodes.json: ML-generated semantic episode descriptions

    DATA TRANSFORMATIONS:
    - Raw events → Count statistics (e.g., "1046 mouse moves")
    - Episodes → ExecutionStep objects (pass-through, no modification)

    DATA NOT INVENTED:
    - Step descriptions (from episodes.json, generated by GPT-4o)
    - Action types (from episodes.json)
    - Screenshots (from recording, not generated)

    PROVENANCE:
    - action_type: "ml_inferred" (from episodes.json)
    - model: "gpt-4o" (from episodes.json metadata)
    - confidence: 0.92 (from episodes.json boundary_confidence)
    """
```

❌ WRONG:
```html
<span>Click System Settings icon in dock</span>
```

✅ RIGHT:

```html
<span class="oa-badge-ml">ML-INFERRED (GPT-4o, 0.92)</span>
<span>Click System Settings icon in dock</span>
```

❌ WRONG:

```python
# Don't assume what the user was trying to do
description = "User opens settings to change display preferences"
```

✅ RIGHT:

```python
# Use what the ML model inferred, with confidence
description = episode["description"]  # "User opens System Settings application"
confidence = episode["boundary_confidence"]  # 0.92
```

❌ WRONG:

```python
# Don't make up data for missing screenshots
if not screenshot_path:
    screenshot_path = "placeholder.png"  # ← NO!
```

✅ RIGHT:

```python
# Be honest about missing data
if not screenshot_path:
    return None  # or display "No screenshot available"
```

Before merging any code that displays data, verify:
- All displayed values traced to source (episodes.json, capture.db, etc.)
- No hardcoded descriptions invented by code
- Provenance labels present (RAW, ML-INFERRED, HUMAN-LABELED, DERIVED)
- ML data includes model name + confidence
- Missing data shown as missing (not filled with placeholders)
- Documentation explains data sources and transformations
- Tests verify no invented data
- DATA_PIPELINE_ANALYSIS.md - Complete analysis of nightshift data flow
- real_data_loader.py - Reference implementation
- episodes.json schema - ML output format
If you're unsure whether something violates data fidelity:
- Ask: "Where did this value come from?"
- If the answer is "I calculated/inferred/assumed it" → Label as DERIVED or ML-INFERRED
- If the answer is "I made it up for demo purposes" → Use ONLY in test data, mark clearly
- If the answer is "It's in the source file" → Include source metadata
When in doubt, show raw.