Data Fidelity Policy

Version: 1.0
Date: 2026-01-17
Status: APPROVED

Core Principle

Display what exists, not what we assume.

All OpenAdapt viewers MUST adhere to strict data fidelity standards to maintain user trust and scientific reproducibility.

Policy Statement

1. NEVER Invent Data

PROHIBITED:

# WRONG: Making up descriptions
description = "Click System Settings icon in dock"  # ← WHERE DID THIS COME FROM?

# WRONG: Assuming intent without evidence
action = "User navigates to settings"  # ← DID THEY? HOW DO YOU KNOW?

# WRONG: Filling gaps with plausible values
if not event.description:
    event.description = "Unknown action"  # ← JUST LEAVE IT EMPTY!

REQUIRED:

# RIGHT: Use actual data from source
description = episode["steps"][i]  # ← From episodes.json

# RIGHT: Use raw events if no semantic description
description = f"{event.type} at ({event.x}, {event.y})"  # ← From capture.db

# RIGHT: Be explicit about missing data
description = None  # or ""  # ← Honest about what we don't have

2. ALWAYS Label Provenance

Every piece of displayed data MUST indicate its source:

Data Type | Provenance Label | Example
Hardware Event | RAW | mouse.down at (1248, 701)
ML-Inferred | ML-INFERRED (model, confidence) | ML-INFERRED (GPT-4o, 0.92): "Click Settings icon"
Human-Labeled | HUMAN-LABELED | HUMAN-LABELED: "Turn off Night Shift"
Derived | DERIVED (from: X) | DERIVED (from: 13 mouse events): "13 clicks"
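
The labels in the table above can be produced by a small helper. This is an illustrative sketch, not part of any OpenAdapt API; the function name and signature are assumptions.

```python
def format_provenance_label(provenance, model=None, confidence=None, derived_from=None):
    """Format a provenance badge string per the table above.

    Hypothetical helper for illustration only; the label formats
    match the Provenance Label column.
    """
    if provenance == "raw":
        return "RAW"
    if provenance == "ml_inferred":
        # ML data must always carry model name and confidence
        return f"ML-INFERRED ({model}, {confidence:.2f})"
    if provenance == "human_labeled":
        return "HUMAN-LABELED"
    if provenance == "derived":
        return f"DERIVED (from: {derived_from})"
    raise ValueError(f"Unknown provenance: {provenance}")
```

Keeping label construction in one place makes it harder for a viewer to display ML-inferred text without its model and confidence attached.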

3. Distinguish Data Source vs Data Content

Data Source = Where the data comes from (file path, database, API)
Data Content = What the data says (values, descriptions, metadata)

Example: The Nightshift Recording

Correct Understanding:

  • Source: turn-off-nightshift/episodes.json (REAL file from actual recording)
  • Content: "Click System Settings icon in dock" (ML-generated by GPT-4o)
  • Provenance: ML-INFERRED (GPT-4o, confidence: 0.92)

Incorrect Understanding:

  • Source: Real episodes.json ✓
  • Content: Synthetic/made-up ✗ (It's ML-inferred, not invented!)

4. When in Doubt, Show Raw

If uncertain about the semantic meaning, default to displaying raw event data:

# If we have a semantic description from ML
if episode.get("steps"):
    display = episode["steps"][i]
    provenance = f"ML-INFERRED ({episode['llm_model']}, {episode['boundary_confidence']:.2f})"

# If we only have raw events
elif event.type == "mouse.down":
    display = f"Mouse click at ({event.x}, {event.y})"
    provenance = "RAW"

# If we have neither
else:
    display = None  # Don't display anything
    provenance = None

5. Preserve Metadata

All data MUST preserve its provenance metadata:

Required Metadata Fields:

  • source: Where the data came from (file, DB table, API)
  • provenance: How the data was created (raw, ML, human, derived)
  • timestamp: When the data was created/captured
  • confidence: For ML-inferred data, include model confidence
  • model: For ML-inferred data, include model name/version

Example:

step = ExecutionStep(
    action_details={
        "description": "Click System Settings icon in dock",
        "source": "episodes.json",
        "provenance": "ml_inferred",
        "model": "gpt-4o",
        "confidence": 0.92,
        "timestamp": "2026-01-17T12:00:00.000000",
    }
)
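
The required-fields list above can be enforced with a small validator. A minimal sketch, assuming `action_details` is a plain dict; the function name is hypothetical.

```python
REQUIRED_FIELDS = {"source", "provenance", "timestamp"}
ML_FIELDS = {"model", "confidence"}  # additionally required for ML-inferred data

def validate_metadata(details):
    """Return the sorted list of missing provenance fields.

    Illustrative helper; field names follow the Required Metadata
    Fields list above. An empty return value means the metadata
    is complete.
    """
    missing = REQUIRED_FIELDS - details.keys()
    if details.get("provenance") == "ml_inferred":
        missing |= ML_FIELDS - details.keys()
    return sorted(missing)
```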

Implementation Guidelines

For real_data_loader.py

import json
import sqlite3
from pathlib import Path

def load_real_capture_data(capture_path: Path) -> BenchmarkRun:
    """Load REAL data with proper provenance labeling."""

    # Load episodes (ML-generated semantic data)
    with open(capture_path / "episodes.json") as f:
        episodes_data = json.load(f)

    # Extract ML metadata
    ml_model = episodes_data.get("llm_model", "unknown")
    processing_timestamp = episodes_data.get("processing_timestamp", "unknown")

    tasks, executions = [], []  # task construction elided from this sketch
    for episode in episodes_data["episodes"]:
        for step_text in episode["steps"]:
            executions.append(ExecutionStep(
                action_type="ml_inferred",  # ← Honest provenance
                action_details={
                    "description": step_text,
                    "provenance": "ml_inferred",
                    "model": ml_model,
                    "confidence": episode["boundary_confidence"],
                    "processing_timestamp": processing_timestamp,
                },
                reasoning=f"ML interpretation ({ml_model}): {step_text}",
            ))

    # Also provide raw event access for transparency
    conn = sqlite3.connect(capture_path / "capture.db")
    raw_events = load_raw_events(conn)  # Make raw data available

    return BenchmarkRun(
        tasks=tasks,
        executions=executions,
        config={
            "data_provenance": {
                "episodes_source": str(capture_path / "episodes.json"),
                "episodes_provenance": "ml_inferred",
                "episodes_model": ml_model,
                "raw_events_source": str(capture_path / "capture.db"),
                "raw_events_count": len(raw_events),
            }
        }
    )

For Viewer HTML

<!-- Show provenance badges -->
<div class="oa-action">
    <span class="oa-badge oa-badge-ml" title="Generated by GPT-4o with 92% confidence">
        ML-INFERRED
    </span>
    <span class="oa-action-details">
        Click System Settings icon in dock
    </span>
</div>

<!-- Provide raw data in expandable section -->
<details class="oa-raw-data">
    <summary>View Raw Event Data</summary>
    <pre>
Event Type: mouse.down
Coordinates: (1248.32, 701.73)
Timestamp: 1765672655.397
Button: left
    </pre>
</details>

<!-- Show metadata -->
<div class="oa-metadata">
    <div class="oa-metadata-item">
        <span class="oa-label">Model:</span>
        <span class="oa-value">GPT-4o</span>
    </div>
    <div class="oa-metadata-item">
        <span class="oa-label">Confidence:</span>
        <span class="oa-value">0.92</span>
    </div>
    <div class="oa-metadata-item">
        <span class="oa-label">Processed:</span>
        <span class="oa-value">2026-01-17 12:00:00</span>
    </div>
</div>

CSS for Provenance Badges

/* Provenance badges */
.oa-badge-raw {
    background: var(--oa-info-bg);
    color: var(--oa-info);
}

.oa-badge-ml {
    background: var(--oa-accent-dim);
    color: var(--oa-accent);
}

.oa-badge-human {
    background: var(--oa-success-bg);
    color: var(--oa-success);
}

.oa-badge-derived {
    background: var(--oa-warning-bg);
    color: var(--oa-warning);
}

Handling Missing Data

When Data Doesn't Exist

DO:

  • Show null, None, or empty string
  • Display "No data available"
  • Hide the section entirely

DON'T:

  • Fill with placeholder text like "Unknown"
  • Make assumptions like "Probably clicked"
  • Show "N/A" (implies data should exist but doesn't)
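
Following the rules above, a minimal rendering helper might look like this. The name `render_field` is illustrative, not an OpenAdapt API.

```python
def render_field(value):
    """Return display text for a possibly-missing value.

    Returning None signals "hide this field entirely"; we never
    substitute placeholders like "Unknown" or "N/A".
    """
    if value is None or value == "":
        return None  # caller hides the field or shows "No data available"
    return str(value)
```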

When Data Is Ambiguous

DO:

  • Show all possible interpretations
  • Display confidence scores
  • Provide raw event data

DON'T:

  • Pick the "most likely" option without indicating uncertainty
  • Average or merge ambiguous values
  • Hide low-confidence interpretations
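
A sketch of the DO side: render every candidate interpretation with its confidence instead of silently picking the top one. The `(description, confidence)` pair structure is an assumption for illustration.

```python
def render_interpretations(candidates):
    """Render all candidate interpretations, highest confidence first.

    `candidates` is a list of (description, confidence) pairs.
    Nothing is dropped, merged, or averaged — low-confidence
    interpretations stay visible.
    """
    ranked = sorted(candidates, key=lambda c: -c[1])
    return "\n".join(f"{desc} (confidence: {conf:.2f})" for desc, conf in ranked)
```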

Testing Data Fidelity

Every viewer MUST pass these tests:

Test 1: Trace Data Lineage

def test_data_lineage():
    """Verify every displayed value can be traced to source."""
    viewer = load_viewer("benchmark_viewer.html")
    for i, step in enumerate(viewer.steps):
        description = step.action_details["description"]

        # Can we find this in the source data?
        assert description in episodes_json["steps"] or \
               description == format_raw_event(capture_db_events[i])

Test 2: No Invented Data

def test_no_invented_data():
    """Verify no data was created by the viewer code."""
    viewer_data = extract_displayed_data("benchmark_viewer.html")
    source_data = load_all_source_data()

    for value in viewer_data:
        assert value in source_data.values() or \
               is_derived_from(value, source_data), \
               f"Invented data detected: {value}"

Test 3: Provenance Labels Present

def test_provenance_labels():
    """Verify all data has provenance labels."""
    viewer = load_viewer("benchmark_viewer.html")

    for step in viewer.steps:
        assert "provenance" in step.action_details, \
               f"Missing provenance for step {step.step_number}"

        assert step.action_details["provenance"] in [
            "raw", "ml_inferred", "human_labeled", "derived"
        ], f"Invalid provenance: {step.action_details['provenance']}"

Documentation Requirements

Every data loader MUST document:

  1. What data it loads (files, tables, APIs)
  2. How it transforms data (raw → semantic)
  3. What it DOESN'T invent (explicit list)
  4. Provenance labels used (raw, ML, human, derived)

Example Documentation

def load_real_capture_data(capture_path: Path) -> BenchmarkRun:
    """Load real capture data from openadapt-capture recording.

    DATA SOURCES:
    - capture.db: Raw hardware events (mouse, keyboard, screen)
    - episodes.json: ML-generated semantic episode descriptions

    DATA TRANSFORMATIONS:
    - Raw events → Count statistics (e.g., "1046 mouse moves")
    - Episodes → ExecutionStep objects (pass-through, no modification)

    DATA NOT INVENTED:
    - Step descriptions (from episodes.json, generated by GPT-4o)
    - Action types (from episodes.json)
    - Screenshots (from recording, not generated)

    PROVENANCE:
    - action_type: "ml_inferred" (from episodes.json)
    - model: "gpt-4o" (from episodes.json metadata)
    - confidence: 0.92 (from episodes.json boundary_confidence)
    """

Common Violations

Violation 1: Hiding Provenance

WRONG:

<span>Click System Settings icon in dock</span>

RIGHT:

<span class="oa-badge-ml">ML-INFERRED (GPT-4o, 0.92)</span>
<span>Click System Settings icon in dock</span>

Violation 2: Assuming Intent

WRONG:

# Don't assume what the user was trying to do
description = "User opens settings to change display preferences"

RIGHT:

# Use what the ML model inferred, with confidence
description = episode["description"]  # "User opens System Settings application"
confidence = episode["boundary_confidence"]  # 0.92

Violation 3: Filling Gaps

WRONG:

# Don't make up data for missing screenshots
if not screenshot_path:
    screenshot_path = "placeholder.png"  # ← NO!

RIGHT:

# Be honest about missing data
if not screenshot_path:
    return None  # or display "No screenshot available"

Review Checklist

Before merging any code that displays data, verify:

  • All displayed values traced to source (episodes.json, capture.db, etc.)
  • No hardcoded descriptions invented by code
  • Provenance labels present (RAW, ML-INFERRED, HUMAN-LABELED, DERIVED)
  • ML data includes model name + confidence
  • Missing data shown as missing (not filled with placeholders)
  • Documentation explains data sources and transformations
  • Tests verify no invented data

Questions?

If you're unsure whether something violates data fidelity:

  1. Ask: "Where did this value come from?"
  2. If the answer is "I calculated/inferred/assumed it" → Label as DERIVED or ML-INFERRED
  3. If the answer is "I made it up for demo purposes" → Use ONLY in test data, mark clearly
  4. If the answer is "It's in the source file" → Include source metadata
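
The decision steps above can be captured in a tiny helper. Everything here is hypothetical: the function name and the `origin` categories are assumptions made for illustration, not part of any OpenAdapt API.

```python
def classify_provenance(origin):
    """Map the "where did this value come from?" answer to a label.

    Mirrors the checklist above. Made-up demo values are rejected
    outright — they belong only in clearly marked test data.
    """
    mapping = {
        "source_file": "raw",           # it's in the source file
        "ml_inference": "ml_inferred",  # a model inferred it
        "computed": "derived",          # we calculated it from other data
        "human": "human_labeled",       # a person labeled it
    }
    if origin not in mapping:
        raise ValueError(f"Unknown origin {origin!r}: "
                         "made-up values belong only in marked test data")
    return mapping[origin]
```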

When in doubt, show raw.