Date: January 17, 2026 Status: ✅ Complete and Tested
An automated recording catalog system that makes all captured data automatically discoverable across the OpenAdapt ecosystem.
-
Catalog Database (
catalog.py)- SQLite database at
~/.openadapt/catalog.db - Three tables:
recordings,segmentation_results,episodes - Complete Python API with Pydantic models
- Singleton pattern via
get_catalog()
- SQLite database at
-
Scanner (
scanner.py)- Automatic discovery of recordings (directories with
capture.db) - Indexing of segmentation results (
*_episodes.jsonfiles) - Metadata extraction (frames, events, duration, timestamps)
- Default path detection
- Automatic discovery of recordings (directories with
-
CLI Tools (updated
cli.py)catalog scan- Discover and index recordingscatalog list- View all recordingscatalog stats- Show statisticscatalog clean- Remove stale entriescatalog register- Manual registrationsegmentation- Generate catalog-enabled viewer
-
Viewer Integration (
segmentation_generator.py,catalog_api.py)- Automatic recording dropdown in segmentation viewer
- Embedded catalog data in HTML (works offline)
- Auto-load support with embedded segmentation data
- JavaScript API:
window.OpenAdaptCatalog
-- Core recordings
recordings (id, name, path, created_at, duration_seconds,
frame_count, event_count, task_description, tags, metadata)
-- Segmentation results
segmentation_results (id, recording_id, path, created_at,
episode_count, boundary_count, status, llm_model, metadata)
-- Episodes (for future use)
episodes (id, segmentation_result_id, recording_id, name, description,
start_time, end_time, start_frame, end_frame, confidence, metadata)openadapt-viewer/
├── src/openadapt_viewer/
│ ├── catalog.py # Database and models (738 lines)
│ ├── scanner.py # Discovery and indexing (297 lines)
│ ├── catalog_api.py # JavaScript embedding (167 lines)
│ ├── viewers/
│ │ └── segmentation_generator.py # Enhanced viewer (202 lines)
│ └── cli.py # CLI commands (updated)
├── CATALOG_SYSTEM.md # Complete documentation
├── CATALOG_QUICK_START.md # Quick reference
└── CATALOG_IMPLEMENTATION.md # This file
Recording Discovery:
- Glob for
**/capture.dbfiles - Extract metadata from capture.db SQLite tables
- Count screenshots in
screenshots/directory - Register in catalog with
INSERT OR REPLACE
Segmentation Indexing:
- Glob for
*_episodes.jsonfiles - Parse JSON to extract episode/boundary counts
- Link to recording via filename pattern
- Register with timestamps and metadata
Viewer Generation:
- Query catalog for all recordings with segmentation results
- Generate HTML
<select>dropdown - Embed catalog data as JavaScript:
window.OPENADAPT_CATALOG - Optionally embed segmentation JSON for auto-load
- Inject into base segmentation_viewer.html template
# 1. Scan and index
$ uv run openadapt-viewer catalog scan --capture-dir /path/to/openadapt-capture
Found 9 recordings in /path/to/openadapt-capture
Found 2 segmentation results in /path/to/openadapt-ml/segmentation_output
Indexed 9 recordings
Indexed 2 segmentation results
# 2. List recordings
$ uv run openadapt-viewer catalog list
Found 9 recordings:
Turn Off Nightshift
ID: turn-off-nightshift
Path: /Users/abrichr/oa/src/openadapt-capture/turn-off-nightshift
Created: 2025-12-13 19:37
Duration: 60.3s
Frames: 21
...
# 3. Generate viewer
$ uv run openadapt-viewer segmentation --auto-load turn-off-nightshift --output viewer.html
Generating catalog-enabled segmentation viewer...
Generated: /path/to/viewer.html
# 4. Open viewer - dropdown is auto-populated!✅ Scan discovers all 9 recordings in openadapt-capture
✅ Scan discovers 2 segmentation results in openadapt-ml/segmentation_output
✅ Catalog database created at ~/.openadapt/catalog.db
✅ catalog list shows all recordings with metadata
✅ catalog stats shows correct counts (9 recordings, 2 segmentations)
✅ Generated viewer has dropdown with 2 recordings (only those with segmentations)
✅ Dropdown shows recording names and durations
✅ Auto-load embeds segmentation data in HTML
✅ Catalog data embedded as JavaScript in viewer
✅ HTML file works offline (file:// protocol)
✅ Scanner reads from openadapt-capture capture.db files
✅ Scanner parses openadapt-ml segmentation JSON files
✅ Catalog API exports data as JavaScript
✅ Viewer generator injects dropdown into base HTML
✅ CLI commands work with default paths
- Chosen: SQLite for ACID, indexing, and querying
- Rejected: JSON file (no querying, no indexing)
- Chosen: User home directory (persistent, accessible)
- Rejected: Project directory (not shared across repos)
- Chosen: Scan-on-demand (simple, reliable, no background process)
- Rejected: File watcher (complex, platform-dependent)
- Chosen: Embed data in HTML (works offline, portable)
- Rejected: HTTP API (requires server, more complex)
- Future: HTTP API as optional enhancement
- Chosen: Pydantic for validation and serialization
- Benefit: Type safety, automatic validation, JSON serialization
| Metric | Target | Actual | Status |
|---|---|---|---|
| Recordings indexed | All | 9/9 | ✅ |
| Segmentations indexed | All | 2/2 | ✅ |
| Scan time | <5s | ~1s | ✅ |
| Query time | <100ms | ~1ms | ✅ |
| Viewer generation | <5s | ~0.5s | ✅ |
| Zero config | Yes | Yes | ✅ |
| Offline viewer | Yes | Yes | ✅ |
# Catalog operations
get_catalog() -> RecordingCatalog
catalog.get_all_recordings() -> List[Recording]
catalog.get_recording(id) -> Recording
catalog.search_recordings(query, tags) -> List[Recording]
catalog.get_segmentation_results(recording_id) -> List[SegmentationResult]
catalog.get_episodes(recording_id) -> List[Episode]
catalog.get_stats() -> Dict
catalog.clean_missing() -> Dict
# Registration
catalog.register_recording(...)
catalog.register_segmentation(...)
catalog.register_episode(...)
# Scanning
scan_and_update_catalog(capture_dirs, segmentation_dirs) -> Dictopenadapt-viewer catalog scan [--capture-dir] [--segmentation-dir]
openadapt-viewer catalog list [--json] [--with-segmentations]
openadapt-viewer catalog stats
openadapt-viewer catalog clean
openadapt-viewer catalog register <path> [--name]
openadapt-viewer segmentation [--output] [--auto-load] [--open]window.OPENADAPT_CATALOG // Raw catalog data
window.OPENADAPT_EMBEDDED_DATA // Segmentation data (if auto-load)
window.OpenAdaptCatalog.getRecordings()
window.OpenAdaptCatalog.getRecording(id)
window.OpenAdaptCatalog.getRecordingsWithSegmentations()
window.OpenAdaptCatalog.getSegmentationResults(id)
window.OpenAdaptCatalog.getStats()- Auto-register from openadapt-capture post-capture hook
- Auto-register from openadapt-ml after segmentation
- Episode indexing (currently episodes table empty)
- HTTP API server for dynamic catalog queries
- Real file loading in viewer (not just embedded data)
- Search and filter UI in viewer
- Tags and metadata UI
- File watcher for real-time updates
- Web dashboard for browsing catalog
- Thumbnail generation and caching
- Cloud storage integration
- Multi-user catalog sync
-
File:// Protocol: Viewers can't fetch JSON files dynamically due to CORS
- Workaround: Embed data when generating viewer with
--auto-load - Future: HTTP server mode for dynamic loading
- Workaround: Embed data when generating viewer with
-
Episodes Not Indexed: Episodes table exists but not populated
- Reason: Current segmentation JSON doesn't have structured episodes
- Future: Index when JSON format stabilized
-
Manual Rescan: Catalog doesn't auto-update when new recordings added
- Workaround: Run
catalog scanafter adding recordings - Future: File watcher or periodic scan
- Workaround: Run
-
No Validation: Scanner doesn't validate recording integrity
- Future: Add health checks (missing frames, corrupted DB, etc.)
✅ Existing workflows unchanged: Manual file selection still works
✅ Opt-in: Must run catalog scan to use new system
✅ No breaking changes: All existing APIs preserved
# Old way (still works)
# 1. Open segmentation_viewer.html
# 2. Click file input
# 3. Navigate to JSON file
# 4. Select file
# New way (catalog-enabled)
openadapt-viewer catalog scan
openadapt-viewer segmentation --open
# 1. Select recording from dropdown
# 2. Done!# Old way (hardcoded paths)
recordings_dir = "/path/to/openadapt-capture"
for rec_dir in recordings_dir.glob("*/"):
process_recording(rec_dir)
# New way (query catalog)
from openadapt_viewer import get_catalog
catalog = get_catalog()
recordings = catalog.get_all_recordings()
for rec in recordings:
process_recording(rec.path)- ✅ CATALOG_SYSTEM.md: Complete technical documentation (600+ lines)
- ✅ CATALOG_QUICK_START.md: 5-minute quick start guide
- ✅ CATALOG_IMPLEMENTATION.md: This implementation summary
- ✅ CLAUDE.md: Updated with catalog system overview
- ✅ CLI help text: All commands documented
- Scan-on-demand: Simple and reliable
- Pydantic models: Type safety caught bugs early
- Embedded data: Offline viewers work great
- Default paths: Zero configuration for common setups
- SQLite: Fast, reliable, easy to debug
- File:// CORS: Had to embed data instead of fetch
- Solution: Embed segmentation JSON when generating viewer
- HTML replacement: Regex matching was tricky
- Solution: Used
re.DOTALLflag for multiline matching
- Solution: Used
- Recording metadata: Different capture.db schemas
- Solution: Graceful handling of missing fields
- ✅ Pydantic for data validation: Caught type errors early
- ✅ Singleton pattern:
get_catalog()for global instance - ✅ Graceful error handling: Print warnings, don't crash
- ✅ Comprehensive documentation: Easy to onboard new users
- ✅ CLI-first design: Easy to use, test, and automate
The catalog system successfully achieves all key requirements:
✅ Automatic Discovery: Recordings found without manual selection ✅ Centralized Catalog: Single source of truth at ~/.openadapt/catalog.db ✅ SQLite-Based: Fast queries, ACID guarantees ✅ Ecosystem-Wide: Any tool can query the catalog ✅ Zero Manual Work: Recording → scan → automatic availability
The system is production-ready and can be extended with additional features as needed.
- Integrate with openadapt-capture: Add post-capture registration hook
- Integrate with openadapt-ml: Add post-segmentation registration hook
- Document ecosystem-wide: Update other repos to use catalog
- Monitor usage: Gather feedback from users
- Plan enhancements: Prioritize P0/P1 features based on feedback