Commit 6dc0b91

jmcentire and claude committed
Add story-level learning with journey evaluation and phase tracking
Story/StoryStep models, StoryCollector for JSONL persistence, JourneyEvaluator (goal completion, step efficiency, backtracking, consistency), per-journey PhaseTracker, and ApprenticeStoryLearner orchestrator. All opt-in, backward compatible. 2791 tests passing (153 new).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent bde8b11 commit 6dc0b91

19 files changed

Lines changed: 3261 additions & 442 deletions

.constrain/sessions/c50e5d20-0809-4064-b922-256018ca9572.json

Lines changed: 138 additions & 0 deletions
Large diffs are not rendered by default.

component_map.yaml

Lines changed: 60 additions & 113 deletions
@@ -1,113 +1,60 @@
-version: "1"
-project: apprentice
-components:
-  - id: config_loader
-    role: library
-    authority: configuration
-    data_access:
-      PUBLIC: read
-
-  - id: task_registry
-    role: library
-    authority: task_definitions
-    data_access:
-      PUBLIC: read_write
-
-  - id: router
-    role: library
-    authority: traffic_routing
-    data_access:
-      PUBLIC: read_write
-
-  - id: remote_api_client
-    role: library
-    authority: frontier_api
-    data_access:
-      PUBLIC: read_write
-      AUTH: read
-
-  - id: local_model_server
-    role: library
-    authority: local_inference
-    data_access:
-      PUBLIC: read_write
-
-  - id: evaluators
-    role: library
-    authority: quality_scoring
-    data_access:
-      PUBLIC: read
-
-  - id: phase_manager
-    role: library
-    authority: phase_transitions
-    data_access:
-      PUBLIC: read_write
-
-  - id: training_data_store
-    role: library
-    authority: training_data
-    data_access:
-      PUBLIC: read_write
-      PII: write
-
-  - id: pii_tokenizer
-    role: library
-    authority: pii_protection
-    data_access:
-      PII: read_write
-
-  - id: fine_tuning_orchestrator
-    role: library
-    authority: model_training
-    data_access:
-      PUBLIC: read_write
-
-  - id: budget_manager
-    role: library
-    authority: cost_tracking
-    data_access:
-      FINANCIAL: read_write
-
-  - id: audit_log
-    role: library
-    authority: audit_trail
-    data_access:
-      PUBLIC: write
-      COMPLIANCE: write
-
-  - id: cli
-    role: ingress
-    protocol: cli
-    authority: user_commands
-    data_access:
-      PUBLIC: read_write
-
-edges:
-  - from: cli
-    to: router
-    tier: internal
-  - from: router
-    to: remote_api_client
-    tier: cross_boundary
-  - from: router
-    to: local_model_server
-    tier: cross_boundary
-  - from: router
-    to: phase_manager
-    tier: internal
-  - from: router
-    to: budget_manager
-    tier: internal
-  - from: training_data_store
-    to: pii_tokenizer
-    tier: internal
-  - from: fine_tuning_orchestrator
-    to: training_data_store
-    tier: internal
-  - from: phase_manager
-    to: evaluators
-    tier: internal
-  - from: router
-    to: audit_log
-    tier: internal
+core_models:
+  Story:
+    module: "apprentice.models.story"
+    dependencies: ["pydantic", "typing", "datetime"]
+    purpose: "Multi-step narrative representation with metadata"
+
+  StoryStep:
+    module: "apprentice.models.story_step"
+    dependencies: ["pydantic", "TrainingExample"]
+    purpose: "Individual step within story journey"
+
+collection_layer:
+  StoryCollector:
+    module: "apprentice.collectors.story_collector"
+    dependencies: ["Story", "StoryStep", "Chronicler"]
+    purpose: "Aggregate and process story data from Chronicler"
+    interfaces:
+      - collect_story_events()
+      - validate_story_consistency()
+      - emit_training_examples()
+
+evaluation_layer:
+  JourneyEvaluator:
+    module: "apprentice.evaluators.journey_evaluator"
+    dependencies: ["Story", "metrics"]
+    purpose: "Analyze journey patterns and efficiency"
+    interfaces:
+      - evaluate_journey_completion()
+      - measure_step_efficiency()
+      - detect_backtracking()
+      - score_consistency()
+
+orchestration:
+  EnhancedPhaseManager:
+    module: "apprentice.orchestration.phase_manager"
+    dependencies: ["existing PhaseManager", "JourneyEvaluator"]
+    purpose: "Per-journey-type phase transition tracking"
+    extension_points:
+      - journey_type_registration()
+      - phase_transition_callbacks()
+      - journey_specific_metrics()
+
+configuration:
+  StoryLearningConfig:
+    module: "apprentice.config.story_learning"
+    dependencies: ["pydantic", "base config"]
+    purpose: "Story learning feature configuration"
+    fields:
+      - story_learning_enabled: bool = False
+      - max_story_length: int = 50
+      - story_retention_days: int = 30
+
+integration_points:
+  existing_atomic_router:
+    modification: "none - preserved as-is"
+    integration: "parallel story collection when enabled"
+
+  training_orchestrator:
+    modification: "extended to handle Story objects"
+    backward_compatibility: "TrainingExample processing unchanged"
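
The Story, StoryStep, and StoryCollector entries above could be sketched roughly as follows. This is a minimal illustration, not the code from this commit: it uses stdlib frozen dataclasses as a stand-in for the Pydantic v2 frozen models so it runs without third-party packages, and every field name (`story_id`, `journey_type`, `index`, `prompt`, `response`) plus the `store`, `load_all`, and `to_training_pairs` names are assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass(frozen=True)
class StoryStep:
    """One step of a journey (hypothetical fields)."""
    index: int
    prompt: str
    response: str


@dataclass(frozen=True)
class Story:
    """A multi-step narrative for one journey type (hypothetical fields)."""
    story_id: str
    journey_type: str
    steps: tuple[StoryStep, ...]


class StoryCollector:
    """Appends stories to a JSONL file, one JSON object per line."""

    def __init__(self, path: Path) -> None:
        # The path is passed in explicitly; no module-level state is used.
        self._path = path

    def store(self, story: Story) -> None:
        with self._path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(asdict(story)) + "\n")

    def load_all(self) -> list[Story]:
        if not self._path.exists():
            return []
        stories: list[Story] = []
        with self._path.open(encoding="utf-8") as fh:
            for line in fh:
                if not line.strip():
                    continue
                raw = json.loads(line)
                steps = tuple(StoryStep(**s) for s in raw["steps"])
                stories.append(Story(raw["story_id"], raw["journey_type"], steps))
        return stories


def to_training_pairs(story: Story) -> list[tuple[str, str]]:
    """Turn each step into a (prompt, response) pair; earlier turns become context."""
    pairs: list[tuple[str, str]] = []
    history: list[str] = []
    for step in sorted(story.steps, key=lambda s: s.index):
        context = "\n".join(history + [f"user: {step.prompt}"])
        pairs.append((context, step.response))
        history += [f"user: {step.prompt}", f"assistant: {step.response}"]
    return pairs
```

Emitting one pair per step keeps the output compatible with a fine-tuning path that expects single (user_prompt, assistant_response) pairs, while the accumulated context carries the sequential information.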

constraints.yaml

Lines changed: 26 additions & 72 deletions
Original file line numberDiff line numberDiff line change
@@ -1,72 +1,26 @@
1-
version: "1"
2-
project: apprentice
3-
constraints:
4-
- id: C001
5-
name: pii_tokenization
6-
description: All training data must be PII-tokenized before storage
7-
severity: must
8-
classification: PII
9-
rationale: Raw PII in training data creates compliance liability
10-
11-
- id: C002
12-
name: phase_validation
13-
description: Phase transitions require statistical validation (correlation threshold) — no manual promotion
14-
severity: must
15-
classification: null
16-
rationale: Premature promotion degrades user experience
17-
18-
- id: C003
19-
name: budget_enforcement
20-
description: API budget exhaustion must degrade gracefully (fall back to local model, never crash)
21-
severity: must
22-
classification: FINANCIAL
23-
rationale: Budget overruns are unacceptable; service must continue
24-
25-
- id: C004
26-
name: audit_append_only
27-
description: Audit log is append-only JSONL with UTC timestamps — no updates or deletes
28-
severity: must
29-
classification: COMPLIANCE
30-
rationale: Audit trail integrity is required for debugging and compliance
31-
32-
- id: C005
33-
name: no_global_state
34-
description: No global state — all dependencies passed explicitly
35-
severity: must
36-
classification: null
37-
rationale: Global state prevents testing and makes composition impossible
38-
39-
- id: C006
40-
name: shadow_phase_required
41-
description: New tasks must start in shadow phase (100% frontier, local runs in background)
42-
severity: must
43-
classification: null
44-
rationale: Local model quality is unknown until shadow phase proves correlation
45-
46-
- id: C007
47-
name: external_boundary_abstraction
48-
description: All external boundaries (APIs, model servers, I/O) must be behind abstract interfaces
49-
severity: must
50-
classification: null
51-
rationale: Enables testing without network/GPU access
52-
53-
- id: C008
54-
name: pact_key_traceability
55-
description: Source modules with PACT keys must maintain them through code changes
56-
severity: should
57-
classification: null
58-
rationale: PACT keys enable production attribution via Sentinel
59-
60-
- id: C009
61-
name: config_fail_fast
62-
description: Invalid configuration must cause immediate failure with clear error message
63-
severity: must
64-
classification: null
65-
rationale: Silent config errors cause hard-to-diagnose runtime failures
66-
67-
- id: C010
68-
name: test_isolation
69-
description: All tests must run without GPU, API keys, or network access
70-
severity: must
71-
classification: null
72-
rationale: CI environments don't have GPUs or API keys
1+
backward_compatibility:
2+
atomic_routing: "must remain completely unaffected"
3+
existing_tests: "all 2628 tests must pass without modification"
4+
api_contracts: "no breaking changes to existing interfaces"
5+
6+
configuration:
7+
story_learning:
8+
enabled_flag: "story_learning_enabled: true"
9+
default_state: false
10+
opt_in_required: true
11+
12+
technical_constraints:
13+
python_version: "3.12+"
14+
pydantic_version: "v2"
15+
new_dependencies: "strictly prohibited"
16+
frozen_models: "constraint must be maintained"
17+
18+
data_handling:
19+
privacy: "story data retention must respect user privacy rights"
20+
storage: "efficient storage patterns for multi-step narratives required"
21+
consistency: "Chronicler and StoryCollector must maintain state agreement"
22+
23+
performance:
24+
existing_performance: "atomic task performance must not degrade"
25+
story_overhead: "story learning overhead must be minimal when disabled"
26+
memory_usage: "efficient memory management for long stories required"
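
One way to read the opt-in and "minimal overhead when disabled" constraints above is a single early-return guard in front of story collection. The sketch below is illustrative only. The three config fields and their defaults are taken from the `StoryLearningConfig` entry in component_map.yaml; the `maybe_collect` helper, the list-based sink, and the choice to skip over-length stories are assumptions, and a frozen stdlib dataclass stands in for the Pydantic v2 model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoryLearningConfig:
    """Stand-in for the Pydantic config; defaults keep the feature off."""
    story_learning_enabled: bool = False
    max_story_length: int = 50
    story_retention_days: int = 30

    def __post_init__(self) -> None:
        # Fail fast on invalid values instead of misbehaving at runtime.
        if self.max_story_length < 1:
            raise ValueError("max_story_length must be >= 1")
        if self.story_retention_days < 1:
            raise ValueError("story_retention_days must be >= 1")


def maybe_collect(config: StoryLearningConfig, steps: list[str], sink: list[list[str]]) -> bool:
    """Record a story only when the feature is enabled; return whether it was recorded."""
    if not config.story_learning_enabled:
        # Disabled path: one boolean check, no allocation, atomic routing untouched.
        return False
    if len(steps) > config.max_story_length:
        # Assumption: over-length stories are skipped rather than truncated.
        return False
    sink.append(list(steps))
    return True
```

With the default config, `maybe_collect` returns `False` and leaves the sink empty, which is the behavior existing atomic-routing callers would see.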

pact.yaml

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,5 @@
1-
budget: 50.00
1+
budget: 20.00
2+
plan_only: true
23

34
backend: anthropic
45
model: claude-opus-4-6

primer_story.md

Lines changed: 31 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,31 @@
1+
# Apprentice: Story-Level Learning
2+
3+
## What This Is
4+
5+
A targeted modification to Apprentice (adaptive model distillation) to support learning from multi-step stories instead of only atomic request/response pairs.
6+
7+
## Current State
8+
9+
Apprentice routes requests between frontier API and local model. Training data is collected as TrainingExample objects (request_id, task_type, prompt, remote_response, local_response, phase, confidence). The fine-tuning orchestrator expects single (user_prompt, assistant_response) pairs. Phase transitions are per-task-type.
10+
11+
Evaluators score individual responses (exact_match, semantic_similarity, structured_match, llm_judge, custom). No multi-step evaluation exists.
12+
13+
## What Changes
14+
15+
1. Add Story and StoryStep models to data_models.py
16+
2. Add StoryCollector to training_data_store.py (store/retrieve stories, convert steps to sequential training examples)
17+
3. Add JourneyEvaluator to evaluators.py (scores: goal_completion, step_efficiency, backtracking, consistency)
18+
4. Extend phase manager for per-journey-type phase tracking
19+
5. Opt-in via config: story_learning_enabled: true
20+
21+
## Why
22+
23+
When Chronicler emits stories (multi-step event narratives), Apprentice can learn from sequential patterns rather than isolated exchanges. This enables journey-level optimization — the local model learns to handle multi-turn flows, and phase transitions can vary by journey type (checkout may be autonomous while support is still coaching).
24+
25+
## Constraints
26+
27+
- Backward compatible: existing atomic task routing unaffected
28+
- Story support opt-in via config
29+
- No new external dependencies
30+
- All existing 2628 tests must pass
31+
- Python 3.12+, Pydantic v2, frozen models
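
The per-journey-type phase tracking described in primer_story.md can be pictured as one independent phase ladder per journey type. The sketch below is hypothetical and is not the PhaseTracker from this commit. The phase names follow the "shadow -> canary -> primary -> autonomous" sequence in the previous prompt.md (the primer also mentions a "coaching" state, which is not modeled here), and promotion on a mean-score threshold is a simplified stand-in for the statistical validation that constraint C002 required.

```python
from dataclasses import dataclass, field

# Phase order taken from the earlier prompt.md; treated as an assumption here.
PHASES = ("shadow", "canary", "primary", "autonomous")


@dataclass
class PhaseTracker:
    """Tracks a separate phase per journey type (hypothetical API)."""
    threshold: float = 0.9   # mean score needed to promote (assumed value)
    min_samples: int = 3     # scores required before promotion is considered
    _phase: dict[str, int] = field(default_factory=dict)
    _scores: dict[str, list[float]] = field(default_factory=dict)

    def phase(self, journey_type: str) -> str:
        # Unknown journey types start in the first phase (shadow).
        return PHASES[self._phase.get(journey_type, 0)]

    def record(self, journey_type: str, score: float) -> str:
        """Add one journey score and promote that journey type if it qualifies."""
        scores = self._scores.setdefault(journey_type, [])
        scores.append(score)
        idx = self._phase.get(journey_type, 0)
        enough = len(scores) >= self.min_samples
        if enough and sum(scores) / len(scores) >= self.threshold and idx < len(PHASES) - 1:
            self._phase[journey_type] = idx + 1
            scores.clear()  # each phase must earn its own promotion
        return self.phase(journey_type)
```

Because state is keyed by journey type, a well-performing checkout journey can advance while a support journey stays where it is.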

prompt.md

Lines changed: 46 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -1,26 +1,46 @@
1-
# Apprentice — System Context
2-
3-
## What It Is
4-
Adaptive model distillation. Routes between frontier API and local fine-tuned model, progressively shifting traffic as correlation proves quality.
5-
6-
## How It Works
7-
Request -> Router -> [frontier | local] -> Evaluator -> Phase Manager
8-
Phases: shadow -> canary -> primary -> autonomous
9-
10-
## Key Constraints
11-
- PII tokenized before storage (C001)
12-
- Phase transitions require statistical validation (C002)
13-
- Budget exhaustion degrades gracefully (C003)
14-
- Audit log is append-only (C004)
15-
- No global state (C005)
16-
- New tasks start in shadow phase (C006)
17-
18-
## Architecture
19-
28 components (21 leaf + 7 compositions). Core: router, phase_manager, evaluators, budget_manager, pii_tokenizer, audit_log.
20-
21-
## Done Checklist
22-
- [ ] PII tokenization verified before storage
23-
- [ ] Phase transition requires correlation threshold
24-
- [ ] Budget exhaustion falls back to local model
25-
- [ ] Tests pass without GPU/API/network
26-
- [ ] Audit trail is append-only and complete
1+
# Apprentice Story Learning Enhancement
2+
3+
## Overview
4+
Extend the existing Apprentice system to support multi-step story learning while maintaining backward compatibility with atomic task routing.
5+
6+
## Current System
7+
- Routes requests between frontier API and local model
8+
- Collects training data as request/response pairs
9+
- Handles atomic exchanges effectively
10+
- Has 2628 existing tests that must continue passing
11+
12+
## Enhancement Goals
13+
Add journey-level optimization capabilities including:
14+
- Multi-turn conversation flow handling
15+
- Per-journey-type phase transition tracking
16+
- Goal completion detection and measurement
17+
- Step efficiency analysis
18+
- Backtracking pattern recognition
19+
- Multi-step consistency scoring
20+
21+
## Key Requirements
22+
- **Backward Compatibility**: All existing atomic task routing must remain unaffected
23+
- **Opt-in Configuration**: Story learning enabled via `story_learning_enabled: true`
24+
- **No New Dependencies**: Work within existing Python 3.12+ and Pydantic v2 constraints
25+
- **Test Preservation**: All 2628 existing tests must pass
26+
- **Frozen Models**: Maintain existing model constraints
27+
28+
## New Components to Implement
29+
1. **Story Model**: Represents multi-step narratives with metadata
30+
2. **StoryStep Model**: Individual steps within a story journey
31+
3. **StoryCollector**: Aggregates and processes story data from Chronicler
32+
4. **JourneyEvaluator**: Analyzes journey patterns and efficiency metrics
33+
5. **Enhanced Phase Manager**: Extended for per-journey-type tracking
34+
35+
## Integration Points
36+
- Chronicler will emit multi-step event narratives
37+
- StoryCollector processes these narratives into training data
38+
- JourneyEvaluator provides optimization insights
39+
- Phase manager tracks journey-specific transitions
40+
41+
## Success Metrics
42+
- Journey completion rates by type
43+
- Multi-turn consistency scores
44+
- Step efficiency measurements
45+
- Backtracking frequency analysis
46+
- Goal achievement tracking
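
The four journey-level scores named in this commit (goal completion, step efficiency, backtracking, consistency) can be made concrete with a small example. The definitions below are assumptions chosen for illustration and may differ from what JourneyEvaluator actually computes: a journey is modeled as a list of state labels, `optimal_steps` is a caller-supplied baseline, backtracking is the fraction of steps that revisit an earlier state, and consistency is one minus the spread of per-step quality scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JourneyScores:
    goal_completion: float  # 1.0 if the journey ended in the goal state
    step_efficiency: float  # optimal step count / actual step count, capped at 1.0
    backtracking: float     # fraction of steps that revisit an earlier state
    consistency: float      # 1.0 minus the spread of per-step scores


def evaluate_journey(
    states: list[str],
    goal: str,
    optimal_steps: int,
    step_scores: list[float],
) -> JourneyScores:
    """Score one journey; all four definitions are illustrative assumptions."""
    if not states:
        return JourneyScores(0.0, 0.0, 0.0, 0.0)

    goal_completion = 1.0 if states[-1] == goal else 0.0
    step_efficiency = min(1.0, optimal_steps / len(states))

    # Count every step that lands on a state already visited earlier.
    seen: set[str] = set()
    revisits = 0
    for state in states:
        if state in seen:
            revisits += 1
        seen.add(state)
    backtracking = revisits / len(states)

    # Per-step scores are expected in [0, 1]; identical scores give 1.0.
    consistency = 1.0 - (max(step_scores) - min(step_scores)) if step_scores else 0.0

    return JourneyScores(goal_completion, step_efficiency, backtracking, consistency)
```

For a checkout journey that bounces between cart and shipping before paying, this yields full goal completion but reduced efficiency and a nonzero backtracking rate, the kind of signal the success metrics above track per journey type.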
