The learning package implements an outcome-driven feedback loop that observes execution results and continuously refines the governance policy. It consists of five interconnected engines:
- Outcome Engine — Records run results and metadata
- Pattern Engine — Extracts failure patterns and trends
- Reliability Engine — Computes adapter/skill confidence scores
- Adaptive Policy Engine — Generates policy overlays (risk multipliers, approval requirements, retry limits)
- Learning Cycle Manager — Orchestrates the full feedback loop
Key outcome: Policies adapt automatically. As agents and skills prove reliable (or unreliable), their enforcement rules relax (or tighten) accordingly.
Completed Run
↓
recordRunOutcome() (outcome-engine)
↓
Store outcome + metadata
↓
applyLearningCycle()
├─ updatePatterns() → identify failure trends
├─ rebuildReliability() → compute adapter scores
├─ buildAdaptivePolicyOverlays() → generate policy adjustments
└─ saveLearningState() → persist updated state
↓
Next run uses updated policy
Records run outcome with full execution context:
export interface RunOutcomeInput {
runId: string
result: 'success' | 'failure' | 'partial'
postExecutionScore: number // Quality metric (0–1)
rollbackOccurred: boolean
humanOverride: boolean
riskLevel: 'low' | 'medium' | 'high'
adaptersUsed: string[] // Which adapters executed
failureDetails?: {
dominantFailureType: string
// ...
}
}
The outcome is also emitted to the observability memory-graph for correlation analysis.
Tracks failure patterns by adapter and failure type:
export interface FailurePattern {
id: string // "adapter-id::failure-type"
adapterId: string
failureType: string
fixSuggestion: string
confidence: number // 0.55–0.95 (learns over time)
occurrences: number // How many times seen
lastSeenAt: string // ISO timestamp
}
Learning rule: Confidence starts at 0.55 and increases by 0.05 per occurrence (capped at 0.95). Once a pattern reaches 3+ occurrences, it triggers an approval requirement in the overlay.
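The learning rule can be sketched as a pure function. This is a minimal illustration, not the package's updatePatterns(): the function name recordFailure, the APPROVAL_OCCURRENCES constant, and the empty fixSuggestion placeholder are assumptions, while the confidence constants are the documented ones from pattern-engine.ts.

```typescript
// Hypothetical sketch of the pattern-learning rule (not the package's updatePatterns()).
interface FailurePattern {
  id: string // "adapter-id::failure-type"
  adapterId: string
  failureType: string
  fixSuggestion: string
  confidence: number
  occurrences: number
  lastSeenAt: string
}

// Documented constants from pattern-engine.ts
const INITIAL_CONFIDENCE = 0.55
const MAX_CONFIDENCE = 0.95
const CONFIDENCE_INCREMENT = 0.05
// Assumed: occurrences at which a pattern starts requiring approval
const APPROVAL_OCCURRENCES = 3

function recordFailure(
  existing: FailurePattern[],
  adapterId: string,
  failureType: string,
  now: string,
): FailurePattern[] {
  const id = `${adapterId}::${failureType}`
  const found = existing.find((p) => p.id === id)
  if (!found) {
    // First sighting: append a new pattern at the initial confidence
    return [
      ...existing,
      {
        id,
        adapterId,
        failureType,
        fixSuggestion: '', // assumed placeholder; suggestion lookup is out of scope here
        confidence: INITIAL_CONFIDENCE,
        occurrences: 1,
        lastSeenAt: now,
      },
    ]
  }
  // Repeat sighting: build a new object rather than mutating `found`
  const updated: FailurePattern = {
    ...found,
    occurrences: found.occurrences + 1,
    confidence: Math.min(MAX_CONFIDENCE, found.confidence + CONFIDENCE_INCREMENT),
    lastSeenAt: now,
  }
  return existing.map((p) => (p.id === id ? updated : p))
}

function triggersApproval(pattern: FailurePattern): boolean {
  return pattern.occurrences >= APPROVAL_OCCURRENCES
}
```

Floating-point addition means repeated +0.05 steps land near, not exactly on, values like 0.65, so tests should compare with a tolerance.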
Computes adapter reliability score using weighted formula:
reliabilityScore =
successRate * 0.6 // 60% weight: success/total
+ (1 - min(avgRetries, 3) / 3) * 0.2 // 20% weight: retry efficiency
+ qualityScore * 0.2 // 20% weight: execution quality
Example: An adapter with an 80% success rate, an average of 1.5 retries, and 0.8 quality scores 0.8 * 0.6 + (1 - 1.5 / 3) * 0.2 + 0.8 * 0.2 = 0.48 + 0.10 + 0.16 = 0.74
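A minimal sketch of the weighted formula, assuming the formula's qualityScore is the mean of per-run quality scores and that each run is summarized by the hypothetical AdapterRun shape below. For 80% success, 1.5 average retries, and 0.8 quality it evaluates to 0.48 + 0.10 + 0.16 = 0.74; three runs with 2/3 success, 2 average retries, and mean quality of about 0.633 score about 0.593.

```typescript
// Hypothetical sketch of the documented reliability formula.
interface AdapterRun {
  success: boolean
  retryCount: number
  qualityScore: number // 0-1
}

// Documented weights from reliability-engine.ts
const WEIGHTS = { successRate: 0.6, retryEfficiency: 0.2, qualityScore: 0.2 }
// Retries beyond this cap add no further penalty
const MAX_COUNTED_RETRIES = 3

function reliabilityScore(successRate: number, avgRetries: number, qualityScore: number): number {
  const retryEfficiency = 1 - Math.min(avgRetries, MAX_COUNTED_RETRIES) / MAX_COUNTED_RETRIES
  return (
    successRate * WEIGHTS.successRate +
    retryEfficiency * WEIGHTS.retryEfficiency +
    qualityScore * WEIGHTS.qualityScore
  )
}

// Assumes the formula's qualityScore is the mean per-run quality
function scoreFromRuns(runs: AdapterRun[]): number {
  if (runs.length === 0) throw new Error('cannot score an adapter with no runs')
  const n = runs.length
  const successRate = runs.filter((r) => r.success).length / n
  const avgRetries = runs.reduce((sum, r) => sum + r.retryCount, 0) / n
  const avgQuality = runs.reduce((sum, r) => sum + r.qualityScore, 0) / n
  return reliabilityScore(successRate, avgRetries, avgQuality)
}
```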
Generates policy overlays that tighten constraints for unreliable adapters:
export interface AdaptivePolicyOverlay {
adapterId: string
riskMultiplier: number // 0.9 (trusted) → 1.4 (risky)
suggestedMaxRetries: number // 2 (reliable) → 1 (unreliable)
requireApproval: boolean // true if score < 0.75 OR 3+ failures
reason: string
updatedAt: string
}
Decision logic:
- If reliabilityScore < 0.7 → riskMultiplier = 1.4
- If reliabilityScore > 0.9 → riskMultiplier = 0.9
- Otherwise → riskMultiplier = 1.0
- If reliabilityScore < 0.75 OR a failure pattern has 3+ occurrences → requireApproval = true
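The decision logic can be expressed as one function that turns a score and a pattern count into an overlay. This is a sketch only: buildOverlay and its parameters are hypothetical, the 1.0 default multiplier comes from the RISK_THRESHOLDS constants, and tying suggestedMaxRetries to requireApproval is an assumption based on the 2 → 1 range in the interface.

```typescript
// Hypothetical sketch of the overlay decision logic.
interface AdaptivePolicyOverlay {
  adapterId: string
  riskMultiplier: number
  suggestedMaxRetries: number
  requireApproval: boolean
  reason: string
  updatedAt: string
}

function buildOverlay(
  adapterId: string,
  reliabilityScore: number,
  maxPatternOccurrences: number, // most occurrences of any failure pattern for this adapter
  now: string,
): AdaptivePolicyOverlay {
  // Risk multiplier bands: high (< 0.7), low (> 0.9), normal otherwise
  let riskMultiplier = 1.0
  if (reliabilityScore < 0.7) riskMultiplier = 1.4
  else if (reliabilityScore > 0.9) riskMultiplier = 0.9

  const lowScore = reliabilityScore < 0.75
  const repeatedFailures = maxPatternOccurrences >= 3
  const requireApproval = lowScore || repeatedFailures

  // Assumption: drop from 2 retries to 1 whenever approval is required
  const suggestedMaxRetries = requireApproval ? 1 : 2

  const reasons: string[] = []
  if (lowScore) reasons.push(`reliability ${reliabilityScore.toFixed(2)} < 0.75`)
  if (repeatedFailures) reasons.push(`failure pattern seen ${maxPatternOccurrences} times`)

  return {
    adapterId,
    riskMultiplier,
    suggestedMaxRetries,
    requireApproval,
    reason: reasons.length > 0 ? reasons.join('; ') : 'within normal range',
    updatedAt: now,
  }
}
```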
Orchestrates full feedback loop on each outcome:
const result = applyLearningCycle({
state: currentLearningState,
outcome: runOutcome
})
// Returns: {
// updatedState: {...},
// policyDiff: {...}, // What changed
// constraintSuggestions: [...], // Recommendations
// summary: "Processed outcome..."
// }
export interface LearningState {
agentProfiles: Record<string, AgentProfile>
thresholdPolicy: ThresholdPolicy
skillStats: Record<string, SkillStats>
}
Persisted to: learning-store.json (project root or XDG data directory)
All outcomes appended to outcomes.jsonl (newline-delimited JSON):
{"runId": "run-123", "result": "success", ...}
{"runId": "run-124", "result": "failure", ...}
This log is the source of truth for pattern analysis and reliability calculation.
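Because the log is newline-delimited JSON, writing is "serialize one object plus a newline" and reading back is split-and-parse. The helpers below are hypothetical and operate on strings so the sketch stays dependency-free; in a Node service the append would typically go through fs.appendFileSync.

```typescript
// Hypothetical helpers for a newline-delimited JSON (JSONL) outcomes log.
interface LoggedOutcome {
  runId: string
  result: 'success' | 'failure' | 'partial'
  [field: string]: unknown // remaining RunOutcomeInput fields
}

// One JSON object per line; the trailing newline keeps later appends line-aligned
function appendOutcomeLine(log: string, outcome: LoggedOutcome): string {
  return log + JSON.stringify(outcome) + '\n'
}

function parseOutcomesLog(text: string): LoggedOutcome[] {
  const outcomes: LoggedOutcome[] = []
  for (const line of text.split('\n')) {
    const trimmed = line.trim()
    if (trimmed === '') continue // skip blank lines, including the final one
    outcomes.push(JSON.parse(trimmed) as LoggedOutcome)
  }
  return outcomes
}
```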
All state updates return new objects (never mutate in place):
// Pattern engine example
export function updatePatterns(
existing: FailurePattern[],
outcome: RunOutcome
): FailurePattern[] {
// Return new array, never modify existing
return [
...existing,
{ id, adapterId, ... } // new pattern
]
}
Patterns and reliability scores build confidence gradually:
// Failure pattern confidence
const updated = { ...found, occurrences: found.occurrences + 1, confidence: Math.min(0.95, found.confidence + 0.05) } // +0.05 per occurrence; copy instead of mutating found
// Reliability score (rebuilt from all outcomes)
reliabilityScore = weightedFormula(successRate, avgRetries, qualityScore)
Reliability isn't just success rate; it combines:
- Success rate (60%) — Did it work?
- Retry efficiency (20%) — How many retries needed?
- Quality score (20%) — How good was the output?
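This keeps any single dimension from deciding the score on its own: an adapter that always succeeds but averages 3+ retries scores at most 0.6 + 0 + 0.2 = 0.8.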
Adaptive policy generates overlays—recommendations applied on top of base policy:
Base policy: "All adapters default to 2 max retries"
↓
Overlay (learned): "GitHub adapter is unreliable → require approval + 1 retry limit"
↓
Effective policy: "GitHub uses 1 retry + approval gate"
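The base-plus-overlay layering can be sketched as a merge in which the stricter value wins for retries and approval, matching the rule that overlays never relax those constraints on their own; the risk multiplier is passed through from the overlay unchanged. The types and applyOverlay are hypothetical names for illustration.

```typescript
// Hypothetical sketch of layering a learned overlay on the base policy.
interface BasePolicy {
  maxRetries: number
  requireApproval: boolean
}

interface LearnedOverlay {
  suggestedMaxRetries: number
  requireApproval: boolean
  riskMultiplier: number
}

interface EffectivePolicy extends BasePolicy {
  riskMultiplier: number
}

function applyOverlay(base: BasePolicy, overlay?: LearnedOverlay): EffectivePolicy {
  // No learned overlay: base policy with a neutral risk multiplier
  if (!overlay) return { ...base, riskMultiplier: 1.0 }
  return {
    // Stricter value wins, so an overlay can lower the retry limit but never raise it
    maxRetries: Math.min(base.maxRetries, overlay.suggestedMaxRetries),
    // Approval is required if either layer asks for it
    requireApproval: base.requireApproval || overlay.requireApproval,
    // Multiplier is taken from the overlay unchanged
    riskMultiplier: overlay.riskMultiplier,
  }
}
```

With a base of 2 retries and the GitHub overlay above (1 retry, approval required), the effective policy is 1 retry plus an approval gate.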
Test each engine independently:
// Test pattern learning
describe('pattern-engine', () => {
it('increments occurrences and confidence', () => {
const outcome = { success: false, dominantFailureType: 'auth', ... }
const updated = updatePatterns([...], outcome)
// Verify: new pattern with confidence 0.55, occurrences: 1
})
it('caps confidence at 0.95', () => {
const existing = [{ ..., occurrences: 9, confidence: 0.95 }]
const updated = updatePatterns(existing, outcome)
// Verify: confidence stays at 0.95 (not 1.0), occurrences: 10
})
})
// Test reliability calculation
describe('reliability-engine', () => {
it('weights success rate, retries, quality', () => {
const outcomes = [
{ success: true, retryCount: 1, qualityScore: 0.8, ... },
{ success: true, retryCount: 2, qualityScore: 0.9, ... },
{ success: false, retryCount: 3, qualityScore: 0.2, ... },
]
const reliability = rebuildReliability(outcomes)
// Verify: score ≈ 0.593 (2/3 success rate, avg 2 retries, mean quality ≈ 0.633)
})
})
Test the full learning cycle:
describe('learning cycle', () => {
it('processes outcome and updates all state', async () => {
const state = loadLearningState()
const outcome = { runId: 'run-123', result: 'failure', ... }
const result = applyLearningCycle({ state, outcome })
// Verify: state updated, policy diff recorded, suggestions generated
expect(result.updatedState.thresholdPolicy).toBeDefined()
expect(result.policyDiff).toHaveProperty('changes')
expect(result.summary).toContain('Processed outcome')
})
})
Coverage target: 85%+ across all engines. Integration tests cover the happy path (outcome → state update). Unit tests cover edge cases (the 0.95 confidence cap, retry-efficiency weighting).
Problem: Policy becomes too strict → approvals reject legitimate runs → outcome classified as "failure" → policy becomes even stricter → system loops.
Mitigation:
- Confidence capping (max 0.95) prevents infinite drift
- Policy overlays are additive (tighter constraints), never relaxed automatically
- Manual policy review required to loosen constraints
Problem: If you stop running a type of task (e.g., risky deploys), its reliability score becomes frozen at an old value.
Mitigation:
- Track the updatedAt timestamp; alert operators if an overlay is more than 30 days old
- Consider a decay factor (gradually reduce confidence if no new outcomes arrive)
- Review periodically: buildLearningReport() shows the strongest/weakest adapters
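The staleness alert and the suggested decay factor might look like this. Both functions are hypothetical: the 30-day threshold comes from the mitigation above, while exponential half-life decay toward the 0.55 starting confidence is one possible choice of decay, not something the package is known to implement.

```typescript
// Hypothetical staleness check and decay for learned values.
const DAY_MS = 24 * 60 * 60 * 1000
const STALE_AFTER_DAYS = 30

// True when the overlay has not been refreshed for more than 30 days
function isOverlayStale(updatedAt: string, now: Date): boolean {
  const ageMs = now.getTime() - new Date(updatedAt).getTime()
  return ageMs > STALE_AFTER_DAYS * DAY_MS
}

// Exponential decay toward a floor: after one half-life, half of the
// distance between `confidence` and `floor` is gone.
function decayConfidence(
  confidence: number,
  lastSeenAt: string,
  now: Date,
  halfLifeDays = 30, // assumed tuning value
  floor = 0.55, // documented initial pattern confidence
): number {
  const ageDays = Math.max(0, (now.getTime() - new Date(lastSeenAt).getTime()) / DAY_MS)
  const factor = Math.pow(0.5, ageDays / halfLifeDays)
  return floor + (confidence - floor) * factor
}
```

For example, a 0.95-confidence pattern last seen 30 days ago decays to 0.55 + 0.40 * 0.5 = 0.75.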
Problem: A task that runs once per month builds confidence slowly (a pattern needs 3 occurrences before it triggers an approval requirement).
Mitigation:
- Pattern confidence starts at 0.55, not 0.0 (giving it the benefit of the doubt)
- Use heuristic defaults (e.g., "GitHub auth failures → check token")
- Combine with manual policy annotations (don't rely only on learning)
Problem: If dominantFailureType or adaptersUsed is missing or filled in incorrectly, learning fails silently.
Mitigation:
- Validate RunOutcomeInput with a schema (e.g., Zod)
- Log a warning if fields are missing, and don't process the outcome
- Audit the outcomes log regularly for gaps
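The validation step could be a small guard that runs before an outcome is processed. The document suggests Zod; this hand-rolled validateRunOutcome is a hypothetical dependency-free stand-in, and requiring failureDetails.dominantFailureType only for failed runs is an assumption.

```typescript
// Hypothetical dependency-free stand-in for a Zod schema on RunOutcomeInput.
type ValidationResult = { ok: true } | { ok: false; missing: string[] }

const RESULTS = ['success', 'failure', 'partial']

function validateRunOutcome(input: Record<string, unknown>): ValidationResult {
  const missing: string[] = []

  const runId = input.runId
  if (typeof runId !== 'string' || runId === '') missing.push('runId')

  const result = input.result
  if (typeof result !== 'string' || !RESULTS.includes(result)) missing.push('result')

  const adapters = input.adaptersUsed
  if (!Array.isArray(adapters) || adapters.length === 0 || !adapters.every((a) => typeof a === 'string')) {
    missing.push('adaptersUsed')
  }

  // Assumption: failed runs must say what failed, or pattern learning has nothing to key on
  if (result === 'failure') {
    const details = input.failureDetails as { dominantFailureType?: unknown } | undefined
    const failureType = details?.dominantFailureType
    if (typeof failureType !== 'string' || failureType === '') {
      missing.push('failureDetails.dominantFailureType')
    }
  }

  return missing.length === 0 ? { ok: true } : { ok: false, missing }
}
```

A caller would log a warning listing `missing` and skip the learning cycle whenever `ok` is false.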
Problem: Policy updated but control-service still uses cached config.
Mitigation:
- Learning cycle saves policy diff + generated overlays
- Service must reload config (e.g., on SIGHUP or via a reload endpoint)
- Document: "Policy changes require service restart or reload endpoint"
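One way to make reloads explicit is to hold the policy behind a small cache that a SIGHUP handler or reload endpoint refreshes. PolicyCache and PolicyConfig are hypothetical names; in Node the signal hook would be registered with process.on('SIGHUP', () => cache.reload()).

```typescript
// Hypothetical cache that serves policy until it is explicitly reloaded.
interface PolicyConfig {
  version: number
  overlays: Record<string, { requireApproval: boolean }>
}

class PolicyCache {
  private current: PolicyConfig
  private readonly load: () => PolicyConfig

  constructor(load: () => PolicyConfig) {
    this.load = load
    this.current = load() // read once at startup
  }

  // Readers keep getting the cached copy, even if the source has changed
  get(): PolicyConfig {
    return this.current
  }

  // Call from a SIGHUP handler or a reload endpoint to pick up new overlays
  reload(): PolicyConfig {
    this.current = this.load()
    return this.current
  }
}
```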
Control-service handler calls:
import { recordRunOutcome } from '@cku/learning'
// After execution completes
const outcome = {
runId,
result: success ? 'success' : 'failure',
postExecutionScore: computeQuality(result),
rollbackOccurred: steps.some(s => s.rolled_back),
...
}
recordRunOutcome(outcome) // Triggers learning cycle
Governance gates use overlays:
import { buildLearningReport } from '@cku/learning'
const report = buildLearningReport()
const overlay = report.overlays.find(o => o.adapterId === currentAdapter)
if (overlay?.requireApproval) {
gateManager.requireApprovalFor(gate)
}
if (overlay?.riskMultiplier) {
risk = computeRisk(base) * overlay.riskMultiplier
}
Outcomes are also emitted to the memory-graph:
appendMemoryGraphEvent({
type: 'outcome_recorded',
runId,
data: { result, postExecutionScore, rollbackOccurred, ... }
})
Setup:
- GitHub adapter has had 2 failures (confidence: 0.60, occurrences: 2)
- Base reliability score: 0.7 (borderline)
Run 1: GitHub deploy fails (3rd failure)
1. recordRunOutcome({ result: 'failure', dominantFailureType: 'auth', adaptersUsed: ['github'], ... })
2. updatePatterns: Find existing GitHub::auth pattern
- occurrences: 2 → 3
- confidence: 0.60 + 0.05 = 0.65
3. rebuildReliability: Recalculate GitHub adapter score
- Now 3 failures / N runs → reliability ≈ 0.68 (down from 0.7)
4. buildAdaptivePolicyOverlays:
- GitHub reliability < 0.75 AND occurrences ≥ 3 → requireApproval = true (either condition alone would suffice)
- Overlay created: { adapterId: 'github', riskMultiplier: 1.4, suggestedMaxRetries: 1, requireApproval: true } (multiplier 1.4 because 0.68 < 0.7)
5. saveLearningState: Store updated state
6. Policy report generated (markdown + JSON)
Result: Next GitHub deploy requires approval gate + 1-retry limit (instead of 2).
Run 2: GitHub deploy succeeds (operator approved)
1. recordRunOutcome({ result: 'success', adaptersUsed: ['github'], ... })
2. updatePatterns: GitHub::auth pattern not updated (patterns only update on failures)
3. rebuildReliability: Recalculate GitHub score
- Now 3 failures + 1 success / N runs → reliability ≈ 0.72 (slight recovery)
4. buildAdaptivePolicyOverlays:
- Still requireApproval (reliability 0.72 < 0.75, and the pattern still has 3+ occurrences)
- Overlay persists
Result: Overlay remains until 5+ more successes occur or manual policy change.
import { applyLearningCycle, loadLearningState, recordRunOutcome } from '@cku/learning'
const outcome = { runId: 'run-123', result: 'failure', ... }
recordRunOutcome(outcome) // Persists + emits
const state = loadLearningState()
const result = applyLearningCycle({ state, outcome })
console.log(result.summary)
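// "Processed outcome for run-123. Updated 5 agent profiles. Policy changes: 2. ..."
import * as fs from 'node:fs'
import { buildLearningReportJson, buildLearningReportMarkdown, loadLearningState } from '@cku/learning'
const state = loadLearningState()
// result is the applyLearningCycle() return value from the previous example
const json = buildLearningReportJson(state, result)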
const md = buildLearningReportMarkdown(result)
// Save to disk
fs.writeFileSync('learning-report.json', JSON.stringify(json, null, 2))
fs.writeFileSync('learning-report.md', md)
import { loadLearningState } from '@cku/learning'
const state = loadLearningState()
console.log(`Total outcomes: ${state.outcomes.length}`)
// Copy before sorting so state.reliability is not mutated; highest score first
console.log(`Strongest adapters:`, [...state.reliability].sort((a, b) => b.reliabilityScore - a.reliabilityScore).slice(0, 3))
console.log(`Recurring patterns:`, state.patterns.filter(p => p.occurrences >= 3))
No explicit config file for learning (the engines are stateless). Behavior is controlled via code constants:
// pattern-engine.ts
const INITIAL_CONFIDENCE = 0.55
const MAX_CONFIDENCE = 0.95
const CONFIDENCE_INCREMENT = 0.05
// reliability-engine.ts
const WEIGHTS = {
successRate: 0.6,
retryEfficiency: 0.2,
qualityScore: 0.2,
}
// adaptive-policy.ts
const RISK_THRESHOLDS = {
high: 1.4, // score < 0.7
normal: 1.0,
low: 0.9, // score > 0.9
}
To customize, modify the constants in the respective source files.
- Define types in shared/governance-types.ts
- Create the engine file (e.g., packages/learning/src/new-engine.ts)
- Implement the core function (immutable, pure if possible)
- Integrate it into the learning cycle in index.ts
- Add tests with mocks for dependencies
- Update documentation (this file)
Example: Adding a "feedback score learning" engine:
// feedback-learning.ts
export function updateFeedbackScores(
existing: FeedbackScore[],
outcome: RunOutcomeInput
): FeedbackScore[] {
// Learn from operator feedback
return [...]
}
// index.ts
const feedbackScores = updateFeedbackScores(params.state.feedbackScores, params.outcome)
const updatedState = { ..., feedbackScores }
# Inspect current learning state
cat learning-store.json | jq '.reliability | sort_by(.reliabilityScore)'
# View recent outcomes
tail -20 outcomes.jsonl
# Check pattern trends
cat learning-store.json | jq '.patterns | sort_by(.occurrences) | reverse | .[0:5]'
- recordRunOutcome() appends to the outcomes log (O(1) write)
- Memory-graph emit is async and non-blocking
- Learning cycle runs after outcome recorded (eventual consistency)
- rebuildReliability() recalculates from all outcomes (O(N), where N = number of outcomes)
- For 10k outcomes, expect ~50ms rebuild time
- Consider: Running learning cycle asynchronously or on schedule (e.g., hourly)
- saveLearningState() writes the whole state to disk (cost scales with state size, not outcome count, so it stays small)
- No database required; JSON files are sufficient at current scale
- 10–100k outcomes: Current approach fine (learning cycle ~50–200ms)
- 100k+ outcomes: Consider:
- Archiving old outcomes (move to separate file)
- Running learning cycle on schedule (not per-outcome)
- Aggregating outcomes in batches
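For the batch-aggregation option, per-adapter running totals let each batch update scores without rescanning the full log. This is a sketch under assumptions: each outcome is attributed to a single adapterId, and aggregateBatch, AdapterTotals, and scoreFromTotals are hypothetical names; the score reuses the documented 0.6/0.2/0.2 weights.

```typescript
// Hypothetical incremental aggregation of outcomes into per-adapter totals.
interface Outcome {
  adapterId: string // assumed: one adapter per outcome record
  success: boolean
  retryCount: number
  qualityScore: number
}

interface AdapterTotals {
  runs: number
  successes: number
  retries: number
  quality: number // sum of quality scores
}

// Fold one batch into the totals; returns a new Map and leaves the input untouched
function aggregateBatch(totals: Map<string, AdapterTotals>, batch: Outcome[]): Map<string, AdapterTotals> {
  const next = new Map(totals)
  for (const o of batch) {
    const t = next.get(o.adapterId) ?? { runs: 0, successes: 0, retries: 0, quality: 0 }
    next.set(o.adapterId, {
      runs: t.runs + 1,
      successes: t.successes + (o.success ? 1 : 0),
      retries: t.retries + o.retryCount,
      quality: t.quality + o.qualityScore,
    })
  }
  return next
}

// Same weighted formula, computed from totals instead of the raw log (assumes runs > 0)
function scoreFromTotals(t: AdapterTotals): number {
  const successRate = t.successes / t.runs
  const avgRetries = t.retries / t.runs
  const avgQuality = t.quality / t.runs
  return successRate * 0.6 + (1 - Math.min(avgRetries, 3) / 3) * 0.2 + avgQuality * 0.2
}
```

Each batch then costs O(batch size) rather than O(N), at the price of storing the totals alongside the learning state.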
import { metrics } from '@cku/observability'
metrics.recordOutcomeProcessed(outcome.result) // success/failure/partial
metrics.recordPatternDetected(pattern.adapterId, pattern.occurrences)
metrics.recordReliabilityChange(adapterId, oldScore, newScore)
metrics.recordOverlayApplied(overlay.adapterId, overlay.riskMultiplier)
# Generate report
pnpm --filter learning run report
# Shows:
# - Top 3 strongest adapters (highest reliabilityScore)
# - Top 3 weakest adapters (lowest reliabilityScore)
# - Top 5 failure patterns (most occurrences)
# - Active policy overlays
Depends on:
- shared — Governance types, RunOutcomeInput
- observability — Memory-graph event emission
- governance — Policy evaluation (uses overlays)
- control-service — Triggers outcome recording
Used by:
- governance — Applies overlays to gate decisions
- control-service — Records outcomes, reads reports
- orchestrator — May adjust execution based on reliability scores
Related:
- Root CLAUDE.md — Monorepo overview
- System Architecture — Feedback loop diagram
- Testing Guide — Learning package test patterns
- Config Schema — Policy.json reference