|
| 1 | +# Critical Accuracy Regression Analysis - February 10, 2026 |
| 2 | + |
| 3 | +## Executive Summary |
| 4 | + |
| 5 | +**CRITICAL REGRESSION DETECTED:** M2Sim average timing error has risen from 5.7% to 106.3% following the merge of PR #419, a relative increase of roughly 1,765% (about 18.6x). |
| 6 | + |
| 7 | +## Impact Assessment |
| 8 | + |
| 9 | +### Accuracy Degradation |
| 10 | +- **Previous state:** 5.7% average error (world-class calibration achieved) |
| 11 | +- **Current state:** 106.3% average error (unacceptable for production) |
| 12 | +- **Degradation:** roughly 1,765% relative increase in timing error (about 18.6x) |
| 13 | + |
| 14 | +### Affected Benchmarks |
| 15 | +| Benchmark | Previous Error | Current Error | Degradation | |
| 16 | +|-----------|----------------|---------------|-------------| |
| 17 | +| loadheavy | ~5% | 424.0% | 8,380% increase | |
| 18 | +| storeheavy | ~8% | 259.4% | 3,143% increase | |
| 19 | +| branchheavy | ~5% | 16.1% | 222% increase | |
| 20 | +| memorystrided | ~2% | 2.0% | Stable | |
| 21 | +| arithmetic | ~34.5% | 34.5% | Stable | |
| 22 | +| dependency | ~6.7% | 6.7% | Stable | |
| 23 | +| branch | ~1.3% | 1.3% | Stable | |
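
The Degradation column above is consistent with a relative increase, `(current - previous) / previous`. A minimal sketch of that arithmetic follows. The function name is illustrative, and the previous-error inputs are the approximate ("~") values from the table, so the results are only as precise as those inputs.

```python
def relative_increase_pct(previous_error: float, current_error: float) -> float:
    """Relative increase of current over previous, expressed in percent."""
    return (current_error - previous_error) / previous_error * 100.0


# (previous error %, current error %) taken from the table above.
# The previous values are the approximate "~" figures.
regressed = {
    "loadheavy": (5.0, 424.0),
    "storeheavy": (8.0, 259.4),
    "branchheavy": (5.0, 16.1),
}

for name, (previous, current) in regressed.items():
    print(f"{name}: {relative_increase_pct(previous, current):.1f}% increase")
```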
| 24 | + |
| 25 | +### Pattern Analysis |
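| 26 | +**Memory operations severely affected:** The load- and store-intensive benchmarks (loadheavy, storeheavy) regress by one to two orders of magnitude and branchheavy roughly triples, while memorystrided, arithmetic, dependency, and branch are unchanged. |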
| 27 | + |
| 28 | +## Root Cause Investigation |
| 29 | + |
| 30 | +### Associated Changes |
| 31 | +**PR #419:** "Fix latency gaps: MADD/MSUB multiply and missing store/load latencies" |
| 32 | +- **Author:** Leo |
| 33 | +- **Merge date:** February 10, 2026 |
| 34 | +- **Scope:** Instruction latency assignments |
| 35 | + |
| 36 | +### Technical Analysis |
| 37 | +1. **MADD/MSUB multiply latency:** Fixed multiply latency assignment (expected impact) |
| 38 | +2. **Store/load latency fixes:** Added LDRSW, STR, STP, LDP, STRB, STRH latency assignments |
| 39 | +3. **Memory operation classification:** Updated IsMemoryOp and IsLoadOp classifications |
| 40 | + |
| 41 | +### Regression Hypothesis |
| 42 | +**Memory latency miscalibration:** The new latency assignments for memory operations appear to have disrupted the calibrated timing model, particularly affecting load/store intensive workloads. |
| 43 | + |
| 44 | +## Data Evidence |
| 45 | + |
| 46 | +### Current Accuracy Results (February 10, 2026) |
| 47 | +```json |
| 48 | +{ |
| 49 | + "average_error": 1.062782000098751, |
| 50 | + "max_error": 4.240411994883438, |
| 51 | + "calibrated_count": 7, |
| 52 | + "uncalibrated_count": 0 |
| 53 | +} |
| 54 | +``` |
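
The `average_error` and `max_error` fields appear to be stored as fractions rather than percentages: 1.0628 lines up with the 106.3% average and 4.2404 with the 424.0% loadheavy error reported above. A small sketch of reading the summary and converting it, assuming that interpretation (the `summarize` helper and its output keys are illustrative names, not part of M2Sim):

```python
import json

# Summary copied from the results block above.
RESULTS_JSON = """
{
  "average_error": 1.062782000098751,
  "max_error": 4.240411994883438,
  "calibrated_count": 7,
  "uncalibrated_count": 0
}
"""


def summarize(results: dict) -> dict:
    """Convert fractional errors to percentages, rounded to one decimal place."""
    return {
        "average_error_pct": round(results["average_error"] * 100.0, 1),
        "max_error_pct": round(results["max_error"] * 100.0, 1),
        "benchmarks": results["calibrated_count"] + results["uncalibrated_count"],
    }


summary = summarize(json.loads(RESULTS_JSON))
print(summary)
```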
| 55 | + |
| 56 | +### Matmul Calibration Status |
| 57 | +- **Current CPI:** 1.713 (up from 1.363, about a 26% increase) |
| 58 | +- **Status:** Expected increase due to multiply latency fix |
| 59 | +- **Concern:** May indicate broader timing model disruption |
| 60 | + |
| 61 | +## Critical Actions Required |
| 62 | + |
| 63 | +### Immediate (Cycle 40) |
| 64 | +1. **Leo investigation:** Review memory operation latency assignments in PR #419 |
| 65 | +2. **Targeted rollback:** Consider reverting memory-specific changes while preserving multiply fixes |
| 66 | +3. **Calibration validation:** Re-run memory subsystem calibration with corrected latencies |
| 67 | + |
| 68 | +### Strategic (Cycles 40-42) |
| 69 | +1. **Isolated testing:** Test multiply latency fixes separately from memory operation changes |
| 70 | +2. **Incremental validation:** Apply latency fixes one instruction type at a time |
| 71 | +3. **Regression prevention:** Establish accuracy monitoring for future latency changes |
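
One way the isolated and incremental steps above could be organized is sketched below: enable each latency change alone and record how far the average error moves from the baseline. This is a sketch under stated assumptions, not the project's tooling. `measure_error` is a hypothetical callable standing in for a full calibration run, and `toy_measure_error` returns synthetic values (with an arbitrarily chosen culprit) purely so the example executes.

```python
from typing import Callable, Dict, FrozenSet, Iterable


def isolate_latency_changes(
    changes: Iterable[str],
    measure_error: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Return the change in average error when each change is enabled alone."""
    baseline = measure_error(frozenset())
    deltas = {}
    for change in changes:
        deltas[change] = measure_error(frozenset([change])) - baseline
    return deltas


def toy_measure_error(enabled: FrozenSet[str]) -> float:
    """Synthetic stand-in for a calibration run; values are NOT measurements."""
    return 0.05 + (1.0 if "STR" in enabled else 0.0)


# Instruction groups named in PR #419's scope.
CHANGES = ["MADD/MSUB", "LDRSW", "STR", "STP", "LDP", "STRB", "STRH"]

deltas = isolate_latency_changes(CHANGES, toy_measure_error)
suspects = [name for name, delta in deltas.items() if delta > 0.01]
print(suspects)
```

Testing each change alone will not expose regressions that only appear when two changes interact, so a combined run is still needed after the individual ones.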
| 72 | + |
| 73 | +## Technical Recommendations |
| 74 | + |
| 75 | +### Memory Operation Review |
| 76 | +- **LDRSW latency:** Verify 4-cycle LoadLatency assignment correctness |
| 77 | +- **Store operations:** Review STR, STP, STRB, STRH latency assignments |
| 78 | +- **Load operations:** Validate LDP latency assignment |
| 79 | + |
| 80 | +### Calibration Framework |
| 81 | +- **Baseline validation:** Confirm hardware baseline measurements remain valid |
| 82 | +- **Parameter isolation:** Test individual latency parameters for calibration impact |
| 83 | +- **Accuracy monitoring:** Implement CI checks to prevent future regressions |
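
A CI accuracy check could be as small as the sketch below. It assumes the results summary has the fields shown under Data Evidence, with errors stored as fractions. The 0.10 default mirrors the "<10% average error" recovery target in this report; the function name and the rule about uncalibrated benchmarks are illustrative choices, not an existing M2Sim interface.

```python
from typing import List


def accuracy_gate(results: dict, max_average_error: float = 0.10) -> List[str]:
    """Return a list of failure messages; an empty list means the gate passes."""
    failures = []
    average = results["average_error"]
    if average > max_average_error:
        failures.append(
            f"average error {average:.1%} exceeds limit {max_average_error:.1%}"
        )
    # Illustrative extra rule: treat any uncalibrated benchmark as a failure.
    if results.get("uncalibrated_count", 0) > 0:
        failures.append(
            f"{results['uncalibrated_count']} benchmark(s) are uncalibrated"
        )
    return failures


# Summary values from the Data Evidence section of this report.
current = {
    "average_error": 1.062782000098751,
    "max_error": 4.240411994883438,
    "calibrated_count": 7,
    "uncalibrated_count": 0,
}

for message in accuracy_gate(current):
    print("FAIL:", message)
```

In a CI job, a non-empty result would translate into a non-zero exit code so that a latency change like PR #419 is flagged before merge.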
| 84 | + |
| 85 | +## Production Impact |
| 86 | + |
| 87 | +**DEPLOYMENT BLOCKED:** Current 106.3% error rate is unacceptable for production use. |
| 88 | + |
| 89 | +**Recovery Timeline:** |
| 90 | +- **Target:** Return to <10% average error within 2-3 cycles |
| 91 | +- **Critical path:** Memory operation latency correction |
| 92 | +- **Validation:** Full calibration re-execution required |
| 93 | + |
| 94 | +## Conclusion |
| 95 | + |
| 96 | +This regression is a critical failure in timing model accuracy. It coincides with the memory operation latency changes in PR #419 and is concentrated in the load- and store-heavy benchmarks, which points remediation at those changes. Immediate action is required to restore the previous accuracy level. |
| 97 | + |
| 98 | +--- |
| 99 | +*Analysis by Alex - M2Sim Data Analysis & Calibration Specialist* |
| 100 | +*Generated: February 10, 2026* |