llm-coordination-harness is a reproducible measurement rig for hidden coordination variables in multi-agent LLM systems under fixed billed-token budgets.
This repository is intentionally positioned as:
- an eval / harness project
- a measurement-first research artifact
- a negative-results / methods result
This repository is not positioned as:
- a generic swarm framework
- a production routing layer
- a claim that a universal coordination law has already been proved
The harness extracts and logs:
- F: critical-fact survival fidelity through the graph
- rho: shared-error correlation under no communication
- B: propagation balance over edge fact-survival ratios
- C: fan-in pressure from incoming peer-token load vs quota
The important property is that these variables are recomputed from logs offline, rather than living only in process memory.
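As a rough illustration of what recomputing from logs offline can look like, the sketch below derives F and C from a list of JSONL event lines. The event schema (`type`, `src`, `dst`, `tokens`, `facts`), the function names, and the exact formulas are assumptions made for this example; they are not the harness's actual log format or definitions.

```python
import json
from collections import defaultdict


def load_events(lines):
    """Parse JSONL log lines into dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def fact_survival(events, sink):
    """F (assumed form): share of seeded critical facts that reach the sink agent."""
    critical, survived = set(), set()
    for ev in events:
        if ev["type"] == "seed":
            critical.update(ev["facts"])
        elif ev["type"] == "message" and ev["dst"] == sink:
            survived.update(ev["facts"])
    return len(critical & survived) / len(critical) if critical else 0.0


def fan_in_pressure(events, quota):
    """C (assumed form): incoming peer tokens per agent, divided by the quota."""
    incoming = defaultdict(int)
    for ev in events:
        if ev["type"] == "message":
            incoming[ev["dst"]] += ev["tokens"]
    return {agent: tokens / quota for agent, tokens in incoming.items()}


# Hypothetical log lines in the assumed schema.
log = [
    '{"type": "seed", "facts": ["f1", "f2", "f3", "f4"]}',
    '{"type": "message", "src": "a", "dst": "hub", "tokens": 30, "facts": ["f1", "f2"]}',
    '{"type": "message", "src": "b", "dst": "hub", "tokens": 18, "facts": ["f2", "f3"]}',
]
events = load_events(log)
F = fact_survival(events, sink="hub")
C = fan_in_pressure(events, quota=96)
```

Because both functions take only parsed log events as input, the same values can be regenerated at any time without rerunning the agents.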
The outputs/ directory is intentionally frozen to the two gold runs:
Frozen configs:
Feature importance from the offline predictor analysis:
Topology penalty on MA-FT at budget 96:
Topology delta (Balanced Tree - Star) on MA-FT at budget 96:
Held-out predictor AUROC after excluding budget == 0 from training:
P0b attack score delta vs clean baseline:
P0b infection spread:
P0b attack success rate:
The calibrated clean run is here:
Main outcome:
- topology-sensitive coordination failures are real
- repaired F and B move with those failures
- the current v1 held-out predictor still does not pass the clean gate
This is a valid scientific result.
The attack run is here:
Main outcome:
- attack spread is measurable
- star can behave as a zero-quarantine topology
- structures that degrade useful coordination may also weakly attenuate malicious propagation
This is mechanistically interesting, but still not enough to claim a general attack-robustness law.
The core question is:
At fixed orchestration and fixed billed budget, do F, rho, B, C explain transitions between:
- help
- saturation
- collapse
better than heuristic predictors that mostly exploit size and token-count shortcuts?
The current answer is nuanced:
- the measurement system works
- the hidden coordination variables are real and mechanistically meaningful
- the predictor still fails the intended clean gate
That combination of positive measurement result and negative claim result is exactly the kind of outcome this repo is meant to preserve.
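The comparison posed in the core question can be sketched as scoring two predictors on the same held-out labels and comparing their AUROC. The snippet below is a minimal, dependency-free illustration. The binary framing (collapse vs. not collapse), the variable names, and all numbers are synthetic assumptions for this example; they are not results from the frozen runs and do not define the clean gate.

```python
def auroc(labels, scores):
    """AUROC as P(score of a positive > score of a negative); ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic held-out labels: 1 = collapse, 0 = no collapse (illustrative only).
labels = [1, 1, 0, 0, 1, 0]
# Hypothetical scores from a predictor built on F, rho, B, C.
coordination_scores = [0.9, 0.4, 0.5, 0.2, 0.6, 0.3]
# Hypothetical scores from a size / token-count heuristic.
heuristic_scores = [0.5, 0.4, 0.6, 0.3, 0.7, 0.2]

auroc_coordination = auroc(labels, coordination_scores)
auroc_heuristic = auroc(labels, heuristic_scores)
delta = auroc_coordination - auroc_heuristic
```

A gate of the kind described here would then compare `delta` against a pre-registered margin; that margin is not stated in this section, so it is left out of the sketch.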
Two modes exist:
- research_strict
- dev_convenience
Research runs require:
- exact model pinning
- explicit provider pinning
- no openrouter/auto
- no provider fallback
- route / pricing / snapshot logging
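The requirements above could be enforced with a pre-flight check along these lines. This is a sketch only: the config keys (`model`, `provider`, `allow_fallbacks`, `log_fields`) and the function name are hypothetical and chosen for illustration, not taken from the harness's real config schema.

```python
REQUIRED_LOG_FIELDS = ("route", "pricing", "snapshot")


def strict_violations(cfg):
    """Return the reasons a run config fails the research_strict requirements."""
    problems = []
    model = cfg.get("model", "")
    if not model or model == "openrouter/auto":
        problems.append("model must be pinned exactly (openrouter/auto is not allowed)")
    if not cfg.get("provider"):
        problems.append("provider must be pinned explicitly")
    # Treat a missing key as "fallback enabled" so the check fails closed.
    if cfg.get("allow_fallbacks", True):
        problems.append("provider fallback must be disabled")
    missing = [f for f in REQUIRED_LOG_FIELDS if f not in cfg.get("log_fields", ())]
    if missing:
        problems.append("missing log fields: " + ", ".join(missing))
    return problems


# Hypothetical configs; the model and provider names are placeholders.
pinned = {
    "model": "example-vendor/example-model-2024-01-01",
    "provider": "example-vendor",
    "allow_fallbacks": False,
    "log_fields": ["route", "pricing", "snapshot"],
}
unpinned = {"model": "openrouter/auto"}

pinned_problems = strict_violations(pinned)
unpinned_problems = strict_violations(unpinned)
```

Returning a list of violations instead of raising on the first one lets a research run report every pinning problem at once before any billed tokens are spent.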






