
aak204/llm-coordination-harness


LLM Coordination Harness


llm-coordination-harness is a reproducible measurement rig for hidden coordination variables in multi-agent LLM systems under fixed billed-token budgets.

This repository is intentionally positioned as:

  • an eval / harness project
  • a measurement-first research artifact
  • a negative-results / methods result

This repository is not positioned as:

  • a generic swarm framework
  • a production routing layer
  • a claim that a universal coordination law has already been proved

What It Measures

The harness extracts and logs:

  • F: critical-fact survival fidelity through the graph
  • rho: shared-error correlation under no communication
  • B: propagation balance over edge fact-survival ratios
  • C: fan-in pressure from incoming peer-token load vs quota
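
To make rho concrete, here is a minimal sketch under one plausible reading of "shared-error correlation under no communication": the mean pairwise Pearson correlation between agents' per-task error indicators, collected while agents answer independently. The function names, the input layout, and the definition itself are assumptions for illustration, not the harness's actual formula.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; None if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return None  # correlation is undefined when an agent never (or always) errs
    return cov / sqrt(vx * vy)


def shared_error_correlation(errors_by_agent):
    """Mean pairwise correlation of error indicators (1 = wrong, 0 = right).

    errors_by_agent is a hypothetical mapping: agent id -> list of 0/1 flags,
    one per task, with tasks in the same order for every agent.
    """
    agents = sorted(errors_by_agent)
    pairs = [
        pearson(errors_by_agent[a], errors_by_agent[b])
        for a, b in combinations(agents, 2)
    ]
    defined = [p for p in pairs if p is not None]
    return sum(defined) / len(defined) if defined else None
```

A value near 1 would mean agents tend to fail on the same tasks, so adding more of them buys little independent evidence; a value near 0 would mean their errors are roughly independent.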

The important property is that these variables are recomputed from logs offline, rather than living only in process memory.
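
The offline recomputation idea can be sketched as follows. This is only an illustration: the JSONL record layout (`type`, `dst`, `tokens`, `facts`), the function name, and the simplified definitions of F (share of critical facts present in the final output) and C (incoming peer tokens divided by quota) are all assumptions, not the repository's actual schema or formulas.

```python
import json
from collections import defaultdict


def recompute_from_log(lines, critical_facts, quota_tokens):
    """Recompute simplified F and C from serialized log lines only.

    lines: iterable of JSON strings in a hypothetical event format.
    critical_facts: ids of the facts that must survive to the final answer.
    quota_tokens: per-node quota for incoming peer tokens.
    """
    incoming = defaultdict(int)  # node id -> total peer tokens received
    final_facts = set()
    for line in lines:
        rec = json.loads(line)
        if rec["type"] == "message":
            incoming[rec["dst"]] += rec["tokens"]
        elif rec["type"] == "final":
            final_facts = set(rec["facts"])

    critical = set(critical_facts)
    # F: fraction of critical facts that reached the final output.
    fidelity = len(critical & final_facts) / len(critical) if critical else None
    # C: incoming peer-token load relative to quota, per receiving node.
    pressure = {node: tokens / quota_tokens for node, tokens in incoming.items()}
    return fidelity, pressure
```

Because the function reads nothing but the log lines, rerunning it on a frozen log should reproduce the same values without any live process state.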

Golden Artifacts

The outputs/ directory is intentionally frozen to the two gold runs:

Frozen configs:

Visuals

Feature importance from the offline predictor analysis:

[Figure: Feature Importance]

Topology penalty on MA-FT at budget 96:

[Figure: Topology Penalty]

Topology delta (Balanced Tree - Star) on MA-FT at budget 96:

[Figure: Topology Delta]

Held-out predictor AUROC after excluding budget == 0 from training:

[Figure: Predictor Holdout AUROC]
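
For readers unfamiliar with the metric, the sketch below shows a rank-based AUROC and the kind of filter implied by "excluding budget == 0 from training". The row layout (a `budget` key) and both function names are hypothetical; the repository's predictor and split logic are not shown here.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # undefined without at least one example of each class
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def drop_zero_budget(rows):
    """Keep only training rows whose budget is non-zero (hypothetical row layout)."""
    return [row for row in rows if row["budget"] != 0]
```

An AUROC of 0.5 corresponds to chance-level ranking, which is the natural reference point when judging whether a held-out predictor clears a gate.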

P0b attack score delta vs clean baseline:

[Figure: Attack Score Delta]

P0b infection spread:

[Figure: Attack Infection Spread]

P0b attack success rate:

[Figure: Attack Success Rate]

Headline Results

Clean Phase (P0a)

The calibrated clean run is here:

Main outcome:

  • topology-sensitive coordination failures are real
  • repaired F and B move with those failures
  • the current v1 held-out predictor still does not pass the clean gate

This is a valid scientific result: the measurements hold up even though the predictor fails its gate.

Stress Phase (P0b)

The attack run is here:

Main outcome:

  • attack spread is measurable
  • star can behave as a zero-quarantine topology
  • structures that degrade useful coordination may also weakly attenuate malicious propagation

This is mechanistically interesting, but still not enough to claim a general attack-robustness law.

Why This Repo Matters

The core question is:

At fixed orchestration and a fixed billed budget, do F, rho, B, and C explain transitions between:

  • help
  • saturation
  • collapse

better than heuristic predictors that mostly exploit size and token-count shortcuts?

The current answer is nuanced:

  • the measurement system works
  • the hidden coordination variables are real and mechanistically meaningful
  • the predictor still fails the intended clean gate

That combination of a positive measurement result and a negative claim result is exactly the kind of outcome this repo is meant to preserve.

OpenRouter Discipline

Two modes exist:

  • research_strict
  • dev_convenience

Research runs require:

  • exact model pinning
  • explicit provider pinning
  • no openrouter/auto
  • no provider fallback
  • route / pricing / snapshot logging
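
A pre-flight check along these lines could enforce the pinning rules before a research run starts. This is a sketch, not the harness's own validator: the function name is hypothetical, and the `provider.order` and `provider.allow_fallbacks` fields reflect my understanding of OpenRouter's provider-routing request options, so verify them against the current OpenRouter documentation. Route, pricing, and snapshot logging are out of scope for this example.

```python
def research_strict_violations(request):
    """Return a list of reasons a request body fails research_strict-style pinning.

    request: dict shaped like a chat-completion request body (assumed layout).
    An empty list means no violations were found by these checks.
    """
    violations = []
    model = request.get("model", "")
    provider = request.get("provider", {})

    # The model must be named explicitly; the auto-router is not allowed.
    if not model or model == "openrouter/auto":
        violations.append("model must be pinned explicitly, not openrouter/auto")
    # An explicit provider list is required so the route is not chosen implicitly.
    if not provider.get("order"):
        violations.append("provider.order must name the pinned provider")
    # Fallbacks are treated as enabled unless explicitly switched off.
    if provider.get("allow_fallbacks", True):
        violations.append("provider.allow_fallbacks must be false")
    return violations
```

For example, a body such as `{"model": "vendor/model-name", "provider": {"order": ["SomeProvider"], "allow_fallbacks": False}}` (placeholder names) passes these checks, while a body that only sets `"model": "openrouter/auto"` fails all three.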

Where To Start
