Federated learning isn't private by default: shared gradients can be inverted to reconstruct individual training images. This repo shows the attack working, then shows differential privacy stopping it — at a measurable accuracy cost.
End-to-end implementation of FedAvg on CIFAR-10, the iDLG gradient-leakage attack (Zhao+ 2020 — improved over the original DLG with label inference), and Central Differential Privacy on FedAvg state-dict diffs (McMahan+ 2018). Reproducible benchmarks, an interactive Gradio demo, and a regression test that fails if the privacy claim ever breaks.
A private CIFAR-10 image, reconstructed from one client's gradient in 300 L-BFGS iterations. Left: target. Right: attacker's reconstruction (fresh randomly-initialized model). PSNR ≈ 22 dB.
Three findings the experiments make visible:
- Plain FedAvg leaks. From one client's gradient release on a fresh round-1 model, iDLG recovers the private CIFAR-10 input with a clearly visible reconstruction. The label is recovered exactly via the iDLG sign-pattern trick (sketched right after this list).
- DP at the gradient level defeats iDLG. Wrapping a single gradient release in clip + Gaussian noise destroys the reconstruction visually — the attacker fits noise. Demo tabs 2 and 3 show this side by side.
- Naive Central DP for training hits a hard ceiling on small federations. Adding the same primitive to FedAvg state-dict diffs over 20 rounds with 5 clients on a ~1 M-param CNN collapses utility at any meaningful ε. This is the Gaussian-mechanism curse of dimensionality: the noise norm grows with √D while the clipped signal stays at C. The curves below quantify it. Production systems route around it with DP-SGD (per-sample clipping, much smaller effective sensitivity) and subsampling amplification across hundreds of thousands of clients (Opacus, TF Privacy).
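A minimal, self-contained illustration of the sign-pattern trick on a toy classifier (not the repo's SimpleCNN; the real implementation lives in src/attacks/). For cross-entropy on a single sample, the last-layer weight-gradient row of the true class is the only one with a negative sign, so the label falls out of the shared gradient for free:

```python
import torch
from torch import nn
import torch.nn.functional as F

# Toy classifier; the ReLU before the final Linear keeps that layer's input activations
# non-negative, which is what makes the sign pattern readable.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU(), nn.Linear(64, 10))
x, y = torch.randn(1, 3, 32, 32), torch.tensor([7])          # one private sample

loss = F.cross_entropy(model(x), y)
grads = torch.autograd.grad(loss, model.parameters())

# dL/dW_i = (softmax_i - 1[i == y]) * a with a >= 0, so only the true-class row is negative.
last_weight_grad = grads[-2]                                  # final Linear weight gradient, shape (10, 64)
inferred_label = last_weight_grad.sum(dim=1).argmin().item()
assert inferred_label == y.item()
```

The image itself is then recovered by optimizing a dummy input with L-BFGS (~300 iterations) until its gradient matches the observed one, with the inferred label held fixed.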
flowchart LR
A[FedAvg<br/>training] --> B[Client gradient<br/>or state diff]
B --> C[iDLG attack]
B --> D[Clip + Gaussian<br/>noise]
D --> E[Protected<br/>release]
C --> F[Recovered<br/>image]
E --> G[Attack<br/>defeated]
style A fill:#dbeafe,stroke:#2563eb
style C fill:#fee2e2,stroke:#dc2626
style D fill:#d1fae5,stroke:#059669
style F fill:#fee2e2,stroke:#dc2626
style G fill:#d1fae5,stroke:#059669
| Stage | What it does | Key metric |
|---|---|---|
| FedAvg training | 5 clients × CIFAR-10 (IID partition) × 20 rounds | Test accuracy per round |
| iDLG attack | Reconstruct private input from one client's gradient | PSNR / SSIM of reconstruction |
| Central DP | Clip + Gaussian noise on the FedAvg state-dict diff, ε-calibrated | (ε, δ)-DP target |
| Tradeoff | Accuracy vs reconstruction PSNR across DP budgets | Plot, four operating points |
git clone https://github.com/YanissAmz/federated-learning-privacy.git
cd federated-learning-privacy
# Install
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
# Run unit + integration tests (the privacy story is asserted)
make test
# Launch the Gradio demo (requires results/*.json from `python -m scripts.evaluate`
# for tabs 1 & 4 — tabs 2 & 3 work without prior training)
make demo
# Reproduce the full results matrix (~25 min on RTX 3090, ~2h on CPU)
python -m scripts.evaluate

The demo is at http://127.0.0.1:7860 once running. Four tabs:
- Federated training curves — accuracy per round under different DP budgets.
- iDLG attack — no defense — pick a CIFAR-10 sample, watch the reconstruction emerge live (~30 s).
- iDLG attack — DP defense — same attack, target ε is a slider; watch reconstruction degrade.
- Privacy/utility tradeoff — accuracy and PSNR vs ε, side by side.
src/
fl/ FedAvg server (with optional Central DP), client, model, data
attacks/ iDLG gradient inversion + label inference (Zhao+ 2020)
defenses/ Central DP for state-dict diffs + gradient-level DP for the demo
demo/ Gradio interface
configs/ YAML config (default.yaml)
scripts/ CLI entrypoints — train, attack, evaluate (full matrix)
tests/ Unit + end-to-end "privacy story" regression test
results/ Generated metrics JSON + figures + GIFs (committed)
Two complementary DP mechanisms ship in src/defenses/dp.py:
Central DP on the state-dict diff (used during training in scripts.train):
# server side, inside FLServer.aggregate()
diff = client_state - global_state # what this client wants to add
diff = clip_state_diff(diff, max_norm=1.0) # bound the update L2 norm
diff = add_noise_to_diff(diff, sigma) # σ = noise_multiplier * max_norm
new_state = global_state + mean(all clipped+noised diffs)

Gradient-level DP (used by scripts.attack and the Gradio demo to defeat iDLG): the same primitive, applied to the gradient list a client would share, before the attacker sees it.
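A minimal sketch of what the two helpers do, assuming PyTorch state dicts of float tensors (the names match the snippet above, but treat the bodies as illustrative rather than the repo's exact code):

```python
import math
from typing import Dict
import torch

def clip_state_diff(diff: Dict[str, torch.Tensor], max_norm: float = 1.0) -> Dict[str, torch.Tensor]:
    # Rescale the whole update so its global L2 norm is at most max_norm (the sensitivity bound C).
    total_norm = math.sqrt(sum(t.pow(2).sum().item() for t in diff.values()))
    scale = min(1.0, max_norm / (total_norm + 1e-12))
    return {k: t * scale for k, t in diff.items()}

def add_noise_to_diff(diff: Dict[str, torch.Tensor], sigma: float) -> Dict[str, torch.Tensor]:
    # Add per-coordinate Gaussian noise with std sigma = noise_multiplier * max_norm.
    return {k: t + sigma * torch.randn_like(t) for k, t in diff.items()}
```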
σ is calibrated from a target ε via zCDP composition (Bun & Steinke 2016), the tight bound used in modern moments-accountant style implementations:
# Each Gaussian release with noise multiplier σ is (1/(2σ²))-zCDP; T compositions give ρ_total = T / (2σ²).
# zCDP → (ε, δ)-DP conversion: ε(ρ) = ρ + 2·√(ρ·ln(1/δ))
# Inverting for σ given a target (ε, δ, T):
#   u = √ρ = -√(ln(1/δ)) + √(ln(1/δ) + ε)
#   σ ≥ √(T / (2u²))

This is much tighter than basic composition (σ ∝ √T/ε vs T/ε). For ε=8, δ=1e-5, T=20 it gives σ ≈ 3.09 vs ≈ 13.6 with simple composition. See src/defenses/dp.py:noise_multiplier_from_epsilon.
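The same inversion as a self-contained function (a sketch consistent with the formulas above; the name mirrors the repo's helper, the body is illustrative):

```python
import math

def noise_multiplier_from_epsilon(epsilon: float, delta: float, rounds: int) -> float:
    # Smallest sigma such that `rounds` Gaussian releases compose to (epsilon, delta)-DP under zCDP.
    log_inv_delta = math.log(1.0 / delta)
    u = -math.sqrt(log_inv_delta) + math.sqrt(log_inv_delta + epsilon)   # u = sqrt(rho_total)
    return math.sqrt(rounds / (2.0 * u * u))

print(round(noise_multiplier_from_epsilon(8, 1e-5, 20), 2))   # -> 3.09, as quoted above
```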
For the Gaussian mechanism on a D-dim update with clipped L2 norm ≤ C:
| Quantity | Magnitude |
|---|---|
| Clipped signal (1 client) | ≤ C |
| Noise vector L2 norm (per round, after averaging over N clients) | √D × C × σ / N |
| SNR per coordinate | 1 / (σ × √D) |
For a SimpleCNN with D ≈ 1.1 M parameters, σ=3 (ε=8, T=20), N=5: noise norm ≈ 660 vs a signal ≤ 1. SGD over 20 rounds cannot recover the signal. This is intrinsic to applying the naive Gaussian mechanism to high-dimensional FedAvg with few clients.
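The back-of-envelope arithmetic, spelled out (the exact figure depends on the precise parameter count of the CNN):

```python
import math

D, C, sigma, N = 1.1e6, 1.0, 3.0, 5                    # dims, clip norm, noise multiplier, clients
noise_norm = math.sqrt(D) * C * sigma / N              # hundreds of times the clipped signal (<= C = 1)
per_coord_snr = 1.0 / (sigma * math.sqrt(D))           # on the order of 3e-4
print(f"noise L2 ≈ {noise_norm:.0f}, per-coordinate SNR ≈ {per_coord_snr:.1e}")
```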
Production routes around it:
- DP-SGD (Opacus): per-sample clipping inside each client's local SGD. Sensitivity is now per-sample (~ √D/B × signal scale), and subsampling amplification reduces required σ further.
- More clients (10³–10⁶): McMahan+ 2018 trains DP-FedAvg-LSTM with 763k devices and gets near-baseline accuracy at ε=4.6.
- Sparsification or low-rank updates: only release a low-dim projection.
This repo implements the textbook Central-DP-FedAvg primitive correctly to show why it fails at this scale. It is intentionally not the production answer.
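For contrast, a minimal sketch of the per-sample route with Opacus, assuming the opacus package (not used by this repo; see roadmap v0.4). Model and data are hypothetical stand-ins:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

# Tiny stand-in model and data; any BatchNorm-free nn.Module works the same way.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loader = DataLoader(TensorDataset(torch.randn(256, 3, 32, 32),
                                  torch.randint(0, 10, (256,))), batch_size=64)

engine = PrivacyEngine()
model, optimizer, loader = engine.make_private(
    module=model, optimizer=optimizer, data_loader=loader,
    max_grad_norm=1.0,      # per-sample clipping: sensitivity is one sample, not a full model diff
    noise_multiplier=1.1,
)

criterion = nn.CrossEntropyLoss()
for x, y in loader:         # one epoch of DP-SGD
    optimizer.zero_grad()
    criterion(model(x), y).backward()
    optimizer.step()
```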
CIFAR-10, 5 clients, IID partition, 20 rounds, 3 local epochs/round, batch 64, lr=0.01, SimpleCNN. Single seed (42). Full per-round metrics in results/<tag>_metrics.json.
| Configuration | Final test accuracy | Δ vs no defense | σ (per-round noise multiplier) |
|---|---|---|---|
| no defense (vanilla FedAvg) | 69.8% | — | 0 |
| DP target ε=64, δ=1e-5 | 10.0% | -59.8 pts | σ ≈ 0.597 |
| DP target ε=16, δ=1e-5 | 10.4% | -59.4 pts | σ ≈ 1.707 |
| DP target ε=4, δ=1e-5 | 10.4% | -59.4 pts | σ ≈ 5.796 |
| DP target ε=1, δ=1e-5 | 10.2% | -59.6 pts | σ ≈ 21.916 |
3 representative samples (indices 7, 42, 100). Numbers are averages across the 3 samples. Per-sample numbers are in results/attack_*_s*_metrics.json; the .gif next to each shows the reconstruction emerging.
| Configuration | Avg PSNR (dB; lower = better defense) | Avg SSIM | Label inference |
|---|---|---|---|
| no defense (raw gradient) | 18.3 | 0.597 | ✓ (3/3 correct) |
| DP applied at ε=64 | 13.1 | 0.044 | ✗ (1/3 correct) |
| DP applied at ε=16 | 12.9 | 0.051 | ✗ (1/3 correct) |
| DP applied at ε=4 | 12.8 | 0.048 | ✗ (1/3 correct) |
| DP applied at ε=1 | 12.7 | 0.044 | ✗ (1/3 correct) |
# Full matrix (4 FL trainings × 20 rounds + 12 attacks)
python -m scripts.evaluate # ~25 min on RTX 3090
# Single training run
python -m scripts.train --rounds 20 --clients 5 --tag no_def
python -m scripts.train --rounds 20 --clients 5 --tag dp_eps8 --epsilon 8 --delta 1e-5
# Single attack on a sample
python -m scripts.attack --tag attack_no_def_s7 --sample 7
python -m scripts.attack --tag attack_dp_eps1_s7 --sample 7 --defense dp \
    --noise-multiplier 0.5 --max-norm 1.0

Every CLI dumps a JSON to results/ with the exact config, per-round numbers, and a deterministic seed (default 42). Figures and GIFs go to results/figures/.
tests/test_e2e_privacy.py runs end-to-end:
def test_dp_blocks_attack(self):
    psnr_clear = idlg_attack(model, target, defense="none")
    psnr_dp = idlg_attack(model, target, defense="dp", epsilon=1)
    # Without DP, iDLG recovers a recognizable signal.
    # With DP, the gap is large enough that the reconstruction is unrecognizable.
    assert psnr_clear - psnr_dp >= 3.0

If anyone breaks the central claim of this repo, CI fails. Other tests cover gradient clipping bounds, ε→σ calibration, the iDLG label-inference path, and the FedAvg state-dict-diff DP integration.
11 tests, all pass on CPU in under a minute.
- Pedagogical scope, not a privacy library. Don't deploy this DP code. For real privacy deployments use Opacus (DP-SGD with RDP accounting) or Flower (FL plumbing) — both handle the curse-of-dimensionality issue this repo demonstrates.
- Single-image attack. iDLG works best at batch size 1. Reconstruction quality drops sharply on batched gradients — a known limit of the DLG family. GradInversion (Yin+ 2021) and Inverting Gradients (Geiping+ 2020) handle batches better but are more complex to implement cleanly.
- Naive Central DP collapses on small federations. Documented above. Production DP-FedAvg uses 10³–10⁶ clients + DP-SGD per-sample clipping; at that scale the same primitive gives smooth accuracy/privacy curves.
- CIFAR-10 + SimpleCNN. Conclusions transfer qualitatively. Absolute reconstruction quality is best on small natural images; very deep/wide models are worse for both attack and defense numbers.
- IID partitioning only. The infrastructure for Dirichlet-α non-IID partitioning is in src/fl/data.partition_non_iid and exposed via --non-iid; the results matrix here is IID for clarity.
The repo now also handles the active threat model: malicious clients that send poisoned updates to bias the global model (utility attack) or to deanonymize a target client (privacy attack via collusion). Robust server aggregations defeat naive attacks; a defense-aware adversary slips past naive defenses.
# Run the full attack × aggregator matrix (~15 min on RTX 3090, GPU-bound):
python -m scripts.byzantine
# Inspect the heatmaps live:
python -m src.demo.app       # → Tab 5

Threat model (v0.3): K of N clients collude. Attacks live in src/attacks/byzantine.py, defenses in src/defenses/robust.py. Both compose with v0.2's DP — see FLServer.aggregate(aggregator=..., dp=...).
| Attack | Mechanism | Goal |
|---|---|---|
| SignFlipAttack | Δ → -Δ | Degrade utility |
| ConstantAttack | Δ → c · 1 | Loud baseline |
| GradientSuppressionAttack | N-1 colluding clients submit ±M pairwise-cancelling updates | Deanonymize the one remaining honest target client (the residual = target Δ / N is leaked) |
| StealthSuppression | Same as above, but each malicious update respects an estimated honest-norm envelope | Slip past median / trimmed-mean / Krum |
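A toy illustration of why the suppression attack leaks the target (hypothetical numbers, flattened-vector view; the repo's attacks operate on state dicts):

```python
import torch

N, D = 5, 8                                    # 5 clients, toy 8-dimensional "model"
honest = torch.randn(D)                        # the lone honest target's update
M = 10.0 * torch.randn(D)                      # loud malicious direction
malicious = [M, -M, M, -M]                     # N-1 = 4 colluders, pairwise cancelling
aggregate = torch.stack([honest] + malicious).mean(dim=0)

# Up to float rounding, the release is exactly the target's private update, rescaled by 1/N.
assert torch.allclose(aggregate, honest / N, atol=1e-5)
```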
| Aggregator | Mechanism | Tolerance |
|---|---|---|
| aggregate_mean | Coord-wise mean (vanilla FedAvg) | Baseline; collapses on any non-trivial Byzantine |
| aggregate_median | Coord-wise median (Yin+ 2018) | Up to ⌊(N-1)/2⌋ Byzantine |
| aggregate_trimmed_mean | Drop top/bottom k% per coord | Up to ⌊k·N⌋ per side |
| aggregate_krum | Pick client closest to N-f-2 nearest peers (Blanchard+ 2017) | f Byzantine, requires N ≥ 2f+3 |
| filter_by_update_norm (pre-filter) | Drop clients with ‖Δ‖ > C | Defeats Constant / loud Suppression |
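The per-coordinate aggregators in one screenful (a flattened-vector sketch; the repo's versions operate on state dicts and live in src/defenses/robust.py):

```python
import torch

def aggregate_median(updates: list[torch.Tensor]) -> torch.Tensor:
    # Coordinate-wise median: robust as long as fewer than half the clients are Byzantine.
    return torch.stack(updates).median(dim=0).values

def aggregate_trimmed_mean(updates: list[torch.Tensor], trim_ratio: float = 0.2) -> torch.Tensor:
    # Sort each coordinate across clients, drop the k largest and k smallest, average the rest.
    stacked, _ = torch.sort(torch.stack(updates), dim=0)
    k = int(trim_ratio * len(updates))
    return stacked[k: len(updates) - k].mean(dim=0)
```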
5 clients, K=2 malicious, target = client 0, 5 rounds, CIFAR-10, SimpleCNN. Numbers are filled in by python -m scripts.byzantine, which writes results/byzantine_summary.json.
| Attack | mean | median | trimmed_mean | Krum |
|---|---|---|---|---|
| constant, K=2 | 50.6% / cos=+0.86 | 10.0% / cos=+nan | 46.5% / cos=+0.88 | 10.0% / cos=+nan |
| sign_flip, K=2 | 50.2% / cos=+0.87 | 37.7% / cos=+0.92 | 50.7% / cos=+0.91 | 41.9% / cos=+0.94 |
| stealth, K=2 | 50.7% / cos=+0.85 | 48.5% / cos=+0.96 | 51.4% / cos=+0.92 | 49.9% / cos=+0.95 |
| suppression, K=2 | 50.6% / cos=+0.86 | 49.2% / cos=+0.96 | 52.8% / cos=+0.93 | 53.2% / cos=+0.94 |
The hypothesis the matrix tests:
mean collapses under any of the four attacks. median and Krum recover utility under sign_flip and constant, partially under suppression (the cancellation pattern bypasses per-coordinate aggregators), and fail more on stealth (designed for it). trimmed_mean interpolates between mean and median.
# tests/test_byzantine.py::TestE2EByzantineStory::test_median_recovers_honest_signal
assert d_median < d_mean                       # median is closer to honest-only mean
assert (d_mean - d_median) / d_mean > 0.3      # by ≥ 30% of the corrupted-mean error

If anyone breaks the central v0.3 claim, CI fails.
Open milestones in docs/ROADMAP.md:
- v0.3 — Byzantine / integrity attacks (sign-flip, constant, backdoor) + robust aggregation (median, trimmed mean, Krum). Complementary threat model to today's privacy-only focus.
- v0.4 — DP-SGD via Opacus + RDP accounting (the production fix for the v0.2 "Central DP collapses" finding).
- v0.5 — non-IID Dirichlet defaults + Flower integration.
- v0.6 — stronger attacks (GradInversion, Inverting Gradients, R-GAP).
- Zhu, Liu, Han. Deep Leakage from Gradients (DLG). NeurIPS 2019. arXiv:1906.08935
- Zhao, Mopuri, Bilen. iDLG: Improved Deep Leakage from Gradients. 2020. arXiv:2001.02610
- McMahan, Ramage, Talwar, Zhang. Learning Differentially Private Recurrent Language Models (DP-FedAvg). 2018. arXiv:1710.06963
- Dwork, Roth. The Algorithmic Foundations of Differential Privacy. 2014. (Gaussian mechanism, simple composition.)
MIT — see LICENSE.

