feat(clustering): add Louvain and Leiden community detection #9

Open
seilat wants to merge 4 commits into master from feat/clustering-community-detection

seilat commented May 30, 2026

Owner

Self-review PR (fork only — not upstream; holds for a jgrapht-dev signal per project cadence). Supersedes #7 and #8, combining both modularity-based community-detection algorithms into one reviewable unit.

What this adds to org.jgrapht.alg.clustering

  • LouvainClustering — the Louvain method (Blondel et al. 2008): alternating local-moving + aggregation, greedily maximising modularity. Weighted/self-loop aware, non-negative weights, getModularity() consistent with UndirectedModularityMeasurer.
  • LeidenClustering — the Leiden method (Traag-Waltman-van Eck 2019): adds a refinement phase so every reported community is well-connected — the guarantee Louvain lacks.
  • CommunityAggregation — package-private shared primitive (compact / numberOfCommunities / degree-conserving aggregate that compacts labels internally), used by both.

Both implement ClusteringAlgorithm<V> with constructors mirroring LabelPropagationClustering: (graph), (graph, rng), (graph, rng, tolerance).
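
For readers unfamiliar with the aggregation step, here is a minimal sketch of what compact / numberOfCommunities / a degree-conserving aggregate could look like on a plain edge list. It is not the PR's CommunityAggregation: the class name AggregationSketch, the int-array labels, and the dense matrix result are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only; names and data layout are hypothetical, not the PR's code. */
final class AggregationSketch {

    /** Relabels arbitrary community ids to 0..k-1 in order of first appearance. */
    static int[] compact(int[] labels) {
        Map<Integer, Integer> remap = new HashMap<>();
        int[] out = new int[labels.length];
        for (int i = 0; i < labels.length; i++) {
            Integer id = remap.get(labels[i]);
            if (id == null) {
                id = remap.size();
                remap.put(labels[i], id);
            }
            out[i] = id;
        }
        return out;
    }

    /** Number of distinct community ids in the labelling. */
    static int numberOfCommunities(int[] labels) {
        int max = -1;
        for (int c : compact(labels)) {
            max = Math.max(max, c);
        }
        return max + 1;
    }

    /**
     * Collapses each community into one super-node. edges[i] = {u, v} with weight
     * weights[i]. Labels are compacted internally, so callers may pass non-compact ids.
     * Intra-community weight accumulates on the diagonal as a self-loop.
     */
    static double[][] aggregate(int[][] edges, double[] weights, int[] labels) {
        int[] c = compact(labels);
        int k = numberOfCommunities(labels);
        double[][] agg = new double[k][k];
        for (int i = 0; i < edges.length; i++) {
            int cu = c[edges[i][0]];
            int cv = c[edges[i][1]];
            agg[cu][cv] += weights[i];
            if (cu != cv) {
                agg[cv][cu] += weights[i];
            }
        }
        return agg;
    }

    /** Weighted degree of super-node c; the self-loop counts twice, as in the original graph. */
    static double degree(double[][] agg, int c) {
        double d = agg[c][c];
        for (int j = 0; j < agg.length; j++) {
            d += agg[c][j];
        }
        return d;
    }
}
```

Counting the diagonal twice in degree is what keeps the total degree (2m) unchanged across levels, so modularity computed on the aggregated graph agrees with modularity on the original one.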

Verification

  • 33 tests (Louvain 17 + Leiden 16), all green. checkstyle 0 violations, javadoc clean.

  • Leiden's headline everyCommunityIsConnected oracle asserts connectivity via ConnectivityInspector on each induced subgraph (structured + 30 random graphs).

  • JMH 4-way cost bench (perf/clustering) — ms/op on planted partitions:

    nodes   LabelProp   Louvain   Leiden   GreedyModularity
    125     0.22        0.28      0.42     1.38
    500     2.09        3.05      4.17     30.9
    Leiden costs ≈ 1.4–1.5× Louvain (refinement + connectivity split). On the two sizes listed it runs about 3.3× (125 nodes) to 7.4× (500 nodes) faster than GreedyModularity.
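
The connectivity oracle mentioned above uses ConnectivityInspector on each induced subgraph. As a dependency-free illustration of the same property, the sketch below checks it with a plain BFS; the class name ConnectivityOracleSketch and the edge-list representation are assumptions, not the PR's test code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative stand-in for the test oracle; hypothetical names throughout. */
final class ConnectivityOracleSketch {

    /** Returns true iff every community induces a connected subgraph. */
    static boolean everyCommunityIsConnected(int n, int[][] edges, int[] labels) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            adj.add(new ArrayList<>());
        }
        // Keep only intra-community edges: together they form the induced subgraphs.
        for (int[] e : edges) {
            if (labels[e[0]] == labels[e[1]]) {
                adj.get(e[0]).add(e[1]);
                adj.get(e[1]).add(e[0]);
            }
        }
        boolean[] seen = new boolean[n];
        Set<Integer> started = new HashSet<>();
        for (int s = 0; s < n; s++) {
            if (seen[s]) {
                continue;
            }
            // A second BFS start inside an already-explored community means
            // that community has more than one component.
            if (!started.add(labels[s])) {
                return false;
            }
            Deque<Integer> queue = new ArrayDeque<>();
            queue.add(s);
            seen[s] = true;
            while (!queue.isEmpty()) {
                int u = queue.poll();
                for (int v : adj.get(u)) {
                    if (!seen[v]) {
                        seen[v] = true;
                        queue.add(v);
                    }
                }
            }
        }
        return true;
    }
}
```

On the path 0-1-2, the labelling {0, 1, 0} fails the check because vertices 0 and 2 share a community but are linked only through vertex 1, which is the kind of disconnected community plain Louvain can produce.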

Honest scope notes

  • Leiden's modularity is competitive with Louvain, not strictly ≥ (worst gap on random graphs 4×10⁻⁴; it trades a sliver of Q for guaranteed connectivity).
  • Leiden simplifications vs the full paper (documented in the class javadoc): local-moving restarts from singletons at each level; the refinement gate is connectivity + positive gain rather than the resolution-parametrised well-connected test; connectivity is enforced unconditionally by a final connected-components split. The resolution parameter is deferred.
  • pom.xml exports the new perf.clustering package(s) in the surefire argLine (documented per-package pattern).
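
The final connected-components split noted in the scope list can be pictured as follows. This is a hedged sketch under assumed names (ComponentSplitSketch, split) and an assumed edge-list input, not the code in LeidenClustering: each connected component of each community receives its own fresh label.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Illustrative only; hypothetical names, not the PR's implementation. */
final class ComponentSplitSketch {

    /** Relabels so that every connected component of every community is its own community. */
    static int[] split(int n, int[][] edges, int[] labels) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            adj.add(new ArrayList<>());
        }
        // Edges crossing community boundaries are ignored, so BFS never leaves a community.
        for (int[] e : edges) {
            if (labels[e[0]] == labels[e[1]]) {
                adj.get(e[0]).add(e[1]);
                adj.get(e[1]).add(e[0]);
            }
        }
        int[] out = new int[n];
        Arrays.fill(out, -1);
        int next = 0;
        for (int s = 0; s < n; s++) {
            if (out[s] != -1) {
                continue;
            }
            Deque<Integer> queue = new ArrayDeque<>();
            queue.add(s);
            out[s] = next;
            while (!queue.isEmpty()) {
                int u = queue.poll();
                for (int v : adj.get(u)) {
                    if (out[v] == -1) {
                        out[v] = next;
                        queue.add(v);
                    }
                }
            }
            next++;
        }
        return out;
    }
}
```

Because the split only ever separates vertices that were not connected inside their community, it cannot merge communities, and the output labels are already compact (0..k-1).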

Commits

  1. feat Louvain + tests + bench
  2. fix Louvain zero/negative-weight guards (review follow-up)
  3. feat Leiden + CommunityAggregation refactor (Louvain behaviour-preserving) + connectivity oracle
  4. bench Leiden added to the 4-way JMH bench

🤖 Generated with Claude Code

seilat and others added 4 commits May 30, 2026 11:57
Add LouvainClustering implementing ClusteringAlgorithm<V> in
org.jgrapht.alg.clustering: alternating local-moving and aggregation
phases that greedily maximise modularity, with weighted/self-loop
handling and modularity conventions matching UndirectedModularityMeasurer
(reused for getModularity()). Constructors mirror LabelPropagationClustering
((graph), (graph, rng), (graph, rng, tolerance)).

- 15 JUnit5 tests: cliques->2, ring->3, complete->1 (Q=0), weighted,
  seed determinism, single/empty/isolated, self-loops, random
  partition-validity, getModularity vs measurer, beats all-singletons.
- JMH benchmark (perf/clustering) vs LabelPropagation and GreedyModularity
  on planted-partition graphs; export the new perf package(s) in the
  surefire argLine, matching the existing per-package pattern.

checkstyle clean (etc/jgrapht_checks.xml); javadoc clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…weights

Adversarial review follow-ups on the Louvain PR:

- Zero-weight edges (2m == 0) made getModularity() return NaN because the
  modularity guard tested edgeSet().isEmpty() rather than the actual total
  weight. Guard on totalWeight > 0 instead; return 0 otherwise.
- Reject negative edge weights with IllegalArgumentException at compute time
  (modularity is undefined for negative weights), documented on the class.
- Replace the circular getModularityMatchesMeasurer test (getModularity
  delegates to the measurer) with a pinned exact value Q = 11/26 for the
  two-K4-plus-bridge graph; add regressions for zero-weight (no NaN) and
  negative-weight (throws).

17/17 tests pass; checkstyle clean.
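
The pinned value and the two guards can be reproduced with a small standalone calculation. The sketch below is an assumption-laden illustration (class ModularitySketch, edge-list input, unit weights, vertices 0-3 and 4-7 as the two K4s with bridge 3-4), not the PR's getModularity(). It uses Q = Σ_c [L_c/m − (d_c/2m)²], where L_c is the internal weight of community c, d_c is its total degree, and m is the total edge weight.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only; hypothetical names, not the PR's implementation. */
final class ModularitySketch {

    static double modularity(int[][] edges, double[] weights, int[] labels) {
        double total = 0.0;
        for (double w : weights) {
            if (w < 0) {
                throw new IllegalArgumentException("negative edge weight: " + w);
            }
            total += w;
        }
        // Guard on the total weight, not on "no edges": all-zero weights also give 2m == 0.
        if (!(total > 0)) {
            return 0.0;
        }
        Map<Integer, Double> internal = new HashMap<>(); // L_c
        Map<Integer, Double> degree = new HashMap<>();   // d_c
        for (int i = 0; i < edges.length; i++) {
            int cu = labels[edges[i][0]];
            int cv = labels[edges[i][1]];
            degree.merge(cu, weights[i], Double::sum);
            degree.merge(cv, weights[i], Double::sum);
            if (cu == cv) {
                internal.merge(cu, weights[i], Double::sum);
            }
        }
        double q = 0.0;
        for (Map.Entry<Integer, Double> e : degree.entrySet()) {
            double in = internal.getOrDefault(e.getKey(), 0.0);
            double frac = e.getValue() / (2.0 * total);
            q += in / total - frac * frac;
        }
        return q;
    }
}
```

For the two-K4-plus-bridge graph, m = 13 and each K4 has L_c = 6 and d_c = 13, so Q = 2 · (6/13 − 1/4) = 11/26, which matches the value pinned in the test.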

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…unities)

Builds on Louvain (E2):

- E2.0: extract package-private CommunityAggregation (compact /
  numberOfCommunities / degree-conserving aggregate that compacts labels
  internally) and refactor LouvainClustering to use it. Behaviour-preserving:
  Louvain's 17 tests stay green. The internal compaction removes the implicit
  'caller must pass compact labels' coupling flagged in review, so Leiden can
  aggregate its non-compact refined partitions safely.
- E2.1: LeidenClustering implementing ClusteringAlgorithm<V>. Multilevel
  local-moving + a refinement phase whose sub-communities grow only along
  edges (so each is connected), aggregation over the refined partition, and a
  final connected-components split that guarantees every reported community is
  connected -- the property Louvain lacks. Non-negative/zero-weight guards
  mirror Louvain.
- E2.2: 16 tests incl. the headline everyCommunityIsConnected oracle
  (ConnectivityInspector per induced subgraph, structured + 30 random graphs),
  clique/ring/complete/weighted parity, determinism, edge cases, and a
  modularity-vs-Louvain comparison (competitive within 0.02; Leiden trades a
  sliver of modularity for guaranteed connectivity).

33/33 clustering tests pass; checkstyle + javadoc clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Leiden vs Louvain vs LabelPropagation vs GreedyModularity on planted
partitions. Leiden costs ~1.4-1.5x Louvain (refinement + connectivity-split
passes) while staying roughly 3x (125 nodes) to 7x (500 nodes) faster than GreedyModularity; scaling parallels
Louvain. Cost (ms/op): 125-node Louvain 0.28 / Leiden 0.42; 500-node
Louvain 3.05 / Leiden 4.17.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>