Skip to content

Leiden vs Louvain disagree on multi-pair null model normalisation #111

@arnaudon

Description

@arnaudon

Problem

When a constructor returns a null model with more than one pair (n_null = len(null_model)/2 > 1), the Leiden and Louvain backends optimise different objectives.

In src/pygenstability/pygenstability.py:494-518 (_optimise(method=\"leiden\", ...)):

n_null = int(len(null_model) / 2)
for null in null_model[::2]:
    partitions.append(
        leidenalg.CPMVertexPartition(
            G,
            weights=quality_values,        # SAME quality for every layer
            node_sizes=null.tolist(),      # different null pair per layer
            correct_self_loops=True,
        )
    )
stability = sum(p.quality() for p in partitions) / n_null
stability += optimiser.optimise_partition_multiplex(
    partitions, layer_weights=n_null * [1.0 / n_null]
)

Per-layer Leiden quality is edges(H) − pair_k(H) (CPM with the layer's node_sizes). The total objective is

(1/n_null) · Σ_k [edges(H) − pair_k(H)]  =  edges(H) − (1/n_null)·Σ_k pair_k(H)

The C++ Louvain backend (generalizedLouvain/CPP/cliques/stability_gen.h:42-48) computes

edges(H) − Σ_k pair_k(H)

These agree iff n_null = 1. For n_null ≥ 2 they differ by (1 − 1/n_null) · Σ pair_k(H), which depends on the partition — so it's not just an absolute scale, the argmax can differ in principle.

Affected constructors

Empirically the existing signed_modularity users haven't reported a problem, so the partition argmax may be close in practice — but it's still a real semantic asymmetry between backends.

Options

  1. Scale node_sizes by √n_null before constructing CPMVertexPartition. The CPM null-model contribution is quadratic in ns, so scaling by √n_null makes each layer contribute n_null · pair_k(H); after Leiden's 1/n_null averaging this yields Σ pair_k(H), matching Louvain.

    • One-line change in _optimise.
    • Changes signed_modularity numerics → needs fixture regeneration.
  2. Use layer_weights=[1.0]*n_null and somehow subtract the duplicated edge contribution.

    • Harder: weights=quality_values is per-layer, so the same edge term appears n_null times. No clean way to remove the duplication without changing the per-layer quality semantics.
  3. Document and leave. Status quo. Acknowledge that for n_null ≥ 2 the Leiden path optimises a slightly different objective than Louvain.

Context

Came up while implementing #110 (sparse linearized_directed for α<1, which adds a second null pair). I opted for (3) there to keep the PR focused on the memory fix, but worth deciding the right long-term answer separately.

Reproducer (qualitative): on signed_modularity with a small signed graph, run pygenstability.run(..., method=\"louvain\") and method=\"leiden\" with the same seeds and compare absolute stabilities — they will not match scale.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions