feat(isomorphism): Weisfeiler-Lehman graph hash #10

Open: seilat wants to merge 4 commits into master from feat/wl-graph-hash

seilat (Owner) commented May 30, 2026

What

Adds WeisfeilerLehmanGraphHash<V,E> to org.jgrapht.alg.isomorphism — a deterministic, isomorphism-invariant graph fingerprint built on the same 1-WL colour refinement as ColorRefinementAlgorithm. Useful as a fast isomorphism pre-filter and as a content key for caching/deduplicating graphs.

Equal hashes are necessary but not sufficient for isomorphism (documented + pinned by a test: a 6-cycle and two disjoint triangles collide under 1-WL).
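The collision mentioned above can be reproduced without JGraphT. The following is a minimal sketch of 1-WL hashing over plain adjacency arrays, assuming degree-based initial colours and a per-round sorted colour histogram; the class name `WlCollisionSketch` and its helpers are hypothetical and are not the PR's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustrative 1-WL hash over adjacency arrays; not the PR's code. */
final class WlCollisionSketch {

    static final int[][] HEXAGON = {{1, 5}, {0, 2}, {1, 3}, {2, 4}, {3, 5}, {4, 0}};
    static final int[][] TWO_TRIANGLES = {{1, 2}, {0, 2}, {0, 1}, {4, 5}, {3, 5}, {3, 4}};
    static final int[][] PATH_6 = {{1}, {0, 2}, {1, 3}, {2, 4}, {3, 5}, {4}};

    /** Hex-encoded SHA-256 of a string. */
    static String sha256(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(s.getBytes(StandardCharsets.UTF_8))) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is mandatory on every JVM
        }
    }

    /** Sorted colours joined: a canonical encoding of the colour multiset. */
    static String histogram(String[] colour) {
        String[] sorted = colour.clone();
        Arrays.sort(sorted);
        return String.join(",", sorted);
    }

    /** adj[v] lists the neighbours of v (undirected, simple graph). */
    static String hash(int[][] adj, int iterations) {
        int n = adj.length;
        String[] colour = new String[n];
        for (int v = 0; v < n; v++) {
            colour[v] = sha256("deg:" + adj[v].length); // degree-based initial colour
        }
        StringBuilder rounds = new StringBuilder(histogram(colour));
        for (int i = 0; i < iterations; i++) {
            String[] next = new String[n];
            for (int v = 0; v < n; v++) {
                List<String> nb = new ArrayList<>();
                for (int u : adj[v]) {
                    nb.add(colour[u]);
                }
                Collections.sort(nb); // neighbour multiset, independent of edge order
                next[v] = sha256(colour[v] + "|" + String.join(",", nb));
            }
            colour = next;
            rounds.append('/').append(histogram(colour));
        }
        return sha256(rounds.toString());
    }

    public static void main(String[] args) {
        System.out.println("hexagon == two triangles: "
            + hash(HEXAGON, 3).equals(hash(TWO_TRIANGLES, 3)));
        System.out.println("hexagon == path: "
            + hash(HEXAGON, 3).equals(hash(PATH_6, 3)));
    }
}
```

Both 2-regular graphs give every vertex the same colour in every round, so their histograms match even though the graphs are not isomorphic. The 6-vertex path already differs in the initial degree round.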

API

new WeisfeilerLehmanGraphHash<>(graph);              // DEFAULT_ITERATIONS = 3
new WeisfeilerLehmanGraphHash<>(graph, iterations);
new WeisfeilerLehmanGraphHash<>(graph, iterations,
      vertexLabelFunction, edgeLabelFunction, parallelEdgeRule);

String getHash();                              // graph fingerprint
Map<V,String> getVertexHashes();               // final per-vertex colour
Map<V,List<String>> getVertexHashSequences();  // per-iteration subtree hashes

Capabilities

  • Degree-based initial colours by default; optional vertex/edge label providers fold application-defined labels into the colouring (edge weights ignored).
  • Directed-aware (in/out neighbours distinct); mixed graphs rejected; self-loops honoured.
  • ParallelEdgeRule: COUNT (default, multiplicity) or IGNORE (collapse, matching ColorRefinementAlgorithm).
  • Convergence early-exit: iterations is an upper bound; the hash is stable at and beyond the colour-refinement fixed point (reached within |V| rounds), so a very large iterations value cannot cause unbounded looping.
  • Full 256-bit SHA-256 internal labels (no truncation collisions); thread-safe (per-thread MessageDigest).
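One way the convergence early-exit could work is sketched below, under the assumption that refinement stops as soon as a round fails to split any colour class. `ConvergenceSketch` and `refine` are hypothetical names; the sketch only counts rounds and does not compute a hash.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Illustrative early-exit loop for colour refinement; not the PR's code. */
final class ConvergenceSketch {

    static final int[][] HEXAGON = {{1, 5}, {0, 2}, {1, 3}, {2, 4}, {3, 5}, {4, 0}};
    static final int[][] PATH_6 = {{1}, {0, 2}, {1, 3}, {2, 4}, {3, 5}, {4}};

    /** Returns the number of refinement rounds actually executed. */
    static int refine(int[][] adj, int maxIterations) {
        int n = adj.length;
        int[] colour = new int[n];
        for (int v = 0; v < n; v++) {
            colour[v] = adj[v].length; // degree-based initial colour
        }
        int classes = (int) Arrays.stream(colour).distinct().count();
        int rounds = 0;
        while (rounds < maxIterations) {
            Map<String, Integer> ids = new HashMap<>();
            int[] next = new int[n];
            for (int v = 0; v < n; v++) {
                int[] nb = new int[adj[v].length];
                for (int i = 0; i < nb.length; i++) {
                    nb[i] = colour[adj[v][i]];
                }
                Arrays.sort(nb);
                // The signature includes the old colour, so classes can only split.
                String signature = colour[v] + "|" + Arrays.toString(nb);
                Integer id = ids.get(signature);
                if (id == null) {
                    id = ids.size();
                    ids.put(signature, id);
                }
                next[v] = id;
            }
            colour = next;
            rounds++;
            if (ids.size() == classes) {
                break; // no class was split: fixed point reached
            }
            classes = ids.size();
        }
        return rounds;
    }

    public static void main(String[] args) {
        System.out.println(refine(PATH_6, Integer.MAX_VALUE));
        System.out.println(refine(HEXAGON, Integer.MAX_VALUE));
    }
}
```

Every round that does not terminate the loop adds at least one colour class, and there can be at most |V| classes, which is where the |V| bound on rounds comes from.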

Test plan

  • 25 unit tests, all green: isomorphic relabelling (undirected + directed), structural discrimination, insertion-order invariance, the 1-WL blind-spot collision, directionality, iterations=0, same-degree-sequence separation, convergence stability (k=3 == k=MAX_VALUE), parallel-edge COUNT vs IGNORE, undirected + directed self-loops, vertex/edge label folding, ignored weights, isolated vertices, empty/single-vertex, per-vertex symmetry, per-iteration sequence structure + unmodifiability, argument validation.
  • JMH benchmark at org.jgrapht.perf.isomorphism — linear-in-iterations scaling on sparse/dense graphs, with the convergence plateau visible.
  • mvn -P checkstyle checkstyle:check clean; javadoc:javadoc no warnings.

Notes

Scope, naming, package placement, and the parallel-edge default are open for discussion on the jgrapht-dev list before any upstream consideration. This PR targets the fork.

🤖 Generated with Claude Code

seilat and others added 4 commits May 30, 2026 13:43
Add org.jgrapht.alg.hash.WeisfeilerLehmanGraphHash, a deterministic,
isomorphism-invariant graph fingerprint built on iterative colour
refinement (the same naive vertex classification underlying the 1-WL
isomorphism test, cf. ColorRefinementAlgorithm).

- getHash(): aggregates the sorted vertex-colour histogram of the
  initial degree round and every refinement round into a truncated
  SHA-256 digest. Invariant under vertex/edge insertion order and
  isomorphism; hash equality is necessary but not sufficient for
  isomorphism (documented; covered by the hexagon-vs-two-triangles
  regular-graph collision test).
- getVertexHashes(): final per-vertex colour (radius-k subtree hash).
- v1: unweighted, degree-initialised; directed graphs keep in/out
  neighbours distinct so orientation is reflected in the hash.
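As a rough illustration of keeping in- and out-neighbours distinct, the sketch below builds vertex signatures from two separately sorted colour lists. All names are hypothetical, and signatures are left as plain strings for readability; a real implementation would presumably hash each one to keep labels at a fixed length.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustrative direction-aware refinement; not the PR's code. */
final class DirectedWlSketch {

    static final int[][] PATH = {{0, 1}, {1, 2}};       // 0 -> 1 -> 2
    static final int[][] CONVERGING = {{0, 1}, {2, 1}}; // 0 -> 1 <- 2

    /** edges[i] = {source, target}; returns a canonical string after `rounds` refinements. */
    static String fingerprint(int n, int[][] edges, int rounds, boolean directed) {
        List<List<Integer>> in = new ArrayList<>();
        List<List<Integer>> out = new ArrayList<>();
        for (int v = 0; v < n; v++) {
            in.add(new ArrayList<>());
            out.add(new ArrayList<>());
        }
        for (int[] e : edges) {
            out.get(e[0]).add(e[1]);
            in.get(e[1]).add(e[0]);
            if (!directed) { // undirected: each edge is seen from both ends
                out.get(e[1]).add(e[0]);
                in.get(e[0]).add(e[1]);
            }
        }
        String[] colour = new String[n];
        for (int v = 0; v < n; v++) {
            colour[v] = in.get(v).size() + ":" + out.get(v).size(); // in/out degree
        }
        for (int r = 0; r < rounds; r++) {
            String[] next = new String[n];
            for (int v = 0; v < n; v++) {
                next[v] = "(" + colour[v]
                    + " in" + sortedColours(in.get(v), colour)
                    + " out" + sortedColours(out.get(v), colour) + ")";
            }
            colour = next;
        }
        String[] all = colour.clone();
        Arrays.sort(all);
        return String.join(";", all);
    }

    static List<String> sortedColours(List<Integer> vertices, String[] colour) {
        List<String> cs = new ArrayList<>();
        for (int u : vertices) {
            cs.add(colour[u]);
        }
        Collections.sort(cs);
        return cs;
    }

    public static void main(String[] args) {
        System.out.println(fingerprint(3, PATH, 2, true));
        System.out.println(fingerprint(3, CONVERGING, 2, true));
    }
}
```

With direction honoured, the two 3-vertex graphs get different fingerprints; treated as undirected, both are the same path and the fingerprints coincide.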

12 unit tests (iso-equality, discrimination, order-invariance, 1-WL
limitation, directionality, iterations parameter, edge cases). 0
checkstyle violations, no javadoc warnings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add WeisfeilerLehmanGraphHashPerformanceTest measuring getHash() cost
as a function of refinement iterations on sparse (avg degree 4) and
dense (avg degree 32) random graphs. Cells are small and ordered
smallest-first so the suite self-bounds within the fast-test budget.

Export the new org.jgrapht.perf.hash and its jmh_generated subpackage
in the surefire argLine, matching the existing per-package pattern
(ref jgrapht#1170).

Local run (500 vertices, forks=0): cost is linear in iterations and
scales with edge count as expected (sparse 0.29/0.82/1.34 ms,
dense 1.93/5.95/9.95 ms for 1/3/5 iterations).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Address correctness, safety and documentation issues found in review:
- Reject mixed graphs (getType().isMixed()) instead of silently
  treating them as undirected, which dropped edge direction.
- Make hashing thread-safe via a per-thread (ThreadLocal) MessageDigest
  rather than a shared mutable instance field.
- Use the full 256-bit SHA-256 digest for internal vertex labels (was
  truncated to 64 bits), so distinct neighbourhoods are not collapsed
  by label collisions on large graphs.
- Stop refinement at the colour-refinement fixed point (<= |V| rounds);
  iterations is now an upper bound, so the hash is stable at/beyond
  convergence and a huge iterations value can no longer loop unbounded.
- Return an unmodifiable map from getVertexHashes().
- Document parallel-edge multiplicity (counted, unlike
  ColorRefinementAlgorithm which dedups), weight-ignoring, self-loop
  handling, directed semantics, convergence, thread-safety; add the
  missing @param and cite Weisfeiler-Leman 1968 / Shervashidze 2011.
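The per-thread digest pattern described in this commit might look roughly like the sketch below. It assumes a static `ThreadLocal` holder and hex-encoded labels; `PerThreadDigestSketch` and `label` are hypothetical names rather than the PR's actual members.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative per-thread SHA-256 labelling; not the PR's code. */
final class PerThreadDigestSketch {

    // MessageDigest is stateful and not thread-safe, so each thread gets its own.
    private static final ThreadLocal<MessageDigest> SHA_256 =
        ThreadLocal.withInitial(() -> {
            try {
                return MessageDigest.getInstance("SHA-256");
            } catch (NoSuchAlgorithmException e) {
                throw new IllegalStateException(e); // SHA-256 is mandatory on every JVM
            }
        });

    /** Full 256-bit digest as 64 hex characters, with no truncation. */
    static String label(String signature) {
        MessageDigest md = SHA_256.get();
        md.reset(); // defensive: discard any partial state left on this thread
        byte[] digest = md.digest(signature.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        System.out.println(label("abc"));
    }
}
```

Keeping all 64 hex characters is what avoids the truncation collisions mentioned above, at the cost of longer intermediate labels.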

Tests: 12 -> 20 (parallel-edge multiplicity, undirected+directed
self-loops, weight-invariance, isolated vertices, convergence stability
k=MAX_VALUE, directed isomorphism, unmodifiable/full-length vertex map).
20 green, 0 checkstyle, no javadoc warnings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…, parallel-edge rule

Move WeisfeilerLehmanGraphHash into org.jgrapht.alg.isomorphism (next to
ColorRefinementAlgorithm / ColorRefinementIsomorphismInspector) rather
than a new single-class package, and flesh out v1 to a complete WL
hashing capability instead of deferring features:

- Custom initial vertex labels and edge labels via optional
  Function<V,String> / Function<E,String> providers (fold
  application-defined labels into the colouring; default stays
  degree-based / unlabelled).
- Per-iteration subtree hashes: getVertexHashSequences() returns each
  vertex's colour after 0,1,2,... rounds (the rooted-subtree "subgraph
  hashes" sequence), up to iterations or convergence.
- Configurable parallel-edge handling via ParallelEdgeRule: COUNT
  (default, multiplicity) or IGNORE (collapse, matching
  ColorRefinementAlgorithm's neighbour dedup) — resolves the
  consistency question by making it explicit.
- Stability-contract javadoc: digest is deterministic per version/config
  but not a cross-version serialisation format or crypto commitment.
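The difference between the two parallel-edge rules can be sketched on a single vertex signature. The `Rule` enum below is a hypothetical stand-in for the PR's ParallelEdgeRule, and the sketch assumes that IGNORE deduplicates neighbour vertices (not neighbour colours) before the colours are collected.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

/** Illustrative COUNT vs IGNORE handling of parallel edges; not the PR's code. */
final class ParallelEdgeSketch {

    /** Hypothetical stand-in for the PR's ParallelEdgeRule. */
    enum Rule { COUNT, IGNORE }

    /**
     * Builds the refinement signature of vertex v.
     * neighbours holds one entry per incident edge, so a neighbour reached
     * through k parallel edges appears k times.
     */
    static String signature(int v, int[] neighbours, String[] colour, Rule rule) {
        // IGNORE: a set collapses repeated neighbours; COUNT: a list keeps multiplicity.
        Collection<Integer> seen =
            rule == Rule.IGNORE ? new TreeSet<>() : new ArrayList<>();
        for (int u : neighbours) {
            seen.add(u);
        }
        List<String> cs = new ArrayList<>();
        for (int u : seen) {
            cs.add(colour[u]);
        }
        Collections.sort(cs); // order-independent
        return colour[v] + "|" + String.join(",", cs);
    }

    public static void main(String[] args) {
        String[] colour = {"a", "b", "b"};
        int[] doubled = {1, 1, 2}; // two parallel edges 0-1, one edge 0-2
        System.out.println(signature(0, doubled, colour, Rule.COUNT));
        System.out.println(signature(0, doubled, colour, Rule.IGNORE));
    }
}
```

Under COUNT the doubled edge contributes its neighbour colour twice, so a multigraph and its underlying simple graph get different signatures; under IGNORE they coincide, while two distinct neighbours that share a colour are still counted separately.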

Retains the review hardening: mixed-graph rejection, per-thread
MessageDigest, full 256-bit labels, convergence early-exit.

Move the JMH bench to org.jgrapht.perf.isomorphism and update the
surefire argLine exports accordingly.

Tests 20 -> 25 (vertex-label fold + end-symmetry break, edge-label fold,
ParallelEdgeRule COUNT vs IGNORE, per-iteration sequence structure /
unmodifiability, null-rule validation). 25 green, 0 checkstyle, no
javadoc warnings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>