feat(isomorphism): Weisfeiler-Lehman graph hash #10
Open
seilat wants to merge 4 commits into
Add org.jgrapht.alg.hash.WeisfeilerLehmanGraphHash, a deterministic, isomorphism-invariant graph fingerprint built on iterative colour refinement (the same naive vertex classification underlying the 1-WL isomorphism test, cf. ColorRefinementAlgorithm).

- getHash(): aggregates the sorted vertex-colour histogram of the initial degree round and every refinement round into a truncated SHA-256 digest. Invariant under vertex/edge insertion order and isomorphism; hash equality is necessary but not sufficient for isomorphism (documented; covered by the hexagon-vs-two-triangles regular-graph collision test).
- getVertexHashes(): final per-vertex colour (radius-k subtree hash).
- v1: unweighted, degree-initialised; directed graphs keep in/out neighbours distinct so orientation is reflected in the hash.

12 unit tests (iso-equality, discrimination, order-invariance, 1-WL limitation, directionality, iterations parameter, edge cases). 0 checkstyle violations, no javadoc warnings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
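To make the getHash() description concrete, here is a minimal, self-contained sketch of degree-initialised 1-WL hashing on a plain adjacency list. It is not the PR's implementation: the class name WlHashSketch, the helpers, the label separators and the 16-hex-character truncation are all assumptions chosen for illustration, and it handles only undirected simple graphs.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustrative sketch only; not the PR's WeisfeilerLehmanGraphHash. */
final class WlHashSketch {

    /** Hex-encoded SHA-256 of a string. */
    static String sha256(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(s.getBytes(StandardCharsets.UTF_8))) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Sorted colour multiset of one round; carries the same information as a histogram. */
    static String histogram(String[] colour) {
        List<String> sorted = new ArrayList<>(Arrays.asList(colour));
        Collections.sort(sorted);
        return String.join(",", sorted);
    }

    /** adj.get(v) lists the neighbours of vertex v in an undirected simple graph. */
    static String hash(List<List<Integer>> adj, int iterations) {
        int n = adj.size();
        String[] colour = new String[n];
        for (int v = 0; v < n; v++) {
            colour[v] = Integer.toString(adj.get(v).size()); // round 0: degree
        }
        StringBuilder rounds = new StringBuilder(histogram(colour));
        for (int it = 0; it < iterations; it++) {
            String[] next = new String[n];
            for (int v = 0; v < n; v++) {
                List<String> nb = new ArrayList<>();
                for (int u : adj.get(v)) {
                    nb.add(colour[u]);
                }
                Collections.sort(nb); // neighbour order must not matter
                next[v] = sha256(colour[v] + "|" + String.join(",", nb));
            }
            colour = next;
            rounds.append(';').append(histogram(colour));
        }
        // Truncation length is an arbitrary choice for this sketch.
        return sha256(rounds.toString()).substring(0, 16);
    }

    /** Builds an adjacency list from undirected edges {u, v}. */
    static List<List<Integer>> fromEdges(int n, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            adj.add(new ArrayList<>());
        }
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]);
            adj.get(e[1]).add(e[0]);
        }
        return adj;
    }
}
```

Under this sketch a 6-cycle and two disjoint triangles hash identically, because every vertex has degree 2 and refinement never splits the single colour class. That is the 1-WL limitation the collision test pins.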
Add WeisfeilerLehmanGraphHashPerformanceTest measuring getHash() cost as a function of refinement iterations on sparse (avg degree 4) and dense (avg degree 32) random graphs. Cells are small and ordered smallest-first so the suite self-bounds within the fast-test budget.

Export the new org.jgrapht.perf.hash and its jmh_generated subpackage in the surefire argLine, matching the existing per-package pattern (ref jgrapht#1170).

Local run (500 vertices, forks=0): cost is linear in iterations and scales with edge count as expected (sparse 0.29/0.82/1.34 ms, dense 1.93/5.95/9.95 ms for 1/3/5 iterations).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Address correctness, safety and documentation issues found in review:

- Reject mixed graphs (getType().isMixed()) instead of silently treating them as undirected, which dropped edge direction.
- Make hashing thread-safe via a per-thread (ThreadLocal) MessageDigest rather than a shared mutable instance field.
- Use the full 256-bit SHA-256 digest for internal vertex labels (was truncated to 64 bits), so distinct neighbourhoods are not collapsed by label collisions on large graphs.
- Stop refinement at the colour-refinement fixed point (<= |V| rounds); iterations is now an upper bound, so the hash is stable at/beyond convergence and a huge iterations value can no longer loop unbounded.
- Return an unmodifiable map from getVertexHashes().
- Document parallel-edge multiplicity (counted, unlike ColorRefinementAlgorithm which dedups), weight-ignoring, self-loop handling, directed semantics, convergence, thread-safety; add the missing @param and cite Weisfeiler-Leman 1968 / Shervashidze 2011.

Tests: 12 -> 20 (parallel-edge multiplicity, undirected+directed self-loops, weight-invariance, isolated vertices, convergence stability k=MAX_VALUE, directed isomorphism, unmodifiable/full-length vertex map). 20 green, 0 checkstyle, no javadoc warnings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
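The fixed-point early exit can be sketched as follows. This is an assumption about one reasonable way to detect convergence, not the PR's code: because each new colour folds in the old one, a round can only split colour classes, so an unchanged class count means the partition is stable. The class name RefinementFixedPointSketch and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch only; convergence detection in the PR may differ. */
final class RefinementFixedPointSketch {

    /** Number of rounds that actually refined the partition, capped by maxIterations. */
    static int roundsUntilStable(List<List<Integer>> adj, int maxIterations) {
        int n = adj.size();
        int[] colour = new int[n];
        for (int v = 0; v < n; v++) {
            colour[v] = adj.get(v).size(); // degree initialisation
        }
        int classes = countDistinct(colour);
        int rounds = 0;
        while (rounds < maxIterations) {
            Map<String, Integer> ids = new HashMap<>();
            int[] next = new int[n];
            for (int v = 0; v < n; v++) {
                int[] nb = new int[adj.get(v).size()];
                for (int i = 0; i < nb.length; i++) {
                    nb[i] = colour[adj.get(v).get(i)];
                }
                Arrays.sort(nb);
                // The signature includes the old colour, so classes can only split.
                String sig = colour[v] + "|" + Arrays.toString(nb);
                Integer id = ids.get(sig);
                if (id == null) {
                    id = ids.size();
                    ids.put(sig, id);
                }
                next[v] = id;
            }
            if (ids.size() == classes) {
                break; // no class was split: fixed point reached
            }
            colour = next;
            classes = ids.size();
            rounds++;
        }
        return rounds;
    }

    static int countDistinct(int[] values) {
        Set<Integer> seen = new HashSet<>();
        for (int x : values) {
            seen.add(x);
        }
        return seen.size();
    }
}
```

Since the class count strictly increases in every productive round and can never exceed |V|, the loop ends after fewer than |V| productive rounds even when maxIterations is Integer.MAX_VALUE.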
…, parallel-edge rule

Move WeisfeilerLehmanGraphHash into org.jgrapht.alg.isomorphism (next to ColorRefinementAlgorithm / ColorRefinementIsomorphismInspector) rather than a new single-class package, and flesh out v1 to a complete WL hashing capability instead of deferring features:

- Custom initial vertex labels and edge labels via optional Function<V,String> / Function<E,String> providers (fold application-defined labels into the colouring; default stays degree-based / unlabelled).
- Per-iteration subtree hashes: getVertexHashSequences() returns each vertex's colour after 0,1,2,... rounds (the rooted-subtree "subgraph hashes" sequence), up to iterations or convergence.
- Configurable parallel-edge handling via ParallelEdgeRule: COUNT (default, multiplicity) or IGNORE (collapse, matching ColorRefinementAlgorithm's neighbour dedup). This resolves the consistency question by making it explicit.
- Stability-contract javadoc: digest is deterministic per version/config but not a cross-version serialisation format or crypto commitment.

Retains the review hardening: mixed-graph rejection, per-thread MessageDigest, full 256-bit labels, convergence early-exit. Move the JMH bench to org.jgrapht.perf.isomorphism and update the surefire argLine exports accordingly.

Tests 20 -> 25 (vertex-label fold + end-symmetry break, edge-label fold, ParallelEdgeRule COUNT vs IGNORE, per-iteration sequence structure / unmodifiability, null-rule validation). 25 green, 0 checkstyle, no javadoc warnings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
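The difference between the two ParallelEdgeRule values can be illustrated with a small sketch of how a vertex's neighbour signature might be built. Only the enum name and its constants come from the commit message; the ParallelEdgeRuleSketch class, the signature helper and its input shape are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;

/** Illustrative sketch only; the helper and its input shape are assumptions. */
final class ParallelEdgeRuleSketch {

    enum ParallelEdgeRule { COUNT, IGNORE }

    /**
     * neighbours holds one entry per incident edge, so a vertex reached by
     * two parallel edges appears twice. colour[u] is the current colour of u.
     */
    static String signature(List<Integer> neighbours, String[] colour, ParallelEdgeRule rule) {
        // IGNORE collapses repeated neighbour vertices; COUNT keeps the multiplicity.
        Collection<Integer> effective =
            rule == ParallelEdgeRule.IGNORE ? new LinkedHashSet<>(neighbours) : neighbours;
        List<String> colours = new ArrayList<>();
        for (int u : effective) {
            colours.add(colour[u]);
        }
        Collections.sort(colours);
        return String.join(",", colours);
    }
}
```

Note that IGNORE in this sketch deduplicates by neighbour vertex, not by colour: two distinct neighbours that share a colour still contribute two entries.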
What
Adds WeisfeilerLehmanGraphHash<V,E> to org.jgrapht.alg.isomorphism, a deterministic, isomorphism-invariant graph fingerprint built on the same 1-WL colour refinement as ColorRefinementAlgorithm. Useful as a fast isomorphism pre-filter and as a content key for caching/deduplicating graphs.

Equal hashes are necessary but not sufficient for isomorphism (documented + pinned by a test: a 6-cycle and two disjoint triangles collide under 1-WL).
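Because equal hashes do not prove isomorphism, a pre-filter still needs an exact check inside each hash bucket. The sketch below shows that pattern generically; it does not call the PR's API, and the HashPrefilterSketch class and its hash and exactIso parameters are hypothetical stand-ins for a graph hash and a full isomorphism test.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.function.Function;

/** Illustrative sketch only; hash and exactIso are supplied by the caller. */
final class HashPrefilterSketch {

    /**
     * Keeps the first representative of each equivalence class. The expensive
     * exact test runs only against items that share the candidate's hash.
     */
    static <G> List<G> deduplicate(
            List<G> items, Function<G, String> hash, BiPredicate<G, G> exactIso) {
        Map<String, List<G>> buckets = new HashMap<>();
        List<G> unique = new ArrayList<>();
        for (G g : items) {
            List<G> bucket = buckets.computeIfAbsent(hash.apply(g), k -> new ArrayList<>());
            boolean seen = false;
            for (G other : bucket) {
                if (exactIso.test(g, other)) { // hash collision or true duplicate?
                    seen = true;
                    break;
                }
            }
            if (!seen) {
                bucket.add(g);
                unique.add(g);
            }
        }
        return unique;
    }
}
```

Items with different hashes are never compared, which is where the saving comes from; colliding non-isomorphic graphs such as the 6-cycle and two triangles are still separated by the exact test.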
API
Capabilities
- ParallelEdgeRule: COUNT (default, multiplicity) or IGNORE (collapse, matching ColorRefinementAlgorithm).
- iterations is an upper bound; the hash is stable at/beyond the colour-refinement fixed point (≤ |V| rounds), so a huge iterations can't loop unbounded.
- Thread-safe hashing (per-thread MessageDigest).

Test plan
- iterations=0, same-degree-sequence separation, convergence stability (k=3 == k=MAX_VALUE), parallel-edge COUNT vs IGNORE, undirected + directed self-loops, vertex/edge label folding, ignored weights, isolated vertices, empty/single-vertex, per-vertex symmetry, per-iteration sequence structure + unmodifiability, argument validation.
- org.jgrapht.perf.isomorphism: linear-in-iterations scaling on sparse/dense graphs, with the convergence plateau visible.
- mvn -P checkstyle checkstyle:check clean; javadoc:javadoc no warnings.

Notes
Scope, naming, package placement, and the parallel-edge default are open for discussion on the jgrapht-dev list before any upstream consideration. This PR targets the fork.

🤖 Generated with Claude Code