Implement GEFF v1 spec compatibility by ksugar · Pull Request #5 · live-image-tracking-tools/geff-java

ksugar · 2026-06-22T13:10:45Z

Summary

- Implements full GEFF v1 spec compliance, including node_props_metadata / edge_props_metadata, varlength properties, and the updated polygon storage path.
- Adds PropMetadata and VarlengthProperty classes to support the v1 data model.
- Migrates polygon storage from the legacy serialized_props/polygon/ layout to nodes/props/polygon/ as a varlength property; backward-compatible read fallback is retained for files written by earlier versions.
- Replaces the fixed DEFAULT_CHUNK_SIZE = 1000 with computeFirstDimChunk(), which targets ~8 MiB per chunk (power-of-two on the first dimension), matching the Python reference implementation.
- Writes Zarr arrays in little-endian byte order so Python / pandas can read the output on little-endian systems without a "Big-endian buffer not supported" error; also decompresses Blosc chunks before byte-swapping when compression is active.
- Falls back to RawCompression when the native c-blosc library is absent.
- Adds arbitrary props support (getProp / setProp / getProps) and varlength props support (getVarlengthProperty / setVarlengthProperty / getVarlengthProperties) to both GeffNode and GeffEdge; props are round-tripped through Zarr automatically.
- Updates README to clarify Zarr Format 2-only support, corrects CITATION.cff metadata, and moves internal planning docs to doc/.
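
The RawCompression fallback mentioned in the summary can be illustrated with a library-agnostic sketch. This page does not show how the PR detects a missing c-blosc library, so the code below assumes the failure surfaces as a `LinkageError` (for example `UnsatisfiedLinkError`) when the preferred object is constructed. `CompressionFallback` and `firstAvailable` are hypothetical names, not part of geff-java or N5.

```java
import java.util.function.Supplier;

final class CompressionFallback {
    /**
     * Returns preferred.get(), or fallback.get() if the preferred factory
     * fails with a LinkageError (for example UnsatisfiedLinkError when a
     * native library cannot be loaded). Assumption: the missing native
     * library shows up as a LinkageError at construction time.
     */
    static <T> T firstAvailable(Supplier<T> preferred, Supplier<T> fallback) {
        try {
            return preferred.get();
        } catch (LinkageError e) {
            return fallback.get();
        }
    }
}
```

In the real code the two suppliers would presumably construct the Blosc and raw compression objects; here they can be any factories of the same type.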

Validation

- All 42 unit tests pass (mvn test).
- Cross-language round-trip tests pass (cd cross-language-tests && uv run run_tests.py).
- Files written by this branch are readable by the Python geff reference implementation (geff.is_geff_dataset() returns True).
- Files written by the Python reference implementation are readable by this branch (nodes, edges, polygon, and varlength props are restored correctly).
- Backward-compatible read of files with legacy serialized_props/polygon/ layout.

- GeffNode: assign DEFAULT_COVARIANCE_3D (not 2D) to covariance3d in readFromN5
- GeffNode: fix off-by-one in polygon slice boundary check (< -> <=)
- GeffUtils.readVarlengthProperty: cast missing array to byte[] instead of boolean[], since it is stored as UINT8; convert to boolean[] explicitly
- GeffUtils.readVarlengthProperty: pass property name (from PropMetadata identifier or last path segment) instead of the full propPath to VarlengthProperty constructor
- GeffUtils.writeOffsetsArray: fix column-major stride so the flat layout matches what FlattenedInts.at() expects (j + numColumns*i, not i + numNodes*j)
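
The index formula in the writeOffsetsArray fix can be shown with a minimal sketch. It only demonstrates the `j + numColumns*i` layout quoted in the commit message; `RowMajorLayout`, `flatten`, and `at` are hypothetical names and do not reproduce FlattenedInts or the actual geff-java code.

```java
final class RowMajorLayout {
    /**
     * Flattens a rectangular table so that element (i, j) lands at
     * index j + numColumns * i, i.e. all columns of row 0 first,
     * then all columns of row 1, and so on.
     */
    static int[] flatten(int[][] rows, int numColumns) {
        int[] flat = new int[rows.length * numColumns];
        for (int i = 0; i < rows.length; i++) {
            for (int j = 0; j < numColumns; j++) {
                flat[j + numColumns * i] = rows[i][j];
            }
        }
        return flat;
    }

    /** Reads element (i, j) back with the same index formula used by flatten. */
    static int at(int[] flat, int numColumns, int i, int j) {
        return flat[j + numColumns * i];
    }
}
```

The writer and the reader have to agree on one formula; mixing `j + numColumns*i` on one side with `i + numNodes*j` on the other scrambles every table that has more than one row and more than one column.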

readFromN5 read covariance2d and covariance3d from disk but discarded the values, always falling back to defaults. Use the read FlattenedDoubles instead. Add testCovariance2dRoundTrip and testCovariance3dRoundTrip to cover write → read for both fields with non-default values.

Creates a GEFF with Python, injects covariance2d (N×4) and covariance3d (N×6) arrays plus their node_props_metadata entries, runs the Java round-trip, and verifies the values are preserved within floating-point tolerance.

- writeOffsetsArray: use UINT64 (was INT64) for varlength values array
- writeMissingArray: patch .zarray dtype to |b1 (bool) after writing UINT8
- writeVarlengthProperty: add declaredDtype param so data array uses the dtype declared in metadata (e.g. uint64 stays uint64, not int64)
- GeffNode.writeToN5: remove unsupported props (string dtype) from metadata after writing nodes so validate_structure does not find missing prop groups
- RoundTripGeff: write metadata after nodes/edges so all metadata modifications (removals, varlength additions) are captured

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
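
Because the missing mask is written as UINT8 and only relabelled as bool in .zarray, the Java side has to convert between `byte[]` and `boolean[]` explicitly. The following is a minimal sketch of that conversion, assuming 0 means "present" and any nonzero byte means "missing"; `MissingMask` and its methods are hypothetical names.

```java
final class MissingMask {
    /** Interprets each stored byte as a flag: nonzero means true (assumed convention). */
    static boolean[] fromBytes(byte[] raw) {
        boolean[] mask = new boolean[raw.length];
        for (int i = 0; i < raw.length; i++) {
            mask[i] = raw[i] != 0;
        }
        return mask;
    }

    /** Encodes each flag as a single byte, 1 for true and 0 for false. */
    static byte[] toBytes(boolean[] mask) {
        byte[] raw = new byte[mask.length];
        for (int i = 0; i < mask.length; i++) {
            raw[i] = (byte) (mask[i] ? 1 : 0);
        }
        return raw;
    }
}
```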

…tion

When GeffNode/GeffEdge write all properties (because no nodePropsMetadata / edgePropsMetadata is provided), the metadata fields were never populated, causing Python Pydantic validation to fail since node_props_metadata and edge_props_metadata are required fields.

- GeffNode.writeToN5: register standard node props (t, x, y, z, color, track_id, radius, covariance2d, covariance3d) in metadata when writeAllProps
- GeffEdge.writeToN5: register distance and score in metadata when writeAllProps
- GeffMetadata.writeToN5: always write both metadata maps (defaults to {}) so the required Pydantic fields are always present in .zattrs

…tibility

N5ZarrWriter serializes numeric arrays in big-endian format (">i4", ">f8"), which causes a "Big-endian buffer not supported on little-endian compiler" error in pandas (e.g. drop_duplicates on edge ids).

GeffUtils.patchZarrLittleEndian: after writing, walks all .zarray files under the given group path, byte-swaps chunk file data in place, and updates the dtype from ">" to "<". Only processes uncompressed (null compressor) arrays since byte-swapping compressed data requires decompression first.

Also create edges/props group unconditionally so the zarr structure is valid even when there are no edge properties (Python geff always writes this group).

All 42 Java tests and all 5 cross-language round-trip tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
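
The byte-swapping step described above can be sketched as follows. This is not the code of GeffUtils.patchZarrLittleEndian; it assumes an uncompressed chunk is a plain sequence of fixed-size elements with no header, and `EndianSwap`, `swapInPlace`, and `littleEndianDtype` are hypothetical names.

```java
final class EndianSwap {
    /**
     * Reverses the byte order of every itemSize-byte element in data, in place.
     * For example, itemSize is 4 for "i4" and 8 for "f8".
     */
    static void swapInPlace(byte[] data, int itemSize) {
        if (itemSize <= 0 || data.length % itemSize != 0) {
            throw new IllegalArgumentException("data length must be a multiple of itemSize");
        }
        for (int start = 0; start < data.length; start += itemSize) {
            // Swap bytes from the two ends of the element toward its middle.
            for (int lo = start, hi = start + itemSize - 1; lo < hi; lo++, hi--) {
                byte tmp = data[lo];
                data[lo] = data[hi];
                data[hi] = tmp;
            }
        }
    }

    /** Rewrites a big-endian dtype string such as ">i4" to "<i4"; other dtypes are returned unchanged. */
    static String littleEndianDtype(String dtype) {
        return dtype.startsWith(">") ? "<" + dtype.substring(1) : dtype;
    }
}
```

Single-byte dtypes such as "|b1" need no swap, and the helper leaves their dtype string untouched.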

N5ZarrWriter's createDataset() is a no-op when the dataset already exists, so the blosc compressor entry in .zarray was never cleared. Subsequent chunk writes therefore still used blosc, and byte-swapping blosc-compressed bytes produced garbage data.

Fix uses a three-step approach for compressed arrays:

1. Read decompressed data via a fresh N5ZarrWriter (blosc still active)
2. Directly patch .zarray to set "compressor": null before any new write
3. Open a second fresh writer (which now sees compressor:null) to write the raw big-endian chunks, then byte-swap to little-endian as before

This unblocks the "Big-endian buffer not supported" pandas error when intracktive reads Java-exported GEFF files in production environments where blosc compression is available.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
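
Step 2 of the approach above, patching .zarray so that the compressor is null, might look like the sketch below. It works on the JSON text with a regular expression only to stay dependency-free; a real implementation would more likely use a JSON parser. It assumes the compressor object is flat (no nested braces), and `ZarrayPatch` and `clearCompressor` are hypothetical names.

```java
import java.util.regex.Pattern;

final class ZarrayPatch {
    // Matches "compressor": { ... } where the object contains no nested braces.
    private static final Pattern COMPRESSOR =
            Pattern.compile("\"compressor\"\\s*:\\s*\\{[^{}]*\\}");

    /** Returns the .zarray JSON text with any flat compressor object replaced by null. */
    static String clearCompressor(String zarrayJson) {
        return COMPRESSOR.matcher(zarrayJson).replaceAll("\"compressor\": null");
    }
}
```

Text that already has `"compressor": null` does not match the pattern and is returned unchanged, so the patch can be applied more than once safely.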

Calculate the chunk size based on the same algorithm used in the original geff implementation in Python. https://github.com/live-image-tracking-tools/geff/blob/35c8691fae11b3dda528a1c7ebb28afadde67d92/packages/geff/src/geff/core_io/_base_write.py#L310
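
One plausible reading of "~8 MiB per chunk, power-of-two on the first dimension" is sketched below. The exact rounding and clamping rules live in the linked Python source and are not reproduced on this page, so this is an assumption rather than the algorithm of computeFirstDimChunk(): it picks the largest power-of-two row count whose rows fit in 8 MiB, with a minimum of one row. `ChunkSizeSketch` and `firstDimChunk` are hypothetical names.

```java
final class ChunkSizeSketch {
    /** Target chunk size in bytes (8 MiB). */
    static final long TARGET_BYTES = 8L * 1024 * 1024;

    /**
     * Returns the largest power-of-two number of rows that fits in TARGET_BYTES,
     * or 1 if a single row is already larger than the target.
     * bytesPerRow is the item size times the product of all trailing dimensions.
     */
    static long firstDimChunk(long bytesPerRow) {
        if (bytesPerRow <= 0) {
            throw new IllegalArgumentException("bytesPerRow must be positive");
        }
        long rows = Math.max(1L, TARGET_BYTES / bytesPerRow);
        // highestOneBit rounds down to the nearest power of two.
        return Long.highestOneBit(rows);
    }
}
```

Under these assumptions a scalar 8-byte property gets 1,048,576 rows per chunk, while a 3-column float64 property (24 bytes per row) gets 262,144.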

cmalinmayor and others added 27 commits December 3, 2025 22:08

Initial implementation of round trip testing between Java and Python

24c79f0

Add note on how to run tests in readme

ad440c5

Update logback version to handle CVE warnings

55b3891

v1 spec compatibility implementation

034d7d8

Update v1 spec compatibility analysis and plan

ca37ef5

Implement Variable-length Property

d1345ea

update v1 spec compatibility plan for writing varlen props

46c30c0

Implement VarlengthPropertyWriting

e515270

Fallback to RawCompression if Blosc compression is not available

8f1512d

Add uv-related files for testing

109beff

Refactoring

26c4f23

Add covariance2d/3d cross-language round-trip test

b21271e

Unify Java indentation to tabs (1 tab per indent level)

adb210d

Update docs

82faad8

Update README

995a53a

Update CITATION.cff

0eef18d

Refactoring

0f3366b

Update polygon handling to follow the v1 spec

77dc5a9

Add supported Zarr format

11a3d23

Add methods for custom props and varlenProps

d02d64c

ksugar requested a review from tinevez June 23, 2026 07:44

tinevez approved these changes Jun 23, 2026

ksugar merged commit 76e3f99 into main Jun 23, 2026
1 check passed

ksugar merged 27 commits into main from v1-spec-compat-impl

3 participants
