Implement GEFF v1 spec compatibility #5
Merged
- GeffNode: assign DEFAULT_COVARIANCE_3D (not 2D) to covariance3d in readFromN5
- GeffNode: fix off-by-one in polygon slice boundary check (< -> <=)
- GeffUtils.readVarlengthProperty: cast missing array to byte[] instead of boolean[], since it is stored as UINT8; convert to boolean[] explicitly
- GeffUtils.readVarlengthProperty: pass property name (from PropMetadata identifier or last path segment) instead of the full propPath to VarlengthProperty constructor
- GeffUtils.writeOffsetsArray: fix column-major stride so the flat layout matches what FlattenedInts.at() expects (j + numColumns*i, not i + numNodes*j)
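As a minimal sketch of the stride fix in the last item, the following shows a flat layout in which element (i, j) lands at index j + numColumns*i, the form the commit message says FlattenedInts.at() expects. The class and method names (OffsetsLayoutSketch, flatten, at) are hypothetical and are not taken from the repository.

```java
// Hypothetical illustration; not the project's GeffUtils or FlattenedInts code.
final class OffsetsLayoutSketch {

    /** Flattens rows[i][j] so that element (i, j) lands at index j + numColumns * i. */
    static int[] flatten(int[][] rows, int numColumns) {
        int[] flat = new int[rows.length * numColumns];
        for (int i = 0; i < rows.length; i++) {
            for (int j = 0; j < numColumns; j++) {
                // The buggy variant would have used i + rows.length * j here.
                flat[j + numColumns * i] = rows[i][j];
            }
        }
        return flat;
    }

    /** Reads element (i, j) back using the same stride as flatten(). */
    static int at(int[] flat, int numColumns, int i, int j) {
        return flat[j + numColumns * i];
    }
}
```

With three (offset, length) rows {0, 3}, {3, 5}, {8, 2}, the flat array keeps each row contiguous, so a reader that uses the same stride recovers every pair unchanged.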
readFromN5 read covariance2d and covariance3d from disk but discarded the values, always falling back to defaults. Use the read FlattenedDoubles instead. Add testCovariance2dRoundTrip and testCovariance3dRoundTrip to cover write → read for both fields with non-default values.
Creates a GEFF with Python, injects covariance2d (N×4) and covariance3d (N×6) arrays plus their node_props_metadata entries, runs the Java round-trip, and verifies the values are preserved within floating-point tolerance.
- writeOffsetsArray: use UINT64 (was INT64) for varlength values array
- writeMissingArray: patch .zarray dtype to |b1 (bool) after writing UINT8
- writeVarlengthProperty: add declaredDtype param so data array uses the dtype declared in metadata (e.g. uint64 stays uint64, not int64)
- GeffNode.writeToN5: remove unsupported props (string dtype) from metadata after writing nodes so validate_structure does not find missing prop groups
- RoundTripGeff: write metadata after nodes/edges so all metadata modifications (removals, varlength additions) are captured
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tion
When GeffNode/GeffEdge write all properties (because no nodePropsMetadata /
edgePropsMetadata is provided), the metadata fields were never populated,
causing Python Pydantic validation to fail since node_props_metadata and
edge_props_metadata are required fields.
- GeffNode.writeToN5: register standard node props (t, x, y, z, color,
track_id, radius, covariance2d, covariance3d) in metadata when writeAllProps
- GeffEdge.writeToN5: register distance and score in metadata when writeAllProps
- GeffMetadata.writeToN5: always write both metadata maps (defaults to {})
so the required Pydantic fields are always present in .zattrs
…tibility
N5ZarrWriter serializes numeric arrays in big-endian format (">i4", ">f8"),
which causes a "Big-endian buffer not supported on little-endian compiler"
error in pandas (e.g. drop_duplicates on edge ids).
GeffUtils.patchZarrLittleEndian: after writing, walks all .zarray files under
the given group path, byte-swaps chunk file data in place, and updates the
dtype from ">" to "<". Only processes uncompressed (null compressor) arrays
since byte-swapping compressed data requires decompression first.
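The in-place swap described above can be sketched as follows. This is an illustration under assumptions, not the body of GeffUtils.patchZarrLittleEndian: the names ByteSwapSketch, swapInPlace and toLittleEndianDtype are hypothetical, and the sketch operates on an already-loaded, uncompressed chunk buffer rather than on files.

```java
// Hypothetical illustration; file walking and .zarray parsing are left out.
final class ByteSwapSketch {

    /** Reverses the byte order of every itemSize-byte element of data, in place. */
    static void swapInPlace(byte[] data, int itemSize) {
        if (itemSize <= 0 || data.length % itemSize != 0) {
            throw new IllegalArgumentException("data length must be a multiple of itemSize");
        }
        for (int start = 0; start < data.length; start += itemSize) {
            // Swap the outermost bytes of the element and walk inwards.
            for (int lo = start, hi = start + itemSize - 1; lo < hi; lo++, hi--) {
                byte tmp = data[lo];
                data[lo] = data[hi];
                data[hi] = tmp;
            }
        }
    }

    /** Rewrites a dtype such as ">i4" to "<i4"; any other dtype is returned unchanged. */
    static String toLittleEndianDtype(String dtype) {
        return dtype.startsWith(">") ? "<" + dtype.substring(1) : dtype;
    }
}
```

For a ">i4" array the element width is 4, so the big-endian bytes 00 00 00 01 become 01 00 00 00. Single-byte dtypes such as "|b1" need no swap and keep their dtype string.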
Also create edges/props group unconditionally so the zarr structure is valid
even when there are no edge properties (Python geff always writes this group).
All 42 Java tests and all 5 cross-language round-trip tests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
N5ZarrWriter's createDataset() is a no-op when the dataset already exists, so the blosc compressor entry in .zarray was never cleared. Subsequent chunk writes therefore still used blosc, and byte-swapping blosc-compressed bytes produced garbage data.
Fix uses a three-step approach for compressed arrays:
1. Read decompressed data via a fresh N5ZarrWriter (blosc still active)
2. Directly patch .zarray to set "compressor": null before any new write
3. Open a second fresh writer (which now sees compressor:null) to write the raw big-endian chunks, then byte-swap to little-endian as before
This unblocks the "Big-endian buffer not supported" pandas error when intracktive reads Java-exported GEFF files in production environments where blosc compression is available.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Calculate the chunk size based on the same algorithm used in the original geff implementation in Python. https://github.com/live-image-tracking-tools/geff/blob/35c8691fae11b3dda528a1c7ebb28afadde67d92/packages/geff/src/geff/core_io/_base_write.py#L310
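The PR summary describes computeFirstDimChunk() as targeting roughly 8 MiB per chunk with a power-of-two size on the first dimension. A rough sketch of that idea follows. It is an assumption-laden illustration, not a port of the linked Python code: the class name, the rounding direction (down to a power of two), and the absence of any cap at the array length are all guesses made for the example.

```java
// Hypothetical illustration; the real computeFirstDimChunk() may round or clamp differently.
final class ChunkSizeSketch {

    /** Assumed target size of one chunk in bytes (8 MiB). */
    static final long TARGET_CHUNK_BYTES = 8L * 1024 * 1024;

    /**
     * Returns a power-of-two chunk length for the first dimension such that one
     * chunk (chunk length x trailing dimensions x itemSize) stays at or below the
     * target, with a minimum of 1.
     */
    static long firstDimChunk(long[] shape, int itemSize) {
        long rowBytes = itemSize;
        for (int d = 1; d < shape.length; d++) {
            rowBytes *= Math.max(1L, shape[d]); // bytes in one slice along dimension 0
        }
        long rows = Math.max(1L, TARGET_CHUNK_BYTES / rowBytes);
        return Long.highestOneBit(rows); // largest power of two <= rows
    }
}
```

Under these assumptions an N×4 array of 8-byte doubles has 32-byte rows, so the first-dimension chunk comes out as 262144 rows, which is exactly 8 MiB.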
tinevez approved these changes on Jun 23, 2026
Summary
- … node_props_metadata/edge_props_metadata, varlength properties, and the updated polygon storage path.
- … PropMetadata and VarlengthProperty classes to support the v1 data model.
- … serialized_props/polygon/ layout to nodes/props/polygon/ as a varlength property; backward-compatible read fallback is retained for files written by earlier versions.
- … DEFAULT_CHUNK_SIZE = 1000 with computeFirstDimChunk(), which targets ~8 MiB per chunk (power-of-two on the first dimension), matching the Python reference implementation.
- … on little-endian systems without a "Big-endian buffer not supported" error; also decompresses Blosc chunks before byte-swapping when compression is active.
- … RawCompression when the native c-blosc library is absent.
- … (getProp/setProp/getProps) and varlength props support (getVarlengthProperty/setVarlengthProperty/getVarlengthProperties) to both GeffNode and GeffEdge; props are round-tripped through Zarr automatically.
- … and moves internal planning docs to doc/.
Validation
- … (mvn test).
- … (cd cross-language-tests && uv run run_tests.py).
- … geff reference implementation (geff.is_geff_dataset() returns True).
- … (nodes, edges, polygon, and varlength props are restored correctly).
- … serialized_props/polygon/ layout.