
Add loader for animovement/aniframe Parquet files #963

Open
roaldarbol wants to merge 16 commits into neuroinformatics-unit:main from roaldarbol:animovement-reader

@roaldarbol roaldarbol commented Apr 16, 2026

Contributor

Summary

Implements a reader for the aniframe format (Parquet files produced by the animovement R ecosystem), enabling data exchange between the movement Python package and the aniframe/aniread R packages.

Closes #307.


Background

The aniframe format is a long-format tidy data frame (one row per individual x keypoint x time) serialised as a Parquet file with rich metadata stored in the file-level schema metadata. The ecosystem includes:

  • aniframe -- core data structures (S3 class + metadata spec)
  • aniread -- readers/writers for many tracking formats including movement's own netCDF

The aniread read_movement() function already reads movement netCDF into aniframe; this PR implements the reverse direction.


Format overview (confirmed from a real aniframe Parquet file)

Columns (three conceptual slots)

| Slot | Default columns | Optional columns |
|---|---|---|
| what (entity identity) | individual, keypoint | model, track |
| when (temporal) | time | session, trial |
| where (spatial) | x, y | z, rho, phi, theta |
| confidence | confidence (optional) | |

Column types observed: individual as int32, keypoint as Arrow dictionary (categorical), session/trial as int32, time as int32, x/y/confidence as float64.

Parquet metadata

The aniframe metadata is stored in the Parquet file-level schema metadata under the key b'r', serialised using R's native ASCII serialisation format (version 3, starts A\n3\n...). It is not JSON. Fields include:

| Field | Type | Default |
|---|---|---|
| source | string | NA |
| sampling_rate | numeric (Hz) | NA |
| unit_time | factor | "frame" |
| unit_space | factor | "px" |
| unit_angle | factor | "rad" |
| reference_frame | factor | "allocentric" |
| coordinate_system | factor | "cartesian_2d" |
| point_of_reference | factor | "bottom_left" |
| variables_what / variables_when / variables_where | char vectors | see above |
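
A quick sanity check on the raw metadata blob could look like the sketch below. It relies only on the header described above (the blob starts with `A\n3\n`); the helper names `looks_like_r_ascii_v3` and `get_r_blob` are hypothetical and not part of this PR, and actual decoding would still go through an R serialisation parser such as rdata.

```python
def looks_like_r_ascii_v3(blob: bytes) -> bool:
    """Return True if blob starts with the R ASCII serialisation header.

    Only the leading "A", newline, "3", newline marker is checked;
    the payload itself is not parsed.
    """
    return blob.startswith(b"A\n3\n")


def get_r_blob(schema_metadata: dict) -> bytes:
    """Fetch and sanity-check the b"r" entry of a schema-metadata mapping."""
    if b"r" not in schema_metadata:
        raise ValueError("No b'r' key in the Parquet schema metadata.")
    blob = schema_metadata[b"r"]
    if not looks_like_r_ascii_v3(blob):
        raise ValueError("b'r' metadata is not R ASCII serialisation v3.")
    return blob
```

With pyarrow installed, the mapping passed to `get_r_blob` would typically come from `pyarrow.parquet.read_schema(path).metadata`, which holds bytes keys and bytes values (or is `None` when the file has no metadata).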

Design decisions

Column mapping: aniframe -> movement

| aniframe column | movement | Action |
|---|---|---|
| individual | individuals | Direct -- canonical name |
| keypoint | keypoints | Direct -- canonical name |
| track | individuals | Rename with warning |
| time | time coord | Convert units (see below); preserve original values (aniframe time starts at 1, not 0) |
| x, y | space = ["x", "y"] | Direct 2D Cartesian |
| x, y, z | space = ["x", "y", "z"] | Direct 3D Cartesian |
| confidence | confidence variable | Optional; fill NaN if absent |
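
The long-to-dense pivot implied by this mapping can be sketched in plain Python. This is an illustration only, not the PR's implementation: the function names are hypothetical, rows are modelled as dicts, and the (time, space, keypoints, individuals) axis order is an assumption about movement's array layout.

```python
import math


def unique_in_order(values):
    """Unique values, keeping first-occurrence order."""
    return list(dict.fromkeys(values))


def pivot_long_rows(rows, space=("x", "y")):
    """Pivot long-format rows into dense nested lists.

    position[t][s][k][i] and confidence[t][k][i] hold floats; cells with
    no matching row, or rows without a confidence value, stay NaN.
    """
    times = unique_in_order(r["time"] for r in rows)
    keypoints = unique_in_order(r["keypoint"] for r in rows)
    individuals = unique_in_order(r["individual"] for r in rows)
    t_idx = {t: n for n, t in enumerate(times)}
    k_idx = {k: n for n, k in enumerate(keypoints)}
    i_idx = {i: n for n, i in enumerate(individuals)}

    nan = math.nan
    position = [
        [[[nan for _ in individuals] for _ in keypoints] for _ in space]
        for _ in times
    ]
    confidence = [[[nan for _ in individuals] for _ in keypoints] for _ in times]

    for r in rows:
        t, k, i = t_idx[r["time"]], k_idx[r["keypoint"]], i_idx[r["individual"]]
        for s, axis in enumerate(space):
            position[t][s][k][i] = float(r[axis])
        if "confidence" in r:
            confidence[t][k][i] = float(r["confidence"])

    coords = {
        "time": times,  # original values are preserved (may start at 1)
        "space": list(space),
        "keypoints": keypoints,
        "individuals": individuals,
    }
    return position, confidence, coords
```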

Extra variables_what/variables_when columns (e.g., model, session, trial): the general rule is whether the extra columns are resolvable -- i.e., they contain only a single unique value and can be safely dropped:

  • Extra column with one unique value -> drop with an info-level log message
  • Extra column with multiple unique values -> error out (movement cannot represent hierarchical identity or multi-level time contexts)

A file with session=1 and trial=1 (constants) would load cleanly under this rule.
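
The "resolvable" rule above can be sketched as follows. The function name `resolve_extra_columns` and the dict-per-row representation are assumptions made for illustration; the loader in this PR operates on a DataFrame and may differ in detail.

```python
import logging

logger = logging.getLogger(__name__)


def resolve_extra_columns(rows, extra_columns):
    """Drop constant extra columns; raise on multi-valued ones.

    Returns (cleaned_rows, dropped_column_names).
    """
    dropped = []
    for col in extra_columns:
        values = {r[col] for r in rows if col in r}
        if len(values) > 1:
            shown = ", ".join(sorted(map(repr, values)))
            raise ValueError(
                f"Column {col!r} has multiple unique values ({shown}) "
                "and cannot be represented."
            )
        logger.info("Dropping constant column %r.", col)
        dropped.append(col)
    cleaned = [{k: v for k, v in r.items() if k not in dropped} for r in rows]
    return cleaned, dropped
```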

Polar/spherical coordinates (rho, phi, theta): error out -- movement only supports Cartesian space coordinates.

Metadata mapping

aniframe movement .attrs Notes
source source_software Forward original source name (e.g., "SLEAP", "DeepLabCut"); fall back to "aniframe" if source is NA
sampling_rate fps Hz = fps, direct
unit_time time_unit Auto-convert all units to seconds; derive fps from sampling_rate
unit_space space_unit (custom) Preserved as custom attribute
reference_frame reference_frame (custom) Preserved as custom attribute
point_of_reference point_of_reference (custom) Warn when "bottom_left" (movement/napari convention is top-left origin)

Time unit handling: automatically convert any unit_time value to seconds and derive fps from sampling_rate where available:

  • "frame" -> pass raw frame numbers to from_numpy() without fps (time axis = original integers, e.g. starting at 1)
  • "s" -> already in seconds; set fps = sampling_rate
  • "ms", "us", "ns" -> divide to seconds; set fps = sampling_rate
  • "m", "h" -> convert to seconds; set fps = sampling_rate
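
A minimal sketch of these rules, assuming a simple lookup table keyed by the unit_time values listed above. The names `TIME_UNIT_TO_SECONDS` and `convert_time` are illustrative, and "m" is taken to mean minutes, as in the list.

```python
# Seconds per unit, keyed by the unit_time values listed above.
TIME_UNIT_TO_SECONDS = {
    "s": 1.0,
    "ms": 1e-3,
    "us": 1e-6,
    "ns": 1e-9,
    "m": 60.0,
    "h": 3600.0,
}


def convert_time(values, unit_time, sampling_rate=None):
    """Return (time_values, fps) for a given unit_time.

    "frame" keeps the raw frame numbers and reports no fps; every other
    supported unit is converted to seconds and fps is the sampling rate.
    """
    if unit_time == "frame":
        return list(values), None
    if unit_time not in TIME_UNIT_TO_SECONDS:
        raise ValueError(f"Unsupported unit_time: {unit_time!r}")
    factor = TIME_UNIT_TO_SECONDS[unit_time]
    return [v * factor for v in values], sampling_rate
```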

Scope

  • Poses only for this PR. Bounding-box support (aniframe with keypoint = "centroid") is deferred.
  • Load only (aniframe -> movement). The reverse is already handled by aniread's read_movement().

Dependencies

pyarrow

  • Not currently in movement's dependency tree -- not in core dependencies, not in any optional extras. xarray[accel,io,viz] does not pull it in transitively (confirmed by inspection of the installed environment).
  • Needed for both reading the Parquet file and accessing the file-level schema metadata (where the aniframe R metadata lives).
  • Should be added as a core dependency or gated with a runtime check_installed check following the pattern in aniread's check_arrow().
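
A runtime check of this kind could be sketched with the standard library alone. The function name mirrors the `check_installed` idea mentioned above, but the signature and message here are assumptions, not existing movement API.

```python
from importlib.util import find_spec


def check_installed(*packages, install_hint=""):
    """Raise ImportError if any of the given top-level packages is missing.

    Only top-level package names are supported in this sketch.
    """
    missing = [name for name in packages if find_spec(name) is None]
    if missing:
        message = f"Missing required packages: {', '.join(missing)}."
        if install_hint:
            message += f" {install_hint}"
        raise ImportError(message)
```

A loader would call something like `check_installed("pyarrow")` at the top of its body, so that a bare `import movement` keeps working when the package is absent.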

R metadata decoding (rdata)

  • The metadata is R's ASCII serialisation format, not JSON -- it cannot be decoded with standard library tools.
  • rdata is a pure-Python R serialisation parser. Its only dependencies are numpy, xarray, and pandas -- all already required by movement. This makes it a low-cost addition with no new transitive dependencies.
  • Alternative: skip full metadata decoding, infer variables_what/when/where from column names (same heuristics aniframe itself uses), and require the user to pass fps/source_software explicitly.

Open questions / items for discussion

  1. pyarrow as core vs optional dependency: Not currently in the dep tree. Add to [project.dependencies], or guard with a runtime check and ask users to install it as needed?

  2. point_of_reference coordinate flip: aniframe defaults to "bottom_left" origin; movement/napari conventionally uses top-left (image coordinates). For now the loader will warn and leave data as-is. Automatic y-axis flipping could be added later.

  3. Metadata decoding strategy: Use rdata (pure Python, no new transitive deps) for full metadata, or infer from column names and accept that source_software/fps/unit_time may need to be provided by the caller?

  4. source_software fallback when aniframe source is NA: use "aniframe", or require the caller to pass source_software explicitly?


Files to create / modify

New files

  • movement/io/load_aniframe.py -- loader function from_aniframe_file() and internal parsing helpers
  • tests/test_unit/test_io/test_load_aniframe.py -- unit tests

Modified files

  • movement/validators/files.py -- add ValidAniframeParquet attrs class
  • movement/io/load.py -- update SourceSoftware type alias; import new loader
  • movement/io/__init__.py -- import load_aniframe to trigger loader registration
  • docs/source/user_guide/input_output.md -- add row to supported formats table
  • pyproject.toml -- add pyarrow; consider rdata
  • movement/sample_data.py -- register sample aniframe Parquet file (once added to GIN)

Test plan

  • Load a minimal 2D aniframe Parquet file (synthetic fixture) -> correct position, confidence, individuals, keypoints, time, fps, source_software
  • track column renamed to individual with a UserWarning
  • model column with multiple values -> ValueError
  • model column with single value -> dropped with info log
  • session/trial with multiple values -> ValueError
  • session/trial with single value -> dropped with info log
  • Polar/spherical variables_where -> ValueError
  • Missing confidence column -> filled with NaN
  • 3D Cartesian (x, y, z) -> space = ["x", "y", "z"]
  • unit_time = "ms" + sampling_rate = 30 -> correct seconds time axis and fps = 30
  • unit_time = "frame" -> time axis preserved as original integers (e.g. starting at 1)
  • point_of_reference = "bottom_left" -> UserWarning emitted
  • source_software forwarded from aniframe source metadata
  • Auto-inference (source_software="auto") correctly identifies .parquet files

Generated with Claude Code

@codecov

codecov Bot commented Apr 16, 2026


Codecov Report

❌ Patch coverage is 93.38235% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 99.41%. Comparing base (82b117b) to head (425dc68).

Files with missing lines Patch % Lines
movement/io/aniframe.py 90.62% 18 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##              main     #963      +/-   ##
===========================================
- Coverage   100.00%   99.41%   -0.59%     
===========================================
  Files           41       42       +1     
  Lines         2815     3087     +272     
===========================================
+ Hits          2815     3069     +254     
- Misses           0       18      +18     


- Add `ValidAniframeParquet` file validator to `validators/files.py`
- Add `from_aniframe_file()` loader in new `movement/io/load_aniframe.py`
- Register "aniframe" as a recognised `SourceSoftware` in `load.py`
- Import `load_aniframe` in `io/__init__.py` to trigger registration
- Add `pyarrow` and `rdata` as core dependencies in `pyproject.toml`
- Add 57-test suite in `tests/test_unit/test_io/test_load_aniframe.py`

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@roaldarbol

roaldarbol commented Apr 16, 2026

Contributor Author

Sample files needed on GIN

Before this PR can be merged we need at least one (ideally two) real aniframe files on GIN for integration tests. I'll generate these from animovement so that the embedded R metadata serialisation is exercised end-to-end. The current unit tests cover all edge cases but mock the metadata-decoding step, which isn't ideal.

Required: 1 × 2D file

The primary integration test file. It should have:

  • 2D coordinates (x, y) and a confidence column
  • sampling_rate set to a real value (e.g. 30) — the existing test fixture has NaN here, so fps-from-metadata is currently never tested on real data
  • source set to a known software name (e.g. "SLEAP" or "DeepLabCut")
  • point_of_reference = "top_left" — so the load completes cleanly without triggering the flip warning (that warning path is already covered by a unit test)
  • At least 2 individuals and 2 keypoints
  • ~10–20 frames — keep it small for fast downloads
  • session and trial columns each holding a single value — confirms the single-value-drop path runs on real metadata

Nice to have: 1 × 3D file

A small 3D file (x, y, z) to verify 3D support against real aniframe output. The 3D array-building path is unit-tested synthetically but has never run against a file produced by R.

What you don't need separate files for

The following are all covered by the synthetic unit tests and do not need dedicated GIN files: track -> individual rename, polar coordinate rejection, multi-value session/trial error, missing metadata, and the bottom-left origin warning.

roaldarbol and others added 3 commits April 16, 2026 14:56
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace triple nested loop with itertools.product and an inner function.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@roaldarbol

Contributor Author

So, outstanding things that have cropped up:

  • animovement allows users to convert time to pretty much anything, us, ms, s, min, h, etc. (same goes for spatial units), and then the unit is recorded in the metadata. For now, I assume that you would like to just get seconds? The current solution is to have a lookup table and convert on import.
  • the reference frame in animovement is almost always bottom_left - I think I'll need to add a metadata field that allows converting back to original top-left (which is what we often get from the trackers themselves) so it can also be converted back to top-left for import into movement.
  • there are some warnings in _decode_aniframe_metadata from the rdata package that I have silenced (it's just that the package doesn't know the aniframe class, which makes complete sense).
  • extra column names: for extra columns, e.g. computed variables such as "speed", should they just be named as-is, or should they be prefixed to avoid namespace clashes? I'm leaning towards as-is - simply because that's what I would expect as a user.
  • as mentioned in the comment above, we need some aniframe parquet files on GIN. Once I know exactly what should be tested, I can make a minimal set of them and @niksirbi can upload them?
  • and finally, the dependencies - do we add pyarrow as a required dependency?

roaldarbol and others added 3 commits April 16, 2026 16:17
- Use metadata variables_what/when/where to classify DataFrame columns
  instead of hardcoded frozensets; missing fields raise ValueError
- Extra columns not belonging to any aniframe category are inferred to
  their minimum xarray dimensions and added as Dataset variables
- Constant extra columns are logged at INFO and skipped
- Numeric extra columns stored as float64; others as object dtype
- Add _extract_meta_vars, _infer_extra_dims, _build_extra_array helpers
- Update _resolve_columns to use metadata vars and return extra col list
- Expand test suite from 57 to 71 tests covering all new code paths

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Per maintainer guidance: tracks are interpreted as individuals by design.
Users who need to stitch tracks should do so before loading.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@roaldarbol roaldarbol marked this pull request as ready for review April 16, 2026 14:30
@niksirbi

Member

Thanks for the PR @roaldarbol! So excited that this is finally happening.

I'll do a first high-level pass at this PR next week, after which I'll get in touch with you to get sample data files.
After I put them on GIN, you can write some tests using them, and I can have another review pass after that.

Sounds like a plan?

@roaldarbol

Contributor Author

Sounds good to me!

roaldarbol and others added 4 commits April 22, 2026 13:40
Adds an `extra_var_dims` floor parameter so callers can ensure extra
data variables always carry specific dimensions (e.g. `"individuals"`)
even in single-individual/single-keypoint files where auto-inference
would otherwise collapse that axis away.

Accepts a plain string or a tuple of strings for convenience.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extract _normalise_extra_var_dims helper to reduce cognitive complexity
of from_aniframe_file back within the C901 limit. Shorten two test
docstrings to stay within the 79-character line limit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Picks up the typer dependency from the typer branch to pre-empt
the conflict when that branch merges to main.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@roaldarbol

Contributor Author

For extra columns, I have made the function automatically find the minimal number of dimensions needed to describe each one, so it automatically gets the correct dims. For example, if you have the area of the individual, then in animovement all keypoints will have the identical value, and when imported into movement it would resolve to a (time, individuals) variable.

The edge case is single-individual/single-keypoint files, where extra variables would resolve to just time. To ensure that extra columns are attributed to the correct dimensions, I added an extra argument (extra_var_dims) that lets the user override the inference, e.g. extra_var_dims = "individuals" if they should all carry that dimension. It also allows specification on a per-variable basis:

extra_var_dims={
    "temperature": ("time",),
    "bbox_area":   ("time", "individuals"),
    "speed":       ("time", "individuals"),
}
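
The minimal-dimension inference described in this comment can be sketched as follows: a dimension is needed only if the value still varies once all other dimensions are held fixed. This is an illustration under assumptions (rows as dicts, hypothetical names `DIM_TO_COLUMN` and `infer_min_dims`), not the helper used in the PR.

```python
from collections import defaultdict

# Assumed mapping from movement dimension names to aniframe column names.
DIM_TO_COLUMN = {
    "time": "time",
    "individuals": "individual",
    "keypoints": "keypoint",
}


def infer_min_dims(rows, column):
    """Return the dimensions along which `column` actually varies."""
    needed = []
    for dim in DIM_TO_COLUMN:
        other_cols = [c for d, c in DIM_TO_COLUMN.items() if d != dim]
        # Group rows by the other dimensions; if any group holds more than
        # one distinct value, the column depends on `dim`.
        groups = defaultdict(set)
        for row in rows:
            groups[tuple(row[c] for c in other_cols)].add(row[column])
        if any(len(vals) > 1 for vals in groups.values()):
            needed.append(dim)
    return tuple(needed)
```

With a single individual in the file, the same column collapses to `("time",)`, which is the edge case the override is meant to address.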

@roaldarbol roaldarbol mentioned this pull request Apr 28, 2026
9 tasks

@niksirbi niksirbi left a comment

Member

I've made a first pass at this @roaldarbol.

Here are my comments, mostly focusing on design decisions and the questions you'd asked. I've also left several inline comments, but have not gone much into implementation details (those can be handled later).

  1. pyarrow / rdata dependencies. Pyarrow is a heavy native package, so forcing it on all users for one file format is out of proportion. The good news is that both packages are also available on conda-forge (that's a hard requirement for us). My recommendation would be to gate both pyarrow and rdata behind a new aniframe optional extra. We can also do a runtime check on top of that, as you suggested, with a clear error message ("install with pip install movement[aniframe] or conda install -c conda-forge pyarrow rdata") if someone tries to use the new loader without the requisite dependencies. On movement's conda-forge feedstock (I update that upon release), I will add both packages under run_constrained:, not run:. That keeps the default conda install lean, and still protects against version incompatibilities if a user installs them separately. This matches the pattern used by other scientific Python projects (including xarray) for their heavy optional backends. We will also need to update the installation guide accordingly.
  2. point_of_reference = "bottom_left" handling. I'm fine with just emitting a warning for now, but you may also want to consider doing the y-flip by default. The reason is that with "bottom_left" the data will not correctly overlay on a video/frame as is. So if we intend to use our napari GUI as a viewer of aniframe .parquet files (would be neat!), auto-converting to movement's "top_left" convention would make things easier. In the near future, it would enable someone to drag and drop a video, followed by the parquet file into napari.
  3. Metadata decoding strategy. The rdata package is lightweight and the full-metadata path is much better UX than "pass fps / source manually". Keep as-is, but make it part of an optional extra dependency (see point 1).
  4. source_software fallback when source is NA. Let's not require a user to pass source_software. I'm fine with falling back to either aniframe or None (as the code currently does).
  5. Time-unit handling. Converting everything to seconds is the right default with fallback to frame units when that's not possible (I think your implementation already takes that approach).
  6. Extra-column namespace. Unprefixed is fine and matches user expectations; this is the right default.
  7. Sample data on GIN. The 2D + 3D spec in the comment thread looks right. Feel free to share the files with me and I will put them on GIN. For each .parquet file, also fill out a metadata.yaml entry as [described here](https://movement.neuroinformatics.dev/latest/community/contributing.html#metadata-yaml-example-entry). For the 2D file, also provide a sample frame at minimum, and optionally the corresponding video if you can share it. This will make it possible to user-test the napari overlay.
  8. Loading aniframe files in napari. If we want the .parquet files to also be loadable into napari, you will have to at minimum update the SUPPORTED_POSES_FILES constant in movement/napari/loader_widgets.py. See also point 2 above.

Comment thread movement/io/load_aniframe.py Outdated
@roaldarbol

roaldarbol commented Apr 28, 2026

Contributor Author

Thanks for such a detailed review @niksirbi! Will go through it later this week!

Before I do anything else, I'm trying to look for lightweight alternatives to pyarrow, as it would be preferable to keep the deps obligatory if at all possible. Would something like fastparquet, which only adds cramjam + fsspec, be small enough? Other alternatives are duckdb or polars.

Edit: From the fastparquet docs:

March 2026. The release of pandas 3.0 has broken a number of things in fastparquet. Since pandas now depends explicitly on pyarrow, there is no longer any demand for the existence of this project, and it is being retired. Perhaps use will continue for those still using pandas 2.x, but we anticipate no further development.

For the auto-flipping y-axis, I need to implement something - a metadata field - that keeps the y value, ideally the frame height, but otherwise probably the max. y value from the data (I think that's what I currently use in the readers), which would allow converting between bottom_left and top_left on the fly. And then this value would be pulled in in this reader.

@niksirbi

Member

> Before I do anything else, I'm trying to look for lightweight alternatives to pyarrow, as it would be preferable to keep the deps obligatory if at all possible. Would something like fastparquet, which only adds cramjam + fsspec, be small enough? Other alternatives are duckdb or polars.

Thanks for looking into alternatives! I spent some time looking into them as well. I think we can exclude fastparquet based on the note you've posted. Regarding pandas' dependency on pyarrow, an initial read of that note is slightly misleading: pyarrow isn't included among pandas' core dependencies, it's gated behind parquet extras, see the pyproject.toml. But in any case, that fact is encouraging about the stability and trustworthiness of pyarrow; it's the standard go-to choice within the scientific Python ecosystem for reading parquet files, and more established in that context than duckdb or polars.

My reasoning for having it as an optional dependency is that it's heavy-ish and necessary for only a small subset of users (this may change in the future though...). The counter-argument would be to avoid complicating installation instructions.

Before we reach a decision on this, I'd be curious to hear why you think it's preferable to keep the deps obligatory.

> For the auto-flipping y-axis, I need to implement something - a metadata field - that keeps the y value, ideally the frame height, but otherwise probably the max. y value from the data (I think that's what I currently use in the readers), which would allow converting between bottom_left and top_left on the fly. And then this value would be pulled in in this reader.

Ah yeah, that makes sense. If you prefer, you're welcome to proceed here without waiting for that, and tackle the auto-flipping in a future PR (opening an issue to avoid forgetting it). Up to you!

@roaldarbol

roaldarbol commented Apr 30, 2026

Contributor Author

> Ah yeah, that makes sense. If you prefer, you're welcome to proceed here without waiting for that, and tackle the auto-flipping in a future PR (opening an issue to avoid forgetting it). Up to you!

I think I'll get it done for this PR. My thought is basically, at read time, to encode either (1) the frame height or (2) the max(y) as y_height. Then a simple flipping function new_y = h - y, where h is y_height and y is the vector of y values, will convert back and forth between top and bottom origin. I'll implement such a function in animovement. Should I also make a function for it in this movement PR, or should I just do it at read time? The pro of a separate function is that it could be used for changing the reference frame going forwards.

  • If I make such a function, where would you like for it to go?
  • And do you think y_height sounds like a reasonable metadata field name?
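
The flip proposed here, new_y = h - y with a max(y) fallback, can be sketched in a few lines. The function name `flip_y` and the list-based interface are assumptions for illustration; the height is returned so it can be stored and reused.

```python
import math


def flip_y(ys, y_height=None):
    """Flip y values between bottom-left and top-left origins.

    Uses new_y = h - y, where h is y_height if given, otherwise the
    maximum non-NaN y value. Returns (flipped_values, h).
    """
    if y_height is None:
        finite = [y for y in ys if not math.isnan(y)]
        if not finite:
            raise ValueError("Cannot infer y_height: no non-NaN y values.")
        y_height = max(finite)
    # NaN inputs stay NaN because h - NaN is NaN.
    return [y_height - y for y in ys], y_height
```

Applying the function twice with the same h returns the original values, which is why recording y_height matters: after a max(y)-based flip, the maximum of the flipped data is generally a different number.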

For pyarrow, yeah I also don't think they got it correct - see the discussion of it here: pandas-dev/pandas#54466.

I think my main concern is on the conda-forge side of things, that users need to explicitly add them as extra dependencies. But with no leaner good alternative I think it's just the way it will have to be. :-) In that case, I imagine we should add an informative error, something like:

try:
    import pyarrow.parquet as pq
except ImportError as e:
    raise ImportError(
        "Reading Parquet files requires the optional 'aniframe' "
        "dependencies (pyarrow, rdata), which are not installed.\n\n"
        "Install them with one of:\n"
        "  pip install 'movement[aniframe]'\n"
        "  conda install -c conda-forge pyarrow rdata\n"
        "  pixi add pyarrow rdata\n\n"
        "See https://movement.neuroinformatics.dev/latest/user_guide/installation.html "
        "for details."
    ) from e

EDIT: I actually quite like the tabbed install instructions, so I don't think it'll be an issue, maybe we'd just need to add it there too? https://movement.neuroinformatics.dev/latest/user_guide/installation.html#install-the-package

@niksirbi

niksirbi commented May 1, 2026

Member

> I think I'll get it done for this PR. My thought is basically, at read time, to encode either (1) the frame height or (2) the max(y) as y_height. Then a simple flipping function new_y = h - y, where h is y_height and y is the vector of y values, will convert back and forth between top and bottom origin. I'll implement such a function in animovement. Should I also make a function for it in this movement PR, or should I just do it at read time? The pro of a separate function is that it could be used for changing the reference frame going forwards.
>
>   • If I make such a function, where would you like for it to go?
>   • And do you think y_height sounds like a reasonable metadata field name?

Sounds sensible. For this PR, I would just implement this at .parquet read time.
There's also an argument for making this a standalone utility inside movement.transforms, but it will be easier to think about that once we have the first (read-time) implementation here.

> I think my main concern is on the conda-forge side of things, that users need to explicitly add them as extra dependencies. But with no leaner good alternative I think it's just the way it will have to be. :-) In that case, I imagine we should add an informative error, something like:

In the call today, we decided to have pyarrow as extras for now (we may 'promote' it to core dependency in the future).

I think raising an informative error, like the one you suggested, is a great idea.
And yes, I think we should mention this 'extra' in the tabbed installation instructions (as we do for napari).

Implements the inline and high-level review comments on neuroinformatics-unit#963.

Refactor:
- Move public `from_aniframe_file` into `load_poses.py` next to other
  format-specific loaders; keep private helpers in a new `aniframe.py`
  module (mirroring the `nwb.py` pattern).
- Split `ValidAniframeParquet` into `_file_validator + _parquet_validator`
  and fold the `variables_what/when/where` required-fields check into
  the validator itself.
- Rename `point_of_reference` → `origin` throughout (metadata key,
  dataset attr, docstrings, tests).

Behaviour:
- Add `use_frame_numbers_from_file=False` flag so callers can preserve
  the file's `time` column values (irregular sampling, non-zero start)
  and convert them to seconds via the now-wired `_TIME_UNIT_TO_SECONDS`
  lookup.
- Drop the `extra_var_dims` parameter; instead auto-expand inferred
  dims to include any canonical dim that is a singleton in this file,
  so output shapes stay consistent across files differing only in
  singleton-dim sizes (e.g. 1- vs 3-individual).
- Auto-flip y when `origin == "bottom_left"`. Uses `y_height` from the
  metadata when present, otherwise falls back to `max(y)` from the data.
  Sets `ds.attrs["origin"] = "top_left"` and records `y_height` so the
  flip can be reversed later. The previous bottom-left `UserWarning` is
  gone — the data is now oriented correctly.

Build:
- Move `pyarrow` and `rdata` into a new `aniframe` optional extra. Bare
  `import movement` no longer requires either; the loader (and
  validator) raise a clear `ImportError` with install instructions only
  when an aniframe code path is actually invoked.
- Update the installation guide with the new extra across all three
  install tabs (conda-forge, pip, uv) including combined-extras syntax.
- Register `aniframe / *.parquet` in the napari plugin's
  `SUPPORTED_POSES_FILES` so the GUI can browse and load aniframe files.

Cleanups:
- `_decode_aniframe_metadata` now raises `ValueError` with a clear
  message on missing `b"r"` key or an undecodable R blob (was
  log-and-return-`{}`, which produced a confusing two-step error).
- Special-case `bool` dtype in `_build_extra_array` (previously coerced
  to `float64` with NaN fill via `is_numeric_dtype`).
- `individual_names` now uses first-occurrence ordering to match
  `keypoint_names`.
- Drop the redundant `or unit_time == "s"` clause in `_resolve_fps`.
- Docstring fixes: link to the aniframe spec, example uses
  `from_aniframe_file` directly, `_resolve_columns` notes "INFO log"
  instead of "warning".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@roaldarbol

roaldarbol commented May 9, 2026

Contributor Author

Alright, now I've spent a few hours going through the comments and addressing them. Here are a few things I think are worth flagging:

  • In aniframe I renamed point_of_reference to origin, so I renamed this everywhere (metadata key, ds.attrs, docstrings, tests) to fit.
  • I removed extra_var_dims completely and replaced with the singleton-canonical-dim auto-expansion you suggested.
  • The loader now auto-flips y unconditionally when origin == "bottom_left". It uses y_height from the metadata when present and falls back to max(y) from the data otherwise, which is how I've also implemented it in aniframe. After flipping, ds.attrs["origin"] is set to top_left and y_height is recorded so the flip can be reversed.
  • I also added the dependencies as extras in pyproject.toml, do please have a look to see whether it matches the style you'd expect.
  • _parquet_validator is now a generic factory that mirrors _hdf5_validator. The aniframe-specific @file.validator method on ValidAniframeParquet no longer touches pq directly, but delegates the b"r" key check to _decode_aniframe_metadata, then does the required-fields check on top.

 into animovement-reader

# Conflicts:
#	pyproject.toml
SonarQube flagged the direct == checks on ds.attrs["y_height"] in the
new bottom_left → top_left flip tests. Switch to pytest.approx, matching
the existing fps assertions in the same file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sonarqubecloud

sonarqubecloud Bot commented May 9, 2026


@roaldarbol

Contributor Author

Additional point that we should discuss:

  • aniframes can contain multiple trials or sessions or other such temporal variables (variables_when). How should movement handle that, if at all? I wonder whether we automatically could return a list of xarrays? Or should it fail?

@niksirbi

Member

Thanks for the updates @roaldarbol. Have been busy with other projects this week but I will have another pass on this next week latest. I'll also think about the sessions/trials question.

@niksirbi

Member

Btw, it's looking like #973 is going to be merged ahead of this one, so you will be able to do away with the singular-plural conversion in this PR.

@roaldarbol

roaldarbol commented May 22, 2026

Contributor Author

Follow-up for the when issue.

  • If there is more than one video (so multiple values in the when variables), then we give an error.
  • We can supply a parameter (let's say session_keys, name very much undecided) that takes a dict with the name of the variable and the value, e.g. {session: 1, trial: 3}.
  • For the error message, provide the names of the when variables - and maybe also potential values?
  • Otherwise encourage users to split up on write from aniframe (needs its own issue on aniread).
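
One way this proposal might look, sketched over rows modelled as dicts. Everything here is hypothetical, including the function name `select_when` and the `session_keys` parameter, whose name is explicitly undecided above; `when_vars` is assumed to exclude the time column itself.

```python
def select_when(rows, when_vars, session_keys=None):
    """Select one recording from rows that may span several sessions/trials.

    Raises ValueError, naming the offending 'when' variables and their
    values, if more than one recording remains after filtering.
    """
    session_keys = dict(session_keys or {})
    unknown = sorted(set(session_keys) - set(when_vars))
    if unknown:
        raise ValueError(f"Unknown 'when' variables in session_keys: {unknown}")

    selected = [
        r for r in rows if all(r[k] == v for k, v in session_keys.items())
    ]
    if not selected:
        raise ValueError(f"No rows match session_keys={session_keys}.")

    ambiguous = {}
    for var in when_vars:
        values = sorted({r[var] for r in selected})
        if len(values) > 1:
            ambiguous[var] = values
    if ambiguous:
        details = "; ".join(f"{var}: {vals}" for var, vals in ambiguous.items())
        raise ValueError(
            f"Multiple values found for 'when' variables ({details}). "
            "Pass session_keys to select a single recording."
        )
    return selected
```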

Development

Successfully merging this pull request may close these issues.

Implement I/O for parquet files

2 participants