Proposal: Simplify the project-conversion workflow with a Session-based API

After the recent work in #49, where I tried to chain together all the individual functions from `poseinterface.io`, I was inspired to come up with a higher-level API for the project-conversion workflow, i.e. **Step 1** of the schematic below.

<img width="4083" height="1354" alt="Image" src="https://github.com/user-attachments/assets/f80b28a7-401a-4881-967c-863a673e7884" />

The workflow currently involves running four per-asset functions (the *primitives*):
- `video_to_poseinterface`
- `annotations_to_poseinterface`
- `frames_to_poseinterface`
- `predictions_to_poseinterface`

It's on the caller to glue them together: discover the source paths, build the per-session output directories, and call the four primitives in the right order with matching arguments.

The gallery example in #49 makes this concrete: only about a third of its per-session loop body is conversion calls; the rest is path wrangling and output-layout construction.
The same pattern would apply to anyone writing this kind of pipeline against their own data.

The proposal is to introduce a generic `Session` dataclass (or maybe `attrs` class for easier built-in validation?) and two orchestration functions in a new `poseinterface/core.py`. The existing primitives in `poseinterface/io.py` can stay mostly unchanged; the new functions would just compose them and write to the spec layout.
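On the dataclass-vs-`attrs` question, a plain dataclass can still validate its fields in `__post_init__`. Below is a minimal sketch of what that might look like. The class name `ValidatedSession`, the reduced field set, and the specific checks (non-empty identifiers, `split` restricted to the two literal values) are assumptions for illustration, not part of the proposal.

```python
from dataclasses import dataclass
from typing import Literal, get_args

Split = Literal["Train", "Test"]


@dataclass
class ValidatedSession:
    """Hypothetical, reduced stand-in for the proposed Session class."""

    sub_id: str
    ses_id: str
    split: Split = "Train"

    def __post_init__(self) -> None:
        # Type hints are not enforced at runtime, so check the split explicitly.
        if self.split not in get_args(Split):
            raise ValueError(
                f"split must be one of {get_args(Split)}, got {self.split!r}"
            )
        # Assumed rule: identifiers must be non-empty strings.
        for name in ("sub_id", "ses_id"):
            if not getattr(self, name):
                raise ValueError(f"{name} must be a non-empty string")
```

With `attrs`, the same checks would be declared per field as validators instead of being collected in one method, which is the main ergonomic difference to weigh.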

## Proposed public API

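```python
from collections.abc import Iterable
from dataclasses import dataclass
from pathlib import Path
from typing import Literal


@dataclass
class Session:
    sub_id: str
    ses_id: str
    cam_id: str
    video_path: Path
    split: Literal["Train", "Test"] = "Train"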
    annotations_path: Path | None = None
    frames_source_dir: Path | None = None
    predictions_path: Path | None = None

def convert_session(
    session: Session,
    benchmark_root: Path,
    *,
    project: str,
) -> Path: ...

def convert_project(
    sessions: Iterable[Session],
    benchmark_root: Path,
    *,
    project: str,
) -> None: ...
```

- `Session` carries everything we need to know about one session: its identifiers, which split it should go in, and the source paths. The `Session` signature above is mostly informed by our recent work on DLC projects; we may need to tweak it to generalise well across pose estimation libraries.
- `convert_session` runs the conversion primitives and writes to `<benchmark_root>/<split>/<project>/sub-X_ses-Y/`, following the [benchmark dataset spec](https://poseinterface.neuroinformatics.dev/benchmark-dataset.html). 
- `convert_project` is a thin loop for converting multiple sessions belonging to the same project (sessions self-route to Train/Test via their `split`).
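To make the routing concrete, here is a rough sketch of how the two functions might fit together. It only builds the output directory and loops over sessions; the calls to the four primitives are left out because their signatures are not covered here. The helper `session_output_dir` is hypothetical, `Session` is reduced to the fields that affect the output location, and the sketch assumes `sub-X_ses-Y` is formed from `sub_id` and `ses_id` verbatim.

```python
from collections.abc import Iterable
from dataclasses import dataclass
from pathlib import Path
from typing import Literal


@dataclass
class Session:
    # Reduced to the fields that determine the output location.
    sub_id: str
    ses_id: str
    split: Literal["Train", "Test"] = "Train"


def session_output_dir(
    session: Session, benchmark_root: Path, project: str
) -> Path:
    """Hypothetical helper: <benchmark_root>/<split>/<project>/sub-X_ses-Y."""
    session_name = f"sub-{session.sub_id}_ses-{session.ses_id}"
    return Path(benchmark_root) / session.split / project / session_name


def convert_session(
    session: Session, benchmark_root: Path, *, project: str
) -> Path:
    out_dir = session_output_dir(session, benchmark_root, project)
    out_dir.mkdir(parents=True, exist_ok=True)
    # The four primitives from poseinterface.io would be called here,
    # skipping any whose source path on the session is None.
    return out_dir


def convert_project(
    sessions: Iterable[Session], benchmark_root: Path, *, project: str
) -> None:
    # Thin loop: each session routes itself to Train/Test via its split.
    for session in sessions:
        convert_session(session, benchmark_root, project=project)
```

Returning the session directory from `convert_session` lets callers chain further steps (such as clip extraction) without recomputing the path.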

These three names could be exposed at the top API level: `from poseinterface import Session, convert_session, convert_project`.

## Concretely

The user constructs `Session` objects directly with explicit source paths and calls `convert_project`:

```python
sessions = [
    Session(
        sub_id="...", ses_id="...", cam_id="...",
        video_path=Path(".../video.mp4"),
        annotations_path=Path(".../annotations.csv"),
        frames_source_dir=Path(".../labeled-data/<stem>"),
        predictions_path=Path(".../predictions.csv"),
        split="Train",
    ),
    # ... one per session
]
convert_project(sessions, benchmark_root, project="MyProject")
```

Once this API is in place, we can layer source-specific adapters/constructors on top, e.g. a helper that walks a DLC (or SLEAP, etc.) project and returns a list of `Session` objects in one go, so the user doesn't have to enumerate sessions by hand. That's a separate proposal once we've lived with this one for a bit; it's easier to design the batch conversion layer once we've settled on the per-session API layer.

Clip extraction (**Step 2** of the workflow) stays separate, because clip params (`duration`, `start_frame`) are typically iterated independently of the one-shot conversion.

Happy to draft a PR once #45 and #49 are merged, assuming a consensus emerges around this proposal. Let me know your thoughts!
