Commit 42ac536

sfmig, niksirbi, and claude authored
Format videos to poseinterface spec. Extract clip function. (#39)
* Draft clip extraction CLI
* Rename _to_poseinterface. Add video conversion bits
* Types and docstring cleanup
* Revamp clips module
* Fixes
* Clamp duration if it exceeds video length
* Add minimal tests for video conversion
* Add test for extracting cliplabels from video
* Add tests for clip extraction
* Factor out common fixtures. Test invalid inputs for extract_clip
* Add docstrings
* Add CLI and entrypoint tests
* Add sub_ses_cam_id fixture
* Remove for diff with main
* Apply suggestions from code review

  Co-authored-by: Niko Sirmpilatze <niko.sirbiladze@gmail.com>
* Fix stem
* Fix log message when reencoding is needed
* Add info message to extract_clip
* Make clip extraction optional
* Fix image_ids and annotation ids in extracted labels for a clip
* Fuse annotation remapping
* Fix id check in tests
* Rename full video temporary "cliplabels.json" file to "videolabels"
* Add note in docstring
* Add section to the benchmark dataset spec and refer to it in docstring notes
* Clarify behaviour re videolabels in CLI help
* Make "Intermediate file `videolabels.json`" a subsection
* fix typos
* Fix test fixtures lost during rebase onto main

  Restore TEST_DATA_DLC_DIR/TEST_DATA_SLEAP_DIR constants, the sub_ses_cam_ids fixture, and the updated CSV source path that were introduced by the annotations_to_coco refactor on main but dropped during the rebase.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* make ruff happy
* Fix missing space in extract-clip CLI description

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Add type annotations to _extract_cliplabels

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Guard against None return from _get_video_encoding_info

  Raise RuntimeError if ffmpeg fails to read encoding info, and widen the return type of _get_codec_pixelformat to dict[str, str | None] to match the upstream VideoEncodingInfo field types.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Add API docs for video_to_poseinterface and extract_clip

  Also fix cross-reference to extract_clip in the benchmark dataset spec by using the fully qualified module path.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Niko Sirmpilatze <niko.sirbiladze@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 2881508 commit 42ac536

9 files changed

Lines changed: 798 additions & 4 deletions

docs/source/api_index.rst

Lines changed: 11 additions & 0 deletions
@@ -11,3 +11,14 @@ io
     :template: function.rst
 
     annotations_to_poseinterface
+    video_to_poseinterface
+
+clips
+-----
+.. currentmodule:: poseinterface.clips
+
+.. autosummary::
+    :toctree: api_generated
+    :template: function.rst
+
+    extract_clip

docs/source/benchmark-dataset.md

Lines changed: 12 additions & 0 deletions
@@ -243,6 +243,18 @@ For a clip starting at frame 1000 with a duration of 5 frames, the `images` arra
 Here `id: 0` through `id: 4` are the local clip indices, while `frame-1000` through `frame-1004` in the `file_name` values refer to the original frame positions in the session video.
 :::
 
+(target-videolabels)=
+#### Intermediate file `videolabels.json`
+
+:::{note}
+This file is **not a required part of a benchmark dataset**. It is an intermediate cache file useful for data contributors when preparing labelled clips, and it is documented here only because it is optionally auto-discovered by the `extract-clip` command and the corresponding {func}`~poseinterface.clips.extract_clip` function.
+:::
+
+* A `videolabels.json` file uses the **same schema as [`cliplabels.json`](target-cliplabels)**, but it refers to a full video rather than to a clip of it.
+* It is produced once per video (e.g. by converting model predictions for the entire video into the `cliplabels` schema) and reused to extract any number of clip label files from that video.
+* When present alongside a session video as `sub-<subjectID>_ses-<sessionID>_cam-<camID>_videolabels.json`, the `extract-clip` command will slice it into per-clip `cliplabels.json` files matching the requested frame ranges.
+* In the `videolabels.json` file, each entry in the `images` list uses the **0-based frame index in the video** as its `id` (same convention as [frame labels](target-framelabels)).
+
 (target-startlabels)=
 ### Clip start labels (`startlabels.json`)
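The slicing behaviour described in the `videolabels.json` bullets can be sketched in standalone Python. This is a minimal illustration under stated assumptions, not the project's implementation: the helper name `slice_videolabels`, the toy file names, and the `animal` category are invented for the example. Only the `images` / `annotations` / `categories` keys and the id conventions (0-based video frame index in, 0-based clip index out, annotation ids renumbered from 1) are taken from the text and the diff.

```python
def slice_videolabels(video_labels: dict, start_frame: int, duration: int) -> dict:
    """Return labels for frames [start_frame, start_frame + duration).

    Hypothetical helper: assumes each image ``id`` in ``video_labels`` is the
    0-based frame index in the full video.
    """
    end_frame = start_frame + duration

    # Keep images inside the clip and shift their ids to be 0-based in the clip.
    # file_name is left untouched so it still carries the video frame index.
    images = [
        {**img, "id": img["id"] - start_frame}
        for img in video_labels["images"]
        if start_frame <= img["id"] < end_frame
    ]

    # Keep annotations inside the clip, shift image_id, renumber ids from 1.
    in_clip = (
        ann
        for ann in video_labels["annotations"]
        if start_frame <= ann["image_id"] < end_frame
    )
    annotations = [
        {**ann, "image_id": ann["image_id"] - start_frame, "id": new_id}
        for new_id, ann in enumerate(in_clip, start=1)
    ]

    return {
        "images": images,
        "annotations": annotations,
        "categories": video_labels["categories"],
    }


# Toy full-video labels (made-up values): one annotation per frame, 10 frames.
video_labels = {
    "images": [{"id": i, "file_name": f"frame-{i:04d}.png"} for i in range(10)],
    "annotations": [
        {"id": i + 1, "image_id": i, "category_id": 1} for i in range(10)
    ],
    "categories": [{"id": 1, "name": "animal"}],
}

clip_labels = slice_videolabels(video_labels, start_frame=4, duration=3)
print([img["id"] for img in clip_labels["images"]])         # [0, 1, 2]
print([img["file_name"] for img in clip_labels["images"]])  # frame-0004 .. frame-0006
print([(a["id"], a["image_id"]) for a in clip_labels["annotations"]])  # [(1, 0), (2, 1), (3, 2)]
```

Because the one full-video file is only read and never modified, it can be sliced repeatedly for different frame ranges, which is what makes it useful as a cache.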

poseinterface/clips.py

Lines changed: 245 additions & 0 deletions
@@ -0,0 +1,245 @@
+"""Functions to extract clips from ``poseinterface`` videos."""
+
+import argparse
+import json
+import logging
+import sys
+from pathlib import Path
+
+import sleap_io as sio
+
+
+def extract_clip(
+    video_path: str | Path,
+    start_frame: int,
+    duration: int,
+) -> tuple[Path, Path | None]:
+    """Extract a video clip (and its clip labels if available).
+
+    Reads the source video and saves a ``.mp4`` clip to a ``Clips/``
+    subdirectory next to the source video. If a sibling
+    ``*_videolabels.json`` file exists (holding labels for the entire
+    session video, using the same schema as ``cliplabels.json``), a
+    matching ``_cliplabels.json`` containing only the annotations within
+    the requested frame range is also written.
+
+
+    Parameters
+    ----------
+    video_path
+        Path to the input ``.mp4`` video. The filename should follow
+        the convention ``sub-<subjectID>_ses-<sessionID>_cam-<camID>.mp4``, and
+        if a sibling labels file exists, its filename should be
+        ``sub-<subjectID>_ses-<sessionID>_cam-<camID>_videolabels.json``.
+    start_frame
+        Index of the first frame to include in the clip (0-based).
+    duration
+        Number of frames to include in the clip. If ``start_frame +
+        duration`` exceeds the video length, the duration is clamped to the
+        remaining frames and a warning is logged.
+
+    Returns
+    -------
+    clip_path : Path
+        Path to the output clip file.
+    clip_json : Path | None
+        Path to the ``_cliplabels.json`` file for the clip if extracted,
+        None otherwise.
+
+    Raises
+    ------
+    ValueError
+        If ``start_frame`` is negative or ``duration`` is not positive.
+
+    Notes
+    -----
+    This function optionally consumes a ``*_videolabels.json`` file, sibling
+    to the input video file and holding labels for the entire video. This
+    file is an intermediate cache useful for data contributors: it follows
+    the same schema as ``cliplabels.json`` but it refers to the full video,
+    rather than to a clip of it. The ``*_videolabels.json`` file is not part
+    of the published benchmark dataset. For further details, see the
+    "Intermediate file `videolabels.json`" section of the benchmark
+    dataset specification.
+
+    This function assumes that the ``id`` field in the ``images`` list of the
+    source ``*_videolabels.json`` corresponds to 0-based global frame indices
+    of the full video.
+    """
+    # Check input values
+    if start_frame < 0:
+        raise ValueError(
+            f"start_frame must be non-negative, got {start_frame}"
+        )
+    if duration <= 0:
+        raise ValueError(f"duration must be positive, got {duration}")
+
+    # Create "Clips" directory if it doesn't exist
+    video_path = Path(video_path)
+    clips_dir = video_path.parent / "Clips"
+    clips_dir.mkdir(parents=True, exist_ok=True)
+
+    # Read video as array
+    video = sio.load_video(video_path)
+    logging.info(
+        f"filename: {video_path.name}, fps: {video.fps}, shape: {video.shape}"
+    )
+
+    # Clamp duration if it exceeds the video length
+    if start_frame + duration > video.shape[0]:
+        duration = video.shape[0] - start_frame
+        logging.warning(
+            "Clip exceeds video length. "
+            f"Clamping duration to {duration} frames."
+        )
+
+    # Slice clip and save as mp4
+    clip = video[start_frame : start_frame + duration]
+    clip_path = (
+        clips_dir / f"{video_path.stem}_start-{start_frame}_dur-{duration}.mp4"
+    )
+    sio.save_video(clip, clip_path, fps=video.fps)
+
+    # Generate cliplabels.json only if a companion videolabels.json file exists
+    video_json = video_path.parent / f"{video_path.stem}_videolabels.json"
+    if video_json.exists():
+        clip_json = _extract_cliplabels(
+            video_path, clips_dir, start_frame, duration
+        )
+        logging.info(
+            f"Extracted clip {clip_path.name} with labels {clip_json.name} "
+            f"({duration} frames from start_frame={start_frame})."
+        )
+    else:
+        clip_json = None
+        logging.info(
+            f"Extracted clip {clip_path.name} "
+            f"({duration} frames from start_frame={start_frame}). "
+            "No companion *_videolabels.json found; skipping label extraction."
+        )
+
+    return clip_path, clip_json
+
+
+def _extract_cliplabels(
+    video_path: Path, clips_dir: Path, start_frame: int, duration: int
+) -> Path:
+    """Extract clip labels from the sibling *_videolabels.json file."""
+    # Read file with labels for the whole video
+    video_json = video_path.parent / f"{video_path.stem}_videolabels.json"
+    with open(video_json) as f:
+        video_labels = json.load(f)
+
+    # Compute clip end frame
+    end_frame = start_frame + duration
+
+    # Keep only data from the images in the clip, re-indexing ids to be
+    # 0-based within the clip. file_name is left untouched to retain in it
+    # the global (video-based) frame index
+    clip_labels = {}
+    clip_labels["images"] = [
+        {
+            **img,
+            "id": img["id"] - start_frame,  # overwrite id
+        }
+        for img in video_labels["images"]
+        if start_frame <= img["id"] < end_frame
+    ]
+
+    # Keep only annotations within the clip, remapping image_id to the local
+    # (clip-based) frame index, and renumbering annotation ids to be 1-based
+    # within the clip.
+    clip_labels["annotations"] = [
+        {
+            **annot,
+            "image_id": annot["image_id"] - start_frame,  # overwrite image_id
+            "id": new_id,
+        }
+        for new_id, annot in enumerate(
+            (
+                ant
+                for ant in video_labels["annotations"]
+                if start_frame <= ant["image_id"] < end_frame
+            ),  # generator lazily yields only annotations within the clip
+            start=1,  # annotation ids are 1-based within clip
+        )
+    ]
+    # pass categories unchanged
+    clip_labels["categories"] = video_labels["categories"]
+
+    # Save json with filtered data to clips directory
+    clip_json = (
+        clips_dir / f"{video_path.stem}_"
+        f"start-{start_frame}_dur-{duration}_cliplabels.json"
+    )
+    with open(clip_json, "w") as f:
+        json.dump(clip_labels, f)
+
+    return clip_json
+
+
+def main(args: argparse.Namespace) -> None:
+    """Run clip extraction from parsed command-line arguments.
+
+    Parameters
+    ----------
+    args
+        Parsed arguments containing ``video_path``, ``start_frame``,
+        and ``duration``.
+    """
+    # Extract clip
+    extract_clip(args.video_path, args.start_frame, args.duration)
+
+
+def parse_args(args: list[str]) -> argparse.Namespace:
+    """Parse command-line arguments for clip extraction.
+
+    Parameters
+    ----------
+    args
+        List of command-line argument strings (e.g. ``sys.argv[1:]``).
+
+    Returns
+    -------
+    argparse.Namespace
+        Parsed arguments with attributes ``video_path`` (str),
+        ``start_frame`` (int), and ``duration`` (int).
+    """
+    parser = argparse.ArgumentParser(
+        description=(
+            "Extract clips from video (and corresponding "
+            "clip labels if available)."
+        )
+    )
+    parser.add_argument(
+        "--video_path",
+        type=str,
+        required=True,
+        help="Path to video file to clip. The filename should follow "
+        "the convention ``sub-<subjectID>_ses-<sessionID>_cam-<camID>.mp4``, "
+        "and if a sibling labels file exists, its filename should be "
+        "``sub-<subjectID>_ses-<sessionID>_cam-<camID>_videolabels.json``.",
+    )
+    parser.add_argument(
+        "--start_frame",
+        type=int,
+        required=True,
+        help="Start frame of the clip as a 0-based index.",
+    )
+    parser.add_argument(
+        "--duration",
+        type=int,
+        required=True,
+        help="Total length of the output clip in frames.",
+    )
+    return parser.parse_args(args)
+
+
+def wrapper() -> None:
+    """Entry point for the ``extract-clip`` console script."""
+    args = parse_args(sys.argv[1:])
+    main(args)
+
+
+if __name__ == "__main__":
+    wrapper()
