Flaky JSON error in encoder tests #1330

@Dan-Flores

Context:

We have encoder tests that rely on ffprobe to compare our encoded media's metadata with ffmpeg's.

Problem:

These tests are flaky: on Windows + FFmpeg 8 they intermittently fail with the error json.decoder.JSONDecodeError: Expecting ',' delimiter.

The tests usually pass on retry, but in the meantime the failures add noise to PRs. We should either find and fix the underlying cause, or switch to a more reliable output format, such as ffprobe's default format, which _get_video_metadata already uses.
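As a rough illustration of the second option, here is a minimal sketch of parsing ffprobe's default writer output for frames, where each frame is printed as a [FRAME] ... [/FRAME] section of key=value lines. The function name parse_default_frames and the sample text are hypothetical and are not taken from the test suite; the actual implementation of _get_video_metadata is not shown in this issue and may differ.

```python
def parse_default_frames(text: str) -> list[dict[str, str]]:
    """Parse ffprobe default-writer frame sections into a list of dicts.

    Hypothetical helper: values are kept as strings, exactly as printed.
    """
    frames = []
    current = None  # dict for the frame being read, or None outside a section
    for raw_line in text.splitlines():  # splitlines() also handles Windows \r\n
        line = raw_line.strip()
        if line == "[FRAME]":
            current = {}
        elif line == "[/FRAME]":
            if current is not None:
                frames.append(current)
            current = None
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key] = value
    return frames


# Made-up sample text in the default writer's layout, with Windows line endings.
sample = (
    "[FRAME]\r\npts=0\r\npts_time=0.000000\r\n[/FRAME]\r\n"
    "[FRAME]\r\npts=33\r\npts_time=0.033000\r\n[/FRAME]\r\n"
)
frames = parse_default_frames(sample)
```

Note that a line-based parser like this only counts a frame once its closing [/FRAME] line has been read, so if stdout were cut short it would return fewer frames instead of raising. A frame-count check would still be needed to notice that case.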

Example failures:

TestAudioEncoder::test_against_cli on Windows + FFmpeg8, for mp3 and flac:

================================== FAILURES ===================================
__ TestAudioEncoder.test_against_cli[to_file-flac-32000-1-999999999-asset0] ___

self = <test.test_encoders.TestAudioEncoder object at 0x000001D7D3AB19A0>
asset = TestAudio(filename='nasa_13013.mp4.audio.mp3', default_stream_index=0, stream_infos={0: TestAudioStreamInfo(sample_rat...,  2.3536e-04,  2.7501e-04,  2.1080e-04,
         -2.1618e-05, -8.9567e-05, -4.4332e-04, -5.0099e-04, -2.7716e-04]])]})
bit_rate = 999999999, num_channels = 1, sample_rate = 32000, format = 'flac'
method = 'to_file'
tmp_path = WindowsPath('C:/Users/runneradmin/AppData/Local/Temp/pytest-of-runneradmin/pytest-0/test_against_cli_to_file_flac_38')
capfd = <_pytest.capture.CaptureFixture object at 0x000001D7D599EAD0>
with_ffmpeg_debug_logs = None

    @needs_ffmpeg_cli
    @pytest.mark.parametrize("asset", (NASA_AUDIO_MP3, SINE_MONO_S32))
    @pytest.mark.parametrize("bit_rate", (None, 0, 44_100, 999_999_999))
    @pytest.mark.parametrize("num_channels", (None, 1, 2))
    @pytest.mark.parametrize("sample_rate", (8_000, 32_000))
    @pytest.mark.parametrize(
        "format",
        [
            # TODO: https://github.com/pytorch/torchcodec/issues/837
            pytest.param(
                "mp3",
                marks=pytest.mark.skipif(
                    IS_WINDOWS and ffmpeg_major_version <= 5,
                    reason="Encoding mp3 on Windows is weirdly buggy",
                ),
            ),
            pytest.param(
                "wav",
                marks=pytest.mark.skipif(
                    ffmpeg_major_version == 4,
                    reason="Swresample with FFmpeg 4 doesn't work on wav files",
                ),
            ),
            "flac",
        ],
    )
    @pytest.mark.parametrize("method", ("to_file", "to_tensor", "to_file_like"))
    def test_against_cli(
        self,
        asset,
        bit_rate,
        num_channels,
        sample_rate,
        format,
        method,
        tmp_path,
        capfd,
        with_ffmpeg_debug_logs,
    ):
        # Encodes samples with our encoder and with the FFmpeg CLI, and checks
        # that both decoded outputs are equal
    
        encoded_by_ffmpeg = tmp_path / f"ffmpeg_output.{format}"
        subprocess.run(
            ["ffmpeg", "-i", str(asset.path)]
            + (["-b:a", f"{bit_rate}"] if bit_rate is not None else [])
            + (["-ac", f"{num_channels}"] if num_channels is not None else [])
            + ["-ar", f"{sample_rate}"]
            + [
                str(encoded_by_ffmpeg),
            ],
            capture_output=True,
            check=True,
        )
    
        encoder = AudioEncoder(self.decode(asset).data, sample_rate=asset.sample_rate)
        params = dict(
            bit_rate=bit_rate, num_channels=num_channels, sample_rate=sample_rate
        )
        if method == "to_file":
            encoded_by_us = tmp_path / f"output.{format}"
            encoder.to_file(dest=str(encoded_by_us), **params)
        elif method == "to_tensor":
            encoded_by_us = encoder.to_tensor(format=format, **params)
        elif method == "to_file_like":
            file_like = io.BytesIO()
            encoder.to_file_like(file_like, format=format, **params)
            encoded_by_us = file_like.getvalue()
        else:
            raise ValueError(f"Unknown method: {method}")
    
        captured = capfd.readouterr()
        if format == "wav":
            assert "Timestamps are unset in a packet" not in captured.err
        if format == "mp3":
            assert "Queue input is backward in time" not in captured.err
        if format in ("flac", "wav"):
            assert "Encoder did not produce proper pts" not in captured.err
        if format in ("flac", "mp3"):
            assert "Application provided invalid" not in captured.err
    
        assert_close = torch.testing.assert_close
        if sample_rate != asset.sample_rate:
            if platform.machine().lower() == "aarch64":
                rtol, atol = 0, 1e-2
            else:
                rtol, atol = 0, 1e-3
    
            if sys.platform == "darwin":
                assert_close = partial(assert_tensor_close_on_at_least, percentage=99)
        elif format == "wav":
            rtol, atol = 0, 1e-4
        elif format == "mp3" and asset is SINE_MONO_S32 and num_channels == 2:
            # Not sure why, this one needs slightly higher tol. With default
            # tolerances, the check fails on ~1% of the samples, so that's
            # probably fine. It might be that the FFmpeg CLI doesn't rely on
            # libswresample for converting channels?
            rtol, atol = 0, 1e-3
        else:
            rtol, atol = None, None
    
        if IS_WINDOWS_WITH_FFMPEG_LE_70 and format == "mp3":
            # We're getting a "Could not open input file" on Windows mp3 files when decoding.
            # TODO: https://github.com/pytorch/torchcodec/issues/837
            return
    
        samples_by_us = self.decode(encoded_by_us)
        samples_by_ffmpeg = self.decode(encoded_by_ffmpeg)
    
        assert_close(
            samples_by_us.data,
            samples_by_ffmpeg.data,
            rtol=rtol,
            atol=atol,
        )
        assert samples_by_us.pts_seconds == samples_by_ffmpeg.pts_seconds
        assert samples_by_us.duration_seconds == samples_by_ffmpeg.duration_seconds
        assert samples_by_us.sample_rate == samples_by_ffmpeg.sample_rate
    
        if method == "to_file":
>           validate_frames_properties(actual=encoded_by_us, expected=encoded_by_ffmpeg)

test\test_encoders.py:387: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test\test_encoders.py:64: in validate_frames_properties
    frames_actual, frames_expected = (
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
test\test_encoders.py:65: in <genexpr>
    json.loads(
C:\Users\runneradmin\miniconda3\envs\test\Lib\json\__init__.py:352: in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
C:\Users\runneradmin\miniconda3\envs\test\Lib\json\decoder.py:345: in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <json.decoder.JSONDecoder object at 0x000001D7B88BF8C0>
s = '{\n    "frames": [\n        {\n            "pts": 0,\n            "pts_time": "0.000000",\n            "duration": 23...": "0.024000",\n            "sample_fmt": "s32",\n            "nb_samples": 768,\n            "channels": 1\n        }'
idx = 0

    def raw_decode(self, s, idx=0):
        """Decode a JSON document from ``s`` (a ``str`` beginning with
        a JSON document) and return a 2-tuple of the Python
        representation and the index in ``s`` where the document ended.
    
        This can be used to decode a JSON document from a string that may
        have extraneous data at the end.
    
        """
        try:
>           obj, end = self.scan_once(s, idx)
                       ^^^^^^^^^^^^^^^^^^^^^^
E           json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1640 column 10 (char 44777)

C:\Users\runneradmin\miniconda3\envs\test\Lib\json\decoder.py:361: JSONDecodeError
---------------------------- Captured stderr call -----------------------------
[AVFormatContext @ 000001D7CFE9D4C0] Opening 'C:\Users\runneradmin\AppData\Local\Temp\pytest-of-runneradmin\pytest-0\test_against_cli_to_file_flac_38\output.flac' for reading
[file @ 000001D7D30FBB00] Setting default whitelist 'file,crypto,data'
[flac @ 000001D7CFE9D4C0] Format flac probed with size=2048 and score=100
[flac @ 000001D7CFE9D4C0] Before avformat_find_stream_info() pos: 8286 bytes read:32768 seeks:0 nb_streams:1
[flac @ 000001D7CFE9D4C0] All info found
[flac @ 000001D7CFE9D4C0] After avformat_find_stream_info() pos: 28766 bytes read:32768 seeks:0 frames:1
[SWR @ 000001D7D338CA80] Using fltp internally between filters
[flac @ 000001D7CFE9D4C0] first_dts 0 not matching first dts 396288 (pts 396288, duration 2304) in the queue
[AVIOContext @ 000001D7CB58BB40] Statistics: 489375 bytes read, 0 seeks
[AVFormatContext @ 000001D7CFE9DCC0] Opening 'C:\Users\runneradmin\AppData\Local\Temp\pytest-of-runneradmin\pytest-0\test_against_cli_to_file_flac_38\ffmpeg_output.flac' for reading
[file @ 000001D7D30FC340] Setting default whitelist 'file,crypto,data'
[flac @ 000001D7CFE9DCC0] Format flac probed with size=2048 and score=100
[flac @ 000001D7CFE9DCC0] Before avformat_find_stream_info() pos: 8365 bytes read:32768 seeks:0 nb_streams:1
[flac @ 000001D7CFE9DCC0] All info found
[flac @ 000001D7CFE9DCC0] After avformat_find_stream_info() pos: 28845 bytes read:32768 seeks:0 frames:1
[SWR @ 000001D7D338CA80] Using fltp internally between filters
[flac @ 000001D7CFE9DCC0] first_dts 0 not matching first dts 396288 (pts 396288, duration 2304) in the queue
[AVIOContext @ 000001D7CB58F7C0] Statistics: 489454 bytes read, 0 seeks
=========================== short test summary info ===========================
FAILED test/test_encoders.py::TestAudioEncoder::test_against_cli[to_file-flac-32000-1-999999999-asset0] - json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1640 column 10 (char 44777)
=========== 1 failed, 1232 passed, 595 skipped in 547.98s (0:09:07) ===========

TestVideoEncoder::test_video_encoder_against_ffmpeg_cli on Windows + FFmpeg8, for flv and probably other formats:

================================== FAILURES ===================================
_ TestVideoEncoder.test_video_encoder_against_ffmpeg_cli[30-to_file-encode_params2-flv] _

self = <test.test_encoders.TestVideoEncoder object at 0x000001D382E05950>
tmp_path = WindowsPath('C:/Users/runneradmin/AppData/Local/Temp/pytest-of-runneradmin/pytest-0/test_video_encoder_against_ffm14')
format = 'flv'
encode_params = {'crf': None, 'pixel_format': 'yuv420p', 'preset': 'ultrafast'}
method = 'to_file', frame_rate = 30

    @needs_ffmpeg_cli
    @pytest.mark.parametrize(
        "format",
        (
            "mov",
            "mp4",
            "avi",
            "mkv",
            "flv",
            pytest.param(
                "webm",
                marks=[
                    pytest.mark.slow,
                    pytest.mark.skipif(
                        ffmpeg_major_version == 4
                        or (IS_WINDOWS and ffmpeg_major_version >= 6),
                        reason="Codec for webm is not available in this FFmpeg installation.",
                    ),
                ],
            ),
        ),
    )
    @pytest.mark.parametrize(
        "encode_params",
        [
            {"pixel_format": "yuv444p", "crf": 0, "preset": None},
            {"pixel_format": "yuv420p", "crf": 30, "preset": None},
            {"pixel_format": "yuv420p", "crf": None, "preset": "ultrafast"},
            {"pixel_format": "yuv420p", "crf": None, "preset": None},
        ],
    )
    @pytest.mark.parametrize("method", ("to_file", "to_tensor", "to_file_like"))
    @pytest.mark.parametrize("frame_rate", [30, 29.97])
    def test_video_encoder_against_ffmpeg_cli(
        self, tmp_path, format, encode_params, method, frame_rate
    ):
        pixel_format = encode_params["pixel_format"]
        crf = encode_params["crf"]
        preset = encode_params["preset"]
    
        if format in ("avi", "flv") and pixel_format == "yuv444p":
            pytest.skip(f"Default codec for {format} does not support {pixel_format}")
    
        source_frames = self.decode(TEST_SRC_2_720P.path)
    
        # Encode with FFmpeg CLI
        temp_raw_path = str(tmp_path / "temp_input.raw")
        with open(temp_raw_path, "wb") as f:
            f.write(source_frames.permute(0, 2, 3, 1).cpu().numpy().tobytes())
    
        ffmpeg_encoded_path = str(tmp_path / f"ffmpeg_output.{format}")
        # Some codecs (ex. MPEG4) do not support CRF or preset.
        # Flags not supported by the selected codec will be ignored.
        ffmpeg_cmd = [
            "ffmpeg",
            "-y",
            "-f",
            "rawvideo",
            "-pix_fmt",
            "rgb24",  # Input format
            "-s",
            f"{source_frames.shape[3]}x{source_frames.shape[2]}",
            "-r",
            str(frame_rate),
            "-i",
            temp_raw_path,
        ]
        if pixel_format is not None:  # Output format
            ffmpeg_cmd.extend(["-pix_fmt", pixel_format])
        if preset is not None:
            ffmpeg_cmd.extend(["-preset", preset])
        if crf is not None:
            ffmpeg_cmd.extend(["-crf", str(crf)])
        # Output path must be last
        ffmpeg_cmd.append(ffmpeg_encoded_path)
        subprocess.run(ffmpeg_cmd, check=True)
        ffmpeg_frames = self.decode(ffmpeg_encoded_path).data
    
        # Encode with our video encoder
        encoder = VideoEncoder(frames=source_frames, frame_rate=frame_rate)
        encoder_output_path = str(tmp_path / f"encoder_output.{format}")
    
        if method == "to_file":
            encoder.to_file(
                dest=encoder_output_path,
                pixel_format=pixel_format,
                crf=crf,
                preset=preset,
            )
            encoder_frames = self.decode(encoder_output_path)
        elif method == "to_tensor":
            encoded_output = encoder.to_tensor(
                format=format,
                pixel_format=pixel_format,
                crf=crf,
                preset=preset,
            )
            encoder_frames = self.decode(encoded_output)
        elif method == "to_file_like":
            file_like = io.BytesIO()
            encoder.to_file_like(
                file_like=file_like,
                format=format,
                pixel_format=pixel_format,
                crf=crf,
                preset=preset,
            )
            encoder_frames = self.decode(file_like.getvalue())
        else:
            raise ValueError(f"Unknown method: {method}")
    
        assert ffmpeg_frames.shape[0] == encoder_frames.shape[0]
    
        # MPEG codec used for avi format does not accept CRF
        percentage = 94 if format == "avi" else 99
    
        # Check that PSNR between both encoded versions is high
        for ff_frame, enc_frame in zip(ffmpeg_frames, encoder_frames):
            res = psnr(ff_frame, enc_frame)
            assert res > 30
            assert_tensor_close_on_at_least(
                ff_frame, enc_frame, percentage=percentage, atol=2
            )
    
        # Only compare video metadata on ffmpeg versions >= 6, as older versions
        # are often missing metadata
        if ffmpeg_major_version >= 6 and method == "to_file":
            fields = [
                "duration",
                "duration_ts",
                "r_frame_rate",
                "time_base",
                "nb_frames",
            ]
            ffmpeg_metadata = self._get_video_metadata(
                ffmpeg_encoded_path,
                fields=fields,
            )
            encoder_metadata = self._get_video_metadata(
                encoder_output_path,
                fields=fields,
            )
            assert ffmpeg_metadata == encoder_metadata
    
            # Check that frame timestamps and duration are the same
            fields = ("pts", "pts_time")
            if format != "flv":
                fields += ("duration", "duration_time")
>           ffmpeg_frames_info = self._get_frames_info(
                ffmpeg_encoded_path, fields=fields
            )

test\test_encoders.py:1172: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test\test_encoders.py:669: in _get_frames_info
    frames = json.loads(result.stdout)["frames"]
             ^^^^^^^^^^^^^^^^^^^^^^^^^
C:\Users\runneradmin\miniconda3\envs\test\Lib\json\__init__.py:352: in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
C:\Users\runneradmin\miniconda3\envs\test\Lib\json\decoder.py:345: in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <json.decoder.JSONDecoder object at 0x000001D3E8A36E40>
s = '{\n    "frames": [\n        {\n            "pts": 0,\n            "pts_time": "0.000000"\n        },\n        {\n    ... "pts_time": "0.933000"\n        },\n        {\n            "pts": 967,\n            "pts_time": "0.967000"\n        }'
idx = 0

    def raw_decode(self, s, idx=0):
        """Decode a JSON document from ``s`` (a ``str`` beginning with
        a JSON document) and return a 2-tuple of the Python
        representation and the index in ``s`` where the document ended.
    
        This can be used to decode a JSON document from a string that may
        have extraneous data at the end.
    
        """
        try:
>           obj, end = self.scan_once(s, idx)
                       ^^^^^^^^^^^^^^^^^^^^^^
E           json.decoder.JSONDecodeError: Expecting ',' delimiter: line 122 column 10 (char 2412)

C:\Users\runneradmin\miniconda3\envs\test\Lib\json\decoder.py:361: JSONDecodeError
---------------------------- Captured stderr call -----------------------------
ffmpeg version 8.0.1 Copyright (c) 2000-2025 the FFmpeg developers

  built with clang version 22.1.0

  configuration: --prefix=/d/bld/ffmpeg_1773007679189/_h_env/Library --cc=clang.exe --cxx=clang++.exe --nm=llvm-nm --ar=llvm-ar --disable-doc --enable-openssl --enable-demuxer=dash --enable-hardcoded-tables --enable-libfreetype --enable-libharfbuzz --enable-libfontconfig --enable-libopenh264 --enable-libdav1d --ld=lld-link --target-os=win64 --enable-cross-compile --toolchain=msvc --host-cc=clang.exe --extra-libs=ucrt.lib --extra-libs=vcruntime.lib --extra-libs=oldnames.lib --strip=llvm-strip --disable-stripping --host-extralibs= --disable-libopenvino --enable-gpl --enable-libx264 --enable-libx265 --enable-libmp3lame --enable-libaom --enable-libsvtav1 --enable-libxml2 --enable-pic --enable-shared --disable-static --enable-version3 --enable-zlib --enable-libvorbis --enable-libopus --enable-librsvg --enable-libjxl --enable-libwebp --enable-ffplay --enable-vulkan --enable-libshaderc --pkg-config=/d/bld/ffmpeg_1773007679189/_build_env/Library/bin/pkg-config

  libavutil      60.  8.100 / 60.  8.100

  libavcodec     62. 11.100 / 62. 11.100

  libavformat    62.  3.100 / 62.  3.100

  libavdevice    62.  1.100 / 62.  1.100

  libavfilter    11.  4.100 / 11.  4.100

  libswscale      9.  1.100 /  9.  1.100

  libswresample   6.  1.100 /  6.  1.100

[rawvideo @ 00000230F0BD3FC0] Estimating duration from bitrate, this may be inaccurate

Input #0, rawvideo, from 'C:\Users\runneradmin\AppData\Local\Temp\pytest-of-runneradmin\pytest-0\test_video_encoder_against_ffm14\temp_input.raw':

  Duration: 00:00:01.00, start: 0.000000, bitrate: 663552 kb/s

  Stream #0:0: Video: rawvideo (RGB[24] / 0x18424752), rgb24, 1280x720, 663552 kb/s, 30 tbr, 30 tbn

[out#0/flv @ 00000230F0B97FC0] Codec AVOption preset (Encoding preset) has not been used for any stream. The most likely reason is either wrong type (e.g. a video option with no video streams) or that it is a private option of some decoder which was not actually used for any stream.

Stream mapping:

  Stream #0:0 -> #0:0 (rawvideo (native) -> flv1 (flv))

Press [q] to stop, [?] for help

Output #0, flv, to 'C:\Users\runneradmin\AppData\Local\Temp\pytest-of-runneradmin\pytest-0\test_video_encoder_against_ffm14\ffmpeg_output.flv':

  Metadata:

    encoder         : Lavf62.3.100

  Stream #0:0: Video: flv1 ([2][0][0][0] / 0x0002), yuv420p(tv, progressive), 1280x720, q=2-31, 200 kb/s, 30 fps, 1k tbn

    Metadata:

      encoder         : Lavc62.11.100 flv

    Side data:

      cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: N/A

[out#0/flv @ 00000230F0B97FC0] video:334KiB audio:0KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 0.201926%

frame=   30 fps=0.0 q=31.0 Lsize=     335KiB time=00:00:01.00 bitrate=2743.2kbits/s speed=9.94x elapsed=0:00:00.10    

=========================== short test summary info ===========================
FAILED test/test_encoders.py::TestVideoEncoder::test_video_encoder_against_ffmpeg_cli[30-to_file-encode_params2-flv] - json.decoder.JSONDecodeError: Expecting ',' delimiter: line 122 column 10 (char 2412)
=========== 1 failed, 1217 passed, 596 skipped in 525.29s (0:08:45) ===========
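
One observation from the two tracebacks above: in both cases the string passed to json.loads ends right after a frame's closing brace, without the final "]" and "}", and the reported error position is the end of that last line (column 10 after 8 spaces plus a brace). That would be consistent with ffprobe's stdout being cut short, although this is an inference from the logs and not a confirmed root cause. The sketch below, using a made-up two-frame document, shows that Python's json module produces this exact error message for a document truncated at that point.

```python
import json

# Made-up stand-in for ffprobe's JSON frame output.
complete = '{"frames": [{"pts": 0}, {"pts": 33}]}'

# Simulate truncated stdout: drop the closing "]" and "}".
truncated = complete[: complete.rindex("]")]


def decode_error(text):
    """Return the JSONDecodeError raised for text, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return exc
    return None


err = decode_error(truncated)
print(err)  # the error message, reported at the very end of the input
```

If truncation is indeed what happens, a check such as err.pos == len(text) could distinguish it from genuinely malformed JSON when logging the failure.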
