Add compression context cache by kdt3rd · Pull Request #2440 · AcademySoftwareFoundation/openexr

kdt3rd · 2026-05-28T10:24:09Z

This adds a cache for a (de)compression context to the core decode pipeline struct.

This is a backwards compatible change (i.e. older versions of code compiled against previous headers will still work), with the exception that it does add the requirement that people actually use the provided initializer when defining exr_decode_pipeline_t things. This has been relaxed to assume that if it's not the new size, it must be the old size, which should work, although in theory if it's uninitialized, it could have a bad value (but should also trigger an uninitialized read if people check their sanitizers, which is good).

This should supercede #2128 with a backwards compatible mechanism

…sion Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>

This will allow a couple of the compression types to avoid constantly creating a context object internally every chunk when the pipeline is re-used (which it is most times). Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>

Adds some helper routines to avoid code duplication as well in enabling this Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>

Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>

lji-ilm · 2026-05-29T03:01:23Z

Hi Kimball, this is the branch that needs some profiling against the dev tip right?
Looking at the code change, probably both zip and htj2k would need a test..?

palemieux · 2026-05-29T21:10:18Z

.\exrmetrics.exe -z htj2k32 ACES_OT_VWG_SampleFrames\ACES_OT_VWG_SampleFrames.0041.exr --passes 20 is 40% faster on Windows compared to main because of the reduced memory allocation overhead.

lji-ilm · 2026-06-03T07:16:19Z

I run part of my research project's pipeline on this branch and the tip of main, results below, which is somewhat different from Pierre's.
Maybe this is a bit too glorified than needed for this particular patch, but I was also thinking about asking people for feedbacks for the ongoing project, so I'll use this opportunity to show a mini pipeline run with a few key ideas, and feel free to comment on the methodology too, in addition to the result.

Materials

Environment

Alma 9
Cmake 3.26
gcc 11.5

EXR Builds

EXR built on main tip, commit [f19c773], in the results identified as main.
EXR built on Kimball's patch, commit kdt3rd@6d85a45 , in the results identified as patch.

Test Data

60 images retrieved from the Netflix Open Content "SolLevance" (20) and "Sparks" (20) shows, as well as programmatically generated images (20). As a starting point, all images are RGBA-half-4K(4096x2160) uncompressed scanline EXR, so all of them are roughly 70,778,880 bytes (70 Mbs).

The programmatically generated images (20) include a few native 4K generations from openEXR's example code https://github.com/AcademySoftwareFoundation/openexr/blob/main/src/examples/drawImage.cpp . I have patched those codes so they render natively to 4K. I also collected a few other mundan edge cases, such as all 0, all 1, checkerboard, etc. A few special frames from the movies, such as text-only credit roll, are categorized as "programmatically generated". A Stanford bunny render is also included, because i cannot afford not to.

Then, the 60 starting images are converted to two more sets with different data layouts -- 64x64 tiled, and 512x512 tiled. All uncompressed, these tiled images are still roughly 70Mbs. In total, we have 180 EXR images with roughly the same size (metadata makes their sizes slightly differ), while having different data layout and contents.

Methodology

Baseline Timing

Before we run exrmetrics, a small C++ program like the following was run on a few of these 70Mb images, and the timing collected is averaged into a Baseline Timing number:

// collect baseline read timing
std::ifstream file(filePath, std::ios::binary);
auto startTime = std::chrono::steady_clock::now();
file.read(buffer.data(), fileSize));
auto endTime = std::chrono::steady_clock::now();

// collect baseline write timing
std::ofstream file(filePath, std::ios::binary);
auto startTime = std::chrono::steady_clock::now();
file.write(buffer.data(), fileSize));
auto endTime = std::chrono::steady_clock::now();

The rationale of the Baseline Timing is to check the required raw IO time to read or write the amount of data corresponding to 4K-RGBA-half pixels in the test environment. Subsequently, all timing of read and write tasks are reported as time factor, that is, the timing output of the exrmetrics run divided by this baseline timing (read or write, accordingly).

Usually the time factor would be greater than 1, implying using a compression is slower than simply IO the pixels from the disk; however, there are cases when the factor is less than 1, notably with the "all 1" or "all 0" synthetic images, wherein the compressor just repeat a number 4K times in the RAM. I was hoping this metrology setup can make the test result more environment agnostic, compared to listing the absolute timing numbers from exrmetrics.

Results

Here is the table of timing factor (averaged across all data layouts) of my exrmetrics runs on the 180 images, all tests are single threaded. I didn't get to analyzing multi-threading in this mini project.

Smaller timing factor means better performance, and are marked in green. Bigger timing factor means slower, and are marked in red.

Interestingly the patch made write faster while read a tad slower.

Here is the plot of time factor of reading and writing tasks, grouped by compression type:

Here are the breakout plots for different compression types in my run, groups by data layouts. As expected the small tile case (tile 64x64) introduce a performance hit. What is interesting to see is that htj2k have a lot of data points on both top and bottom being considered "outliers" that would be outside of the standard "Interquartile range from Q1 to Q3" statistical setup, while zip only have outliers on the bottom -- which are probably those all 0 and all 1s. I don't have an interpretation for this, but it isn't related about this patch either.

lji-ilm · 2026-06-03T07:20:55Z

Credits to the Google Gemini bot for a lot of testing/visualization scaffolding code.
source data dump:
kimball_benchmark2.csv

palemieux · 2026-06-03T15:51:06Z

@lji-ilm Which platform(s) did you run the test on?

For the HTJ2K compressor, I was expecting gains primarily on Windows, where the cost of pre-allocating memory is high and, hence, tearing up/down the compressor once for each chunk has a significant.

lji-ilm · 2026-06-03T16:35:43Z

@lji-ilm Which platform(s) did you run the test on?

For the HTJ2K compressor, I was expecting gains primarily on Windows, where the cost of pre-allocating memory is high and, hence, tearing up/down the compressor once for each chunk has a significant.

It's on linux (Alma) so i wasn't surprised that it was somewhat different from yours. I probably should say this on the first post, will edit that in. Kimball mentioned he also expect a slightly performance hit on read but we did not expect to see the gain on writing performance.

cary-ilm · 2026-06-06T00:42:10Z

 #include "internal_decompress.h"
 #include "internal_structs.h"
 #include "internal_xdr.h"
+#include "internal_xdr.h"


Suggested change

#include "internal_xdr.h"

duplicate include of internal_xdr.h

cary-ilm · 2026-06-06T01:10:31Z

+}
+#endif
+
+#endif /* OPENEXR_CORE_DECOMPRESS_H */


Suggested change

#endif /* OPENEXR_CORE_DECOMPRESS_H */

#endif /* OPENEXR_CORE_LEGACY_STRUCTS_H */

cary-ilm · 2026-06-06T01:52:59Z


 void
 testDeepZIPSCompression (const std::string& tempdir)
 {}


Claude suggested this additioal test to verify that the context pointer is the same on subsequent chunks, and verify it's freed at the end.

Suggested change

{}

////////////////////////////////////////

static void

writeZIPFile (const std::string& filename)

{

exr_context_t f;

int partidx;

exr_context_initializer_t cinit = EXR_DEFAULT_CONTEXT_INITIALIZER;

exr_attr_box2i_t dataW = {0, 0, IMG_WIDTH - 1, IMG_HEIGHT - 1};

EXRCORE_TEST_RVAL (exr_start_write (

&f, filename.c_str (), EXR_WRITE_FILE_DIRECTLY, &cinit));

EXRCORE_TEST_RVAL (

exr_add_part (f, "scan", EXR_STORAGE_SCANLINE, &partidx));

EXRCORE_TEST_RVAL (exr_initialize_required_attr_simple (

f, partidx, IMG_WIDTH, IMG_HEIGHT, EXR_COMPRESSION_ZIP));

EXRCORE_TEST_RVAL (exr_set_data_window (f, partidx, &dataW));

EXRCORE_TEST_RVAL (exr_add_channel (

f, partidx, "H", EXR_PIXEL_HALF, EXR_PERCEPTUALLY_LOGARITHMIC, 1, 1));

EXRCORE_TEST_RVAL (exr_write_header (f));

std::vector<uint16_t> row (IMG_WIDTH, 0x3c00u); /* half 1.0 */

exr_encode_pipeline_t encoder = EXR_ENCODE_PIPELINE_INITIALIZER;

exr_chunk_info_t cinfo;

int32_t scansperchunk;

EXRCORE_TEST_RVAL (exr_get_scanlines_per_chunk (f, 0, &scansperchunk));

bool first = true;

for (int y = 0; y < IMG_HEIGHT; y += scansperchunk)

{

EXRCORE_TEST_RVAL (exr_write_scanline_chunk_info (f, 0, y, &cinfo));

if (first)

{

EXRCORE_TEST_RVAL (exr_encoding_initialize (f, 0, &cinfo, &encoder));

}

else

{

EXRCORE_TEST_RVAL (exr_encoding_update (f, 0, &cinfo, &encoder));

}

encoder.channels[0].encode_from_ptr = (const uint8_t*) row.data ();

encoder.channels[0].user_pixel_stride = 2;

encoder.channels[0].user_line_stride = 0;

if (first)

{

EXRCORE_TEST_RVAL (

exr_encoding_choose_default_routines (f, 0, &encoder));

}

EXRCORE_TEST_RVAL (exr_encoding_run (f, 0, &encoder));

first = false;

}

EXRCORE_TEST_RVAL (exr_encoding_destroy (f, &encoder));

EXRCORE_TEST_RVAL (exr_finish (&f));

}

void

testCompressionContextCache (const std::string& tempdir)

{

std::string filename = tempdir + "compression_context_cache.exr";

writeZIPFile (filename);

exr_context_t f;

exr_context_initializer_t cinit = EXR_DEFAULT_CONTEXT_INITIALIZER;

EXRCORE_TEST_RVAL (exr_start_read (&f, filename.c_str (), &cinit));

exr_decode_pipeline_t decoder = EXR_DECODE_PIPELINE_INITIALIZER;

exr_chunk_info_t cinfo;

int32_t scansperchunk;

int32_t height;

EXRCORE_TEST_RVAL (exr_get_scanlines_per_chunk (f, 0, &scansperchunk));

exr_attr_box2i_t dw;

EXRCORE_TEST_RVAL (exr_get_data_window (f, 0, &dw));

height = dw.max.y - dw.min.y + 1;

/* Need at least 2 chunks to verify reuse */

EXRCORE_TEST (height > scansperchunk);

void* first_context = NULL;

bool first = true;

int chunks_decoded = 0;

for (int y = dw.min.y; y <= dw.max.y; y += scansperchunk)

{

EXRCORE_TEST_RVAL (exr_read_scanline_chunk_info (f, 0, y, &cinfo));

if (first)

{

EXRCORE_TEST_RVAL (exr_decoding_initialize (f, 0, &cinfo, &decoder));

}

else

{

EXRCORE_TEST_RVAL (exr_decoding_update (f, 0, &cinfo, &decoder));

}

/* skip output — just decompress */

for (int c = 0; c < decoder.channel_count; ++c)

{

decoder.channels[c].decode_to_ptr = NULL;

decoder.channels[c].user_pixel_stride = 0;

decoder.channels[c].user_line_stride = 0;

}

if (first)

{

EXRCORE_TEST_RVAL (

exr_decoding_choose_default_routines (f, 0, &decoder));

}

EXRCORE_TEST_RVAL (exr_decoding_run (f, 0, &decoder));

if (first)

{

/* context must be populated after the first chunk */

EXRCORE_TEST (decoder.compression_context != NULL);

EXRCORE_TEST (decoder.free_compression_context != NULL);

first_context = decoder.compression_context;

}

else

{

/* same pointer every subsequent chunk — context is reused, not recreated */

EXRCORE_TEST (decoder.compression_context == first_context);

}

first = false;

++chunks_decoded;

}

EXRCORE_TEST (chunks_decoded >= 2);

/* destroy must clear the context */

EXRCORE_TEST_RVAL (exr_decoding_destroy (f, &decoder));

EXRCORE_TEST (decoder.compression_context == NULL);

EXRCORE_TEST (decoder.free_compression_context == NULL);

EXRCORE_TEST_RVAL (exr_finish (&f));

remove (filename.c_str ());

}

cary-ilm

A couple typos and a suggested test from Claude.

kdt3rd added 6 commits May 28, 2026 21:56

Properly initialize decode pipeline struct in prep for adding new ver…

fee39a1

…sion Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>

Enable zip to use new compression cache in decode pipe

408342e

Adds some helper routines to avoid code duplication as well in enabling this Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>

Enable htj2k to use new compression context cache in decode

ed31761

Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>

remove dead code after prints were removed

2fac01e

Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>

Relax the struct size check to just assume v1 if not a new size

e636fe4

Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>

kdt3rd added this to the Next Minor Release milestone May 28, 2026

add new header file to bazel build

6d85a45

Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>

kdt3rd added the v3.5 label May 29, 2026

cary-ilm reviewed Jun 6, 2026

View reviewed changes

cary-ilm mentioned this pull request Jun 11, 2026

Request for comment: add support for compressor/decompressor-specific init/destroy #2128

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add compression context cache#2440

Add compression context cache#2440
kdt3rd wants to merge 7 commits into
AcademySoftwareFoundation:mainfrom
kdt3rd:add_compression_ctxt_cache

kdt3rd commented May 28, 2026

Uh oh!

lji-ilm commented May 29, 2026

Uh oh!

palemieux commented May 29, 2026

Uh oh!

lji-ilm commented Jun 3, 2026 •

edited

Loading

Uh oh!

lji-ilm commented Jun 3, 2026

Uh oh!

palemieux commented Jun 3, 2026

Uh oh!

lji-ilm commented Jun 3, 2026

Uh oh!

cary-ilm Jun 6, 2026

Uh oh!

cary-ilm Jun 6, 2026

Uh oh!

cary-ilm Jun 6, 2026

Uh oh!

cary-ilm left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

	#endif /* OPENEXR_CORE_DECOMPRESS_H */
	#endif /* OPENEXR_CORE_LEGACY_STRUCTS_H */

+{}
+////////////////////////////////////////
+static void
+writeZIPFile (const std::string& filename)
+{
+    exr_context_t             f;
+    int                       partidx;
+    exr_context_initializer_t cinit = EXR_DEFAULT_CONTEXT_INITIALIZER;
+    exr_attr_box2i_t          dataW = {0, 0, IMG_WIDTH - 1, IMG_HEIGHT - 1};
+    EXRCORE_TEST_RVAL (exr_start_write (
+        &f, filename.c_str (), EXR_WRITE_FILE_DIRECTLY, &cinit));
+    EXRCORE_TEST_RVAL (
+        exr_add_part (f, "scan", EXR_STORAGE_SCANLINE, &partidx));
+    EXRCORE_TEST_RVAL (exr_initialize_required_attr_simple (
+        f, partidx, IMG_WIDTH, IMG_HEIGHT, EXR_COMPRESSION_ZIP));
+    EXRCORE_TEST_RVAL (exr_set_data_window (f, partidx, &dataW));
+    EXRCORE_TEST_RVAL (exr_add_channel (
+        f, partidx, "H", EXR_PIXEL_HALF, EXR_PERCEPTUALLY_LOGARITHMIC, 1, 1));
+    EXRCORE_TEST_RVAL (exr_write_header (f));
+    std::vector<uint16_t> row (IMG_WIDTH, 0x3c00u); /* half 1.0 */
+    exr_encode_pipeline_t encoder = EXR_ENCODE_PIPELINE_INITIALIZER;
+    exr_chunk_info_t      cinfo;
+    int32_t               scansperchunk;
+    EXRCORE_TEST_RVAL (exr_get_scanlines_per_chunk (f, 0, &scansperchunk));
+    bool first = true;
+    for (int y = 0; y < IMG_HEIGHT; y += scansperchunk)
+    {
+        EXRCORE_TEST_RVAL (exr_write_scanline_chunk_info (f, 0, y, &cinfo));
+        if (first)
+        {
+            EXRCORE_TEST_RVAL (exr_encoding_initialize (f, 0, &cinfo, &encoder));
+        }
+        else
+        {
+            EXRCORE_TEST_RVAL (exr_encoding_update (f, 0, &cinfo, &encoder));
+        }
+        encoder.channels[0].encode_from_ptr   = (const uint8_t*) row.data ();
+        encoder.channels[0].user_pixel_stride = 2;
+        encoder.channels[0].user_line_stride  = 0;
+        if (first)
+        {
+            EXRCORE_TEST_RVAL (
+                exr_encoding_choose_default_routines (f, 0, &encoder));
+        }
+        EXRCORE_TEST_RVAL (exr_encoding_run (f, 0, &encoder));
+        first = false;
+    }
+    EXRCORE_TEST_RVAL (exr_encoding_destroy (f, &encoder));
+    EXRCORE_TEST_RVAL (exr_finish (&f));
+}
+void
+testCompressionContextCache (const std::string& tempdir)
+{
+    std::string filename = tempdir + "compression_context_cache.exr";
+    writeZIPFile (filename);
+    exr_context_t             f;
+    exr_context_initializer_t cinit = EXR_DEFAULT_CONTEXT_INITIALIZER;
+    EXRCORE_TEST_RVAL (exr_start_read (&f, filename.c_str (), &cinit));
+    exr_decode_pipeline_t decoder = EXR_DECODE_PIPELINE_INITIALIZER;
+    exr_chunk_info_t      cinfo;
+    int32_t               scansperchunk;
+    int32_t               height;
+    EXRCORE_TEST_RVAL (exr_get_scanlines_per_chunk (f, 0, &scansperchunk));
+    exr_attr_box2i_t dw;
+    EXRCORE_TEST_RVAL (exr_get_data_window (f, 0, &dw));
+    height = dw.max.y - dw.min.y + 1;
+    /* Need at least 2 chunks to verify reuse */
+    EXRCORE_TEST (height > scansperchunk);
+    void* first_context  = NULL;
+    bool  first          = true;
+    int   chunks_decoded = 0;
+    for (int y = dw.min.y; y <= dw.max.y; y += scansperchunk)
+    {
+        EXRCORE_TEST_RVAL (exr_read_scanline_chunk_info (f, 0, y, &cinfo));
+        if (first)
+        {
+            EXRCORE_TEST_RVAL (exr_decoding_initialize (f, 0, &cinfo, &decoder));
+        }
+        else
+        {
+            EXRCORE_TEST_RVAL (exr_decoding_update (f, 0, &cinfo, &decoder));
+        }
+        /* skip output — just decompress */
+        for (int c = 0; c < decoder.channel_count; ++c)
+        {
+            decoder.channels[c].decode_to_ptr     = NULL;
+            decoder.channels[c].user_pixel_stride = 0;
+            decoder.channels[c].user_line_stride  = 0;
+        }
+        if (first)
+        {
+            EXRCORE_TEST_RVAL (
+                exr_decoding_choose_default_routines (f, 0, &decoder));
+        }
+        EXRCORE_TEST_RVAL (exr_decoding_run (f, 0, &decoder));
+        if (first)
+        {
+            /* context must be populated after the first chunk */
+            EXRCORE_TEST (decoder.compression_context != NULL);
+            EXRCORE_TEST (decoder.free_compression_context != NULL);
+            first_context = decoder.compression_context;
+        }
+        else
+        {
+            /* same pointer every subsequent chunk — context is reused, not recreated */
+            EXRCORE_TEST (decoder.compression_context == first_context);
+        }
+        first = false;
+        ++chunks_decoded;
+    }
+    EXRCORE_TEST (chunks_decoded >= 2);
+    /* destroy must clear the context */
+    EXRCORE_TEST_RVAL (exr_decoding_destroy (f, &decoder));
+    EXRCORE_TEST (decoder.compression_context == NULL);
+    EXRCORE_TEST (decoder.free_compression_context == NULL);
+    EXRCORE_TEST_RVAL (exr_finish (&f));
+    remove (filename.c_str ());
+}

Conversation

kdt3rd commented May 28, 2026

Uh oh!

lji-ilm commented May 29, 2026

Uh oh!

palemieux commented May 29, 2026

Uh oh!

lji-ilm commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Materials

Environment

EXR Builds

Test Data

Methodology

Baseline Timing

Results

Uh oh!

lji-ilm commented Jun 3, 2026

Uh oh!

palemieux commented Jun 3, 2026

Uh oh!

lji-ilm commented Jun 3, 2026

Uh oh!

cary-ilm Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

cary-ilm Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

cary-ilm Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

cary-ilm left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

lji-ilm commented Jun 3, 2026 •

edited

Loading