Add log-based mean-square error (LogMSE) to exrmetrics #2448

Open
palemieux wants to merge 7 commits into AcademySoftwareFoundation:main from sandflow:feature/exrmetrics-mse

@palemieux

@palemieux palemieux commented Jun 1, 2026

Collaborator

Add --distortion option to exrmetrics to compute LogMSE between original pixel data and the re-read result after compression, reported independently for each part.

Constraints:

  • Only scanline and tiled parts are processed; deep parts are skipped.
  • Parts with mixed channel types or zero channels are skipped.
  • HALF and FLOAT channels use the LogMSE metric described in the white paper below.

Output:

  • JSON: per-part "log_mse" (half/float) or "mse" (uint) fields inside
    the existing "parts" array; no file-level aggregate is emitted.
  • CSV: one "part N mse" column per part

OPEN QUESTIONS:

TODO:

  • add unit tests

LogMSE white paper

exrmetrics-distortion-metric-v3.pdf

@palemieux palemieux force-pushed the feature/exrmetrics-mse branch from 0d7a2c6 to e3107a8 Compare June 1, 2026 00:43
@peterhillman

Contributor

This metric might cause confusion. Isn't it true that a difference between -2 and +2 would come out as zero in the metric? It strongly penalizes unimportant tiny changes to small values. (Arguably it's more akin to a relative error or PSNR than a true MSE). I would tend to prefer a linear Mean Absolute Difference metric. It's also summing across channels, so RGBA images would have the statistics of alpha muddled into the color channel, which could be misleading. Reporting the error per-channel might be better, which would also allow for parts with mixed channel types.
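
To illustrate the point about pooling channels, here is a minimal Python sketch. The sample values, the channel names, and the `mse` helper are hypothetical, and it uses plain MSE rather than the PR's LogMSE purely to keep the arithmetic easy to follow. It compares one pooled figure with per-channel figures when only alpha differs.

```python
def mse(orig, recon):
    """Mean-square error between two equal-length sample lists."""
    return sum((o - r) ** 2 for o, r in zip(orig, recon)) / len(orig)


# Hypothetical two-pixel RGBA part: colour is preserved, alpha is not.
original = {"R": [0.5, 0.25], "G": [0.5, 0.25], "B": [0.5, 0.25], "A": [1.0, 1.0]}
decoded = {"R": [0.5, 0.25], "G": [0.5, 0.25], "B": [0.5, 0.25], "A": [0.5, 1.0]}

# One value per channel keeps the alpha error separate from the colour error.
per_channel = {name: mse(original[name], decoded[name]) for name in original}

# One value per part mixes all channels into a single statistic.
pooled = mse(
    [v for name in original for v in original[name]],
    [v for name in original for v in decoded[name]],
)

print(per_channel)
print(pooled)
```

In this made-up case the pooled number is nonzero even though R, G and B are reproduced exactly, which is the kind of ambiguity a per-channel report would avoid.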

Thorough analysis of the visual effect of lossy compression requires some knowledge of any rendering transform, and applying a perceptual measurement to the result. This would require some amount of color management, which is outside the scope of the OpenEXR library, but something like OpenImageIO might better address. Having some kind of measure of lossiness in exrmetrics is useful, but I think it should be minimalist to discourage over-dependence on the information it provides.

I would keep computing metrics as an option, so it can fail with an error if it cannot provide complete information (e.g. for files with deep or mipmapped parts) instead of providing no output or partial output

@palemieux

Collaborator Author

> This metric might cause confusion. Isn't it true that a difference between -2 and +2 would come out as zero in the metric?

It should not, since given a = -2 and b = 2, the metric M is:

M(a,b) = (f(a) - f(b))^2 = (-1 * log(abs(-2) + eps) - 1 * log(abs(2) + eps))^2 ≈ (2 * log(2))^2

with:

f(x) = sign(x) * log(|x| + eps)

and assuming eps << 2.
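
As a quick numeric check of the expression above, here is a minimal Python sketch. The names `f` and `metric` and the default `EPS = 1e-8` are assumptions made for illustration; this is not the code in this PR.

```python
import math

EPS = 1e-8  # assumed value for illustration only


def f(x: float, eps: float = EPS) -> float:
    """Signed log mapping: sign(x) * log(|x| + eps).

    Note: copysign treats 0.0 as positive, whereas a sign() that
    returns 0 at x = 0 would behave differently at that one point.
    """
    return math.copysign(1.0, x) * math.log(abs(x) + eps)


def metric(a: float, b: float, eps: float = EPS) -> float:
    """Squared difference of the mapped values for one sample pair."""
    return (f(a, eps) - f(b, eps)) ** 2


# a = -2, b = +2: the mapped values have opposite signs, so their
# magnitudes add rather than cancel.
print(metric(-2.0, 2.0))
print((2 * math.log(2)) ** 2)
```

The two printed values should agree closely because eps is tiny compared with 2.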

> It strongly penalizes unimportant tiny changes to small values. (Arguably it's more akin to a relative error or PSNR than a true MSE). I would tend to prefer a linear Mean Absolute Difference metric. It's also summing across channels, so RGBA images would have the statistics of alpha muddled into the color channel, which could be misleading. Reporting the error per-channel might be better, which would also allow for parts with mixed channel types.
>
> Thorough analysis of the visual effect of lossy compression requires some knowledge of any rendering transform, and applying a perceptual measurement to the result.

I would argue that, since the ultimate rendering transform is not necessarily known (or predictable) at the time of OpenEXR rendering, the metric should give equal weight to all octaves of dynamic range. In other words, the error metric M between a and a small relative distortion (1 + e)*a should be constant, regardless of the value of a:

M(a, (1 + e) * a) = constant.

f(a) = log(a + eps) satisfies this constraint as long as a >> eps, since f((1 + e) * a) - f(a) ≈ log(1 + e) regardless of the value of a.
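
The constant-response property can be checked numerically. The following is a minimal sketch for positive samples only; the name `metric`, the value `EPS = 1e-8`, and the 1% distortion are assumptions chosen for illustration.

```python
import math

EPS = 1e-8  # assumed value for illustration only


def metric(a: float, b: float, eps: float = EPS) -> float:
    """Squared log difference for positive samples a and b."""
    return (math.log(a + eps) - math.log(b + eps)) ** 2


e = 0.01  # 1% relative distortion

# Across many octaves the result stays close to log(1 + e)^2.
for a in (1e-3, 1.0, 1e3, 6e4):
    print(a, metric(a, (1 + e) * a))
print("log(1 + e)^2 =", math.log1p(e) ** 2)

# When a is comparable to (or smaller than) eps, the response drops,
# so the property only holds approximately near zero.
print(1e-9, metric(1e-9, (1 + e) * 1e-9))
```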

> This would require some amount of color management, which is outside the scope of the OpenEXR library, but something like OpenImageIO might better address. Having some kind of measure of lossiness in exrmetrics is useful, but I think it should be minimalist to discourage over-dependence on the information it provides.
>
> I would keep computing metrics as an option, so it can fail with an error if it cannot provide complete information (e.g. for files with deep or mipmapped parts) instead of providing no output or partial output

@peterhillman

Contributor

> It should not, since given a = -2 and b = 2, the metric M is:
>
> M(a,b) = (f(a) - f(b))^2 = (-1 * log(abs(-2) + eps) - 1 * log(abs(2) + eps))^2 ≈ (2 * log(2))^2

Ah yes, I misread. So if a = 1000 and b=-0.001 the "error" is very nearly zero, which is certainly an unexpected result!

> I would argue that, since the ultimate rendering transform is not necessarily known (or predictable) at the time of OpenEXR rendering, the metric should give equal weight to all octaves of dynamic range.

It may well be known to the tester themselves, since they are aware of the pipeline - at least of a pipeline they'd want to assume for the purposes of testing. That's why I would rather not rely on exrmetrics to do this kind of analysis, and perform a more thorough test. The quantization error round-tripping to Perceptually Quantized space and back to linear is close to being absolute for small values and is relative for large ones.
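
One way a tester could examine the PQ round-trip behaviour themselves is sketched below in Python. It uses the published SMPTE ST 2084 constants; the 10-bit full-range quantization, the treatment of sample values as absolute luminance in cd/m^2, and the function names are assumptions for illustration and are not part of exrmetrics.

```python
# SMPTE ST 2084 (PQ) constants
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32
PEAK = 10000.0  # cd/m^2 represented by a PQ signal of 1.0


def pq_encode(nits: float) -> float:
    """Linear luminance (cd/m^2) to PQ signal in [0, 1]."""
    y = (max(nits, 0.0) / PEAK) ** M1
    return ((C1 + C2 * y) / (1 + C3 * y)) ** M2


def pq_decode(signal: float) -> float:
    """PQ signal in [0, 1] back to linear luminance (cd/m^2)."""
    p = signal ** (1 / M2)
    return PEAK * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)


def round_trip(nits: float, bits: int = 10) -> float:
    """Encode, quantize to an integer code (full range assumed), decode."""
    levels = (1 << bits) - 1
    code = round(pq_encode(nits) * levels)
    return pq_decode(code / levels)


# Print absolute and relative round-trip error across the range.
for nits in (0.001, 0.01, 0.1, 1.0, 100.0, 1000.0):
    out = round_trip(nits)
    print(nits, abs(out - nits), abs(out - nits) / nits)
```

Comparing the absolute and relative error columns at the low and high ends of the range shows how the quantization error scales with signal level under these assumptions.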

All this assumes that we are only interested in measuring color, not alpha or other data channel types. Perhaps it would be worth offering a choice of metrics?

@palemieux

Collaborator Author

> > It should not, since given a = -2 and b = 2, the metric M is:
> > M(a,b) = (f(a) - f(b))^2 = (-1 * log(abs(-2) + eps) - 1 * log(abs(2) + eps))^2 ≈ (2 * log(2))^2
>
> Ah yes, I misread. So if a = 1000 and b=-0.001 the "error" is very nearly zero, which is certainly an unexpected result!

No. (1 * log(abs(1000) + eps) - (-1) * log(abs(0.0001) + eps))^2 = (log(1000) + log(0.0001))^2 ≈ 5.30

assuming eps = 10^(-8), which is the value for half floats.

log() is the natural logarithmic function.

> > I would argue that, since the ultimate rendering transform is not necessarily known (or predictable) at the time of OpenEXR rendering, the metric should give equal weight to all octaves of dynamic range.
>
> It may well be known to the tester themselves, since they are aware of the pipeline - at least of a pipeline they'd want to assume for the purposes of testing. That's why I would rather not rely on exrmetrics to do this kind of analysis, and perform a more thorough test. The quantization error round-tripping to Perceptually Quantized space and back to linear is close to being absolute for small values and is relative for large ones.

It is definitely true that knowing the perceptual space that the image will eventually be rendered to will improve the measurement of errors.

This is not mutually exclusive with the concept of exrmetrics providing a neutral measure of distortion.

> All this assumes that we are only interested in measuring color, not alpha or other data channel types. Perhaps it would be worth offering a choice of metrics?

@peterhillman

Contributor

Indeed 1000 and -0.0001 (minus one ten thousandth) produce different results, but my example was -0.001 (minus one thousandth), which I tested running your code.

If we ignore eps, then for negative values your test takes the log of the absolute value and negates the result.
But log(a) = -log(1/a) for all a, so your metric treats a as identical to -1/a
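
The collision described here is easy to reproduce. The sketch below is a minimal Python illustration with eps ignored, as in the argument above; the function name `f` is an assumption and this is not the PR's code.

```python
import math


def f(x: float) -> float:
    """sign(x) * log(|x|), with eps ignored; x must be nonzero."""
    return math.copysign(1.0, x) * math.log(abs(x))


# For each a > 0, compare a against -1/a: the mapped values coincide,
# so the squared difference is zero up to floating-point rounding.
for a in (1000.0, 2.0, 0.25):
    b = -1.0 / a
    print(a, b, (f(a) - f(b)) ** 2)
```

The pair a = 1000, b = -0.001 from the earlier example is the first row of this loop.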

@palemieux

Collaborator Author

> Indeed 1000 and -0.0001 (minus one ten thousandth) produce different results, but my example was -0.001 (minus one thousandth), which I tested running your code.
>
> If we ignore eps, then for negative values your test takes the log of the absolute value and negates the result. But log(a) = -log(1/a) for all a, so your metric treats a as identical to -1/a

Thanks for catching this, and for your patience.

One simple approach is to modify the metric to:

f(x) = sign(x) * [log(|x| + eps) - log(eps)]

In other words, ensure that log(|x| + eps) - log(eps) >= 0, so that f(x) always has the same sign as x and, with eps = 2^-24:

(f(1000) - f(-0.001))^2 = ((log(1000 + 2^-24) - log(2^-24)) + (log(0.001 + 2^-24) - log(2^-24)))^2 ≈ 1107
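
A minimal Python sketch of the modified mapping follows. The names `f_offset` and `metric_offset` are assumptions for illustration, with eps = 2^-24 taken from the worked example; it is not the code in this PR.

```python
import math

EPS = 2.0 ** -24  # eps value used in the worked example


def f_offset(x: float, eps: float = EPS) -> float:
    """sign(x) * [log(|x| + eps) - log(eps)].

    The bracketed term is zero at x = 0 and grows with |x|, so the
    result always carries the sign of x.
    """
    return math.copysign(1.0, x) * (math.log(abs(x) + eps) - math.log(eps))


def metric_offset(a: float, b: float, eps: float = EPS) -> float:
    """Squared difference of the offset mapping for one sample pair."""
    return (f_offset(a, eps) - f_offset(b, eps)) ** 2


# The pair that collided under the original mapping is now far apart.
print(metric_offset(1000.0, -0.001))
# Identical samples still give zero.
print(metric_offset(1000.0, 1000.0))
```

Because the mapping is monotonic in x and passes through zero at x = 0, positive and negative samples can no longer map to the same value.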

@palemieux

Collaborator Author

@peterhillman See a more detailed discussion of the LogMSE metric at https://github.com/user-attachments/files/28735544/exrmetrics-distortion-metric-v3.pdf

@palemieux palemieux force-pushed the feature/exrmetrics-mse branch 3 times, most recently from 6cf3e43 to 1f18d8b Compare June 9, 2026 04:30
@palemieux palemieux changed the title from "Add mean-square error (MSE) to exrmetrics" to "Add log-based mean-square error (LogMSE) to exrmetrics" Jun 10, 2026
@palemieux palemieux force-pushed the feature/exrmetrics-mse branch from 71874fe to 69f1c0b Compare June 10, 2026 22:23
@palemieux palemieux marked this pull request as ready for review June 10, 2026 22:23
palemieux and others added 7 commits June 10, 2026 20:54
Signed-off-by: Pierre-Anthony Lemieux <pal@palemieux.com>
Signed-off-by: Pierre-Anthony Lemieux <pal@sandflow.com>
Signed-off-by: Pierre-Anthony Lemieux <pal@sandflow.com>
Improve the handling of non-finite samples
Signed-off-by: Pierre-Anthony Lemieux <pal@palemieux.com>
Signed-off-by: Pierre-Anthony Lemieux <pal@sandflow.com>
Remove mention of HTJ2KL256 in this branch

Signed-off-by: Pierre-Anthony Lemieux <pal@sandflow.com>
Signed-off-by: Pierre-Anthony Lemieux <pal@sandflow.com>
@palemieux palemieux force-pushed the feature/exrmetrics-mse branch from 9d7abef to 02c9bc2 Compare June 11, 2026 03:56

2 participants