This is similar to qpSum but codec-independent. Since PSNR requires additional computation it is defined with an accompanying psnrMeasurements counter to allow the computation of an average PSNR. Defined as three components for the Y, U and V planes respectively. See also https://datatracker.ietf.org/doc/html/rfc8761#section-5
|
@henbos ^ |
|
This needs to be presented at the next virtual interim. Youenn mentions he'd like to hear about use case of the metric. |
|
https://www.researchgate.net/publication/383545049_Low-Complexity_Video_PSNR_Measurement_in_Real-Time_Communication_Products has a whole paper about this. tl;dr is "qp is codec dependent", PSNR is not (but comes at a cost hence this can not be a simple sum) @youennf the folks who implemented https://developer.apple.com/documentation/videotoolbox/kvtcompressionpropertykey_calculatemeansquarederror?changes=l_4_8 might be able to tell you more too. cc @taste1981 |
|
I have not read the paper, is a preprint available? But based on my own interactions with PSNR, there is a source and decoded image. Is the measurement on the outbound-rtp related to source and the encoded image, i.e., the PSNR due to the encoder. |
|
This one is encoder PSNR, not scaling PSNR. Scaling PSNR would end up living on media-source in stats. |
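For readers less familiar with the metric, here is a minimal sketch of what "encoder PSNR" compares: one plane of the raw input against the same plane of the encoder's reconstruction. The function name `planePsnr` and the 8-bit sample assumption are illustrative only; this is not how any particular encoder implements it.

```javascript
// Sketch: PSNR (in dB) between a source plane and its reconstruction.
// Assumes both planes are array-likes of samples with the same length,
// and that samples are 8-bit unless maxValue says otherwise.
function planePsnr(source, reconstructed, maxValue = 255) {
  if (source.length !== reconstructed.length || source.length === 0) {
    throw new RangeError("planes must be non-empty and equally sized");
  }
  let squaredError = 0;
  for (let i = 0; i < source.length; i++) {
    const diff = source[i] - reconstructed[i];
    squaredError += diff * diff;
  }
  const mse = squaredError / source.length;
  // Identical planes have zero error, so PSNR is unbounded.
  if (mse === 0) return Infinity;
  return 10 * Math.log10((maxValue * maxValue) / mse);
}
```

The same function applied to the Y, U and V planes separately gives the three components discussed in this PR.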
|
The other thing that I am curious about is if the PSNR requires decoding the encoded video or is this calculated as part of the encoder operation. Mainly the impact on CPU if it requires some kind of decode step, I wonder if this is only calculated applied to I-frames or huge-frames. |
|
@sprangerik have you looked at this? |
|
@jesup thoughts? |
|
I am supportive of this. For context, see also https://webrtc-review.googlesource.com/c/src/+/368960 |
The idea is to have it as part of the encoder process. The encoder is by definition also a decoder, so it can directly use both the raw input and reconstructed state without penalty. The actual PSNR calculation will of course often incur an extra CPU hit, unless it is already a part of e.g. a rate-distortion aware rate controller - but that's not often the case for real-time encoders. That's why it's proposed to limit the frequency of PSNR calculations. This of course means the user cannot count on PSNR metrics being populated. Even for a given stream, the PSNR values might suddenly disappear if e.g. there is a software/hardware switching event and only one implementation supports PSNR output. |
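To make "limit the frequency of PSNR calculations" concrete, here is a hypothetical sketch of a per-frame sampling decision. The class name `PsnrSampler` and the one-second default are placeholders invented for illustration; the real logic would live inside the encoder pipeline, not in JavaScript.

```javascript
// Hypothetical sampler: decides per frame whether to ask the encoder for PSNR.
// minIntervalMs is a placeholder value, not a recommendation.
class PsnrSampler {
  constructor(minIntervalMs = 1000) {
    this.minIntervalMs = minIntervalMs;
    this.lastMeasurementMs = -Infinity; // no measurement made yet
  }

  // Returns true when enough time has passed since the last measurement.
  shouldMeasure(frameTimeMs) {
    if (frameTimeMs - this.lastMeasurementMs < this.minIntervalMs) {
      return false;
    }
    this.lastMeasurementMs = frameTimeMs;
    return true;
  }
}
```

With a scheme like this, psnrMeasurements grows far more slowly than framesEncoded, which is why the two counters are kept separate.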
|
since webrtc-stats anyway does aggregate values, we could do a sumPsnr and countFrames, i.e., each time a psnr is calculated it is added and corresponding frame count counter goes up. If it is done for all frames, we would not need a frame counter |
|
This issue was discussed in WebRTC February 2025 meeting – (#794 Add PSNR) |
|
This seems like a nice feature that could have a few uses. I do wonder if it could be a separate API instead of part of the outbound RTP stats. My initial concerns are calculating this data regardless if the application is even interested in the data and no specification or recommendations on frequency of measurements. Some pros and cons that come to mind if this were implemented as a separate API instead.
Cons:
If this should remain in the stats could we consider adding some sort of getStats object to enable logging for this kind of data? |
|
WebRTC users routinely log getStats data, so adding this would not be any big overhead. If the stats are collected on a timescale of seconds, the overhead is usually negligible. (polling stats for every frame is not a good idea). |
|
If I understand correctly, the concern of overhead is in the browser doing an expensive calculation most websites would never request (though per-frame is not an issue due to caching, per-second might be; is never an acceptable frequency?) How expensive is this computation? Our webstats model is like a boat we keep loading with new stuff. Eventually, it becomes problematic. At some point (maybe now?) might we wish we had something like this? await sender.getStats({verbosity: "high"}) // low | medium (default) | high |
|
I don't think WebRTC has to do these measurements very often for the PSNR measurements to be valuable and if they aren't done very often (say every second or every several seconds) then I don't think we need to make API changes. A similar example is that if you negotiate corruption-detection we do corruptionMeasurements, but since we only make these once per second they don't have any significant performance implications compared to the rest of the decoding pipeline. |
|
Btw this is unrelated to the polling frequency since the metrics only update when a measurement is made and a measurement happens in the background whether or not the app is polling getStats. (Polling getStats several times per second is bad because of the overhead of that call, not because of counters incrementing in the background) |
|
One concern is this stat seems to require making two getStats call over some interval. E.g. is the use case here to try one encoder setting, get stats, then wait 1 second and call getStats again expecting two different measurements? If so this might cause divide by zero error in one browser but not another. |
|
All metrics in the getStats API are used like so: "delta foo / delta bar", that is true whether it is a rate (delta bytesSent / delta timestamp), or a measurement thingy (delta totalCorruptionProbability / delta corruptionMeasurements) or something more exotic like (delta qpSum / delta framesEncoded) or even (jitterBufferDelay / jitterBufferEmittedCount). I could go on with more examples but "divide by zero" is something that the user of this API should be aware of |
|
qpSum / framesDecoded might be a better example of a foot gun since that could fail when the network glitches but not in a stable environment. In practice web developers will make helper functions that look up deltas and rates, taking care of foot guns. Also you have to be prepared for a metric not being present all of a sudden |
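A helper of the kind described here could look roughly like the sketch below. The name `deltaRatio` is made up for illustration, and it assumes both arguments are snapshots of the same stats object taken at different times, with the metric names passed as strings.

```javascript
// Returns (delta numerator) / (delta denominator) between two snapshots of the
// same stats object, or undefined when no meaningful ratio can be formed.
function deltaRatio(previous, current, numerator, denominator) {
  const values = [
    previous?.[numerator], current?.[numerator],
    previous?.[denominator], current?.[denominator],
  ];
  // A metric can be missing, or disappear between two polls.
  if (values.some((v) => typeof v !== "number")) return undefined;
  const deltaDenominator = current[denominator] - previous[denominator];
  // No new samples since the last poll: avoid dividing by zero.
  if (deltaDenominator <= 0) return undefined;
  return (current[numerator] - previous[numerator]) / deltaDenominator;
}
```

For example, `deltaRatio(prev, cur, "qpSum", "framesEncoded")` gives the average QP over the polling interval, and callers treat `undefined` as "nothing new to report".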
|
Yes I didn't mean to suggest the divide by zero hazard was limited to this API. The difference is something like I think the concern in this case is:
|
|
PSNR is similar to qp so having it in getStats makes sense. As the paper says we have done this at a frequency higher than one per second on devices where battery consumption is a concern and it works there. Hardware encoder support makes this "cheaper" even. Note that the calculation is done by the encoder so cannot be triggered by calling getStats with some magic option. I considered whether it was possible to gate it on the corruption detection RTP header extension but that would have been quite awkward since it is not closely related (not without precedent, quite a few statistics depend on header extensions). When I say "A/B testing" consider a project like Jitsi moving to AV1, in particular the "Metrics Captured" which, unsurprisingly, relies on getStats. See here for how one uses PSNR to evaluate when it is available. Such experiments are designed not to compare 🍎 to 🍌 (different browsers, different operating systems) so letting a UA decide on sampling frequency is not a concern as long as it does so consistently. |
Polling is just asking "do you have any new measurements for me?" It doesn't matter if app polling interval and browser polling interval aligns or not and it's clear from the guidelines that there is no control of sampling period. So I would argue that the only thing that matters is if the measurements are arriving at a granular enough level to be useful. If the concern is that a browser implementer doesn't know what a useful measurement interval is, maybe we can provide some guidance there, but I fail to see the interop issue with different polling intervals that are all within a "useful" range. FTR I think 15 second is too large of an interval since a lot can happen in that period of time. |
|
I would not even poll getStats for A/B testing purposes. One would typically poll periodically and use the last result or call getStats explicitly before closing the peerconnection and then calculate the average PSNR as psnrSum_{y,u,v}/psnrMeasurements. Only calls with enough psnrMeasurements should be taken into account which one needs to irrespective of sampling frequency to exclude "short calls". (while we are rambling: it seems Firefox throws when calling getStats on a closed peerconnection which is not my understanding of #3 arguably with all the transceivers gone all the interesting stats disappear nowadays) |
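The end-of-call calculation described here can be sketched as follows. The function name `averagePsnr` and the threshold of 10 measurements are placeholders, and the lowercase `y`, `u`, `v` keys on `psnrSum` are an assumption about the record's shape.

```javascript
// Average PSNR per plane from a final outbound-rtp stats snapshot.
// Returns undefined for calls with too few measurements ("short calls")
// or when the PSNR metrics are absent.
function averagePsnr(outboundRtp, minMeasurements = 10) {
  const { psnrSum, psnrMeasurements } = outboundRtp ?? {};
  if (!psnrSum || typeof psnrMeasurements !== "number") return undefined;
  // Never divide by zero, even if the caller passes a threshold of 0.
  if (psnrMeasurements < Math.max(1, minMeasurements)) return undefined;
  return {
    y: psnrSum.y / psnrMeasurements,
    u: psnrSum.u / psnrMeasurements,
    v: psnrSum.v / psnrMeasurements,
  };
}
```

Because the threshold is applied to psnrMeasurements rather than to call duration, the same filter works regardless of how often a UA samples.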
That's fine for telemetry. I think our concern was more someone making runtime decisions off stats, e.g.:
// probe and switch to best codec for media being sent right now:
// (wait(ms) is assumed to be a helper that resolves after ms milliseconds)
let bestCodec, bestY = 0;
for (const codec of sender.getParameters().codecs) {
  const params = sender.getParameters();
  params.encodings[0].codec = codec;
  await sender.setParameters(params);
  await wait(1000);
  const ortp1 = [...(await sender.getStats()).values()].find(({type}) => type == "outbound-rtp");
  await wait(1000);
  const ortp2 = [...(await sender.getStats()).values()].find(({type}) => type == "outbound-rtp");
  // if no measurement happened between the two snapshots this is 0 / 0
  const y = (ortp2.psnrSum.y - ortp1.psnrSum.y) / (ortp2.psnrMeasurements - ortp1.psnrMeasurements);
  if (bestY < y) {
    bestY = y;
    bestCodec = codec;
  }
}
const params = sender.getParameters();
params.encodings[0].codec = bestCodec;
await sender.setParameters(params);
This might be good for an implementer to know. |
This would map essentially 1:1 with the implementation used, and that can already pretty easily be inferred (e.g. via https://www.w3.org/TR/webrtc-stats/#dom-rtcoutboundrtpstreamstats-encoderimplementation or platform+https://www.w3.org/TR/webrtc-stats/#dom-rtcoutboundrtpstreamstats-powerefficientencoder, not to mention info from WebCodecs, WebGPU, parsing data from encoded transform, etc etc). So while it might be a new "bit", it doesn't actually provide any new information imo.
Can we let the implementor's guideline just be along what has been said above, e.g. "the frequency should be as high as possible as long as the performance impact can be kept negligible". I don't see a reason to change the frequency based on codec type, only by implementation performance overhead. For a given implementation though I don't see a reason to change the frequency - detailing that this should be fixed for a given UA seems fine to me. |
Both A possibility is to restrict psnr in the same manner.
I find it useful information. |
Those don't seem like great examples as they're blocked on exposing hardware is allowed, unless we're suggesting adding that requirement here? What are the other examples?
Agreed. Clarifying these assumptions in the guidance can only help.
Doesn't tying it too tightly to performance make it another performance metric? I like the part that it should not vary by codec. |
Even on hardware encoders it has an impact on power consumption and the return on doing it on every frame is not there, see the parts in the paper that talk about subsampling. I'm fine with gating on HW. |
If so, PSNR gathering could be opt-in, something like: Is it overkill? |
Quoting https://w3c.github.io/webrtc-stats/#guidelines-for-design-of-stats-objects: |
|
Until now, there was no concern about stats being potentially compute intensive. |
@youennf this sounds like a different PR. Is the gating on hardware here sufficient to merge this PR? I think some guidance on frequency here would be nice, but maybe we can add that later after some more implementer experience? |
Other examples... If it's "powerEfficient = false" then it's essentially a SW implementation and the implementation will follow from browser type and version. If "powerEfficient = true" it's going to be a hardware encoder, and then in 99.99% of cases map exactly to the GPU used. The GPU can be found in a number of different ways, e.g. using https://developer.mozilla.org/en-US/docs/Web/API/GPUAdapterInfo Another way I mentioned was using WebCodecs. Simplest example: take a known (non-zero) input image and encode it using a known bitrate, then take the resulting bitstream data array that came out of the encoder and create a hash from it. That will basically give you a per-implementation unique signature. Even if we try to forcefully add some noise to screw with that identification mechanism, there are other aspects that can be used to detect the implementation - e.g. which selection of coding tools were used, filter parameter selection, cropping behavior, etc etc. You can of course do this with webrtc as well, capturing the bitstream data with an encoded transform instead, it's just a little more unwieldy. Once you know which encoder implementation the user has, a bunch of data can be inferred (e.g. feature capabilities, performance characteristics, etc) - but that data doesn't add any more fingerprinting surface compared to just knowing the implementation. Behavior when it comes to PSNR implementation would fall into this category as well. |
@sprangerik any updates on this PR? |
|
@jan-ivar: please summarize your concerns (I am happy to request a standards position if required) and provide alternatives with implementations. I am fine having this in -provisional-stats. This was a courtesy, it was quite apparently not even reciprocated with reading the published paper on the subject. |
|
It seems members here are mostly in agreement (gate on HW, add minimal guidance)? If this PR lacks a champion I'm happy to commandeer it to get it over the finish line. |
|
Gating on HW has already been added in 2e31240. Your continued question for "guidance" despite a published paper is surprising, in particular given folks working on encoders are typically very familiar with PSNR and given that Mozilla is effectively a mere consumer of libWebRTC where the implementation of the logic deciding on which frames to request the encoder to calculate PSNR would live. |
|
Specs should try to help browser implementors, especially those that do not rely on libwebrtc. @sprangerik already provided some guidelines, which could be used to add an informal note.
|
I interpreted this to mean keeping the same frequency across encoders in the same UA was desirable (modulo HW/SW). Which is it? |
I intended that to mean per encoder implementation and UA. So e.g. for Firefox version X and encoder implementation Y I know the frequency will be fixed. |
Neither of these saves you from talking to a subject matter expert. Or reading the paper. Or doing power consumption measurements.
Where is the value in specifying an arbitrary restriction? |
jan-ivar
left a comment
LGTM with the guidance! This should help with interop. Since PSNR measurements can be costly, the lack of a floor seems necessary to not impose arbitrary limits on the UA's ability to remain performant.
|
I look forward to your independent implementations! |
the Y, U and V components, applications can do a weighted average. https://webrtc-review.googlesource.com/c/src/+/368960 implements the codec changes, this change wires those up to getStats. spec PR: w3c/webrtc-stats#794 BUG=webrtc:388070060 Change-Id: Idba317422e8cfe40f3c2c7b16e4072d2c6042b3f Reviewed-on: https://webrtc-review.googlesource.com/c/src/+/375021 Commit-Queue: Philipp Hancke <phancke@meta.com> Reviewed-by: Henrik Boström <hbos@webrtc.org> Reviewed-by: Harald Alvestrand <hta@webrtc.org> Cr-Commit-Position: refs/heads/main@{#45414}
Upstream commit: https://webrtc.googlesource.com/src/+/b62361bfe989884f4137039d1e75d67b260bd1b2 Expose video encode PSNR (in supported codecs) in stats the Y, U and V components, applications can do a weighted average. https://webrtc-review.googlesource.com/c/src/+/368960 implements the codec changes, this change wires those up to getStats. spec PR: w3c/webrtc-stats#794 BUG=webrtc:388070060 Change-Id: Idba317422e8cfe40f3c2c7b16e4072d2c6042b3f Reviewed-on: https://webrtc-review.googlesource.com/c/src/+/375021 Commit-Queue: Philipp Hancke <phancke@meta.com> Reviewed-by: Henrik Boström <hbos@webrtc.org> Reviewed-by: Harald Alvestrand <hta@webrtc.org> Cr-Commit-Position: refs/heads/main@{#45414}
Upstream commit: https://webrtc.googlesource.com/src/+/b62361bfe989884f4137039d1e75d67b260bd1b2 Expose video encode PSNR (in supported codecs) in stats the Y, U and V components, applications can do a weighted average. https://webrtc-review.googlesource.com/c/src/+/368960 implements the codec changes, this change wires those up to getStats. spec PR: w3c/webrtc-stats#794 BUG=webrtc:388070060 Change-Id: Idba317422e8cfe40f3c2c7b16e4072d2c6042b3f Reviewed-on: https://webrtc-review.googlesource.com/c/src/+/375021 Commit-Queue: Philipp Hancke <phanckemeta.com> Reviewed-by: Henrik Boström <hboswebrtc.org> Reviewed-by: Harald Alvestrand <htawebrtc.org> Cr-Commit-Position: refs/heads/main{#45414} UltraBlame original commit: fc9953ad16391ae420e3ac729b4e6c88f61e2c90
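The commit message notes that applications can do a weighted average of the Y, U and V components. A minimal sketch of that follows. The function name `weightedPsnr` is made up, and the default 6:1:1 weighting is one convention seen in codec evaluations, not something the spec PR or the commit prescribes.

```javascript
// Combines per-plane average PSNR values into a single figure.
// The 6:1:1 default weights luma more heavily than chroma; it is an
// assumption for illustration and can be overridden by the caller.
function weightedPsnr({ y, u, v }, weights = { y: 6, u: 1, v: 1 }) {
  const totalWeight = weights.y + weights.u + weights.v;
  if (!(totalWeight > 0)) {
    throw new RangeError("weights must sum to a positive value");
  }
  return (weights.y * y + weights.u * u + weights.v * v) / totalWeight;
}
```

The input is expected to hold per-plane averages (psnrSum divided by psnrMeasurements), not the raw sums.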

This is similar to qpSum but codec-independent.
Since PSNR requires additional computation it is defined with an
accompanying psnrMeasurements counter to allow the computation of
an average PSNR.
Defined as a record with components for the Y, U and V planes respectively.
See also
https://datatracker.ietf.org/doc/html/rfc8761#section-5