Looks very cool assuming all the comparisons are correct & fair and there’s no major failure cases. Quick link to the HTML version of the paper to save you a couple of clicks: https://arxiv.org/html/2605.05148v1
Since this is by Apple, I’m certainly curious if this is aimed at becoming the new default format for Apple devices. What kind of effort does it take to do that, beyond getting the paper published?
On the PR summary page, the “speed” column should be labeled “time”. Time is lower-is-better, whereas speed means higher-is-better.
The BD rate column could also use a less cryptic label. (Though maybe the audience is paper reviewers and not me.) The paper itself doesn’t even write out what the BD acronym in “BD rate” stands for, but it seems like it would be fair and accurate and better to call the column maybe something like relative compressed size, and mention the exact metric in the caption — where there’s already an explanation of BD rate.
I’m somewhat confused by, and slightly skeptical TBH, of the device timings. Are they correct & fair? Why is the NN-only portion almost as fast on an iPhone 17 compared to a V100 when the V100 has 4x the FP throughput? Is it comparing apples to apples (ha!), and is the GPU implementation reasonable? The data suggests the GPU implementation is not saturating the GPU.
Also why are there several different GPU models? And why is V100 even used? V100 is four generations old and not even supported anymore.
this is interesting. would be cool to explore something like integrating a vlm to add a "semantic" term to the loss function. looking through the comparisons, some of the baseline codecs create meaningfully different details (as could be described by text) in the images.
Looks very cool assuming all the comparisons are correct & fair and there’s no major failure cases. Quick link to the HTML version of the paper to save you a couple of clicks: https://arxiv.org/html/2605.05148v1
Since this is by Apple, I’m certainly curious if this is aimed at becoming the new default format for Apple devices. What kind of effort does it take to do that, beyond getting the paper published?
On the PR summary page, the “speed” column should be labeled “time”. Time is lower-is-better, whereas speed means higher-is-better.
The BD rate column could also use a less cryptic label. (Though maybe the audience is paper reviewers and not me.) The paper itself doesn’t even write out what the BD acronym in “BD rate” stands for, but it seems like it would be fair and accurate and better to call the column maybe something like relative compressed size, and mention the exact metric in the caption — where there’s already an explanation of BD rate.
I’m somewhat confused by, and slightly skeptical TBH, of the device timings. Are they correct & fair? Why is the NN-only portion almost as fast on an iPhone 17 compared to a V100 when the V100 has 4x the FP throughput? Is it comparing apples to apples (ha!), and is the GPU implementation reasonable? The data suggests the GPU implementation is not saturating the GPU.
Also why are there several different GPU models? And why is V100 even used? V100 is four generations old and not even supported anymore.
this is interesting. would be cool to explore something like integrating a vlm to add a "semantic" term to the loss function. looking through the comparisons, some of the baseline codecs create meaningfully different details (as could be described by text) in the images.