[HN Gopher] One of these JPEGs is not like the other
       ___________________________________________________________________
        
       One of these JPEGs is not like the other
        
       Author : jstanley
       Score  : 96 points
       Date   : 2021-11-19 11:57 UTC (11 hours ago)
        
 (HTM) web link (blog.benjojo.co.uk)
 (TXT) w3m dump (blog.benjojo.co.uk)
        
       | dunham wrote:
       | I clicked through expecting them to mention CMYK jpegs, which I
       | remember causing a few compatibility issues back in the day.
        
       | hereforphone wrote:
       | Renders the presumably intended image (no "white box") in Brave.
       | By the way I'm really glad I switched from Firefox and highly
       | recommend it.
        
       | flerovium wrote:
       | > that there is actually no consistent way to decode a H.264
       | video stream. Don't ever expect two decoders (especially hardware
       | assisted) to give you identical outputs.
       | 
       | Madness indeed.
        
         | goalieca wrote:
         | Not really. There's a lot of post processing going on to make
         | up for the lossy part of the compression. It's actually quite
         | wonderful that the technology and methods can improve over
         | time.
        
         | brigade wrote:
         | Any H.264 decoder that doesn't produce bit identical output is
         | non-conformant to the spec; this was one of the big lessons of
         | MPEG4 part 2, that you really don't want the drift that comes
         | from repeatedly using "near enough" reconstruction as
         | prediction in a loop.
         | 
          | Well, he could be including the YUV -> RGB conversion: the
          | conversion matrix is specified, but the required precision and
          | the chroma upsampling method are outside the scope of the spec.
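          | 
          | A rough sketch of how much the upsampling choice alone can
          | move a pixel (sample values are made up, and the 9:3:3:1
          | filter is only a stand-in for the smoother filters real
          | decoders tend to use):
          | 
          |   import numpy as np
          | 
          |   # 4:2:0 chroma: one Cb sample covers 2x2 luma pixels
          |   cb = np.array([[100., 140.],
          |                  [100., 180.]])
          | 
          |   # decoder A: nearest-neighbour replication; output
          |   # pixel (1,1) simply copies cb[0,0]
          |   pix_a = cb[0, 0]
          | 
          |   # decoder B: a 9:3:3:1 triangular filter (roughly what
          |   # libjpeg's "fancy upsampling" does for interior pixels)
          |   pix_b = (9*cb[0,0]+3*cb[0,1]+3*cb[1,0]+cb[1,1]) / 16
          | 
          |   print(pix_a, pix_b)   # 100.0 vs 112.5: same pixel,
          |                         # different chroma, different RGB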
        
         | oshiar53-0 wrote:
          | Stuff like this is a fingerprinting goldmine.
        
         | cornstalks wrote:
         | I'd really like to see a citation for the claim there is "no
         | consistent way to decode a H.264 video stream."
         | 
         | H.264 decoding should be fully deterministic and consistent, I
         | believe. That said, bugs exist, shortcuts are taken, and not
         | all decoders are truly compliant. But "some decoders are non-
         | compliant" is very different from "it's impossible to
         | consistently decode H.264".
        
           | pjc50 wrote:
           | Yeah, having done some work on this in the past I think both
           | are true: the standard _should_ decode deterministically, but
            | it's also quite complicated, so most of them are buggy.
           | 
           | (A colleague of mine filed something like 30% of all the bugs
           | on the H265 bug tracker, for instances where the reference
           | implementation produced different results than the textual
           | specification. Because we'd written a parser for the English
           | text ..)
           | 
           | There is also the concept of a "profile", a set of features
           | which an encoder/decoder may limit itself to on small
            | hardware. http://blog.mediacoderhq.com/h264-profiles-and-levels/
        
           | AceJohnny2 wrote:
           | Back in the day when I was involved with video codec fw, I
           | remember we were having trouble with some codec's (VP8?)
            | deblocking pass. We had to implement it in SW because the
           | particular method that codec used wasn't possible with our
           | HW.
           | 
           | A very legitimate question we asked ourselves was whether we
           | could skip it altogether: the pass did smooth out edges
           | between MacroBlocks, but on our test streams you really had
           | to look closely at still frames to see the difference.
           | 
            | So I can imagine different codecs implementing different
            | parts of the standard, as long as you could produce a valid
           | stream/image that was acceptable to the human eye.
           | 
           | (I don't remember what the final decision was for our
           | deblocking. I would guess that certified compliance required
           | it to be implemented)
        
         | SavantIdiot wrote:
         | Guess what: in general, no two CPU floating-point architectures
          | produce exactly the same results.
         | 
         | EDIT: Also, try converting JPEGs to uncompressed PNG using
          | Python with Pillow or OpenCV, then try in C using libjpeg and
         | the stb_image header library. All four create slightly
         | different output. I discovered this evaluating the accuracy of
         | a MobileNet implementation. The accuracy of the model changed
         | based on what language/library converted the JPEGs.
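          | 
          | A quick way to reproduce the comparison (a rough sketch, not
          | the exact script I used; "photo.jpg" stands for any baseline
          | JPEG you have lying around):
          | 
          |   import numpy as np
          |   import cv2
          |   from PIL import Image
          | 
          |   # decode the same file with both libraries and compare
          |   pil = Image.open("photo.jpg").convert("RGB")
          |   pil = np.asarray(pil).astype(int)
          |   ocv = cv2.imread("photo.jpg")            # BGR order
          |   ocv = cv2.cvtColor(ocv, cv2.COLOR_BGR2RGB).astype(int)
          | 
          |   diff = np.abs(pil - ocv)
          |   print(diff.max(), (diff > 0).mean())
          |   # typically a max difference of a few code values, spread
          |   # over a noticeable fraction of the pixels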
        
           | tshaddox wrote:
           | Is that likely in practice? I thought nearly everything will
           | conform to IEEE 754 and give identical results for floating-
           | point arithmetic.
        
             | SavantIdiot wrote:
              | Nope. RISC-V gives slightly different FP64 answers than
              | x86, which gives slightly different answers than Arm (NEON).
             | 
             | Optimizations in FMACs can lead to LSB differences. IEEE
             | doesn't specify _how_ an FMAC is to be implemented, only
             | the format of the number.
             | 
              | (Also: I'm 99% sure I'm right... but there's still a 1%
              | chance I could be wrong.)
        
               | zokier wrote:
               | > Optimizations in FMACs can lead to LSB differences.
               | IEEE doesn't specify how an FMAC is to be implemented,
               | only the format of the number.
               | 
               | Quoth the spec:
               | 
               | > Each of the computational operations that return a
               | numeric result specified by this standard shall be
               | performed as if it first produced an intermediate result
               | correct to infinite precision and with unbounded range,
               | and then rounded that intermediate result, if necessary,
               | to fit in the destination's format (see 4 and 7).
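                | 
                | In other words, each operation's result is pinned
                | down. Where builds can still differ is whether a*b + c
                | is compiled as one fused multiply-add (one rounding)
                | or as a separate multiply and add (two roundings). A
                | small sketch of the gap, using exact rationals as the
                | reference:
                | 
                |   from fractions import Fraction as F
                | 
                |   a, b, c = 1.0 + 2.0**-52, 1.0 - 2.0**-52, -1.0
                | 
                |   two_roundings = a * b + c
                |   # a*b rounds to 1.0 first, so this gives 0.0
                | 
                |   one_rounding = float(F(a) * F(b) + F(c))
                |   # the single-rounding result an FMA would return
                | 
                |   print(two_roundings, one_rounding)
                |   # 0.0 vs roughly -4.9e-32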
        
             | [deleted]
        
             | Findecanor wrote:
              | As long as you use only the four elementary arithmetic
              | operations in the same order with the same types, and use
              | compiler flags that disable any optimisations, then you
              | _should_ get the same result. (knock on wood)
              | 
              | Different standard libraries for different processors could
              | have different precision for transcendental functions,
              | though. There are libraries written to always produce
              | reproducible results, however; in the case of CRlibm, the
              | results are even proven correctly rounded.
              | 
              | Older compilers for x86 used the x87 FPU, which used an
              | 80-bit type internally, but all modern compilers for x86 and
              | x86-64 use SSE/AVX instead, which support IEEE types.
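              | 
              | Even the "same order" part is easy to break: any
              | reassociation (vectorisation, -ffast-math, a different
              | reduction order) changes the answer. A tiny example:
              | 
              |   a, b, c = 1e16, -1e16, 1.0
              | 
              |   print((a + b) + c)   # 1.0
              |   print(a + (b + c))   # 0.0: b + c rounds back to
              |                        # -1e16 before a is added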
        
             | dragontamer wrote:
             | > Is that likely in practice? I thought nearly everything
             | will conform to IEEE 754 and give identical results for
             | floating-point arithmetic.
             | 
             | What compiler settings? With -O3, GCC will flush-to-zero
             | your denormals, because Intel CPUs have _SEVERE_ penalties
             | on subnormal floats.
             | 
             | You don't even get the same results with -O vs -O3 on the
             | same programming language on the same system... let alone
             | different CPUs.
             | 
             | https://carlh.net/plugins/denormals.php
             | 
             | Denormals are part of IEEE 754, but in practice are more
             | complicated for hardware to figure out. So as a common
             | efficiency trick, you disable the processing of denormals
              | entirely (i.e. all denormals are seen as "0" by the
              | hardware, speeding up processing considerably).
              | 
              | On Intel processors, a configurable register controls how
              | your floating-point units act when they come across a
              | denormal. So an inline-assembly statement can change your
              | results even if it appears nowhere "close" to your code.
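              | 
              | You can poke at the subnormal range itself from Python
              | (just an illustration: the real flush-to-zero switch is
              | the FTZ/DAZ bits in the CPU's MXCSR register, set from C
              | or by compiler startup code, not from here):
              | 
              |   import numpy as np
              | 
              |   x = np.float32(2.0**-126)  # smallest normal float32
              |   tiny = x / 4               # 2**-128, a denormal
              |   print(tiny)                # ~2.9e-39 by default
              |   print(tiny * 4 == x)       # True: nothing was lost
              | 
              |   # With flush-to-zero enabled, `tiny` would instead be
              |   # treated as exactly 0.0 and the comparison would fail.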
        
               | kimixa wrote:
                | -O3 shouldn't enable flushing denormals to zero on gcc;
                | the standard optimisation flags _shouldn't_ affect the
                | results - if they do, it's likely a bug.
                | 
                | To enable that, you need to use -Ofast or -ffast-math;
                | there's a reason they're not enabled by default, after
                | all.
        
               | SavantIdiot wrote:
                | Do you know the typical circumstances under which
               | this becomes problematic? Clearly these issues don't
               | impact FPS games, or music quality, but do they start
               | becoming problems in weather forecasting or particle
               | simulations? What kind of applications are impacted?
        
               | dragontamer wrote:
               | > Clearly these issues don't impact FPS games
               | 
               | On the contrary. One of the most common multiplayer bugs
               | is the multiplayer-desync due to different settings of
               | floating-point arithmetic.
               | 
               | https://stackoverflow.com/questions/1948929/keeping-sync-
               | in-...
               | 
               | This stuff happens in pretty much every video game. You
               | "solve" it by...
               | 
                | 1. In FPS games, the _client_ mostly serves as the source
                | of truth. This means that you allow the
               | client to "hack" the location data for clipping, speed-
               | hacks, and the like. (Server calculated you at point X.
               | Client claims to be at point X+.05. Well, client wins the
               | argument).
               | 
               | 2. In games like Factorio, an extraordinary amount of
               | effort is placed upon finding all sources of desync
               | (https://wiki.factorio.com/Desynchronization) and
               | squashing them. Protocols communicate with integers as
               | much as possible, avoiding any floating point issues.
               | 
               | > but do they start becoming problems in weather
               | forecasting or particle simulations? What kind of
               | applications are impacted?
               | 
               | I'm no supercomputer expert. But for reliability, a lot
               | of supercomputer simulations step through twice to
               | "verify" the results. That means both simulations must be
                | set up with the same settings, so that your floats all
               | line up.
               | 
               | In my undergraduate studies, my professor for numeric
               | computing stressed the importance of sorting numbers: not
               | only to avoid floating-point cancellation error, but also
               | to have a precisely defined ordering for supercomputer
               | applications.
               | 
                | You want to do things like adding numbers from smallest
                | to biggest, summing all the positive numbers together and
                | all the negative numbers together, and only doing one
                | subtraction at the end. Cancellation error is serious
                | business!!
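                | 
                | A quick way to see why the ordering has to be pinned
                | down (numbers are made up, just to get a wide range
                | of magnitudes):
                | 
                |   import math, random
                | 
                |   random.seed(1)
                |   mags = [random.randint(0, 9) for _ in range(100_000)]
                |   xs = [random.uniform(-1, 1) * 10.0**m for m in mags]
                | 
                |   forward = sum(xs)                  # one ordering
                |   by_mag = sum(sorted(xs, key=abs))  # smallest first
                |   exact = math.fsum(xs)              # correctly rounded
                | 
                |   print(forward - exact, by_mag - exact)
                |   # The two orderings typically disagree in the last
                |   # bits, and the magnitude-sorted sum usually sits
                |   # closer to the exact value.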
        
               | kergonath wrote:
               | > I'm no supercomputer expert. But for reliability, a lot
               | of supercomputer simulations step through twice to
               | "verify" the results. That means both simulations must be
               | setup with the same settings, so that your floats all
               | line up.
               | 
               | In such cases, the result of _one_ simulation is not
                | enough, so we run several with slightly different
                | initial conditions, just to make sure that we don't spend
                | time analysing an outlier. Exact reproducibility is
                | important for validating the codes with test suites, but
               | for the purpose of using simulations to make predictions,
               | it does not really matter as long as each simulation is
               | physically accurate enough.
        
               | db48x wrote:
               | The Factorio Friday Facts dev diaries are a great read,
               | and they've talked about the effects of floating point
               | numbers multiple times. Sometimes in reference to
               | multiplayer, but even bugs in single player have been
               | tracked back to floating point numbers:
               | https://www.factorio.com/blog/post/fff-297
        
               | SavantIdiot wrote:
               | Wow, this is such a great answer. Thanks for the
               | information!
        
             | magicalhippo wrote:
             | AFAIK IEEE 754 does allow for some variation in the results
             | of operations like log and exp. I recall this being an
             | issue with the LHC@Home BOINC project, where simulations
             | computed on Intels would run for a long time before
             | reaching the termination condition (beam hitting wall)
              | while the same runs on AMDs would terminate quickly.
             | 
             | For a project relying on multiple runs on different
             | computers for verification of correctness, this was a
             | serious issue. IIRC before they solved it properly, they
             | had to increase the number of validations per piece of
             | work, as well as trying to segregate the work so a work
             | unit was only issued to either Intel or AMD but not both.
             | This only reduced the impact somewhat though.
             | 
             | From their[1] page:
             | 
             |  _Besides helping build the LHC, LHC@home has proved to be
             | an invaluable, even essential, tool for investigating and
             | solving fundamental problems with floating-point
             | arithmetic. Different processors, even of the same brand,
             | all conforming more or less to the IEE754 standard, produce
             | different results, a problem further compounded by the
             | different libraries provided for Linux and Windows. In
             | particular, a major problem was the evaluation of the
             | elementary functions, such as exponent and logarithm, which
             | are not covered by the standard at all (apart from the
             | square root function). This problem was solved by the use
             | of the correctly rounded elementary function library
             | (crlibm) developed by a group at the Ecole Nationale
             | Superieure in Lyon. Due to the often chaotic motion of the
             | particles even a one unit of difference in the last digit
             | of a binary number produces completely different results,
             | as the difference grows exponentially as the particle
             | circulates for up to one million turns round the LHC. Thus
             | it is extremely difficult to find bugs or to verify new
             | software and hardware platforms. This methodology has
             | allowed us to detect failing hardware, otherwise unnoticed,
             | and to find compiler bugs. With your help, it is being
             | extended to ensure identical results on any IEE754
             | compliant processor with a standard compliant compiler (C++
             | or FORTRAN) which will allow us to work on any such system
             | and even on Graphical Processing Units (GPUs)._
             | 
             | [1]:
             | https://lhcathome.web.cern.ch/projects/sixtrack/sixtrack-
             | and... (
        
           | ivegotnoaccount wrote:
            | For the JPEG part: that is not surprising, for several
            | reasons:
            | 
            | - Depending on whether the IDCT (2D cosine-based frequency
            | domain => spatial domain) is implemented using floating point
            | or fixed point, the way roundings are done, 0.5 offsets or
            | not, and the number of bits of precision, you will have small
            | errors.
            | 
            | - Most JPEGs are in i420 format, that is, "for a 16x16 Y
            | block, there is one 8x8 Cb block and one 8x8 Cr block". The
            | upsampling algorithm may differ both in precision and in the
            | choice of the algorithm itself.
            | 
            | - Even in the YCbCr => RGB conversion, there may be 1-bit
            | differences. More if they used other constants.
           | 
           | Note that even between two versions of the same library, you
           | can get different results.
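            | 
            | The "other constants" case is easy to show: full-range
            | JFIF/BT.601 versus BT.709 coefficients give visibly
            | different pixels from the same bytes (the sample values
            | below are arbitrary):
            | 
            |   y, cb, cr = 120, 90, 200   # one decoded YCbCr sample
            | 
            |   # red channel with JFIF / BT.601 full-range constants
            |   r_601 = y + 1.402 * (cr - 128)
            |   # red channel with BT.709 constants instead
            |   r_709 = y + 1.5748 * (cr - 128)
            | 
            |   print(round(r_601), round(r_709))   # 221 vs 233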
        
       | extra88 wrote:
       | > if you see a white box, it's because you don't support the
       | bizzare Arithmetic coding extention
       | 
       | On an Intel Mac running Big Sur, that image (title "444-fp-acode-
       | prog.jpg") is only solid white in Chrome; in Safari, it's solid
       | black and in Firefox it doesn't load at all, there's the broken
       | image placeholder.
        
         | Aaargh20318 wrote:
         | Interestingly it renders just fine in Safari on iOS 15
        
         | extra88 wrote:
         | Since others are having mixed results with the rendering of the
         | image in mobile Safari, I wonder if the differences are due to
         | hardware decoding.
         | 
          | My Mac has an i7-9750H (Coffee Lake) processor with an AMD
         | Radeon Pro 5300M.
         | 
         | The image does render on my iPhone with an Apple A14 Bionic
         | processor. The 2020 iPad Air (4th gen) also has this processor.
         | 
         | If "2020 iPad" means iPad 10.2" 8th Gen, that has an A12 Bionic
         | processor.
         | 
         | If the image doesn't render on an Apple mobile device, maybe
         | it's one with an older processor?
        
         | lelandfe wrote:
         | M1 on Monterey. White in Chrome, renders normally in Safari
        
         | gumby wrote:
         | Interesting: on my 2020 iPad it displays fine in Safari
        
           | alberth wrote:
           | I can also confirm.
           | 
           | iPhone XS (iOS 15.1 / A12), displays a black box.
           | 
            | 2020 iPad Air (_iPadOS_ 15.1 / A14), displays the image
           | perfectly.
        
             | mnd999 wrote:
             | I have the same iPhone XS and it renders fine. Bizarre.
        
             | styfle wrote:
             | Are both running the same version of iOS?
        
               | alberth wrote:
               | Both run 15.1. But apparently iOS vs iPadOS have
               | differences.
        
               | extra88 wrote:
               | What processor does your iPhone have? Maybe it's a
               | hardware decoding difference. It renders on my iPhone
               | with a A14 Bionic processor.
        
               | alberth wrote:
               | I just updated my original post. A12 Bionic is my iPhone
               | (XS).
               | 
               | Interesting if it's hardware assisted.
        
               | extra88 wrote:
               | If the comment you replied to meant iPad 10.2" 8th Gen by
               | "2020 iPad," that also has an A12 Bionic.
               | 
               | If their "2020 iPad" was actually a 4th gen Air (A14
                | Bionic) or 2nd iPad Pro (A12_Z_ Bionic), it could still
               | be a hardware decoding difference.
        
               | extra88 wrote:
               | Since 2019, iPads no longer run iOS, they run iPadOS. Of
               | course, there's still a lot of overlap between the two
               | but differences can creep in. A difference that shouldn't
               | be relevant to JPEG rendering is iPadOS Safari loads the
               | desktop versions of websites by default.
        
         | superkuh wrote:
          | My Intel Core 2 Duo running Ubuntu 10.04 (~2011) and a browser
          | from 2016 render all of the JPEGs on the page perfectly.
        
           | extra88 wrote:
           | Core 2 Duos predate Intel adding any hardware decoding for
           | JPEG. Some, but probably not all, of the rendering failures
           | people are seeing may be due to software handing off JPEG
           | decoding to hardware but some hardware not being able to
           | handle the Arithmetic coding extension.
           | 
           | My guess is your JPEG rendering is being done by libjpeg,
           | which has supported Arithmetic coding since 2009.
           | 
           | https://en.wikipedia.org/wiki/Libjpeg
        
             | spookthesunset wrote:
              | It is so weird that even now we have hardware decoding
              | JPEGs, and it is so seamless you don't even notice it.
        
         | grishka wrote:
         | Tried opening it in Photoshop and got this error:
         | 
         | Could not complete your request because reading arithmetic
         | coded JPEG files is not implemented.
         | 
         | I find it interesting that they wrote an error message this
         | specific.
        
           | kingcharles wrote:
           | Thank FSM for the developer who did that. How many times a
           | day do you try to complete a task and just get a generic
           | "Error" from an application?
           | 
           | Also interesting that Photoshop, with dozens of developers,
           | didn't deem this format interesting enough to add support for
           | it.
        
       | xscott wrote:
       | > bizzare Arithmetic coding extention
       | 
       | Frustrating to hear it called bizarre. Arithmetic encoding
       | should've been the default if IBM hadn't sat on a patent for
       | mathematics. A huge amount of bandwidth has been wasted sending
       | images around the world because Huffman encoding avoided legal
       | headaches.
       | 
       | https://en.wikipedia.org/wiki/Arithmetic_coding#History_and_...
       | 
       | This is also the reason for the 2 in bzip2.
        
         | rob74 wrote:
          | Be that as it may, I can also feel the pain of implementing
          | an old-as-balls image standard which has lots of historical
          | options that account for, let's say, 70% of the effort of
          | implementing the full standard but are only used in 0.5% of the
          | images available "in the wild". If these parts are then left
         | out of implementations, it's regrettable from an academic point
         | of view, but still understandable...
        
         | rasz wrote:
          | It wasn't all bad. AC is VERY slow.
        
         | lifthrasiir wrote:
         | AC was way slower than Huffman back when JPEG was first defined
         | in 1992, so it would have taken many years to see AC-encoded
         | JPEG in the wild even without a patent concern.
        
       ___________________________________________________________________
       (page generated 2021-11-19 23:01 UTC)