[HN Gopher] Compressing Images with Neural Networks
___________________________________________________________________
Compressing Images with Neural Networks
Author : skandium
Score : 154 points
Date : 2024-03-17 18:28 UTC (1 day ago)
(HTM) web link (mlumiste.com)
(TXT) w3m dump (mlumiste.com)
| esafak wrote:
| It is not going to take off unless it is significantly better
| and has browser support. WebP took off thanks to Chrome, while
| JPEG2000 floundered. Failing native browser support, maybe the
| codec could be shipped via WASM or something?
|
| The interesting diagram to me is the last one, for computational
| cost, which shows the 10x penalty of the ML-based codecs.
| dinkumthinkum wrote:
| I think it is an interesting discussion and learning experience
| (no pun intended). I think this is more of a stop on a research
| project than a proposal; I could be wrong.
| dylan604 wrote:
| Did JPEG2000 really flounder? If your concept of it is as a
| consumer-facing product and a direct replacement for JPEG, then
| I could see it being unsuccessful in that respect. However,
| JPEG2000 has found its place in the professional side of
| things.
| esafak wrote:
| Yes, I do mean broad- rather than niche adoption. I myself
| used J2K to archive film scans.
|
| One problem is that without broad adoption, support even in
| niche cases is precarious; the ecosystem is smaller. That
| makes the codec not safe for archiving, only for
| distribution.
|
| The strongest use case I see for this is streaming video,
| where the demand for compression is highest.
| dylan604 wrote:
| But that's like saying it's difficult to drive your Formula
| 1 car to work every day. It's not meant for that, so it's
| not the car's fault. It's a niche thing built to satisfy
| the requirements of a niche need. I would suggest this is a
| "you're holding it wrong" type of situation that isn't
| laughable.
| yccs27 wrote:
| There was absolutely an initiative to make J2K a
| widespread standard
| dylan604 wrote:
| There was absolutely an initiative to make Esperanto a
| widespread language. But neither point has anything to do
| with how things actually are
| actionfromafar wrote:
| Huh, one more point for considering J2K for film scan
| archiving.
| dylan604 wrote:
| it's well past the considering stage. J2K is used more
| than people think even if we're not using it to spread cat
| memes across the interwebs. J2K is used in DCPs sent to
| movie theaters for digital projections. J2K is used as
| lossless masters for films. the Library of Congress uses
| it as well. this isn't even attempting to be an
| exhaustive list of uses, but it's not something merely being
| looked into. it's being used every day
| actionfromafar wrote:
| Well, I meant for me personally. Currently using TIFF.
| :-)
| sitkack wrote:
| For archiving, I'd recommend having a wasm decompressor
| along with some reference output. Could also ship an image
| viewer as an html file with all the code embedded.
| dylan604 wrote:
| Why the need for all things to be browser based? Why
| introduce the performance hit for something that brings
| no compelling justification? What problem is this
| solution solving? Why can't things just be native
| workflows and not be shoveled into a browser?
| benreesman wrote:
| Not the parent but one imagines that WASM could be a good
| target for decompressing or otherwise decoding less-
| adopted formats/protocols because WASM _is_ fairly
| broadly-adopted and seems to be at least holding steady
| if not growing as an executable format: it seems unlikely
| that WASM disappears in the foreseeable future.
|
| Truly standard ANSI C along with a number of other
| implementation strategies (LLVM IR seems unlikely to be
| going anywhere) seem just as durable as WASM if not more,
| but there are applications where you might not want to
| need a C toolchain and WASM can be a fit there.
|
| One example is IIUC some of the blockchain folks use WASM
| to do simultaneous rollout of iterations to consensus
| logic in distributed systems: everyone has to upgrade at
| the same time to stay part of the network.
| wizzwizz4 wrote:
| Wasm is simple, well-defined, small enough that one
| person can implement the whole thing in a few weeks, and
| (unlike the JVM) is usable without its standard library
| (WASI).
|
| LLVM isn't as simple: there's not really such a thing as
| target-independent LLVM IR, there are lots of very
| specific keywords with subtle behavioural effects on the
| code, and it's hard to read. I think LLVM is the only
| full implementation of LLVM. (PNaCl was a partial
| reimplementation, but it's dead now.)
|
| ANSI C is a very complicated language and very hard to
| implement correctly. Once Linux switches to another
| language or we stop using Linux, C will go the way of
| Fortran.
|
| Part of archiving information has _always_ been format
| shifting. Never think you can store information, forget
| about it for a thousand years (or even five), and have it
| available later.
| benreesman wrote:
| I think we probably agree about most things but I'll
| nitpick here and there.
|
| ANSI C is among the simpler languages to have serious
| adoption. It's a bit tricky to use correctly because much
| of its simplicity derives from leaving a lot of the
| complexity burden on the author or maintainer, but the
| language specification is small enough in bytes to fit on
| a 3.5" floppy disk, and I think there are conforming
| implementations smaller than that!
|
| You seem to be alluding to C getting replaced by Rust as
| that's the only other language with so much as a device
| driver to its name in the Linux kernel. Linus is on the
| record recently saying that it will be decades before
| Rust has a serious share of the core: not being an active
| kernel contributor, I'm inclined to trust his forecast
| more than anyone else's.
|
| But Rust started at a complexity level comparable to
| where the C/C++ ecosystem ended up after 40 years of
| maintaining substantial backwards compatibility, and
| shows no signs of getting simpler. The few bright spots
| (like syntax for the Either monad) seem to be getting
| less rather than more popular, while the bad habits it learned
| from C++ (forcing too much into the trait system and the
| macro mechanism) seem to have all the same appeal that
| template madness does to C++ hackers who don't know e.g.
| Haskell well. And in spite of the fact that like 80% of
| my user land is written in Rust, I'm unaware of even a
| single project that folks can't live without that's
| married to Rust.
|
| Rust is very cool, does some things very well, and it
| wouldn't be hard to do a version of it without net-
| negative levels of opinionation about memory management, but
| speaking for myself I'm still watching Nim and V and Zig
| and Jai and a bunch of other things, because Rust takes
| after its C++ heritage more than its Haskell heritage,
| and it's not entrenched enough in real industry to
| justify its swagger in places like HN.
|
| The game is still on for what comes after C: Rust is in
| the lead, but it's not the successor C deserves.
| userbinator wrote:
| _That makes the codec not safe for archiving, only for
| distribution._
|
| Could you explain what you mean by "not safe for
| archiving"? The standard is published and there are
| multiple implementations, some of which are open-source.
| There is no danger of it being a proprietary format with no
| publicly available specification.
| dylan604 wrote:
| Not the GP, but for archiving, you want to know that
| you'll be able to decode the files well into the future.
| If you adopt a format that's not well accepted and the
| code base gets dropped and not maintained so that in the
| future it is no longer able to be run on modern gear,
| your archive is worthless.
|
| As a counter, J2K has been well established by the
| professional market even if your mom doesn't know
| anything about what it is. It has been standardized by
| the ISO, so it's not something that will be forgotten
| about. It's a good tool for the right job. It's also true
| that not all jobs will be the right ones for that tool
| esafak wrote:
| I was not thinking of J2K as being problematic for
| archiving but these new neural codecs. My point being
| that performance is only one of the criteria used to
| evaluate a codec.
| ufocia wrote:
| Royalty costs are often the other.
| geor9e wrote:
| The thing about ML models is the penalty is a function of
| parameters and precision. It sounds like the researchers
| cranked them to max to try to get the very best compression.
| Maybe later they will take that same model, flatten its layers,
| and quantize the weights to get it running 100x faster, and
| see how well it still compresses. I feel like neural networks
| have a lot of potential in compression. Their whole job is
| finding patterns.
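|
| As a rough illustration of that kind of post-training shrinking
| (a minimal sketch, assuming PyTorch; the toy network is a stand-in
| for a learned codec and is not from the article):
|
|     import torch
|     import torch.nn as nn
|
|     # Stand-in for a learned codec's transform network.
|     model = nn.Sequential(
|         nn.Linear(256, 512), nn.ReLU(),
|         nn.Linear(512, 256),
|     ).eval()
|
|     # Post-training dynamic quantization: int8 weights instead of fp32.
|     quantized = torch.quantization.quantize_dynamic(
|         model, {nn.Linear}, dtype=torch.qint8
|     )
|
|     x = torch.randn(1, 256)
|     print(model(x).shape, quantized(x).shape)  # same interface, smaller model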
| ufocia wrote:
| Better or cheaper, e.g. AV1?
| amelius wrote:
| How do we know we don't get hands with 16 fingers?
| ogurechny wrote:
| Valid point. Conventional codecs draw things on screen that are
| not in the original, too, but we are used to low quality images
| and videos, and learned to ignore the block edges and smudges
| unconsciously. NN models "recover" much more complex and
| plausible-looking features. It is possible that some future
| general purpose image compressor would do the same thing to
| small numbers that lossy JBIG2 did.
| ufocia wrote:
| How do we know whether it's an image with 16 fingers or it just
| looks like 16 fingers to us?
|
| I looked at the bear example above and I could see how either
| the AI thought that there was an animal face embedded in the
| fur or we just see the face in the fur. We see all kinds of
| faces on toast even though neither the bread slicers nor the
| toasters intend to create them.
| jfdi wrote:
| Anyone know of open models useful (and good quality) for going
| the other way? I.e., input is an 800x600 JPG and output is a
| 4K version.
| lsb wrote:
| You're looking for what's called upscaling, like with Stable
| Diffusion: https://huggingface.co/stabilityai/stable-
| diffusion-x4-upsca...
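|
| A minimal sketch of how that pipeline is typically driven from the
| diffusers library (the file names here are placeholders, and large
| inputs may need tiling or a lot of VRAM):
|
|     import torch
|     from diffusers import StableDiffusionUpscalePipeline
|     from PIL import Image
|
|     pipe = StableDiffusionUpscalePipeline.from_pretrained(
|         "stabilityai/stable-diffusion-x4-upscaler",
|         torch_dtype=torch.float16,
|     ).to("cuda")
|
|     low_res = Image.open("photo_800x600.jpg").convert("RGB")
|     # The upscaler is text-conditioned; a generic prompt works fine.
|     upscaled = pipe(prompt="a photo", image=low_res).images[0]
|     upscaled.save("photo_4x.png")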
| cuuupid wrote:
| There are a bunch of great upscaler models although they tend
| to hallucinate a bit, I personally use magic-image-refiner:
|
| https://replicate.com/collections/super-resolution
| hansvm wrote:
| I haven't explored the current SOTA recently, but super-
| resolution has been pretty good for a lot of tasks for a few
| years at least. Probably just start with Hugging Face [0] and
| try a few out, especially diffusion-based models.
|
| [0]
| https://huggingface.co/docs/diffusers/api/pipelines/stable_d...
| godelski wrote:
| Look for SuperResolution. These models will typically come as a
| GAN, Normalizing Flow (or Score, NODE), or more recently
| Diffusion (or SNODE) (or some combination!). The one you want
| will depend on your computational resources, how lossy you are
| willing to be, and your image domain (if you're unwilling to
| tune). Real time (>60fps) is typically going to be a GAN or
| flow.
|
| Make sure to test the models before you deploy. Nothing will be
| lossless when doing super-resolution, but flows can get you
| lossless compression.
| sitkack wrote:
| Or else you get Ryan Gosling
| https://news.ycombinator.com/item?id=24196650
| davidbarker wrote:
| Magnific.ai (https://magnific.ai) is a paid tool that works
| well, but it is expensive.
|
| However, this weekend someone released an open-source version
| which has a similar output.
| (https://replicate.com/philipp1337x/clarity-upscaler)
|
| I'd recommend trying it. It takes a few tries to get the
| correct input parameters, and I've noticed anything approaching
| 4x scale tends to add unwanted hallucinations.
|
| For example, I had a picture of a bear I made with Midjourney.
| At a scale of 2x, it looked great. At a scale of 4x, it adds
| bear faces into the fur. It also tends to turn human faces into
| completely different people if they start too small.
|
| When it works, though, it really works. The detail it adds can
| be incredibly realistic.
|
| Example bear images:
|
| 1. The original from Midjourney:
| https://i.imgur.com/HNlofCw.jpeg
|
| 2. Upscaled 2x: https://i.imgur.com/wvcG6j3.jpeg
|
| 3. Upscaled 4x: https://i.imgur.com/Et9Gfgj.jpeg
|
| ----------
|
| The same person also released a lower-level version with more
| parameters to tinker with.
| (https://replicate.com/philipp1337x/multidiffusion-upscaler)
| quaintdev wrote:
| Here's a free and open source alternative that works pretty
| well
|
| https://www.upscayl.org/
| aspyct wrote:
| That magnific.ai thingy takes a lot of liberties with the
| images, denaturing them.
|
| Their example with the cake is the most obvious. To me, the
| original image shows a delicious cake, and the modified one
| shows a cake that I would rather not eat...
| hug wrote:
| Every single one of their before & after photos looks worse
| in the after.
|
| The cartoons & illustrations lose all of their gradations
| in feeling & tone with every outline a harsh edge. The
| landscapes lose any sense of lushness and atmosphere,
| instead taking a high-clarity HDR look. Faces have
| blemishes inserted the original actor never had. Fruit is
| replaced with wax imitation.
|
| As an artist, I would never run any of my art through
| anything like this.
| jasonjmcghee wrote:
| Both of these links to replicate 404 for me
| davidbarker wrote:
| Ah, the user changed their username.
|
| https://replicate.com/philz1337x/clarity-upscaler
| https://replicate.com/philz1337x/multidiffusion-upscaler
| jfdi wrote:
| thank you! will enjoy reviewing each of these
| codercowmoo wrote:
| Current SOTA open source is I believe SUPIR (Example -
| https://replicate.com/p/okgiybdbnlcpu23suvqq6lufze), but it
| needs a lot of VRAM, or you can run it through replicate, or
| here's the repo (https://github.com/Fanghua-Yu/SUPIR)
| physPop wrote:
| This is called super resolution (SR). 2x SR is pretty safe and
| easy (so every pixel in becomes 2x2 out, in your example
| 800x600->1600x1200). Higher scalings are a lot harder and prone
| to hallucination, weird texturing, etc.
| holoduke wrote:
| How much vram is needed? And computing power? To open a webpage
| you soon need 24gb and 2 seconds of 1000 watts energy to
| uncompress images. Bandwidth is reduced from 2mb to only 20kb.
| guappa wrote:
| > Bandwidth is reduced from 2mb to only 20kb.
|
| Plus the entire model, which comes with incorrect cache headers
| and must be redownloaded all the time.
| Dwedit wrote:
| There was an earlier article (Sep 20, 2022) about using the
| Stable Diffusion VAE to perform image compression. Uses the VAE
| to change from pixel space to latent space, dithers the latent
| space down to 256 colors, then when it's time to decompress it,
| it de-noises that.
|
| https://pub.towardsai.net/stable-diffusion-based-image-compr...
|
| HN discussion: https://news.ycombinator.com/item?id=32907494
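|
| A minimal sketch of that round trip with the diffusers AutoencoderKL
| (the model id is illustrative, and plain 8-bit uniform quantization
| stands in for the article's palette dithering and de-noising step):
|
|     import numpy as np
|     import torch
|     from diffusers import AutoencoderKL
|     from PIL import Image
|
|     vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()
|
|     img = Image.open("input.png").convert("RGB")  # sides ideally multiples of 8
|     x = torch.from_numpy(np.array(img)).float().permute(2, 0, 1)
|     x = (x / 127.5 - 1.0).unsqueeze(0)            # (1, 3, H, W) in [-1, 1]
|
|     with torch.no_grad():
|         z = vae.encode(x).latent_dist.mean        # (1, 4, H/8, W/8) latents
|         lo, hi = z.min(), z.max()
|         q = torch.round((z - lo) / (hi - lo) * 255)  # what you would store
|         z_hat = q / 255 * (hi - lo) + lo             # dequantized latents
|         y = vae.decode(z_hat).sample              # reconstructed image
|
|     out = ((y[0].clamp(-1, 1) + 1) * 127.5).round().permute(1, 2, 0)
|     Image.fromarray(out.byte().numpy()).save("roundtrip.png")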
| dheera wrote:
| I've done a bunch of experiments on my own on the Stable
| Diffusion VAE.
|
| Even when going down to 4-6 bits per latent space pixel the
| results are surprisingly good.
|
| It's also interesting what happens if you ablate individual
| channels; ablating channel 0 results in faithful color but
| shitty edges, ablating channel 2 results in shitty color but
| good edges, etc.
|
| The one thing it fails catastrophically on though is small text
| in images. The Stable Diffusion VAE is not designed to
| represent text faithfully. (It's possible to train a VAE that
| does slightly better at this, though.)
| 3abiton wrote:
| How does the type of image (anime vs. photorealistic vs.
| painting, etc.) affect the compression results? Is there
| a noticeable difference?
| dheera wrote:
| I haven't noticed much difference between these. They're
| all well-represented in the VAE training set.
| StiffFreeze9 wrote:
| How badly will its lossy-ness change critical things? In 2013,
| there were Xerox copiers with aggressive compression that changed
| numbers,
| https://www.theregister.com/2013/08/06/xerox_copier_flaw_mea...
| bluedino wrote:
| If I zoom all the way with my iPhone, the camera-assisting
| intelligence will mess up numbers too
| qrian wrote:
| The mentioned Xerox copier incident was not an OCR failure,
| but the copier actively changed the numbers in the original
| image due to its image compression algorithm.
| barfbagginus wrote:
| Here's some of the context: www.dkriesel.com/blog/2013/0810
| _xerox_investigating_latest_mangling_test_findings
|
| Learn More: https://www.dkriesel.com/start?do=search&id=en%
| 3Aperson&q=Xe...
|
| Brief: Xerox machines used template matching to recycle the
| scanned images of individual digits that recur in the
| document. In 2013, Kriesel discovered this procedure was
| faulty.
|
| Rationale: This method can create smaller PDFs,
| advantageous for customers that scan and archive numerical
| documents.
|
| Prior art:
| https://link.springer.com/chapter/10.1007/3-540-19036-8_22
|
| Tech Problem: Xerox's template matching procedure was not
| reliable, sometimes "papering over" a digit with the wrong
| digit!
|
| PR Problem: Xerox press releases initially claimed this
| issue did not happen in the factory default mode. Kriesel
| demonstrated this was not true, by replicating the issue in
| all of the factory default compression modes including the
| "normal" mode. He gave a 2015 FrOSCon talk, "Lies, damned
| lies and scans".
|
| Interesting work!
| lifthrasiir wrote:
| Any lossy compressor changes the original image for better
| compression at the expense of perfect accuracy.
| skandium wrote:
| Exactly, in practice the alternatives are either blocky
| artifacts (JPEG and most other traditional codecs),
| blurring everything (learned codecs optimised for MSE) or
| "hallucinating" patterns when using models like GANs.
| However, even the generative side of
| compression models is evaluated against the original
| image rather than only output quality, so the outputs
| tend to be passable.
|
| To see what a lossy generator hallucinating patterns
| means in practice, I recommend viewing HiFiC vs original
| here: https://hific.github.io/
| im3w1l wrote:
| Traditional lossy compressors have well-understood
| artifacts. In particular they provide guarantees such
| that you can confidently say that an object in the image
| could not be an artifact.
| fieldcny wrote:
| The word "perfect" is misplaced; the trade-off is size vs.
| fidelity (aka accuracy).
| lifthrasiir wrote:
| This JBIG2 "myth" is too widespread. It is true that Xerox's
| algorithm mangled some numbers in its JBIG2 output, but it is
| not an inherent flaw of JBIG2 to begin with, and Xerox's encoder
| misbehaved almost exclusively at lower dpis---300dpi or more
| was barely affected. Other artifacts at lower resolution can
| exhibit similar mangling as well (specifics would of course
| vary), and no similar incident has recurred since. So
| I don't feel it is even a worthy concern at this point.
| thrdbndndn wrote:
| 1. No one, at least not OP, ever said it's an inherent flaw of
| JBIG2. The fact that it's an implementation error on Xerox's end
| is a good technical detail to know, but it is irrelevant to
| the topic.
|
| 2. "Lower DPI" is extremely common if your definition for
| that is 300dpi. At my company, all the text documents are
| scanned at 200dpi by default. And 150dpi or even lower is
| perfectly readable if you don't use ridiculous compression
| ratios.
|
| > Other artifacts at lower resolution can exhibit similar
| mangling as well (specifics would of course vary)
|
| The majority of traditional codecs would make text
| unreadable when compression is too high or the source
| material is too low-resolution. They don't substitute one
| number for another in an "unambiguous" way (i.e. it clearly
| shows a wrong number instead of just a blurry blob that could
| be both).
|
| The "specifics" here is exactly what the whole topic is focus
| on, so you can't really gloss over it.
| lifthrasiir wrote:
| > 1. No one, at least not OP, ever said it's an inherent
| flaw of JBIG2. The fact that it's an implementation error on
| Xerox's end is a good technical detail to know, but it is
| irrelevant to the topic.
|
| It is relevant only when you assume that lossy compression
| has no way to control or even know of such critical
| changes. In reality most lossy compression algorithms use a
| rate-distortion optimization, which is only possible when
| you have some idea about "distortion" in the first place.
| Given that the error rarely occurred in higher dpis, its
| cause should have been either a miscalculation of
| distortion or a misconfiguration of distortion thresholds
| for patching.
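|
| To make that concrete, a toy rate-distortion mode decision (the
| candidates, numbers and lambda here are made up) just scores each
| option by distortion + lambda * rate and keeps the cheapest:
|
|     # candidates: (name, distortion, rate_in_bits)
|     modes = [
|         ("reuse matched digit template", 0.9, 40),    # few bits, risky
|         ("code the block directly",      0.1, 400),   # faithful, costly
|     ]
|
|     def rd_choose(candidates, lam):
|         return min(candidates, key=lambda c: c[1] + lam * c[2])
|
|     print(rd_choose(modes, lam=0.001))  # small lambda: fidelity wins
|     print(rd_choose(modes, lam=0.01))   # larger lambda: risky template wins
|
| A miscalculated distortion term (or a threshold set too loosely)
| would make the risky substitution win when it shouldn't.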
|
| In any case, a correct implementation should be able to do
| the correct thing. It would have been much more problematic if
| similar cases were repeated, since it would mean that it is
| much harder to write a correct implementation than
| expected, but that didn't happen.
|
| > The majority of traditional codecs would make text
| unreadable when compression is too high or the source
| material is too low-resolution. They don't substitute one
| number for another in an "unambiguous" way (i.e. it clearly
| shows a wrong number instead of just a blurry blob that
| could be both).
|
| Traditional codecs simply didn't have much
| computational power to do so. The "blurry blob" is
| something with lower-frequency components only by
| definition, and you have only a small number of them, so
| they were easier to preserve even with limited resources.
| But if you have and recognize a similar enough pattern, it
| _should_ be exploited for further compression. Motion
| compensation in video codecs was already doing a similar
| thing, and either a filtering or intelligent quantization
| that preserves higher-frequency components would be able to
| do so too.
|
| ----
|
| > 2. "Lower DPI" is extremely common if your definition for
| that is 300dpi. At my company, all the text document are
| scanned at 200dpi by default. And 150dpi or even lower is
| perfectly readable if you don't use ridiculous compression
| ratios.
|
| I admit I have generalized too much, but the choice of scan
| resolution is highly specific to contents, font sizes and
| even writing systems. If you and your company can cope with
| lower DPIs, that's good for you, but I believe 300 dpi is
| indeed the safe minimum.
| _kb wrote:
| The suitable lossy-ness (of any compression method) is entirely
| dependent on context. There is no one-size-fits-all approach
| for all use cases.
|
| One key item with emerging 'AI compression' techniques is that
| the information loss is not deterministic, which somewhat
| complicates assessing suitability.
| fl7305 wrote:
| > the information loss is not deterministic
|
| It is technically possible to make it deterministic.
|
| The main reason you don't get deterministic outputs today is
| that CUDA/GPU optimizations make the calculations run much
| faster if you let them be non-deterministic.
|
| The internal GPU scheduler will then process things in the
| order it thinks is fastest.
|
| Since floating point is not associative, you can get
| different results for (a + (b + c)) and ((a + b) + c).
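|
| A short Python illustration of that non-associativity:
|
|     a, b, c = 0.1, 0.2, 0.3
|     print((a + b) + c)                  # 0.6000000000000001
|     print(a + (b + c))                  # 0.6
|     print((a + b) + c == a + (b + c))   # False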
| _kb wrote:
| The challenge goes beyond rounding errors.
|
| Many core codecs are pretty good at adhering to reference
| implementations, but are still open to similar issues so
| may not be bit exact.
|
| With a DCT or wavelet transform, quantisation, chroma
| subsampling, entropy coding, motion prediction and the
| suite of other techniques that go into modern media
| squishing it's possible to mostly reason about what type of
| error will come out the other end of the system for a yet
| to be seen input.
|
| When that system is replaced by a non-linear box of
| mystery, this ability is lost.
| begueradj wrote:
| That was interesting (info in your link)
| thomastjeffery wrote:
| Lossy compression has the same problem it has always had: lossy
| metadata.
|
| The contextual information surrounding intentional data loss
| needs to be preserved. Without that context, we become ignorant
| of the missing data. Worst case, you get replaced numbers.
| Average case, you get lossy->lossy transcodes, which is why we
| end up with degraded content.
|
| There are only two places to put that contextual information:
| metadata and watermarks. Metadata can be written to a file, but
| there is no guarantee it will be copied with that data.
| Watermarks fundamentally degrade the content once, and may not
| be preserved in derivative works.
|
| I wish that the generative model explosion would result in a
| better culture of metadata preservation. Unfortunately, it
| looks like the focus is on watermarks instead.
| rottc0dd wrote:
| Something similar by Fabrice Bellard:
|
| https://bellard.org/nncp/
| p0w3n3d wrote:
| Some people are fans of Metallica or Taylor Swift. I think
| Fabrice Bellard should get the same attention!
| p0w3n3d wrote:
| And the same money for performance, of course
| skandium wrote:
| If you look at the winners of the Hutter prize, or especially
| the Large Text Compression Benchmark, then almost every
| entry uses some kind of machine learning for the
| adaptive probability model and then either arithmetic coding or
| rANS to losslessly encode it.
|
| This is intuitive, as the competition organisers say:
| compression is prediction.
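|
| A minimal sketch of that idea (an adaptive order-0 byte model; an
| arithmetic coder, not shown, could store each byte in roughly
| -log2(p) bits of its predicted probability):
|
|     import math
|     from collections import Counter
|
|     def predicted_size_bits(data: bytes) -> float:
|         counts = Counter({b: 1 for b in range(256)})  # Laplace smoothing
|         total = 256
|         bits = 0.0
|         for b in data:
|             p = counts[b] / total      # the model's prediction
|             bits += -math.log2(p)      # ideal code length for this byte
|             counts[b] += 1             # adapt after seeing the byte
|             total += 1
|         return bits
|
|     text = b"abracadabra abracadabra abracadabra"
|     bits = predicted_size_bits(text)
|     print(f"{len(text) * 8} raw bits vs ~{bits:.0f} predicted bits")
|
| The better the predictor, the shorter the code; the learned codecs
| in the article apply the same principle to image data.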
| mbtwl wrote:
| A first NN based image compression standard is currently being
| developed by JPEG. More information can be found here:
| https://jpeg.org/jpegai/documentation.html
|
| The best overview is probably the "JPEG AI Overview Slides" document.
| calebm wrote:
| All learning is compression
___________________________________________________________________
(page generated 2024-03-18 23:02 UTC)