[HN Gopher] Stable Diffusion based image compression
___________________________________________________________________
Stable Diffusion based image compression
Author : nanidin
Score : 416 points
Date : 2022-09-20 03:58 UTC (19 hours ago)
(HTM) web link (matthias-buehlmann.medium.com)
(TXT) w3m dump (matthias-buehlmann.medium.com)
| bjornsing wrote:
| If it's a VAE then the latents should really be distributions,
| usually represented as the mean and variance of a normal
| distribution. If so then it should be possible to use the
| variance to determine to what precision a particular latent needs
| to be encoded. Could perhaps help increase the compression
| further.
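|
| A quick sketch of what I mean (the base step size here is made
| up; the encoder's per-latent std decides how coarsely each
| latent gets quantized):
|
|     import numpy as np
|
|     def adaptive_quantize(mean, std, base_step=0.05):
|         step = np.maximum(base_step, std)  # coarser bins where the VAE is uncertain
|         return np.round(mean / step), step
|
|     def adaptive_dequantize(q, step):
|         return q * step
|
| (In practice the per-latent steps would themselves need to be
| coded, or derived the same way on both sides.)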
| nullc wrote:
| Why aren't they scaled to have uniform variances?
| euphetar wrote:
| I am currently also playing around with this. The best part is
| that for storage you don't need to store the reconstructed image,
| just the latent representation and the VAE decoder (which can do
| the reconstructing later). So you can store the image as
| relatively few numbers in a database. In my experiment I was able
| to compress a (512, 384, 3) RGB image to (48, 64, 4) floats. In
| terms of memory it was an 8x reduction.
|
| However, on some images the artefacts are terrible. It does not
| work as a general-purpose lossy compressor unless you don't care
| about details.
|
| The main obstacle is compute. The model is quite large, but HDDs
| are cheap. The real problem is that reconstruction requires a GPU
| with lots of VRAM. Even with a GPU it's 15 seconds to reconstruct
| an image in Google Colab. You could do it on CPU, but then it's
| extremely slow. This is only viable if compute costs go down a
| lot.
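|
| In case it helps anyone reproduce this, here's a rough sketch of
| the encode/decode round trip I mean, assuming the diffusers
| AutoencoderKL API (the model id and preprocessing are
| approximate, and the v1-4 weights may require accepting the
| model license on Hugging Face):
|
|     import numpy as np
|     import torch
|     from PIL import Image
|     from diffusers import AutoencoderKL
|
|     # Only the VAE is needed for this round trip, not the full pipeline.
|     vae = AutoencoderKL.from_pretrained(
|         "CompVis/stable-diffusion-v1-4", subfolder="vae")
|     vae.eval()
|
|     def to_latents(img: Image.Image) -> torch.Tensor:
|         # RGB [0, 255] -> [-1, 1] float tensor, NCHW
|         x = torch.from_numpy(np.array(img)).float() / 127.5 - 1.0
|         x = x.permute(2, 0, 1).unsqueeze(0)
|         with torch.no_grad():
|             return vae.encode(x).latent_dist.mean  # (1, 4, H/8, W/8)
|
|     def from_latents(z: torch.Tensor) -> Image.Image:
|         with torch.no_grad():
|             x = vae.decode(z).sample
|         x = ((x.clamp(-1, 1) + 1) * 127.5).squeeze(0).permute(1, 2, 0)
|         return Image.fromarray(x.round().byte().numpy())
|
| Storing only the latent tensor (plus the shared decoder) is what
| gives the size reduction.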
| holoduke wrote:
| In the future you can have full 16K movies represented by only
| 1.44 MB seeds. A giant 500 petabyte trained model file can run
| those movies. You can even generate your own movie by uploading a
| book.
| monokai_nl wrote:
| Probably very unlikely, but sometimes I wonder if Jan Sloot did
| something like this back in '95:
| https://en.wikipedia.org/wiki/Sloot_Digital_Coding_System
| aaroninsf wrote:
| I would call this "confabulation" more than compression.
|
| Its accuracy is proportional to and bounded by the training data;
| I suspect in practice it's got a specific strength (filling in
| fungible detail) and as discussed ITT with fascinating and gnarly
| corners, some specific failure modes which are going to lead to
| bad outcomes.
|
| At least with "lossy" CODECs of various kinds, even if you don't
| attend to an absence until you do an A/B comparison, you can
| perceive the difference when you do make those comparisons.
|
| In this case the serious peril is that an A/B comparison is
| [soon] going to just show difference. "What... is... the Real?"
|
| When you contemplate that an ever-increasing proportion of the
| training data itself stems from AI- or otherwise-enhanced
| imagery,
|
| our hold on the real has never felt weaker, and our vulnerability
| to the rewriting of reality has never felt more present.
| bane wrote:
| The basic premise of these kinds of compression algorithms is
| actually pretty clever. Here's a very, _very_ trivialized version
| of this style of approach:
|
| 1. both the compressor and decompressor contain knowledge beyond
| the algorithm used to compress/decompress some data
|
| 2. in this case the knowledge might be "all the images in the
| world"
|
| 3. when presented with an image, the compressor simply looks up
| some index or identifier of the image
|
| 4. the identifier is passed around as the "compressed image"
|
| 5. "decompression" means looking up the identifier and retrieving
| the image
|
| I've heard this called "compression via database" before and it
| can give the appearance of defeating Shannon's theorem for
| compression even though it doesn't do that at all.
|
| Of course the author's idea is significantly more sophisticated
| than the approach above, and trades a lossy approach for some
| gains in storage and retrieval efficiency (we don't have to have
| a copy of all of the pictures in the world in both the compressor
| and the decompressor). The evaluation note of not using any known
| image for the tests further challenges the approach and helps
| suss out where there are specific challenges like poor
| reconstruction of specific image constructs like faces or text --
| I suspect that there are many other issues like these but the
| author homed in on these because we (as literate humans) are
| particularly sensitive to them.
|
| In these types of lossy compression approaches (as opposed to the
| above which is lossless) the basic approach is:
|
| 1. Throw away data until you get to the desired file size. You
| usually want to come up with some clever scheme to decide what
| data you toss out. Alternatively, just hash the input data using
| some hash function that produces just the right number of bits
| you want, but use a scheme that results in a hash digest that can
| act as a (non-unique) index to the original image in a table of
| every image in the world.
|
| 2. For images it's usually easy to eliminate pixels (resolution)
| and color (bit-depth, channels, etc.). In this specific case, the
| author uses a variational autoencoder to "choose" what gets
| tossed. I suspect the autoencoder is very good at preserving the
| information-rich, high-entropy slices of the latent space. At any
| rate, this produces something
| that to us sorta kinda looks like a very low resolution, poorly
| colored postage stamp of the original image, but actually
| contains more data than that. I think at this point it can just
| be considered the hash digest.
|
| 3. this hash digest, or VAE encoded image or whatever we want to
| call it, is what's passed around as the "compressed" data.
|
| 4. just like above, "decompression" means effectively looking up
| the value in a "database". If we are working with hash digests,
| there was probably a collision during the construction of the
| database of all images, so we lost some information. In this case
| we're dealing with stable diffusion and instead of a simple
| index->table entry, our "compressed" VAE image wraps through some
| hyperspace to find the nearest preserved data. Since the VAE
| "pixels" probably align close to data dense areas of the space
| you tend to get back data that closely represents the original
| image. It's still a database lookup in that sense, but it's
| looking more for "similar" rather than "exact matches" which when
| used to rebuild the image give a good approximation of the
| original.
|
| Because it's an "approximation" it's "lossy". In fact I think
| it'd be more accurate to say it's "generally lossy" as there is a
| chance the original image can be reproduced _exactly_,
| especially if it's in the original training data. Which is why
| the author was careful not to use anything from that set.
|
| Because we've stored so much information in the compressor and
| decompressor, it can also give the appearance of defeating
| Shannon entropy for compression, except it doesn't, because:
|
| a) it's generally lossy
|
| b) just like the original example above we're cheating by simply
| storing lots of information elsewhere
|
| There's probably some deep mathematical relationship between the
| author's approach and compressive sensing.
|
| Still, it's useful, and has the possibility of improving data
| transmission speeds at the cost of storing lots of local data at
| both ends.
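|
| A toy version of the "compression via database" idea, purely for
| illustration (the shared "database" here is a plain in-memory
| list, standing in for the model weights both ends carry; this is
| not the author's method):
|
|     import numpy as np
|
|     # Both ends carry the same big table of images ("the model").
|     shared_db = [np.random.rand(8, 8) for _ in range(1000)]
|
|     def compress(img: np.ndarray) -> int:
|         # Nearest-neighbour lookup stands in for the VAE/diffusion prior.
|         dists = [np.sum((img - ref) ** 2) for ref in shared_db]
|         return int(np.argmin(dists))   # an index instead of an image
|
|     def decompress(idx: int) -> np.ndarray:
|         return shared_db[idx]          # "similar", not necessarily exact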
|
| Source: Many years ago before deep learning was even a "thing", I
| worked briefly on some compression algorithms in an effort to
| reduce data transfer issues in telecom poor regions. One of our
| approaches was not too dissimilar to this -- throw away a bunch
| of the original data in a structured way and use a smart
| algorithm and some stored heuristics in the decompressor to guess
| what we threw away. Our scheme had the benefit of almost
| absolutely trivial "compression" with the downside of massive
| computational needs on the "decompression" side, but had lots of
| nice performance guarantees which you could use to design the
| data transport stuff around.
|
| *edit* sorry if this explanation is confusing, it's been a while
| and it's also very late where I am. I just found this post really
| fun.
| nl wrote:
| For people interested in more about this, it's probably worth
| reading the Hutter Prize FAQ: http://prize.hutter1.net/hfaq.htm
| tomxor wrote:
| Doesn't decompression require the entire Stable Diffusion model?
| (and the exact same model at that)
|
| This could be interesting but I'm wondering if the compression
| size is more a result of the benefit of what is essentially a
| massive offline dictionary built into the decoder vs some
| intrinsic benefit to processing the image in latent space based
| on the information in the image alone.
|
| That said... I suppose it's actually quite hard to implement a
| "standard image dictionary" and this could be a good way to do
| that.
| operator-name wrote:
| The latent space _is_ the massive offline dictionary, and the
| benefit is not having to hand craft the massive offline
| dictionary?
| tomxor wrote:
| For those of us unfamiliar... roughly how large is that in
| terms of bytes?
| tantalor wrote:
| I thought that's what "some important caveats" was going to be,
| but no, article didn't mention this.
| thehappypm wrote:
| Haha. Here's a faster compression model. Make a database of
| every image ever made. Compute a thumbprint and use that as the
| index of the database. Boom!
| Sohcahtoa82 wrote:
| A quick Google says there are 10^72 to 10^82 atoms in the
| universe.
|
| Assuming 24-bit color, if you could store an entire image in
| a single atom, then you could store images that are only 60
| pixels and each atom would still have a unique image.
| thehappypm wrote:
| Not every possible image has been produced!
| Sohcahtoa82 wrote:
| I'll get started, then!
| Xcelerate wrote:
| Great idea to use Stable Diffusion for image compression. There
| are deep links between machine learning and data compression
| (which I'm sure the author is aware of).
|
| If you could compute the true conditional Kolmogorov complexity
| of an image or video file given all visual online media as the
| prior, I imagine you would obtain mind-blowing compression
| ratios.
|
| People complain of the biased artifacts that appear when using
| neural networks for compression, but I'm not concerned in the
| long term. The ability to extract algorithmic redundancy from
| images using neural networks is obviously on its way to
| outclassing manually crafted approaches, and it's just a matter
| of time before we are able to tack on a debiasing step to the
| process (such that the distribution of error between the
| reconstructed image and the ground truth has certain nice
| properties).
| aaaaaaaaaaab wrote:
| Save around a kilobyte with a decompressor that's ~5Gbyte.
| egypturnash wrote:
| _To evaluate this experimental compression codec, I didn't use
| any of the standard test images or images found online in order
| to ensure that I'm not testing it on any data that might have
| been used in the training set of the Stable Diffusion model
| (because such images might get an unfair compression advantage,
| since part of their data might already be encoded in the trained
| model)._
|
| I think it would be _very interesting_ to determine if these
| images _do_ come back with notably better compression.
| bane wrote:
| Given the approach, they'll probably come back with better
| reconstruction/decompression too.
| pishpash wrote:
| Not clear. Fully encoding the training images wouldn't be a
| feasible property of a good auto-encoder.
| Dwedit wrote:
| On another note, you can also downscale an image, save it as a
| JPEG or whatever, then upscale it back using AI upscaling.
| madsbuch wrote:
| It is really interesting to talk about semantic lossy
| compression, which is probably what we get.
|
| Where recreating with traditional codecs introduces syntactic
| noise, this will introduce semantic noise.
|
| Imagine seeing a high-res, perfect picture, right up until you
| see the source image and discover that it was reinterpreted...
|
| It is also going to be interesting to see if this method will be
| chosen for specific pictures, e.g. pictures of famous objects
| (or people, when/if the issues around that resolve), but for
| novel things we need to use "syntactical" compression.
| lastdong wrote:
| Extraordinary! Is it going to be called Pied Piper?
| mjan22640 wrote:
| What they do is essentially fractal compression with an
| external library of patterns (which was IIRC patented, but the
| patent should be long expired).
| pishpash wrote:
| This does remind me of fractal compression [1] from the '90s,
| which never took off for various reasons that will be relevant
| here as well.
|
| [1] https://en.wikipedia.org/wiki/Fractal_compression
| eru wrote:
| Compare compressed sensing's single pixel camera:
| https://news.mit.edu/2017/faster-single-pixel-camera-lensles...
| fritzo wrote:
| I'd love to see a series of increasingly compressed images, say
| 8kb -> 4kb -> 2kb -> ... -> 2bits -> 1bit. This would be a great
| way to demonstrate the increasing fictionalization of the
| method's recall.
| minimaxir wrote:
| For text, GPT-2 was used in a similar demo a year ago albeit said
| demo is now defunct:
| https://news.ycombinator.com/item?id=23618465
| DrNosferatu wrote:
| Nice work!
|
| However, a cautionary tale on AI medical image "denoising":
|
| (and beyond, in science)
|
| - See the artifacts?
|
| The algorithm plugs into ambiguous areas of the image stuff it
| has seen before / was trained with. So, if such a system were to
| "denoise" (or compress, which - if you think about it - is
| basically the same operation) CT scans, X-rays, MRIs, etc., in
| ambiguous areas it could plug in diseased tissue where the
| ground truth was actually healthy.
|
| Or the opposite, which is even worse: substitute diseased areas
| of the scan with healthy looking imagery it had been trained on.
|
| Reading recent publications that try to do "denoising" or
| resolution "enhancement" in medical imaging contexts, the authors
| seem to be completely oblivious to this pitfall.
|
| (maybe they had a background as World Bank / IMF economists?)
| ska wrote:
| People have been publishing fairly useless papers "for" medical
| imaging enhancement/improvement for 3+ decades now. NB this is
| not universal (there are some good ones) and _not_ limited to
| AI techniques, although essentially every AI technique that
| comes along gets applied to compression
| /denoising/"superres"/etc. if it can, eventually.
|
| The main problem is that typical imaging researchers are
| too far from actual clinical applications, and often trying to
| solve the wrong problems. It's a structural problem with
| academic and clinical incentives, as much as anything else.
| fny wrote:
| There is nothing in the article suggesting this should be used
| for medical imaging.
| gregw134 wrote:
| Fun to imagine this could show up in future court cases. Is the
| picture true, or were details changed by the ai compression
| algorithm?
| petesergeant wrote:
| From the article:
|
| > a bit of a danger of this method: One must not be fooled by
| the quality of the reconstructed features -- the content may be
| affected by compression artifacts, even if it looks very clear
|
| ... plus an excellent image showing the algorithm straight
| making stuff up, so I suspect the author is aware.
| anarticle wrote:
| In my experience, medical imaging at the diagnostic tier uses
| only lossless compression (JPEG2000 et al). It was explicitly
| mentioned in our SOPs/policies that we had to have a lossless
| setup.
|
| Very sketchy to use super resolution for diagnostics. In
| research (fluorescence), sure.
|
| ref: my direct experience of pathology slide scanning machines
| and their setup.
| adammarples wrote:
| Mentioned in TFA at least twice
| Der_Einzige wrote:
| Sounds like you need lossless compression.
|
| I was told that the GPT-2 text compression variant was a
| lossless compressor (https://bellard.org/libnc/gpt2tc.html),
| why is stable diffusion lossy?
| operator-name wrote:
| Probably something to do with the variational auto encoder,
| which is lossy.
| theemathas wrote:
| Here's a similar case of a scanner using a traditional
| compression algorithm. It has a bug in the compression
| algorithm, which made it replace a number in the scanned image
| with a different number.
|
| https://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres...
| sgc wrote:
| That is completely outside all my expectations prior to
| reading it. The consequences are potentially life and death,
| or incarceration, etc, and yet they did nothing until called
| out and basically forced to act.
|
| A good reminder that the bug can be anywhere, and when things
| stop working we often need to get very dumb, and just
| methodically troubleshoot.
| function_seven wrote:
| We programmers tend to think our abstractions match reality
| somehow. Or that they don't leak. Or even if they _do_
| leak, that leakage won't spill down several layers of
| abstraction.
|
| I used to install T1 lines a long time ago. One day we had
| a customer that complained that their T1 was dropping every
| afternoon. We ran tests on the line for extended periods of
| time trying to troubleshoot the problem. Every test passed.
| Not a single bit error no matter what test pattern we used.
|
| We monitored it while they used it and saw not a single
| error, except for when the line completely dropped. We
| replaced the NIU card, no change.
|
| Customer then hit us with, "it looks like it only happens
| when Jim VNCs to our remote server".
|
| Obviously a userland program (VNC) could not possibly cause
| our NIU to reboot, right?? It's several layers "up the
| stack" from the physical equipment sending the DS1 signal
| over the copper.
|
| But that's what it was. We reliably triggered the issue by
| running VNC on their network. We ended up changing the NIU
| and corresponding CO card to a different manufacturer (from
| Adtran to Soneplex I think?) to fix the issue. I wish I had
| had time to really dig into that one, because obviously
| other customers used VNC with no issues. Adtran was our
| typical setup. Nothing else was weird about this standard
| T1 install. But somehow the combination of our equipment,
| their networking gear, and that program on that workstation
| caused the local loop equipment to lose its mind.
|
| This number-swapping story hit me the same way. We would
| all expect a compression bug to manifest as blurry text, or
| weird artifacts. We would never suspect a clean
| substitution of a meaningful symbol in what is "just a
| raster image".
| jend wrote:
| Reminds me of this story:
| http://blog.krisk.org/2013/02/packets-of-death.html
|
| tldr: Specific packet content triggers a bug in the
| firmware of an Intel network card and bricks it until
| powered off.
| SV_BubbleTime wrote:
| Iirc, this was an issue or conspiracy fuel or whatever with
| the birth certificate that Obama released. That some of the
| unique elements in the scan repeated over and over.
| caycep wrote:
| I assume something like jpeg (used in the DICOM standard
| today) has more eyes on the code than proprietary Xerox
| stuff? hopefully at least...
|
| I have seen weird artifacts on MRI scans, specifically the
| FLAIR image enhancement algorithm used on T2 images, i.e.
| white spots, which could in theory be interpreted by a
| radiologist as small strokes or MS... so I always take what I
| see with a grain of salt.
| ska wrote:
| The DICOM standard stuff did have a lot of eyes on it, and
| was tuned toward fidelity which helps. It's not perfect,
| but what is.
|
| MRI artifacts though are a whole can of worms, but
| fundamentally most of them come from a combination of the
| EM physics involved, and the reconstruction algorithm
| needed to produce an image from the frequency data.
|
| I'm not sure what you mean by "image enhancement
| algorithm"; FLAIR is a pulse sequence used to suppress
| certain fluid signals, typically used in spine and brain.
|
| Many of the bright spots you see in FLAIR are due to B1
| inhomogeneity, iirc (it's been a while though)
| ska wrote:
| Probably worth mentioning also that "used in DICOM
| standard" is true but possibly misleading to someone
| unfamiliar with it.
|
| DICOM is a vast standard. In its many crevasses, it
| contains wire and file encoding schemas, some of which
| include (many different) image data types, some of which
| allow (multiple) compression schemes, both lossy and
| lossless, as well as metadata schemes. These include JPEG,
| JPEG-LS, JPEG-2000, MPEG2/4, HEVC.
|
| I think you have to encode the compression ratio as well,
| if you do lossy compression. You definitely have to note
| that you did lossy compression.
| [deleted]
| vjeux wrote:
| How long does it take to compress and decompress an image that
| way?
| fzzt wrote:
| The prospect of the images getting "structurally" garbled in
| unpredictable ways would probably limit real-world applications:
| https://miro.medium.com/max/4800/1*RCG7lcPNGAUnpkeSsYGGbg.pn...
|
| There's something to be said about compression algorithms being
| predictable, deterministic, and only capable of introducing
| defects that stand out as compression artifacts.
|
| Plus, decoding performance and power consumption matters,
| especially on mobile devices (which also happens to be the setting
| where bandwidth gains are most meaningful).
| kevincox wrote:
| While that is kind of true it is also sort of the point.
|
| The optimal lossy compression algorithm would be based on
| humans as a target. It would remove details that we wouldn't
| notice to reduce the target size. If you show me a photo of a
| face in front of some grass the optimal solution would likely
| be to reproduce that face in high detail but replace the grass
| with "stock imagery".
|
| I guess it comes down to what is important. In the past
| algorithms were focused on visual perception, but maybe we are
| getting so good at convincingly removing unnecessary detail
| that we need to spend more time teaching the compressor what
| details are important. For example, if I know the person in the
| grass, preserving the face is important. If I don't know them
| then it could be replaced by a stock face as well. Maybe the
| optimal compression of a crowd of people is the 2 faces of
| people I know preserved accurately and the rest replaced with
| "stock" faces.
| anilakar wrote:
| Remember the Xerox scan-to-email scandal in which tiling
| compression was replacing numbers in structural drawings? We're
| talking about similar repercussions here.
| behnamoh wrote:
| This reminds me of a question I have about SD: why can't it do
| a simple OCR to know those are characters not random shapes?
| It's baffling that neither SD nor DE2 have any understanding of
| the content they produce.
| nl wrote:
| > why can't it do a simple OCR to know those are characters
| not random shapes?
|
| It's pretty easy to add this if you wanted to.
|
| But a better method would be to fine tune on a bunch of
| machine-generated images of words if you want your model to
| be good at generating characters. You'll need to consider
| which of the many Unicode character sets you want your model
| to specialize in though.
| Xcelerate wrote:
| You could certainly apply a "duct tape" solution like that,
| but the issue is that neural networks were developed to
| replace what were previously entire solutions built on a
| "duct tape" collection of rule-based approaches (see the
| early attempts at image recognition). So it would be nice to
| solve the problem in a more general way.
| montebicyclelo wrote:
| Just a note that Stable Diffusion is/can be deterministic (if
| you set an RNG seed).
| shrx wrote:
| I was told (on the Unstable Diffusion discord, so this info
| might not be reliable) that even with using the same seed the
| results will differ if the model is running on a different
| GPU. This was also my experience when I couldn't reproduce
| the results generated by the discord's SD txt2img generating
| bot.
| nl wrote:
| It absolutely should be reproducible, and in my experience
| it is.
|
| I do tend to use the HuggingFace version though.
| montebicyclelo wrote:
| I'm not sure about the different GPU issue. But if that is
| an issue, the model can be made deterministic (probably
| compromising inference speed), by making sure the
| calculations are computed deterministically.
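|
| A rough sketch of what that pinning-down looks like in PyTorch /
| diffusers (exact flags vary by version, the model id assumes the
| usual gated v1-4 repo, and bitwise equality across different
| GPUs still isn't guaranteed):
|
|     import torch
|     from diffusers import StableDiffusionPipeline
|
|     torch.use_deterministic_algorithms(True)  # fail on non-deterministic ops
|     torch.backends.cudnn.benchmark = False    # avoid autotuned kernels
|
|     pipe = StableDiffusionPipeline.from_pretrained(
|         "CompVis/stable-diffusion-v1-4")
|     gen = torch.Generator("cpu").manual_seed(1234)  # fixed sampler seed
|     image = pipe("a photo of a cat", generator=gen).images[0]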
| cma wrote:
| With compression you often make a prediction then delta off of
| it. A structurally garbled one could be discarded or just
| result in a worse baseline for the delta.
| bscphil wrote:
| A few thoughts that aren't related to each other.
|
| 1. This is a brilliant hack. Kudos.
|
| 2. It would be great to see the best codecs included in the
| comparison - AVIF and JPEG XL. Without those it's rather
| incomplete. No surprise that JPEG and WEBP totally fall apart at
| that bitrate.
|
| 3. A significant limitation of the approach seems to be that it
| targets extremely low bitrates where other codecs fall apart, but
| at these bitrates it incurs problems of its own (artifacts take
| the form of meaningful changes to the source image instead of
| blur or blocking, very high computational complexity for the
| decoder).
|
| When only moderate compression is needed, codecs like JPEG XL
| already achieve very good results. This proof of concept focuses
| on the extreme case, but I wonder what would happen if you
| targeted much higher bitrates, say 5x higher than used here. I
| suspect (but have no evidence) that JPEG XL would improve in
| fidelity _faster_ as you gave it more bits than this SD-based
| technique. _Transparent_ compression, where the eye can't tell a
| visual difference between source and transcode (at least without
| zooming in) is the optimal case for JPEG XL. I wonder what sort
| of bitrate you'd need to provide that kind of guarantee with this
| technique.
| leeoniya wrote:
| also thought it was odd that AVIF was not compared - it would
| show a major quality and size improvement over WebP.
| [deleted]
| goombacloud wrote:
| The comparison doesn't make much sense because for fair
| comparisons you have to measure decompressor size plus encoded
| image size. The decompressor here is super huge because it
| includes the whole AI model. Also, everyone needs to have the
| exact same copy of the model in the decompressor for it to work
| reliably.
| wongarsu wrote:
| Only if decompressor and image are transmitted over the same
| channel at the same time, and you only have a small number of
| images. When compressing images for the web I don't care if a
| webp decompressor is smaller than a jpg or png decompressor,
| because the recipient already has all of those.
|
| Of course stable diffusion's 4GB is much more extreme than
| Brotli's 120kb dictionary size, and would bloat a Browser's
| install size substantially. But for someone like Instagram or
| a Camera maker it could still make sense. Or imagine phones
| having the dictionary shipped in the OS to save just a couple
| kB on bad data connections.
| operator-name wrote:
| Even if dictionaries were shipped, the biggest difficulty
| would be performance and resources. Most of these models
| require beefy compute and a large amount of VRAM that isn't
| likely to ever exist on end devices.
|
| Unless that can be resolved it just doesn't make sense to
| use it as a (de)compressor.
| a-dub wrote:
| hm. would be interesting to see if any of the perceptual image
| compression quality metrics could be inserted into the vae step
| to improve quality and performance...
| fjkdlsjflkds wrote:
| This is not really "stable-diffusion based image compression",
| since it only uses the VAE part of "stable diffusion", and not
| the denoising UNet.
|
| Technically, this is simply "VAE-based image compression" (that
| uses stable diffusion v1.4's pretrained variational autoencoder)
| that takes the VAE representations and quantizes them.
|
| (Note: not saying this is not interesting or useful; just that
| it's not what it says on the label)
|
| Using the "denoising UNet" would make the method more
| computationally expensive, but probably even better (e.g., you
| can quantize the internal VAE representations more aggressively,
| since the denoising step might be able to recover the original
| data anyway).
| gliptic wrote:
| It is using the UNet, though.
| nl wrote:
| It does use the UNet to denoise the VAE compressed image:
|
| "The dithering of the palettized latents has introduced noise,
| which distorts the decoded result. But since Stable Diffusion
| is based on de-noising of latents, we can use the U-Net to
| remove the noise introduced by the dithering."
|
| The included Colab doesn't have line numbers, but you can see
| the code doing it:
|
|     # Use Stable Diffusion U-Net to de-noise the dithered latents
|     latents = denoise(latents)
|     denoised_img = to_img(latents)
|     display(denoised_img)
|     del latents
|     print('VAE decoding of de-noised dithered 8-bit latents')
|     print('size: {}b = {}kB'.format(sd_bytes, sd_bytes/1024.0))
|     print_metrics(gt_img, denoised_img)
| fjkdlsjflkds wrote:
| I stand corrected, then :) cheers.
| zcw100 wrote:
| You can do lossless neural compression too.
| fho wrote:
| > Quantizing the latents from floating point to 8-bit unsigned
| integers by scaling, clamping and then remapping them results in
| only very little visible reconstruction error.
|
| This might actually be interesting/important for the OpenVINO
| adaptation of SD ... from what I gathered from the OpenVINO
| documentation, quantizing is actually a big part of optimizing, as
| this allows the usage of Intel's new(-ish) NN instruction sets.
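|
| For reference, the scale/clamp/remap step described in the
| article looks roughly like this (the clamping bounds here are
| assumptions; the article picks its own):
|
|     import numpy as np
|
|     def quantize(latents, lo=-5.0, hi=5.0):
|         z = np.clip(latents, lo, hi)                  # clamp outliers
|         return np.round((z - lo) / (hi - lo) * 255).astype(np.uint8)
|
|     def dequantize(q, lo=-5.0, hi=5.0):
|         return q.astype(np.float32) / 255 * (hi - lo) + lo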
| stavros wrote:
| Didn't I do this last week?
| dwohnitmok wrote:
| Indeed one way of looking at intelligence is that it is a method
| of compressing the external universe.
|
| See e.g. the Hutter Prize.
| mjan22640 wrote:
| The feeling of understanding is essentially a decompression
| result being successfully pattern matched.
| dan_mctree wrote:
| Our sight is light detection compressed into human thought
|
| Written language is human thought compressed into words
|
| Digital images are light detection compressed into bits
|
| Text-to-image AIs compress digital images into written language
|
| Then how do the AI weights relate to human thought?
| Jack000 wrote:
| The vae used in stable diffusion is not ideal for compression. I
| think it would be better to use the vector-quantized variant (by
| the same authors of latent diffusion) instead of the KL variant,
| then store the indexes for each quantized vector using standard
| entropy coding algorithms.
|
| From the paper, the VQ variant also performs better overall; SD
| may have chosen the KL variant only to lower VRAM use.
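|
| A toy sketch of the VQ idea (map each latent vector to its
| nearest codebook entry and store only the integer index, which
| can then be entropy coded; codebook size and dimension here are
| made up):
|
|     import numpy as np
|
|     codebook = np.random.randn(8192, 4)      # (n_codes, latent_dim)
|
|     def vq_encode(latents):
|         # latents: (n_vectors, latent_dim) -> indices: (n_vectors,)
|         d = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
|         return d.argmin(axis=1).astype(np.uint16)
|
|     def vq_decode(indices):
|         return codebook[indices]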
| GaggiX wrote:
| KL models perform better than VQ models, as you can see in the
| latent diffusion repo by CompVis.
| Jack000 wrote:
| just checked the paper again and yes you're right, the KL
| version is better on the openimages dataset. The VQ version
| is better in the inpainting comparison.
|
| In this case you'd still want to use the VQ version though,
| it doesn't make sense to do an 8bit quantization on the KL
| vectors when there's an existing quantization learned through
| training.
| akvadrako wrote:
| I would like to see this with much smaller file sizes - like 100
| bytes. How well can SD preserve the core subjects or meaning of
| the photos?
| pishpash wrote:
| You can already "compress" them down to a few words, so you
| have your answer there.
| fla wrote:
| Is there a general name for this kind of latent space round-trip
| compression? If not, I think a good name could be "interpretive
| compression"
| pyinstallwoes wrote:
| This relates to a strong hunch that consciousness is tightly
| coupled to whatever compression is as an irreducible entity.
|
| Memory <> Compression <> Language <> Signal Strength <> Harmonics
| and Ratios
| mjan22640 wrote:
| Consciousness is IMHO being aware of being aware. The mystic
| specialty of it is IMHO a mental illusion, like the Penrose
| stairs optical illusion.
| eru wrote:
| I see the relation between compression and consciousness. But
| what do you mean by irreducible entity, and how does it relate
| to the two?
| pyinstallwoes wrote:
| By irreducible entity, as the yet undefined entity that sits
| at the nexus of mathematics, philosophy, computation, logic
| (consciousness).
|
| It's not a well defined ontology yet. So whatever it is, at
| its irreducible size pinpointing it as a thing in which gives
| rise to such other things.
| eru wrote:
| What kind of reductions would be disallowed?
| nl wrote:
| I don't understand much of what the OP is saying.
|
| But I do like the Stephen Wolfram idea of consciousness being
| the way a computationally bounded observer develops a
| coherent view of a branching universe.
|
| This is related to compression because it is a (lossy!)
| reduction in information.
|
| I understand that Wolfram is controversial, but the
| information-transmission-centric view of reality he works
| with makes a lot of intuitive sense to me.
|
| https://writings.stephenwolfram.com/2021/03/what-is-
| consciou...
| jwr wrote:
| While this is great as an experiment, before you jump into
| practical applications, it is worth remembering that the
| decompressor is roughly 5GB in size :-)
| red75prime wrote:
| It reminded me of a scene from "A Fire Upon the Deep" where
| connection bitrate is abysmal, but the video is crisp and
| realistic. It is used as a tool for deception, as it happens.
| Invisible information loss has its costs.
| Dwedit wrote:
| This is why for compression tests, they incorporate the size of
| everything needed to decompress the file. You can compress down
| to 4.97KB all you want, just include the 4GB trained model.
| janekm wrote:
| Is that true? I have never seen this done for any image
| compression comparisons that I have seen (i.e. only data that
| is specific to the image that is being compressed is included,
| not standard tables that are always used by the algorithm like
| the quantisation tables used in JPG compression)
| jerf wrote:
| Yes, it is done all the time.
|
| However, several people here are conflating "best compression
| as determined for a competition" and "best compression for
| use in the real world". There is an important relationship
| between them, absolutely, but in the real world we do not
| download custom decoders for every bit of compressed content.
| Just because there is a competition that quite correctly
| measures the entire size of the decompressor and encoded
| content does not mean that is now the only valid metric to
| measure decompression performance. The competitions use that
| metric for good and valid reasons, but those good and valid
| reasons are only vaguely correlated to the issues faced in
| the normal world.
|
| (Among the reasons why competitions must include the size of
| the decoder is that without that the answer is trivial; I
| define all your test inputs as a simple enumeration of them
| and my decoder hard-codes the output as the test values. This
| is trivially the optimal algorithm, making competition
| useless. If you could have a real-world encoder that worked
| this well, and had the storage to implement it, it would be
| optimal, but you can't possibly store all possible messages.
| For a humorous demonstration of this encoding method, see the
| classic joke: https://onemansblog.com/2010/05/18/prison-joke/
| )
| fsiefken wrote:
| For text compression benchmarks it's done
| http://mattmahoney.net/dc/text.html
|
| Matt doesn't do this on the Silesia corpus compression
| benchmark, even though it would make sense there as well:
| http://mattmahoney.net/dc/silesia.html
|
| So a compressor of a few gigabytes would make sense if you
| have a set of pictures of more than a few gigabytes.
| It's a bit similar to preprocessing text compression with a
| dictionary and adding the dictionary to the extractor to
| squeeze out a few more bytes.
| goombacloud wrote:
| By the way, the leading nncp in the LTCB (text.html) "is a
| free, experimental file compressor by Fabrice Bellard,
| released May 8, 2019" :)
| Gigachad wrote:
| Do you also include the library to render a jpeg? And maybe the
| whole OS required to display it on your screen?
|
| There are very many uses where any fixed overhead is
| meaningless. Imagine archiving billions of images for long term
| storage. The 4GB model quickly becomes meaningless.
| stavros wrote:
| > Do you also include the library to render a jpeg? And maybe
| the whole OS required to display it on your screen?
|
| No, what does that have to do with reconstructing the
| original data?
|
| If the fixed overhead works for you, that's fine, but
| including it is not meaningless.
| 112233 wrote:
| Fixed overheads are never meaningless. O(n^2) algorithm that
| processes your data in 5s is faster on your data than O(log
| n) that takes 20 hours.
|
| Long term storage of billions of images is meaningless, if it
| takes billions of years to archive these images.
| Gigachad wrote:
| It's a one time cost rather than per image. You need the
| 4GB model only once and then you can uncompress unlimited
| images.
| 112233 wrote:
| Yes, but each image needs access to this 4GB (actually, I
| have no idea how much RAM it takes up), plus whatever the
| working set size is. It is a non-trivial overhead that
| really limits throughput of your system, so you can
| process fewer images in parallel, so compressing billions
| of images in reasonable time may suddenly cost much more
| than the amount of storage it would save, compared to
| other methods.
| quickthrower2 wrote:
| If this were used in the wild, do you need a copy of the model
| locally to decompress the images?
| coffee_beqn wrote:
| And how much compute time/power does "decompressing" take
| compared to a jpg?
| mcbuilder wrote:
| Yes, but possibly not the entire model, hypothetically for
| instance some fine-tuning on compression and then distillation.
| Gigachad wrote:
| I can imagine some uses for this. Imagine having to archive a
| massive dataset where it's unlikely any individual image will
| be retrieved and where perfect accuracy isn't required.
|
| Could cut down storage costs a lot.
| kgeist wrote:
| I heard Stable Diffusion's model is just 4 GB. It's incredible
| that billions of images could be squeezed in just 4 GB. Sure it's
| lossy compression but still.
| eru wrote:
| In this regard, stable diffusion is not so much comparable to a
| corpus of jpeg images, but with the jpeg compression
| algorithms.
| akomtu wrote:
| I think it's easy to explain. If we split all those images into
| small 8x8 chunks, and put all the chunks into a fuzzy and a bit
| lossy hashtable, we'll see that many chunks are very similar
| and can be merged into one. To address this "space of 8x8
| chunks" we'll apply PCA to them, just like in jpeg, and use
| only the top most significant components of the PCA vectors.
|
| So in essence, this SD model is like an Alexandria library of
| visual elements, arranged on multidimensional shelves.
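|
| A toy version of that "PCA on 8x8 chunks" intuition, purely
| illustrative (JPEG actually uses the DCT, which approximates the
| PCA of natural image patches):
|
|     import numpy as np
|     from sklearn.decomposition import PCA
|
|     def extract_patches(img, size=8):
|         h, w = img.shape
|         patches = [img[i:i+size, j:j+size].ravel()
|                    for i in range(0, h - size + 1, size)
|                    for j in range(0, w - size + 1, size)]
|         return np.stack(patches)           # (n_patches, 64)
|
|     img = np.random.rand(256, 256)         # stand-in for a grayscale image
|     patches = extract_patches(img)
|     pca = PCA(n_components=8)              # keep 8 of 64 components
|     codes = pca.fit_transform(patches)     # the "compressed" representation
|     approx = pca.inverse_transform(codes)  # lossy patch reconstruction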
| nl wrote:
| I don't think that thinking of it as "compression" is useful,
| any more than an artist recreating the Mona Lisa from memory is
| "decompressing" it. The process that diffusion models use is
| fundamentally different to decompression.
|
| For example, if you prompt Stable Diffusion with "Mona Lisa"
| and look at the iterations, it is clearer what is happening -
| it's not decompressing so much as drawing something it knows
| looks like Mona Lisa and then iterating to make it look clearer
| and clearer.
|
| It clearly "knows" what the Mona Lisa looks like, but what is
| is doing isn't copying it - it's more like recreating a thing
| that looks like it.
|
| (And yes I realize lots of artist on Twitter are complaining
| that it is copying their work. I think "forgery" is a better
| analogy than "stealing" though - it can create art that looks
| like a Picasso or whatever, but it isn't copying it in a
| conventional sense)
| Gigachad wrote:
| Forgery requires some kind of deception/fraud. Painting an
| imitation of the Mona Lisa isn't forgery. Trying to sell it
| as if it is the original is.
| nl wrote:
| Yes I agree with this too.
|
| I think using that language is better than "stealing",
| because the immoral act is the passing off, not training of
| the model.
| ilaksh wrote:
| What if I just want something pretty similar but not necessarily
| the exact image. Maybe there could be a way to find a somewhat
| similar text prompt as a starting point, and then add in some
| compressed information to adjust the prompt output to be just a
| bit closer to the original?
| MarkusWandel wrote:
| The one with the different buildings in the reconstructed image
| is a bit spooky. I've always argued that human memory is highly
| compressed, storing, for older memories anyway, a "vibe" plus
| pointers to relevant experiences/details that can be used to
| flesh it out as needed. Details may be wrong in the
| recollecting/retelling, but the "feel" is right.
|
| And here we have computers doing the same thing! Reconstructing
| an image from a highly compressed memory and filling in
| appropriate, if not necessarily exact details. Human eye looks at
| it casually and yeah, that's it, that's how I remember it. Except
| that not all the details are right.
|
| Which is one of those "Whoa!" moments, like many many years ago,
| when I wrote a "Connect 4" implementation in BASIC on the
| Commodore 64, played it and lost! How did the machine get so
| smart all of a sudden?
| illubots wrote:
| In theory, it would be possible to benefit from the ability of
| Stable Diffusion to increase perceived image quality without even
| using a new compression format. We could just enhance existing
| JPG images in the browser.
|
| There already are client side algorithms that increase the
| quality of JPGs a lot. For some reason, they are not used in
| browsers yet.
|
| A Stable Diffusion based enhancement would probably be much nicer
| in most cases.
|
| There might be an interesting race to do client side image
| enhancements coming to the browsers over the next years.
| codeflo wrote:
| One interesting feature of ML-based image encoders is that it
| might be hard to evaluate them with standard benchmarks, because
| those are likely to be part of the training set, simply by virtue
| of being scraped from the web. How many copies of Lenna has
| Stable Diffusion been trained with? It's on so many websites.
| zxexz wrote:
| We might enter a time when every time a new model/compression
| algo is introduced, a new series of benchmark images may need
| to be introduced/taken and ALL historical benchmarks of major
| compression algos redone on the new images.
| seydor wrote:
| Is there something like this for live video chat?
| FrostKiwi wrote:
| I thought this was another take on this parody post:
| https://news.ycombinator.com/item?id=32671539
|
| But no, it's the real deal. Great job author.
| nl wrote:
| This but for video using the "infilling" version for changing
| parts between frames.
|
| The structural changes per frame matter much less. Send a 5kB
| image every keyframe then bytes per subsequent image with a
| sketch of the changes and where to mask them on the frame.
|
| Modern video codecs are pretty amazing though, so not sure how it
| would compare in frame size
| willbudd wrote:
| I've been thinking about more or less the same idea, but the
| computational edge inference costs probably make it
| impractical for most of today's client devices. I see a lot of
| potential in this direction in the near future though.
| nl wrote:
| I think it's unclear how much computational resources the
| uncompression steps take.
|
| At the moment it's fairly fast, but RAM hungry. But this
| article makes it clear that quantizing the representation
| works well (at least for the VAE). It's possible quantized
| models could also do decent jobs.
| swayvil wrote:
| This is the algorithmic equivalent of a metaphor.
| bane wrote:
| Goodness, I love this. It's a great description of the
| approach.
| criddell wrote:
| Before I clicked through to the article, I thought maybe they
| were taking an image and spitting out a prompt that would
| produce an image substantially similar to the original.
| sod wrote:
| This may give insights in how brain memory and thinking works.
|
| Imagine if some day a computer could take a snapshot of the
| weights and memory bits of the brain and then reconstruct
| memories and thoughts.
| epmaybe wrote:
| This kind of already fits a little bit with how the brain
| processes images where there is information lacking.
| Neurocognitive specialists can likely correct me on the
| following.
|
| Glaucoma is a disease where one slowly loses peripheral vision,
| until a small central island remains or you go completely
| blind.
|
| So do patients perceive black peripheral vision? Or blurred
| peripheral vision?
|
| Not really...patients actually make up the surrounding
| peripheral vision, sometimes with objects!
| SergeAx wrote:
| Does anybody understand from the article how much data needs to
| be downloaded first on the decompression side? The entire 2 GB
| array of SD weights, right?
| RosanaAnaDana wrote:
| Something interesting about the San Francisco test image is that
| if you start to look into the details, it's clear that some real
| changes have been made to the city. Rather than losing texture or
| grain or clarity, the information lost in this is information
| about the particular layout of a neighborhood of streets, which
| has now been replaced as if someone were drawing the scene from
| memory. A very different kind of loss that, without the original,
| might be imperceptible, because the information that was lost
| isn't replaced with random or systematic noise, but rather new,
| structured information.
| jhrmnn wrote:
| It's interesting that this is closer to how human memory
| operates--we're quite good at unconsciously fabricating false
| yet strong memories.
| laundermaf wrote:
| True, but I'd like to continue using products that produce
| close-to-real images. Phones nowadays already process images
| a lot. The moment they start replacing pixels it'll all be
| fake.
|
| And... Some manufacturer apparently already did it on their
| ultra zoom phones when taking photos of the moon.
| NavinF wrote:
| Meh. Cameras have been "replacing pixels" for as long as
| I've been alive. Consider that a 4K camera only has 2k*4k
| pixels whereas a 4K screen has 2k*4k*3 subpixels.
|
| 2/3 of the image is just dreamed up by the ISP (image
| signal processor) when it debayers the raw image.
|
| I'm not aware of any consumer hardware that has open source
| ISP firmware or claims to optimize for accuracy over
| beauty.
| montroser wrote:
| Okay, but a camera doing this is unlikely to dream up
| plausible features that didn't actually exist in the
| scene.
| NavinF wrote:
| Of course it is! Try feeding static into a modern ISP. It
| will find patterns that don't exist.
| taberiand wrote:
| I would've thought anyone relying on lossy-compressed images of
| any sort already needs to be aware of the potential effects, or
| otherwise isn't really concerned by the effect on the image
| (and I'd guess that the vast majority of use cases actually
| don't care if parts of the image are essentially "imaginary")
| aaaaaaaaaaab wrote:
| The good old JBIG2 debacle.
|
| "When used in lossy mode, JBIG2 compression can potentially
| alter text in a way that's not discernible as corruption. This
| is in contrast to some other algorithms, which simply degrade
| into a blur, making the compression artifacts obvious.[14]
| Since JBIG2 tries to match up similar-looking symbols, the
| numbers "6" and "8" may get replaced, for example.
|
| In 2013, various substitutions (including replacing "6" with
| "8") were reported to happen on many Xerox Workcentre
| photocopier and printer machines, where numbers printed on
| scanned (but not OCR-ed) documents could have potentially been
| altered. This has been demonstrated on construction blueprints
| and some tables of numbers; the potential impact of such
| substitution errors in documents such as medical prescriptions
| was briefly mentioned."
|
| https://en.m.wikipedia.org/wiki/JBIG2
| tlrobinson wrote:
| One thing that worries me about generative AI is the
| degradation of "truth" over time. AI will be the cheapest way
| to generate content, by far. It will sometimes get facts
| subtly wrong, and eventually that AI generated content will be
| used to train future models. Rinse and repeat.
| jacobr1 wrote:
| The interesting thing is that in some ways this is a return
| to the pre-modern era of lossy information transmission between
| the generations. Every story is re-molded by the re-teller.
| Languages change and thus the contextual interpretations.
| Even something as seemingly static as a book gets slowly
| modified as scribes rewrite scrolls over centuries.
| poszlem wrote:
| We are getting closer and closer to a simulacrum and
| hyperreality.
|
| We used to create things that were trying to simulate
| (reproduce) reality, but now we are using those "simulations"
| we'd created as if they were the real thing. With time we
| will be getting farther away from the "truth" (as you put
| it), and yes - I share your worry about that.
|
| https://en.wikipedia.org/wiki/Simulacrum
|
| EDIT: A good example I heard that explains what a simulacrum
| is was this: Ask a random person to draw a picture of a princess
| and see how many will draw a Disney princess (which already
| was based on real princesses) vs how many will draw one
| looking like Catherine of Aragon or another real princess.
| intrasight wrote:
| art is truth
| Xcelerate wrote:
| So you've described humans.
| _nalply wrote:
| Currently computers can reliably do maths. Later AI will
| unreliably do maths. Exactly like humans.
| pishpash wrote:
| So it will get stupider... maybe the singularity isn't
| bad like too smart but bad like dealing with too many
| stupid people.
| ballenf wrote:
| Maybe making (certain kinds of) math mistakes is a sign
| of intelligence.
| ciphol wrote:
| The nice thing about math is that often it's much harder
| to find a proof than to verify that proof. So math AI is
| allowed to make lots of dumb mistakes, we just want it to
| make the occasional real finding too.
| MauranKilom wrote:
| Unless we also ask AI to do the proof verification...
| rowanG077 wrote:
| Why would you do that? Proof verification is pretty much
| a solved problem.
| gpderetta wrote:
| Both stupider and less deterministic, but also
| smarter and more flexible. Like humans.
| tlrobinson wrote:
| Fair point, though I feel there's a difference as AI can
| generate content much more quickly.
| jefftk wrote:
| Similar to how we have low-background (pre-nuclear) steel,
| might we have pre-transformer content?
| Lorin wrote:
| Jpeg bitrot 2.0
| blacksmith_tb wrote:
| Certainly possible, though we also have many hundreds of
| millions of people walking the globe taking pictures of
| things with their phones (not all of which are public to be
| used for training, but still).
| fny wrote:
| I've started seeing more of this crap show up on the front
| page of Google.
| sharemywin wrote:
| Kind of like how chicken taste like everything.
| robbomacrae wrote:
| Yes indeed. I've been looking for an auto summarizer that
| reliably doesn't change the content. So far everything I've
| tried will make up or edit a key fact once in a while.
| z3c0 wrote:
| Anywhere that truth matters will be unaffected. If such
| deviations from truth can hold up, then the truth never
| mattered. False assumptions will never hold where they can't,
| because reality is quite pervasive. Ask anyone who's had to
| productionize an ML model in a setting that requires a foot
| in reality. Even a single-digit drop in accuracy can have
| resounding effects.
| thaumasiotes wrote:
| There was a scandal when it was discovered that Xerox machines
| were doing this; in that case, the example showed "photocopies"
| replacing numbers in documents with other numbers.
| smitec wrote:
| There is a talk about that issue [1].
|
| During my PhD this issue came up amongst those in the group
| looking into compressed sensing in MRI. Many reconstruction
| methods (AI being a modern variant) work well because a best
| guess is visually plausible. These kinds of methods fall
| apart when visually plausible and "true" are different in a
| meaningful way. The simplest examples here being the numbers
| in scanned documents, or in the MRI case, areas of the brain
| where "normal brain tissue" was on average more plausible
| than "tumor".
|
| [1]: http://www.dkriesel.com/en/blog/2013/0802_xerox-
| workcentres_...
| nl wrote:
| It's worth noting that these problems are things to be
| aware of, not the complete showstoppers some people seem to
| think that they are.
| thaumasiotes wrote:
| I'm having a hard time seeing where the random
| substitution of all numbers isn't supposed to be a
| complete showstopper.
| nl wrote:
| Well for example you train the VAE to reduce the
| compression on characters.
| kgwgk wrote:
| The right amount of compression in a photocopy machine is
| zero.
|
| Compression that gives you a blurred image is a trade-
| off.
|
| But what does it mean to "be aware of" compression that
| may give you a crisp image of some made up document?
| nl wrote:
| > The right amount of compression in a photocopy machine
| is zero.
|
| This isn't an obvious statement to me. If you've had the
| misfortune of scanning documents to PDF and getting the
| 100MB per page files automatically emailed to you then
| you might see the benefit in all that white space being
| compressed somehow.
|
| > But what does it mean to "be aware of" compression that
| may give you a crisp image of some made up document?
|
| This isn't something I said. A good compression system
| for documents will not change characters in any
| circumstances.
| rjmunro wrote:
| If you are making an image of a cityscape to illustrate
| an article it probably doesn't matter what the city looks
| like. But if the article is about the architecture of the
| specific city, it probably does, so you need to 'be
| aware' that the image you are showing people isn't
| correct, and reduce the compression.
| kgwgk wrote:
| This subthread was about changing numbers in scanned
| documents and vanishing tumors in medical images.
| rowanG077 wrote:
| A medical sensor filling in "plausible" information is
| not a show stopper? I hope you are never in control of
| making decisions like that.
| nl wrote:
| To be aware of when you are building compression systems.
|
| It's perfectly possible to build neural network based
| compression systems that do not output false information.
| lm28469 wrote:
| > not the complete showstoppers some people seem to think
| that they are.
|
| idk if I had to second guess every single result coming
| out of a machine it would be a showstopper for me. This
| isn't pokemon go, tumor detection is serious matter
| pishpash wrote:
| Why would you want to lossily compress any medical image
| is beyond me. You get equipment to make precise high-
| resolution measurements, it goes without saying that you
| do not want noise added to that.
| kybernetikos wrote:
| Yeah, if it were actually adopted as a way to do compression,
| it seems likely to lead to even worse problems than JBIG2 did
| https://news.ycombinator.com/item?id=6156238
|
| Invisibly changing the content rather than the image quality
| seems like a really concerning failure mode for image
| compression!
|
| I wonder if it'd be possible to use SD as part of a lossless
| system - use SD as something that tells us the likelihood of
| various pixel values given the rest of the image and combine
| that likelihood with a Huffman encoding. Either way, fantastic
| hack, but we really should avoid using anything lossy built on
| AI for image compression.
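|
| Sketch of that idea: if a model assigns a probability to each
| pixel value given its context, an entropy coder (arithmetic
| coding rather than plain Huffman) can store the true pixels
| losslessly in about sum(-log2 p) bits. The pixel_probs function
| below is hypothetical, standing in for whatever SD-derived
| predictive model you'd plug in:
|
|     import numpy as np
|
|     def ideal_code_length_bits(image, pixel_probs):
|         total = 0.0
|         for idx, value in np.ndenumerate(image):
|             p = pixel_probs(image, idx, value)  # P(value | context)
|             total += -np.log2(max(p, 1e-12))    # ideal bits for this pixel
|         return total
|
| A better model gives higher probabilities to the true values,
| hence fewer bits, and decoding stays exact because only the
| probability model is shared, never the pixels themselves.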
| pishpash wrote:
| Give it "enough" bits and it won't be a problem. How many is
| enough is the question.
| eloisius wrote:
| Imagine a world where bandwidth constraints meant
| transmitting a hidden compressed representation that gets
| expanded locally by smart TVs that have pretrained weights
| baked into the OS. Everyone sees a slightly different
| reconstitution of the same input video. Firmware updates that
| push new weights to your TV result in stochastic changes to a
| movie you've watched before.
| jacobr1 wrote:
| You could still use some kind of adaptive huffman coding.
| Current compression schemes have some kind of dictionary
| embedded in the file to map between the common strings and
| the compressed representation. Google tried proposing SDCH
| a few years using a common dictionary for wep pages. There
| isn't any reason why we can't be a bit more deterministic
| and share a much larger latent representation of "human
| visual comprehension" or whatever to do the same. It
| doesn't need to be stochastic once generated.
| kybernetikos wrote:
| "The weather forecast was correct as broadcast, sir, it's
| just your smart TV thought it was more likely that the
| weather in your region would be warm on that day, so it
| adjusted the symbol and temperature accordingly"
| ZiiS wrote:
| It opens up an interesting question: is it suggesting
| "improvements" that could be made in the real world?
| RosanaAnaDana wrote:
| Are you suggesting a lossy but 'correct' version?
|
| IE, the algorithm ignores and loses the 'irrelevant'
| information, but holds the important stuff?
| phkahler wrote:
| This needs to be compared with automated tests. A lack of
| visual artifacts doesn't mean an accurate representation of the
| image in this case.
| freediver wrote:
| Arguably this is still fine with the definition of lossy
| compression. The compressed image still roughly shows the idea
| of the original image.
| perryizgr8 wrote:
| I believe ML techniques are the future of video/image
| compression. When you read a well written novel, you can kind of
| construct images of characters, locations and scenes in your
| mind. You can even draw these scenes, and if you're a good
| artist, those won't have any artifacts.
|
| I don't expect future codecs to be able to reduce a movie to a
| simple text stream, but maybe it could do something in the same
| vein. Store abstract descriptions instead of bitmaps. If the
| encoding and decoding are good enough, your phone could
| reconstruct an image that closely resembles what the camera
| recorded. If your phone has to store a 50 GB model for that, it
| doesn't seem too bad, especially if the movie file could be
| measured in tens of megabytes.
|
| Or it could go in another direction, where file sizes remain in
| the gigabytes, but quality jumps to extremely crisp 8k that you
| can zoom into or move the camera around if you want.
|
| Can't wait for this stuff!
| UniverseHacker wrote:
| From the title, I expected this to be basically pairing stable
| diffusion with an image captioning algorithm by 'compressing' the
| image to a simple human readable description, and then
| regenerating a comparable image from the text. I imagine that
| would work and be possible, essentially an autoencoder with a
| 'latent space' of single short human readable sentences.
|
| The way this actually works is pretty impressive. I wonder if it
| could be made lossless or less lossy in a similar manner to FLAC
| and/or video compression algorithms... basically first do the
| compression, and then add on a correction that converts the
| result partially or completely into the true image. Essentially,
| e.g. encoding real images of the most egregiously modified
| regions of the photo and putting them back over the result.
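|
| A minimal sketch of that correction idea, with zlib standing in
| for a real lossless residual coder (FLAC-style: lossy predictor
| plus exact residual):
|
|     import numpy as np
|     import zlib
|
|     def encode_residual(original, reconstruction):
|         # Signed difference between the true image and the SD output.
|         residual = original.astype(np.int16) - reconstruction.astype(np.int16)
|         return zlib.compress(residual.tobytes())
|
|     def decode_residual(reconstruction, blob):
|         residual = np.frombuffer(zlib.decompress(blob), dtype=np.int16)
|         residual = residual.reshape(reconstruction.shape)
|         return (reconstruction.astype(np.int16) + residual).astype(np.uint8)
|
| If the reconstruction is close, the residual is mostly small
| values and compresses well; in the worst case you pay roughly
| the cost of storing the image conventionally.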
| Waterluvian wrote:
| I wonder if this technique could be called something like
| "abstraction" rather than "compression" given it will actually
| change information rather than its quality.
|
| Ie. "There's a neighbourhood here" is more of an abstraction than
| "here's this exact neighbourhood with the correct layout just
| fuzzy or noisy."
| seydor wrote:
| like a MIDI file
| Sohcahtoa82 wrote:
| Well, a MIDI file says nothing about the sound a Trumpet
| makes, whereas this SD-based abstraction does give a general
| idea of what your neighborhood should look like.
|
| Maybe it's more like a MOD file?
| rowanG077 wrote:
| I would say any compression is abstraction in a certain sense.
| A simple example is a gradient. A lossy compression might
| abstract over the precise pixel values and simply record a
| gradient that almost matches the raw input. You could even make
| the argument that lossless compression is abstraction. A 2D
| grid with 5px lines and 50px spacing between them could
| feasibly be captured really well using a classical compression
| scheme.
|
| What AI offers is just a more powerful and opaque way of doing
| the same thing.
| ipunchghosts wrote:
| What does Johannes Balle have to say about this?
___________________________________________________________________
(page generated 2022-09-20 23:00 UTC)