[HN Gopher] The dangers behind image resizing (2021)
___________________________________________________________________
The dangers behind image resizing (2021)
Author : qwertyforce
Score : 264 points
Date : 2023-02-16 10:11 UTC (12 hours ago)
(HTM) web link (zuru.tech)
(TXT) w3m dump (zuru.tech)
| singularity2001 wrote:
| funny that they use tf and pytorch in this context without even
| mentioning their fantastic upsampling capabilities
| thrdbndndn wrote:
| I'm shocked. I didn't even know this was a thing.
|
| By that I mean, I know what bilinear/bicubic/lanczos resizing
| algorithms are, and I know they should at least have acceptable
| results (compared to NN).
|
| But I didn't know famous libraries (especially OpenCV, which is a
| computer vision library!) could produce such poor results.
|
| Also, a side note: IIRC bilinear and bicubic have constants in
| their equations, so technically when you're comparing different
| implementations you need to make sure those parameters are the
| same. But that shouldn't excuse the extremely poor results in
| some of them.
| NohatCoder wrote:
| At least bilinear and bicubic have a widely agreed upon
| specific definition. The poor results are the result of that
| definition. They work reasonably for upscaling, but downscaling
| more than a trivial amount causes them to weigh a few input
| pixels highly and outright ignore most of the rest.
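|
| A quick way to see this (a rough sketch, assuming OpenCV and
| NumPy; exact numbers may vary by version): downscale an image of
| thin lines by 16x. OpenCV's bilinear has no prefilter, so it only
| ever looks at a 2x2 neighbourhood per output pixel, while the
| area filter averages everything under the output pixel.
|
| import numpy as np
| import cv2
|
| # 512x512 black image with a 1-pixel-wide white line every 16
| # columns.
| img = np.zeros((512, 512), np.uint8)
| img[:, ::16] = 255
|
| lin = cv2.resize(img, (32, 32), interpolation=cv2.INTER_LINEAR)
| area = cv2.resize(img, (32, 32), interpolation=cv2.INTER_AREA)
|
| print(lin.max())   # 0   - the thin lines vanish entirely
| print(area.max())  # ~16 - 1/16 of each block was white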
| leni536 wrote:
| > bicubic have a widely agreed upon specific definition
|
| Not so fast: https://entropymine.com/imageworsener/bicubic/
| pacaro wrote:
| I've seen more than one team find that reimplementing an OpenCV
| capability they use gains them both quality and performance.
|
| This isn't necessarily a criticism of OpenCV, often the OpenCV
| implementation is, of necessity, quite general, and a specific
| use-case can engage optimizations not available in the general
| case
| version_five wrote:
| I'd argue that if your ML model is sensitive to the anti-aliasing
| filter used in image resizing, you've got bigger problems than
| that. Unless it's actually making a visible change that spoils
| whatever it is the model is supposed to be looking for. To use the
| standard cat / dog example, the filter or resampling choice is
| not going to change what you've got a picture of, and if your
| model is classifying based on features that change with
| resampling, it's not trustworthy.
|
| If one is concerned about this, one could intentionally vary the
| resampling or deliberately add different blurring filters during
| training to make the model robust to these variations.
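|
| For instance, something like this as a training-time transform (a
| minimal sketch with Pillow; random_resize is just a made-up
| helper name):
|
| import random
| from PIL import Image, ImageFilter
|
| FILTERS = [Image.NEAREST, Image.BILINEAR, Image.BICUBIC,
|            Image.LANCZOS]
|
| def random_resize(img, size=(224, 224)):
|     # Occasionally blur a little, then resize with a randomly
|     # chosen filter, so the model can't latch onto one library's
|     # resampling artifacts.
|     if random.random() < 0.3:
|         radius = random.uniform(0.3, 1.5)
|         img = img.filter(ImageFilter.GaussianBlur(radius))
|     return img.resize(size, resample=random.choice(FILTERS))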
| derefr wrote:
| You say that "if your model is classifying based on features
| that change with resampling, it's not trustworthy."
|
| I say that choice of resampling algorithm is what determines
| whether a model can learn the rule "zebras can be recognized by
| their uniform-width stripes" or not; as a bad resample will
| result in non-uniform-width stripes (or, at sufficiently small
| scales, loss of stripes!)
| version_five wrote:
| > Unless it's actually making a visible change that spoils
| whatever it is the model is supposed to be looking for
| derefr wrote:
| A zebra having stripes that alternate between 5 black
| pixels, and 4 black pixels + 1 dark-grey pixel, isn't
| actually a visible change to the human eye. But it's
| visible to the model.
| alanbernstein wrote:
| I'm not saying your general argument is wrong, but...
| zebra stripes are not made out of pixels. A model that
| requires a photograph of a zebra to align with the
| camera's sensor grid also has bigger problems.
| hprotagonist wrote:
| > I'd argue that if your ML model is sensitive to the anti-
| aliasing filter used in image resizing, you've got bigger
| problems than that.
|
| I've seen it cause trouble in every model architecture I've
| tried.
| version_five wrote:
| What kinds of model architectures? I'm curious to play with
| it myself
| hprotagonist wrote:
| most object detection models will show variability in
| bounding box confidences and coordinates.
|
| it's not a huge instability, but you can absolutely see
| performance changes.
| [deleted]
| JackFr wrote:
| Hmmm. With respect to feeding an ML system, are visual glitches
| and artifacts important? Wouldn't the most important thing be to
| use a transformation which preserves as much information as
| possible and captures the relevant structure? If the intermediate
| picture doesn't look great, who cares, as long as the result is
| good.
|
| Ooops. Just thought about generative systems. Nevermind.
| brucethemoose2 wrote:
| Just speaking from experience, GAN upscalers pick up artifacts
| in the training dataset like a bloodhound.
|
| You can use this to your advantage by purposely introducing
| them into the lowres inputs so the model learns to remove them.
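|
| For example (a rough sketch with Pillow; the real degradation
| pipeline should mimic whatever your actual inputs look like):
| round-trip the low-res training inputs through low-quality JPEG
| so the upscaler learns to undo the blocking/ringing too.
|
| import io, random
| from PIL import Image
|
| def degrade(lowres_img):
|     # Re-encode at a random low JPEG quality and decode again.
|     buf = io.BytesIO()
|     lowres_img.save(buf, format="JPEG",
|                     quality=random.randint(30, 70))
|     buf.seek(0)
|     return Image.open(buf).convert("RGB")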
| AtNightWeCode wrote:
| Thought this article was going to be about DDOS...
| mythz wrote:
| What are some good image upscaler libraries? I'm assuming the
| high-quality ones would need to use some AI model to fill in
| missing detail.
| brucethemoose2 wrote:
| Depends on your needs!
|
| Zimg is a gold standard to me, but yeah, you can get better
| output depending on the nature of your content and hardware. I
| think ESRGAN is state-of-the-art above 2x scales, with the
| right community model from upscale.wiki, but it is slow and
| artifacty. And pixel art, for instance, may look better
| upscaled with xBRZ.
| soderfoo wrote:
| Waifu2x - I've used the library to upscale both old photos and
| videos with enough success to be pleased with the results.
|
| https://github.com/nagadomi/waifu2x
| thr0wnawaytod4y wrote:
| Came here for a new ImageTragick but got actual resizing problems
| dark-star wrote:
| downscaling images introduces artifacts and throws away
| information! news at 5!
| intrasight wrote:
| I was sort of expecting them to describe this danger of resizing:
| one can feed a piece of an image into one of these new massive ML
| models and get back the full image - with things that you didn't
| want to share. Like the ex I cropped out.
|
| Is ML sort of like a universal hologram in that respect?
| fIREpOK wrote:
| I favored cropping even back in 2021
| godshatter wrote:
| If their worry is the differences between algorithms in libraries
| across different execution environments, shouldn't they either
| find a library they like that can be called from all such
| environments or, if no single library can be used in all
| environments, just write their own using their favorite
| algorithm? Why make all libraries do this the same way? Which one
| is undeniably correct?
| TechBro8615 wrote:
| That's basically what they did, which they mention in the last
| paragraph of the article. They released a wrapper library [0]
| for Pillow so that it can be called from C++:
|
| > Since we noticed that the most correct behavior is given by
| the Pillow resize and we are interested in deploying our
| applications in C++, it could be useful to use it in C++. The
| Pillow image processing algorithms are almost all written in C,
| but they cannot be directly used because they are designed to
| be part of the Python wrapper. We, therefore, released a
| porting of the resize method in a new standalone library that
| works on cv::Mat so it would be compatible with all OpenCV
| algorithms. You can find the library here: pillow-resize.
|
| [0] https://github.com/zurutech/pillow-resize
| est wrote:
| Are there any hacks/studies on maximizing downsampling error?
|
| E.g. an image that looks totally different at its original size
| vs at 224x224
| version_five wrote:
| There is a "resizing attack" that's been published that does
| what you're suggesting
|
| https://embracethered.com/blog/posts/2020/husky-ai-image-res...
| account42 wrote:
| > The definition of scaling function is mathematical and should
| never be a function of the library being used.
|
| Horseshit. Image resizing or any other kind of resampling is
| essentially always about filling in missing information. There is
| no mathematical model that will tell you for certain what the
| missing information is.
| [deleted]
| planede wrote:
| Arguably downscaling does not fill in missing information, it
| only throws away information. Still, implementations vary a lot
| here. There might not be a consensus on a unique correct way to
| do downscaling, but there are certain things you definitely don't
| want to do, like doing naive linear arithmetic on sRGB color
| values.
| HPsquared wrote:
| Interpolation is still filling in missing information, it's
| just possible to get a pretty good estimate.
| willis936 wrote:
| This is wrong. Interpolation below Nyquist (downsampling)
| results in a subset of the original Information (capital I
| information theory information).
| astrange wrote:
| Images aren't bandlimited so the conditions don't apply
| for that.
|
| That's why a vector image rendered at 128x128 can look
| better/sharper than one rendered at 256x256 and scaled
| down.
| willis936 wrote:
| They are band-limited. That's why you get aliasing when you
| photograph above-Nyquist detail without an AA filter.
|
| In your example the lower res image would be using most
| of its bandwidth while the higher res image would be
| using almost none of its bandwidth.
|
| Images are 2D discrete signals. Everything you know about
| 1D DSP applies to them.
| astrange wrote:
| If some of the edges are infinitely sharp, and you know
| which ones they are by looking at them, as in my example,
| then it's using more than all its bandwidth at any
| resolution.
| orlp wrote:
| One interesting complication for a lot of photos is that
| the bandwidth of the green channel is twice as high as
| the red and blue channels due to the Bayer filter mosaic.
| Gordonjcp wrote:
| Aha, no! Downscaling *into a discrete space by an arbitrary
| amount* is absolutely filling in missing information.
|
| Take the naive case where you downscale a line of four pixels
| to two pixels - you can simply discard two of them so you go
| from `0,1,2,3` to `0,2`. It looks okay.
|
| But what happens if you want to scale four pixels to three?
| You could simply throw one away but then things will look
| wobbly and lumpy. So you need to take your four pixels, and
| fill in a missing value that lands slap bang between 1 and 2.
| Worse, you actually need to treat 0 and 3 as missing values
| too because they will be somewhat affected by spreading them
| into the middle pixel.
|
| So yes, downscaling does have to compute missing values even
| in your naive linear interpolation!
| meindnoch wrote:
| >Take the naive case where you downscale a line of four
| pixels to two pixels - you can simply discard two of them
| so you go from `0,1,2,3` to `0,2`. It looks okay.
|
| This is already wrong, unless the pixels are band-limited
| to Nyquist/4. Trivial example where this is not true:
| 1 0 1 0
|
| If such a signal is decimated by 2 you get
| 1 1
|
| Which is not correct.
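|
| The same thing in a couple of lines (a sketch with NumPy): plain
| decimation keeps whatever samples it happens to land on, while
| averaging first gives the value the smaller signal should carry.
|
| import numpy as np
|
| x = np.array([1.0, 0.0, 1.0, 0.0])
|
| naive = x[::2]                    # [1. 1.]   - aliased
| boxed = x.reshape(-1, 2).mean(1)  # [0.5 0.5] - filtered
| print(naive, boxed)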
| im3w1l wrote:
| For downscaling, area averaging is simple and makes a lot
| of intuitive sense and gives good results. To me it's
| basically the definition of downscaling.
|
| Like yeah, you can try to get clever and preserve the
| artistic intent with something like seam carving, but then
| I wouldn't call it downscaling anymore.
| planede wrote:
| I suggest reading up on this:
|
| https://entropymine.com/imageworsener/pixelmixing/
| im3w1l wrote:
| Hmm, maybe I was wrong then!
| [deleted]
| actionfromafar wrote:
| The article talks about downsampling, not upsampling, just so
| we are clear about that.
|
| And besides, a ranty blog post pointing out a pitfall can still
| be useful for someone else coming from the same naive (in a
| good/neutral way) place as the author.
| mytailorisrich wrote:
| Not at all. He is correct that those functions are defined
| mathematically and that the results _should_ therefore be the
| same using any libraries which claim to implement them.
|
| An example used in the article:
| https://en.wikipedia.org/wiki/Lanczos_resampling
| jcynix wrote:
| Now that's an interesting topic for photographers who like to
| experiment with anamorphic lenses for panoramas.
|
| An anamorphic lens (optically) "squeezes" the image onto the
| sensor, and afterwards the digital image has to be "desqueezed"
| (i.e. upscaled in one axis) to give you the "final" image. Which
| in turn is downscaled to be viewed on either a monitor or a
| printout.
|
| But the resulting images I've seen so far nevertheless look
| good. I think that's because natural images don't have that many
| pixel-level details, and we mostly see downscaled images on the
| web or in youtube videos anyway ...
| biscuits1 wrote:
| This article throws a red flag on proving negative(s). This is
| impossible with maths. The void is filled by human subjectivity.
| In a graphical sense, "visual taste."
| IYasha wrote:
| So, what are the dangers? (What's the point of the article?) That
| you'll get a different model with the same originals processed by
| different algorithms?
|
| Comparing resizing algorithms is not something new, the
| importance of adequate input data is obvious, and the varying
| availability of image processing algorithms is also
| understandable. Clickbaity.
| azubinski wrote:
| A friend of mine decided to take up image resizing on the third
| lane of a six-lane highway.
|
| And he was hit by a truck.
|
| So it's true about the danger of image resizing.
| TechBro8615 wrote:
| If you read to the end, they link to a library they made to
| solve the problem by wrapping the Pillow C functions so they
| are callable from C++.
| erulabs wrote:
| Image resizing is one of those things that most companies seem to
| build in-house over and over. There are several hosted services,
| but obviously sending your users' photos to a 3rd party is pretty
| weak. For those of us looking for a middle-ground: I've had great
| success with imgproxy (https://github.com/imgproxy/imgproxy)
| which wraps libvips and is well maintained.
| planede wrote:
| Problems with image resizing are a much deeper rabbit hole than
| this. Some important talking points:
|
| 1. The form of interpolation (this article).
|
| 2. The colorspace used for doing the arithmetic for
| interpolation. You most likely want a linear colorspace here (a
| sketch of what that looks like is at the end of this comment).
|
| 3. Clipping. Resizing is typically done in two passes, first
| resizing in x and then in y, not necessarily in this order. If
| the kernel used has values outside of the range [0, 1] (like
| Lanczos) and intermediate results are clamped to [0, 1], then you
| might get clipping in the intermediate image, which can cause
| artifacts.
|
| 4. Quantization and dithering.
|
| 5. If you have an alpha channel, using pre-multiplied alpha for
| interpolation arithmetic.
|
| I'm not trying to be exhaustive here. ImageWorsener's page has a
| nice reading list[1].
|
| [1] https://entropymine.com/imageworsener/
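|
| For point 2, a rough sketch of what "do the arithmetic in linear
| light" means (assuming an 8-bit sRGB input, NumPy and OpenCV; a
| real implementation would also care about points 3-5):
|
| import numpy as np
| import cv2
|
| def srgb_to_linear(c):
|     return np.where(c <= 0.04045, c / 12.92,
|                     ((c + 0.055) / 1.055) ** 2.4)
|
| def linear_to_srgb(c):
|     return np.where(c <= 0.0031308, c * 12.92,
|                     1.055 * c ** (1 / 2.4) - 0.055)
|
| def resize_linear_light(img_u8, size):
|     # size is (width, height), as cv2.resize expects.
|     srgb = img_u8.astype(np.float32) / 255.0
|     lin = srgb_to_linear(srgb)
|     lin = cv2.resize(lin, size, interpolation=cv2.INTER_AREA)
|     out = linear_to_srgb(np.clip(lin, 0.0, 1.0))
|     return np.round(out * 255.0).astype(np.uint8)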
| PaulHoule wrote:
| I've definitely learned a lot about these problems from the
| viewpoint of art and graphic design. When using Pillow I
| convert to linear light with high dynamic range and work in
| that space.
|
| One pet peeve of mine is algorithms for making thumbnails. Most
| of the algorithms from the image processing books don't really
| apply, as they are usually trying to interpolate between points
| based on a small neighborhood, whereas if you are downscaling by
| a large factor (say 10) the obvious thing to do is sample the
| pixels in the input image that intersect with the pixel in the
| output image (100 of them in that case).
|
| That box averaging is a pretty expensive convolution, so most
| libraries downscale images by powers of 2 and then interpolate
| from the closest such image, which I think is not quite right
| and could be done better.
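|
| For integer factors that "average everything under the output
| pixel" idea is a one-liner (a sketch with NumPy, greyscale,
| dimensions assumed divisible by the factor):
|
| import numpy as np
|
| def box_downscale(img, k):
|     # Average each k x k block of input pixels into one output
|     # pixel.
|     h, w = img.shape
|     return img.reshape(h // k, k, w // k, k).mean(axis=(1, 3))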
| starkd wrote:
| Isn't this generally addressed by applying a gaussian blur
| before downsizing? I know this introduces an extra processing
| step, but I always figured this was necessary.
| PaulHoule wrote:
| That's an even more expensive convolution, since you're
| going to average 100 or so points for each of those 100
| points!
|
| In practice people already think that box averaging is too
| expensive (it is pretty much that Gaussian blur, just
| computed on fewer output points).
| puterich123 wrote:
| I played a little with FFT Gaussian blur. It uses the
| frequency domain, and so does not have to average
| hundreds of points, but rather transforms the image and
| the blur kernel into the frequency domain. There it
| performs a pointwise multiplication and transforms the
| image back. It's way faster than the direct convolution.
| eutectic wrote:
| Box filtering should be pretty cheap; it is separable,
| and can be implemented with a moving average. Overall
| just a handful of operations per pixel.
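|
| Something along these lines (a sketch with NumPy; "valid"
| output only, so the result is k-1 samples shorter than the
| input):
|
| import numpy as np
|
| def moving_average_1d(x, k):
|     # Box filter via a running sum: each output costs the
|     # same regardless of the window size k.
|     c = np.concatenate(([0.0],
|                         np.cumsum(x, dtype=np.float64)))
|     return (c[k:] - c[:-k]) / k
|
| # A separable 2D box filter is this applied along rows,
| # then along columns.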
| deadbeeves wrote:
| Having to process 100 source pixels per destination pixel
| to shrink 10x seems like an inefficient implementation.
| If you downsample each dimension individually you only
| need to process 20 pixels per pixel. This is the same
| optimization used for Gaussian blur.
| nyanpasu64 wrote:
| If you downscale by a factor of 2 using bandlimited
| resampling every time, followed by a single final shrink,
| you'll theoretically get identical results to a single
| bandlimited shrinking operation. Of course real world image
| resampling kernels (Lanczos, cubic, magic kernel) are very
| much truncated compared to the actual sinc kernel (to avoid
| massive ringing which looks unacceptable in images), so the
| results won't be mathematically perfect. And linear/area-
| based resampling is even less mathematically optimal,
| although it doesn't cause overshoot.
| bombcar wrote:
| Captain D on premultiplication and the alpha channel (with
| regards to video): https://www.youtube.com/watch?v=XobSAXZaKJ8
| SuchAnonMuchWow wrote:
| A connected rabbit hole is image decoding of lossy formats such
| as JPEG: in my experience, depending on the library used
| (OpenCV vs TensorFlow vs Pillow), you get RGB values that vary
| by 1-2% from each other with the default decoders.
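|
| Easy to check for yourself (a sketch; assumes OpenCV and Pillow
| are installed, and remember OpenCV returns BGR so the channels
| need flipping before comparing):
|
| import numpy as np
| import cv2
| from PIL import Image
|
| path = "photo.jpg"  # any JPEG at hand
| a = cv2.imread(path)[:, :, ::-1]  # BGR -> RGB
| b = np.asarray(Image.open(path).convert("RGB"))
|
| d = np.abs(a.astype(np.int16) - b.astype(np.int16))
| print(d.mean(), d.max())  # typically nonzero: different IDCT
|                           # and chroma upsampling in the decoders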
| BlueTemplar wrote:
| And also (for humans at least) the rabbit hole of actually
| displaying the resulting image: various forms of subpixel
| rendering for screens, various forms of printing... which are
| likely to have a big influence on what is "acceptable quality"
| or not.
| ChrisMarshallNY wrote:
| _> 3. Clipping. Resizing is typically done in two passes, first
| resizing in x and then in y, not necessarily in this order. If
| the kernel used has values outside of the range [0, 1] (like
| Lanczos) and intermediate results are clamped to [0, 1], then you
| might get clipping in the intermediate image, which can cause
| artifacts._
|
| Also, gamut clipping and interpolation[0]. That's a real
| rabbit hole.
|
| [0] https://www.cis.rit.edu/people/faculty/montag/PDFs/057.PDF
| _(Downloads a PDF)_
| guruparan18 wrote:
| Another thing I experienced: a document picture I had downsized
| to a mandatory upload size ended up with a character/number
| randomly changed (a 6 became a b or d, I don't remember which
| exactly). I had to convert the document to a PDF, which handled
| it better.
| abainbridge wrote:
| I'd also add speed to that list. Resizing is an expensive
| operation. Correctness is often traded off for speed. I've
| written code that deliberately ignored the conversion to a
| linear color space and back in order to gain speed.
| phkahler wrote:
| Yeah I was shocked at how naive this quote is:
|
| >> The definition of scaling function is mathematical and
| should never be a function of the library being used.
|
| I could just as easily say "hey, why is your NN affected by
| image artifacts, isn't it supposed to be robust?"
| peepee1982 wrote:
| Wouldn't the clipping be solved by using floating point numbers
| during the filtering process?
| planede wrote:
| It would. It would also avoid accumulating quantization errors
| from the intermediate result. Having said that, there is
| precedent for keeping the intermediate image pixels as integer
| values.
|
| Here is imageworsener's article about this[1]
|
| [1] https://entropymine.com/imageworsener/clamp-int/
| peepee1982 wrote:
| I love sites like these. Had never heard of Image Worsener
| before. Thanks!
| contravariant wrote:
| If you're doing interpolation you probably don't want a linear
| colourspace. At least not linear in the way that light works.
| Interpolation minimizes deviations in the colourspace you're
| in, so you want it to be somewhat perceptual to get it right.
|
| Of course if you're not interpolating but _downscaling_ the
| image (which isn't really an interpolation, the value at a
| particular position in the image does not remain the same) then
| you do want a linear colourspace to avoid brightening /
| darkening details, but you need a perceptual colourspace to
| minimize ringing etc. It's an interesting puzzle.
| actionfromafar wrote:
| Wow, points 2, 3 and 5 wouldn't have occurred to me even if I
| tried. Thanks. I now have a mental note to look stuff up if my
| resizing ever gives results I'm not happy with. :)
| planede wrote:
| Point 2 is the most important one, and getting it wrong is the
| most egregious error. Even most browsers implement it wrong (at
| least as of the last time I checked; I confirmed it again with
| Edge).
|
| Here is the most popular article about this problem [1].
|
| Warning: once you start noticing incorrect color blending
| done in sRGB space, then you will see it everywhere.
|
| [1] http://www.ericbrasseur.org/gamma.html
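|
| The classic symptom is easy to reproduce (a sketch): blending
| 50% black with 50% white on the raw 8-bit sRGB codes gives
| #808080, while blending the actual light and re-encoding gives
| roughly #BCBCBC.
|
| def srgb_to_linear(c):
|     if c <= 0.04045:
|         return c / 12.92
|     return ((c + 0.055) / 1.055) ** 2.4
|
| def linear_to_srgb(c):
|     if c <= 0.0031308:
|         return c * 12.92
|     return 1.055 * c ** (1 / 2.4) - 0.055
|
| naive = (0 + 255) / 2                 # 127.5 -> #808080
| mix = (srgb_to_linear(0.0) + srgb_to_linear(1.0)) / 2
| correct = 255 * linear_to_srgb(mix)
| print(round(naive), round(correct))   # 128 188 (#80 vs #BC)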
| zokier wrote:
| I imagine that beyond just using linearized sRGB, using a
| perceptually uniform colorspace such as Oklab would bring
| further improvement. Although I suppose the effect might be
| somewhat subtle in most real-world images.
| planede wrote:
| For downscaling, I doubt that. If you literally squint or
| unfocus your eyes, then the colors you see will be mixed
| in a linear colorspace. It makes sense for downscaling to
| follow that.
|
| Upscaling is much more difficult.
| bsenftner wrote:
| When image generating AIs first appeared, the color space
| interpolations were terribly wrong. One could see hue
| rainbows practically anywhere blending occurred.
| thrdbndndn wrote:
| Another classic:
| https://www.youtube.com/watch?v=LKnqECcg6Gw
| londons_explore wrote:
| Browsers now 'deliberately' do it wrong, because web
| developers have come to rely on the fact that a 50/50 blend
| of #000000 and #FFFFFF is #808080
| unconed wrote:
| Linear RGB blending also requires >8 bit per channel for
| the result to avoid noticeable banding.
|
| It is unquestionably superior though.
| planede wrote:
| I'm a little bit sympathetic to doing it wrong on
| gradients (having said that, the SVG spec has an opt-in to
| do the interpolation in a linear colorspace, and browsers
| don't implement it). But not for images.
| account42 wrote:
| Browsers (and other tools) can't even agree on the color
| space for some images, e.g. "Portable" Network Graphics.
| [deleted]
| brucethemoose2 wrote:
| For those going down this rabbit hole, perceptual downscaling is
| state of the art, and the closest thing we have to a Python
| implementation is here (with a citation of the original paper):
| https://github.com/WolframRhodium/muvsfunc/blob/master/muvsf...
|
| Other supposedly better CUDA/ML filters give me strange results.
| anotheryou wrote:
| Hm, any examples of that?
|
| I found https://dl.acm.org/doi/10.1145/2766891 but I don't like
| the comparisons. Any designer will tell you, after down-scaling
| you do a minimal sharpening pass. The "perceptual downscaling"
| looks slightly over-sharpened to me.
|
| I'd love to compare something I sharpened in photoshop with
| these results.
| brucethemoose2 wrote:
| That implementation is pretty easy to run! The whole Python
| block (along with some imports) is something like:
|
| # assumes the imwri plugin and muvsfunc are installed
| import vapoursynth as vs
| import muvsfunc as muf
| core = vs.core
|
| clip = core.imwri.Read(img)
| clip = muf.ssim_downscale(clip, x, y)
| clip = core.imwri.Write(clip, imgoutput)
| clip.set_output()
|
| > Any designer will tell you, after down-scaling you do a
| minimal sharpening pass
|
| This is probably wisdom from bicubic scaling, but you usually
| don't need further sharpening if you use a "sharp" filter like
| Mitchell.
|
| Anyway I haven't run butteraugli or ssim metrics vs other
| scalers, I just subjectively observed that ssim_downscale was
| preserving some edges in video frames that Spline36,
| Mitchell, and Bicubic were not preserving.
| thrdbndndn wrote:
| There are so many gems in the VapourSynth scene.
|
| I really wish there were some better general-purpose imaging
| libraries that steadily implement/copy these useful filters, so
| that more people could use them out of the box.
|
| Most of the languages I've worked with are surprisingly lacking
| in this regard, despite the huge potential use cases.
|
| Like, in the case of Python, Pillow is fine but it has nothing
| fancy. You can't even fine-tune the parameters of bicubic, let
| alone use the billions of new algorithms from video communities.
|
| OpenCV and ML tools like to reinvent the wheels themselves, but
| often only the most basic ones (and badly, as noted in this
| article).
| brucethemoose2 wrote:
| VapourSynth is great for ML stuff actually, as it can
| ingest/output numpy arrays or PNGs, and work with native
| FP32.
|
| A big sticking point is variable resolution, which it
| technically supports but doesn't really like without some
| workarounds.
|
| But yeah I agree, it's kinda tragic that the ML community is
| stuck with the simpler stuff.
| WithinReason wrote:
| torch.nn.functional.interpolate has an "antialias" switch that's
| off by default
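|
| i.e. something like this (a sketch; the flag needs a fairly
| recent PyTorch, see the replies below):
|
| import torch
| import torch.nn.functional as F
|
| x = torch.rand(1, 3, 512, 512)
|
| y_plain = F.interpolate(x, size=(224, 224), mode="bilinear",
|                         align_corners=False)
| y_aa = F.interpolate(x, size=(224, 224), mode="bilinear",
|                      align_corners=False, antialias=True)
|
| print((y_plain - y_aa).abs().max())  # nonzero: the effective
|                                      # filter support changed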
| qwertyforce wrote:
| It seems it was introduced after 1.9.0
|
| https://pytorch.org/docs/1.9.0/generated/torch.nn.functional...
| WithinReason wrote:
| You're right, looks like it was added with 1.11 on March 10,
| 2022. Seems like an important feature to have missed for so long!
| ricardobeat wrote:
| Was hoping to see libvips in the comparison, which is widely
| used.
|
| I wonder why it's not adopted by any of these frameworks?
| hgomersall wrote:
| The bigger problem is that the pixel domain is not a very good
| domain to be operating in. How many hours of training and
| thousands of images are used essentially to learn Gabor
| filters?
| cynicalsecurity wrote:
| Finally someone said it.
___________________________________________________________________
(page generated 2023-02-16 23:01 UTC)