[HN Gopher] DifuzCam: Replacing Camera Lens with a Mask and a Di...
___________________________________________________________________
DifuzCam: Replacing Camera Lens with a Mask and a Diffusion Model
Author : dataminer
Score : 108 points
Date : 2024-08-17 16:50 UTC (1 day ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| valine wrote:
| I get the feeling that lens-free cameras are the future.
| Obviously the results here are nowhere near good enough, but
| given the rapid improvement of diffusion models lately, the
| trajectory seems clear.
|
| Would love to lose the camera bump on the back of my phone.
| xandrius wrote:
| So you just get something based off GPS, time of day and
| rotation?
|
| Or no photos anymore?
| valine wrote:
| Not sure I understand the question. This paper is about using
| diffusion models to reconstruct usable images from raw sensor
| data. The diffusion model in essence replaces the lens.
| exe34 wrote:
| There would be a bunch of holes and a CCD at the back, just
| no bulbous lens growths.
| smallerize wrote:
| https://www.standard.co.uk/news/tech/ai-camera-images-paragr...
| valine wrote:
| Not sure what this has to do with anything. The paper I was
| commenting on is using diffusion models to parse raw light
| hitting the sensor as an alternative to a glass lens. No one
| wants an image generation model hooked up to weather data--
| that's kind of ridiculous.
| a1o wrote:
| Oh god, we are going to make lenses a premium feature now, aren't
| we?
| chpatrick wrote:
| It would be pretty great if cheap phones can get good cameras
| with this technology.
| 082349872349872 wrote:
| If/when cheap enough, even non-phone devices (POS terminals,
| vending machines, etc.) will have cameras; will living in a
| camera-free environment become the premium feature?
| chpatrick wrote:
| Cameras are already cheap enough to put in everything.
| fpoling wrote:
| There is no camera. It is just a diffusion model trained on a
| big dataset that tries to reconstruct the picture. Essentially,
| this is not much different from what Samsung did with their
| AI-enhanced camera that detected the moon and replaced it
| with a high-resolution picture.
| holoduke wrote:
| And yet, we will probably get camera software on all our
| phones with unlimited zoom and detail, turning grain into
| crisp, clear pictures. Inpainting, outpainting, etc. Five
| years from now everybody will use it. Everything becomes fake.
| chpatrick wrote:
| The pictures in the paper are pretty damn close, and this
| is just a prototype. Plus, as you said, phones already have
| AI filters.
| wl wrote:
| The text on the Thorlabs "Lab Snacks" box is giant and
| still unreadable, with the model interpolating total junk. It
| seems like there's nowhere near enough signal there.
| teruakohatu wrote:
| It's quite amazing that, by using a diffuser rather than a lens
| and then a diffusion model, an image can be reconstructed so well.
|
| The downside of this is that it heavily relies on the model to
| construct the image. Much like those colorisation models applied
| to old monochrome photos, the results will probably always look a
| little off based on the training data. I could imagine taking a
| photo of some weird art installation and the camera getting
| confused.
|
| You can see examples of this when the model invented fabric
| texture on the fabric examples and converted solar panels to
| walls.
| nine_k wrote:
| The model basically guesses and reinvents what these diffuse
| pixels might be. It's more like a painter producing a picture
| from memory.
|
| It inevitably means that the "camera" visually parses the scene
| and then synthesizes its picture. The intermediate step is a
| great moment to semantically edit the scene. Recolor and
| retexture things. Remove some elements of the scene, or even
| add some. Use different rendering styles.
|
| Imagine pointing such a "camera" at a person standing next to a
| wall, and getting a picture of the person with their skin
| lacking any wrinkles, their clothes looking more lustrous, as if
| they were silk, not cotton, and the graffiti removed from the wall
| behind.
|
| Or making a "photo" that turns a group of kids into a group of
| cartoon superheroes, while retaining their recognizable faces
| and postures.
|
| (Of course, photo evidence made with digital cameras should
| long have been inadmissible in courts, but this would hasten
| the transition.)
| neom wrote:
| Kinda reminds me of this a bit:
| https://arstechnica.com/gadgets/2020/11/nvidia-used-neural-n...
| wizzwizz4 wrote:
| > _photo evidence made with digital cameras should long have
| been inadmissible in courts_
|
| Sworn testimony is admissible in courts. I think the "you can
| just make evidence up" threshold was passed a few thousand
| years ago. The courts still, mostly, work.
| xg15 wrote:
| Yes. One solution to the problem of false testimony was
| photo evidence...
|
| But I'm less worried about the courts and more about media
| that might publish photos without realizing they are
| AI-generated - or ordinary people using those cameras without
| understanding how they work and then not realizing there
| may be some details in the pictures that are plain fantasy.
| wizzwizz4 wrote:
| Most people I know would understand "the camera comes
| with a built-in filter" to mean "what the camera
| photographs isn't what you'd see if you looked". The
| media publishing misleading (or misleadingly-captioned)
| photos is a problem as old as print photography.
| wilted-iris wrote:
| Turned a guy right into a tree. This would have fascinating
| implications if deployed broadly.
| mjburgess wrote:
| Does a camera without a lens make any physical sense? I cannot see
| how the scene geometry could be recoverable. Rays of light
| travelling from the scene arrive in all directions.
|
| Intuitively, imagine moving your eye at every point along some
| square inch. Each position of the eye is a different image. Now
| all those images _overlap_ on the sensor.
|
| If you look at the images in the paper, everything except their
| most macro geometry and colour palette is clearly generated --
| since it changes depending on the prompt.
|
| So at a guess, the lensless sensor gets this massive overlap of
| all possible photos at that location and so is able, at least, to
| capture minimal macro geometry and colour. This isn't going to be
| a useful amount of information for almost any application.
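A toy linear model makes the overlap argument above concrete, and shows why the mask matters. Assume (hypothetically) a 1D scene of N = 7 points and a sensor of 7 pixels, with cyclic shifts and no noise. With a bare sensor, every pixel sums light from every scene point equally, so the system matrix has rank 1 and only total brightness survives. If a mask sits in front of the sensor (the title says the paper uses one), each scene point casts a shifted copy of the mask pattern onto the sensor, and with a suitable pattern the matrix becomes invertible. The pattern `MASK` and all names are illustrative assumptions, not the paper's setup.

```python
from fractions import Fraction

N = 7  # toy 1D scene points and sensor pixels (assumption)
MASK = [1, 1, 0, 1, 0, 0, 0]  # hypothetical open/closed pattern

def rank(rows):
    """Matrix rank via Gauss-Jordan elimination over exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no usable pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

# Bare sensor: pixel j receives light from every scene point i equally.
bare = [[1] * N for _ in range(N)]
# Masked sensor: scene point i casts the mask pattern shifted by i (cyclic).
masked = [[MASK[(j - i) % N] for i in range(N)] for j in range(N)]

print(rank(bare), rank(masked))  # 1 7
```

Full rank only means the scene is recoverable in principle; with real sensor noise a poorly conditioned system still loses fine detail, which is one reason reconstruction methods lean on priors.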
| orbital-decay wrote:
| Yes, compressed sensing cameras do exactly that. They
| reconstruct a photometrically correct image without the need
| for focusing optics or pixel arrays. They have limitations (not
| fundamental ones though) but they're useful for special use
| cases like X-ray or LWIR single-pixel imaging where focusing
| optics and pixel arrays are impossible or expensive. It was
| first used on X-ray telescopes in the 1970s in the form of coded
| apertures, before grazing-incidence mirrors.
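For the single-pixel imaging mentioned above, a minimal sketch: one photodiode records the total light passing through a sequence of masks, and the image is solved for afterwards. This assumes Hadamard patterns, with each ±1 pattern realized as the difference of two complementary 0/1 masks, and it takes as many measurements as pixels, so it is plain linear inversion rather than compressed sensing (which would take fewer measurements and add a sparsity prior). All function names are hypothetical.

```python
def hadamard(n):
    """Sylvester-construction Hadamard matrix; n must be a power of two."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h

def photodiode(scene, mask):
    """One single-pixel reading: total light passing through a 0/1 mask."""
    return sum(x * m for x, m in zip(scene, mask))

def acquire(scene):
    """Measure the scene against each Hadamard row using two complementary 0/1 masks."""
    readings = []
    for row in hadamard(len(scene)):
        open_mask = [1 if v > 0 else 0 for v in row]
        closed_mask = [1 - v for v in open_mask]
        # the difference of the two readings equals the +/-1 inner product
        readings.append(photodiode(scene, open_mask) - photodiode(scene, closed_mask))
    return readings

def reconstruct(readings):
    """Invert using H.T @ H = n * I (Hadamard rows are orthogonal)."""
    n = len(readings)
    h = hadamard(n)
    return [sum(h[i][j] * readings[i] for i in range(n)) / n for j in range(n)]

scene = [3, 0, 1, 4, 1, 5, 9, 2]  # 8 "pixels" of toy data
print(reconstruct(acquire(scene)))  # [3.0, 0.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0]
```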
| nialv7 wrote:
| It's lensless, but it's not just a naked sensor. It still has
| an optical element - read the paper.
| ziofill wrote:
| +1 for the Thorlabs candy box
| karmakaze wrote:
| This is not a 'camera' per se. It's more like a human vision
| system that samples light and hallucinates an appropriate image
| based on context. The image is constructed from the data more
| than it is reconstructed. And like human vision, it can be
| correct often enough to be useful.
| cactusplant7374 wrote:
| Thanks for the summary. I was looking for this.
| Snoozus wrote:
| It's not as crazy as it seems: a pinhole camera doesn't have any
| lenses either and works just fine. The hole size is a tradeoff
| between brightness and detail. This one has many holes and uses
| software to puzzle their images back together.
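The "many holes" idea can be sketched as a 1D coded aperture. Assumptions, all for illustration: a cyclic 7-pixel sensor, holes placed at the quadratic residues mod 7 (one standard family of coded-aperture patterns), and no noise. Every scene point casts one pinhole image per hole, the copies overlap on the sensor, and correlating with a ±1 decoding pattern pulls them apart. For this mask family the correlation at position m works out to (P + 1)/2 times scene[m] minus the total brightness, which the decoder inverts.

```python
P = 7  # prime with P % 4 == 3 (toy 1D, cyclic sensor of P pixels)

# Hypothetical mask: holes at the nonzero quadratic residues mod P ({1, 2, 4} for P = 7)
HOLES = {(i * i) % P for i in range(1, P)}
MASK = [1 if i in HOLES else 0 for i in range(P)]
DECODER = [1 if m else -1 for m in MASK]  # +1 at holes, -1 elsewhere

def expose(scene):
    """Each scene point casts one pinhole image per hole; the copies add up on the sensor."""
    return [sum(scene[i] * MASK[(j - i) % P] for i in range(P)) for j in range(P)]

def decode(sensor):
    """Correlate with the decoding pattern, then remove the constant background."""
    k = len(HOLES)           # (P - 1) / 2 holes
    total = sum(sensor) / k  # each scene point lands on exactly k pixels
    out = []
    for m in range(P):
        r = sum(sensor[j] * DECODER[(j - m) % P] for j in range(P))
        # for this mask family, r == (P + 1) / 2 * scene[m] - total
        out.append((r + total) * 2 / (P + 1))
    return out

print(decode(expose([0, 5, 0, 0, 2, 0, 1])))  # [0.0, 5.0, 0.0, 0.0, 2.0, 0.0, 1.0]
```

The trade-off Snoozus describes shows up here too: more holes let in more light, but each pixel then mixes more scene points, so the decoding step carries more of the load.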
| dekhn wrote:
| For those interested in various approaches to lens-free imaging,
| Laura Waller at Berkeley has been pursuing this area for some
| time.
|
| https://waller-lab.github.io/DiffuserCam/
| https://waller-lab.github.io/DiffuserCam/tutorial.html includes
| instructions and code to build your own.
| https://ieeexplore.ieee.org/abstract/document/8747341
| https://ieeexplore.ieee.org/document/7492880
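Before diffusion models, lensless reconstructions like the DiffuserCam work linked above were commonly posed as an inverse problem: the sensor reading is modeled as the scene convolved with the diffuser's point-spread function (PSF), and the scene is estimated iteratively. A minimal sketch in that spirit, not the lab's code: a hypothetical 1D PSF, circular convolution, and projected gradient descent on the least-squares error with a nonnegativity constraint.

```python
PSF = [1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0]  # hypothetical 1D diffuser PSF
N = len(PSF)

def forward(x):
    """Sensor model: scene circularly convolved with the PSF (A @ x)."""
    return [sum(x[i] * PSF[(j - i) % N] for i in range(N)) for j in range(N)]

def adjoint(r):
    """Transpose of the sensor model (A.T @ r): circular correlation with the PSF."""
    return [sum(r[j] * PSF[(j - i) % N] for j in range(N)) for i in range(N)]

def reconstruct(b, iters=300):
    """Projected gradient descent on 0.5 * ||A x - b||^2 subject to x >= 0."""
    # step = 1 / L, where L = sum(PSF)**2 is the largest eigenvalue of A.T A
    # for a nonnegative PSF
    step = 1.0 / sum(PSF) ** 2
    x = [0.0] * N
    for _ in range(iters):
        resid = [p - q for p, q in zip(forward(x), b)]
        grad = adjoint(resid)
        x = [max(0.0, xi - step * gi) for xi, gi in zip(x, grad)]
    return x

scene = [0.0, 5.0, 0.0, 0.0, 2.0, 0.0, 1.0]
estimate = reconstruct(forward(scene))
```

Real systems add regularization (for example sparsity or total variation) because the sensor is noisy and a real PSF spreads each point over far more pixels than this toy.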
| astrange wrote:
| Note the difference between a "diffuser" and a "diffusion
| model" here.
| 6gvONxR4sf7o wrote:
| Re: is this a camera or not, I recently realized that my fancy
| mirrorless camera is closer to this than I'd previously thought.
|
| The sensor has a zillion pixels but each one only measures one
| color. For example, the pixel at index (145, 2832) might only
| measure green, while its neighbor at (145, 2833) only measures
| red. So we use models to fill in the blanks. We didn't measure
| redness at (145, 2832) so we guess based on the redness nearby.
|
| This kind of guessing is exactly what modern CV is so good at. So
| the line of what is a camera and what isn't is a bit blurry to
| begin with.
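The neighbour-based guessing described above, in its simplest (bilinear) form, as a sketch. Assumptions: an RGGB Bayer layout, a raw frame given as a list of rows with at least 2 rows and 2 columns, and each missing colour filled with the mean of that colour's samples in the surrounding 3x3 window. Away from the borders this reduces to averaging the 4 edge neighbours, the 4 diagonals, or 2 neighbours, depending on the site. Function names are hypothetical.

```python
def bayer_color(r, c):
    """Colour measured at (r, c) for an RGGB pattern (assumed layout)."""
    if r % 2 == 0:
        return "R" if c % 2 == 0 else "G"
    return "G" if c % 2 == 0 else "B"

def demosaic_bilinear(raw):
    """Fill each missing colour with the mean of that colour's samples in the 3x3 window."""
    h, w = len(raw), len(raw[0])
    out = [[{} for _ in range(w)] for _ in range(h)]
    for r in range(h):
        for c in range(w):
            for ch in "RGB":
                if bayer_color(r, c) == ch:
                    out[r][c][ch] = raw[r][c]  # measured directly, keep as-is
                    continue
                samples = [raw[rr][cc]
                           for rr in range(max(0, r - 1), min(h, r + 2))
                           for cc in range(max(0, c - 1), min(w, c + 2))
                           if bayer_color(rr, cc) == ch]
                out[r][c][ch] = sum(samples) / len(samples)
    return out
```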
| klysm wrote:
| The structure you are referring to is a Bayer array. Algorithms
| that do the guessing are called debayering algorithms.
| 6gvONxR4sf7o wrote:
| I think that's just a particular (very common) case. In
| general it's called demosaicing, right?
| astrange wrote:
| They don't use models (although you certainly could.) They
| usually use plain old closed-form solutions.
| 6gvONxR4sf7o wrote:
| Eh, I came to ML from the stats side of things, so maybe I
| use "models" more expansively. They definitely use some
| things tuned to typical pictures sometimes (aka tuned to a
| natural dataset). On camera, it's much more constrained, but
| in postprocessing, more sophisticated solutions pop up.
|
| The Wikipedia article on demosaicing has an algorithms
| section with a nice part on tradeoffs, how making assumptions
| about the kinds of pictures that will be taken can increase
| accuracy in distribution but introduce artifacts out of
| distribution.
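To make that in-distribution/out-of-distribution tradeoff concrete, compare bilinear green interpolation with a simple edge-directed variant that assumes edges are mostly horizontal or vertical and interpolates along the smoother direction. This is a sketch of the general idea, assuming an RGGB mosaic of a grey scene and an interior red or blue site, not any particular camera's algorithm. On a sharp vertical edge the edge-directed guess matches the true value while bilinear smears it; on textures that break the assumption (fine diagonal detail, for example) the direction choice can be wrong and produce artifacts.

```python
# Assumes an RGGB mosaic and an interior red or blue site (r, c), whose
# up/down/left/right neighbours are all green samples.

def green_bilinear(raw, r, c):
    """Plain average of the four green neighbours."""
    return (raw[r - 1][c] + raw[r + 1][c] + raw[r][c - 1] + raw[r][c + 1]) / 4

def green_edge_directed(raw, r, c):
    """Average along whichever direction has the smaller green gradient."""
    dh = abs(raw[r][c - 1] - raw[r][c + 1])  # horizontal change
    dv = abs(raw[r - 1][c] - raw[r + 1][c])  # vertical change
    if dh < dv:
        return (raw[r][c - 1] + raw[r][c + 1]) / 2
    if dv < dh:
        return (raw[r - 1][c] + raw[r + 1][c]) / 2
    return green_bilinear(raw, r, c)  # no preferred direction: fall back

# Grey scene with a sharp vertical edge between columns 2 and 3,
# so every mosaic sample simply equals the local brightness.
raw = [[0 if c < 3 else 100 for c in range(6)] for r in range(6)]
print(green_bilinear(raw, 2, 2))       # 25.0 (smeared across the edge)
print(green_edge_directed(raw, 2, 2))  # 0.0 (the true value)
```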
|
| The types of models you see used on camera are pretty
| constrained (camera batteries are already a prime complaint),
| but there's a whole zoo of stuff used today in off-camera
| processing. And they're slowly making their way on-camera,
| as dedicated "AI processors" (I assume tiny TPU-like chips)
| are already appearing in cameras.
| Dibby053 wrote:
| This would be impressive if the examples weren't taken from the
| same dataset (LAION-5B) that was used to train the Stable
| Diffusion model it's using.
| Legend2440 wrote:
| They also show actual images they took of real scenes. See
| figure 6.
| Thomashuet wrote:
| I don't understand the use of a textual description. In which
| scenario do you not have enough space for a lens and yet have a
| textual description of the scene?
| pvillano wrote:
| I wonder how it "reacts" to optical illusions. The ones we're
| familiar with are optimized for probing the limits of the human
| visual system, but there might be some overlap.
| albert_e wrote:
| So this is like the use (in a different species) of light-sensitive
| patches of skin instead of the eyeballs (lenses) that most
| animals on Earth evolved?
|
| Interesting.
|
| Even if this does not immediately replace traditional cameras and
| lenses, I wonder if it could add a complementary set of
| capabilities to a traditional camera, say next to a phone's camera
| bump/island/cluster, to enable some enhanced use cases.
|
| Maybe store the wider context in raw format alongside the EXIF
| data, so that future photo-manipulation models can use that
| ambient data to do more realistic edits, inpainting, outpainting,
| etc.?
|
| I think this would benefit 3D photography and video graphics a
| lot if you can capture more of the ambient data, not strictly
| channeled through the lenses.
| xg15 wrote:
| Oh great, waiting for the first media piece where pictures from
| this "camera" are presented as evidence. (Or the inverse, where
| actual photographic evidence is disputed because who knows if the
| camera didn't have AI stuff built in.)
___________________________________________________________________
(page generated 2024-08-18 23:02 UTC)