[HN Gopher] DifuzCam: Replacing Camera Lens with a Mask and a Di...
       ___________________________________________________________________
        
       DifuzCam: Replacing Camera Lens with a Mask and a Diffusion Model
        
       Author : dataminer
       Score  : 108 points
       Date   : 2024-08-17 16:50 UTC (1 days ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | valine wrote:
       | I get the feeling that lens free cameras are the future.
       | Obviously the results here are nowhere near good enough, but
       | given the rapid improvement of diffusion models lately the
       | trajectory seems clear.
       | 
       | Would love to lose the camera bump on the back of my phone.
        
         | xandrius wrote:
         | So you just get something based off GPS, time of day and
         | rotation?
         | 
         | Or no photos anymore?
        
           | valine wrote:
           | Not sure I understand the question. This paper is about using
           | diffusion models to reconstruct usable images from raw sensor
           | data. The diffusion model in essence replaces the lens.
        
           | exe34 wrote:
           | there would be a bunch of holes and a ccd at the back, just
           | no growths of bulbous lens.
        
         | smallerize wrote:
         | https://www.standard.co.uk/news/tech/ai-camera-images-paragr...
        
           | valine wrote:
           | Not sure what this has to do with anything. The paper I was
           | commenting on is using diffusion models to parse raw light
           | hitting the sensor as an alternative to a glass lens. No one
           | wants an image generation model hooked up to weather data--
           | that's kind of ridiculous.
        
       | a1o wrote:
       | Oh god, we are going to make lenses a premium feature now,
       | aren't we?
        
         | chpatrick wrote:
         | It would be pretty great if cheap phones can get good cameras
         | with this technology.
        
           | 082349872349872 wrote:
           | If/when cheap enough, even non-phone devices (POS terminals,
           | vending machines, etc) will have cameras; will living in a
           | camera-free environment become the premium feature?
        
             | chpatrick wrote:
             | Cameras are already cheap enough to put in everything.
        
           | fpoling wrote:
           | There is no camera. It is just a diffusion model trained on a
           | big set that tries to reconstruct the picture. Essentially
           | this is not much different from what Samsung did with their
           | AI-enhanced camera that detected a moon and replaced that
           | with high-resolution picture.
        
             | holoduke wrote:
             | And yet, we will probably get all camera software on our
             | phones with unlimited zoom and details. Turning grain into
             | crisp, clear pictures. Inpainting, outpainting etc. Five
             | years from now everybody will use it. Everything becomes fake.
        
             | chpatrick wrote:
             | The pictures in the paper are pretty damn close, and this
             | is just a prototype. Plus, as you said, phones already have
             | AI filters.
        
               | wl wrote:
               | The text on the Thor Labs "Lab Snacks" box is giant and
               | still unreadable, the model interpolating total junk. It
               | seems like there's nowhere near enough signal there.
        
       | teruakohatu wrote:
       | It is quite amazing that a diffuser in place of a lens, followed
       | by a diffusion model, can reconstruct an image so well.
       | 
       | The downside of this is that it relies heavily on the model to
       | construct the image. Much like those colorisation models applied
       | to old monochrome photos, the results will probably always look a
       | little off based on the training data. I could imagine taking a
       | photo of some weird art installation and the camera getting
       | confused.
       | 
       | You can see examples of this when the model invented fabric
       | texture on the fabric examples and converted solar panels to
       | walls.
        
         | nine_k wrote:
         | The model basically guesses and reinvents what these diffuse
         | pixels might be. It's more like a painter producing a picture
         | from memory.
         | 
         | It inevitably means that the "camera" visually parses the scene
         | and then synthesizes its picture. The intermediate step is a
         | great moment to semantically edit the scene. Recolor and
         | retexture things. Remove some elements of the scene, or even
         | add some. Use different rendering styles.
         | 
         | Imagine pointing such a "camera" at a person standing next to a
         | wall, and getting a picture of the person with their skin
         | lacking any wrinkles, clothes looking more lustrous as if it
         | were silk, not cotton, and the graffiti removed from the wall
         | behind.
         | 
         | Or making a "photo" that turns a group of kids into a group of
         | cartoon superheroes, while retaining their recognizable faces
         | and postures.
         | 
         | (ICBM course, photo evidence made with digital cameras should
         | long have been inadmissible in courts, but this would hasten
         | the transition.)
        
           | neom wrote:
           | Kinda reminds me of this a bit:
           | https://arstechnica.com/gadgets/2020/11/nvidia-used-neural-n...
        
           | wizzwizz4 wrote:
           | > _photo evidence made with digital cameras should long have
           | been inadmissible in courts_
           | 
           | Sworn testimony is admissible in courts. I think the "you can
           | just make evidence up" threshold was passed a few thousand
           | years ago. The courts still, mostly, work.
        
             | xg15 wrote:
             | Yes. One solution to the problem of false testimony was
             | photo evidence...
             | 
             | But I'm less worried about the courts and more about media
             | that might publish photos without realizing they are AI
             | generated - or ordinary people using those cameras without
             | understanding how they work and then not realizing there
             | may be some details in the pictures that are plain fantasy.
        
               | wizzwizz4 wrote:
               | Most people I know would understand "the camera comes
               | with a built-in filter" to mean "what the camera
               | photographs isn't what you'd see if you looked". The
               | media publishing misleading (or misleadingly-captioned)
               | photos is a problem as old as print photography.
        
         | wilted-iris wrote:
         | Turned a guy right into a tree. This would have fascinating
         | implications if deployed broadly.
        
       | mjburgess wrote:
       | Does a camera without a lens make any physics sense? I cannot see
       | how the scene geometry could be recoverable. Rays of light
       | travelling from the scene arrive in all directions.
       | 
       | Intuitively, imagine moving your eye at every point along some
       | square inch. Each position of the eye is a different image. Now
       | all those images _overlap_ on the sensor.
       | 
       | If you look at the images in the paper, everything except their
       | most macro geometry and colour palette is clearly generated --
       | since it changes depending on the prompt.
       | 
       | So at a guess, the lensless sensor gets this massive overlap of
       | all possible photos at that location and so is able, at least, to
       | capture minimal macro geometry and colour. This isn't going to be
       | a useful amount of information for almost any application.
        
         | orbital-decay wrote:
         | Yes, compressed sensing cameras do exactly that. They
         | reconstruct a photometrically correct image without the need
         | for focusing optics or pixel arrays. They have limitations (not
         | fundamental ones though) but they're useful for special use
         | cases like X-ray or LWIR single-pixel imaging where focusing
         | optics and pixel arrays are impossible or expensive. It was
         | first used on X-ray telescopes in the 1970s in the form of coded
         | aperture, before the grazing incidence mirrors.
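
To make the single-pixel idea above concrete, here is a minimal sketch in Python. It is not the method from the paper or from any particular telescope. It assumes a tiny 1-D scene, a single detector, and a set of +1/-1 Hadamard mask patterns (a physical mask can only block or pass light, so the -1 entries stand in for the difference of two exposures). It also takes as many readings as there are scene pixels, so it illustrates coded-mask imaging rather than true compressed sensing, which would take fewer readings and rely on a sparsity-based solver. The names `hadamard`, `measure` and `reconstruct` are made up for illustration.

```python
def hadamard(n):
    """Build an n x n Hadamard matrix (Sylvester construction).

    n must be a power of two. Rows are mutually orthogonal +1/-1 patterns.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # [[H, H], [H, -H]] doubles the size and keeps rows orthogonal.
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def measure(scene, patterns):
    """One detector reading per pattern: total light weighted by the mask."""
    return [sum(p * s for p, s in zip(pattern, scene)) for pattern in patterns]


def reconstruct(readings, patterns):
    """Invert the measurements using H^T H = n * I (assumed noise-free)."""
    n = len(patterns)
    return [
        sum(patterns[k][i] * readings[k] for k in range(n)) / n
        for i in range(n)
    ]


if __name__ == "__main__":
    scene = [3, 0, 7, 1, 0, 0, 2, 5]      # hypothetical 8-pixel scene
    patterns = hadamard(len(scene))
    readings = measure(scene, patterns)   # what the single detector sees
    print(reconstruct(readings, patterns))
```

No reading on its own looks anything like the scene; the picture only exists after the computation, which is the sense in which these systems need no focusing optics.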
        
         | nialv7 wrote:
         | It's lensless, but it's not just a naked sensor. It still has
         | an optical element - read the paper.
        
       | ziofill wrote:
       | +1 for the Thor labs candy box
        
       | karmakaze wrote:
       | This is not a 'camera' per se. It's more like a human vision
       | system that samples light and hallucinates an appropriate image
       | based on context. The image is constructed from the data more
       | than it is reconstructed. And like human vision, it can be
       | correct often enough to be useful.
        
         | cactusplant7374 wrote:
         | Thanks for the summary. I was looking for this.
        
       | Snoozus wrote:
       | It's not as crazy as it seems, a pinhole camera doesn't have any
       | lenses either and works just fine. The hole size is a tradeoff
       | between brightness and detail. This one has many holes and uses
       | software to puzzle their images back together.
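
The "many holes plus software" idea can be sketched as below. This is a toy, noise-free 1-D model with wrap-around shifts, not the reconstruction used in the paper (which relies on a diffusion model). Each open hole at offset s adds a copy of the scene shifted by s, so the sensor records the scene convolved with the mask, and dividing in the Fourier domain undoes the overlap. `capture`, `reconstruct` and the particular mask are assumptions chosen for illustration.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (inverse includes the 1/n)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def capture(scene, mask):
    """Sensor signal: overlapping shifted copies of the scene, one per hole."""
    n = len(scene)
    return [
        sum(mask[s] * scene[(i - s) % n] for s in range(n))
        for i in range(n)
    ]


def reconstruct(sensor, mask, eps=1e-9):
    """Undo the overlap by dividing spectra (circular convolution theorem)."""
    sensor_f, mask_f = dft(sensor), dft(mask)
    if min(abs(m) for m in mask_f) < eps:
        raise ValueError("mask loses some spatial frequencies entirely")
    ratio = [s / m for s, m in zip(sensor_f, mask_f)]
    return [v.real for v in dft(ratio, inverse=True)]


if __name__ == "__main__":
    scene = [0, 0, 5, 1, 0, 0, 2, 0]   # hypothetical 1-D scene
    mask = [1, 1, 0, 1, 0, 0, 0, 0]    # holes at offsets 0, 1 and 3
    sensor = capture(scene, mask)
    print(sensor)
    print([round(v, 6) for v in reconstruct(sensor, mask)])
```

With real sensor noise, a plain division amplifies errors at frequencies where the mask response is small, which is one reason practical systems lean on regularisation or learned priors.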
        
       | dekhn wrote:
       | For those interested in various approaches to lens-free imaging,
       | Laura Waller at Berkeley has been pursuing this area for some
       | time.
       | 
       | https://waller-lab.github.io/DiffuserCam/
       | https://waller-lab.github.io/DiffuserCam/tutorial.html includes
       | instructions and code to build your own
       | https://ieeexplore.ieee.org/abstract/document/8747341
       | https://ieeexplore.ieee.org/document/7492880
        
         | astrange wrote:
         | Note the difference between a "diffuser" and a "diffusion
         | model" here.
        
       | 6gvONxR4sf7o wrote:
       | Re: is this a camera or not, I recently realized that my fancy
       | mirrorless camera is closer to this than I'd previously thought.
       | 
       | The sensor has a zillion pixels but each one only measures one
       | color. For example, the pixel at index (145, 2832) might only
       | measure green, while its neighbor at (145, 2833) only measures
       | red. So we use models to fill in the blanks. We didn't measure
       | redness at (145, 2832) so we guess based on the redness nearby.
       | 
       | This kind of guessing is exactly what modern CV is so good at. So
       | the line of what is a camera and what isn't is a bit blurry to
       | begin with.
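
The "guess based on the redness nearby" step can be sketched as a plain bilinear demosaic. This assumes an RGGB Bayer layout and simple neighbour averaging. Real cameras use more elaborate, edge-aware methods, and `bayer_channel`, `mosaic` and `demosaic` are illustrative names rather than any library's API.

```python
def bayer_channel(row, col):
    """Channel measured at (row, col) on an RGGB sensor: 0=R, 1=G, 2=B."""
    if row % 2 == 0:
        return 0 if col % 2 == 0 else 1
    return 1 if col % 2 == 0 else 2


def mosaic(rgb):
    """Simulate the sensor: keep only the one channel each pixel measures."""
    return [
        [px[bayer_channel(r, c)] for c, px in enumerate(row)]
        for r, row in enumerate(rgb)
    ]


def demosaic(raw):
    """Fill in the two unmeasured channels per pixel by neighbour averaging."""
    h, w = len(raw), len(raw[0])
    if h < 2 or w < 2:
        raise ValueError("need at least a 2x2 image to see every channel")
    out = []
    for r in range(h):
        out_row = []
        for c in range(w):
            px = []
            for ch in range(3):
                if bayer_channel(r, c) == ch:
                    px.append(float(raw[r][c]))  # measured directly
                    continue
                # Average the pixels in the 3x3 neighbourhood (clipped at
                # the borders) that did measure this channel.
                samples = [
                    raw[rr][cc]
                    for rr in range(max(0, r - 1), min(h, r + 2))
                    for cc in range(max(0, c - 1), min(w, c + 2))
                    if bayer_channel(rr, cc) == ch
                ]
                px.append(sum(samples) / len(samples))
            out_row.append(tuple(px))
        out.append(out_row)
    return out
```

On a flat patch the averages are exact. On fine detail or sharp colour edges they are guesses, which is where colour fringing and similar artifacts come from.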
        
         | klysm wrote:
         | The structure you are referring to is a Bayer Array. Algorithms
         | that do the guessing are called debayering algorithms.
        
           | 6gvONxR4sf7o wrote:
           | I think that's just a particular (very common) case. In
           | general it's called demosaicing, right?
        
         | astrange wrote:
         | They don't use models (although you certainly could.) They
         | usually use plain old closed-form solutions.
        
           | 6gvONxR4sf7o wrote:
           | Eh, I came to ML from the stats side of things, so maybe I
           | use "models" more expansively. They definitely use some
           | things tuned to typical pictures sometimes (aka tuned to a
           | natural dataset). On camera, it's much more constrained, but
           | in postprocessing, more sophisticated solutions pop up.
           | 
           | The wikipedia article on demosaicing has an algorithms
           | section with a nice part on tradeoffs, how making assumptions
           | about the kinds of pictures that will be taken can increase
           | accuracy in distribution but introduce artifacts out of
           | distribution.
           | 
           | The types of models you see used on camera are pretty
           | constrained (camera batteries are already a prime complaint),
           | but there's a whole zoo of stuff used today in off-camera
           | processing. And they're slowly making their way on-camera
           | as dedicated "AI processors" (I assume tiny TPU-like chips)
           | are already making their way into cameras.
        
       | Dibby053 wrote:
       | This would be impressive if the examples weren't taken from the
       | same dataset (Laion-5B) that was used to train the Stable
       | Diffusion model it's using.
        
         | Legend2440 wrote:
         | They also show actual images they took of real scenes. See
         | figure 6.
        
       | Thomashuet wrote:
       | I don't understand the use of a textual description. In which
       | scenario do you not have enough space for a lens and yet have a
       | textual description of the scene?
        
       | pvillano wrote:
       | I wonder how it "reacts" to optical illusions? The ones we're
       | familiar with are optimized for probing the limits of the human
       | visual system, but there might be some overlap
        
       | albert_e wrote:
       | so this is like use of (in a different species) light sensitive
       | patches of skin instead of the eye balls (lenses) that most
       | animals on earth evolved ?
       | 
       | interesting.
       | 
       | even if this does not immediately replace traditional cameras and
       | lenses... I am wondering if this can add a complementary set of
       | capabilities to a traditional camera say next to a phone's camera
       | bump/island/cluster...so that we can drive some enhanced use
       | cases
       | 
       | maybe store the wider context in raw format alongside the EXIF
       | data ...so that future photo manipulation models can use that
       | ambient data to do more realistic edits / in painting / out
       | painting etc?
       | 
       | I am thinking this will benefit 3D photography and video graphics
       | a lot if you can capture more of the ambient data, not strictly
       | channeled through the lenses
        
       | xg15 wrote:
       | Oh great, waiting for the first media piece where pictures from
       | this "camera" are presented as evidence. (Or the inverse, where
       | actual photographic evidence is disputed because who knows if the
       | camera didn't have AI stuff built in)
        
       ___________________________________________________________________
       (page generated 2024-08-18 23:02 UTC)