[HN Gopher] DifuzCam: Replacing Camera Lens with a Mask and a Di...
       ___________________________________________________________________
        
       DifuzCam: Replacing Camera Lens with a Mask and a Diffusion Model
        
       Author : dataminer
       Score  : 66 points
       Date   : 2024-08-17 16:50 UTC (6 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | valine wrote:
       | I get the feeling that lens-free cameras are the future.
       | Obviously the results here are nowhere near good enough, but
       | given the rapid improvement of diffusion models lately the
       | trajectory seems clear.
       | 
       | Would love to lose the camera bump on the back of my phone.
        
         | xandrius wrote:
         | So you just get something based off GPS, time of day and
         | rotation?
         | 
         | Or no photos anymore?
        
           | valine wrote:
           | Not sure I understand the question. This paper is about using
           | diffusion models to reconstruct usable images from raw sensor
           | data. The diffusion model in essence replaces the lens.
        
           | exe34 wrote:
           | there would be a bunch of holes and a CCD at the back, just
           | no growths of bulbous lens.
        
       | a1o wrote:
       | Oh god, we are going to make lenses a premium feature now,
       | aren't we?
        
         | chpatrick wrote:
         | It would be pretty great if cheap phones can get good cameras
         | with this technology.
        
           | 082349872349872 wrote:
           | If/when cheap enough, even non-phone devices (POS terminals,
           | vending machines, etc) will have cameras; will living in a
           | camera-free environment become the premium feature?
        
             | chpatrick wrote:
             | Cameras are already cheap enough to put in everything.
        
           | fpoling wrote:
           | There is no camera. It is just a diffusion model trained on a
           | big set that tries to reconstruct the picture. Essentially
           | this is not much different from what Samsung did with their
           | AI-enhanced camera that detected a moon and replaced that
           | with high-resolution picture.
        
             | holoduke wrote:
             | And yet, we will probably get all camera software on our
             | phones with unlimited zoom and details. Turning grain into
             | crisp, clear pictures. Inpainting, outpainting etc. Five
             | years from now everybody will use it. Everything becomes fake.
        
             | chpatrick wrote:
             | The pictures in the paper are pretty damn close, and this
             | is just a prototype. Plus, as you said, phones already have
             | AI filters.
        
       | teruakohatu wrote:
       | It is quite amazing that using a diffuser rather than a lens, and
       | then a diffusion model, can reconstruct an image so well.
       | 
       | The downside of this is that it heavily relies on the model to
       | construct the image. Much like those colorisation models applied
       | to old monochrome photos, the results will probably always look a
       | little off based on the training data. I could imagine taking a
       | photo of some weird art installation and the camera getting
       | confused.
       | 
       | You can see examples of this when the model invented fabric
       | texture on the fabric examples and converted solar panels to
       | walls.
        
         | nine_k wrote:
         | The model basically guesses and reinvents what these diffuse
         | pixels might be. It's more like a painter producing a picture
         | from memory.
         | 
         | It inevitably means that the "camera" visually parses the scene
         | and then synthesizes its picture. The intermediate step is a
         | great moment to semantically edit the scene. Recolor and
         | retexture things. Remove some elements of the scene, or even
         | add some. Use different rendering styles.
         | 
         | Imagine pointing such a "camera" at a person standing next to a
         | wall, and getting a picture of the person with their skin
         | lacking any wrinkles, clothes looking more lustrous as if it
         | were silk, not cotton, and the graffiti removed from the wall
         | behind.
         | 
         | Or making a "photo" that turns a group of kids into a group of
         | cartoon superheroes, while retaining their recognizable faces
         | and postures.
         | 
         | (ICBM course, photo evidence made with digital cameras should
         | long have been inadmissible in courts, but this would hasten
         | the transition.)
        
       | mjburgess wrote:
       | Does a camera without a lens make any physics sense? I cannot see
       | how the scene geometry could be recoverable. Rays of light
       | travelling from the scene arrive in all directions.
       | 
       | Intuitively, imagine moving your eye at every point along some
       | square inch. Each position of the eye is a different image. Now
       | all those images _overlap_ on the sensor.
       | 
       | If you look at the images in the paper, everything except their
       | most macro geometry and colour palette is clearly generated --
       | since it changes depending on the prompt.
       | 
       | So at a guess, the lensless sensor gets this massive overlap of
       | all possible photos at that location and so is able, at least, to
       | capture minimal macro geometry and colour. This isn't going to be
       | a useful amount of information for almost any application.
        
       | ziofill wrote:
       | +1 for the Thorlabs candy box
        
       | karmakaze wrote:
       | This is not a 'camera' per se. It's more like a human vision
       | system that samples light and hallucinates an appropriate image
       | based on context. The image is constructed from the data more
       | than it is reconstructed. And like human vision, it can be
       | correct often enough to be useful.
        
         | cactusplant7374 wrote:
         | Thanks for the summary. I was looking for this.
        
       | Snoozus wrote:
       | It's not as crazy as it seems, a pinhole camera doesn't have any
       | lenses either and works just fine. The hole size is a tradeoff
       | between brightness and detail. This one has many holes and uses
       | software to puzzle their images back together.
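
The "many holes plus software" idea above can be sketched in a few lines. This is a classical toy model, not the paper's method (the paper uses a diffusion model for reconstruction). It assumes the mask acts like a set of pinholes, so the sensor records a sum of shifted copies of the scene, and that the mask pattern is known. The function names, the circular-convolution simplification, and the Wiener-style regularisation parameter `reg` are all assumptions made for illustration.

```python
import numpy as np

def make_mask_psf(shape, n_holes, rng):
    """Hypothetical multi-pinhole mask: each hole adds one shifted copy of the scene."""
    psf = np.zeros(shape)
    rows = rng.integers(0, shape[0], size=n_holes)
    cols = rng.integers(0, shape[1], size=n_holes)
    psf[rows, cols] = 1.0
    return psf / psf.sum()  # normalise so total brightness is preserved

def simulate_measurement(scene, psf, noise_std, rng):
    """Sensor image = circular convolution of the scene with the mask pattern, plus noise."""
    blurred = np.real(np.fft.ifft2(np.fft.fft2(scene) * np.fft.fft2(psf)))
    return blurred + rng.normal(0.0, noise_std, size=scene.shape)

def wiener_reconstruct(measurement, psf, reg):
    """Regularised inverse filter: undo the known mask pattern in the frequency domain."""
    H = np.fft.fft2(psf)
    G = np.conj(H) / (np.abs(H) ** 2 + reg)
    return np.real(np.fft.ifft2(np.fft.fft2(measurement) * G))

# Toy scene: a bright rectangle on a dark background.
rng = np.random.default_rng(0)
scene = np.zeros((64, 64))
scene[20:40, 24:48] = 1.0

psf = make_mask_psf(scene.shape, n_holes=20, rng=rng)
measurement = simulate_measurement(scene, psf, noise_std=0.001, rng=rng)
recon = wiener_reconstruct(measurement, psf, reg=1e-3)

err_measurement = np.mean((measurement - scene) ** 2)
err_recon = np.mean((recon - scene) ** 2)
print(f"MSE of raw sensor image: {err_measurement:.5f}")
print(f"MSE after reconstruction: {err_recon:.5f}")
```

The raw sensor image is an unrecognisable overlap of shifted copies, yet with a known mask the overlap is a linear operation that can be approximately inverted. Real lensless cameras face noise, a finite sensor, and depth-dependent patterns, which is where learned priors come in.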
        
       | dekhn wrote:
       | For those interested in various approaches to lens-free imaging,
       | Laura Waller at Berkeley has been pursuing this area for some
       | time.
       | 
       | https://waller-lab.github.io/DiffuserCam/
       | https://waller-lab.github.io/DiffuserCam/tutorial.html (includes
       | instructions and code to build your own)
       | https://ieeexplore.ieee.org/abstract/document/8747341
       | https://ieeexplore.ieee.org/document/7492880
        
       | 6gvONxR4sf7o wrote:
       | Re: is this a camera or not, I recently realized that my fancy
       | mirrorless camera is closer to this than I'd previously thought.
       | 
       | The sensor has a zillion pixels but each one only measures one
       | color. For example, the pixel at index (145, 2832) might only
       | measure green, while its neighbor at (145, 2833) only measures
       | red. So we use models to fill in the blanks. We didn't measure
       | redness at (145, 2832) so we guess based on the redness nearby.
       | 
       | This kind of guessing is exactly what modern CV is so good at. So
       | the line of what is a camera and what isn't is a bit blurry to
       | begin with.
        
         | klysm wrote:
         | The structure you are referring to is a Bayer array. Algorithms
         | that do the guessing are called debayering algorithms.
        
           | 6gvONxR4sf7o wrote:
           | I think that's just a particular (very common) case. In
           | general it's called demosaicing, right?
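
The "guess based on the redness nearby" step described in this sub-thread can be sketched as follows. This is a minimal illustration, assuming an RGGB Bayer layout and plain averaging over each pixel's 3x3 neighbourhood. Real cameras use more sophisticated, edge-aware demosaicing, and the function names here are hypothetical.

```python
import numpy as np

def mosaic_rggb(rgb):
    """Simulate a Bayer sensor (assumed RGGB layout): keep one color sample per pixel."""
    h, w, _ = rgb.shape
    mask = np.zeros((h, w, 3), dtype=bool)
    mask[0::2, 0::2, 0] = True  # red on even rows, even columns
    mask[0::2, 1::2, 1] = True  # green on even rows, odd columns
    mask[1::2, 0::2, 1] = True  # green on odd rows, even columns
    mask[1::2, 1::2, 2] = True  # blue on odd rows, odd columns
    raw = np.where(mask, rgb, 0.0).sum(axis=2)
    return raw, mask

def box_sum_3x3(img):
    """Sum over each pixel's 3x3 neighbourhood, with zero padding at the edges."""
    p = np.pad(img, 1)
    h, w = img.shape
    return sum(p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3))

def demosaic_average(raw, mask):
    """Fill each missing color sample with the mean of measured same-color neighbours."""
    out = np.empty(mask.shape, dtype=float)
    for c in range(3):
        known = mask[..., c].astype(float)
        total = box_sum_3x3(raw * known)   # sum of measured samples of this color
        count = box_sum_3x3(known)         # how many such samples are nearby
        estimate = total / count
        # Keep the measured value where one exists, otherwise use the estimate.
        out[..., c] = np.where(mask[..., c], raw, estimate)
    return out

# Usage: a flat-colored test image survives the round trip unchanged.
rgb = np.empty((6, 8, 3))
rgb[...] = (0.2, 0.5, 0.8)
raw, mask = mosaic_rggb(rgb)
restored = demosaic_average(raw, mask)
print(np.allclose(restored, rgb))
```

Only a third of the output values are measured; the rest are interpolated. On a flat patch the guess is exact, while on fine detail or sharp edges this simple averaging produces color fringing, which is why production pipelines use smarter estimators.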
        
       | Dibby053 wrote:
       | This would be impressive if the examples weren't taken from the
       | same dataset (LAION-5B) that was used to train the Stable
       | Diffusion model it's using.
        
       | Thomashuet wrote:
       | I don't understand the use of a textual description. In which
       | scenario do you not have enough space for a lens and yet have a
       | textual description of the scene?
        
       ___________________________________________________________________
       (page generated 2024-08-17 23:00 UTC)