[HN Gopher] DifuzCam: Replacing Camera Lens with a Mask and a Di...
___________________________________________________________________
DifuzCam: Replacing Camera Lens with a Mask and a Diffusion Model
Author : dataminer
Score : 66 points
Date : 2024-08-17 16:50 UTC (6 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| valine wrote:
| I get the feeling that lens-free cameras are the future.
| Obviously the results here are nowhere near good enough, but
| given the rapid improvement of diffusion models lately, the
| trajectory seems clear.
|
| Would love to lose the camera bump on the back of my phone.
| xandrius wrote:
| So you just get something based off GPS, time of day and
| rotation?
|
| Or no photos anymore?
| valine wrote:
| Not sure I understand the question. This paper is about using
| diffusion models to reconstruct usable images from raw sensor
| data. The diffusion model in essence replaces the lens.
| exe34 wrote:
| there would be a bunch of holes and a CCD at the back, just
| no bulbous lens growth.
| a1o wrote:
| Oh god, we are going to make lenses a premium feature now,
| aren't we?
| chpatrick wrote:
| It would be pretty great if cheap phones can get good cameras
| with this technology.
| 082349872349872 wrote:
| If/when cheap enough, even non-phone devices (POS terminals,
| vending machines, etc) will have cameras; will living in a
| camera-free environment become the premium feature?
| chpatrick wrote:
| Cameras are already cheap enough to put in everything.
| fpoling wrote:
| There is no camera. It is just a diffusion model trained on a
| big dataset that tries to reconstruct the picture. Essentially
| this is not much different from what Samsung did with their
| AI-enhanced camera that detected the moon and replaced it
| with a high-resolution picture.
| holoduke wrote:
| And yet, we will probably get camera software on our
| phones with unlimited zoom and detail, turning grain into
| crisp, clear pictures. Inpainting, outpainting, etc. Five
| years from now everybody will use it. Everything becomes fake.
| chpatrick wrote:
| The pictures in the paper are pretty damn close, and this
| is just a prototype. Plus, as you said, phones already have
| AI filters.
| teruakohatu wrote:
| It's quite amazing that, by replacing the lens with a diffuser,
| a diffusion model can reconstruct an image so well.
|
| The downside is that this heavily relies on the model to
| construct the image. Much like those colorisation models applied
| to old monochrome photos, the results will probably always look a
| little off depending on the training data. I could imagine taking
| a photo of some weird art installation and the camera getting
| confused.
|
| You can see examples of this when the model invented fabric
| texture on the fabric examples and converted solar panels to
| walls.
| nine_k wrote:
| The model basically guesses and reinvents what these diffuse
| pixels might be. It's more like a painter producing a picture
| from memory.
|
| It inevitably means that the "camera" visually parses the scene
| and then synthesizes its picture. The intermediate step is a
| great moment to semantically edit the scene. Recolor and
| retexture things. Remove some elements of the scene, or even
| add some. Use different rendering styles.
|
| Imagine pointing such a "camera" at a person standing next to a
| wall, and getting a picture of the person with their skin
| lacking any wrinkles, clothes looking more lustrous, as if they
| were silk rather than cotton, and the graffiti removed from the
| wall behind.
|
| Or making a "photo" that turns a group of kids into a group of
| cartoon superheroes, while retaining their recognizable faces
| and postures.
|
| (Of course, photo evidence made with digital cameras should
| long have been inadmissible in courts, but this would hasten
| the transition.)
| mjburgess wrote:
| Does a camera without a lens make any physics sense? I cannot see
| how the scene geometry could be recoverable. Rays of light
| travelling from the scene arrive in all directions.
|
| Intuitively, imagine moving your eye at every point along some
| square inch. Each position of the eye is a different image. Now
| all those images _overlap_ on the sensor.
|
| If you look at the images in the paper, everything except the
| most macro geometry and colour palette is clearly generated --
| since it changes depending on the prompt.
|
| So at a guess, the lensless sensor gets this massive overlap of
| all possible photos at that location and so is able, at least, to
| capture minimal macro geometry and colour. This isn't going to be
| a useful amount of information for almost any application.
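The overlap intuition above can be sketched numerically. Here is a toy 1-D NumPy illustration (my own sketch, not the paper's code, with made-up hole positions): each hole in a lensless mask projects a shifted copy of the scene, and the sensor records their sum. That superposition is exactly a circular convolution of the scene with the mask pattern, which is the forward model that mask-based systems try to invert.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D "scene" and a multi-hole mask (coded aperture).
scene = rng.random(64)
holes = [3, 17, 29, 44, 58]   # five hypothetical pinhole positions
mask = np.zeros(64)
mask[holes] = 1.0

# Each hole projects a shifted copy of the scene onto the sensor;
# with no lens, the sensor simply records their sum.
sensor = sum(np.roll(scene, p) for p in holes)

# Equivalently, the forward model is a circular convolution of the
# scene with the mask. When the mask pattern is known and well
# conditioned, that overlap can (partly) be undone by deconvolution;
# the paper adds a diffusion prior on top to fill in what is lost.
sensor_conv = np.real(np.fft.ifft(np.fft.fft(scene) * np.fft.fft(mask)))
print(np.allclose(sensor, sensor_conv))  # the two views agree: True
```

So the measurement is not pure mush: it is a structured mixing of shifted scene copies, which is why macro geometry and colour survive while fine detail has to be guessed by the prior.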
| ziofill wrote:
| +1 for the Thorlabs candy box
| karmakaze wrote:
| This is not a 'camera' per se. It's more like a human visual
| system that samples light and hallucinates an appropriate image
| based on context. The image is constructed from the data more
| than it is reconstructed. And like human vision, it can be
| correct often enough to be useful.
| cactusplant7374 wrote:
| Thanks for the summary. I was looking for this.
| Snoozus wrote:
| It's not as crazy as it seems: a pinhole camera doesn't have any
| lenses either and works just fine. The hole size is a tradeoff
| between brightness and detail. This one has many holes and uses
| software to puzzle their images back together.
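For a sense of scale on that brightness/detail tradeoff: a pinhole has a classical sweet spot, since too small a hole adds diffraction blur and too large a hole adds geometric blur. A quick back-of-the-envelope using Lord Rayleigh's rule of thumb, d ≈ 1.9·sqrt(f·λ) (the focal distance and wavelength below are illustrative choices, not from the paper):

```python
import math

# Rayleigh's rule of thumb for the optimal pinhole diameter:
# d ~ 1.9 * sqrt(f * lambda). Smaller holes cost light and add
# diffraction blur; larger holes add geometric blur.
f = 0.050            # focal distance: 50 mm, a typical "normal" lens
wavelength = 550e-9  # green light, middle of the visible band

d = 1.9 * math.sqrt(f * wavelength)  # optimal hole diameter (metres)
f_number = f / d                     # resulting aperture ratio

print(f"optimal pinhole: {d * 1e3:.2f} mm, about f/{f_number:.0f}")
```

An aperture around f/160 admits several hundred times less light than f/5.6, which is why single-pinhole photos need long exposures, and why a many-holed mask plus computational unmixing is attractive.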
| dekhn wrote:
| For those interested in various approaches to lens-free imaging,
| Laura Waller at Berkeley has been pursuing this area for some
| time.
|
| https://waller-lab.github.io/DiffuserCam/
| https://waller-lab.github.io/DiffuserCam/tutorial.html includes
| instructions and code to build your own
| https://ieeexplore.ieee.org/abstract/document/8747341
| https://ieeexplore.ieee.org/document/7492880
| 6gvONxR4sf7o wrote:
| Re: is this a camera or not, I recently realized that my fancy
| mirrorless camera is closer to this than I'd previously thought.
|
| The sensor has a zillion pixels, but each one only measures one
| color. For example, the pixel at index (145, 2832) might only
| measure green, while its neighbor at (145, 2833) only measures
| red. So we use models to fill in the blanks. We didn't measure
| redness at (145, 2832), so we guess based on the redness nearby.
|
| This kind of guessing is exactly what modern CV is so good at. So
| the line of what is a camera and what isn't is a bit blurry to
| begin with.
| klysm wrote:
| The structure you are referring to is a Bayer array. Algorithms
| that do the guessing are called debayering algorithms.
| 6gvONxR4sf7o wrote:
| I think that's just a particular (very common) case. In
| general it's called demosaicing, right?
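The fill-in-the-blanks step described above can be made concrete. A toy NumPy sketch of bilinear demosaicing (my own illustration of the idea, assuming a hypothetical RGGB layout, not any real camera's pipeline): green is measured only on a checkerboard of photosites, and a missing green value at a red site is guessed as the average of its four measured green neighbours.

```python
import numpy as np

# Tiny RGGB Bayer mosaic: each photosite measures only one channel.
#   R G R G
#   G B G B  ...
H, W = 6, 6
rng = np.random.default_rng(1)
true_green = rng.random((H, W))  # the green light actually arriving

# Simulate the sensor: green is measured only where row+col is odd
# (the G sites of an RGGB pattern); R and B sites measure nothing green.
g_mask = np.indices((H, W)).sum(axis=0) % 2 == 1
mosaic_g = np.where(g_mask, true_green, 0.0)

# Bilinear demosaicing guess at one interior non-G site:
# average the four measured green neighbours.
y, x = 2, 2  # an R site (row+col even), so green was never measured here
guess = (mosaic_g[y - 1, x] + mosaic_g[y + 1, x]
         + mosaic_g[y, x - 1] + mosaic_g[y, x + 1]) / 4
print(abs(guess - true_green[y, x]))  # interpolation error, not a measurement
```

Two thirds of every "photo" from a Bayer sensor is interpolated this way (or by fancier, edge-aware variants), which is the sense in which every digital camera already guesses.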
| Dibby053 wrote:
| This would be impressive if the examples weren't taken from the
| same dataset (LAION-5B) that was used to train the Stable
| Diffusion model it's using.
| Thomashuet wrote:
| I don't understand the use of a textual description. In which
| scenario do you not have enough space for a lens and yet have a
| textual description of the scene?
___________________________________________________________________
(page generated 2024-08-17 23:00 UTC)