[HN Gopher] High-resolution efficient image generation from WiFi...
       ___________________________________________________________________
        
       High-resolution efficient image generation from WiFi Mapping
        
       Author : oldfuture
       Score  : 127 points
       Date   : 2025-10-01 06:33 UTC (16 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | jychang wrote:
       | The image examples from the paper are absolutely insane.
       | 
       | Is this just extremely overfitted?
       | 
        | Is there a way for us to test this? Even if the model isn't open
        | source, I'd pay $1 to take a capture from my wifi card on my
        | linux box, upload it to the researchers, and have them generate
        | a picture so I can see if it's accurate.
        
         | RicDan wrote:
          | Yeah, this seems too insane to be true. I understand that wifi
          | signal strength etc. is heavily impacted by the contents of a
          | room, but even so it seems far-fetched that there is enough
          | information in its distortion to lead to these results.
        
           | esrh wrote:
            | Wifi sensing results with high-dimensional outputs usually
            | rely on wideband links... your average wifi connection uses
            | 20MHz of bandwidth and transmits on 48 spaced-out
            | frequencies. In the paper, we use 160MHz with effectively
            | 1992 input data points. Even that isn't enough to predict a
            | 3x512x512 image directly, which motivated predicting 4x64x64
            | latent embeddings instead.
           | 
           | The more space you take up in the frequency domain, the
           | higher your resolution in the time domain is. Wifi sensing
           | results that detect heart rate or breathing, for example, use
           | even larger bandwidth, to the point where it'd be more
           | accurate to call them radars than wifi access points.
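            | 
            | To put the dimensionality argument in code (rough numbers
            | from above; just arithmetic):
            | 
            |     csi_points = 1992              # usable subcarriers at 160MHz
            |     image_dim = 3 * 512 * 512      # RGB pixels, if predicted directly
            |     latent_dim = 4 * 64 * 64       # SD latent embedding instead
            |     print(image_dim / csi_points)   # ~395 outputs per input point
            |     print(latent_dim / csi_points)  # ~8 outputs per input point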
        
         | tylervigen wrote:
          | That is not how it works. The images of the room are included
          | in the generative model's training data. The wifi is "just"
          | helping identify the locations of objects in the room.
         | 
         | If you uploaded a random room to the model without retraining
         | it, you wouldn't get anything as accurate as the images in the
         | paper.
        
       | fxtentacle wrote:
        | FYI the images are not generated from the WiFi data alone. The
        | WiFi data is used as additional conditioning for a regular
        | diffusion image generation model. That means the WiFi
        | measurements are used to determine which objects to place where
        | in the image, but the diffusion model will then fill in any
        | "knowledge gaps" with randomly generated (but visually
        | plausible) data.
        
         | jstanley wrote:
         | I'm confused about how it gets things like the floor colour and
         | clothing colour correct.
         | 
         | It seems like they might be giving it more information besides
         | the WiFi data, or else maybe training it on photos of the
         | actual person in the actual room, in which case it's not
         | obvious how well it would generalise.
        
           | f_devd wrote:
            | This is what GP alludes to: the original dataset has many
            | similar reference images (i.e. the common mode is the same),
            | and the LatentCSI model is tasked with reconstructing the
            | correct specific instance (or a similarly plausible image in
            | the case of the test/validation set).
        
           | Aurornis wrote:
           | > I'm confused about how it gets things like the floor colour
           | and clothing colour correct.
           | 
           | The model was trained on the room.
           | 
           | It would produce images of the room even without any WiFi
           | data input at all.
           | 
            | The WiFi is used as a modulator on the input to the
            | pre-trained model.
           | 
           | It's not actually generating an image of the room from only
           | WiFi signals.
        
           | gblargg wrote:
            | It wouldn't generalize at all. As far as I can tell, the
            | Wi-Fi is just differentiating among a small set of possible
            | object placements/orientations within that fixed space, then
            | modifying previously taken photos appropriately.
        
         | esrh wrote:
         | Think of it as an img2img stable diffusion process, except
         | instead of starting with an image you want to transform, you
         | start with CSI.
         | 
         | The encoder itself is trained on latent embeddings of images in
         | the same environment with the same subject, so it learns visual
         | details (that are preserved through the original autoencoder;
         | this is why the model can't overfit on, say, text or faces).
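          | 
          | A minimal PyTorch sketch of that idea (illustrative shapes and
          | names, not the actual model):
          | 
          |     import torch
          |     import torch.nn as nn
          | 
          |     # CSI amplitudes in, SD-style 4x64x64 latent out.
          |     class CSIEncoder(nn.Module):
          |         def __init__(self, csi_dim=1992, latent_shape=(4, 64, 64)):
          |             super().__init__()
          |             self.latent_shape = latent_shape
          |             out_dim = latent_shape[0] * latent_shape[1] * latent_shape[2]
          |             self.net = nn.Sequential(
          |                 nn.Linear(csi_dim, 2048), nn.ReLU(),
          |                 nn.Linear(2048, out_dim),
          |             )
          | 
          |         def forward(self, csi):
          |             return self.net(csi).view(-1, *self.latent_shape)
          | 
          |     csi = torch.randn(1, 1992)  # one CSI measurement (amplitudes)
          |     latent = CSIEncoder()(csi)  # stands in for img2img's image encoder
          |     # `latent` then goes through the usual denoising U-Net and
          |     # decoder, optionally with text guidance, as in img2img.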
        
       | equinox_nl wrote:
       | I'm highly skeptical about this paper just because the resulting
       | images are in color. How the hell would the model even infer that
       | from the input data?
        
         | orbital-decay wrote:
         | That's just a diffusion model (Stable Diffusion 1.5) with a
         | custom encoder that uses CSI measurements as input. So
         | apparently the answer is it's all hallucinated.
        
           | pftburger wrote:
           | Right but it's hallucinating the right colours which to me
           | feels like some data is leaking somewhere. Because no way
           | wifi sees colours
        
             | moffkalast wrote:
              | Well perhaps it can, a 2.4GHz antenna is just a very red
              | lightbulb. Maybe material absorption correlates, though it
              | would be a long shot?
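              | 
              | For scale (rough numbers):
              | 
              |     wifi_hz = 2.4e9
              |     red_light_hz = 4.3e14          # ~700nm red light
              |     print(red_light_hz / wifi_hz)  # ~180,000x lower frequency than red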
        
               | steinvakt2 wrote:
               | If it sees the shape of a fire extinguisher, the
               | diffusion model will "know" it should be red. But that's
               | not all that's going on here. Hair color etc seems
               | impossible to guess, right? To be fair I haven't actually
               | read the paper so maybe they explain this
        
               | defraudbah wrote:
               | downvoted until you read the paper
        
               | jstanley wrote:
               | You can't even pick colour out of infra-red-illuminated
               | night time photography. There's no way you can pick
               | colour out of WiFi-illuminated photography.
        
               | AngryData wrote:
                | There would be some correlation between the visual color
                | of an object and its spectrum at other EM frequencies,
                | since many objects' colors share the same dye or pigment
                | materials. But it seems pretty unlikely to be reliable
                | across a range of different objects, materials, and
                | dyes, because there is no universal RGB dye or pigment
                | set we rely upon. You can make the same red color many
                | different ways, but each material will have different
                | spectral "colors" outside the visual range. Even
                | something simple like black plastic can be completely
                | transparent in other spectrums, as the PS3 was to
                | infrared. Structural colors would probably be impossible
                | to discern, though I don't think too many household
                | objects have structural colors unless you've got a
                | stuffed bird or fish on the wall.
        
             | HeatrayEnjoyer wrote:
             | Different materials and dyes have different dialectical
             | properties. These examples are probably confabulation but
             | I'm sure it's possible in principle.
        
               | plorg wrote:
               | Assuming you mean dielectric, but I do like the idea that
               | different colors are different arguments in conflict with
               | each other.
        
         | anthonj wrote:
          | It is an overfitted model that uses WiFi data as hints for
          | generation:
         | 
         | "We consider a WiFi sensing system designed to monitor indoor
         | environments by capturing human activity through wireless
         | signals. The system consists of a WiFi access point, a WiFi
         | terminal, and an RGB camera that is available only during the
         | training phase. This setup enables the collection of paired
         | channel state information (CSI) and image data, which are used
         | to train an image generation model"
        
         | meindnoch wrote:
         | The model was trained on images of that particular room, from
         | that particular angle. It can only generate images of that
         | particular room.
        
         | dtj1123 wrote:
          | This is largely guesswork, but I think what's happening is this.
         | The training set contains images of a small number of rooms
         | taken from specific camera angles with only that individual
         | standing in it, and associated wifi signal data. The model then
         | learns to predict the posture of the individual given the wifi
         | signal data, outputting the prediction as a colour image. Given
         | that the background doesn't vary across images, the model
         | learns to predict it consistently with accurate colors etc.
         | 
         | The interesting part of the whole setup is that the wifi signal
         | seems to contain the information required to predict the
         | posture of the individual to a reasonably high degree of
         | accuracy, which is actually pretty cool.
        
       | malux85 wrote:
        | PSA: If you publish a paper that talks about high-resolution
        | images, can you please include at least 1 high-resolution image.
        | 
        | I know that is a subjective metric, but by anyone's measure a
        | 4x4 matrix of postage-stamp-sized images is not high resolution.
        
         | mistercow wrote:
         | 1. "High resolution" in this kind of context is generally
         | relative to previous work.
         | 
         | 2. "Postage stamp sized" is not a resolution. Zoom in on them
         | and you'll see that they're quite crisp.
        
         | amagasaki wrote:
         | The HTML version has much larger images
        
       | nntwozz wrote:
       | One step closer to The Light of Other Days.
       | 
       | "When a brilliant, driven industrialist harnesses the cutting
       | edge of quantum physics to enable people everywhere, at trivial
       | cost, to see one another at all times: around every corner,
       | through every wall, into everyone's most private, hidden, and
       | even intimate moments. It amounts to the sudden and complete
       | abolition of human privacy--forever."
        
         | nashashmi wrote:
         | So privacy is a mathematical function using variables of cost,
         | capability, control, reach?
        
       | esrh wrote:
       | This is my paper (first author).
       | 
       | I think the results here are much less important and surprising
       | than what some people seem to be thinking. To summarize the core
       | of the paper, we took stable diffusion (which is a 3-part system
       | of an encoder, u-net, decoder), and replaced the encoder to use
       | WiFi data instead of images. This gives you two advantages: you
       | get text-based guidance for free, and the encoder model can be
       | smaller. The smaller model combined with the semantic compression
       | from the autoencoder gives you better (SOTA resolution) results,
       | much faster.
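        | 
        | Schematically, the training recipe looks like this (illustrative
        | stubs, not the exact code; the real thing is in the repo linked
        | at the end of this comment):
        | 
        |     import torch
        |     import torch.nn as nn
        | 
        |     # Stand-ins: a frozen SD VAE encoder and a small CSI encoder.
        |     vae_encode = lambda img: torch.randn(img.shape[0], 4, 64, 64)  # stub
        |     csi_encoder = nn.Linear(1992, 4 * 64 * 64)
        |     opt = torch.optim.Adam(csi_encoder.parameters(), lr=1e-3)
        | 
        |     for step in range(100):
        |         csi = torch.randn(8, 1992)         # CSI measurements (stub data)
        |         img = torch.randn(8, 3, 512, 512)  # paired photos (stub data)
        |         target = vae_encode(img)           # latents from the frozen encoder
        |         pred = csi_encoder(csi).view(-1, 4, 64, 64)
        |         loss = nn.functional.mse_loss(pred, target)
        |         opt.zero_grad(); loss.backward(); opt.step()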
       | 
       | I noticed a lot of discussion about how the model can possibly be
       | so accurate. It wouldn't be wrong to consider the model overfit,
       | in the sense that the visual details of the scene are moved from
       | the training data to the model weights. These kinds of models are
       | meant to be trained & deployed in a single environment. What's
       | interesting about this work is that learning the environment well
       | has become really fast because the output dimension is smaller
        | than image space. In fact, it's so fast that you can basically
        | do it in real time... you turn on a data collection node and can
        | train a model from scratch online in a new environment, getting
        | decent results, with at least a little bit of interesting
        | generalization, in ~10min. I'm presenting a demonstration of this
       | at Mobicom 2025 next month in Hong Kong.
       | 
       | What people call "WiFi sensing" is now mostly CSI (channel state
       | information) sensing. When you transmit a packet on many
       | subcarriers (frequencies), the CSI represents how the data on
       | each frequency changed during transmission. So, CSI is inherently
       | quite sensitive to environmental changes.
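        | 
        | As a toy model (illustrative; real NICs estimate this from known
        | preamble symbols):
        | 
        |     import numpy as np
        | 
        |     n_sub = 48                            # e.g. a 20MHz link
        |     tx = np.exp(2j * np.pi * np.random.rand(n_sub))   # known symbols
        |     h = np.random.randn(n_sub) + 1j * np.random.randn(n_sub)  # channel
        |     rx = h * tx                           # what the receiver sees
        |     csi = rx / tx                         # estimated CSI == h here
        |     amp, phase = np.abs(csi), np.angle(csi)  # typical sensing features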
       | 
       | I want to point out something that most everybody working in the
       | CSI sensing/general ISAC space seems to know: generalization is
       | hard and most definitely unsolved for any reasonably high-
       | dimensional sensing problem (like image generation and to some
       | extent pose estimation). I see a lot of fearmongering online
       | about wifi sensing killing privacy for good, but in my opinion
       | we're still quite far off.
       | 
       | I've made the project's code and some formatted data public since
       | this paper is starting to pick up some attention:
       | https://github.com/nishio-laboratory/latentcsi
        
         | phh wrote:
          | Is there a survey you would recommend of the SotA of what can
          | be achieved with CSI sensing?
          | 
          | What is available at the low level? Are researchers using
          | SDRs, or are there common wifi chips that properly report CSI?
          | Do most people feed in the CSI of literally every packet, or
          | is it sampled?
        
           | esrh wrote:
           | I'd suggest reading
           | https://dl.acm.org/doi/abs/10.1145/3310194 (2019) for a
           | survey on early methods and https://arxiv.org/abs/2503.08008.
           | 
           | As for low level:
           | 
           | The most common early hardware was afaik esp32s &
           | https://stevenmhernandez.github.io/ESP32-CSI-Tool/, and also
           | old intel NICs &
           | https://dhalperi.github.io/linux-80211n-csitool/.
           | 
           | Now many people use https://ps.zpj.io/ which supports some
           | hardware including SDRs, but I must discourage using it,
           | especially for research, as it's not free software and has a
           | restrictive license. I used https://feitcsi.kuskosoft.com/
           | which uses a slightly modified iwlwifi driver, since iwlwifi
           | needs to compute CSI anyway. There are free software
           | alternatives for SDR CSI extraction as well; it's not hard to
           | build an OFDM chain with GNUradio and extract CSI, although
           | this might require a slightly more in-depth understanding of
           | how wifi works.
        
       | nashashmi wrote:
       | Where is the color info coming from? It can't come from WiFi. Is
       | that being fed in using a photo?
        
       | brcmthrowaway wrote:
        | So the application of this work is... surveillance. Why are
        | there people working in this space?
        
       | cracki wrote:
       | So they trained a model on a handful of poses? OK cool.
       | 
       | Any "unknown" state of the scene is bound to confuse it.
        
       ___________________________________________________________________
       (page generated 2025-10-01 23:01 UTC)