[HN Gopher] High-resolution efficient image generation from WiFi...
___________________________________________________________________
High-resolution efficient image generation from WiFi Mapping
Author : oldfuture
Score : 127 points
Date : 2025-10-01 06:33 UTC (16 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| jychang wrote:
| The image examples from the paper are absolutely insane.
|
| Is this just extremely overfitted?
|
| Is there a way for us to test this? Even if the model isn't
| open source, I'd pay $1 to take a capture from my wifi card on
| my linux box, upload it to the researchers, and have them
| generate a picture to see if it's accurate.
| RicDan wrote:
| Yeah, this seems too insane to be true. I understand that wifi
| signal strength etc. is heavily impacted by the contents of a
| room, but even so it seems far-fetched that there is enough
| information in its distortion to lead to these results.
| esrh wrote:
| Wifi sensing results with high-dimensional outputs usually use
| wideband links... your average wifi connection uses 20MHz of
| bandwidth and transmits on 48 spaced-out frequencies. In the
| paper, we use 160MHz with effectively 1992 input data points.
| Even that isn't enough to predict a 3x512x512 image directly,
| which motivated predicting 4x64x64 latent embeddings instead.
|
| The more space you take up in the frequency domain, the
| higher your resolution in the time domain is. Wifi sensing
| results that detect heart rate or breathing, for example, use
| even larger bandwidth, to the point where it'd be more
| accurate to call them radars than wifi access points.
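|
| To make the gap concrete, here's a toy sketch (illustrative
| sizes and architecture, not the actual model from the paper):
|
|   import torch
|   import torch.nn as nn
|
|   # 160MHz link: ~1992 usable subcarriers; time resolution
|   # scales as 1/bandwidth (1/160MHz ~= 6.25ns, ~1.9m of path).
|   csi = torch.randn(1, 1992)            # per-subcarrier amplitudes
|
|   pixel_dim  = 3 * 512 * 512            # 786,432 outputs to predict
|   latent_dim = 4 * 64 * 64              # 16,384 outputs to predict
|
|   # A small regressor into latent space is far easier to fit
|   # than one into pixel space.
|   encoder = nn.Sequential(
|       nn.Linear(1992, 2048), nn.ReLU(),
|       nn.Linear(2048, latent_dim),
|   )
|   z = encoder(csi).view(1, 4, 64, 64)   # fed to the diffusion model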
| tylervigen wrote:
| That is not how it works. The images of the room are part of
| the generative model's training data. The wifi is "just"
| helping identify the locations of objects in the room.
|
| If you uploaded a random room to the model without retraining
| it, you wouldn't get anything as accurate as the images in the
| paper.
| fxtentacle wrote:
| FYI the images are not generated based on the WiFi data. The
| WiFi data is used as additional conditioning for a regular
| diffusion image generation model. In other words, the WiFi
| measurements determine which objects to place where in the
| image, and the diffusion model then fills in any "knowledge
| gaps" with randomly generated (but visually plausible) data.
| jstanley wrote:
| I'm confused about how it gets things like the floor colour and
| clothing colour correct.
|
| It seems like they might be giving it more information besides
| the WiFi data, or else maybe training it on photos of the
| actual person in the actual room, in which case it's not
| obvious how well it would generalise.
| f_devd wrote:
| This is what GP alludes to: the original dataset has many
| similar reference images (i.e. the common mode is the same),
| and the LatentCSI model is tasked with reconstructing the
| correct specific instance (or a similarly plausible image in
| the case of the test/validation set).
| Aurornis wrote:
| > I'm confused about how it gets things like the floor colour
| and clothing colour correct.
|
| The model was trained on the room.
|
| It would produce images of the room even without any WiFi
| data input at all.
|
| The WiFi is used as a modulator on the input to the
| pretrained model.
|
| It's not actually generating an image of the room from only
| WiFi signals.
| gblargg wrote:
| It wouldn't generalize at all. The Wi-Fi is just
| differentiating among a small set of possible object
| placements/orientations within that fixed space, then
| modifying the photos appropriately, as far as I can tell.
| esrh wrote:
| Think of it as an img2img stable diffusion process, except
| instead of starting with an image you want to transform, you
| start with CSI.
|
| The encoder itself is trained on latent embeddings of images in
| the same environment with the same subject, so it learns visual
| details (that are preserved through the original autoencoder;
| this is why the model can't overfit on, say, text or faces).
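|
| A rough sketch of that training setup, assuming paired (CSI,
| image) data and a frozen SD-style VAE (names and shapes here
| are illustrative, not the paper's actual code):
|
|   import torch
|   import torch.nn.functional as F
|   from diffusers import AutoencoderKL
|
|   vae = AutoencoderKL.from_pretrained(
|       "runwayml/stable-diffusion-v1-5", subfolder="vae").eval()
|
|   def train_step(csi_encoder, optimizer, csi, image):
|       # image: (B, 3, 512, 512) in [-1, 1]; csi: (B, 1992)
|       with torch.no_grad():
|           target = vae.encode(image).latent_dist.mean  # (B, 4, 64, 64)
|       pred = csi_encoder(csi)          # regress latents, not pixels
|       loss = F.mse_loss(pred, target)  # (SD's 0.18215 latent
|       optimizer.zero_grad()            #  scaling omitted for brevity)
|       loss.backward()
|       optimizer.step()
|       return loss.item()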
| equinox_nl wrote:
| I'm highly skeptical about this paper just because the resulting
| images are in color. How the hell would the model even infer that
| from the input data?
| orbital-decay wrote:
| That's just a diffusion model (Stable Diffusion 1.5) with a
| custom encoder that uses CSI measurements as input. So
| apparently the answer is it's all hallucinated.
| pftburger wrote:
| Right, but it's hallucinating the right colours, which to me
| feels like some data is leaking somewhere. Because there's no
| way wifi sees colours.
| moffkalast wrote:
| Well perhaps it can, a 2.4Ghz antenna is just a very red
| lightbulb. Maybe material absorption correlates, though it
| would be a long shot?
| steinvakt2 wrote:
| If it sees the shape of a fire extinguisher, the
| diffusion model will "know" it should be red. But that's
| not all that's going on here. Hair color etc seems
| impossible to guess, right? To be fair I haven't actually
| read the paper so maybe they explain this
| defraudbah wrote:
| downvoted until you read the paper
| jstanley wrote:
| You can't even pick colour out of infra-red-illuminated
| night time photography. There's no way you can pick
| colour out of WiFi-illuminated photography.
| AngryData wrote:
| There would be some correlation between the visual color
| of objects and the spectrum of an object in another EM
| frequency, many object's color share the same dye or
| pigment materials, but it seems pretty unlikely that it
| would be reliable at all with a spectrum of different
| objects and materials and dyes because there is no
| universal RGB dye or pigment set we rely upon. You can
| make the same red color many different ways but each
| material will have different spectral "colors" outside of
| the visual range. Even something simple like black
| plastics can be completely transparent in other spectrums
| like the PS3 was to infrared. Structural colors would
| probably be impossible to see discern however I don't
| think too many household objects have structural colors
| unless you got a stuffed bird or fish on the wall.
| HeatrayEnjoyer wrote:
| Different materials and dyes have different dialectical
| properties. These examples are probably confabulation but
| I'm sure it's possible in principle.
| plorg wrote:
| Assuming you mean dielectric, but I do like the idea that
| different colors are different arguments in conflict with
| each other.
| anthonj wrote:
| It is an overfitted model that uses WiFi data as hints for
| generation:
|
| "We consider a WiFi sensing system designed to monitor indoor
| environments by capturing human activity through wireless
| signals. The system consists of a WiFi access point, a WiFi
| terminal, and an RGB camera that is available only during the
| training phase. This setup enables the collection of paired
| channel state information (CSI) and image data, which are used
| to train an image generation model"
| meindnoch wrote:
| The model was trained on images of that particular room, from
| that particular angle. It can only generate images of that
| particular room.
| dtj1123 wrote:
| This is largely guesswork, but I think what's happening is this.
| The training set contains images of a small number of rooms
| taken from specific camera angles with only that individual
| standing in it, and associated wifi signal data. The model then
| learns to predict the posture of the individual given the wifi
| signal data, outputting the prediction as a colour image. Given
| that the background doesn't vary across images, the model
| learns to predict it consistently with accurate colors etc.
|
| The interesting part of the whole setup is that the wifi signal
| seems to contain the information required to predict the
| posture of the individual to a reasonably high degree of
| accuracy, which is actually pretty cool.
| malux85 wrote:
| PSA: If you publish a paper that talks about high resolution
| images, can you please include at least 1 high resolution
| image.
|
| I know that is a subjective metric, but by anyone's measure a
| 4x4 matrix of postage-stamp-sized images is not high
| resolution.
| mistercow wrote:
| 1. "High resolution" in this kind of context is generally
| relative to previous work.
|
| 2. "Postage stamp sized" is not a resolution. Zoom in on them
| and you'll see that they're quite crisp.
| amagasaki wrote:
| The HTML version has much larger images
| nntwozz wrote:
| One step closer to The Light of Other Days.
|
| "When a brilliant, driven industrialist harnesses the cutting
| edge of quantum physics to enable people everywhere, at trivial
| cost, to see one another at all times: around every corner,
| through every wall, into everyone's most private, hidden, and
| even intimate moments. It amounts to the sudden and complete
| abolition of human privacy--forever."
| nashashmi wrote:
| So privacy is a mathematical function using variables of cost,
| capability, control, reach?
| esrh wrote:
| This is my paper (first author).
|
| I think the results here are much less important and surprising
| than what some people seem to be thinking. To summarize the core
| of the paper, we took stable diffusion (which is a 3-part system
| of an encoder, u-net, decoder), and replaced the encoder to use
| WiFi data instead of images. This gives you two advantages: you
| get text-based guidance for free, and the encoder model can be
| smaller. The smaller model combined with the semantic compression
| from the autoencoder gives you better (SOTA resolution) results,
| much faster.
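|
| For the curious, inference looks roughly like img2img with the
| image encoder swapped out (an illustrative diffusers sketch,
| not our exact code; the model id may differ):
|
|   import torch
|   from diffusers import StableDiffusionImg2ImgPipeline
|
|   pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
|       "runwayml/stable-diffusion-v1-5").to("cuda")
|
|   @torch.no_grad()
|   def csi_to_image(csi_encoder, csi, prompt, strength=0.5, steps=50):
|       z = csi_encoder(csi) * 0.18215   # CSI -> latent, SD scaling
|       tok = pipe.tokenizer(prompt, padding="max_length",
|                            max_length=77, return_tensors="pt")
|       emb = pipe.text_encoder(tok.input_ids.to("cuda"))[0]
|       pipe.scheduler.set_timesteps(steps, device="cuda")
|       # img2img-style: start partway through the noise schedule
|       ts = pipe.scheduler.timesteps[int(steps * (1 - strength)):]
|       z = pipe.scheduler.add_noise(z, torch.randn_like(z), ts[:1])
|       for t in ts:
|           eps = pipe.unet(z, t, encoder_hidden_states=emb).sample
|           z = pipe.scheduler.step(eps, t, z).prev_sample
|       return pipe.vae.decode(z / 0.18215).sample  # (1, 3, 512, 512)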
|
| I noticed a lot of discussion about how the model can possibly be
| so accurate. It wouldn't be wrong to consider the model overfit,
| in the sense that the visual details of the scene are moved from
| the training data to the model weights. These kinds of models are
| meant to be trained & deployed in a single environment. What's
| interesting about this work is that learning the environment well
| has become really fast because the output dimension is smaller
| than image space. In fact, it's so fast that you can basically
| do it in real time: you turn on a data collection node and can
| train a model from scratch online in a new environment, getting
| decent results with at least a little bit of interesting
| generalization in ~10min. I'm presenting a demonstration of
| this at Mobicom 2025 next month in Hong Kong.
|
| What people call "WiFi sensing" is now mostly CSI (channel state
| information) sensing. When you transmit a packet on many
| subcarriers (frequencies), the CSI represents how the data on
| each frequency changed during transmission. So, CSI is inherently
| quite sensitive to environmental changes.
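|
| Concretely: for a known transmitted symbol X[k] and received
| symbol Y[k] on subcarrier k, the CSI is just the per-subcarrier
| ratio (toy numpy sketch):
|
|   import numpy as np
|
|   n_sub = 1992                          # e.g. a 160MHz link
|   X = np.exp(1j * np.pi * np.random.randint(0, 2, n_sub))  # pilots
|   H = np.random.randn(n_sub) + 1j * np.random.randn(n_sub) # channel
|   N = 0.01 * (np.random.randn(n_sub) + 1j * np.random.randn(n_sub))
|   Y = H * X + N                         # what the receiver observes
|   H_est = Y / X                         # CSI estimate per subcarrier
|   amplitudes = np.abs(H_est)            # features fed to the encoder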
|
| I want to point out something that most everybody working in the
| CSI sensing/general ISAC space seems to know: generalization is
| hard and most definitely unsolved for any reasonably high-
| dimensional sensing problem (like image generation and to some
| extent pose estimation). I see a lot of fearmongering online
| about wifi sensing killing privacy for good, but in my opinion
| we're still quite far off.
|
| I've made the project's code and some formatted data public since
| this paper is starting to pick up some attention:
| https://github.com/nishio-laboratory/latentcsi
| phh wrote:
| Is there a survey of the SoTA of what can be achieved with CSI
| sensing that you would recommend?
|
| What is available at the low level? Are researchers using SDRs,
| or are there common wifi chips that properly report CSI? Do
| most people feed in the CSI of literally every packet, or is it
| sampled?
| esrh wrote:
| I'd suggest reading
| https://dl.acm.org/doi/abs/10.1145/3310194 (2019) for a survey
| of early methods, and https://arxiv.org/abs/2503.08008 for more
| recent work.
|
| As for low level:
|
| The most common early hardware was afaik esp32s &
| https://stevenmhernandez.github.io/ESP32-CSI-Tool/, and also
| old intel NICs &
| https://dhalperi.github.io/linux-80211n-csitool/.
|
| Now many people use https://ps.zpj.io/ which supports some
| hardware including SDRs, but I must discourage using it,
| especially for research, as it's not free software and has a
| restrictive license. I used https://feitcsi.kuskosoft.com/
| which uses a slightly modified iwlwifi driver, since iwlwifi
| needs to compute CSI anyway. There are free software
| alternatives for SDR CSI extraction as well; it's not hard to
| build an OFDM chain with GNUradio and extract CSI, although
| this might require a slightly more in-depth understanding of
| how wifi works.
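|
| As a flavor of what that involves, a toy single-symbol version
| (a real 802.11 chain also needs sync, CFO correction, and pilot
| tracking):
|
|   import numpy as np
|
|   n_fft, cp = 64, 16                    # 20MHz-style numerology
|   ltf = np.random.choice([-1.0, 1.0], n_fft)  # known training symbol
|   tx = np.fft.ifft(ltf)
|   tx = np.concatenate([tx[-cp:], tx])   # prepend cyclic prefix
|   rx = np.convolve(tx, [1.0, 0.3j])[:len(tx)]  # toy 2-tap channel
|   rx_freq = np.fft.fft(rx[cp:cp + n_fft])      # strip CP, go to freq
|   csi = rx_freq / ltf                   # per-subcarrier CSI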
| nashashmi wrote:
| Where is the color info coming from? It can't come from WiFi. Is
| that being fed in using a photo?
| brcmthrowaway wrote:
| So the applications of this work are... surveillance. Why are
| there people working in this space?
| cracki wrote:
| So they trained a model on a handful of poses? OK cool.
|
| Any "unknown" state of the scene is bound to confuse it.
___________________________________________________________________
(page generated 2025-10-01 23:01 UTC)