[HN Gopher] Shape of Motion: 4D Reconstruction from a Single Video
       ___________________________________________________________________
        
       Shape of Motion: 4D Reconstruction from a Single Video
        
       Author : lnyan
       Score  : 108 points
       Date   : 2024-07-20 18:41 UTC (3 days ago)
        
 (HTM) web link (shape-of-motion.github.io)
 (TXT) w3m dump (shape-of-motion.github.io)
        
       | smusamashah wrote:
       | I was wondering how they were getting depth from a video where
       | the camera is still.
       | 
       | > we utilize a comprehensive set of data-driven priors, including
       | monocular depth maps
       | 
       | > Our method relies on off-the-shelf methods, e.g., mono-depth
       | estimation, which can be incorrect.
        
         | Daub wrote:
         | Actually, of the examples they showed, all but one clip
         | featured both camera and in-camera motion. Granted not a lot of
         | the former, but according to my non-expert opinion, maybe
         | enough to construct a disparity map.
        
           | was_a_dev wrote:
           | I imagine having stereo video would also help generate a
           | depth map from disparity?
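
The disparity idea raised in this sub-thread can be sketched with the standard rectified-stereo pinhole relation Z = f * B / d. This is a minimal illustration, not the paper's method; the function name, parameter names, and numbers are assumptions made up for the example.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth (metres) from the rectified-stereo relation Z = f * B / d.

    disparity_px: horizontal pixel shift of a feature between the two views
    focal_px:     focal length expressed in pixels
    baseline_m:   distance between the two camera centres in metres
    """
    if disparity_px <= 0:
        # Zero disparity means the point is effectively at infinity.
        return float("inf")
    return focal_px * baseline_m / disparity_px


# Hypothetical numbers: halving the disparity doubles the estimated depth.
near = depth_from_disparity(disparity_px=40.0, focal_px=1000.0, baseline_m=0.5)
far = depth_from_disparity(disparity_px=20.0, focal_px=1000.0, baseline_m=0.5)
print(near, far)
```

With a nearly static camera the effective baseline B is tiny, so disparities shrink toward pixel noise, which is one reason a learned mono-depth prior is attractive.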
        
       | yieldcrv wrote:
       | One thing I liked about Team Ico (a studio behind the Shadow of
       | the Colossus, Ico, Last Guardian video games) was how the player
       | can move the camera just a little bit during automated sequences
       | 
       | Getting that kind of look around in a video scene would be really
       | engaging. A bit different than VR or watching in The Sphere, with
       | the engagement being that there are still things right out of
       | view you have to pan the camera for
        
         | Daub wrote:
         | I agree. I think that this is similar to the appeal of old-
         | school stereoscopy.
        
         | lancesells wrote:
         | Haven't played the other games but Ico was incredible. It gave
         | me the same feeling as Another World which was maybe 10 years
         | prior.
        
         | latexr wrote:
         | > Getting that kind of look around in a video scene would be
         | really engaging.
         | 
         | It might be interesting for one or two movies specifically
         | built around the feature, but otherwise it would be a gimmick
         | no one would care for. For games, sure, but movies are a
         | different experience.
        
           | yieldcrv wrote:
           | maybe, but for the last half decade nearly my entire social
           | circle cannot sit still for movies and just won't go out of
           | their way to do it anymore. I'm big into cinema, but they are
           | not. Treating this like the fidget-spinning surrogate that a
           | large portion of the population relies upon could potentially
           | make it a hit for some viewing experiences. It's a thesis I
           | would pursue, for money, at least.
        
             | latexr wrote:
             | This would make it unbearable to watch a movie with anyone
             | else, so it doesn't really solve the social issue. But even
             | if you're watching it alone, it only really makes sense if
             | the movie itself takes advantage of it in some interesting
             | way, which starts to get into game territory. It wouldn't
             | even work for the most part: how do you deal with cuts and
             | changes of scenery? It makes no sense in the context of a
             | movie; what you're looking for is a game and we can already
             | do that.
             | 
             | Maybe you could have it work as a documentary (good luck
             | getting a bored social group to go for that) or a virtual
             | tour, but we already have 3D interactions of those too.
             | 
             | We've had tons of movie viewing experiments and ultimately
             | always go back to the tried and true 2D screen, with the
             | bolder ideas being relegated mostly to the domain of theme
             | park gimmicks. Which are interesting in their own right,
             | but don't survive on their own.
        
               | yieldcrv wrote:
               | yes, my main use case would be for fidgeters solo, just
               | like those Team ICO games
               | 
               | > how do you deal with cuts and changes of scenery?
               | 
               | the same way the games did it. by doing nothing special
               | at all and retaining the same functionality. it really
               | depends on how this 4D reconstruction works before I
               | could say it uniquely adversely affects the experience
               | 
               | for the most part what's interesting to me is that the
               | overhead costs seem low enough not to care about random
               | things big studios did at great expense with no way to
               | justify the market appeal. it's either a portfolio piece
               | or 1,000 monthly users supporting my lifestyle
               | indefinitely.
        
       | moritonal wrote:
       | Out of curiosity, what is the difference between 4D and 6DoF (six
       | degrees of freedom)? Sounds a lot like the 6DoF work that Lytro did
       | back in 2012, although this obviously is coming at the problem from
       | the other direction, generating it rather than capturing it.
        
         | moralestapia wrote:
         | Move in 3D space + rotate in 3D space, I think.
         | 
         | But w/ time should it be 7?
        
         | deckar01 wrote:
         | Lytro added 2 spatial dimensions of info to 2D image capture:
         | the angles the light was traveling at when it entered the
         | camera. They could simulate the image with different camera
         | parameters, which was good for changing depth of field after
         | the fact, but the occlusion information was limited by the
         | diameter of the aperture. They tried to make depth maps, but
         | that extra data was not a silver bullet. As far as I could
         | tell, they were still fundamentally COLMAPing, they just had
         | extra hints to guess with.
        
           | ryandamm wrote:
           | This is spot-on. Note that the aperture on the camera was
           | quite large, I want to say on the order of 100mm? They
           | sourced really exotic hardware for that cinema camera.
           | 
           | They also had the "Immerge," which was a ~1m diameter,
           | hexagonal array of 2D cameras. They got the 4D data from
           | having a 2D (spatially distributed) array of 2D samples (each
           | camera's views). It's undersampled, because they threw out
           | most of the light, but using 3D as a prior for reconstructing
           | missing rays is generally pretty effective.
           | 
           | But I also understand a lot of what they demoed at first was
           | smoke and mirrors, plus a lot of traditional 3D VFX
           | workflows. Still impressive as hell at the time, it's just
           | that the tech has progressed significantly since ~2018.
        
             | PaulHoule wrote:
             | I got a Lytro Illum off eBay at a reasonable price but it
             | is a bit of a white elephant. I was hoping to shoot
             | stereograms but I haven't been able to do it with the stock
             | software (I just get two images that look the same with no
             | clear disparity)
             | 
             | I've seen open source software for plenoptic images which
             | might be able to generate a point cloud but I've only
             | gotten one good shot of the Lytro which was similar to a
             | shot I took with this crazy lens
             | 
             | https://7artisans.store/products/7artisans-50mm-f0-95-large
             | -...
        
         | littlestymaar wrote:
         | The scene itself moves over time, hence the 4D. Vanilla
         | Gaussian splatting already gives you 6 degrees of freedom since
         | you have a full 3D scene.
        
       | tizio13 wrote:
       | This reminds me of the description of Disneys (future movies) in
       | Cloud Atlas. The movie had a good visualization, this feels like
       | that.
        
         | mrmetanoia wrote:
         | I liked Cloud Atlas, I should watch it again. It was weird and
         | ambitious.
        
       | InDubioProRubio wrote:
       | Our children will be so weirded out by Blade Runner. Not by the
       | zoom into the picture, but by the fact that the guy believes in
       | hallucinated data.
        
         | jajko wrote:
         | Who says that the recording medium simply didn't have 1
         | petapixel resolution? Or its analog analog to stick with the
         | movie
        
       | blt wrote:
       | the first HyperNeRF cat video is quite interesting-looking and
       | surreal!
        
         | cooper_ganglia wrote:
         | He was having a meowt of body experience.
        
       | Geee wrote:
       | For 3D VR videos, this would be useful for adjusting IPD for
       | every person, rather than use the static IPD of the camera setup.
       | Also, allowing just a little bit of head movement would really
       | increase the immersiveness. I don't need to travel long distances
       | inside the video. If the video is already filmed with static
       | stereo setup, it would be even easier to reconstruct an accurate
       | 4D video limited to short travel distances without glaring
       | errors.
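
The per-viewer IPD idea above can be sketched as follows: once a scene is reconstructed in 3D, the two eye cameras can be placed at any separation instead of the fixed separation of the capture rig. This is a minimal sketch with hypothetical function names, assuming parallel pinhole cameras; it is not taken from the paper.

```python
def eye_positions(head_pos, right_dir, ipd_m):
    """Return (left, right) eye positions, each half the IPD from head_pos."""
    norm = sum(c * c for c in right_dir) ** 0.5
    unit = tuple(c / norm for c in right_dir)  # normalise the 'right' axis
    half = ipd_m / 2.0
    left = tuple(h - half * u for h, u in zip(head_pos, unit))
    right = tuple(h + half * u for h, u in zip(head_pos, unit))
    return left, right


def screen_disparity_px(depth_m, ipd_m, focal_px):
    """Pixel disparity between the two eye views for a point at depth_m."""
    return focal_px * ipd_m / depth_m


# Hypothetical viewers: the same point at 2 m yields a different disparity
# for a 58 mm IPD than for a 70 mm IPD, so each viewer needs their own render.
for ipd in (0.058, 0.070):
    print(eye_positions((0.0, 1.6, 0.0), (1.0, 0.0, 0.0), ipd),
          screen_disparity_px(2.0, ipd, 1000.0))
```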
        
         | thomastjeffery wrote:
         | https://augmentedperception.github.io/deepviewvideo/
         | 
         | We've been waiting 4 years. I just don't understand what is
         | taking so long.
         | 
         | Even at a low resolution, the difference is night and day. Even
         | with a very small window, this is a leap forward for VR
         | immersion. Why in the hell is no one using it?!?
        
       | latexr wrote:
       | The results are impressive, but what makes this 4D? Where's the
       | extra dimension and how is it relevant to 3D human beings?
        
         | thomastjeffery wrote:
         | Time
        
           | cooper_ganglia wrote:
           | We are all 4-dimensional beings on this fine day.
        
             | philipov wrote:
             | We're 3+1 dimensional beings. Time doesn't have the same
             | metric as spatial dimensions, so you can't add them
             | together. You can't rotate a temporal object along the xt-
             | plane, for example, nor can you speak about an object's
             | length along the t-axis. The three spatial dimensions are
             | interchangeable, but time is special, so calling it 4D is
             | incorrect.
        
           | tantalor wrote:
           | The input videos already have that dimension, so that can't
           | be the answer.
        
             | thomastjeffery wrote:
             | I agree that it _shouldn't_ be, but it is apparent that it
             | (redundantly) is.
        
           | latexr wrote:
           | By that logic all videos would be (at least) 3D. But no one
           | would take you seriously if you said that.
        
             | echelon wrote:
             | Videos are already 4D.
             | 
             | There's the 2D frame and the time dimension. Then there's
             | the structural information conveyed by motion, parallax,
             | scene composition, camera movement, etc.
             | 
             | That's why there's the 180 rule, amongst other things.
             | 
             | Algorithms can take a video and turn it into a 4D volume.
             | As can our brains.
        
         | netruk44 wrote:
         | The reconstruction is a 3-dimensional scene that has animation
         | contained in it.
         | 
         | You can move a virtual camera 3-dimensionally within the scene
         | at any individual frame (x, y, z), and also move the scene
         | through its animation to play the animation forwards and
         | backwards (in other words, you move the camera through the
         | 'time' axis).
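
The "3D scene plus a time axis" description above can be sketched as a toy data structure: each scene point has a track of 3D positions over time, and a render query picks both a time and a camera position freely. This is an illustrative assumption only (linear interpolation, an unrotated pinhole camera, made-up names); it is not the representation the paper uses.

```python
from bisect import bisect_right


def point_at(track, t):
    """Linearly interpolate a track [(time, (x, y, z)), ...] at time t."""
    times = [k for k, _ in track]
    if t <= times[0]:
        return track[0][1]
    if t >= times[-1]:
        return track[-1][1]
    i = bisect_right(times, t)  # first keyframe strictly after t
    (t0, p0), (t1, p1) = track[i - 1], track[i]
    w = (t - t0) / (t1 - t0)
    return tuple(a + w * (b - a) for a, b in zip(p0, p1))


def project(point, cam_pos, focal_px):
    """Pinhole projection for a camera at cam_pos looking down +z."""
    x, y, z = (p - c for p, c in zip(point, cam_pos))
    if z <= 0:
        return None  # point is behind the camera
    return (focal_px * x / z, focal_px * y / z)


def render(scene, t, cam_pos, focal_px=500.0):
    """Query the scene at a chosen time t from a chosen camera position."""
    return [project(point_at(track, t), cam_pos, focal_px) for track in scene]


# One point sliding along x between t=0 and t=1, viewed from two positions.
scene = [[(0.0, (0.0, 0.0, 2.0)), (1.0, (1.0, 0.0, 2.0))]]
print(render(scene, 0.5, (0.0, 0.0, 0.0)))
print(render(scene, 0.5, (0.5, 0.0, 0.0)))
```

An ordinary video fixes the camera for every timestamp; here `t` and `cam_pos` are independent inputs, which is the sense in which the result is 3D plus time.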
        
           | latexr wrote:
           | > in other words, you move the camera through the 'time' axis
           | 
           | So, like the scrubber in _any_ video? Doesn't feel like that
           | warrants the 4D moniker. Which is not to say you're not
           | right, I think you are and that's what they mean, but that
           | being the case it feels more buzzword than anything.
        
             | radicality wrote:
             | I think it means that, given a normal flat 2D video, you
             | get back that video but as a 3D scene, meaning you can move
             | and pan the camera around as the 3d video plays. And I
             | guess they call it 4D since you had a flat 2d video + time
             | dimension, so 3d video + time dimension = 4 dimensions.
        
         | aaroninsf wrote:
         | This work is about taking an input with 2 spatial dimensions,
         | plus 1 time dimension,
         | 
         | and synthesizing a (limited) model with 3 spatial dimensions,
         | plus 1 time dimension.
         | 
         | 3D over time is colloquially called "4D;" though we don't call
         | video "3D" by analogy as the term binds strongly to its purely
         | spatial use.
        
           | aaroninsf wrote:
           | Re: relevance, one of the prospective uses of work like this
           | is in conversion of "flat" conventional video into "spatial"
           | video, eg as popular on the Apple Vision Pro.
           | 
           | I've been interested in the state of the art in that domain
           | myself, having thousands of 2D videos I've shot which I would
           | love to see "spatialized" well, someday.
        
           | latexr wrote:
           | > 3D over time is colloquially called "4D;"
           | 
           | Colloquially, meaning "used in or characteristic of familiar
           | and informal conversation", 4D films have a definition, and
           | that ain't it. 3D over time is colloquially still referred to as
           | 3D, as evidenced by decades of 3D blockbusters.
           | 
           | https://en.wikipedia.org/wiki/3D_film
           | 
           | https://en.wikipedia.org/wiki/4D_film
        
       | PaulHoule wrote:
       | Whenever I play a video game ( _Monster Hunter World_ comes to me
       | immediately) and see an establishing shot with moving camera
       | (like the ones demoed on their web site) I think the game really
       | wants to run in a VR headset where you can walk around and see
       | different angles.
       | 
       | (Funny there is a VR mod for _Monster Hunter Rise_ which makes me
       | think just how fun _Monster Hunter VR_ would be)
        
       ___________________________________________________________________
       (page generated 2024-07-23 23:09 UTC)