[HN Gopher] CityGaussian: Real-time high-quality large-scale sce...
       ___________________________________________________________________
        
       CityGaussian: Real-time high-quality large-scale scene rendering
       with Gaussians
        
       Author : smusamashah
       Score  : 273 points
       Date   : 2024-04-02 16:46 UTC (6 hours ago)
        
 (HTM) web link (dekuliutesla.github.io)
 (TXT) w3m dump (dekuliutesla.github.io)
        
       | chpatrick wrote:
       | "The average speed is 36 FPS (tested on A100)."
       | 
       | Real-Time if you have $8k I guess.
        
         | RicoElectrico wrote:
         | "Two more papers down the line..." ;)
        
           | Fauntleroy wrote:
           | Indeed, this very much looks like what we'll likely see from
           | Google Earth within a decade--or perhaps half that.
        
             | mortenjorck wrote:
             | I've seen very impressive Gaussian splatting demos of more
             | limited urban geographies (a few city blocks) running on
             | consumer hardware, so the reason this requires research-
             | tier Nvidia hardware right now is probably down to LOD
             | streaming. More optimization on that front, and this could
             | plausibly come to Google Earth on current devices.
             | 
             | "What a time to be alive" indeed!
        
         | rallyforthesun wrote:
          | As this seems to be the first 3DGS approach that uses LODs and
          | blocks, there might be room for optimization. This might become
          | useful for virtual production use cases, though probably not
          | for mobile.
        
         | jsheard wrote:
         | Good ol' "SIGGRAPH realtime", when a graphics paper describes
         | itself as achieving realtime speeds you always have to double
         | check that they mean actually realtime and not "640x480 at
         | 20fps on the most expensive hardware money can buy". Anything
         | can be realtime if you set the bar low enough.
        
           | phkahler wrote:
           | >> Anything can be realtime if you set the bar low enough.
           | 
           | I was doing "realtime ray tracing" on Pentium class computers
           | in the 1990s. I took my toy ray tracer and made an OLE
           | control and put it inside a small Visual Basic app which
           | handled keypress-navigation. It could run in a tiny little
           | window (size of a large icon) at reasonable frame rates.
           | Might even say it was using Visual Basic! So yeah "realtime"
           | needs some qualifiers ;-)
        
             | TeMPOraL wrote:
             | Fair, but today it could probably run 30FPS full-screen at
             | 2K resolution, without any special effort, on an average
             | consumer-grade machine; better if ported to take advantage
             | of the GPU.
             | 
             | Moore's law may be dead in general, but computing power
             | still increases (notwithstanding the software bloat that
             | makes it seem otherwise), and it's still something to count
             | on wrt. bleeding edge research demos.
        
           | cchance wrote:
            | I mean, A100s were cutting edge a year or so ago, and now
            | we're at the H200 and B200 (or is it the 300s?). It may take
            | a year or two more, but A100-level speed will trickle down to
            | the average consumer as well.
        
             | TeMPOraL wrote:
             | And, from the other end, research demonstrations tend to
              | have a lot of low-hanging fruit wrt. optimization, which
             | will get picked if the result is interesting enough.
        
           | oivey wrote:
           | Depending on what you're doing, that really isn't a low bar.
           | Saying you can get decent performance on any hardware is the
           | first step.
        
             | PheonixPharts wrote:
             | > get decent performance
             | 
             | The issue is that in Computer Science "real-time" doesn't
             | just mean "pretty fast", it's a very specific definition of
             | performance[0]. Doing "real-time" computing is generally
             | considered _hard_ even for problems that are themselves not
             | too challenging, and involves potentially severe
             | consequences for missing a computational deadline.
             | 
             | Which leads to both confusion and a bit of frustration when
             | sub-fields of CS throw around the term as if it just means
             | "we don't have to wait a long time for it to render" or
             | "you can watch it happen".
             | 
             | [0] https://en.wikipedia.org/wiki/Real-time_computing
        
               | aleksiy123 wrote:
               | That link defines it in terms of simulation as well: "The
               | term "real-time" is also used in simulation to mean that
               | the simulation's clock runs at the same speed as a real
               | clock." and even states that was the original usage of
               | the term.
               | 
               | I think that pretty much meets the definition of "you can
               | watch it happen".
               | 
                | Essentially there are real-time systems and real-time
                | simulations. So it seems that they are using the term
                | correctly in the context of simulation.
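                | 
                | A minimal Python sketch of that simulation sense of the
                | term (the function and numbers are made up): "real-time"
                | here only means the sim clock keeps pace with the wall
                | clock, with no hard guarantee about missed deadlines.
                | 
                |   import time
                | 
                |   def run_realtime_sim(step, dt=1/60, duration=2.0):
                |       # advance a simulation so that sim time tracks
                |       # wall-clock time ("real-time" in the simulation
                |       # sense, not the hard-deadline sense)
                |       sim_time, start = 0.0, time.perf_counter()
                |       while sim_time < duration:
                |           step(dt)          # advance the model by dt
                |           sim_time += dt
                |           lag = sim_time - (time.perf_counter() - start)
                |           if lag > 0:
                |               time.sleep(lag)  # ahead: wait for wall clock
                |           # if lag < 0 we fell behind; a hard real-time
                |           # system would count that as a missed deadline
                | 
                |   run_realtime_sim(lambda dt: None)  # no-op "simulation"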
        
           | mateo1 wrote:
            | It can be run in real time. It might be 640x480 or 20 fps,
            | but many algorithms out there could never be run in real time
            | on a $10k graphics card, or even on a computing cluster.
        
           | VelesDude wrote:
           | Microsoft once set the bar for realtime as 640x480 @ 10fps.
           | But this was just for research purposes. You can make out
           | what it is trying to do and the update rate was JUST
           | acceptable enough to be interactive.
        
         | mywittyname wrote:
          | Presumably, this can be used as the first stage in a
          | pipeline: take the models and textures generated from source
          | data using this, cache them, and stream that data to clients
          | for local rendering.
         | 
         | Consumer GPUs are probably 2-3 generations out from being as
         | capable as an A100.
        
           | Legend2440 wrote:
           | There are no models or textures, it's just a point cloud of
           | color blobs.
           | 
           | You can convert it to a mesh, but in the process you'd lose
           | the quality and realism that makes it interesting.
        
         | littlestymaar wrote:
         | I chuckled a bit too when I saw it.
         | 
         | By the way, what's the compute power difference between an A100
         | and a 4090?
        
           | entropicdrifter wrote:
           | 4090 is faster in terms of compute, but the A100 has 40GB of
           | VRAM.
        
           | enlyth wrote:
           | I believe the main advantage of the A100 is the memory
           | bandwidth. Computationally the 4090 has a higher clock speed
           | and more CUDA cores, so in that way it is faster.
           | 
           | So for this specific application it really depends on where
           | the bottleneck is
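            | 
            | Ballpark published specs (rough figures, worth double-
            | checking) illustrate the trade-off:
            | 
            |   # rough, publicly listed specs; treat as ballpark figures
            |   gpus = {
            |       #             FP32 TFLOPS, bandwidth GB/s, VRAM GB
            |       "A100 40GB": (19.5, 1555, 40),
            |       "RTX 4090":  (82.6, 1008, 24),
            |   }
            |   for name, (tflops, bw, vram) in gpus.items():
            |       print(f"{name}: {tflops} TFLOPS, {bw} GB/s, {vram} GB")
            |   # the 4090 wins on raw compute, the A100 on bandwidth and
            |   # VRAM, so "faster" depends on where the renderer is
            |   # bottlenecked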
        
         | pierotofy wrote:
          | A lot of 3DGS/NeRF research is like this, unfortunately (ugh).
         | 
         | Check https://github.com/pierotofy/OpenSplat for something you
         | can run on your 10 year old laptop, even without a GPU! (I'm
         | the author)
        
           | somethingsome wrote:
            | I know, I don't get the fuss either. I coded real-time
            | gaussian splat renderers with LOD >7 years ago, and they were
            | able to show any kind of point cloud.
            | 
            | They worked with a basic GTX 970 on a big 3D screen and also
            | on an Oculus DK2.
        
         | m463 wrote:
         | otoh I remember those old GPU benchmarks that ran at 10 fps
         | when they came out, then over time...
         | 
         | https://www.techpowerup.com/forums/attachments/all-cards-png...
        
         | datascienced wrote:
         | Just wait 2 years it'll be on your phone.
        
       | rallyforthesun wrote:
        | Really advanced approach to rendering larger scenes with
        | 3D Gaussians, can't wait to test the code :-)
        
       | 999900000999 wrote:
       | Excited to see what license this is released under. Would love to
       | see some open source games using this.
        
         | jsheard wrote:
         | Performance aside, someone needs to figure out a generalizable
         | way to make the scenes dynamic before it will really be usable
          | for games. History is littered with alternatives to triangle
          | meshes that looked promising until we realised there's no
         | efficient way to animate them.
        
           | 999900000999 wrote:
            | Can you explain what "dynamic" means here?
            | 
            | I was thinking more that you'd run this tool, and then have
            | an algorithm convert it (bake the mesh).
        
             | lawlessone wrote:
             | They probably mean animated, changeable etc. Like movement,
             | or changes in lighting.
        
           | CuriouslyC wrote:
           | Even if this doesn't replace triangles everywhere, I'm
           | guessing it's still going to be the easiest way to generate a
           | large volume of static art assets, which means we will see
           | hybrid rendering pipelines.
        
             | jsheard wrote:
              | AIUI these algorithms currently bake all of the lighting
              | into the surface colors statically, which mostly works if
              | the entire scene is constructed as one giant blob where
              | nothing moves. But if you wanted to render an individual
              | NeRF asset inside an otherwise standard triangle-based
              | pipeline, it would need to be more adaptable than that.
              | Even if the asset itself isn't animated, it would need to
              | adapt to the local lighting at the bare minimum, which I
              | haven't seen anyone tackle yet; the focus has been on the
              | rendering-one-giant-static-blob problem.
             | 
             | For hybrid pipelines to work the splatting algorithm would
             | probably need to output the standard G-Buffer channels
             | (unlit surface color, normal, roughness, etc) which can
             | then go through the same lighting pass as the triangle-
             | based assets, rather than the splatting algorithm trying to
             | infer lighting by itself and inevitably getting a result
             | that's inconsistent with how the triangle-based assets are
             | lit.
             | 
             | Think of those old cartoons where you could always tell
             | when part of the scenery was going to move because the
             | animation cel would stick out like a sore thumb against the
             | painted background, that's the kind of illusion break you
             | would get if the lighting isn't consistent.
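              | 
              | Concretely, a rough sketch of that G-buffer idea (purely
              | illustrative Python, not any real engine's API): the splat
              | pass writes the same unlit channels a deferred renderer
              | already expects, and the shared lighting pass doesn't care
              | where they came from.
              | 
              |   import numpy as np
              | 
              |   W, H = 640, 480
              |   # channels a deferred renderer typically expects
              |   gbuffer = {
              |       "albedo":    np.zeros((H, W, 3)),  # unlit color
              |       "normal":    np.zeros((H, W, 3)),
              |       "roughness": np.ones((H, W)),
              |   }
              | 
              |   def splat_pass(splats, gbuffer):
              |       # hypothetical: rasterize gaussians into G-buffer
              |       # channels instead of baking lighting into final RGB
              |       for s in splats:
              |           x, y = s["pixel"]
              |           gbuffer["albedo"][y, x] = s["albedo"]
              |           gbuffer["normal"][y, x] = s["normal"]
              |           gbuffer["roughness"][y, x] = s["roughness"]
              | 
              |   def lighting_pass(gbuffer, light_dir):
              |       # the same deferred shading used for triangle assets
              |       n = gbuffer["normal"]
              |       ndotl = np.clip((n * light_dir).sum(-1), 0, 1)
              |       return gbuffer["albedo"] * ndotl[..., None]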
        
               | somethingsome wrote:
                | For NeRF this problem exists. However, it was already
                | solved for gaussian splatting in the past. Usually you
                | define a normal field over the (2D) splats, which allows
                | you to have Phong shading at least.
                | 
                | It is not too difficult to extend that 2D normal field to
                | the 3D gaussians.
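                | 
                | Roughly, as a toy example (not from any particular
                | paper): once each splat carries a normal, you can relight
                | it like any other surface element instead of baking the
                | lighting in.
                | 
                |   import numpy as np
                | 
                |   def shade_splat(albedo, n, l, v, shininess=32.0):
                |       # Phong-shade one splat from its stored normal n,
                |       # light direction l and view direction v
                |       n, l, v = (x / np.linalg.norm(x) for x in (n, l, v))
                |       diffuse = max(np.dot(n, l), 0.0)
                |       r = 2.0 * np.dot(n, l) * n - l   # reflected light
                |       specular = max(np.dot(r, v), 0.0) ** shininess
                |       return albedo * diffuse + specular  # relit, not baked
                | 
                |   color = shade_splat(np.array([0.8, 0.3, 0.2]),
                |                       n=np.array([0.0, 0.0, 1.0]),
                |                       l=np.array([0.5, 0.5, 1.0]),
                |                       v=np.array([0.0, 0.0, 1.0]))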
        
       | forrestthewoods wrote:
       | Can someone convince me that 3D gaussian splatting isn't a dead
        | end? It's an order of magnitude too slow to render and an order
        | of magnitude too much data. It's like raster vs raytrace all over
       | again. Raster will always be faster than raytracing. So even if
       | raytracing gets 10x faster so too will raster.
       | 
       | I think generating traditional geometry and materials from
       | gaussian point clouds is maybe interesting. But photogrammetry
       | has already been a thing for quite awhile. Trying to render a
       | giant city in real time via splats doesn't feel like "the right
       | thing".
       | 
       | It's definitely cool and fun and exciting. I'm just not sure that
       | it will ever be useful in practice? Maybe! I'm definitely not an
       | expert so my question is genuine.
        
         | kfarr wrote:
          | Yes, this has tons of potential. It's analogous to, but
          | different from, patented techniques used by Unreal Engine.
          | Performance is not the focus of most research at the moment;
          | there isn't even alignment on a unified format with compression
          | yet. The potential for optimization is very clear and
          | straightforward to adapt to many devices; it's similar to point
          | cloud LOD, mesh culling, etc. Splat performance could be a
          | temporary competitive advantage for viewers, but similar to
          | video decompression and other 3D standards that are made
          | available via open source, it will likely become commonplace in
          | a few years to have high-quality, high-fps splat viewing on
          | most devices as table stakes. The next question is what the
          | applications thereof are.
        
         | Legend2440 wrote:
         | Nothing comes close to this for realism, it's like looking at a
         | photo.
         | 
         | Traditional photogrammetry really struggles with complicated
         | scenes, and reflective or transparent surfaces.
        
         | mschuetz wrote:
         | It's currently unparalleled when it comes to realism as in
         | realistic 3D reconstruction from the real world. Photogrammetry
          | only really works for nice surface-like data, whereas gaussian
          | splats work for semi-volumetric data such as fur, vegetation,
          | particles, and rough surfaces, and also for glossy/specular
          | surfaces and volumes with strong subsurface scattering
          | properties, or generally stuff with materials that are strongly
          | view-dependent.
        
         | rallyforthesun wrote:
          | In terms of content production for virtual production, it is
          | quicker to capture a scene and process the images into a cloud
          | of 3D Gaussians, but on the other hand it is harder to edit the
          | scene after it's shot. Also, the light is already captured and
          | baked into it. The tools to edit scenes will probably rely a
          | lot on AI, e.g. for delighting and changing settings. Right now
          | there are just a few, and the process is more like using a
          | knife to cut out parts and remove floaters. You can replay this
          | of course with Unreal Engine, but in the long term you could
          | run it in a browser. So in short, if you want to capture a
          | place as it is, with all its tiny details, 3D Gaussians are a
          | quicker and cheaper way to do it than modelling and texturing.
        
         | maxglute wrote:
          | Hardware evolves with production in mind. If a method saves 10x
          | time/labour, even using 50x more expensive compute/tools, then
          | industry will figure out a way to optimize/amortize the compute
          | cost on that task over time, and it will eventually disseminate
          | into consumer hardware.
        
           | forrestthewoods wrote:
           | Maybe. That implies that hardware evolution strictly benefits
           | Bar and not Foo. But what has happened so far is that
           | hardware advancements to accelerate NewThing also accelerate
           | OldThing.
        
         | fngjdflmdflg wrote:
         | >But photogrammetry has already been a thing for quite awhile.
         | 
         | Current photogrammetry to my knowledge requires much more data
          | than NeRFs/Gaussian splatting. So this could be a way to get
         | more data for the "dumb" photogrammetry algorithms to work
         | with.
        
         | gmerc wrote:
         | It's not an order of magnitude slower. You can easily get
         | 200-400 fps in Unreal or Unity at the moment.
         | 
          | 100+ FPS in a browser?
          | https://current-exhibition.com/laboratorio31/
         | 
         | 900FPS? https://m-niemeyer.github.io/radsplat/
         | 
          | We have 3 decades' worth of R&D in traditional engines, so
          | it'll take a while for this to catch up in terms of tooling and
          | optimization. But when you look at where the papers come from
          | (many from Apple and Meta), you see that this is the technology
          | destined to power the metaverse/spatial computing era both
          | companies are pushing towards.
          | 
          | The ability to move content at incredibly low production cost
          | (an iPhone movie) into 3D environments is going to murder a lot
          | of R&D invested in traditional methods.
        
           | araes wrote:
           | Don't know the hardware involved, yet that first link is most
           | definitely not 100 FPS on all hardware. Slideshow on the
           | current device.
        
           | 101008 wrote:
           | Does anyone know how the first link is made?
        
         | jerf wrote:
         | You have to ask about what it's a dead end for. It seems pretty
         | cool for the moral equivalent of fully 3D photographs. That's a
         | completely legitimate use case.
         | 
         | For 3D gaming engines? I struggle to see how the fundamental
         | primitive can be made to sing and dance in the way that they
         | demand. People will try, though. But from this perspective,
         | gaussians strike me more as a final render format than a useful
         | intermediate representation. If they are going to use gaussians
         | there's going to have to be something else invented to make
         | them practical to use for engines in the meantime, and there's
         | still an awful lot of questions there.
         | 
         | For other uses? Who knows.
         | 
         | But the world is not all 3D gaming and visual special effects.
        
         | pierotofy wrote:
         | Photogrammetry struggles with certain types of materials (e.g.
         | reflective surfaces). It's also very difficult to capture fine
         | details (thin structures, hair). 3DGS is very good at that. And
         | people are working on improving current shortcomings, including
         | methods to extract meshes that we could use in traditional
         | graphics pipelines.
        
           | somethingsome wrote:
            | 3DGS is absolutely not good with non-Lambertian materials.
            | 
            | After testing it, it fails in very basic cases. And it is
            | normal that it fails: non-Lambertian materials are not
            | reconstructed correctly with SfM methods.
        
         | peppertree wrote:
         | Mesh based photogrammetry is a dead end. GS or radiance field
         | representation is just getting started. Not just rendering but
         | potentially a highly compact way to store large 3D scenes.
        
           | forrestthewoods wrote:
           | > potentially a highly compact way to store large 3D scenes.
           | 
           | Is it? So far it seems like the storage size is massive and
           | the detail is unacceptably low up close.
           | 
           | Is there a demo that will make me go "holy crap I can't
           | believe how well this scene compressed"?
        
             | peppertree wrote:
             | Here is a paper if you are interested.
             | https://arxiv.org/pdf/2311.13681.pdf
             | 
              | The key is not to compress but to leverage the properties
              | of neural radiance fields and optimize for entropy. I
              | suspect NeRF can yield more compact storage since it's
              | volumetric.
             | 
             | Not sure what you mean by "unacceptably low up close". Most
             | GS demos don't have LoD lol.
        
         | chankstein38 wrote:
          | I'll be honest, I don't have a ton of technical insight into
          | these, but anecdotally I found that KIRI Engine's Gaussian
          | Splatting scans (versus its photogrammetry scans) were way more
          | accurate and true to life, and required a lot less cleanup!
        
         | bodhiandphysics wrote:
         | Try animating a photogrammetric model! How about one that
         | changes its shape? You get awful geometry from
         | photogrammetry...
         | 
          | In practice the answer to "will this be useful?" is yes!
          | Subdivision surfaces coexist with NURBS for different
          | applications.
        
         | jonas21 wrote:
         | How is it too slow? You can easily render scenes at 60fps in a
         | browser or on a mobile phone.
         | 
         | Heck, you can even _train_ one from scratch in a minute on an
         | iPhone [1].
         | 
         | This technique has been around for less than a year. It's only
         | going to get better.
         | 
         | [1] https://www.youtube.com/watch?v=nk0f4FTcdmM
        
           | mthoms wrote:
           | That's pretty cool. It's not clear if it's incorporating
           | Lidar data or not though. It's very impressive if not.
        
           | somethingsome wrote:
            | This technique has existed for more than 10 years, and
            | real-time renderers have existed for a very long time too.
        
         | thfuran wrote:
         | >much data. It's like raster vs raytrace all over again. Raster
         | will always be faster than raytracing. So even if raytracing
         | gets 10x faster so too will raster.
         | 
         | And? It's always going to be even faster to not have lighting
         | at all.
        
       | kfarr wrote:
       | Not quite the same thing, but over the weekend I hacked google
       | maps 3d tiles (mesh) together with a gaussian splat and the
       | effect is pretty similar and effective:
       | 
       | Example 1 with code linked:
       | https://twitter.com/kfarr/status/1773934700878561396
       | 
       | Example 2
       | https://twitter.com/3dstreetapp/status/1775203540442697782
        
         | sbarre wrote:
         | This is super cool! Congrats on the PoC ...
        
         | cchance wrote:
          | That's really cool, is there a GitHub with the code...
          | 
          | Getting errors on that first link in devtools:
         | 
         | Uncaught (in promise) Error: Failed to fetch resource
         | https://tile.googleapis.com/v1/3dti...
        
           | kfarr wrote:
            | Probably rate-limited API calls given the hug of death from
            | Twitter and HN. Capped at 1k per day, see
            | https://github.com/3DStreet/aframe-loader-3dtiles-component
            | 
            | Code is available via the Glitch URL.
        
         | aantix wrote:
         | Wow, amazing work!
        
         | aaroninsf wrote:
         | Are you on Bluesky?
         | 
         | Would love to follow. But not, you know, over there.
        
       | syrusakbary wrote:
       | Gaussian splatting is truly amazing for 3d reconstruction.
       | 
        | I can't wait to see it applied to the world of driverless
        | vehicles and AI!
        
       | jnsjjdk wrote:
        | This does not look significantly better than e.g. Cities:
        | Skylines, especially since they neither zoomed in nor out, always
        | showing only a very limited frame.
       | 
       | Am I missing something?
        
         | dartos wrote:
         | This was rendered from photographs, I believe
        
         | neuronexmachina wrote:
         | This is a 3D reconstruction, rather than a game rendering.
        
         | cchance wrote:
          | LOL this isn't a game engine, it's real-life photos being
          | converted into Gaussian 3D views.
        
         | chankstein38 wrote:
          | All 3 of the other commenters are replying without having given
          | it any actual thought or research. The paper repeatedly
          | references MatrixCity, and another commenter above found this:
          | https://city-super.github.io/matrixcity/ which, I'd like to
          | add, calls out that it's fully synthetic and, from what I
          | understand, extracted from Unreal Engine.
        
       | boywitharupee wrote:
       | what's the memory and compute requirements for this?
        
       | speps wrote:
        | Note that the dataset from the video is called MatrixCity. It's
        | highly likely extracted from the Unreal Engine 5 Matrix demo
        | released a few years ago. The views look very similar, so it's
        | photorealistic but not from photos.
        | 
        | EDIT: here it is, and I was right!
        | https://city-super.github.io/matrixcity/
        
         | speps wrote:
         | Replying to myself with a question, as someone could have the
         | answer: Would it be possible to create the splats without the
         | training phase? If we have a fully modelled scene in Unreal
         | Engine for example (like Matrix city), you shouldn't need to
         | spend all the time training to recreate the data...
        
           | kfarr wrote:
           | Yes, and then it gets interesting to think about procedurally
           | generated splats, such as spawning a randomized distribution
           | of grass splats on a field for example
        
           | sorenjan wrote:
           | Yes, it's possible to create gaussian splats from a mesh. See
           | for example step 3 in SuGaR:
           | https://imagine.enpc.fr/~guedona/sugar/
        
             | fudged71 wrote:
             | Are you referring to the gaussian splat rasterizer?
        
               | sorenjan wrote:
               | I'm referring to using the modeled scene to bind gaussian
               | splats to an existing mesh.
               | 
               | > Binding New 3D Gaussians to the Mesh
               | 
               | > This binding strategy also makes possible the use of
               | traditional mesh-editing tools for editing a Gaussian
               | Splatting representation of a scene
        
           | fudged71 wrote:
           | I could be wrong, but being able to remove the step of
           | estimating the camera position would save a large amount of
           | time. You're still going to need to train on the images to
           | create the splats
        
           | somethingsome wrote:
           | Of course! And this was done many times in the past, probably
           | with better results than current deep learning based gaussian
           | splatting where they use way too many splats to render a
           | scene.
           | 
            | Basically the problem with sparse pictures and point clouds
            | in general is their lack of topology and their imprecise
            | spatial positions. But when you already have the topology
            | (e.g. with a mesh), you can extract (optimally) a set of
            | points and compute the radius of the splats (and their color)
            | such that there are no holes in the final image. That is
            | usually done with the curvature and the normal.
            | 
            | The 'optimally' part is difficult; an easier and faster
            | approach is just to do a greedy pass to select good-enough
            | splats, as sketched below.
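            | 
            | A toy sketch of such a greedy pass (assuming a triangle mesh
            | with per-vertex normals and colors; sizing each splat to the
            | longest edge stands in for the curvature-based radius):
            | 
            |   import numpy as np
            | 
            |   def mesh_to_splats(vertices, faces, normals, colors):
            |       # one splat per triangle, sized so that neighbouring
            |       # splats overlap and leave no holes; a real method
            |       # would use curvature and pick points more carefully
            |       splats = []
            |       for f in faces:
            |           tri = vertices[f]              # (3, 3) corners
            |           edges = np.linalg.norm(
            |               np.roll(tri, -1, axis=0) - tri, axis=1)
            |           splats.append({
            |               "center": tri.mean(axis=0),
            |               "normal": normals[f].mean(axis=0),
            |               "radius": 0.75 * edges.max(),  # overlap factor
            |               "color":  colors[f].mean(axis=0),
            |           })
            |       return splats
            | 
            |   # usage with a single triangle
            |   v = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0]], float)
            |   f = np.array([[0, 1, 2]])
            |   n = np.tile([0.0, 0.0, 1.0], (3, 1))
            |   c = np.tile([0.5, 0.5, 0.5], (3, 1))
            |   print(mesh_to_splats(v, f, n, c))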
        
         | jsheard wrote:
         | Epic acquired the photogrammetry company Quixel a while ago, so
         | it's quite likely they used their photo-scanned asset library
          | when building the Matrix city. Funnily enough, that would mean
          | the OP is doing reconstructions of reconstructions of real
          | objects.
        
           | reactordev wrote:
            | Or just rendering it mixed with some splats; we don't know,
            | because they didn't release their source code. I'm highly
            | skeptical of their claims and their dataset, given that it's
            | trivial to export it into some other viewer to fake it.
        
         | ttmb wrote:
         | Not all of the videos are Matrix City, some are real places.
        
       | mhuffman wrote:
        | Quick question for anyone who may have more technical insight:
        | is Gaussian Splatting the technology that Unreal Engine has been
        | using for such jaw-dropping demos with their new releases?
        
       ___________________________________________________________________
       (page generated 2024-04-02 23:00 UTC)