[HN Gopher] CityGaussian: Real-time high-quality large-scale sce...
___________________________________________________________________
CityGaussian: Real-time high-quality large-scale scene rendering
with Gaussians
Author : smusamashah
Score : 273 points
Date : 2024-04-02 16:46 UTC (6 hours ago)
(HTM) web link (dekuliutesla.github.io)
(TXT) w3m dump (dekuliutesla.github.io)
| chpatrick wrote:
| "The average speed is 36 FPS (tested on A100)."
|
| Real-Time if you have $8k I guess.
| RicoElectrico wrote:
| "Two more papers down the line..." ;)
| Fauntleroy wrote:
| Indeed, this very much looks like what we'll likely see from
| Google Earth within a decade--or perhaps half that.
| mortenjorck wrote:
| I've seen very impressive Gaussian splatting demos of more
| limited urban geographies (a few city blocks) running on
| consumer hardware, so the reason this requires research-
| tier Nvidia hardware right now is probably down to LOD
| streaming. More optimization on that front, and this could
| plausibly come to Google Earth on current devices.
|
| "What a time to be alive" indeed!
| rallyforthesun wrote:
| As this seems to be the first 3DGS approach that uses LODs and
| blocks, there might be room for optimization. This could become
| useful for virtual production use cases, though probably not
| for mobile.
| jsheard wrote:
| Good ol' "SIGGRAPH realtime", when a graphics paper describes
| itself as achieving realtime speeds you always have to double
| check that they mean actually realtime and not "640x480 at
| 20fps on the most expensive hardware money can buy". Anything
| can be realtime if you set the bar low enough.
| phkahler wrote:
| >> Anything can be realtime if you set the bar low enough.
|
| I was doing "realtime ray tracing" on Pentium class computers
| in the 1990s. I took my toy ray tracer and made an OLE
| control and put it inside a small Visual Basic app which
| handled keypress-navigation. It could run in a tiny little
| window (size of a large icon) at reasonable frame rates.
| Might even say it was using Visual Basic! So yeah "realtime"
| needs some qualifiers ;-)
| TeMPOraL wrote:
| Fair, but today it could probably run 30FPS full-screen at
| 2K resolution, without any special effort, on an average
| consumer-grade machine; better if ported to take advantage
| of the GPU.
|
| Moore's law may be dead in general, but computing power
| still increases (notwithstanding the software bloat that
| makes it seem otherwise), and it's still something to count
| on wrt. bleeding edge research demos.
| cchance wrote:
| I mean, A100s were cutting edge a year or so ago, and now we're
| at H200s and B200s (or is it 300s?). It may be a year or two
| more, but A100-level speed will trickle down to the average
| consumer as well.
| TeMPOraL wrote:
| And, from the other end, research demonstrations tend to have a
| lot of low-hanging fruit wrt. optimization, which will get
| picked if the result is interesting enough.
| oivey wrote:
| Depending on what you're doing, that really isn't a low bar.
| Saying you can get decent performance on any hardware is the
| first step.
| PheonixPharts wrote:
| > get decent performance
|
| The issue is that in Computer Science "real-time" doesn't
| just mean "pretty fast", it's a very specific definition of
| performance[0]. Doing "real-time" computing is generally
| considered _hard_ even for problems that are themselves not
| too challenging, and involves potentially severe
| consequences for missing a computational deadline.
|
| Which leads to both confusion and a bit of frustration when
| sub-fields of CS throw around the term as if it just means
| "we don't have to wait a long time for it to render" or
| "you can watch it happen".
|
| [0] https://en.wikipedia.org/wiki/Real-time_computing
| aleksiy123 wrote:
| That link defines it in terms of simulation as well: "The
| term "real-time" is also used in simulation to mean that
| the simulation's clock runs at the same speed as a real
| clock." and even states that was the original usage of
| the term.
|
| I think that pretty much meets the definition of "you can
| watch it happen".
|
| Essentially, there are real-time systems and there is real-time
| simulation. So it seems that they are using the term correctly
| in the context of simulation.
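|
| For illustration, a toy sketch (mine, not from the article) of
| a simulation loop whose clock is pinned to the wall clock, i.e.
| "real-time" in the simulation sense:
|
|     import time
|
|     SIM_DT = 1.0 / 60.0  # simulated seconds per step
|
|     def run_realtime_sim(steps):
|         sim_time = 0.0
|         start = time.monotonic()
|         for _ in range(steps):
|             sim_time += SIM_DT
|             # ... advance simulation state here ...
|             # sleep until the wall clock catches up, so one
|             # simulated second takes one real second
|             lag = sim_time - (time.monotonic() - start)
|             if lag > 0:
|                 time.sleep(lag)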
| mateo1 wrote:
| It can be run in real time. It might be 640x480 or 20 fps, but
| many algorithms out there could never be run in real time on a
| $10k graphics card or even a computing cluster.
| VelesDude wrote:
| Microsoft once set the bar for realtime as 640x480 @ 10fps, but
| that was just for research purposes. You could make out what it
| was trying to do, and the update rate was JUST acceptable
| enough to be interactive.
| mywittyname wrote:
| Presumably, this can be used as the first stage in a pipeline:
| take the models and textures generated from source data using
| this, cache them, and stream that data to clients for local
| rendering.
|
| Consumer GPUs are probably 2-3 generations out from being as
| capable as an A100.
| Legend2440 wrote:
| There are no models or textures, it's just a point cloud of
| color blobs.
|
| You can convert it to a mesh, but in the process you'd lose
| the quality and realism that makes it interesting.
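|
| For reference, the per-splat record in typical 3DGS
| implementations looks roughly like this (a sketch; field
| layouts vary between codebases):
|
|     from dataclasses import dataclass
|     import numpy as np
|
|     @dataclass
|     class Splat:
|         position: np.ndarray  # (3,) gaussian center
|         scale: np.ndarray     # (3,) per-axis extent
|         rotation: np.ndarray  # (4,) orientation quaternion
|         opacity: float        # alpha used when blending
|         sh: np.ndarray        # spherical-harmonic color
|                               # coefficients (view-dependent)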
| littlestymaar wrote:
| I chuckled a bit too when I saw it.
|
| By the way, what's the compute power difference between an A100
| and a 4090?
| entropicdrifter wrote:
| 4090 is faster in terms of compute, but the A100 has 40GB of
| VRAM.
| enlyth wrote:
| I believe the main advantage of the A100 is the memory
| bandwidth. Computationally the 4090 has a higher clock speed
| and more CUDA cores, so in that way it is faster.
|
| So for this specific application, it really depends on where
| the bottleneck is.
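|
| A back-of-envelope roofline check (spec numbers approximate,
| from public datasheets; treat as a sketch):
|
|     # FP32 TFLOP/s and memory bandwidth in TB/s, roughly
|     a100 = {"tflops": 19.5, "bw": 1.56}
|     rtx4090 = {"tflops": 82.6, "bw": 1.01}
|
|     def rel_time(gpu, intensity):
|         # relative time per unit of work for a kernel doing
|         # `intensity` FLOPs per byte: the slower of the
|         # compute-bound and bandwidth-bound estimates wins
|         return max(1.0 / gpu["tflops"],
|                    1.0 / (intensity * gpu["bw"]))
|
|     # bandwidth-bound kernel (low intensity): A100 wins
|     print(rel_time(a100, 2) < rel_time(rtx4090, 2))      # True
|     # compute-bound kernel (high intensity): 4090 wins
|     print(rel_time(a100, 200) > rel_time(rtx4090, 200))  # True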
| pierotofy wrote:
| A lot of 3DGS/NeRF research is like this, unfortunately (ugh).
|
| Check https://github.com/pierotofy/OpenSplat for something you
| can run on your 10 year old laptop, even without a GPU! (I'm
| the author)
| somethingsome wrote:
| I know, I don't get the fuss either. I coded real-time gaussian
| splat renderers more than 7 years ago, with LOD, and they were
| able to show any kind of point cloud.
|
| They worked with a basic GTX 970 on a big 3D screen and also on
| an Oculus DK2.
| m463 wrote:
| otoh I remember those old GPU benchmarks that ran at 10 fps
| when they came out, then over time...
|
| https://www.techpowerup.com/forums/attachments/all-cards-png...
| datascienced wrote:
| Just wait 2 years it'll be on your phone.
| rallyforthesun wrote:
| Really advanced approach to rendering larger scenes with 3D
| gaussians, can't wait to test the code :-)
| 999900000999 wrote:
| Excited to see what license this is released under. Would love to
| see some open source games using this.
| jsheard wrote:
| Performance aside, someone needs to figure out a generalizable
| way to make the scenes dynamic before this will really be
| usable for games. History is littered with alternatives to
| triangle meshes that looked promising until we realised there's
| no efficient way to animate them.
| 999900000999 wrote:
| Can you explain what a dynamic is?
|
| I was more thinking you'd run this tool, and then have an
| algorithm convert it (bake the mesh).
| lawlessone wrote:
| They probably mean animated, changeable etc. Like movement,
| or changes in lighting.
| CuriouslyC wrote:
| Even if this doesn't replace triangles everywhere, I'm
| guessing it's still going to be the easiest way to generate a
| large volume of static art assets, which means we will see
| hybrid rendering pipelines.
| jsheard wrote:
| AIUI these algorithms currently bake all of the lighting into
| the surface colors statically, which mostly works if the entire
| scene is constructed as one giant blob where nothing moves. But
| if you wanted to render an individual NeRF asset inside an
| otherwise standard triangle-based pipeline, it would need to be
| more adaptable than that. Even if the asset itself isn't
| animated, it would need to adapt to the local lighting at the
| bare minimum, which I haven't seen anyone tackle yet; the focus
| has been on the rendering-one-giant-static-blob problem.
|
| For hybrid pipelines to work the splatting algorithm would
| probably need to output the standard G-Buffer channels
| (unlit surface color, normal, roughness, etc) which can
| then go through the same lighting pass as the triangle-
| based assets, rather than the splatting algorithm trying to
| infer lighting by itself and inevitably getting a result
| that's inconsistent with how the triangle-based assets are
| lit.
|
| Think of those old cartoons where you could always tell
| when part of the scenery was going to move because the
| animation cel would stick out like a sore thumb against the
| painted background, that's the kind of illusion break you
| would get if the lighting isn't consistent.
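|
| Concretely, the per-pixel channels such a hybrid splat pass
| might write (a hypothetical sketch, not any real engine's
| G-Buffer layout):
|
|     from dataclasses import dataclass
|     import numpy as np
|
|     @dataclass
|     class GBuffer:
|         # per-pixel arrays the splat rasterizer would fill,
|         # so splats and triangles can share one lighting pass
|         albedo: np.ndarray     # HxWx3 unlit surface color
|         normal: np.ndarray     # HxWx3 world-space normals
|         roughness: np.ndarray  # HxW
|         depth: np.ndarray      # HxW, for compositing with
|                                # the triangle-based assets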
| somethingsome wrote:
| For NeRF this problem exists. However, it was already solved
| for gaussian splatting in the past: usually you define a normal
| field over the (2D) splat, which gives you at least Phong
| shading.
|
| It is not too difficult to extend that to a 2D normal field
| over the 3D gaussians.
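|
| A toy version of that per-splat Phong shading (my own sketch,
| all parameters made up):
|
|     import numpy as np
|
|     def shade_splat(albedo, n, light_dir, view_dir,
|                     light_color, shininess=32.0, ambient=0.1):
|         # classic Phong shading using the splat's normal sample
|         n = n / np.linalg.norm(n)
|         l = light_dir / np.linalg.norm(light_dir)
|         v = view_dir / np.linalg.norm(view_dir)
|         diff = max(np.dot(n, l), 0.0)
|         r = 2.0 * np.dot(n, l) * n - l   # reflect l about n
|         spec = max(np.dot(r, v), 0.0) ** shininess
|         return light_color * (albedo * (ambient + diff) + spec)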
| forrestthewoods wrote:
| Can someone convince me that 3D gaussian splatting isn't a dead
| end? It's an order of magnitude too slow to render and an order
| of magnitude too much data. It's like raster vs raytrace all
| over again: raster will always be faster than raytracing, so
| even if raytracing gets 10x faster, so too will raster.
|
| I think generating traditional geometry and materials from
| gaussian point clouds is maybe interesting. But photogrammetry
| has already been a thing for quite a while. Trying to render a
| giant city in real time via splats doesn't feel like "the right
| thing".
|
| It's definitely cool and fun and exciting. I'm just not sure that
| it will ever be useful in practice? Maybe! I'm definitely not an
| expert so my question is genuine.
| kfarr wrote:
| Yes, this has tons of potential. It's analogous to, but
| different from, patented techniques used by Unreal Engine.
| Performance is not the focus of most research at the moment;
| there isn't even alignment on a unified format with compression
| yet. The potential for optimization is clear, and the
| optimizations are straightforward to adapt to many devices:
| point cloud LOD, mesh culling, etc. Splat performance could be
| a temporary competitive advantage for viewers, but, as with
| video decompression and other 3D standards made available via
| open source, high-quality, high-fps splat viewing on most
| devices will likely become table stakes within a few years. The
| next question is what the applications thereof are.
| Legend2440 wrote:
| Nothing comes close to this for realism, it's like looking at a
| photo.
|
| Traditional photogrammetry really struggles with complicated
| scenes, and reflective or transparent surfaces.
| mschuetz wrote:
| It's currently unparalleled when it comes to realism, as in
| realistic 3D reconstruction of the real world. Photogrammetry
| only really works for clean surface data, whereas gaussian
| splats also work for semi-volumetric data such as fur,
| vegetation, particles, and rough surfaces, as well as for
| glossy/specular surfaces and volumes with strong subsurface
| scattering, or generally stuff with materials that are strongly
| view-dependent.
| rallyforthesun wrote:
| With regard to content production for virtual production, it is
| quicker to capture a scene and process the images into a cloud
| of 3D gaussians, but on the other hand it is harder to edit the
| scene after it's shot. Also, the light is already captured and
| baked into it. The tools for editing scenes will probably rely
| a lot on AI, e.g. for delighting and changing settings; right
| now there are just a few, and the process is more like using a
| knife to cut out parts and remove floaters. You can replay this
| of course with the Unreal Engine, but in the long term you
| could run it in a browser. So in short, if you want to capture
| a place as it is, with all its tiny details, 3D gaussians are a
| quicker and cheaper way to do it than modelling and texturing.
| maxglute wrote:
| Hardware evolves with production in mind. If a method saves 10x
| time/labour even while using 50x more expensive compute/tools,
| industry will figure out ways to optimize/amortize the compute
| cost of that task over time, and it will eventually disseminate
| into consumer hardware.
| forrestthewoods wrote:
| Maybe. That implies that hardware evolution strictly benefits
| NewThing and not OldThing. But what has happened so far is that
| hardware advancements that accelerate NewThing also accelerate
| OldThing.
| fngjdflmdflg wrote:
| >But photogrammetry has already been a thing for quite awhile.
|
| Current photogrammetry, to my knowledge, requires much more
| data than NeRFs/gaussian splatting. So this could be a way to
| get more data for the "dumb" photogrammetry algorithms to work
| with.
| gmerc wrote:
| It's not an order of magnitude slower. You can easily get
| 200-400 fps in Unreal or Unity at the moment.
|
| 100+FPS in browser?
| https://current-exhibition.com/laboratorio31/
|
| 900FPS? https://m-niemeyer.github.io/radsplat/
|
| We have 3 decades' worth of R&D in traditional engines, so it
| will take a while for this to catch up in terms of tooling and
| optimization. But when you look at where the papers come from
| (many from Apple and Meta), you see that this is the technology
| destined to power the MetaVerse/Spatial Compute era both
| companies are pushing towards.
|
| The ability to move content at incredibly low production cost
| (an iPhone movie) into 3D environments is going to murder a lot
| of R&D invested in traditional methods.
| araes wrote:
| Don't know the hardware involved, yet that first link is most
| definitely not 100 FPS on all hardware. Slideshow on the
| current device.
| 101008 wrote:
| Does anyone know how the first link is made?
| jerf wrote:
| You have to ask about what it's a dead end for. It seems pretty
| cool for the moral equivalent of fully 3D photographs. That's a
| completely legitimate use case.
|
| For 3D gaming engines? I struggle to see how the fundamental
| primitive can be made to sing and dance in the way that they
| demand. People will try, though. But from this perspective,
| gaussians strike me more as a final render format than a useful
| intermediate representation. If engines are going to use
| gaussians, something else will have to be invented in the
| meantime to make them practical, and there are still an awful
| lot of questions there.
|
| For other uses? Who knows.
|
| But the world is not all 3D gaming and visual special effects.
| pierotofy wrote:
| Photogrammetry struggles with certain types of materials (e.g.
| reflective surfaces). It's also very difficult to capture fine
| details (thin structures, hair). 3DGS is very good at that. And
| people are working on improving current shortcomings, including
| methods to extract meshes that we could use in traditional
| graphics pipelines.
| somethingsome wrote:
| 3DGS is absolutely not good with non-Lambertian materials.
|
| After testing it, it fails in very basic cases. And it is
| normal that it fails: non-Lambertian materials are not
| reconstructed correctly with SfM methods.
| peppertree wrote:
| Mesh-based photogrammetry is a dead end. GS and radiance field
| representations are just getting started: not just for
| rendering, but potentially as a highly compact way to store
| large 3D scenes.
| forrestthewoods wrote:
| > potentially a highly compact way to store large 3D scenes.
|
| Is it? So far it seems like the storage size is massive and
| the detail is unacceptably low up close.
|
| Is there a demo that will make me go "holy crap I can't
| believe how well this scene compressed"?
| peppertree wrote:
| Here is a paper if you are interested.
| https://arxiv.org/pdf/2311.13681.pdf
|
| The key is not to compress but to leverage the properties of
| neural radiance fields and optimize for entropy. I suspect NeRF
| can yield more compact storage since it's volumetric.
|
| Not sure what you mean by "unacceptably low up close". Most GS
| demos don't have LoD lol.
| chankstein38 wrote:
| I'll be honest, I don't have a ton of technical insight into
| these, but anecdotally, using KIRI Engine's gaussian splatting
| scans (versus photogrammetry scans), I found the GS scans were
| way more accurate and true to life and required a lot less
| cleanup!
| bodhiandphysics wrote:
| Try animating a photogrammetric model! How about one that
| changes its shape? You get awful geometry from
| photogrammetry...
|
| In practice the answer to "will this be useful?" is yes!
| Subdivision surfaces coexist with NURBS for different
| applications.
| jonas21 wrote:
| How is it too slow? You can easily render scenes at 60fps in a
| browser or on a mobile phone.
|
| Heck, you can even _train_ one from scratch in a minute on an
| iPhone [1].
|
| This technique has been around for less than a year. It's only
| going to get better.
|
| [1] https://www.youtube.com/watch?v=nk0f4FTcdmM
| mthoms wrote:
| That's pretty cool. It's not clear if it's incorporating
| Lidar data or not though. It's very impressive if not.
| somethingsome wrote:
| This technique has existed for more than 10 years, and
| real-time renderers have also existed for a very long time.
| thfuran wrote:
| >much data. It's like raster vs raytrace all over again. Raster
| will always be faster than raytracing. So even if raytracing
| gets 10x faster so too will raster.
|
| And? It's always going to be even faster to not have lighting
| at all.
| kfarr wrote:
| Not quite the same thing, but over the weekend I hacked google
| maps 3d tiles (mesh) together with a gaussian splat and the
| effect is pretty similar and effective:
|
| Example 1 with code linked:
| https://twitter.com/kfarr/status/1773934700878561396
|
| Example 2
| https://twitter.com/3dstreetapp/status/1775203540442697782
| sbarre wrote:
| This is super cool! Congrats on the PoC ...
| cchance wrote:
| That's really cool, is there a GitHub with the code?
|
| I'm getting errors on that first link in devtools:
|
| Uncaught (in promise) Error: Failed to fetch resource
| https://tile.googleapis.com/v1/3dti...
| kfarr wrote:
| Probably rate-limited API calls, given the hug of death from
| Twitter and HN. Capped at 1k per day, see
| https://github.com/3DStreet/aframe-loader-3dtiles-component
|
| Code is available via the Glitch URL.
| aantix wrote:
| Wow, amazing work!
| aaroninsf wrote:
| Are you on Bluesky?
|
| Would love to follow. But not, you know, over there.
| syrusakbary wrote:
| Gaussian splatting is truly amazing for 3d reconstruction.
|
| I can't wait to see it applied to the world of driverless
| vehicles and AI!
| jnsjjdk wrote:
| This does not look significantly better than e.g. Cities:
| Skylines, especially since they neither zoomed in nor out,
| always showing only a very limited frame.
|
| Am I missing something?
| dartos wrote:
| This was rendered from photographs, I believe
| neuronexmachina wrote:
| This is a 3D reconstruction, rather than a game rendering.
| cchance wrote:
| LOL, this isn't a game engine, it's real-life photos being
| converted into gaussian 3D views.
| chankstein38 wrote:
| All 3 of the other commenters are replying without having done
| any actual thought or research. The paper repeatedly references
| MatrixCity, and another commenter above found
| https://city-super.github.io/matrixcity/ which, I'd like to
| add, calls out that it's fully synthetic. And, from what I
| understand, it is extracted from Unreal Engine.
| boywitharupee wrote:
| What are the memory and compute requirements for this?
| speps wrote:
| Note that the dataset from the video is called MatrixCity. It's
| highly likely extracted from the Unreal Engine 5 Matrix demo
| released a few years ago. The views look very similar, so it's
| photorealistic but not from photos.
|
| EDIT: here it is, and I was right!
| https://city-super.github.io/matrixcity/
| speps wrote:
| Replying to myself with a question, as someone may have the
| answer: would it be possible to create the splats without the
| training phase? If we have a fully modelled scene in Unreal
| Engine, for example (like MatrixCity), you shouldn't need to
| spend all that time training to recreate the data...
| kfarr wrote:
| Yes, and then it gets interesting to think about procedurally
| generated splats, such as spawning a randomized distribution
| of grass splats on a field for example
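|
| E.g. a toy sketch of such a procedural distribution (all
| numbers invented):
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|     n = 10_000                               # grass splats
|     pos = np.zeros((n, 3))
|     pos[:, [0, 2]] = rng.uniform(-50, 50, (n, 2))  # x,z field
|     scale = rng.uniform(0.02, 0.1, (n, 3))   # thin blades
|     scale[:, 1] *= 4.0                       # stretched in y
|     base = np.array([0.2, 0.6, 0.2])         # grass green
|     color = base * rng.uniform(0.7, 1.3, (n, 1))  # vary shade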
| sorenjan wrote:
| Yes, it's possible to create gaussian splats from a mesh. See
| for example step 3 in SuGaR:
| https://imagine.enpc.fr/~guedona/sugar/
| fudged71 wrote:
| Are you referring to the gaussian splat rasterizer?
| sorenjan wrote:
| I'm referring to using the modeled scene to bind gaussian
| splats to an existing mesh.
|
| > Binding New 3D Gaussians to the Mesh
|
| > This binding strategy also makes possible the use of
| traditional mesh-editing tools for editing a Gaussian
| Splatting representation of a scene
| fudged71 wrote:
| I could be wrong, but being able to remove the step of
| estimating the camera positions would save a large amount of
| time. You're still going to need to train on the images to
| create the splats.
| somethingsome wrote:
| Of course! And this was done many times in the past, probably
| with better results than current deep-learning-based gaussian
| splatting, where they use way too many splats to render a
| scene.
|
| Basically, the problem with sparse pictures and point clouds in
| general is their lack of topology and imprecise spatial
| positions. But when you already have the topology (e.g. with a
| mesh), you can (optimally) extract a set of points and compute
| the radii of the splats (and their colors) such that there are
| no holes in the final image. That is usually done with the
| curvature and the normal.
|
| The 'optimally' part is difficult; an easier and faster
| approach is just to do a greedy pass to select good-enough
| splats.
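|
| A greedy toy version of that idea (my own sketch; it uses the
| longest incident edge instead of curvature for the radius):
|
|     import numpy as np
|
|     def splats_from_mesh(vertices, normals, faces):
|         # one disk splat per vertex; radius = longest incident
|         # edge, so neighbouring splats overlap and the final
|         # image has no holes
|         radius = np.zeros(len(vertices))
|         for i, j, k in faces:
|             for a, b in ((i, j), (j, k), (k, i)):
|                 d = np.linalg.norm(vertices[a] - vertices[b])
|                 radius[a] = max(radius[a], d)
|                 radius[b] = max(radius[b], d)
|         return vertices, normals, radius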
| jsheard wrote:
| Epic acquired the photogrammetry company Quixel a while ago, so
| it's quite likely they used that photo-scanned asset library
| when building the Matrix city. Funnily enough, that would mean
| the OP is doing reconstructions of reconstructions of real
| objects.
| reactordev wrote:
| Or they're just rendering it mixed with some splats; we don't
| know, because they didn't release their source code. I'm highly
| skeptical of their claims and their dataset, given that it's
| trivial to export it into some other viewer to fake it.
| ttmb wrote:
| Not all of the videos are Matrix City, some are real places.
| mhuffman wrote:
| Quick question for anyone who may have more technical insight:
| is gaussian splatting the technology that Unreal Engine has
| been using for such jaw-dropping demos of its new releases?
___________________________________________________________________
(page generated 2024-04-02 23:00 UTC)