[HN Gopher] Deleting multiplayer from the Unreal engine can save...
___________________________________________________________________
Deleting multiplayer from the Unreal engine can save memory
Author : mariuz
Score : 150 points
Date : 2025-04-06 10:23 UTC (2 days ago)
(HTM) web link (larstofus.com)
(TXT) w3m dump (larstofus.com)
| bengarney wrote:
| Really interesting analysis of where the data lives... cutting
| 3-4 textures would save you more memory even in the 100k actor
| case, though.
| reitzensteinm wrote:
| Depending on a bunch of factors of how this data is accessed
| and actors are laid out in memory, it may be more cache
| friendly which could yield substantial speedups.
|
| Or it could do next to nothing, as the data is multiple cache
| lines long anyway.
| bengarney wrote:
| I would not expect much, but you'd have to measure to be
| sure.
|
| If you actually have a million of something you're better off
| writing a custom manager thing to handle the bulk of the work
| anyway. For instance, if you're doing a brick building game
| where users might place a million bricks - maybe you want
| each brick to be an Actor for certain use cases, but you'd
| want to centralize all the collision, rendering, update
| logic. (This is what I did on a project with this exact use
| case and it worked nicely.)
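One way to picture that kind of "custom manager" is a single object that owns all per-brick data in contiguous arrays, with each brick identified by a lightweight handle. The sketch below is an assumption about the general shape, not the commenter's actual code; `BrickManager`, `BrickId`, and the stored fields are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using BrickId = std::uint32_t;  // hypothetical handle: an index into the arrays

// Hypothetical manager that owns the data for every brick in bulk.
class BrickManager {
public:
    // Stores a new brick position and returns its handle.
    BrickId add(float x, float y, float z) {
        xs_.push_back(x);
        ys_.push_back(y);
        zs_.push_back(z);
        return static_cast<BrickId>(xs_.size() - 1);
    }

    // One tight loop over contiguous data instead of one update per object.
    void translate_all(float dx, float dy, float dz) {
        for (std::size_t i = 0; i < xs_.size(); ++i) {
            xs_[i] += dx;
            ys_[i] += dy;
            zs_[i] += dz;
        }
    }

    float x(BrickId id) const { assert(id < xs_.size()); return xs_[id]; }
    std::size_t count() const { return xs_.size(); }

private:
    std::vector<float> xs_, ys_, zs_;  // one dense array per coordinate
};
```

A brick that needs full Actor behaviour for some use case could hold a `BrickId` and ask the manager for its data, while collision, rendering, and update logic iterate over the arrays directly.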
| cma wrote:
| If the memory savings he got were fully read or fragmented with
| other stuff on cache lines that are read in every frame (not
| likely for static world actors), it could be ~10% of CPU memory
| bandwidth on mobile every frame at 120 Hz on an LPDDR4 phone.
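A back-of-the-envelope version of that bandwidth estimate, under stated assumptions: the saved memory is taken to be about 9.8 MB (the article's 25,000-actor example quoted further down the thread), every saved byte is assumed to have been read once per frame, and peak bandwidth is assumed to be 12.8 GB/s (LPDDR4-3200 on a 32-bit bus). Real devices vary, so treat this as a sanity check rather than a measurement.

```cpp
#include <cassert>

// All inputs are assumptions for illustration, not measurements.
constexpr double kSavedBytes      = 9.8e6;   // ~392 bytes * 25,000 actors
constexpr double kFramesPerSecond = 120.0;
constexpr double kPeakBytesPerSec = 12.8e9;  // assumed LPDDR4-3200, 32-bit bus

// Bytes per second that would no longer need to be read.
constexpr double saved_bytes_per_second() {
    return kSavedBytes * kFramesPerSecond;
}

// Fraction of the assumed peak bandwidth that this traffic represents.
constexpr double saved_bandwidth_fraction() {
    return saved_bytes_per_second() / kPeakBytesPerSec;
}
```

With these inputs the fraction comes out a little above 9%, which is in line with the "~10%" figure. A wider memory bus would halve it.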
|
| A big problem with them is they are so heavyweight you can only
| spawn a few per frame before causing hitches and have to have
| pools or instancing to manage things like bullets.
|
| I think in their Robo Recall talk they found they could only
| spawn 10-20 projectile style bullets per frame before running
| into hitches, and switched to pools and recycling them.
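The pool-and-recycle approach mentioned here can be sketched in plain C++. This is a minimal illustration and not Unreal's API: `Projectile`, `ProjectilePool`, `acquire`, and `release` are hypothetical names, and a real Unreal pool would also have to hide, deactivate, and reset actual actors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an expensive-to-spawn game object.
struct Projectile {
    float x = 0.0f, y = 0.0f, z = 0.0f;
    bool active = false;
};

// Fixed-size pool: every object is created once up front, then recycled.
class ProjectilePool {
public:
    explicit ProjectilePool(std::size_t capacity) : items_(capacity) {
        free_.reserve(capacity);
        for (std::size_t i = 0; i < capacity; ++i) {
            free_.push_back(&items_[i]);
        }
    }

    // Copying would leave free_ pointing into another pool's storage.
    ProjectilePool(const ProjectilePool&) = delete;
    ProjectilePool& operator=(const ProjectilePool&) = delete;

    // Returns a recycled projectile, or nullptr when the pool is exhausted.
    Projectile* acquire() {
        if (free_.empty()) return nullptr;
        Projectile* p = free_.back();
        free_.pop_back();
        *p = Projectile{};  // reset state left over from the previous use
        p->active = true;
        return p;
    }

    // Hands a projectile back so a later acquire() can reuse it.
    void release(Projectile* p) {
        assert(p != nullptr && p->active);
        p->active = false;
        free_.push_back(p);
    }

    std::size_t available() const { return free_.size(); }

private:
    std::vector<Projectile> items_;  // never resized, so pointers stay valid
    std::vector<Projectile*> free_;  // objects currently not in use
};
```

All allocation cost is paid once when the pool is built, so firing a bullet during gameplay is just a pointer pop and a state reset.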
| teamonkey wrote:
| Pooling is pretty standard practice though, it would be the
| go-to solution for any experienced gameplay programmer when
| dealing with more than a dozen entities (though annoyingly
| there isn't a standardised way of doing it in Blueprint).
| dijit wrote:
| To be completely fair though, blueprints themselves are
| oft-maligned for performance.
|
| They're fantastic for prototyping, but once you have
| designed some kind of hot-path most people typically start
| converting blueprints to code as an optimisation.
|
| In such a scenario adding pooling becomes a trivial part of
| such an effort.
| cma wrote:
| Standard practice, but it bit Epic by surprise. You
| wouldn't think it would be needed at such small numbers.
| You wouldn't automatically think it would be needed on
| 3+ GHz machines.
| Pxtl wrote:
| I've never played with UE and so I'm kinda shocked to learn
| that there isn't pooling already for objects that have this
| kind of creation cost.
| spockz wrote:
| Besides the memory savings, would there also be performance
| gains? More actors would now fit in a cache line.
| cma wrote:
| The actors in unreal are such bloated single inheritance god
| classes that with a few actor components they take up more like
| a 4K memory page not part of a cache line, especially in editor
| builds.
|
| But they do have a more optimized entity component system now
| too.
|
| To be fair, a single transform now that things are 64 bit
| coordinates I think is bigger than a cache line too.
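A quick size check illustrates the point about double-precision transforms. `Transform64` below is a hypothetical struct (rotation quaternion, translation, and scale, all stored as doubles), not Unreal's actual `FTransform` layout, which may include extra padding. The 64-byte line size is also an assumption.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kCacheLineBytes = 64;  // assumed line size

// Hypothetical double-precision transform; not the engine's real layout.
struct Transform64 {
    double rotation[4];     // quaternion
    double translation[3];
    double scale[3];
};

// Ten doubles are 80 bytes, so one transform cannot fit in a 64-byte line.
static_assert(sizeof(Transform64) == 10 * sizeof(double), "no padding expected");
static_assert(sizeof(Transform64) > kCacheLineBytes, "larger than a cache line");
```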
| larstofus wrote:
| I haven't profiled this specifically, but my guess is that
| there shouldn't be any measurable performance gains. Most of
| the time, actors are randomly scattered in memory anyway, so
| having smaller actors doesn't avoid a lot of cache misses.
| out-of-ideas wrote:
| is it me or does the site loop endlessly loading thousands of
| requests? to at least:
|
| - https://public-api.wordpress.com/wp-admin/rest-proxy/#https:...
|
| - https://s0.wp.com/wp-content/js/rlt-proxy.js?m=20240709
|
| blocking these via regex made the page load up really nice and
| fast
|
| edit: formatting
| keyle wrote:
| Would this have any effect in prod build though?
| teamonkey wrote:
| That was my immediate thought on reading the article. Is this
| in dev or prod builds and which UE version is it referring to?
|
| (Not that I expect the AActor code to have changed much but
| modifying AActor seemed more common in the early 4.x days.)
| larstofus wrote:
| Oh, good point. The test was done in the current stable 5.5
| release in a development configuration :) Since this is a
| change to the memory layout of the class, there should be no
| change due to the build configuration, though.
| mwkaufma wrote:
| On ABZU we ripped out multiplayer and lots of other cruft from
| AActor and UPrimitiveComponent - dropping builtin overlap events,
| which are a kind of anti pattern anyway, not only saved RAM but
| cut a lot of ghost reads/writes.
| rc5150 wrote:
| Whoa! Didn't picture myself seeing a dev who worked on Abzu in
| the wild here on HN--I very much enjoyed that game, my thanks
| and high fives to your team for your work!
|
| I'm having a kid-in-the-tunnel-meeting-Mean-Joe-Green-in-the-
| commercial moment, I just started my own game development
| journey about a week ago so it's neat getting to run across a
| full-on developer!
|
| To stay on topic, I often thought how cool Abzu would have been
| with multiplayer but it's a good lesson to me that some
| features that might be desirable might also be a hindrance to
| some degree.
|
| Okay, enough fanboying!
| jayd16 wrote:
| What makes overlap events an anti-pattern?
| mwkaufma wrote:
| In principle it's a fine idea, but their implementation has
| so many footguns (race conditions, thundering herds, perf
| cliffs, etc) it was easier to impl your own simpler
| alternative.
| thaumasiotes wrote:
| Your presence inspired me to try to look up what the circumflex
| on "abzû" is supposed to signify. As best I can tell, it's a
| marker of vowel length.
|
| I wonder how that came to be used. It's a traditional way to
| distinguish eta and omega in transliteration from Greek, but
| it's not at all a traditional way to mark long vowels in
| general.
|
| (I see that wikipedia says this about Akkadian:
|
| > Long vowels are transliterated with a macron (ā, ē, ī, ū) or
| a circumflex (â, ê, î, û), the latter being used for long
| vowels arising from the contraction of vowels in hiatus.
|
| But it seems odd for an independent root to contain a
| contracted double vowel. And the page "Abzu" has the circumflex
| on the Sumerian transliteration too.)
| stavros wrote:
| "Abzu" is also the Greek onomatopoeia for a sneeze.
| thaumasiotes wrote:
| I was highly amused to learn that the ancient Greek verb
| for spitting is ptuo. (Compare English "ptooey".)
| stavros wrote:
| Yeah, and the modern is much the same ("phtuno"). I'm
| sure it's onomatopoeic, and it's an amusing word.
| jorvi wrote:
| It's funny how I can sort-of read the Dutch word in
| "tun": spit "tuf" and spitting "tuffen". I can't find the
| etymology of it so it might be a false cognate
|
| If it isn't a false cognate, I wonder what the function
| of "ph" and "o" are..
| stavros wrote:
| It is a false cognate, it's not pronounced "oo", but
| "ee". It's just onomatopoeia, that's why it's so similar.
| jorvi wrote:
| We pronounce it t-uh-f, as in "tough" without the o.
| thaumasiotes wrote:
| Note that ν is an N, not a V.
| thaumasiotes wrote:
| Mandarin is Tu /tku/.
| pandemic_region wrote:
| Does the popular 'hawk' prefix for this also originate
| from ancient Greek ?
| mananaysiempre wrote:
| Seems to be an older convention in linguistics. Romanizations
| of Japanese also switched from circumflexes (Tôkyô) to
| macrons (Tōkyō) at some point in time fairly long ago--I
| think the English-language Japanese journal I saw using that
| convention systematically was from the late 1950s, and its
| recent issues definitely don't use it.
|
| Perhaps a circumflex was easier to typeset, like with
| logicians switching from A to !A and the Chomskyan school in
| linguistics switching from X-bar and X-double-bar to X' and
| XP?
| larstofus wrote:
| Oh, nice to see that there are real-life examples of this
| stuff, thank you very much :) Needless to say that I'll take a
| deeper look at the overlaps now :D
| pwdisswordfishz wrote:
| What's a ghost read? Search engines are failing me.
| dagmx wrote:
| Unnecessary reads that you can't really control or observe
| well.
| mwkaufma wrote:
| Unused memory accesses thrashing cache
| Veliladon wrote:
| There's two main things. First of all, when you load data
| into a register from an address in memory the CPU loads
| 64-byte cache lines, not words or bytes. AActor for instance
| is 1016 bytes. 16 cache lines. It's freaking huge.
|
| So let's say you're going through all the actors and updating
| one thing. If those actors are in an array it's easy. Just a
| for loop, update the member variable, done. Easy, fast,
| should be performant right? But each time you're updating one
| of those the prefetcher is also bringing in extra lines, more
| data in the object, thinking you might need them next. So if
| you're only updating a single thing or a couple of things in
| the object on different cache lines you might really bring in
| 3-8x the data you actually need.
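The access pattern described here can be made concrete with a small sketch. `FatActor` is a hypothetical 1016-byte struct standing in for `AActor`, with one frequently updated field; the second `damage_all` overload shows the alternative of keeping that field in its own dense array. The two helper functions only estimate how many bytes of cache lines each loop must fetch at minimum, assuming 64-byte lines and ignoring any extra lines the prefetcher pulls in.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kCacheLineBytes = 64;  // assumed line size

// Hypothetical stand-in for a large actor: one hot field plus cold padding.
struct FatActor {
    float health;
    char  cold[1012];  // everything the update loop never looks at
};
static_assert(sizeof(FatActor) == 1016, "matches the size quoted in the thread");

// Array-of-structs update: each iteration lands on a different cache line.
void damage_all(std::vector<FatActor>& actors, float amount) {
    for (FatActor& a : actors) a.health -= amount;
}

// Dense alternative: only the hot field, 16 floats per cache line.
void damage_all(std::vector<float>& healths, float amount) {
    for (float& h : healths) h -= amount;
}

// Lower bound on cache-line bytes fetched when touching one float per actor.
constexpr std::size_t min_bytes_fetched_fat(std::size_t n) {
    return n * kCacheLineBytes;  // at least one full line per actor
}

// Cache-line bytes fetched when the floats are packed back to back.
constexpr std::size_t bytes_fetched_dense(std::size_t n) {
    return ((n * sizeof(float) + kCacheLineBytes - 1) / kCacheLineBytes)
           * kCacheLineBytes;
}
```

Even before the prefetcher adds anything, the fat layout fetches at least 16 times more data than the dense one for the same update, which is the gap the comment is pointing at.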
|
| CPU prefetchers have something called stride detectors which
| can detect access patterns of N number of steps and stop the
| prefetcher from grabbing additional lines but at 16 cache
| lines the AActor object is way too big for the stride
| detector to keep up with. So you stride in gaps of 16 cache
| lines at a time through memory and you still get 2-3 extra
| cache lines after the initial access.
|
| Secondly, a 1016 byte object just doesn't fit. It's word
| aligned but it's not cache line aligned and it's sure as hell
| not page aligned.
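To see what the lack of cache-line alignment means in practice, the helper below counts how many 64-byte lines an object covers given its starting byte offset. The function name, the 64-byte line size, and the scenario of tightly packed 1016-byte objects starting at a line-aligned base are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kCacheLineBytes = 64;  // assumed line size

// Number of cache lines overlapped by `size` bytes starting at `offset`.
// `size` must be greater than zero.
constexpr std::size_t lines_touched(std::size_t offset, std::size_t size) {
    return (offset + size - 1) / kCacheLineBytes - offset / kCacheLineBytes + 1;
}

// First object in a packed array starts on a line boundary: 16 lines.
static_assert(lines_touched(0, 1016) == 16, "aligned start");
// Second object starts at byte 1016, mid-line, and spills into a 17th line.
static_assert(lines_touched(1016, 1016) == 17, "misaligned start");
```

Because 1016 is not a multiple of 64, every object after the first starts partway through a line, so neighbouring objects end up sharing lines.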
|
| Best case scenario if you're updating two variables next to
| each other in memory the prefetcher gets both on the same
| cache line. Medium case scenario, the prefetcher has to grab
| the next line every so often. You'll get best most often and
| medium rarely.
|
| Bad case scenario, the prefetcher has to grab the next cache
| line on the NEXT PAGE. Which only just became a thing on
| recent CPUs but also involves translating the virtual address
| of the next page to its physical page address which takes
| forever in data access terms. Bunch of pointer chasing,
| basically a few thousand clock of waiting.
|
| The absolute worst case scenario is that the prefetcher
| thinks you need the next cache line, it's on the next page,
| it does the rigamarole of translating the next page's virtual
| address and you don't actually need it. You've done two
| orders of magnitude more work than reading a single variable
| for literally nothing.
|
| So yeah. The prefetcher can do some weird ass shit when you
| throw weird and massive data structs at it. Slashing and
| burning the size down helps because the stride detector can
| start functioning again when the object is small enough. If
| it can be kept to a multiple of 64 bytes you even get page
| aligned again.
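Padding to a multiple of 64 bytes can be expressed with `alignas`. In this sketch `PaddedActor` is hypothetical: its payload is 1016 bytes, and the alignment requirement makes the compiler round the size up to 1024. Assuming 4 KiB pages and a page-aligned array base, exactly four such objects fill a page, so none of them straddles a page boundary.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kCacheLineBytes = 64;    // assumed line size
constexpr std::size_t kPageBytes      = 4096;  // assumed page size

// Hypothetical actor-sized struct forced onto cache-line boundaries.
struct alignas(64) PaddedActor {
    char payload[1016];
};

// sizeof is rounded up to the alignment, so 1016 becomes 1024.
static_assert(sizeof(PaddedActor) == 1024, "padded to a multiple of 64");
static_assert(sizeof(PaddedActor) % kCacheLineBytes == 0, "line-sized stride");
static_assert(kPageBytes % sizeof(PaddedActor) == 0, "objects tile a page exactly");

// True if the object at `index` in a page-aligned array crosses a page boundary.
constexpr bool straddles_page(std::size_t index, std::size_t object_size) {
    const std::size_t first = index * object_size;
    const std::size_t last  = first + object_size - 1;
    return first / kPageBytes != last / kPageBytes;
}
```

The cost is 8 bytes of padding per object. In exchange, every object starts on a line boundary, whereas with the unpadded 1016-byte stride the fifth object in the array (index 4) already crosses into the next page.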
| joshyeager wrote:
| Thank you for ABZU! My daughter has played it at least ten
| times. And when she wrote a letter to your team for a school
| project, you sent back a t-shirt and a soundtrack CD. We've
| listened to that CD for hours on road trips, it is a great
| soundtrack.
| joegibbs wrote:
| You would probably want to avoid having tens of thousands or a
| hundred thousand actors though, they're pretty heavy regardless.
| There might be a few reasons why you'd want to have that many but
| I think ideally you'd want to instance them or have some kind of
| actor that handles UObjects instead
| Fokamul wrote:
| Too bad big companies don't care about this and more.
| "Morons(gamers) will just buy new hardware, fu hiring engine core
| devs".
| maccard wrote:
| This attitude comes up on here whenever gamedev comes up, and I
| really dislike it.
|
| Here's a quote from the article
|
| > I've already told you that this method saves 328 bytes per
| actor, which is not too much at first glance. You can apply the
| same trick for SceneComponents and save another 32 bytes per
| SceneComponent. Assuming an average of two SceneComponents per
| actor, you get up to 392 bytes per actor. Still not an
| impressive number unless you deal with a lot of actors. A
| hypothetical example level with 25 000 actors (which is a lot,
| but not unreasonable) will save about 10 MB.
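The arithmetic in that quote can be checked directly. The constants below are the figures stated in the article excerpt; only the identifier names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Figures taken from the quoted article text.
constexpr std::size_t kSavedPerActor          = 328;
constexpr std::size_t kSavedPerSceneComponent = 32;
constexpr std::size_t kComponentsPerActor     = 2;  // the article's assumed average

// Total bytes saved per actor, including its scene components.
constexpr std::size_t saved_per_actor() {
    return kSavedPerActor + kComponentsPerActor * kSavedPerSceneComponent;
}

// Total bytes saved for a level containing `actors` actors.
constexpr std::size_t saved_for_level(std::size_t actors) {
    return actors * saved_per_actor();
}

static_assert(saved_per_actor() == 392, "328 + 2 * 32");
static_assert(saved_for_level(25000) == 9800000, "about 10 MB");
```

The same function gives roughly 39 MB for the 100k-actor case raised at the top of the thread.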
|
| I've a lot of experience with Unreal, and 25k actors is likely
| to run into a whole host of problems, such that saving 10MB of
| RAM is likely to be the least of your worries. You'd get more
| benefit out of removing a single unneeded texture, or
| compressing a single animation better.
|
| One of the reasons developers use unreal (and yes, developers
| do use Unreal, it's not just "big companies" forcing their poor
| creatives to use the engine) is _because_ unreal has more man
| hours of development in a year than a small team would ever be
| able to put into their own engine. Like any tool it has
| tradeoffs, and it does have a (measurable) overhead. But to
| say that companies don't care is just disingenuous
| speed_spread wrote:
| Actors are handled by the CPU where shaving 10MB can mean
| that more things can now fit in the cache leading to dramatic
| improvement.
| maccard wrote:
| If you're going to make that assertion then back it up with
| numbers. It could just as easily have absolutely no impact
| whatsoever because your game thread is spending all its
| time on navigation mesh queries which have nothing to do
| with actors or UObjects.
| speed_spread wrote:
| The keyword here is "can". I'm just saying it's
| definitely possible that a 10MB memory reduction in a
| critical spot results in significant performance gains. I
| agree 100% that any optimization should be backed up by
| solid benchmarks.
| maccard wrote:
| It also "can" do absolutely nothing, or "can"
| introduce false sharing in multithreaded code.
|
| Lots of things are possible - but speculating on every
| possibility as though they're equally probable doesn't
| provide any value. Actors in unreal are a fairly low
| level item, but most games aren't going to have 25k
| actors in a world, and if they do, 10MB of memory usage
| fragmented across actors is likely the least of their
| worries.
| jayd16 wrote:
| Yeah, why does an engine include something useless like *checks
| notes... "multiplayer?"
| shadowgovt wrote:
| The market is a lot more complicated than that. But to a first
| approximation, this _is_ an uncharitable statement of the
| reality: gaming is a luxury product, gaming is winner-take-all
| (i.e. the really successful games see 100,000x gross revenue
| over the median indie game, and people can't play two games at
| once so player attention is a very constrained resource), and
| the market consistently rewards novelty over polish. Players
| still bought Cyberpunk 2077, and then _kept buying it_ after
| the bug announcements came out; it has sold 30 million copies.
|
| All these market forces conspire to heavily incentivize a game
| studio to release as close to now as possible with as much game
| as they believe the players will stomach as possible. There are
| companies that buck this trend (Nintendo has a tradition of
| maximizing quality out-of-the-box), but that's where incentives
| point companies. Minecraft was _hilariously_ buggy (and devoid
| of features) when it came out; its original developer committed
| it to a price model where the earlier you bought it, the
| cheaper it would be, and it became one of the most popular
| mega-games of a generation.
|
| And the incentives come from players. Helldivers 2 doesn't have
| bugs because Arrowhead is lazy; it has bugs because Arrowhead
| wants a billion dollars _and gamers can be trusted to hand them
| over for a product that works most of the time, as long as it's
| more fun than frustrating._
| jofla_net wrote:
| This was true even in the first version over 20 years ago. In a
| single player derivative, I remember combing through tons of
| UScript, unrealscript, stanzas which went something like. "Do
| this, and if we're in multiplayer, do this too or instead." The
| code was a mess, but again, good times.
| dleslie wrote:
| This has been a pattern since the first release of Unreal Engine.
| It's how we managed to smoosh it onto PS2 and Xbox.
| thenthenthen wrote:
| How to implement A-life in Stalker 2
| shadowgovt wrote:
| This is a really good writeup. Something the author doesn't
| mention is that shrinking your data structures is also helpful
| for cache cohesion: if your structures are smaller, more of them
| can fit in smaller CPU caches (even if the game engine is
| striping resources of the same kind to simplify ripping across
| them every frame, this can matter).
|
| The only counterweight I'd add is that if you _later_ decide to
| _add_ multiplayer, that is very, very hard if the engine wasn't
| set up for it from the beginning. Multiplayer adds complexity
| that exceeds simple things like getting on the network and
| sending messages; synchronization and prediction are meaningful
| for a realtime experience and are much easier to get to if you've
| started from "It's multiplayer under the hood but the server is
| local" than "We have a single-player realtime game and we're
| making it multiplayer." But that's not a reason never to do this;
| not all games need to be multiplayer!
___________________________________________________________________
(page generated 2025-04-08 23:01 UTC)