[HN Gopher] Deleting multiplayer from the Unreal engine can save...
       ___________________________________________________________________
        
       Deleting multiplayer from the Unreal engine can save memory
        
       Author : mariuz
       Score  : 150 points
       Date   : 2025-04-06 10:23 UTC (2 days ago)
        
 (HTM) web link (larstofus.com)
 (TXT) w3m dump (larstofus.com)
        
       | bengarney wrote:
       | Really interesting analysis of where the data lives... cutting
       | 3-4 textures would save you more memory even in the 100k actor
       | case, though.
        
         | reitzensteinm wrote:
         | Depending on a bunch of factors of how this data is accessed
         | and actors are laid out in memory, it may be more cache
         | friendly which could yield substantial speedups.
         | 
         | Or it could do next to nothing, as the data is multiple cache
         | lines long anyway.
        
           | bengarney wrote:
           | I would not expect much, but you'd have to measure to be
           | sure.
           | 
           | If you actually have a million of something you're better off
           | writing a custom manager thing to handle the bulk of the work
           | anyway. For instance, if you're doing a brick building game
           | where users might place a million bricks - maybe you want
           | each brick to be an Actor for certain use cases, but you'd
           | want to centralize all the collision, rendering, update
           | logic. (This is what I did on a project with this exact use
           | case and it worked nicely.)
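A minimal sketch of the "custom manager" idea from the comment above, in plain C++ rather than Unreal code. All names here (`BrickManager`, `BrickPosition`, `CountBelow`) are hypothetical and do not come from the comment or the engine. It only illustrates keeping per-brick data in packed parallel arrays so bulk work is one loop over contiguous memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration; this is not Unreal API.
struct BrickPosition { std::int32_t x, y, z; };

class BrickManager {
public:
    using BrickId = std::size_t;

    // Adds a brick and returns its index into the packed arrays.
    BrickId Add(BrickPosition pos, std::uint8_t color) {
        positions_.push_back(pos);
        colors_.push_back(color);
        return positions_.size() - 1;
    }

    // Removes a brick in O(1) by moving the last brick into its slot.
    // Note: the brick that was last now has index `id`.
    void Remove(BrickId id) {
        assert(id < positions_.size());
        positions_[id] = positions_.back();
        colors_[id] = colors_.back();
        positions_.pop_back();
        colors_.pop_back();
    }

    // Bulk query: one tight loop over contiguous data instead of
    // one callback per heavyweight object.
    std::size_t CountBelow(std::int32_t height) const {
        std::size_t n = 0;
        for (const BrickPosition& p : positions_) {
            if (p.z < height) ++n;
        }
        return n;
    }

    std::size_t Size() const { return positions_.size(); }

private:
    std::vector<BrickPosition> positions_;  // parallel arrays, same index
    std::vector<std::uint8_t> colors_;
};
```

In a real project only the handful of bricks that need per-object behaviour would be promoted to full actors, with the manager owning collision, rendering and update data for the rest. That split is an assumption about how such a design might look, not a description of the commenter's project.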
        
         | cma wrote:
         | If the memory savings he got were fully read or fragmented with
         | other stuff on cache lines that are read in every frame (not
         | likely for static world actors), it could be ~10% of CPU memory
         | bandwidth on mobile every frame at 120hz on an lpddr4 phone.
         | 
         | A big problem with them is they are so heavyweight you can only
         | spawn a few per frame before causing hitches and have to have
         | pools or instancing to manage things like bullets.
         | 
         | I think in their Robo Recall talk they found they could only
         | spawn 10-20 projectile style bullets per frame before running
         | into hitches, and switched to pools and recycling them.
        
           | teamonkey wrote:
           | Pooling is pretty standard practice though, it would be the
           | go-to solution for any experienced gameplay programmer when
           | dealing with more than a dozen entities (though annoyingly
           | there isn't a standardised way of doing it in Blueprint).
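For readers unfamiliar with the pattern, here is a rough sketch of a fixed-size pool in plain C++. `Bullet` and `BulletPool` are hypothetical names and the struct stands in for a much heavier engine object; nothing here is Unreal API or the Robo Recall implementation. The idea is that all objects are allocated once up front, and "spawning" only hands out an inactive slot.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an expensive-to-spawn object.
struct Bullet {
    float x = 0, y = 0, z = 0;
    bool active = false;
};

class BulletPool {
public:
    explicit BulletPool(std::size_t capacity) : bullets_(capacity) {
        free_.reserve(capacity);
        // Push indices in reverse so slot 0 is handed out first.
        for (std::size_t i = capacity; i > 0; --i) free_.push_back(i - 1);
    }

    // Returns a recycled bullet, or nullptr when the pool is exhausted.
    Bullet* Acquire() {
        if (free_.empty()) return nullptr;
        Bullet& b = bullets_[free_.back()];
        free_.pop_back();
        b = Bullet{};      // reset state left over from the previous use
        b.active = true;
        return &b;
    }

    // Marks a bullet inactive and makes its slot available again.
    void Release(Bullet* b) {
        assert(b >= bullets_.data() && b < bullets_.data() + bullets_.size());
        assert(b->active);
        b->active = false;
        free_.push_back(static_cast<std::size_t>(b - bullets_.data()));
    }

    std::size_t Available() const { return free_.size(); }

private:
    std::vector<Bullet> bullets_;    // allocated once, never resized
    std::vector<std::size_t> free_;  // indices of inactive bullets
};
```

Returning `nullptr` on exhaustion is one possible policy; growing the pool or recycling the oldest active object are alternatives with different trade-offs.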
        
             | dijit wrote:
             | To be completely fair though, blueprints themselves are
             | oft-maligned for performance.
             | 
             | They're fantastic for prototyping, but once you have
             | designed some kind of hot-path most people typically start
             | converting blueprints to code as an optimisation.
             | 
             | In such a scenario adding pooling becomes a trivial part of
             | such an effort.
        
             | cma wrote:
             | Standard practice, but it bit Epic by surprise. You
             | wouldn't think it would be needed at such small numbers.
             | You wouldn't automatically think it would be needed on
             | 3+ghz machines.
        
           | Pxtl wrote:
           | I've never played with UE and so I'm kinda shocked to learn
           | that there isn't pooling already for objects that have this
           | kind of creation cost.
        
       | spockz wrote:
       | Besides the memory savings, would there also be performance
       | gains? More actors would now fit in a cache line.
        
         | cma wrote:
         | The actors in unreal are such bloated single inheritance god
         | classes that with a few actor components they take up more like
         | a 4K memory page not part of a cache line, especially in editor
         | builds.
         | 
         | But they do have a more optimized entity component system now
         | too.
         | 
         | To be fair, a single transform now that things are 64 bit
         | coordinates I think is bigger than a cache line too.
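The size claim can be checked with a small sketch. The layout below is a hypothetical transform (quaternion, translation, scale, each padded to four components) and is an assumption for illustration, not Unreal's actual `FTransform` definition; the 64-byte cache line is also an assumption that holds on typical x86-64 parts.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kCacheLine = 64;  // assumed cache line size

// Hypothetical layout: each part padded to four components,
// a common SIMD-friendly choice. Not the engine's real type.
template <typename T>
struct Transform {
    T rotation[4];     // quaternion
    T translation[4];  // xyz + padding
    T scale[4];        // xyz + padding
};

static_assert(sizeof(Transform<float>) == 48, "fits in one 64-byte line");
static_assert(sizeof(Transform<double>) == 96, "needs a second line");

// Number of cache lines needed to hold `bytes`, assuming aligned storage.
constexpr std::size_t LinesFor(std::size_t bytes) {
    return (bytes + kCacheLine - 1) / kCacheLine;
}
```

Under these assumptions a single-precision transform fits in one line, while the double-precision version needs two.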
        
         | larstofus wrote:
         | I haven't profiled this specifically, but my guess is that
         | there shouldn't be any measurable performance gains. Most of
         | the time, actors are randomly scattered in memory anyway, so
         | having smaller actors doesn't avoid a lot of cache misses.
        
       | out-of-ideas wrote:
       | is it me or does the site loop endlessly loading thousands of
       | requests? to at least:
       | 
       | - https://public-api.wordpress.com/wp-admin/rest-proxy/#https:...
       | 
       | - https://s0.wp.com/wp-content/js/rlt-proxy.js?m=20240709
       | 
       | blocking these via regex made the page load up really nice and
       | fast
       | 
       | edit: formatting
        
       | keyle wrote:
       | Would this have any effect in prod build though?
        
         | teamonkey wrote:
         | That was my immediate thought on reading the article. Is this
         | in dev or prod builds and which UE version is it referring to?
         | 
         | (Not that I expect the AActor code to have changed much but
         | modifying AActor seemed more common in the early 4.x days.)
        
           | larstofus wrote:
           | Oh, good point. The test was done in the current stable 5.5
           | release in a development configuration :) Since this is a
           | change to the memory layout of the class, there should be no
           | change due to the build configuration, though.
        
       | mwkaufma wrote:
       | On ABZU we ripped out multiplayer and lots of other cruft from
       | AActor and UPrimitiveComponent - dropping builtin overlap events,
       | which are a kind of anti pattern anyway, not only saved RAM but
       | cut a lot of ghost reads/writes.
        
         | rc5150 wrote:
         | Whoa! Didn't picture myself seeing a dev who worked on Abzu in
         | the wild here on HN--I very much enjoyed that game, my thanks
         | and high fives to your team for your work!
         | 
         | I'm having a kid-in-the-tunnel-meeting-Mean-Joe-Greene-in-the-
         | commercial moment, I just started my own game development
         | journey about a week ago so it's neat getting to run across a
         | full-on developer!
         | 
         | To stay on topic, I often thought how cool Abzu would have been
         | with multiplayer but it's a good lesson to me that some
         | features that might be desirable might also be a hindrance to
         | some degree.
         | 
         | Okay, enough fanboying!
        
         | jayd16 wrote:
         | What makes overlap events an anti-pattern?
        
           | mwkaufma wrote:
           | In principle it's a fine idea, but their implementation has
           | so many footguns (race conditions, thundering herds, perf
           | cliffs, etc) it was easier to impl your own simpler
           | alternative.
        
         | thaumasiotes wrote:
         | Your presence inspired me to try to look up what the circumflex
         | on "abzû" is supposed to signify. As best I can tell, it's a
         | marker of vowel length.
         | 
         | I wonder how that came to be used. It's a traditional way to
         | distinguish eta and omega in transliteration from Greek, but
         | it's not at all a traditional way to mark long vowels in
         | general.
         | 
         | (I see that wikipedia says this about Akkadian:
         | 
         | > Long vowels are transliterated with a macron (ā, ē, ī, ū) or
         | a circumflex (â, ê, î, û), the latter being used for long
         | vowels arising from the contraction of vowels in hiatus.
         | 
         | But it seems odd for an independent root to contain a
         | contracted double vowel. And the page "Abzu" has the circumflex
         | on the Sumerian transliteration too.)
        
           | stavros wrote:
           | "Abzu" is also the Greek onomatopoeia for a sneeze.
        
             | thaumasiotes wrote:
             | I was highly amused to learn that the ancient Greek verb
             | for spitting is ptuo. (Compare English "ptooey".)
        
               | stavros wrote:
               | Yeah, and the modern is much the same ("phtuno"). I'm
               | sure it's onomatopoeic, and it's an amusing word.
        
               | jorvi wrote:
               | It's funny how I can sort-of read the Dutch word in
               | "tun": spit "tuf" and spitting "tuffen". I can't find the
               | etymology of it so it might be a false cognate
               | 
               | If it isn't a false cognate, I wonder what the function
               | of "ph" and "o" are..
        
               | stavros wrote:
               | It is a false cognate, it's not pronounced "oo", but
               | "ee". It's just onomatopoeia, that's why it's so similar.
        
               | jorvi wrote:
               | We pronounce it t-uh-f, as in "tough" without the o.
        
               | thaumasiotes wrote:
               | Note that ν is an N, not a V.
        
               | thaumasiotes wrote:
               | Mandarin is Tu  /tku/.
        
               | pandemic_region wrote:
               | Does the popular 'hawk' prefix for this also originate
               | from ancient Greek ?
        
           | mananaysiempre wrote:
           | Seems to be an older convention in linguistics. Romanizations
           | of Japanese also switched from circumflexes (Tôkyô) to
           | macrons (Tōkyō) at some point in time fairly long ago--I
           | think the English-language Japanese journal I saw using that
           | convention systematically was from the late 1950s, and its
           | recent issues definitely don't use it.
           | 
           | Perhaps a circumflex was easier to typeset, like with
           | logicians switching from Ā to ¬A and the Chomskyan school in
           | linguistics switching from X-bar and X-double-bar to X' and
           | XP?
        
         | larstofus wrote:
         | Oh, nice to see that there are real-life examples of this
         | stuff, thank you very much :) Needless to say that I'll take a
         | deeper look at the overlaps now :D
        
         | pwdisswordfishz wrote:
         | What's a ghost read? Search engines are failing me.
        
           | dagmx wrote:
           | Unnecessary reads that you can't really control or observe
           | well.
        
           | mwkaufma wrote:
           | Unused memory accesses thrashing cache
        
           | Veliladon wrote:
           | There's two main things. First of all, when you load data
           | into a register from an address in memory the CPU loads
           | 64-byte cache lines, not words or bytes. AActor for instance
           | is 1016 bytes. 16 cache lines. It's freaking huge.
           | 
           | So let's say you're going through all the actors and updating
           | one thing. If those actors are in an array it's easy. Just a
           | for loop, update the member variable, done. Easy, fast,
           | should be performant right? But each time you're updating one
           | of those the prefetcher is also bringing in extra lines, more
           | data in the object, thinking you might need them next. So if
           | you're only updating a single thing or a couple of things in
           | the object on different cache lines you might really bring in
           | 3-8x the data you actually need.
           | 
           | CPU prefetchers have something called stride detectors which
           | can detect access patterns of N number of steps and stop the
           | prefetcher from grabbing additional lines but at 16 cache
           | lines the AActor object is way too big for the stride
           | detector to keep up with. So you stride in gaps of 16 cache
           | lines at a time through memory and you still get 2-3 extra
           | cache lines after the initial access.
           | 
           | Secondly, a 1016 byte object just doesn't fit. It's word
           | aligned but it's not cache line aligned and it's sure as hell
           | not page aligned.
           | 
           | Best case scenario if you're updating two variables next to
           | each other in memory the prefetcher gets both on the same
           | cache line. Medium case scenario, the prefetcher has to grab
           | the next line every so often. You'll get best most often and
           | medium rarely.
           | 
           | Bad case scenario, the prefetcher has to grab the next cache
           | line on the NEXT PAGE. Which only just became a thing on
           | recent CPUs but also involves translating the virtual address
           | of the next page to its physical page address which takes
           | forever in data access terms. Bunch of pointer chasing,
           | basically a few thousand clocks of waiting.
           | 
           | The absolute worst case scenario is that the prefetcher
           | thinks you need the next cache line, it's on the next page,
           | it does the rigamarole of translating the next page's virtual
           | address and you don't actually need it. You've done two
           | orders of magnitude more work than reading a single variable
           | for literally nothing.
           | 
           | So yeah. The prefetcher can do some weird ass shit when you
           | throw weird and massive data structs at it. Slashing and
           | burning the size down helps because the stride detector can
           | start functioning again when the object is small enough. If
           | it can be kept to a multiple of 64 bytes you even get cache
           | line aligned again.
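The cache line arithmetic in the comment above can be reproduced with a short sketch. It assumes 64-byte lines and objects packed back to back starting at a line boundary; the function names are made up for illustration. It only counts which lines the program itself touches and deliberately does not model the prefetcher, so the extra lines described in the comment would come on top of these numbers.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kLine = 64;  // cache line size assumed in the comment

// Cache lines covered by `size` bytes starting at byte address `offset`.
constexpr std::size_t LinesSpanned(std::size_t offset, std::size_t size) {
    return (offset + size - 1) / kLine - offset / kLine + 1;
}

// Distinct lines touched when reading one field of `field_size` bytes at
// `field_offset` inside each of `count` objects of `object_size` bytes,
// stored contiguously. Assumes the field lies inside the object.
inline std::size_t LinesTouched(std::size_t object_size,
                                std::size_t field_offset,
                                std::size_t field_size,
                                std::size_t count) {
    std::size_t lines = 0;
    std::size_t next_unseen = 0;  // lowest line index not counted yet
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t begin = i * object_size + field_offset;
        std::size_t first = begin / kLine;
        const std::size_t last = (begin + field_size - 1) / kLine;
        if (first < next_unseen) first = next_unseen;  // skip counted lines
        if (last >= first) {
            lines += last - first + 1;
            next_unseen = last + 1;
        }
    }
    return lines;
}
```

With these assumptions, a 1016-byte object at offset 0 spans 16 lines, but the next one in the array starts at byte 1016 and spans 17. Updating one 4-byte field in 1000 such objects touches 1000 separate lines, whereas the same 1000 values stored in their own packed array would touch 63.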
        
         | joshyeager wrote:
         | Thank you for ABZU! My daughter has played it at least ten
         | times. And when she wrote a letter to your team for a school
         | project, you sent back a t-shirt and a soundtrack CD. We've
         | listened to that CD for hours on road trips, it is a great
         | soundtrack.
        
       | joegibbs wrote:
       | You would probably want to avoid having tens of thousands or a
       | hundred thousand actors though, they're pretty heavy regardless.
       | There might be a few reasons why you'd want to have that many but
       | I think ideally you'd want to instance them or have some kind of
       | actor that handles UObjects instead
        
       | Fokamul wrote:
       | Too bad big companies don't care about this and more.
       | "Morons(gamers) will just buy new hardware, fu hiring engine core
       | devs".
        
         | maccard wrote:
         | This attitude comes up on here whenever gamedev comes up, and I
         | really dislike it.
         | 
         | Here's a quote from the article
         | 
         | > I've already told you that this method saves 328 bytes per
         | actor, which is not too much at first glance. You can apply the
         | same trick for SceneComponents and save another 32 bytes per
         | SceneComponent. Assuming an average of two SceneComponents per
         | actor, you get up to 392 bytes per actor. Still not an
         | impressive number unless you deal with a lot of actors. A
         | hypothetical example level with 25 000 actors (which is a lot,
         | but not unreasonable) will save about 10 MB.
         | 
         | I've a lot of experience with Unreal, and 25k actors is likely
         | to run into a whole host of problems, such that saving 10MB of
         | RAM is likely to be the least of your worries. You'd get more
         | benefit out of removing a single unneeded texture, or
         | compressing a single animation better.
         | 
         | One of the reasons developers use unreal (and yes, developers
         | do use Unreal, it's not just "big companies" forcing their poor
         | creatives to use the engine) is _because_ unreal has more man
         | hours of development in a year than a small team would ever be
         | able to put into their own engine. Like any tool it has
         | tradeoffs, and it does have a (measurable) overhead. But to
         | say that companies don't care is just disingenuous.
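The figures in the quoted passage can be checked with a few lines of arithmetic. The per-actor and per-component savings are the numbers from the quote; the function names and the texture used for comparison (2048x2048, 4 bytes per texel, no mipmaps, uncompressed) are assumptions added purely for scale. Shipped textures are usually block-compressed and smaller.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kActorSaving = 328;          // bytes, from the quote
constexpr std::size_t kSceneComponentSaving = 32;  // bytes, from the quote

// Total bytes saved for `actors` actors with a fixed number of
// SceneComponents each.
constexpr std::size_t SavedBytes(std::size_t actors,
                                 std::size_t components_per_actor) {
    return actors *
           (kActorSaving + components_per_actor * kSceneComponentSaving);
}

// Size of an uncompressed texture without mipmaps (illustrative only).
constexpr std::size_t TextureBytes(std::size_t width, std::size_t height,
                                   std::size_t bytes_per_texel) {
    return width * height * bytes_per_texel;
}

static_assert(SavedBytes(1, 2) == 392, "328 + 2 * 32 bytes per actor");
static_assert(SavedBytes(25000, 2) == 9800000, "about 10 MB for 25k actors");
```

So 25 000 actors save 9.8 MB and 100 000 actors save 39.2 MB, while the single assumed texture alone is 16 777 216 bytes, which is the comparison the comment is making.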
        
           | speed_spread wrote:
           | Actors are handled by the CPU where shaving 10MB can mean
           | that more things can now fit in the cache leading to dramatic
           | improvement.
        
             | maccard wrote:
             | If you're going to make that assertion then back it up with
             | numbers. It could just as easily have absolutely no impact
             | whatsoever because your game thread is spending all its
             | time on navigation mesh queries which have nothing to do
             | with actors or UObjects.
        
               | speed_spread wrote:
               | The keyword here is "can". I'm just saying it's
               | definitely possible that a 10MB memory reduction in a
               | critical spot results in significant performance gains. I
               | agree 100% that any optimization should be backed up by
               | solid benchmarks.
        
               | maccard wrote:
               | It "can" also do absolutely nothing, or "can"
               | introduce false sharing in multithreaded code.
               | 
               | Lots of things are possible - but speculating on every
               | possibility as though they're equally probable doesn't
               | provide any value. Actors in unreal are a fairly low
               | level item, but most games aren't going to have 25k
               | actors in a world, and if they do, 10MB of memory usage
               | fragmented across actors is likely the least of their
               | worries.
        
         | jayd16 wrote:
         | Yeah, why does an engine include something useless like *checks
         | notes... "multiplayer?"
        
         | shadowgovt wrote:
         | The market is a lot more complicated than that. But to a first
         | approximation, this _is_ an uncharitable statement of the
         | reality: gaming is a luxury product, gaming is winner-take-all
         | (i.e. the really successful games see 100,000x gross revenue
         | over the median indie game, and people can't play two games at
         | once so player attention is a very constrained resource), and
         | the market consistently rewards novelty over polish. Players
         | still bought Cyberpunk 2077, and then _kept buying it_ after
         | the bug announcements came out; it has sold 30 million copies.
         | 
         | All these market forces conspire to heavily incentivize a game
         | studio to release as close to now as possible with as much game
         | as they believe the players will stomach as possible. There are
         | companies that buck this trend (Nintendo has a tradition of
         | maximizing quality out-of-the-box), but that's where incentives
         | point companies. Minecraft was _hilariously_ buggy (and devoid
         | of features) when it came out; its original developer committed
         | it to a price model where the earlier you bought it, the
         | cheaper it would be, and it became one of the most popular
         | mega-games of a generation.
         | 
         | And the incentives come from players. Helldivers 2 doesn't have
         | bugs because Arrowhead is lazy; it has bugs because Arrowhead
         | wants a billion dollars _and gamers can be trusted to hand them
         | over for a product that works most of the time, as long as it's
         | more fun than frustrating._
        
       | jofla_net wrote:
       | This was true even in the first version over 20 years ago. In a
       | single player derivative, I remember combing through tons of
       | UScript, unrealscript, stanzas which went something like. "Do
       | this, and if we're in multiplayer, do this too or instead." The
       | code was a mess, but again, good times.
        
       | dleslie wrote:
       | This has been a pattern since the first release of Unreal Engine.
       | It's how we managed to smoosh it onto PS2 and Xbox.
        
       | thenthenthen wrote:
       | How to implement A-life in Stalker 2
        
       | shadowgovt wrote:
       | This is a really good writeup. Something the author doesn't
       | mention is that shrinking your data structures is also helpful
       | for cache cohesion: if your structures are smaller, more of them
       | can fit in smaller CPU caches (even if the game engine is
       | striping resources of the same kind to simplify ripping across
       | them every frame, this can matter).
       | 
       | The only counterweight I'd add is that if you _later_ decide to
       | _add_ multiplayer, that is very, very hard if the engine wasn't
       | set up for it from the beginning. Multiplayer adds complexity
       | that exceeds simple things like getting on the network and
       | sending messages; synchronization and prediction are meaningful
       | for a realtime experience and are much easier to get to if you've
       | started from "It's multiplayer under the hood but the server is
       | local" than "We have a single-player realtime game and we're
       | making it multiplayer." But that's not a reason never to do this;
       | not all games need to be multiplayer!
        
       ___________________________________________________________________
       (page generated 2025-04-08 23:01 UTC)