[HN Gopher] Working with jumbo/unity builds in C/C++
       ___________________________________________________________________
        
       Working with jumbo/unity builds in C/C++
        
       Author : sph
       Score  : 57 points
       Date   : 2024-05-18 10:30 UTC (1 days ago)
        
 (HTM) web link (austinmorlan.com)
 (TXT) w3m dump (austinmorlan.com)
        
       | senkora wrote:
       | This is easily the best overview of unity builds in C/C++ that
       | I've seen. I'll definitely save and reference it in the future.
        
         | jslaby wrote:
         | I agree, I'm glad I read the article. Seeing the title
         | "jumbo/unity builds" I thought this was not for me, but I'm
         | doing some small C++ stuff, so I might try it out.
        
       | dmazzoni wrote:
       | Unfortunately this isn't consistent with how these terms are
       | typically used. This article only explains the simple case where
       | your program is small enough that you can compile all source
       | files at once. That's not realistic or practical for much larger
       | projects.
       | 
       | Both WebKit and Chromium support unity/jumbo builds. They combine
       | around 20 source files at a time into a single compilation unit,
       | which provides a reasonable tradeoff - making the full build
       | noticeably faster without overflowing RAM and without making the
       | cost of recompiling after a single change too large. You also get
       | lots of parallelism.
       | 
       | Making a unity / jumbo build work for a project with 10,000+
       | files and millions of lines of code is not simple at all.
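A minimal sketch of the batching dmazzoni describes, assuming a generator that writes one unity file per batch of `batch_size` sources. The helpers `write_unity_batch` and `unity_batch_count` are hypothetical and are not WebKit's or Chromium's actual tooling. CMake's `UNITY_BUILD_BATCH_SIZE` target property plays the same role in CMake projects.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the text of unity file number `batch`
 * into `out`. Each unity file simply #includes `batch_size` consecutive
 * sources. Returns how many sources were included, or -1 if `out` is
 * too small (or empty). */
int write_unity_batch(const char *const *sources, int n_sources,
                      int batch_size, int batch,
                      char *out, size_t out_size)
{
    int first = batch * batch_size;
    int count = 0;
    size_t used = 0;

    if (out_size == 0)
        return -1;
    out[0] = '\0';
    for (int i = first; i < n_sources && i < first + batch_size; i++) {
        int n = snprintf(out + used, out_size - used,
                         "#include \"%s\"\n", sources[i]);
        if (n < 0 || (size_t)n >= out_size - used)
            return -1; /* output was truncated */
        used += (size_t)n;
        count++;
    }
    return count;
}

/* Number of unity files needed: ceil(n_sources / batch_size). */
int unity_batch_count(int n_sources, int batch_size)
{
    return (n_sources + batch_size - 1) / batch_size;
}
```

With `batch_size` set to 20, a 10,000-file project becomes 500 unity files, which can still be compiled in parallel, and a one-line change only forces one of those 500 files to be rebuilt.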
        
         | npalli wrote:
         | > Making a unity / jumbo build work for a project with 10,000+
         | files and millions of lines of code is not simple at all.
         | 
         | True, but the number of such large projects is tiny (< 100?)
         | compared to the tens of thousands of other smaller projects
         | that might benefit. Good writeup.
        
           | dayjaby wrote:
           | I bet there are many more projects than 100. What makes you
           | think the number is that low?
        
         | cpeterso wrote:
         | Firefox uses "unified" builds, concatenating .cpp files only
         | within each directory. Unified Firefox builds are about 2-5x
         | faster on my Mac.
         | 
         | Non-unified builds are still built to make sure there are no
         | unexpected side effects or accidental header file dependencies.
         | 
         | https://firefox-source-docs.mozilla.org/build/buildsystem/un...
        
           | omoikane wrote:
           | Firefox's unified build was discussed here:
           | 
           | https://news.ycombinator.com/item?id=35825683 - Unity builds
           | lurked into the Firefox Build System (2023)
           | 
           | The page that was referenced by that thread has moved; its
           | current location is:
           | 
           | https://serge-sans-paille.github.io/pythran-stories/how-
           | unit...
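A minimal sketch of the per-directory rule cpeterso describes for Firefox, assuming the source list is ordered so that each directory's files are adjacent. `group_by_directory` is a hypothetical helper, not Mozilla's build code, and it ignores any per-file cap a real build system would also apply.

```c
#include <stddef.h>
#include <string.h>

/* Length of the directory part of `path` (everything before the last
 * '/'), or 0 for a file in the current directory. */
static size_t dir_len(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? (size_t)(slash - path) : 0;
}

/* Nonzero if both paths live in exactly the same directory. */
static int same_directory(const char *a, const char *b)
{
    size_t la = dir_len(a), lb = dir_len(b);
    return la == lb && strncmp(a, b, la) == 0;
}

/* Assign each source a group id so that only files from the same
 * directory are concatenated together. If a directory's files are not
 * adjacent in the input, that directory is split into several groups,
 * which is still correct, just less batched. Returns the group count. */
int group_by_directory(const char *const *sources, int n, int *group_of)
{
    int groups = 0;
    for (int i = 0; i < n; i++) {
        if (i == 0 || !same_directory(sources[i - 1], sources[i]))
            groups++;
        group_of[i] = groups - 1;
    }
    return groups;
}
```

Keeping groups inside one directory limits the blast radius of clashes: files in the same directory usually belong to one component and already share naming conventions.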
        
         | almostgotcaught wrote:
         | > Chromium support unity/jumbo builds
         | 
         | It used to: https://groups.google.com/a/chromium.org/g/chromium-
         | dev/c/DP...
        
         | StellarScience wrote:
         | > Making a unity / jumbo build work for a project with 10,000+
         | files and millions of lines of code is not simple at all
         | 
         | For years we rolled our own unity build, but now CMake supports
         | it directly through CMAKE_UNITY_BUILD and
         | CMAKE_UNITY_BUILD_BATCH_SIZE, making it straightforward to
         | enable.
         | 
         | When first enabling it on a large project, you'll run into
         | clashes where different files contain identically named
         | file-scoped functions or variables. Sometimes this reveals
         | copy-pasted code, where the fix is to refactor the duplicate
         | code anyway. Other times you just pick more specific names to
         | avoid the clash.
         | 
         | We find unity builds give a solid 3X build speedup. We haven't
         | eliminated header files, and in fact keep one slow CI job
         | building the code _without_ unity to ensure our code still
         | builds either way.
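To illustrate the clash StellarScience mentions: in C a `static` function is private to its translation unit, so two files can each define one with the same name, until a unity build puts both files into a single TU and the second definition becomes a redefinition error. The sketch below shows what the compiler would see for a hypothetical unity file that includes `parser.c` and then `lexer.c`, after applying the "more specific names" fix. The file names and functions are made up for illustration.

```c
/* Expanded view of a hypothetical unity file:
 *   #include "parser.c"
 *   #include "lexer.c"
 * Both files originally defined `static int is_separator(char c)`,
 * which only breaks once they share one translation unit. */

/* ---- from parser.c ---- */
static int parser_is_separator(char c) /* was: is_separator */
{
    return c == ',' || c == ';';
}

/* Count fields separated by ',' or ';'. */
int parser_count_fields(const char *s)
{
    int fields = 1;
    for (; *s; s++)
        if (parser_is_separator(*s))
            fields++;
    return fields;
}

/* ---- from lexer.c ---- */
static int lexer_is_separator(char c) /* was: is_separator */
{
    return c == ' ' || c == '\t' || c == '\n';
}

/* Count whitespace-separated words. */
int lexer_count_words(const char *s)
{
    int words = 0, in_word = 0;
    for (; *s; s++) {
        if (lexer_is_separator(*s)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            words++;
        }
    }
    return words;
}
```

The prefixed names also keep the non-unity build working, which is what the slow CI job described above verifies.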
        
       | plq wrote:
       | SQLite calls this an "amalgamation" build.
       |
       | Details here: https://sqlite.org/amalgamation.html
       |
       | It's also touted as an easy way to embed SQLite.
        
       | faresahmed wrote:
       | IMO separation of interface and implementation is one of the
       | "good" things about C/C++ (I used to be confused about it when
       | starting out): it gives you a good overview of a piece of code
       | and how it is supposed to be used. In other languages with no
       | such "feature", you have to scroll through hundreds of lines of
       | implementation details you don't care about to understand the
       | interface. You suggest this as a possible solution:
       | 
       | > You can still use header files if you want to, they're just no
       | longer strictly necessary. You're free to put struct definitions
       | and function prototypes into a header file if you'd like.
       | 
       | But is it really? Not enforcing header files across the codebase
       | means that you'll definitely end up with some inconsistency
       | sooner or later that will be hard to deal with.
       | 
       | > The order that you include the source files in all.c matters.
       | In the above example, bar.c had to be included before foo.c
       | because foo.c used a struct and function that was defined in
       | bar.c.
       | 
       | This is just additional overhead that you don't need while
       | implementing something new.
       | 
       | And in general, this goes against how C/C++ codebases are
       | normally structured; I'm sure I'd be hella confused by a file
       | called `all.c`.
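For readers unfamiliar with the article's `all.c` layout, here is a sketch of what the compiler sees once the includes are expanded. The contents of `bar.c` and `foo.c` are made up for illustration; only the ordering rule (bar.c before foo.c) comes from the article.

```c
/* Hypothetical all.c, shown expanded:
 *   #include "bar.c"
 *   #include "foo.c"
 * bar.c's text comes first, so foo.c can use its struct and function
 * with no header. Swapping the two includes would make `struct Bar`
 * and `bar_area` unknown at their point of use, and the build fails. */

/* ---- contents of bar.c ---- */
struct Bar {
    int width;
    int height;
};

static int bar_area(const struct Bar *b)
{
    return b->width * b->height;
}

/* ---- contents of foo.c (relies on bar.c appearing earlier) ---- */
int foo_total_area(const struct Bar *bars, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += bar_area(&bars[i]);
    return total;
}
```

Note that `bar_area` can stay `static` here because foo.c is in the same translation unit; in a conventional build it would need external linkage and a prototype in a header.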
        
         | tyleo wrote:
         | I worked at a company that had a codebase using Unity builds. The
         | company had actually branched the codebase for two separate
         | products. One of the products kept the .h/.cpp build running in
         | CI in addition to the Unity build, the other product only used
         | the Unity build.
         | 
         | My experience was that there was decent benefit to keeping the
         | .h/.cpp build running. Most 'normal' C++ tools and IDEs are not
         | going to assume you are using a Unity build and tend to choke
         | on it. Even though we never really shipped anything from the
         | non-Unity build, having it around was useful for avoiding
         | 'phantom' errors in the IDE and having static analysis tools
         | work properly.
        
       | jbandela1 wrote:
       | One issue that can happen that the article didn't mention is
       | running out of AST node identifiers in Clang.
       | 
       | Clang uses a 32-bit number to identify AST nodes. If a single
       | translation unit is large enough, it can overflow this and you
       | can get some very weird compilation errors.
        
         | 201984 wrote:
         | Unless your amalgamated source file is getting into the 100s of
         | megabytes to gigabytes range, I doubt that would be an issue.
        
           | gmueckl wrote:
           | If template instantiations use up AST node identifiers (I
           | don't know whether they do), I can see scenarios where
           | crazily amplified template instantiation chains could eat
           | substantial chunks of that range.
        
             | buildartefact wrote:
             | They do.
        
           | SassyBird wrote:
           | I suspect extensive use of Boost-style template
           | metaprogramming is a great help in making this goal
           | realistic.
        
       | fire_lake wrote:
       | Unity builds are strictly slower in the most common case:
       | changes to just a few files in a project that has been built
       | previously. This is why Google developed the Blaze build system.
        
       | dafelst wrote:
       | Unreal Engine does unity builds quite well - it will, as part of
       | a prebuild step:
       | 
       | 1. Merge related cpp and h files together in groups into monolith
       | files, usually on the order of 10-20 source files based on my
       | observations - often entire modules will be grouped together.
       | 2. Exclude any individual files from the monolith that have edits
       | since the last change.
       | 
       | It's a nice middle ground, you don't substantially slow down your
       | incremental builds since the files you're editing are still
       | compiled individually, plus you get a fairly substantial
       | improvement in build times.
       | 
       | It's nice in that you can still structure your source in separate
       | cpp/h files with little extra consideration for the unity builds,
       | and it mostly "just works" both in unity mode and with regular
       | builds.
       | 
       | Unfortunately you occasionally do have issues with builds that
       | work in unity but not in individual compilation or vice versa
       | (usually due to arcane #include dependency chains) but they're
       | usually easy to fix.
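A rough model of the two prebuild steps dafelst lists (maccard describes the same exclusion trick further down), assuming each source carries a "modified" flag, for example one derived from source control. `plan_adaptive_unity` is hypothetical and is not UnrealBuildTool's actual code.

```c
#include <stdbool.h>

/* Hypothetical model of an "adaptive" unity build: unmodified sources
 * are packed into unity batches of `batch_size`, while every source
 * the developer is editing gets its own TU so rebuilding it stays
 * cheap. Writes a TU index for each source into `tu_of` and returns
 * the total number of TUs. */
int plan_adaptive_unity(const bool *modified, int n, int batch_size,
                        int *tu_of)
{
    int tus = 0;         /* TUs allocated so far */
    int open_batch = -1; /* TU index of the batch being filled */
    int in_batch = 0;    /* sources already placed in that batch */

    for (int i = 0; i < n; i++) {
        if (modified[i]) {
            tu_of[i] = tus++; /* compiled on its own */
            continue;
        }
        if (open_batch < 0 || in_batch == batch_size) {
            open_batch = tus++; /* start a new unity batch */
            in_batch = 0;
        }
        tu_of[i] = open_batch;
        in_batch++;
    }
    return tus;
}
```

Editing a file then costs only a recompile of that file, while untouched files keep the unity speedup. The catch, visible in the replies below, is that batch membership shifts as files are edited, so include-order bugs can appear or disappear depending on who is editing what.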
        
         | qalmakka wrote:
         | Unreal's unity builds get very annoying as soon as your team
         | exceeds a handful of developers. You often have code that
         | builds fine on one developer's machine but not on another's -
         | it completely depends on which order UBT (UnrealBuildTool)
         | picks. This is even more annoying when you have some people on
         | Windows, some on Linux, some on Macs, and they consistently get
         | different results. The fact that most tools (IntelliSense, all
         | clang-based tooling, ...) are broken with UE makes automating
         | the detection of missing includes even harder.
         | 
         | We worked around this by introducing a mandatory pre-merge CI
         | stage that constantly does non-unity builds - but it's costly
         | and not something a small company can often afford (a non-unity
         | build of our UE project is ~20min on a Linux runner, and way
         | more on a Windows one. That adds up fast).
         | 
         | Unreal itself hasn't been non-unity buildable for a very long
         | time. In general, IMHO unity builds are a testament to the
         | failure of the C++ standards committee to realise that modules
         | and the build model should be part of the standard too. The
         | "one file is a translation unit" model hasn't been adequate for
         | years IMHO - I honestly appreciate how Rust basically imposed
         | cargo as a standard; it was a hard but sane choice.
        
           | dafelst wrote:
           | Really? That hasn't been my experience at all. What engine
           | version are you on?
           | 
           | We target Windows, Linux, XSX and PS5 and have probably 15-20
           | programmers making contributions daily, and we only hit maybe
           | one or two of these issues per week. They rarely get checked
           | in, since we catch them during our mandatory preflight build
           | in code review, similar to what you describe. We now run all
           | preflights on on-prem machines, so the cost is minimized
           | compared to our former cloud solution.
           | 
           | We did a lot of work to modularize our codebase so maybe that
           | is helping?
        
           | intelVISA wrote:
           | May be worth rethinking your module arrangements - unity
           | builds have been stable for a while.
        
           | maccard wrote:
           | I agree with your point on the C++ standards, but disagree
           | about your complaints with Unreal's unity issues. There are
           | issues with non-reproducibility in theory, but I've worked on
           | massive projects and the impact is minimal.
           | 
           | > We worked around this by introducing a mandatory pre-merge
           | CI stage that constantly does non-unity builds - but it's
           | costly and not something a small company can often afford (a
           | non-unity build of our UE project is ~20min on a Linux
           | runner, and way more on a Windows one. That adds up fast).
           | 
           | Or you could only build the files that have changed. Even the
           | largest files are one-minute compiles.
           | 
           | > Unreal itself hasn't been non-unity buildable for a very
           | long time.
           | 
           | Unreal builds in non-unity just fine. There was definitely a
           | time period where it _didn't_, but for the last few years
           | it's been much better than that.
        
           | snovv_crash wrote:
           | Have you considered adding something like ccache to your
           | compiler to speed up your CI?
        
       | forrestthewoods wrote:
       | Getting rid of header files for "jumbo builds" is, imho, a bad
       | idea. "I don't want to update two places" is, imho, insufficient
       | reason. Thankfully this is optional.
       | 
       | Combining many cpp files into a single translation unit is a
       | good idea. More projects should do this. In fact, I'd go so far
       | as to say that most non-trivial, popular C++ projects on GitHub
       | could and should probably be boiled down to a single translation
       | unit.
       | 
       | The amount of redundant compiling in C++ is insane. Any project
       | that uses STL is compiling the same crap over and over and over
       | and over and over. Then counting on the linker to deduplicate the
       | billions of cycles of wasted work.
        
         | Yotsugi wrote:
         | I don't understand why there's still no solution for this in
         | the STL. Surely compilers like GCC could support a setup where
         | STL instantiations are cached somewhere? If it were just one
         | extra compile flag telling the compiler to cache all template
         | instantiations in some directory however it pleases, that would
         | solve more than STL problems.
        
           | mattnewport wrote:
           | Modules, standardised in C++20, are the official solution
           | both for the standard library and for other code but they are
           | not universally well supported by all major compilers and
           | build systems yet.
           | 
           | Transitioning existing code to use modules is also not
           | entirely straightforward, though probably no more problematic
           | than introducing unity builds.
        
             | forrestthewoods wrote:
             | > though probably no more problematic than introducing
             | unity builds.
             | 
             | A "Unity build" really just means typing #include "foo.cpp"
             | a few times. It's trivial.
             | 
             | Meanwhile, neither Clang nor GCC support standard library
             | modules. They have only partial support for modules
             | themselves. C++ module support is non-existent in almost
             | all build systems.
             | https://en.cppreference.com/w/cpp/compiler_support
             | 
             | The idea of C++ modules is great. It's badly needed. In
             | practice I'm not sure if they're ever going to be genuinely
             | functional and widespread. Which makes me sad. Toy projects
             | don't count.
        
       | corysama wrote:
       | > You don't have to literally have only one translation unit.
       | 
       | Having 2 units per cpu core in your machine is a much better
       | idea.
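The arithmetic behind corysama's rule of thumb, as a hypothetical helper: choose the batch size so the sources spread over roughly twice as many TUs as there are cores, which keeps every core busy while leaving some slack for uneven TU sizes.

```c
/* Hypothetical helper for the "two unity TUs per core" rule of thumb.
 * Returns how many sources to put in each TU so that `n_sources` files
 * are spread over about 2 * cores translation units. */
int unity_batch_size(int n_sources, int cores)
{
    int units = 2 * cores;
    if (units < 1)
        units = 1;
    if (units > n_sources)
        units = n_sources; /* never more TUs than files */
    if (units < 1)
        return 0; /* no sources at all */
    return (n_sources + units - 1) / units; /* ceiling division */
}
```

For example, 1,000 sources on an 8-core machine gives 16 TUs of at most 63 files each.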
        
       | gumby wrote:
       | > It's unfortunate that "Unity Build" is the prevailing term
       | because it's impossible to do a search without getting a lot of
       | results about the Unity game engine.
       | 
       | I ignored this article on the front page because of that. Only
       | because it stayed on the front page for several hours (and
       | because I care about C++ build issues) did I eventually click on
       | it.
        
         | Yotsugi wrote:
         | > and because I care about C++ build issues
         | 
         | Look no further, these builds will give you more than enough
         | issues on any sizable project.
        
           | gumby wrote:
           | Yes, glad I eventually clicked on it!
        
       | wakawaka28 wrote:
       | It's only faster if you are building from scratch, and even then
       | it might not be faster. The resources required to build one huge
       | compilation unit are also higher than compiling separately. If
       | you support this type of build, it should not be the only option.
        
         | stephc_int13 wrote:
         | It is always faster in all cases from my experience. It should
         | not, but it is. Try.
        
           | unsatchmo wrote:
           | This is silly. If I'm only rebuilding 1 file out of a 200
           | file source base, I guarantee the non-unity build will be
           | faster. Just the number of characters I have to tokenize in
           | this thought experiment should be enough to convince you. If
           | this isn't the case, then you're doing some sort of n^2 C++
           | Boost-style, every-header-includes-most-other-headers
           | shenanigans, and you need to stop that rather than doing a
           | unity build.
        
           | maccard wrote:
           | I've worked with unity builds a lot in my career. Rebuilding
           | a unity module can take 90 seconds to 2 minutes (plus
           | linking), whereas a single file might take only 3-5 seconds
           | on its own.
           | 
           | The best possibility is if your build system can detect which
           | files have changed and exclude them from unity builds, as
           | this gives you the "slow path" the first time you change a
           | file, but from then on you get the fast path behaviour.
        
       | SleepyMyroslav wrote:
       | Linked article describes manual approaches that create less
       | reusable code.
       | 
       | Luckily there are unity build tools that do not require manual
       | changes to code and work with normal .h/.cpp files automatically.
        
       ___________________________________________________________________
       (page generated 2024-05-19 23:01 UTC)