[HN Gopher] Working with jumbo/unity builds in C/C++
___________________________________________________________________
Working with jumbo/unity builds in C/C++
Author : sph
Score : 57 points
Date : 2024-05-18 10:30 UTC (1 days ago)
(HTM) web link (austinmorlan.com)
(TXT) w3m dump (austinmorlan.com)
| senkora wrote:
| This is easily the best overview of unity builds in C/C++ that
| I've seen. I'll definitely save and reference it in the future.
| jslaby wrote:
| I agree, I'm glad I read the article. Seeing "jumbo/unity builds"
| in the title, I thought this was not for me, but I'm doing some
| small C++ stuff, so I might try it out.
| dmazzoni wrote:
| Unfortunately this isn't consistent with how these terms are
| typically used. This article only explains the simple case where
| your program is small enough that you can compile all source
| files at once. That's not realistic or practical for much larger
| projects.
|
| Both WebKit and Chromium support unity/jumbo builds. They combine
| around 20 source files at a time into a single compilation unit,
| which provides a reasonable tradeoff - making the full build
| noticeably faster without overflowing RAM and without making the
| cost of recompiling after a single change too large. You also get
| lots of parallelism.
|
| Making a unity / jumbo build work for a project with 10,000+
| files and millions of lines of code is not simple at all.
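To make the "around 20 source files at a time" idea concrete, here is a minimal sketch of how batch files could be generated. The function name `make_unity_sources` and the file names are hypothetical; this is not how WebKit or Chromium actually implement it, only an illustration of grouping sources into fixed-size batches.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the text of one generated unity source per
// batch, each consisting of #include lines for at most batch_size files.
std::vector<std::string> make_unity_sources(const std::vector<std::string>& files,
                                            std::size_t batch_size) {
    assert(batch_size > 0);  // a batch must hold at least one file
    std::vector<std::string> units;
    for (std::size_t start = 0; start < files.size(); start += batch_size) {
        const std::size_t end = std::min(start + batch_size, files.size());
        std::string text;
        for (std::size_t i = start; i < end; ++i) {
            text += "#include \"" + files[i] + "\"\n";
        }
        units.push_back(text);
    }
    return units;
}
```

Each returned string would be written to its own generated .cpp file and compiled as a separate translation unit, which is where the parallelism comes from.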
| npalli wrote:
| > Making a unity / jumbo build work for a project with 10,000+
| files and millions of lines of code is not simple at all.
|
| True, but the number of such large projects is tiny (< 100?)
| compared to the tens of thousands of other smaller projects
| that might benefit. Good writeup.
| dayjaby wrote:
| I bet there are many more projects than 100. What makes you
| think the number is that low?
| cpeterso wrote:
| Firefox uses "unified" builds, concatenating .cpp files only
| within each directory. Unified Firefox builds are about 2-5x
| faster on my Mac.
|
| Non-unified builds are still built to make sure there are no
| unexpected side effects or accidental header file dependencies.
|
| https://firefox-source-docs.mozilla.org/build/buildsystem/un...
| omoikane wrote:
| Firefox's unified build was discussed here:
|
| https://news.ycombinator.com/item?id=35825683 - Unity builds
| lurked into the Firefox Build System (2023)
|
| The page that was referenced by that thread has moved,
| current location is
|
| https://serge-sans-paille.github.io/pythran-stories/how-
| unit...
| almostgotcaught wrote:
| > Chromium support unity/jumbo builds
|
| used to https://groups.google.com/a/chromium.org/g/chromium-
| dev/c/DP...
| StellarScience wrote:
| > Making a unity / jumbo build work for a project with 10,000+
| files and millions of lines of code is not simple at all
|
| For years we rolled our own unity build, but now CMake supports
| it directly through CMAKE_UNITY_BUILD and
| CMAKE_UNITY_BUILD_BATCH_SIZE, making it straightforward to
| enable.
|
| When first enabling it on a large project, you'll run into
| clashes where different files contain identically named
| file-scoped functions or variables. Sometimes this reveals
| copy-pasted code, where the fix is to refactor the duplicate code
| anyway. Other times you just pick more specific names to avoid
| the clash.
|
| We find unity build gives a solid 3X build speedup. We haven't
| eliminated header files, and in fact keep one slow CI job
| building the code _without_ unity to ensure our code still
| builds either way.
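The name clashes described above can be sketched as follows. The file names (`parser.cpp`, `lexer.cpp`) and every identifier are hypothetical. The block shows what a merged translation unit might look like after each file's private `checksum` helper has been moved into its own namespace, which is one possible fix alongside simply renaming.

```cpp
#include <cassert>

// --- would come from a hypothetical parser.cpp ---
namespace parser_detail {
// Originally a bare file-scoped `static int checksum(int)`.
static int checksum(int x) { return x * 31 + 7; }
}  // namespace parser_detail

int parse_value(int x) { return parser_detail::checksum(x); }

// --- would come from a hypothetical lexer.cpp ---
namespace lexer_detail {
// Same short name, different body: without the namespaces this would be a
// redefinition error once both files share one translation unit.
static int checksum(int x) { return x ^ 0x5a; }
}  // namespace lexer_detail

int lex_value(int x) { return lexer_detail::checksum(x); }

// Each caller resolves to its own helper.
void check_helpers() {
    assert(parse_value(1) == 38);  // 1 * 31 + 7
    assert(lex_value(0) == 0x5a);  // 0 ^ 0x5a
}
```

If the two helpers turn out to have identical bodies, that is the copy-paste case, and merging them into one shared function is probably the better fix.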
| plq wrote:
| Sqlite calls this an "amalgamation" build
|
| Details here: https://sqlite.org/amalgamation.html
|
| It's also touted as an easy way to embed sqlite.
| faresahmed wrote:
| IMO separation of interface and implementation is one of the
| "good" things about C/C++ (I used to be confused about it when
| starting out), it gives you a good overview of a piece of code
| and how it is supposed to be used. In other languages with no
| such "feature", you'll have to scroll hundreds of lines of
| implementation details you don't care about to understand the
| interface. You say this as a possible solution:
|
| > You can still use header files if you want to, they're just no
| longer strictly necessary. You're free to put struct definitions
| and function prototypes into a header file if you'd like.
|
| But is it really? Not enforcing header files across the codebase
| means that you'll definitely end up with some inconsistency
| sooner or later that will be hard to deal with.
|
| > The order that you include the source files in all.c matters.
| In the above example, bar.c had to be included before foo.c
| because foo.c used a struct and function that was defined in
| bar.c.
|
| This is just additional overhead that you don't need while
| implementing something new.
|
| And in general, this goes against how C/C++ codebases are
| normally structured. I'm sure I'll be hella confused about a file
| called `all.c`.
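The ordering constraint quoted from the article can be illustrated with a small sketch of what a single merged translation unit looks like once the sources are pasted together. The contents of `bar.c` and `foo.c` are invented here for illustration; only the ordering requirement comes from the quote.

```cpp
#include <cassert>

// --- contents of a hypothetical bar.c, pasted first ---
struct Bar {
    int value;
};

int bar_double(Bar b) { return b.value * 2; }

// --- contents of a hypothetical foo.c, pasted second ---
// If this function were moved above the Bar definition, the unit would no
// longer compile, because Bar and bar_double would not yet be declared.
int foo_compute(int v) {
    Bar b{v};
    return bar_double(b) + 1;
}

void check_order() {
    assert(foo_compute(20) == 41);  // 20 * 2 + 1
}
```

Keeping the struct definition and function prototype in a header that both files include would remove the ordering dependency, which is the trade-off being debated in this comment.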
| tyleo wrote:
| I worked at a company which had a Unity-built codebase. The
| company had actually branched the codebase for two separate
| products. One of the products kept the .h/.cpp build running in
| CI in addition to the Unity build, the other product only used
| the Unity build.
|
| My experience was that there was decent benefit to keeping the
| .h/.cpp build running. Most 'normal' C++ tools and IDEs are not
| going to assume you are using a Unity build and tend to choke
| on it. Even though we never really shipped anything from the
| non-Unity build, having it around was useful for avoiding
| 'phantom' errors in the IDE and having static analysis tools
| work properly.
| jbandela1 wrote:
| One issue that can happen that the article didn't mention is
| running out of AST node identifiers in Clang.
|
| Clang uses a 32-bit number to identify AST nodes. If a single
| translation unit is large enough, it can overflow this and you
| can get some very weird compilation errors.
| 201984 wrote:
| Unless your amalgamated source file is getting into the 100s of
| megabytes to gigabytes range, I doubt that would be an issue.
| gmueckl wrote:
| If template instantiations use up AST node identifiers (I
| don't know whether they do), I can see scenarios
| where crazy amplified template instantiation chains can eat
| substantial chunks of that range.
| buildartefact wrote:
| They do.
| SassyBird wrote:
| I suspect extensive use of Boost-style template
| metaprogramming is a great help in making this goal
| realistic.
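As a rough illustration of how one line of source can fan out into many template instantiations, here is a minimal sketch. The `Chain` template is hypothetical, and nothing here measures or reproduces how Clang numbers its AST nodes. It only shows the amplification effect being discussed.

```cpp
#include <cassert>

// Each Chain<N> refers to Chain<N - 1>, so naming Chain<64> once forces the
// compiler to instantiate Chain<64>, Chain<63>, ... down to Chain<1>.
template <int N>
struct Chain {
    static constexpr int value = Chain<N - 1>::value + 1;
};

// Base case that stops the recursion.
template <>
struct Chain<0> {
    static constexpr int value = 0;
};

// A single use site that triggers 64 class template instantiations.
int chain_depth() { return Chain<64>::value; }

void check_chain() {
    assert(chain_depth() == 64);
}
```

Real metaprogramming libraries tend to nest several such chains, so the number of instantiated entities can grow much faster than the line count of the source suggests.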
| fire_lake wrote:
| Unity builds are strictly slower in the most common case -
| changes to just a few files in a project that has been built
| previously. This is why Google developed the Blaze build system.
| dafelst wrote:
| Unreal Engine does unity builds quite well - it will, as part of
| a prebuild step:
|
| 1. Merge related cpp and h files together in groups into monolith
| files, usually in the order of 10-20 source files merged based on
| my observations - often entire modules will be grouped together.
| 2. Exclude any individual files from the monolith that have edits
| since the last change.
|
| It's a nice middle ground, you don't substantially slow down your
| incremental builds since the files you're editing are still
| compiled individually, plus you get a fairly substantial
| improvement in build times.
|
| It's nice in that you can still structure your source in separate
| cpp/h files with little extra consideration for the unity builds
| and it mostly "just works" in both unity mode and with regular
| builds.
|
| Unfortunately you occasionally do have issues with builds that
| work in unity but not in individual compilation or vice versa
| (usually due to arcane #include dependency chains) but they're
| usually easy to fix.
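The two prebuild steps listed above (group files into monoliths, but keep recently edited files out of them) could be sketched like this. `BuildPlan` and `plan_build` are hypothetical names and this is not Unreal's actual UnrealBuildTool logic, just an assumption-laden illustration of the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

struct BuildPlan {
    std::vector<std::vector<std::string>> unity_batches;  // files merged together
    std::vector<std::string> standalone;                  // files compiled one by one
};

// Hypothetical planner: edited files stay standalone so that incremental
// rebuilds only recompile them; everything else is batched.
BuildPlan plan_build(const std::vector<std::string>& files,
                     const std::set<std::string>& recently_edited,
                     std::size_t batch_size) {
    assert(batch_size > 0);
    BuildPlan plan;
    std::vector<std::string> current;
    for (const std::string& file : files) {
        if (recently_edited.count(file) != 0) {
            plan.standalone.push_back(file);
            continue;
        }
        current.push_back(file);
        if (current.size() == batch_size) {
            plan.unity_batches.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) {
        plan.unity_batches.push_back(current);  // final, possibly short batch
    }
    return plan;
}
```

One consequence worth noting: the first edit to a file changes the batch it used to belong to, so that batch is rebuilt once before the fast path applies.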
| qalmakka wrote:
| Unreal's unity builds get very annoying as soon as your team
| exceeds a handful of developers. You often have code that
| builds fine on one machine but not on another developer's - it
| completely depends on what order the UBT picks. This is even
| more annoying when you have some people on Windows, some on
| Linux, some on Macs, and they consistently get different
| results. The fact that most tools (Intellisense, all clang-
| based tooling, ...) are broken with UE makes automating the
| detection of missing includes even harder.
|
| We worked around this by introducing a mandatory pre-merge CI
| stage that constantly does non-unity builds - but it's costly
| and not something a small company can often afford (a non-unity
| build of our UE project is ~20min on a Linux runner, and way
| more on a Windows one. That adds up fast).
|
| Unreal itself hasn't been non-unity buildable for a very long
| time. In general IMHO Unity builds are a testament to the
| failure of the C++ standards committee to realise that modules
| and the building model should be part of the standard too. The
| "one file is a translation unit" model hasn't been adequate for
| years IMHO - I honestly appreciate how Rust basically imposed
| cargo as a standard, it was a hard but sane choice.
| dafelst wrote:
| Really? That hasn't been my experience at all. What engine
| version are you on?
|
| We target windows, linux, xsx and ps5 and have probably 15-20
| programmers making contributions daily and probably only hit
| maybe one or two of these issues per week, and they rarely
| get checked in as we get them during our mandatory preflight
| build during code review, similar to what you describe. We
| run all the preflight on on-prem machines now so the cost is
| minimized compared to our former cloud solution.
|
| We did a lot of work to modularize our codebase so maybe that
| is helping?
| intelVISA wrote:
| May be worth rethinking your module arrangements - unity
| builds have been stable for a while.
| maccard wrote:
| I agree with your point on the C++ standards, but disagree
| about your complaints with Unreal's unity issues. There are
| issues with non-reproducibility in theory, but I've worked on
| massive projects and the impact is minimal.
|
| > We worked around this by introducing a mandatory pre-merge
| CI stage that constantly does non-unity builds - but it's
| costly and not something a small company can often afford (a
| non-unity build of our UE project is ~20min on a Linux
| runner, and way more on a Windows one. That adds up fast).
|
| Or you could only build the files that have changed. Even the
| largest of large files are one minute compiles.
|
| > Unreal itself hasn't been non-unity buildable for a very
| long time.
|
| Unreal builds in non-unity just fine. There was definitely a
| time period where it _didn't_, but for the last few years
| it's been much better than that.
| snovv_crash wrote:
| Have you considered adding something like ccache to your
| compiler to speed up your CI?
| forrestthewoods wrote:
| Getting rid of header files for "jumbo builds" is, imho, a bad
| idea. "I don't want to update two places" is, imho, insufficient
| reason. Thankfully this is optional.
|
| Combining many cpp files into a single translation unit is a good
| idea. More projects should do this. In fact I'd go so far as to
| say that most non-trivial, popular C++ projects on GitHub could
| and should probably be boiled down to a single translation unit.
|
| The amount of redundant compiling in C++ is insane. Any project
| that uses STL is compiling the same crap over and over and over
| and over and over. Then counting on the linker to deduplicate the
| billions of cycles of wasted work.
| Yotsugi wrote:
| I don't understand why there's still no solution for this in
| STL. Surely compilers like GCC could support a setup where STL
| instantiations are cached somewhere? If it was just one extra
| compile flag telling compiler to cache all template
| instantiations in some directory however it pleases, that would
| solve more than STL problems.
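For templates you own, C++11 already offers a partial, manual version of this through explicit instantiation declarations (`extern template`). The sketch below uses a hypothetical `RunningSum` class and, to stay self-contained, puts what would normally be the header part and the single .cpp part into one block. It is not a cache and does not cover standard library instantiations with built-in types.

```cpp
#include <cassert>

// --- would live in a hypothetical running_sum.h ---
template <typename T>
class RunningSum {
public:
    void add(T v);
    T total() const;

private:
    T total_{};
};

template <typename T>
void RunningSum<T>::add(T v) { total_ += v; }

template <typename T>
T RunningSum<T>::total() const { return total_; }

// Tells every includer not to implicitly instantiate RunningSum<double>.
extern template class RunningSum<double>;

// --- would live in exactly one hypothetical running_sum.cpp ---
// The one place where the member functions are actually compiled.
template class RunningSum<double>;

double sum_three(double a, double b, double c) {
    RunningSum<double> s;
    s.add(a);
    s.add(b);
    s.add(c);
    return s.total();
}

void check_running_sum() {
    assert(sum_three(1.0, 2.0, 3.5) == 6.5);
}
```

The cost is that every type argument has to be listed by hand, which is roughly the bookkeeping the comment above is hoping a compiler flag could take over.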
| mattnewport wrote:
| Modules, standardised in C++20, are the official solution
| both for the standard library and for other code but they are
| not universally well supported by all major compilers and
| build systems yet.
|
| Transitioning existing code to use modules is also not
| entirely straightforward, though probably no more problematic
| than introducing unity builds.
| forrestthewoods wrote:
| > though probably no more problematic than introducing
| unity builds.
|
| A "Unity build" really just means typing #include "foo.cpp"
| a few times. It's trivial.
|
| Meanwhile, neither Clang nor GCC support standard library
| modules. They have only partial support for modules
| themselves. C++ module support is non-existent in almost
| all build systems.
| https://en.cppreference.com/w/cpp/compiler_support
|
| The idea of C++ modules is great. It's badly needed. In
| practice I'm not sure if they're ever going to be genuinely
| functional and widespread. Which makes me sad. Toy projects
| don't count.
| corysama wrote:
| > You don't have to literally have only one translation unit.
|
| Having 2 units per cpu core in your machine is a much better
| idea.
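The "2 units per CPU core" rule of thumb translates into a batch size with a little arithmetic. This is a sketch with a hypothetical function name. The factor of 2 is taken from the comment above and is not a measured optimum.

```cpp
#include <cassert>
#include <cstddef>

// Returns how many source files each unity unit should hold so that the
// project is split into roughly two units per core.
std::size_t files_per_unit(std::size_t file_count, unsigned cores) {
    if (cores == 0) {
        cores = 1;  // treat an unknown core count as a single core
    }
    const std::size_t units = 2 * static_cast<std::size_t>(cores);
    return (file_count + units - 1) / units;  // ceiling division
}

void check_split() {
    // 200 files on 8 cores: 16 units, so 13 files per unit (rounded up).
    assert(files_per_unit(200, 8) == 13);
}
```

The core count could be supplied by `std::thread::hardware_concurrency()`, which is allowed to return 0 when the value cannot be determined. That is why the zero case is handled explicitly.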
| gumby wrote:
| > It's unfortunate that "Unity Build" is the prevailing term
| because it's impossible to do a search without getting a lot of
| results about the Unity game engine.
|
| I ignored this article on the front page because of that. Only
| because it stayed on the front page for several hours (and
| because I care about C++ build issues) did I eventually click on
| it.
| Yotsugi wrote:
| > and because I care about C++ build issues
|
| Look no further, these builds will give you more than enough
| issues on any sizable project.
| gumby wrote:
| Yes, glad I eventually clicked on it!
| wakawaka28 wrote:
| It's only faster if you are building from scratch, and even then
| it might not be faster. The resources required to build one huge
| compilation unit are also higher than compiling separately. If
| you support this type of build, it should not be the only option.
| stephc_int13 wrote:
| It is always faster in all cases from my experience. It should
| not, but it is. Try.
| unsatchmo wrote:
| This is silly. If I'm only rebuilding 1 file out of a 200
| file source base, I guarantee the non-unity build will be
| faster. Just the number of characters I have to tokenize in
| this thought experiment should be enough to convince. If this
| isn't the case then you're doing some sort of n^2 c++ boost
| every header includes most other headers shenanigans and you
| need to stop that rather than doing a unity build.
| maccard wrote:
| I've worked with unity builds a lot in my career. Rebuilding
| a unity module can take 90 seconds to 2 minutes (plus linking),
| whereas a single file might only be 3-5 seconds on its own.
|
| The best possibility is if your build system can detect which
| files have changed and exclude them from unity builds, as
| this gives you the "slow path" the first time you change a
| file, but from then on you get the fast path behaviour.
| SleepyMyroslav wrote:
| The linked article describes manual approaches that create less
| reusable code.
|
| Luckily there are unity build tools that do not require manual
| changes to code and work with normal .h/.cpp files automatically.
___________________________________________________________________
(page generated 2024-05-19 23:01 UTC)