https://devblogs.microsoft.com/cppblog/integrating-c-header-units-into-office-using-msvc-2-n/

Microsoft C++ Team Blog

Integrating C++ header units into Office using MSVC (2/n)

Cameron DaCamara, Zachary Henkel
September 11th, 2023

In this follow-up blog, we will explore progress made towards getting header units working in the Office codebase.

Overview

* Overview: Where we were, where we're going.
* Old Code, Old Problems: 'fun' code issues found while scaling out.
* Rethinking Compiler Tooling: How can the compiler rise to the scaling problems?
* A New Approach to Referencing an IFC.
* Playing Nice With Precompiled Headers (PCH).
* Selecting a Launch Project: REDACTED.
* Looking Ahead: Throughput.

Overview

Last time we talked about how and why header units can be integrated into a large cross-platform codebase like Office. We discussed how header units helped surface conformance issues (good) and expose and fix compiler bugs (good-ish). We talked about how we went about taking "baby steps" to integrate header units into smaller liblets; we're talking something on the order of 100s of header units. This blog entry is all about scale and how we move from 100s of header units to 1000s of header units, including playing nicely with precompiled headers!

Office's header unit experiments continued by following the charge that we left you with in the last blog post: "Header Unit All the Things!". From that perspective we were wildly successful! By the last count we were able to successfully create over 5000 header units from liblet public headers. The road to reach that milestone wasn't always smooth, and we'll cover some of the challenges.

We'd like to highlight that the recent release of MSVC 17.6.6 makes this one of the best times to get started with header units! The release contains the full set of fixes that were discovered in cooperation with Office. The full set of fixes will also be available in 17.7.5 and 17.8 preview 2.

Old Code, Old Problems

While scaling out we encountered quite a number of complications. Some fixes involved updating Office code, while others needed to be solved on the compiler side. We'll present just a few examples from each bucket.

Symbolic Links

The way Office sources coalesce is a mix between sources populated by git and libraries populated by NuGet packages which are then linked into a build source tree via symbolic links.
From a build perspective, this is extremely convenient because you can decouple library updates from the sources and you can have one copy of library code shared across multiple copies of the git sources.

Symbolic links are interesting from a compiler perspective for two reasons: diagnostics and #pragma once. For source locations, there are two options: use the symbolic link as the file name or use the physical file the link resolves to. When issuing diagnostics, the compiler tries to use the symbolic link location because that is what was provided on the /I line, but there are cases where you might want the physical file location.

#pragma once is a whole other beast with respect to symbolic links. In the beginning, C-style include guards were created as a method of preventing repeated file content. #pragma once came about as a method of preventing inclusion of the same file. The distinction between file and content is part of the reason why #pragma once is difficult to standardize: it relies on filesystem vagaries to identify what it means to point to the "same file".

Consider a small example:

* Real file: C:\inc\a.h which contains a single #pragma once
* Symlink 1: C:\syms\inc\lib1\a.h -> C:\inc\a.h
* Symlink 2: C:\syms\inc\lib2\a.h -> C:\inc\a.h

Further consider a header in C:\syms\inc\lib2\b.h:

    #pragma once
    #include "a.h"

Then in our sources we have:

    #include "lib1/a.h"
    #include "lib2/b.h"

Let's dive into what happens here:

* The compiler sees symlink #1 through "lib1/a.h", sees #pragma once in the file content and records that file C:\syms\inc\lib1\a.h is associated with a pragma once.
* The compiler reads "lib2/b.h" and sees an inclusion of "lib2/a.h".
* Symlink #2 is then read and the compiler observes that #pragma once is in the file content and records that file C:\syms\inc\lib2\a.h is associated with a pragma once.

See a problem? The fact that the compiler read the file content from symlink #2 is where things start to go wrong.
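
To make the walk-through concrete, here is a minimal sketch of the bookkeeping involved. It is a toy model, not MSVC's implementation: `OnceTracker`, `resolve`, `count_reads`, and the hard-coded link table are hypothetical names invented for illustration, and no real filesystem is consulted. It only contrasts keying #pragma once on the path as spelled with keying it on the file the link resolves to.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for the filesystem: symlink path -> physical file.
// The paths mirror the example above; nothing here touches a real disk.
const std::map<std::string, std::string> kLinks = {
    {R"(C:\syms\inc\lib1\a.h)", R"(C:\inc\a.h)"},
    {R"(C:\syms\inc\lib2\a.h)", R"(C:\inc\a.h)"},
};

// Returns the physical file behind a path, or the path itself if it is
// not a known link.
std::string resolve(const std::string& path) {
    assert(!path.empty());
    const auto it = kLinks.find(path);
    return it == kLinks.end() ? path : it->second;
}

// Remembers which files carrying `#pragma once` were already entered.
class OnceTracker {
public:
    explicit OnceTracker(bool resolve_links) : resolve_links_(resolve_links) {}

    // Returns true if the file content still needs to be processed.
    bool enter(const std::string& spelled_path) {
        const std::string key =
            resolve_links_ ? resolve(spelled_path) : spelled_path;
        return seen_.insert(key).second;
    }

private:
    bool resolve_links_;
    std::set<std::string> seen_;
};

// Replays the two inclusions of a.h from the example and counts how often
// its content is entered.
int count_reads(bool resolve_links) {
    const std::string includes[] = {
        R"(C:\syms\inc\lib1\a.h)",  // from #include "lib1/a.h"
        R"(C:\syms\inc\lib2\a.h)",  // from "a.h" inside lib2/b.h
    };
    OnceTracker tracker(resolve_links);
    int reads = 0;
    for (const std::string& path : includes) {
        if (tracker.enter(path)) ++reads;
    }
    return reads;
}
```

In this model, `count_reads(false)` tracks each symlink separately, so the content of a.h is entered twice, while `count_reads(true)` keys on the physical file and enters it once.
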
What should have happened is that the compiler unwraps the symbolic link to discover what the underlying file is and records the real file as the owner of the pragma once, which is exactly what we did to solve the problem in normal compilation scenarios.

How does the situation above play into header units? Header units need to work like a PCH where they record macros and pragma state, which includes #pragma once. This means that the IFC needs to record symbolic links and their "unwrapped" files so that the compiler can properly enforce #pragma once. After this work was done, many more scenarios were unblocked in the Office build.

As a side note: it is always better to rely on standard C++ features (that is, include guards) to prevent repeated file content inclusion, and doing so side-steps the symbolic link problem entirely.

Inconsistent Conditional Compilation

We called out the most critical source of issues in the first blog post: inconsistent conditional compilation. As the experiment added projects we ran into an increasing number of conflicts. Individual projects, or sometimes isolated build nodes, couldn't agree if the default char type is unsigned, if RTTI should be enabled, or if UNICODE support should be set, and that's only in command line options! Across liblet headers there were also code-level macros to detangle: conditional selection of a memory allocator, masks for disallowed Windows SDK functions, plus the ASSUME macro example still hadn't been resolved! Solving such problems could require touching nearly every project in Office.

Internal Linkage

Declarations at namespace scope (including global scope) that are marked static have internal linkage. With textual includes this isn't an issue for declarations such as static const float pi = 3.14 because the contents of the header file become part of the translation unit. The source used to compile a header unit is a translation unit unto itself and thus the definition of pi will not be visible when it is imported.
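
As a small illustration of the two declaration styles involved, consider the hypothetical header below. The namespaces `legacy` and `modern` and the helper `circle_area` are invented for this sketch, and it assumes C++17 or later for inline variables. The second form is the one this post goes on to recommend.

```cpp
// constants.h (hypothetical header, for illustration only)

namespace legacy {
// `static` at namespace scope gives internal linkage: every translation
// unit that textually includes this header gets its own private `pi`.
static const float pi = 3.14f;
}  // namespace legacy

namespace modern {
// An inline variable has external linkage, so all translation units agree
// on a single `pi`, and `constexpr` keeps the value usable at compile time.
inline constexpr float pi = 3.14f;
}  // namespace modern

// Example consumer that depends on the constant at compile time.
constexpr float circle_area(float radius) {
    return modern::pi * radius * radius;
}
```

The design point is that `inline constexpr` keeps the constant header-only while giving it one program-wide identity, which is what lets an importer of the header unit refer to the same entity the header unit defined.
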
Fortunately, the fix is simple. Such data declarations should be marked as inline constexpr.

However, after converting the constant declarations to be inline constexpr we were still seeing many unresolved symbols during linking. These issues ultimately required a compiler change so that the IFC could contain the full initialization information for the data. Naturally, the compiler had to account for the common case where global variables marked as inline constexpr should have their values encoded into the actual IFC. In general, the compiler already does this for simpler values such as integral types, but more complex user-defined types or arrays of objects had to be accounted for. As a compiler optimization, the variable is only instantiated when it is referenced. More specifically, the compiler will only materialize the initializer if the variable is odr-used.

Mismatched Include Paths

Office has continued to utilize the /translateInclude flag to consume header units without rewriting source code. For this to operate as expected, the #include directives in source must match what is specified by a /headerUnit switch. In Office the most common issue was a reference that used the internal path to a header instead of the external one.

    Example switch:           /headerUnit:quote componentA/publicheader.h=ifcdir/publicheader.h.ifc
    Incorrect #include path:  (path missing in source)
    Correct #include path:    (path missing in source)

In the above example the incorrect path will not match the header-filename portion of the /headerUnit switch and thus will be textually included. Finding incorrect include paths is tricky because the code will compile correctly. The best way to discover any issues is to examine the output provided by /sourceDependencies for unexpected textual includes.

Updating the IFC Specification

There's been a perennial feedback bug since the original modules implementation in the compiler: `using namespace` declaration ignored when compiling as a header unit module.
The problem was that the IFC had no representation for using-directives at namespace scope (e.g. using namespace std;). It turns out that Office also ran into this issue, so to continue the experiment we had to fix it. After some thinking through the problem, we came up with IFC-76, which describes an encoding method for persisting directives in a translation unit.

Breaking Historical Assumptions

Before modules existed, the compiler had been around for nearly 25 years. This was enough time for the toolchain to develop several assumptions about how the front-end conveys data to the back-end. One such coupling was encoding type information directly into compiled functions. Handles to the compiler-generated type information are zero-based indices and the underlying data is generated along with each handle. There was one case where this type index was emitted directly into a tree for the purposes of annotating type information with new expressions for the debugger:

    int* make_int() {
        return new int{}; // <- generated a direct encoding of the type index for 'new'
    }

Why does this encoding cause a problem for modules? The compiler now persists compiled inline functions into the IFC, so it also persists this type index from one compilation to the next. This would sometimes crash the linker: it looks up the type index in the PDB, but the index for that particular translation unit is based on an array generated in a completely different translation unit. Office was able to reveal this issue very quickly, as many of the link steps involved lots of translation units from which this bug would surface.

Windows SDK Woes

There is a now quite old bug report: Visual Studio can't find time() function using modules and std.core. The root cause here is that the UCRT contains a definition of the time() function from C where it is defined as a static inline function within the SDK header.
These static inline functions stem from C not having the C++ meaning of inline, but static inline on a function declaration allows the function to behave as if it had C++ inline-like semantics. Note that defining the C standard function time() as static inline is in direct violation of the C standard, which explicitly says that C standard library functions have external linkage.

The Windows SDK team is hard at work fixing the issue above along with some other SDK issues that have plagued C++ modules interactions in the past, so stay tuned for fixes soon!

Rethinking Compiler Tooling

As we scaled the number of projects being built by the compiler using the header unit technology, it became immediately evident that we needed to rethink how to debug compiler problems. The traditional loop involved either reducing the failure to a simple two or three file repro or attaching to a remote debugging instance where the compiler was running on the build machine. It's easy to see how and why these approaches do not scale. Here are the concrete problems we needed to solve:

* Reproduction data collection should be asynchronous. We did not want the process of capturing a repro to be a blocking task.
* The data emitted by the compiler should be rich enough to reproduce the failure without debugging a remote compiler.
* The process of emitting data should be completely opt-in and not have a performance impact if you did not request it.
* The tool should offer a powerful visualization of the data such that we can easily navigate it and identify the underlying problem quickly.

With the requirements outlined, we designed a system inside the compiler which would act as a trace logging system for any modules-related functionality. If you would find value in using these types of tools, please let us know!
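
The opt-in requirement in the list above can be sketched in a few lines. This is only an illustration of the general technique, not the trace system inside MSVC: `TraceLog`, `Event`, and `record` are hypothetical names, and the asynchronous collection and visualization requirements are deliberately left out. The idea shown is that the message is produced by a callable, so a disabled log pays for one branch and never builds the string.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical trace sink; names are illustrative, not compiler internals.
class TraceLog {
public:
    struct Event {
        std::string category;
        std::string message;
    };

    explicit TraceLog(bool enabled) : enabled_(enabled) {}

    // `make_message` is any callable returning something convertible to
    // std::string. It is invoked only when tracing was requested.
    template <typename MakeMessage>
    void record(const char* category, MakeMessage&& make_message) {
        assert(category != nullptr);
        if (!enabled_) return;  // disabled: no formatting, no allocation
        events_.push_back(
            Event{category, std::forward<MakeMessage>(make_message)()});
    }

    // Events captured so far, in the order they were recorded.
    const std::vector<Event>& events() const { return events_; }

private:
    bool enabled_;
    std::vector<Event> events_;
};
```

A caller would pass a lambda, for example `log.record("ifc", [&] { return "loaded " + name; });`, so any expensive formatting lives inside the lambda and is skipped entirely when the log is disabled.
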
A New Approach to Referencing an IFC

Before the Office header unit experiments, the compiler relied on a pair of command line switches to specify individual IFC files: /reference for named modules, and /headerUnit for header units. It turns out that when thousands of header units or named modules are involved the command line grows quite long and unwieldy! An enormous list of flags is difficult to work with if there are compiler bugs to investigate, as you cannot 'comment' out a header unit reference easily.

We solved this problem by implementing a new way of conveying IFC dependencies to the compiler: /ifcMap. The /ifcMap switch allows the user to provide an IFC reference map file, written in a subset of the TOML file format, which details a mapping from a named module name or header-name to the respective IFC that should be loaded. Here's a quick example of a valid .toml file for the switch:

    # Header Units
    [[header-unit]]
    name = ['quote', 'm1.h']
    ifc = 'm1.h.ifc'

    [[header-unit]]
    name = ['quote', 'm2.h']
    ifc = 'm2.h.ifc'

    # Modules
    [[module]]
    name = 'm1'
    ifc = 'm1-renamed.ifc'

    [[module]]
    name = 'm2'
    ifc = 'm2-renamed.ifc'

/ifcMap allowed Office to scale the number of header units painlessly and offers a way to manage lots of header unit references beyond having them splatted onto one giant command line. The IFC map also enables a tight iterative approach when considering factors like debugging needs, both for the developer and the compiler team.

Playing Nice with Precompiled Headers (PCH)

Part of scaling out for the compiler is that large projects often use PCH as a way of achieving build speed. PCH is a reliable technology and has had the benefit of over 30 years of hardening and optimization. Header units, as the standardized replacement for PCH, need to integrate seamlessly into these older build environments that still use tried and true PCH technology.
Furthermore, Office needs to maintain compatibility with the non-Windows platforms that aren't ready to adopt header units yet.

The approach mentioned last time, force-including the Office shared precompiled header file into each header unit, resulted in a lot of duplicated parsing in the compiler. To that end we added support to consume the binary PCH directly when creating a header unit. This resulted in a nice performance win when compiling header units! Unfortunately, this resulted in a large build throughput degradation when consuming header units. As much as possible we would need to get precompiled headers out of the picture.

The first, naive, tactic was to ensure that each public header was truly self-contained. Eliminating invisible dependencies felt like a virtuous task, even without considering the benefits to the header unit experiment! Thousands of #include additions later, we were ready to test performance again... and it was barely any better. The issue is that precompiled headers and IFC are fundamentally different technologies. Requiring the compiler to reconcile data from both sources is incredibly wasteful. The headers being compiled into header units were free of binary PCH dependencies but a majority of cpp files were still using both technologies.

At this point it's worth noting that we measured a significant build throughput improvement in projects that consumed header units but did not utilize a precompiled header. Individual compilands that switched from traditional PCH consumption to PCH-as-header-unit import saw build time improvements well above 50%. This is one of the key benefits that modules promised! The idea to create a header unit out of the existing PCH was the flash of insight we needed to keep the experiment moving forward!

Although the compiler allows mixing PCH and header units, it's much more efficient to "pick a lane" in your build system.
By not presenting duplicate information to the compiler, it's easier to create build throughput wins.

Selecting a Launch Project: Microsoft Word

Although we were able to scale the experiment up and generate over 5000 header units from our liblet public headers, some low-level blockers needed to be cleared before we could utilize header units in our production build environment. We looked for a sizable project that could avoid the inconsistent conditional compilation issues that needed more time to clean up. Luckily, we found a great candidate in Microsoft Word.

Word has utilized MSVC's C++ Build Insights to craft optimal precompiled headers. Specifically, the techniques presented in "Faster builds with PCH suggestions from C++ Build Insights" (C++ Team Blog) were used to measure the performance benefit for each individual file included in their main PCH.

Word was to be our first test converting existing precompiled headers directly to header units. At a high level the steps required were:

1. Create a header unit instead of a PCH: switch /Yc to /exportHeader.
2. Replace the use-PCH flag with a standard header unit reference: /Yu to /headerUnit:quote word_shared.h=path/to/word_shared.ifc
3. Profit?

The code changes required in Word after adjusting the build flags were similar to what was described above. Constants needed to be made inline constexpr, missing includes or forward declarations added, and a handful of function definitions moved out-of-line. In total only 2 dozen C++ files in Word required code changes to compile successfully after the switch! The most unexpected of these changes was standardizing on quotes instead of angle brackets when writing the PCH's #include, for the sake of consistency with the /headerUnit switch.

Along with the conversion from Word's PCH to header units we created a header unit from the standard library.
In the C++23 world, C++ projects can utilize the standard library module that ships alongside the compiler via import std; or import std.compat;. Unfortunately, Office makes edits to the standard library and thus must create its own header unit or named module.

With this set of changes, we proved it possible to compile, link and launch Microsoft Word with header units!

[Image: Microsoft Word running after being built using header units]

Looking Ahead: Throughput

The next step was to demonstrate the advantages that header units would bring to the Word engineering team. Fortunately, we were able to show a build performance improvement great enough that the team agreed to adopt header units into the Office production build system alongside MSVC 17.6.6! In our next installment we'll go over our performance findings in depth.

Closing

As always, we welcome your feedback. Feel free to send any comments through e-mail at visualcpp@microsoft.com or through Twitter @visualc. Also, feel free to follow Cameron DaCamara on Twitter @starfreakclone.

If you encounter other problems with MSVC in VS 2019/2022 please let us know via the Report a Problem option, either from the installer or the Visual Studio IDE itself. For suggestions or bug reports, let us know through DevComm.

Cameron DaCamara, Senior Software Engineer, Visual C++
Zachary Henkel, Principal Software Engineer, Microsoft Office

3 comments

* Dwayne Robinson, September 11, 2023 11:21 pm
  This is great for such a large codebase to shine even more light on dark corners and bring the feature to maturity. Keep it up. Looking forward to 3/n...

* Paulo Pinto, September 12, 2023 6:29 am
  Given the ongoing discussions in regards to supporting header units in other compilers and build tools, I am quite curious how the Office team will address having header units on the cross-platform libraries used by Office on other platforms.

* qbprog, September 12, 2023 12:21 pm
  what is the cmake status about headerUnits at this point?