[HN Gopher] Include-what-you-use: A tool to analyze includes in ...
       ___________________________________________________________________
        
       Include-what-you-use: A tool to analyze includes in C and C++
       source files
        
       Author : st_goliath
       Score  : 105 points
       Date   : 2021-04-20 14:43 UTC (8 hours ago)
        
 (HTM) web link (include-what-you-use.org)
 (TXT) w3m dump (include-what-you-use.org)
        
       | qbonnard wrote:
       | I haven't used C++ in a while (sadly), but in my days I had
       | stumbled upon deheader[0], which I don't see mentioned here. From
       | what I remember, it was very simple and easy to use, and yielded
       | useful results.
       | 
       | [0] http://www.catb.org/~esr/deheader/deheader.html
        
       | eliora wrote:
       | i want to be a hacker
        
         | MaxBarraclough wrote:
         | Welcome to HN.
         | 
         | Please check out the Guidelines and the FAQ, [0][1] regarding
         | how best to participate. HN is friendly to curiosity, but not
         | to off-topic comments, which is why you've been downvoted. If
         | you'd like to discuss how to become a programmer, either find a
         | thread where that's being discussed, or submit an _Ask HN_
         | thread, following the style of this thread [2].
         | 
         | [0] https://news.ycombinator.com/newsguidelines.html
         | 
         | [1] https://news.ycombinator.com/newsfaq.html
         | 
         | [2] https://news.ycombinator.com/item?id=24810399
        
       | anarazel wrote:
       | I found IWYU pretty annoying, due to its tendency to also include
       | transitive includes. Some of those often turn out to be
       | implementation details and are much more likely to be
       | added/removed. But maybe the projects using it that I worked on
       | were using it wrong?
        
         | anand-bala wrote:
         | If the issue is with including transitive dependencies that are
         | in your own codebase, then you should annotate the public
         | interface header that wraps the implementation details with IWYU
         | Pragmas [1] that export the implementation headers (for example
         | [2]).
         | 
         | If this is in third-party libraries, you can use IWYU Mappings
         | [3] to map the "private" headers (usually the transitive
         | include) to the public interface. An example that I use for the
         | PEGTL library [4].
         | 
         | [1]: https://github.com/include-what-you-use/include-what-you-
         | use...
         | 
         | [2]: https://github.com/anand-bala/signal-temporal-
         | logic/blob/800...
         | 
         | [3]: https://github.com/include-what-you-use/include-what-you-
         | use...
         | 
         | [4]: https://github.com/anand-bala/signal-temporal-
         | logic/blob/800...
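A minimal sketch of the pragma pairing described above, collapsed into one file so that it compiles on its own. The file names, the `mylib` namespace, and both functions are hypothetical; only the two pragma spellings (`IWYU pragma: private, include "..."` and `IWYU pragma: export`) are taken from IWYU's pragma documentation.

```cpp
#include <cstddef>
#include <string>

// ---- would live in mylib/detail/token_impl.h (hypothetical) ----
// Marks this header as private and names the public header that IWYU
// should suggest to users instead:
// IWYU pragma: private, include "mylib/token.h"
namespace mylib {
namespace detail {

// Counts whitespace-separated tokens in `text`.
inline std::size_t count_tokens(const std::string& text) {
    std::size_t count = 0;
    bool in_token = false;
    for (char c : text) {
        const bool is_space = (c == ' ' || c == '\t' || c == '\n');
        if (!is_space && !in_token) ++count;
        in_token = !is_space;
    }
    return count;
}

}  // namespace detail
}  // namespace mylib

// ---- would live in mylib/token.h (hypothetical public header) ----
// The real header would pull in the detail header and re-export it:
//   #include "mylib/detail/token_impl.h"  // IWYU pragma: export
namespace mylib {

// Public entry point; callers only ever include "mylib/token.h".
inline std::size_t token_count(const std::string& text) {
    return detail::count_tokens(text);
}

}  // namespace mylib
```

With this layout, a caller that uses `mylib::token_count` should be told to include the public header rather than the detail header, which is the behaviour the comment above is after.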
        
         | johnnyapol wrote:
         | I think it definitely can be a project thing. My experience
         | with IWYU has been on very large codebases and I considered its
         | ability to find transitive includes a blessing. The specific
         | case where it shined for me was it made it much easier to
         | identify the true impact of fileset changes on the larger
         | codebase when it came to refactoring.
        
       | Blikkentrekker wrote:
       | I rather prefer _OCaml_'s way of doing things, which often
       | releases one entirely from having to write module inclusion
       | directives, since in general bindings are qualified by their module
       | name, which becomes part of their namespace. So one would use
       | `Array.map` in code, which is the `map` binding exported by the
       | `Array` module, and the `Array` module is then included
       | automatically, of course. This would be `array_map` in many
       | languages to avoid conflicts, but modules in OCaml deliberately
       | export short names on the expectation that bindings will be
       | namespace qualified with their module name.
       | 
       | It is possible to explicitly open this module, so that one can
       | use `map` instead, but that's generally not wise.
       | 
       | I find having to write a long list of include directives at the
       | top of a file quite annoying, and this also does not betray in
       | what module exactly bindings are defined that one might encounter
       | in the code below them. If I encounter, say, `Net.Tcp.open` in
       | _OCaml_ code, I know that this function is defined in
       | `./net/tcp.ml`.
        
       | vbernat wrote:
       | Is that robust? Depending on the system, libc, compiler, some
       | includes may be unused while others may be needed.
        
         | quantumofalpha wrote:
         | IWYU was originally a Google project. It has worked for them for
         | more than a decade on their ginormous monorepo.
         | 
         | Sometimes it gets some things wrong, so you have these escape
         | hatches to control it: https://github.com/include-what-you-
         | use/include-what-you-use...
        
           | cperciva wrote:
           | Even with a monorepo this isn't necessarily safe -- if you
           | have a mix of x86 and arm servers, you'll need different
           | headers included for intrinsics for example.
        
             | quantumofalpha wrote:
             | Conditionally-off blocks of code under #ifdefs are
             | challenging for it, yes - it runs a proper C++ compiler on
             | your code and won't get to see code in those blocks without
             | the right defines.
             | 
             | Don't blindly apply its suggestions - test them, skim to
             | see what it got wrong, sprinkle some "// IWYU pragma: keep"
             | to help it out in corner cases. The tool is more like a
             | linter, you don't follow everything that your linter tells
             | you to, no?
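A small sketch of the corner case described above, assuming a hypothetical `MYAPP_TRACE` macro and a made-up `traced_add` function. The header is only used inside a conditional block, so in a build where the macro is off IWYU sees no use of it and may suggest dropping the include; the `keep` pragma asks it to leave the line alone.

```cpp
// <cstdio> is only needed when the hypothetical MYAPP_TRACE macro is defined.
// Without the pragma, a non-trace analysis run could flag it as unused.
#include <cstdio>  // IWYU pragma: keep

// Adds two ints, logging the call to stderr in trace builds only.
int traced_add(int a, int b) {
#ifdef MYAPP_TRACE
    std::fprintf(stderr, "traced_add(%d, %d)\n", a, b);
#endif
    return a + b;
}
```

The same pattern would apply to headers needed only on one architecture: the pragma goes on the include line, and the suggestion still needs to be checked against every configuration you build.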
        
       | ur-whale wrote:
       | What I've always wanted is to write C++ code and have the minimal
       | set of necessary includes needed to compile my code automatically
       | added [edit: I should have said "managed"] by the IDE.
       | 
       | How close can this tool get to that goal?
        
         | burntoutfire wrote:
         | I've just Googled the same question. The answers seem to
         | glorify the suffering of writing C++ and suggest that the
         | inquirer would perhaps be better off with switching to Java...
         | Sounds like a case of Stockholm syndrome to me.
         | 
         | Anyway, I'm a beginner in C/C++ world and the most convincing
         | solution I've found to use in my personal project is the Single
         | Compilation Unit approach
         | (https://en.wikipedia.org/wiki/Single_Compilation_Unit). It is
         | exemplified in the Handmade Hero github repository (which I'm
         | afraid is available for paying users only). Essentially, the
         | whole program is divided into modules, each within its own
         | single cpp file. The modules are then all included in the SCU,
         | which is the only file passed to the compiler. There can be no
         | circular dependencies between modules (as then, there would be
         | no order of including them in SCU which would work). In HH's
         | case, there seems to be an absolutely minimal number and volume
         | of headers and they only define data structures, never declare
         | functions.
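A rough sketch of what such a single compilation unit looks like once the includes are expanded, written as one file so it compiles as-is. The file names in the comments and the `Vec2`, `add`, and `step` names are hypothetical and not taken from Handmade Hero.

```cpp
// unity.cpp: the only file handed to the compiler in this sketch.
// Each section stands in for a file that the real unity.cpp would #include.

// #include "types.h"      // headers only define data structures
struct Vec2 {
    float x;
    float y;
};

// #include "math.cpp"     // module 1: depends only on types.h
Vec2 add(Vec2 a, Vec2 b) {
    return Vec2{a.x + b.x, a.y + b.y};
}

// #include "physics.cpp"  // module 2: uses math.cpp, so it must come later
Vec2 step(Vec2 position, Vec2 velocity) {
    return add(position, velocity);
}
```

Because `step` is defined after `add`, no function declarations are needed; a circular dependency between the two modules would leave no include order that compiles.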
        
         | aflag wrote:
         | CLion is capable of adding missing includes; I'm not sure if it
         | tells you about unused ones. They have a free trial, so it may
         | be worth a try.
        
       | inetknght wrote:
       | My experience with IWYU has been mixed. In general it's a
       | success. But it had trouble identifying that some headers were
       | only conditionally needed (eg, debug build or macro conditional).
       | Those cases are easy to work with if you own the code but can be
       | annoying if it's in a third party lib.
       | 
       | That said, I do highly recommend its use.
        
         | anand-bala wrote:
         | I've found that using IWYU Pragmas [1] for codebases you own
         | and IWYU Mappings [2] for third-party libraries __almost__
         | entirely eliminates weird IWYU suggestions (there are a few
         | annoyingly stupid suggestions from the tool I just ignore).
         | 
         | I've also recently been making libraries I write compatible
         | with users that run IWYU by annotating all public headers with
         | IWYU pragma comments that export symbols/transitive includes
         | correctly, etc.
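A sketch of what annotating a public header for IWYU users can look like, assuming a hypothetical header `mylib/words.h` with a made-up `split_words` function. The `begin_exports` / `end_exports` pair is an IWYU pragma; as I understand it, it marks the enclosed includes as part of the header's interface, so IWYU should not ask users of this header to include them again.

```cpp
// Hypothetical public header: mylib/words.h

// IWYU pragma: begin_exports
#include <string>
#include <vector>
// IWYU pragma: end_exports

namespace mylib {

// Splits `text` on single spaces, skipping empty pieces.
inline std::vector<std::string> split_words(const std::string& text) {
    std::vector<std::string> words;
    std::string current;
    for (char c : text) {
        if (c == ' ') {
            if (!current.empty()) words.push_back(current);
            current.clear();
        } else {
            current.push_back(c);
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}

}  // namespace mylib
```

Whether to export standard headers this way is a design choice: it keeps user code shorter, at the cost of making those includes part of the library's public contract.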
         | 
         | [1]: https://github.com/include-what-you-use/include-what-you-
         | use...
         | 
         | [2]: https://github.com/include-what-you-use/include-what-you-
         | use...
        
       | marcodiego wrote:
       | IWYU is responsible for many lines of code that have been removed
       | from libreoffice:
       | https://cgit.freedesktop.org/libreoffice/core/log/?qt=grep&q...
        
       | dasloop wrote:
       | C++ 20 Modules will save us (eventually)
        
         | Kranar wrote:
         | How so? It will switch the problem from include what you use to
         | import what you use.
         | 
         | Other languages with modules have a similar issue. Go is the
         | only language I know of that makes it a hard compiler error to
         | import an unused module.
        
           | ot wrote:
           | Modules won't let you rely on transitive includes, which is
           | one half of the problem. It won't solve the other half
           | (importing too much).
        
         | wyldfire wrote:
         | Kinda. I think a primary use case for modules is to help with
         | out-of-control compile times.
         | 
         | But the specific problem of include-what-you-use will still be
         | encountered if you include directly from C libraries like
         | system headers or library dependencies.
        
           | Kranar wrote:
           | Unfortunately modules do not have a significant impact on
           | compile times and in some cases can increase compile times
           | due to inhibiting parallelism.
        
             | pjmlp wrote:
             | VC++ already does multithreaded code generation across
             | multiple compiler phases, using modules won't change that.
        
               | Kranar wrote:
               | No it doesn't, cl.exe's compiler is an inherently single
               | threaded application. Parallelism in VC++ is achieved by
               | running multiple copies of cl.exe with one serving as the
               | primary instance and the rest as followers. The primary
               | instance forwards individual translation units to the
               | followers and waits for the followers to complete
               | compilation, then at the end the primary instance
               | terminates and the linker is invoked.
        
               | pjmlp wrote:
               | Not up to date?
               | 
               | https://docs.microsoft.com/en-
               | us/cpp/build/reference/cgthrea...
        
               | Kranar wrote:
               | That is a linker option, not a compiler option. Modules
               | have no effect on linking one way or another as linking
               | is fairly independent of the compilation process.
        
               | pjmlp wrote:
               | I mentioned code generation, you don't execute .obj
               | files.
        
               | Kranar wrote:
               | Then your comment is off-topic and creates confusion. My
               | point was modules inhibit the parallelism of the
               | compilation process, compile times, not that it has any
               | effect on the link times.
               | 
               | Modules do not have any effect on the linker one way or
               | another. They are independent of it.
        
               | pjmlp wrote:
               | Modules will bring compiler and linker work closer together,
               | just like in other languages not tainted by the UNIX toolchain
               | model.
               | 
               | Some C++ developers can keep using their pre-historic
               | UNIX like tooling, whereas others will embrace the fusion
               | of compiler, linker and build system.
        
             | gsliepen wrote:
             | This is not true. Compile times are usually much better
             | with modules. They also don't inhibit parallelism, but
             | perhaps you are referring to this paper
             | (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p144...),
             | which shows that, with compiler versions from 2019, it can indeed
             | be slower to compile with modules if you have a large
             | number of threads and the depth of the module dependency
             | graph is large.
        
               | Kranar wrote:
               | Yes that's the correct benchmark.
               | 
               | Do you have evidence that the situation has changed? Last
               | I checked it still remains the case that modules inhibit
               | parallelism and hence result in slower builds in most
               | practical workloads. But of course if you have evidence
               | to the contrary I'd be happy to see it.
        
               | gsliepen wrote:
               | I don't know of any newer benchmarks. However, I'm
               | reading the results differently I guess, because the
               | results show that with 128 threads, modules become slower
               | only when the DAG depth is higher than 29, and that's
               | quite a large depth! It also looks like each source file
               | used in the benchmark only imports other modules and
               | declares 300 variables, but nothing else. Practical
               | workloads will have more interesting stuff in the source
               | files, so I would expect the impact of module loading to
               | be less, so more can be done in parallel.
        
               | account4mypc wrote:
               | > with 128 threads, modules become slower only when the
               | DAG depth is higher than 29
               | 
               | yeah, but the same graph _never_ shows modules being
               | faster... it only ever shows them being the same or
               | slower. If I'm going to put in all that work, the result
               | should be *faster*
        
               | volta83 wrote:
               | > This is not true. Compile times are usually much better
               | with modules.
               | 
               | What significantly improves compile-times is Pre-Compiled
               | Headers (PCH), which most compilers have supported for
               | decades.
               | 
               | The study you mention does not show data for them.
               | 
               | Having ported one >1 million LOC C++ app to use modules
               | in two compilers, the compile time improvement of modules
               | over PCH was not distinguishable from noise.
               | 
               | Modules have many advantages, like better encapsulation,
               | etc.
               | 
               | The main thing people want from them seems to be better
               | compile times, which is the one thing they don't deliver,
               | at least over the PCH solutions that have existed for
               | decades, are already supported by all build systems, etc.
               | 
               | Compared to modules, PCHs are "zero-effort" and deliver
               | performance instantaneously.
        
               | colomon wrote:
               | Off-topic, but is there a guide to best practices for
               | portable pre-compiled headers out there somewhere? I'm
               | under considerable pressure to add pre-compiled headers
               | for Windows to my code, and it won't have any significant
               | benefit for me unless I can also make it work on MacOS
               | and Linux. So far my Googling has turned up little
               | information for any platform other than Windows, and
               | nothing that would suggest how to do it well for all
               | three platforms. (Well, more to the point, Visual C++,
               | clang, and g++.)
        
               | volta83 wrote:
               | Does your project use CMake ?
               | 
               | CMake supports these with all major compilers...
               | 
               | I'd just google "<your build system> pre-compiled
               | headers" and see if there is a flag or option that you
               | can enable.
               | 
               | You will definitely need quite a bit of fine tuning for
               | apps over 500k LOC or so, but if your app is under that,
               | and you are splitting code between .h and .cpp files
               | appropriately, just flipping a flag might get you 80%
               | there.
               | 
               | The speedups you see people get from PCHs are around 20-30%
               | faster compile times. So they are more a "nice to have"
               | feature than something that will solve your compile-time
               | problems.
               | 
               | If your app is structured in such a way that it takes 20
               | min to compile, this can cut it to 15 min at most, but
               | that would probably still suck. If you want more, then
               | you'd need to consider other solutions like distributed
               | build caches (sccache, etc.).
        
               | jcelerier wrote:
               | With CMake it's just target_precompile_headers:
               | https://cmake.org/cmake/help/latest/command/target_precompil...
        
             | dasloop wrote:
             | My understanding is just the opposite, they will decrease
             | compilation times as "included files" are processed just
             | once. We can see them as a better version of precompiled
             | headers (although they are more than that).
        
               | Kranar wrote:
               | Yes except that includes are usually not the performance
               | bottleneck, it's the semantic analysis that consumes the
               | bulk of the compile times.
               | 
               | Modules inhibit parallelism because modules are ordered
               | along a DAG and must be compiled from the root of the DAG
               | down to the leaves in order. So consider a traditional
               | setup as follows:
               | 
               | A.cpp <- A.h <- B.h <- C.h <- D.h
               | 
               | B.cpp <- B.h <- C.h <- D.h
               | 
               | C.cpp <- C.h <- D.h
               | 
               | D.cpp <- D.h
               | 
               | All four of those cpp files can be built in parallel,
               | even though you're right that all of the header files are
               | being reparsed multiple times. My claim is that parsing
               | header files is incredibly cheap, it's translating the
               | .cpp files that's expensive because cpp files are where
               | the bulk of the semantic analysis and type checking is
               | performed.
               | 
               | With modules, the same compilation model looks like this:
               | 
               | A.mxx <- B.mxx <- C.mxx <- D.mxx
               | 
               | There's no longer header/source and there's no longer
               | redundancy, but I can't build this in parallel anymore. I
               | have to first build D.mxx, then C.mxx, then B.mxx then
               | A.mxx in serial.
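The argument above can be sketched as a tiny model: with unlimited parallel workers, the number of build steps that must happen one after another is the length of the longest dependency chain. The `DepGraph` type and all function names here are hypothetical, and the model ignores per-file cost, so it only illustrates the ordering constraint, not real build times.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Maps each build unit to the units that must be compiled before it.
using DepGraph = std::map<std::string, std::vector<std::string>>;

// Depth of `unit`: 1 + the deepest prerequisite. Memoized; assumes no cycles.
std::size_t depth_of(const DepGraph& graph, const std::string& unit,
                     std::map<std::string, std::size_t>& memo) {
    auto cached = memo.find(unit);
    if (cached != memo.end()) return cached->second;
    std::size_t deepest = 0;
    auto it = graph.find(unit);
    if (it != graph.end()) {
        for (const std::string& dep : it->second) {
            deepest = std::max(deepest, depth_of(graph, dep, memo));
        }
    }
    memo[unit] = deepest + 1;
    return deepest + 1;
}

// Minimum number of sequential steps given unlimited parallel workers.
std::size_t sequential_steps(const DepGraph& graph) {
    std::map<std::string, std::size_t> memo;
    std::size_t steps = 0;
    for (const auto& entry : graph) {
        steps = std::max(steps, depth_of(graph, entry.first, memo));
    }
    return steps;
}

// Header model: headers are textually included, so no .cpp waits on another.
DepGraph header_model() {
    return {{"A.cpp", {}}, {"B.cpp", {}}, {"C.cpp", {}}, {"D.cpp", {}}};
}

// Module model: each module needs the compiled form of the one it imports.
DepGraph module_model() {
    return {{"A.mxx", {"B.mxx"}},
            {"B.mxx", {"C.mxx"}},
            {"C.mxx", {"D.mxx"}},
            {"D.mxx", {}}};
}
```

In this model `sequential_steps(header_model())` is 1 and `sequential_steps(module_model())` is 4. It deliberately leaves out the cost of re-parsing headers in each .cpp file, which is the other half of the trade-off being debated.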
        
               | dbaupp wrote:
               | Parsing a single header file in isolation is cheap, but
               | each header will include others, and templates mean many
               | headers contain large amounts of code inline. For
               | instance, just including <vector> results in the compiler
               | having to look at almost 30kloc, on my system:
               | 
               |     $ clang -x c++ -E - <<<"#include <vector>" | wc -l
               |     27378
               | 
               | Other headers are similar:
               | 
               |     header     lines
               |     algorithm  23103
               |     array      23450
               |     memory     15909
               |     random     52107
               |     thread     31424
               |     tuple       9240
               | 
               | (Of course, a bunch of this code is shared, e.g.
               | including both thread and vector is "only" 35713 loc
               | total, not 60kloc.)
               | 
               | I believe C++ compilers have SIMD-accelerated
               | lexers/parsers because of the sheer explosion of code due
               | to headers and templates.
        
       | bradford wrote:
       | semi related, but I'm coming back to C++ after a long hiatus (15
       | years). I realize this is probably a newb question...
       | 
       | The code base I'm working in is very large and I have a recurring
       | problem where I see a term (class/variable/etc) being used in a
       | cpp file, and want to know which header file contains the
       | definition.
       | 
       | What's the quickest, easiest way to do this?
       | 
       | I've been using grep, but the size of the code base, combined
       | with the large number of #includes in each cpp file, makes this
       | inefficient.
       | 
       | I believe I can use ctags/vim, but I last used that circa 2000
       | and I'm curious to know what other static analysis solutions have
       | cropped up since then.
       | 
       | Does IWYU address this scenario? I'm using clang as a compiler if
       | that's at all relevant.
        
         | blcArmadillo wrote:
         | There are lots of options:
         | 
         | - Yes, ctags/vim would work
         | 
         | - You could use something like vscode
         | 
         | - Consider checking out cscope. With cscope you can also build
         | a reverse index which lets you find where things are called. It
         | can be used with something like vim but also has a pretty nice
         | TUI.
        
         | drummer wrote:
         | If you use Visual Studio, it is as easy as right clicking on
         | the typename or variable and choosing to go to the declaration
         | or definition.
        
         | inetknght wrote:
         | > _The code base I'm working in is very large and I have a
         | recurring problem where I see a term (class/variable/etc) being
         | used in a cpp file, and want to know which header file contains
         | the definition._
         | 
         | A good IDE will have a feature to let you locate the
         | declaration and/or definition of any variable or type.
         | 
         | I've found that a lot of IDEs have that feature completely
         | broken. Qt Creator, for example, is easily confused and comes
         | with all kinds of Qt garbage^H^H^H^H^H^H^H^H baggage. CLion is
         | a resource hog and often just hangs. Visual Studio is usually
         | pretty good -- assuming you're using Windows. VS _Code_ is
         | "okay" but I've found it's more of a headache to set up. I
         | don't have experience with XCode since I've never used OSX for
         | development.
         | 
         | I've found the most reliable way is to learn how to use `grep`
         | and pair that with understanding _where_ to search; the project
         | source directory of course but also system headers and any
         | libraries installed to non-system locations. That knowledge
         | translates to usefulness in other workflows too.
        
         | anand-bala wrote:
         | In most cases, what you are looking for is a language server
         | like `clangd` (works for most compilers) [1].
         | 
         | You can find a Language Server Protocol implementation for your
         | editor at [2] (I don't think it lists __all__ clients, but it
         | should include the most popular ones).
         | 
         | EDIT: I realized that this is a vague answer, so let me
         | clarify.
         | 
         | An LSP implementation (especially clangd) provides actions like
         | `go-to definition` or `find references` that you would find in
         | full-featured IDEs like CLion (which is also amazing BTW).
         | Since you mentioned vim, I am guessing you use it and don't
         | necessarily want to let go of the hand-crafted vimrc you have
         | created. Adding an LSP plugin to Vim is incredibly easy and
         | gives you these "IDE" features with customizable mappings.
         | 
         | [1]: https://clangd.llvm.org/
         | 
         | [2]: https://langserver.org/#implementations-client
        
           | bradford wrote:
           | Thanks! I read about using LSP/Clangd with vim via
           | [coc](https://github.com/clangd/coc-clangd) and I think
           | that's the path I'll try going down.
           | 
           | Other responses, thanks for your input. Just want to clarify
           | that I have tried VS and VSCode with limited success
           | (sometimes search works, sometimes it doesn't, and my biggest
           | gripe is an occasional lack of transparency into what's going
           | on under the cover). I think any solution is going to require
           | some investment on my part and LSP sounds like a good
           | investment.
        
         | f00zz wrote:
         | I use ctags+vim every day with rather large C++ codebases (but
         | then again I'm a dinosaur).
        
           | bradford wrote:
           | Curious, I was under the impression that ctags offers a 'jump
           | to definition' functionality, but little more. (i.e., 'find
           | all references' isn't supported).
           | 
           | Is that correct? Do you use it for functionality beyond the
           | 'jump to definition/jump back to previous context'?
        
       | MauranKilom wrote:
       | Take note:
       | 
       | > CAVEAT
       | 
       | > This is alpha quality software -- at best (as of July 2018). It
       | was originally written to work specifically in the Google source
       | tree, and may make assumptions, or have gaps, that are
       | immediately and embarrassingly evident in other types of code.
       | 
       | > While we work to get IWYU quality up, we will be stinting new
       | features, and will prioritize reported bugs along with the many
       | existing, known bugs. The best chance of getting a problem fixed
       | is to submit a patch that fixes it (along with a test case that
       | verifies the fix)!
       | 
       | https://github.com/include-what-you-use/include-what-you-use...
       | 
       | Further useful docs:
       | 
       | Why Include What You Use? https://github.com/include-what-you-
       | use/include-what-you-use...
       | 
       | What Is A Use? https://github.com/include-what-you-use/include-
       | what-you-use...
       | 
       | Why Include What You Use Is Difficult https://github.com/include-
       | what-you-use/include-what-you-use...
        
       | dang wrote:
       | One past related discussion:
       | 
       |  _Include-what-you-use: Clang tool to analyze includes in C and
       | C++ source files_ - https://news.ycombinator.com/item?id=10958186
       | - Jan 2016 (40 comments)
        
       ___________________________________________________________________
       (page generated 2021-04-20 23:01 UTC)