[HN Gopher] The future of Clang-based tooling
       ___________________________________________________________________
        
       The future of Clang-based tooling
        
       Author : ingve
       Score  : 99 points
       Date   : 2023-07-28 12:11 UTC (1 day ago)
        
 (HTM) web link (blog.trailofbits.com)
 (TXT) w3m dump (blog.trailofbits.com)
        
       | traxys wrote:
       | I got bitten many times by the fact that PATH is not taken into
       | account, because I use Nix to manage my dotfiles, including
       | `clangd`, but when developing libraries that target the base
       | distro (not Nix) clangd sometimes gets confused and does not
       | take into account the headers in /usr/include, only the Nix
       | headers....
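
One way to see which header directories a given compiler driver
searches is to run it with `-v` while preprocessing (for example
`clang++ -E -x c++ - -v < /dev/null`) and read the search list it
prints. The sketch below only parses that text; the sample output and
the helper names are made up for illustration, and the Nix paths in it
are placeholders. clangd's `--query-driver` flag is the usual way to
let clangd run a driver and pick up its system includes.

```python
from typing import List

# Illustrative output only; real paths will differ on every machine.
SAMPLE = (
    '#include "..." search starts here:\n'
    "#include <...> search starts here:\n"
    " /nix/store/example-clang/include\n"
    " /nix/store/example-libc/include\n"
    "End of search list.\n"
)


def parse_search_dirs(verbose_output: str) -> List[str]:
    """Collect the directories between the 'search starts here' and
    'End of search list.' markers of a GCC/Clang `-v` run."""
    dirs: List[str] = []
    in_list = False
    for line in verbose_output.splitlines():
        if line.startswith("#include") and line.rstrip().endswith(
            "search starts here:"
        ):
            in_list = True
            continue
        if line.startswith("End of search list."):
            break
        if in_list and line.startswith(" "):
            # Drop trailing annotations such as " (framework directory)".
            dirs.append(line.strip().split(" (")[0])
    return dirs


def missing_dirs(search_dirs: List[str], expected: List[str]) -> List[str]:
    """Return the expected directories the driver does not search."""
    return [d for d in expected if d not in search_dirs]
```

With the sample above, `missing_dirs(parse_search_dirs(SAMPLE),
["/usr/include"])` reports that `/usr/include` is not searched, which
is the symptom described.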
        
       | Ericson2314 wrote:
       | As one of the authors of
       | https://hsyl20.fr/home/files/papers/2022-ghc-modularity.pdf
       | (discussed at https://news.ycombinator.com/item?id=31250141)
       | this rings really true.
       | 
       | If all we do is compile end to end, it is really easy for the
       | pipeline stages to "rot together" such that we get these lies the
       | blog post author points out. We must be able to start and
       | resume compilation from any point, with arbitrary programs in the
       | intermediate representation to ensure modularity doesn't regress.
       | Really glad IDE and now security use-cases are finally hammering
       | these basic software arch principles to compiler writers!
       | 
       | I recall an earlier thread,
       | https://discourse.llvm.org/t/rfc-an-mlir-based-clang-ir-cir/...,
       | where someone else was interested in
       | the same thing. And it seems Vast (prior to open sourcing, and
       | the reveal of that name) was mentioned by the blog post author in
       | the thread. Very much hoping there is thus enough interest to get
       | this upstreamed.
       | 
       | Best of luck to everyone involved!
        
       | HybridCurve wrote:
       | Having worked with clang (and gcc) quite a bit, there are a
       | number of good points the author makes. There are a lot of cool
       | things llvm/clang has, but it feels like a lot of the tooling
       | does not mesh together as well as it should and some things lack
       | refinement.
       | 
       | My biggest gripe overall (since it could be fixed easily) is the
       | compile_commands.json. It's used by a number of tools and is
       | generally awkward, cumbersome, and has a handful of shortcomings.
       | To fix these issues, I used the intercept-build system provided
       | with LLVM to generate a more succinct build file in JSON format
       | that abstracts certain options (like paths) and groups options
       | commonly found together. The reason for this is that sometimes
       | you might be generating llvm bitcode, building clang AST, running
       | clang-analyze, or translating the build options to work with
       | either compiling or linking with GCC or Clang. For many of these
       | it helps to be able to alter options easily, which you cannot do
       | with the compile_commands.json file alone.
       | 
       | There are a number of areas like this where clang would benefit
       | greatly, without demanding an enormous amount of effort.
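
A minimal sketch of the kind of regrouping described above, assuming
the standard compilation database fields (`directory`, `file`, and
either `command` or `arguments`). The summary layout and the function
names are hypothetical, not the commenter's actual format, and only
`-I`, `-D`, `-c` and `-o` are treated specially.

```python
import shlex
from typing import Dict, List


def split_args(entry: dict) -> List[str]:
    """A database entry carries either an "arguments" list or a single
    shell-quoted "command" string."""
    if "arguments" in entry:
        return list(entry["arguments"])
    return shlex.split(entry["command"])


def summarize(entry: dict) -> Dict[str, object]:
    """Group one entry's flags into includes, defines and the rest."""
    args = split_args(entry)[1:]  # drop the compiler executable
    out: Dict[str, object] = {
        "file": entry["file"], "includes": [], "defines": [], "other": [],
    }
    i = 0
    while i < len(args):
        arg = args[i]
        if arg in ("-I", "-D") and i + 1 < len(args):
            # Separated form: "-I dir" or "-D NAME".
            out["includes" if arg == "-I" else "defines"].append(args[i + 1])
            i += 2
            continue
        if arg == "-o":
            i += 2  # skip the output path; it is re-derived per task
            continue
        if arg.startswith("-I"):
            out["includes"].append(arg[2:])
        elif arg.startswith("-D"):
            out["defines"].append(arg[2:])
        elif arg == "-c" or arg == entry["file"]:
            pass  # the action and the input are re-added by to_command
        else:
            out["other"].append(arg)
        i += 1
    return out


def to_command(summary: dict, compiler: str, extra: List[str]) -> List[str]:
    """Rebuild an argument list for a different task or compiler."""
    return (
        [compiler]
        + ["-I" + p for p in summary["includes"]]
        + ["-D" + d for d in summary["defines"]]
        + list(summary["other"])
        + list(extra)
        + [summary["file"]]
    )
```

For example, `to_command(summarize(entry), "clang", ["-emit-llvm",
"-c"])` would produce a bitcode-emitting variant of the same compile
step without re-parsing the original command line.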
        
       | CoastalCoder wrote:
       | I'm really impressed with the quality of the writing. It's
       | succinct, informative, and engaging.
       | 
       | The "engaging" part might be subjective, because I've recently
       | taken a renewed interest in LLVM internals. But regardless, good
       | writing.
       | 
       | P.S. The article gives a shout-out to CodeBrowser [0]. It wasn't
       | immediately clear from the homepage, but CodeBrowser _is_ open-
       | source: [1].
       | 
       | [0] https://codebrowser.dev/
       | 
       | [1] https://github.com/KDAB/codebrowser
        
       | seeknotfind wrote:
       | Good read.
       | 
       | > When Clang is using itself incorrectly, it makes sense to
       | trigger an assertion and abort execution--it's probably a sign of
       | a bug.
       | 
       | This statement may be ambiguous. It sounds like libraries
       | shouldn't ordinarily abort on bad usages, and it's true this is a
       | nuanced subject, but you really do want to abort as a default.
       | What is problematic is introducing an abort in a code path that
       | previously worked. You have to take two steps: tracking (or
       | providing a mechanism for tracking) when it happens, then aborting
       | once you are sure it won't cause a problem.
       | 
       | This of course doesn't apply to all ecosystems (JS for instance,
       | due in part to diversity of environment), but this perspective is
       | not limited to the internal behavior of clang, rather it applies
       | largely to low level, important, potentially-system software.
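
The two-step rollout described above could be sketched as follows.
This is an illustration under assumptions: `StagedCheck` and its
method names are hypothetical, and raising `AssertionError` stands in
for what would be an `abort()` in C or C++.

```python
from collections import Counter


class StagedCheck:
    """A check that is first only tracked, then later enforced."""

    def __init__(self, name: str, enforce: bool = False) -> None:
        self.name = name
        self.enforce = enforce
        self.violations: Counter = Counter()  # call site -> hit count

    def require(self, condition: bool, site: str) -> bool:
        """Step 1 (enforce=False): record the violation and carry on.
        Step 2 (enforce=True): fail hard on any violation."""
        if condition:
            return True
        self.violations[site] += 1
        if self.enforce:
            raise AssertionError(f"{self.name} violated at {site}")
        return False
```

A check would ship with `enforce=False`, the `violations` counter
would be watched until it stays empty, and only then would `enforce`
be switched on.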
        
         | otherjason wrote:
         | Aborting (as in calling abort(3)) inside a library is very
         | problematic if I'm writing an application that uses it. It
         | takes away the ability of the larger application to detect and
         | handle the error, simply terminating the entire process.
         | Especially in a C++ library, something like exception throwing
         | is better than an immediate abort, because the application can
         | at least catch the exception and proceed. Exceptions are
         | admittedly a controversial subject, but are easier to utilize
         | inside potentially deeply nested call stacks where explicit
         | error reporting would otherwise complicate the API.
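
The difference can be shown in a few lines. This is a sketch with
hypothetical function names, using Python's `os.abort()` as a
stand-in for abort(3): the exception-raising variant lets the caller
recover, while the aborting variant would terminate the whole process
on the same input.

```python
import os
from typing import List, Optional


class ParseError(Exception):
    """Raised by the library when the input cannot be handled."""


def parse_strict(text: str) -> List[str]:
    """Library function that reports bad input with an exception."""
    if not text.strip():
        raise ParseError("empty input")
    return text.split()


def parse_or_abort(text: str) -> List[str]:
    """Library function that kills the process on bad input."""
    if not text.strip():
        os.abort()  # no handler in the application can intercept this
    return text.split()


def run_app(inputs: List[str]) -> List[Optional[List[str]]]:
    """Application loop that survives bad inputs from parse_strict."""
    results: List[Optional[List[str]]] = []
    for text in inputs:
        try:
            results.append(parse_strict(text))
        except ParseError:
            results.append(None)  # record the failure and keep going
    return results
```

Swapping `parse_strict` for `parse_or_abort` in `run_app` would make
the `try`/`except` useless: the first blank input would end the
process before any later input is handled.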
        
       | gavinray wrote:
       | One of the most surprising things I learned about "clang" was how
       | relatively poor the "libClang" capabilities are.
       | 
       | I wanted to write a codegen tool that would auto-generate
       | bindings for C++ code, and it turns out that "libTooling" is the
       | only reasonable way to get access to the proper info you need
       | from C++.
       | 
       | Another alternative is "libClangSharp", from Tanner Gooding who
       | works on C# at Microsoft.
       | 
       | https://github.com/dotnet/ClangSharp
        
         | HybridCurve wrote:
         | This is another part of clang I've considered to be almost, but
         | not quite there yet. Some of the calls to the API are not very
         | intuitive and they left too much out of libclang for it to be
         | of anything but limited use. I am not a C++ guy, and it would
         | be far too difficult for me to learn on a project such as this
         | for my purpose so I had to use GCC instead. GCC has fairly good
         | internals documentation (not just doxygen, thankfully) and the
         | code is reasonably well annotated so it wasn't too difficult to
         | work with.
        
         | mathisfun123 wrote:
         | Have you seen https://github.com/RosettaCommons/binder ?
         | 
         | python aside, having gone down this rabbithole, and still not
         | infrequently revisiting said rabbithole, I don't believe using
         | *clang like this is a winning strategy. Because of the number
         | of corner cases there are in eg C++17, you will end up
         | reimplementing effectively all of the "middle-end" (the parts
         | that lower to llvm) for your target language. At that point
         | you're not building bindings anymore but a whole-ass
         | transpiler. Binder fails to be complete in this way.
         | 
         | My current theory is to try "synthesize" bindings from the llvm
         | ir (a much smaller representational surface). Problems abound
         | here too (ABI).
         | 
         | Alternatively there is https://cppyy.readthedocs.io/en/latest/,
         | which I don't completely understand yet.
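
To make the "bindings from LLVM IR" idea above concrete, here is a
deliberately naive sketch that lists function signatures from textual
IR with a regular expression. It is an assumption-laden illustration,
not how any of the linked tools work: it only copes with scalar and
`ptr` types, and the sample IR is hand-written.

```python
import re
from typing import List, Tuple

# Hand-written sample; _Z3addii is the Itanium-mangled name of add(int, int).
SAMPLE_IR = """\
define dso_local noundef i32 @_Z3addii(i32 noundef %0, i32 noundef %1) #0 {
  %3 = add nsw i32 %0, %1
  ret i32 %3
}
declare void @puts_like(ptr noundef)
"""

# kind, return type (last token before the name), name, raw parameter list
SIGNATURE = re.compile(
    r"^(declare|define)\b.*?(\S+)\s+@([-\w.$]+)\((.*?)\)", re.MULTILINE
)


def list_functions(ir_text: str) -> List[Tuple[str, str, str, List[str]]]:
    """Return (kind, name, return type, parameter types) per function.
    Aggregate types containing commas or parentheses are not handled."""
    found = []
    for kind, ret, name, params in SIGNATURE.findall(ir_text):
        # The first token of each parameter is its type; attributes such
        # as "noundef" and value names such as "%0" follow it.
        types = [p.split()[0] for p in params.split(",") if p.strip()]
        found.append((kind, name, ret, types))
    return found
```

Even in this tiny case the ABI problem shows up: the IR only has the
mangled name and lowered types such as `i32` and `ptr`, so the
source-level signature has to be reconstructed from elsewhere.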
        
       | pizza wrote:
       | Does something like essentially this exist?
       | 
       |     lattice = languageLattice [python, cpp]
       |     -- also includes c-->python in the lattice, implicitly,
       |     -- since python is written in c
       |     latticeDebugger = languageLatticeDebugger lattice=lattice
       | 
       | if I want to debug mixed python & c++? mainly I love numba but
       | I've had some trouble with it reducing debuggability/transparency
       | of the code
        
       | JonChesterfield wrote:
       | The ABI complaint is sound. That really shouldn't be smeared out
       | over the compiler front end (clang) and the architecture lowering
       | (llc, ish). I kind of blame C for that one but maybe we could do
       | better.
       | 
       | Llvm in general is pretty easy to work with. A single IR with
       | multiple passes is a good way to build a compiler. Extending
       | clang somewhat less so, though people seem to make that work
       | anyway.
        
         | Ericson2314 wrote:
         | > A single IR with multiple passes is a good way to build a
         | compiler
         | 
         | https://mlir.llvm.org/, which Vast is using, is largely claiming the
         | opposite. Most passes more naturally are not "a -> a", but "a
         | -> b". Algorithms and data structures work hand in hand,
         | it is very nice to produce "evidence" for what is done in the
         | output data structure.
         | 
         | This is why https://cakeml.org/, which "can't cheat" with
         | partial functions, has so many IRs!
         | 
         | Using just a single IR was historically done for cost-control,
         | the idea being that having many IRs was a disaster in
         | repetitive boilerplate. MLIR seeks to solve that exact problem!
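
The "a -> a" versus "a -> b" distinction can be illustrated with a toy
compiler. Everything here is invented for the example (the `Expr`
tree, the stack instructions, the function names); it has nothing to
do with MLIR's actual APIs. `fold` maps the tree IR to itself, while
`lower` produces a different IR whose shape is evidence that the
lowering happened.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Var, Add]   # IR "a": an expression tree
Instr = Tuple                 # IR "b": ("push", n), ("load", name), ("add",)


def fold(e: Expr) -> Expr:
    """An "a -> a" pass: constant folding, tree in and tree out."""
    if isinstance(e, Add):
        left, right = fold(e.left), fold(e.right)
        if isinstance(left, Num) and isinstance(right, Num):
            return Num(left.value + right.value)
        return Add(left, right)
    return e


def lower(e: Expr) -> List[Instr]:
    """An "a -> b" pass: the output cannot contain nested trees."""
    if isinstance(e, Num):
        return [("push", e.value)]
    if isinstance(e, Var):
        return [("load", e.name)]
    return lower(e.left) + lower(e.right) + [("add",)]


def run(program: List[Instr], env: Dict[str, int]) -> int:
    """Execute the stack IR to check that lowering kept the meaning."""
    stack: List[int] = []
    for instr in program:
        if instr[0] == "push":
            stack.append(instr[1])
        elif instr[0] == "load":
            stack.append(env[instr[1]])
        else:  # "add"
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
    return stack.pop()
```

After `lower`, a later stage never has to ask whether some `Add` node
was left unlowered, because the stack IR has no way to express one.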
        
       | saagarjha wrote:
       | I think the bigger point isn't mentioned but you can guess it by
       | the medium: the author seems to want to do some sort of security
       | analysis which requires them to hook various stages with precise
       | semantics, and most of the API was probably designed around
       | providing autocomplete or basic code intelligence. Not entirely
       | sure that the only solution here is to throw out these
       | representations rather than have them match reality a bit more
       | closely if you ask for it, but I guess this works too.
        
         | adestefan wrote:
         | It was not built around any of that. It was built to facilitate
         | compiler construction and add some introspection to that
         | process. The problem is that building what is a compiler and
         | code generation library to cover multiple languages and
         | multiple architectures is really hard. Abstractions start to
         | get leaky. Next thing you know there are a bunch of assumptions
         | and hacks that make your neat library a big ol' mess.
         | 
         | I'm not faulting any of the llvm maintainers. Other people were
         | hoping the IR and library bits would turn into more than a
         | compiler toolkit. Unfortunately, reality sets in over time.
        
           | Ericson2314 wrote:
           | Yeah the way things are is very natural when one only
           | compiles end-to-end --- there is little economic incentive to
           | keep the internals modular when the productivity costs of
           | entanglement only show up with a delay (and also programmers
           | are not really compensated for productivity...).
           | 
           | It's really good that in this new LSP era good tooling is
           | increasingly "mandatory", and language implementers have to
           | deliver. (See also https://ollef.github.io/blog/posts/query-
           | based-compilers.htm... .) The higher standards of users (and
           | aspirations of DARPA :)) are now providing the missing
           | economic incentive.
        
       | CalChris wrote:
       | Says _clang_ isn't a toolsmith's compiler. Doesn't mention
       | _clangd_. Hmmm. Even Apple switched from _libclang_ to _clangd_.
       | 
       | https://lists.llvm.org/pipermail/cfe-dev/2018-April/057668.h...
        
         | Ericson2314 wrote:
         | libclang is a library, clangd is an executable. That post is
         | about switching away from libclang _-based_ tooling
         | infrastructure; i.e. stop developing their own tool.
         | 
         | The differences between a C wrapper and C++ are superficial and not
         | what this blog post is about. The problems are rather with the
         | poor division of labor between the intermediate representations and
         | the lies that they are properly self-contained. This is about
         | what clang _does_, irrespective of whether one slaps a C
         | interface on top or not.
        
           | CalChris wrote:
           | You wouldn't 'execute' clangd. You would send it messages
           | adhering to its Language Server Protocol [1]. The distinction
           | between library API and server API is small. Moreover, Apple
           | switched away from libclang-based tooling towards clangd.
           | 
           | [1] https://microsoft.github.io/language-server-
           | protocol/specifi...
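
For reference, "sending it messages" means writing framed JSON-RPC to
the server. The sketch below builds and parses one such frame using
the Language Server Protocol's base framing (a `Content-Length` header,
a blank line, then a UTF-8 JSON body). The helper names are
hypothetical and the `initialize` request is the bare minimum, not
what a real client would send.

```python
import json


def frame(payload: dict) -> bytes:
    """Wrap a JSON-RPC payload in the LSP base-protocol framing."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    # Content-Length counts bytes of the body, not characters.
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def unframe(data: bytes) -> dict:
    """Parse a single framed message back into its JSON payload."""
    header, _, rest = data.partition(b"\r\n\r\n")
    length = None
    for line in header.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    return json.loads(rest[:length].decode("utf-8"))


# Minimal request; a real client would fill in rootUri and capabilities.
INITIALIZE = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {"processId": None, "rootUri": None, "capabilities": {}},
}
```

The bytes from `frame(INITIALIZE)` would typically be written to the
standard input of a running clangd process, with replies read back
from its standard output in the same framing.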
        
             | tyg13 wrote:
             | I can't tell if you're trying to split hairs or just being
             | unintentionally obtuse. How do you propose sending clangd
             | messages without executing it, i.e. starting the language
             | server?
        
       | eikenberry wrote:
       | The article lists features potentially responsible for Clang's
       | gaining popularity, among them fast compile times. I've
       | always read that LLVM's compile times are terrible and that is,
       | for instance, one of the reasons for Rust's slow compile times.
       | Has this changed or is he only making claims about the Clang
       | front end?
        
         | HybridCurve wrote:
         | I haven't seen any recent comparisons but the most recent
         | benchmark I saw was that gcc/clang were close. I'm sure the
         | speeds vary quite a bit depending on project size, options,
         | available RAM, etc. IIRC using LTO makes linking significantly
         | more resource intensive and I would assume this is where most
         | of the disparities in performance are.
        
         | badsectoracula wrote:
         | I haven't used clang much in recent years but i remember back
         | when it was first introduced clang was faster than gcc while
         | producing only slightly slower (or sometimes comparable) code.
         | 
         | Most likely compile times slowed down as clang and llvm became
         | more complex, but early clang was fast enough for people to
         | switch to it - and then just stayed with it (this isn't a
         | unique case, people switched to Chrome from Firefox because
         | Chrome was much faster and they stayed with Chrome even after
         | Chrome became slower and Firefox faster).
         | 
         | In any case the comparison was with gcc (and perhaps msvc), not
         | all types of compilers.
        
       ___________________________________________________________________
       (page generated 2023-07-29 23:01 UTC)