[HN Gopher] The future of Clang-based tooling
___________________________________________________________________
The future of Clang-based tooling
Author : ingve
Score : 99 points
Date : 2023-07-28 12:11 UTC (1 days ago)
(HTM) web link (blog.trailofbits.com)
(TXT) w3m dump (blog.trailofbits.com)
| traxys wrote:
| I got bitten many times by the fact that PATH is not taken into
| account, because I use Nix to manage my dotfiles, including
| `clangd`, but when developing libraries that target the base
| distro (not Nix) clangd sometimes gets confused and does not
| take into account the headers in /usr/include, only the Nix
| headers....
| Ericson2314 wrote:
| As one of the authors of
| https://hsyl20.fr/home/files/papers/2022-ghc-modularity.pdf
| (discussed at https://news.ycombinator.com/item?id=31250141)
| this rings really true.
|
| If all we do is compile end to end, it is really easy for the
| pipeline stages to "rot together" such that we get these lies the
| blog post author points out. We must be able to start and
| resume compilation from any point, with arbitrary programs in the
| intermediate representation to ensure modularity doesn't regress.
| Really glad IDE and now security use-cases are finally hammering
| these basic software arch principles to compiler writers!
|
| I recall an earlier thread,
| https://discourse.llvm.org/t/rfc-an-mlir-based-clang-ir-cir/...,
| where someone else was interested in
| the same thing. And it seems Vast (prior to open sourcing, and
| the reveal of that name) was mentioned by the blog post author in
| the thread. Very much hoping there is thus enough interest to get
| this upstreamed.
|
| Best of luck to everyone involved!
| HybridCurve wrote:
| Having worked with clang (and gcc) quite a bit, there are a
| number of good points the author makes. There are a lot of cool
| things llvm/clang has, but it feels like a lot of the tooling
| does not mesh together as well as it should and some things lack
| refinement.
|
| My biggest gripe overall (since it could be fixed easily) is the
| compile_commands.json. It's used by a number of tools and is
| generally awkward, cumbersome, and has a handful of shortcomings.
| To fix these issues, I used the intercept-build system provided
| with LLVM to generate a more succinct build file in JSON format
| that abstracts certain options (like paths) and groups options
| commonly found together. The reason for this is that sometimes
| you might be generating llvm bitcode, building clang AST, running
| clang-analyze, or translating the build options to work with
| either compiling or linking with GCC or Clang. For many of these
| it helps to be able to alter options easily, which you cannot do
| with the compile_commands.json file alone.
|
| There are a number of areas like this where clang would benefit
| greatly, without demanding an enormous amount of effort.
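A minimal sketch of the kind of post-processing described above, assuming the standard compilation database layout (a JSON list of entries with `directory`, `file`, and either `command` or `arguments`). The helper names `normalize` and `retarget_to_bitcode` are hypothetical and are not part of intercept-build or any LLVM tool. It only shows one retargeting, from an object-file build to LLVM bitcode, and only handles the separate `-o <file>` spelling.

```python
import json
import shlex
from pathlib import PurePosixPath


def normalize(entry):
    """Return the entry's command line as a list, whichever form the entry uses."""
    if "arguments" in entry:
        return list(entry["arguments"])
    return shlex.split(entry["command"])


def retarget_to_bitcode(entry, compiler="clang"):
    """Rewrite one compile command so that it emits LLVM bitcode (hypothetical helper)."""
    args = normalize(entry)
    out = [compiler]          # swap the original compiler driver for clang
    skip_next = False
    for arg in args[1:]:
        if skip_next:         # this is the old output path that followed "-o"
            skip_next = False
            continue
        if arg == "-o":       # drop the old output; a new one is appended below
            skip_next = True
            continue
        out.append(arg)
    if "-c" not in out:
        out.append("-c")
    bitcode_name = PurePosixPath(entry["file"]).stem + ".bc"
    out += ["-emit-llvm", "-o", bitcode_name]
    return out


# Example entry, made up for illustration.
database = json.loads("""[
  {"directory": "/src/build",
   "command": "gcc -I../include -O2 -c ../lib/foo.c -o foo.o",
   "file": "../lib/foo.c"}
]""")

bitcode_args = retarget_to_bitcode(database[0])
print(bitcode_args)
```

The same loop could group or abstract options (include paths, defines) before writing them back out, which is the part a plain compile_commands.json does not make easy.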
| CoastalCoder wrote:
| I'm really impressed with the quality of the writing. It's
| succinct, informative, and engaging.
|
| The "engaging" part might be subjective, because I've recently
| taken a renewed interest in LLVM internals. But regardless, good
| writing.
|
| P.S. The article gives a shout-out to CodeBrowser [0]. It wasn't
| immediately clear from the homepage, but CodeBrowser _is_ open-
| source: [1].
|
| [0] https://codebrowser.dev/
|
| [1] https://github.com/KDAB/codebrowser
| seeknotfind wrote:
| Good read.
|
| > When Clang is using itself incorrectly, it makes sense to
| trigger an assertion and abort execution--it's probably a sign of
| a bug.
|
| This statement may be ambiguous. It sounds like libraries
| shouldn't ordinarily abort on bad usages, and it's true this is a
| nuanced subject, but you really do want to abort as a default.
| The problematic thing is introducing an abort in a code path that
| previously worked. You have to take two steps: tracking (or
| providing a mechanism for tracking) when it happens, then aborting
| once you are sure it won't cause a problem.
|
| This of course doesn't apply to all ecosystems (JS for instance,
| due in part to diversity of environment), but this perspective is
| not limited to the internal behavior of clang, rather it applies
| largely to low level, important, potentially-system software.
| otherjason wrote:
| Aborting (as in calling abort(3)) inside a library is very
| problematic if I'm writing an application that uses it. It
| takes away the ability of the larger application to detect and
| handle the error, simply terminating the entire process.
| Especially in a C++ library, something like exception throwing
| is better than an immediate abort, because the application can
| at least catch the exception and proceed. Exceptions are
| admittedly a controversial subject, but are easier to utilize
| inside potentially deeply nested call stacks where explicit
| error reporting would otherwise complicate the API.
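The two positions above can be combined in a small sketch: first only track violations on a path that used to work, then enforce later, and enforce by raising an exception the application can catch rather than by terminating the process. This is an illustration in Python, not how Clang or any C++ library does it; `PreconditionChecker`, `ApiMisuse`, and `pointee_type` are all hypothetical names.

```python
import logging


class ApiMisuse(Exception):
    """Raised when a caller violates a documented precondition."""


class PreconditionChecker:
    """Counts violations; raises only once enforcement is switched on."""

    def __init__(self, enforce=False):
        self.enforce = enforce
        self.violations = {}

    def require(self, condition, name):
        if condition:
            return
        self.violations[name] = self.violations.get(name, 0) + 1
        if self.enforce:
            raise ApiMisuse(name)
        logging.warning("precondition violated: %s", name)


def pointee_type(checker, type_name):
    """Hypothetical library call: strip one trailing '*' from a type name."""
    checker.require(type_name.endswith("*"), "pointee_type-on-non-pointer")
    return type_name[:-1] if type_name.endswith("*") else type_name


# Step 1: tracking only. The previously working path keeps working,
# but every misuse is counted so it can be found and fixed.
tracking = PreconditionChecker(enforce=False)
legacy_result = pointee_type(tracking, "int")

# Step 2: enforcement. The library now fails hard, but the application
# can still catch the error instead of losing the whole process.
strict = PreconditionChecker(enforce=True)
try:
    pointee_type(strict, "int")
    recovered = False
except ApiMisuse:
    recovered = True
```

Whether the enforcing step should raise or abort is exactly the disagreement in this subthread; the sketch only shows that the tracking step is the same either way.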
| gavinray wrote:
| One of the most surprising things I learned about "clang" was how
| relatively poor the "libClang" capabilities are.
|
| I wanted to write a codegen tool that would auto-generate
| bindings for C++ code, and it turns out that "libTooling" is the
| only reasonable way to get access to the proper info you need
| from C++.
|
| Another alternative is "libClangSharp", from Tanner Gooding who
| works on C# at Microsoft.
|
| https://github.com/dotnet/ClangSharp
| HybridCurve wrote:
| This is another part of clang I've considered to be almost, but
| not quite there yet. Some of the calls to the API are not very
| intuitive and they left too much out of libclang for it to be
| of anything but limited use. I am not a C++ guy, and it would
| be far too difficult for me to learn on a project such as this
| for my purpose so I had to use GCC instead. GCC has fairly good
| internals documentation (not just doxygen, thankfully) and the
| code is reasonably well annotated so it wasn't too difficult to
| work with.
| mathisfun123 wrote:
| Have you seen https://github.com/RosettaCommons/binder ?
|
| python aside, having gone down this rabbithole, and still not
| infrequently revisiting said rabbithole, I don't believe using
| *clang like this is a winning strategy. Because of the number
| of corner cases there are in eg C++17, you will end up
| reimplementing effectively all of the "middle-end" (the parts
| that lower to llvm) for your target language. At that point
| you're not building bindings anymore but a whole-ass
| transpiler. Binder fails to be complete in this way.
|
| My current theory is to try to "synthesize" bindings from the llvm
| ir (a much smaller representational surface). Problems abound
| here too (ABI).
|
| Alternatively there is https://cppyy.readthedocs.io/en/latest/,
| which I don't completely understand yet.
| pizza wrote:
| Does something like essentially this exist?
|     lattice = languageLattice [python, cpp]
|       -- also includes c-->python in the lattice, implicitly,
|       -- since python is written in c
|     latticeDebugger = languageLatticeDebugger lattice=lattice
|
| if I want to debug mixed python & c++? mainly I love numba but
| I've had some trouble with it reducing debuggability/transparency
| of the code
| JonChesterfield wrote:
| The ABI complaint is sound. That really shouldn't be smeared out
| over the compiler front end (clang) and the architecture lowering
| (llc, ish). I kind of blame C for that one but maybe we could do
| better.
|
| Llvm in general is pretty easy to work with. A single IR with
| multiple passes is a good way to build a compiler. Extending
| clang somewhat less so, though people seem to make that work
| anyway.
| Ericson2314 wrote:
| > A single IR with multiple passes is a good way to build a
| compiler
|
| https://mlir.llvm.org/, which this is using, is largely claiming the
| opposite. Most passes more naturally are not "a -> a", but "a
| -> b". Algorithms and data structures work hand in hand,
| it is very nice to produce "evidence" for what is done in the
| output data structure.
|
| This is why https://cakeml.org/, which "can't cheat" with
| partial functions, has so many IRs!
|
| Using just a single IR was historically done for cost-control,
| the idea being that having many IRs was a disaster in
| repetitive boilerplate. MLIR seeks to solve that exact problem!
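A toy illustration of the "a -> b" point, assuming two made-up IRs: a surface IR that has subtraction and a core IR that does not. The output type of the pass is the "evidence" that subtraction was eliminated. None of these class names come from MLIR or CakeML, and Python only checks the annotations if a type checker such as mypy is run over the code.

```python
from dataclasses import dataclass
from typing import Union


# Surface IR: numbers, addition, subtraction.
@dataclass(frozen=True)
class Num:
    value: int

@dataclass(frozen=True)
class Add:
    left: "Surface"
    right: "Surface"

@dataclass(frozen=True)
class Sub:
    left: "Surface"
    right: "Surface"

Surface = Union[Num, Add, Sub]


# Core IR: no subtraction node exists here, only negation.
@dataclass(frozen=True)
class CNum:
    value: int

@dataclass(frozen=True)
class CAdd:
    left: "Core"
    right: "Core"

@dataclass(frozen=True)
class CNeg:
    operand: "Core"

Core = Union[CNum, CAdd, CNeg]


def desugar(e: Surface) -> Core:
    """An 'a -> b' pass: rewrites a - b as a + (-b) into a different IR."""
    if isinstance(e, Num):
        return CNum(e.value)
    if isinstance(e, Add):
        return CAdd(desugar(e.left), desugar(e.right))
    if isinstance(e, Sub):
        return CAdd(desugar(e.left), CNeg(desugar(e.right)))
    raise TypeError(f"not a surface expression: {e!r}")


def evaluate(e: Core) -> int:
    """A later stage only has to handle the three core node kinds."""
    if isinstance(e, CNum):
        return e.value
    if isinstance(e, CAdd):
        return evaluate(e.left) + evaluate(e.right)
    if isinstance(e, CNeg):
        return -evaluate(e.operand)
    raise TypeError(f"not a core expression: {e!r}")


core = desugar(Sub(Num(7), Add(Num(1), Num(2))))
result = evaluate(core)  # 7 - (1 + 2)
```

With a single IR, `evaluate` would have to either handle `Sub` as well or trust that an earlier pass removed it; with two IRs there is nothing to trust. The cost is the second set of node definitions, which is the boilerplate the comment says MLIR tries to reduce.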
| saagarjha wrote:
| I think the bigger point isn't mentioned but you can guess it by
| the medium: the author seems to want to do some sort of security
| analysis which requires them to hook various stages with precise
| semantics, and most of the API was probably designed around
| providing autocomplete or basic code intelligence. Not entirely
| sure that the only solution here is to throw out these
| representations rather than have them match reality a bit more
| closely if you ask for it, but I guess this works too.
| adestefan wrote:
| It was not built around any of that. It was built to facilitate
| compiler construction and add some introspection to that
| process. The problem is that building what is a compiler and
| code generation library to cover multiple languages and
| multiple architectures is really hard. Abstractions start to
| get leaky. Next thing you know there are a bunch of assumptions
| and hacks that make your neat library a big ol' mess.
|
| I'm not faulting any of the llvm maintainers. Other people were
| hoping the IR and library bits would turn into more than a
| compiler toolkit. Unfortunately, reality sets in over time.
| Ericson2314 wrote:
| Yeah the way things are is very natural when one only
| compiles end-to-end --- there is little economic incentive to
| keep the internals modular when the productivity costs of
| entanglement only show up with a delay (and also programmers
| are not really compensated for productivity...).
|
| It's really good that in this new LSP era good tooling is
| increasingly "mandatory", and language implementers have to
| deliver. (See also
| https://ollef.github.io/blog/posts/query-based-compilers.htm... .)
| The higher standards of users (and
| aspirations of DARPA :)) are now providing the missing
| economic incentive.
| CalChris wrote:
| Says _clang_ isn't a toolsmith's compiler. Doesn't mention
| _clangd_. Hmmm. Even Apple switched from _libclang_ to _clangd_.
|
| https://lists.llvm.org/pipermail/cfe-dev/2018-April/057668.h...
| Ericson2314 wrote:
| libclang is a library, clangd is an executable. That post is
| about switching away from libclang _-based_ tooling
| infrastructure; i.e. stop developing their own tool.
|
| The differences in C wrapper vs C++ are superficial and not
| what this blog post is about. The problems are rather with the
| poor division of labor between the intermediate representations and
| the lies that they are properly self-contained. This is about
| what clang _does_ , irrespective of whether one slaps a C
| interface on top or not.
| CalChris wrote:
| You wouldn't 'execute' clangd. You would send it messages
| adhering to its Language Server Protocol [1]. The distinction
| between library API and server API is small. Moreover, Apple
| switched away from libclang-based tooling towards clangd.
|
| [1] https://microsoft.github.io/language-server-protocol/specifi...
| tyg13 wrote:
| I can't tell if you're trying to split hairs or just being
| unintentionally obtuse. How do you propose sending clangd
| messages without executing it, i.e. starting the language
| server?
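For readers unfamiliar with what "sending clangd messages" involves, here is a minimal sketch of the Language Server Protocol base framing: a `Content-Length` header, a blank line, then a JSON-RPC body. The `frame` and `unframe` helpers are hypothetical names. The code does not start clangd; actually talking to it would mean launching the executable as a child process and writing these bytes to its standard input, which is the point made just above.

```python
import json


def frame(message: dict) -> bytes:
    """Encode one JSON-RPC message with the LSP Content-Length header."""
    body = json.dumps(message).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def unframe(data: bytes) -> dict:
    """Decode one framed message (assumes the whole message is in `data`)."""
    head, _, rest = data.partition(b"\r\n\r\n")
    headers = {}
    for line in head.decode("ascii").split("\r\n"):
        key, _, value = line.partition(":")
        headers[key.strip().lower()] = value.strip()
    length = int(headers["content-length"])
    return json.loads(rest[:length].decode("utf-8"))


# The first request a client sends; parameters kept to the bare minimum.
initialize = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {"processId": None, "rootUri": None, "capabilities": {}},
}

wire_bytes = frame(initialize)

# To reach a real server, these bytes would be written to the stdin of a
# started clangd process, e.g. one created with
# subprocess.Popen(["clangd"], stdin=subprocess.PIPE, stdout=subprocess.PIPE),
# assuming clangd is installed and on PATH.
round_trip = unframe(wire_bytes)
```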
| eikenberry wrote:
| The article lists features potentially responsible for Clang's
| gaining popularity, among them was fast compile times. I've
| always read that LLVM's compile times are terrible and that is,
| for instance, one of the reasons for Rust's slow compile times.
| Has this changed or is he only making claims about the Clang
| front end?
| HybridCurve wrote:
| I haven't seen any recent comparisons but the most recent
| benchmark I saw was that gcc/clang were close. I'm sure the
| speeds vary quite a bit depending on project size, options,
| available RAM, etc. IIRC using LTO makes linking significantly
| more resource intensive and I would assume this is where most
| of the disparities in performance are.
| badsectoracula wrote:
| I haven't used clang much in recent years but i remember back
| when it was first introduced clang was faster than gcc while
| producing only slightly slower (or sometimes comparable) code.
|
| Most likely compile times slowed down as clang and llvm became
| more complex, but early clang was fast enough for people to
| switch to it - and then just stayed with it (this isn't a
| unique case, people switched to Chrome from Firefox because
| Chrome was much faster and they stayed with Chrome even after
| Chrome became slower and Firefox faster).
|
| In any case the comparison was with gcc (and perhaps msvc), not
| all types of compilers.
___________________________________________________________________
(page generated 2023-07-29 23:01 UTC)