[HN Gopher] Incremental Builds for Haskell with Bazel
___________________________________________________________________
Incremental Builds for Haskell with Bazel
Author : ingve
Score : 86 points
Date : 2022-06-23 11:29 UTC (11 hours ago)
(HTM) web link (www.tweag.io)
(TXT) w3m dump (www.tweag.io)
| aschleck wrote:
| I really can't emphasize enough how much I love using Bazel. The
| ability to tell a less technical user "just run `bazel run
| //amazing/server`" regardless of language and know that
| everything will magically work (toolchain installation, future
| toolchain upgrades, incremental rebuilds) is really freeing. The
| actions graph with rules and aspects is quite powerful, so you
| can do things like add Java nullability checks or Python type
| checking remarkably easily. Recently I put together a simple
| build rule that strips external dependencies, archives the rest,
| and uploads it to a cache. Then we can easily run that archive
| against a pre-built container (which contains the external
| dependencies) in our cluster, enabling a very fast ML iteration
| loop on beefy cluster machines. I've also done a lot of work to
| enable middle-ground environments, so my users can run Python
| scripts like they're used to (`python script.py`) while _inside_
| of a Bazel environment, which makes it easy for them to develop
| quickly and then create a BUILD file when they're ready.
|
| The major downside I've experienced is that any time you're
| trying to do something in a less-than-Bazel way (for example
| relying on binaries built outside of Bazel) things can get really
| hairy. My containers often need various things from apt
| repositories, so I had to give up on rules_docker and made my own
| rules for Podman. I think you need someone who understands
| aspects and rules before adopting it, or else the sharp edges of
| Bazel will keep cutting you until you drop it.
| toastal wrote:
| Found it the opposite and it completely put me out of work on
| NixOS as it starts pulling in a separate JVM and its internal
| Python was missing libraries and it then pulled unpatchelf'd
| bins from around the net. The Python-like config language is
| off-putting too.
| throwaway894345 wrote:
| I tried Bazel once about 5 years ago for a Python project and
| it definitely wasn't up to task then, and I kind of wrote it
| off for a while (it was a bad experience). I do like the idea
| of tools like Bazel and Nix, and I've since moved toward Go and
| Rust which I think are more of a happy path for Bazel. I
| wouldn't mind giving it another try, but I'm not eager to bite
| off a big learning curve (limited spare time, other hobbies,
| etc). If anyone has any recommendations for gentle
| introductions to Bazel (ideally for Go), I would appreciate
| them.
| liuliu wrote:
| Python support in Bazel now looks more promising with
| `rules_python`: https://github.com/bazelbuild/rules_python
|
| `rules_go` to my understanding is great too.
|
| Over the years, Bazel has become less opinionated than before,
| mostly because adoption in different orgs forced it to be.
| throwamon wrote:
| > "just run `bazel run //amazing/server`" regardless of
| language and know that everything will magically work
|
| > The major downside I've experienced is that any time you're
| trying to do something in a less-than-Bazel way (for example
| relying on binaries built outside of Bazel) things can get
| really hairy
|
| Sounds like Nix
| ris wrote:
| As a nixpkgs maintainer I can't tell you how painful Bazel is
| for packagers. The difficulty of substituting dependencies.
| Unstable hashes of fetched dependency sources. The
| infeasibility of building bazel itself fully from source...
| jfim wrote:
| > Recently I put together a simple build rule that strips
| external dependencies, archives the rest, and uploads it to a
| cache. Then we can easily run that archive against a pre-built
| container (which contains the external dependencies) in our
| cluster, enabling a very fast ML iteration loop on beefy
| cluster machines.
|
| Can you tell us more about this? We're using bazel at $day_job
| and it's about as pleasant as gouging my eyes out with a rusty
| spoon. Building docker images using bazel takes forever.
| aschleck wrote:
| Sure! So the overall goal is to prebuild a runfiles tree
| containing all the external dependencies into a Docker
| container, and then when the user wants to run something we
| build a runfiles tree with all the non-external code. Then in
| the cluster we want to extract the user's runfiles tree on
| top of the prebuilt runfiles tree, and then execute the
| user's code.
|
| * I have an archive Starlark function that I use for both
| this and containers. It sets up a folder structure similar to
| <target>.runfiles with everything symlinked to the actual
| location, then it tars the whole thing following symlinks. It
| has a parameter to include files that start with external/ or
| not.
|
| * This archive function is used by my Bazel container rules,
| so I simply made a runner.py target that depends on every
| possible external Python dependency and made a Docker image
| with it.
|
| * I then made a Bazel rule that uses the archive function to
| archive a given executable without external/ and uploads it
| to a shared location.
|
| * At runtime runner.py is given the location as an argument,
| downloads it, extracts it, and then execv's it.
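|
| A minimal sketch of the runtime step described above, not the
| actual runner.py (which was not shared). The function names, the
| archive file name, and the idea that the shared location is
| either a local path or a URL are assumptions for illustration.
|
```python
import os
import shutil
import tarfile
import tempfile
import urllib.request


def fetch_archive(location, dest_dir):
    """Copy or download the archive at `location` into dest_dir."""
    dest = os.path.join(dest_dir, "user_runfiles.tar")
    if "://" in location:
        # Assumption: the shared location is reachable by URL.
        with urllib.request.urlopen(location) as resp, open(dest, "wb") as out:
            shutil.copyfileobj(resp, out)
    else:
        shutil.copyfile(location, dest)
    return dest


def overlay(archive_path, runfiles_root):
    """Extract the user's archive on top of the prebuilt runfiles tree."""
    with tarfile.open(archive_path) as tar:
        tar.extractall(runfiles_root)


def run(location, runfiles_root, entrypoint, args=()):
    """Fetch, extract, then replace this process with the user's program."""
    with tempfile.TemporaryDirectory() as tmp:
        overlay(fetch_archive(location, tmp), runfiles_root)
    target = os.path.join(runfiles_root, entrypoint)
    # execv does not return on success: the user's code takes over the PID.
    os.execv(target, [target, *args])
```
|
| Because the uploaded archive excludes external/, the
| dependencies baked into the image are left untouched and only
| the user's own files are added on each run.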
| lbhdc wrote:
| My impression of bazel has been the same. It makes working with
| a polyglot tech stack a breeze.
|
| Do you have an example of the rule that strips and archives
| your dependencies? Or an example of being able to invoke the
| python interpreter in a bazel context? I haven't seen anything
| like that, and want to try it out in my own project.
| aschleck wrote:
| Unfortunately I can't share code at this time but I just
| described the archiving in more detail in a sibling comment.
|
| The Python interpreter is also quite simple. There are several
| ways you can do it, but the simplest thing to imagine is if
| you make a launcher.py script that just invokes Bash as a
| foreground subprocess. The pstree is kind of funky (bash ->
| python -> bash), but inside that shell PYTHONPATH will be set
| approximately correctly. There are reasons to prefer an
| approach that works with sourcing (eg so you can set PS1),
| but it's a little harder to describe. You can do some
| acrobatics to make runfiles (mostly) work, and my
| recollection is that PATH mostly works but that may require
| some more work. We do the same thing for Jupyterlab and
| IPython.
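|
| A rough sketch of the launcher.py idea, under the assumption
| that the script is started through Bazel so that sys.path
| already contains the runfiles import roots. `shell_env` and
| `launch_shell` are hypothetical names, and this does not attempt
| the runfiles or PATH acrobatics mentioned above.
|
```python
import os
import subprocess
import sys


def shell_env(base_env, import_roots):
    """Return a copy of base_env with import_roots prepended to PYTHONPATH."""
    env = dict(base_env)
    existing = env.get("PYTHONPATH")
    parts = list(import_roots) + ([existing] if existing else [])
    env["PYTHONPATH"] = os.pathsep.join(parts)
    return env


def launch_shell(shell="bash"):
    """Run an interactive shell in the foreground and return its exit code."""
    # Assumption: sys.path holds the import roots set up for this target.
    roots = [p for p in sys.path if p]
    return subprocess.call([shell], env=shell_env(os.environ, roots))
```
|
| Inside the spawned shell, `python script.py` then resolves
| imports against the same roots the launcher itself was given.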
| pharmakom wrote:
| Bazel ergonomics / DX sucks but it _does_ deliver on correct and
| fast.
| tantalor wrote:
| Compared to what?
|
| Do you mean writing build rules, or the behind-the-scenes
| stuff?
| asadawadia wrote:
| everything
| marcyb5st wrote:
| I agree about adding non-native build rules, but once you get
| over that it is pretty epic.
|
| Also, I don't think it's more complex than SBT/Gradle (what
| I was using before, but admittedly was a long time ago so I
| don't know now).
| sp33der89 wrote:
| SBT is a lot less scary these days IMHO.
| vips7L wrote:
| SBT is the bane of my existence.
| rtpg wrote:
| For a lot of people the alternatives are things like
| fabric/invoke.
|
| If you don't care about correctness there are loads of
| options that work well, and don't have problems
| supporting esoteric use cases such as ..."let a command
| generate a folder of data"
| xvilka wrote:
| Bringing Java into a Haskell project just for building doesn't
| sound perfect.
| spockz wrote:
| Cabal and GHC have had incremental compilation since (almost?)
| the beginning/forever. How is this special?
| the_duke wrote:
| Reading the article would answer your question...
| ParetoOptimal wrote:
| A little better when TH is involved but incremental isn't as
| good as ghc/cabal yet:
|
| > When doing incremental builds, though, both stack and cabal-
| install can use the recompilation checker, and for changes deep
| in the dependency graph with little propagation, haskell_module
| is not able to beat them yet. For changes near the build
| targets, or which force more recompilation, haskell_module
| would be more competitive.
| mark_l_watson wrote:
| This looks really cool. If I remember correctly, when I was a
| consultant at Google for a while about ten years ago, they used
| Bazel.
|
| I have my own hack for long Haskell build times: I live in the
| mountains so I start a long running build process and then go on
| a walk.
| Kototama wrote:
| > I have my own hack for long Haskell build times: I live in
| the mountains so I start a long running build process and then
| go on a walk.
|
| This is a great unexpected side-effect :-).
| Arcuru wrote:
| > If I remember correctly, when I was a consultant at Google
| for a while about ten years ago, they used Bazel.
|
| Bazel is the open source version of Google's internal build
| tool, so you're correct.
|
| > I live in the mountains so I start a long running build
| process and then go on a walk.
|
| That's an awesome way to cope with long compiles! Maybe I
| should buy a slower computer...
| corrral wrote:
| Anyone got any stories about migrating big, crufty, multiplatform
| cmake C++ projects to Bazel? "It sucked but was worth it" or "it
| was surprisingly easy" or "LOL don't even try?"
| elteto wrote:
| It delivers on its premise of always correct, incremental
| builds but it is extremely opinionated. I don't blame it for
| that, maybe having truly hermetic and reproducible builds
| requires that level of structure. It is almost magical changing
| a linker flag in some bazel config and seeing it _only_ relink
| affected targets.
|
| If you need to do cross-compilation then I feel like it is
| extremely overengineered with the whole platform/toolchain
| concepts, and after _years_ the docs are still incredibly
| lacking on this aspect. I almost prefer the previous approach
| with the semi-documented protobuf as JSON crosstool file.
|
| If you need the safety guarantees or the reproducibility
| there's no other build system out there. If you don't then you
| will be inclined to hate it because you are not extracting
| value from it.
| boris wrote:
| > It is almost magical changing a linker flag in some bazel
| config and seeing it _only_ relink affected targets.
|
| I think this should be expected from any modern build system.
| Now, if you make a whitespace change in your source file and
| the build system recognizes this and skips recompiling it,
| that could pass for magic (build2 does this for C/C++
| sources).
| elteto wrote:
| If the output .o is identical then I think bazel will also
| skip recompilation, FWIW.
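|
| A toy model of why an unchanged output can short-circuit the
| rest of a build, assuming a cache keyed on content digests of
| each action's inputs. This is an illustration, not Bazel's
| implementation; `ActionCache`, `compile_`, and `link` are
| made-up names. In this model a whitespace-only edit still reruns
| the compile step, but the link is skipped because the object
| bytes are identical.
|
```python
import hashlib


def digest(data):
    return hashlib.sha256(data).hexdigest()


class ActionCache:
    """Toy cache: an action reruns only when its input digests change."""

    def __init__(self):
        self._results = {}
        self.runs = []  # names of actions that actually executed

    def run(self, name, inputs, fn):
        key = (name, tuple(digest(i) for i in inputs))
        if key not in self._results:
            self.runs.append(name)
            self._results[key] = fn(*inputs)
        return self._results[key]


def compile_(src):
    # Stand-in compiler whose output ignores whitespace in the source.
    return b"OBJ:" + b"".join(src.split())


def link(obj):
    return b"BIN:" + obj


def build(cache, src):
    obj = cache.run("compile", [src], compile_)
    return cache.run("link", [obj], link)
```
|
| Building `b"int x;"` and then `b"int  x;"` with the same cache
| executes compile, link, compile: the second link is a cache hit.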
| colatkinson wrote:
| Yeah the cross-compilation thing is definitely a rough spot.
| I have one project that's able to work around it via
| _extensive_ hacks with macros, but at some point I'll need
| to do it "the right way."
|
| Honestly if the docs had a canonical example of e.g. using
| unix_cc_toolchain_config (example: [0]) + Bootlin to compile
| for aarch64, it'd probably go a long way to making things
| understandable. Because say what you will about the old
| CROSSTOOL approach, at least there was a nice tutorial for
| it.
|
| [0] https://github.com/grailbio/bazel-toolchain/blob/f14a8a5de8f...
| kldx wrote:
| Have you come across
| https://github.com/aspect-build/gcc-toolchain? I use it as a
| starting point for my toolchains.
| elteto wrote:
| My normal workflow to bootstrap cross-compilation with
| bazel is to create a dummy project with some dummy C/C++
| file and build it. Then go into whatever bazel-X internal
| folder and extract the autogenerated bzl for the local
| system's compiler. Then update it with my toolchain and
| strip it down (I hate the "features" feature) until it is
| somewhat understandable.
|
| This is a _terrible_ DX.
| klodolph wrote:
| Yeah, that's the same way I do it.
| lolpython wrote:
| I migrated a mid-size polyglot project from Makefiles to Bazel
| and C++ was a large component of the project.
|
| Some obstacles:
|
| 1. Building with QT5 MOC & UI files. There is a great
| library[0] for it but it has hardcoded paths to the QT binaries
| and header files assuming a system-wide installation. I had to
| patch the rule to point to our QT location. Then it worked
| fine.
|
| 2. There is no rule to build a fully static library[1]. Since
| we were shipping a static library to clients via our Makefile
| system, that was somewhat annoying.
|
| 3. We were using system links like
| `$PROJECT_ROOT/links/GCC/vX.Y.Z/ -> /opt/gcc/...` to point to
| all the build tools, but these didn't work in Bazel I think
| because it required absolute paths for any binaries it calls.
| We ended up putting them in a .bazelrc but we would need a
| different one for Windows and Linux.
|
| 4. Not good integration with IDEs
|
| 5. (edit) The Bazel toolchain system is confusing and I
| couldn't understand it after reading all its docs
|
| Ultimately we did not keep using Bazel because we were building
| Python binaries and py_binary was too slow on Windows. And we
| didn't have enough time to write a PyInstaller rule.
|
| [0]: https://github.com/justbuchanan/bazel_rules_qt
|
| [1]: https://github.com/bazelbuild/bazel/issues/1920
| klodolph wrote:
| Regarding #3, my approach to solve problems like this is to
| make a custom repository rule which creates the desired
| symlinks. The repository rule can invoke external programs or
| examine the environment as necessary to figure out how this
| should be created.
|
| Basically, you create a repository rule that symlinks your
| $PROJECT_ROOT/links/GCC/vX.Y.Z/ to $repo/... somewhere, and
| then generates a BUILD file for the repository.
|
| Writing your own repository rule is not especially difficult
| and they do have a lot of power not available to ordinary
| rules. This is the API that you can use from within
| repository rules--you can see that it lets you run arbitrary
| programs, create files and symlinks, download files, etc.
|
| https://bazel.build/rules/lib/repository_ctx
| rfoo wrote:
| #2: To be fair it is reasonably easy to make a
| cc_static_library_binary-ish rule which merges all transitive
| .a-s (just generate an ar script and call archiver). But I
| have to admit that I spent non-trivial time on maintaining
| our "CROSSTOOL in skylark" (forgot the term) for 20+ target
| platforms before and it helped a lot in understanding the
| (still incomplete) C++ sandwich.
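|
| One way the "generate an ar script and call archiver" step could
| look, sketched outside of Bazel and assuming GNU ar, which reads
| MRI scripts from stdin with `-M`. The function names are
| hypothetical, and a real rule would collect the archive paths
| from its transitive dependencies.
|
```python
import subprocess


def mri_script(output, archives):
    """Build a GNU ar MRI script that merges `archives` into `output`."""
    lines = ["create " + output]
    seen = set()
    for path in archives:
        if path not in seen:  # transitive dependency lists often repeat
            seen.add(path)
            lines.append("addlib " + path)
    lines += ["save", "end"]
    return "\n".join(lines) + "\n"


def merge_archives(output, archives, ar="ar"):
    """Run the archiver on the generated script."""
    script = mri_script(output, archives)
    subprocess.run([ar, "-M"], input=script, text=True, check=True)
```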
| rufius wrote:
| I've not got specifically what you're describing but I've
| encountered Bazel twice in my career - both in very large
| Java/Scala/Go code bases (distinct repo for Go but a lot of
| code).
|
| Bazel is extremely underwhelming. I've worked with crusty
| ancient systems that built huge systems and Bazel is just the
| most clown shoes build tool in comparison.
|
| Something didn't work? Try typing the same command repeatedly
| and hope that this time it sticks. Multiple commands to achieve
| a seemingly straightforward task? Why isn't there a single one
| that will get us there.
|
| FWIW - I suspect like most tools that Google open sources, the
| tool makes far more sense in the context of Google's systems
| and architecture. If you're adhering to that, it's probably
| coherent.
|
| I'd never choose it willingly though.
| rfoo wrote:
| > Multiple commands to achieve a seemingly straightforward
| task? Why isn't there a single one that will get us there.
|
| Could you elaborate on this? I seldom need commands besides
| `bazel build`, `bazel test` and `bazel run` and am curious about
| your story.
| rufius wrote:
| Full admission - I don't own or have responsibility for how
| the build system I work with is set up.
|
| I'm on parental leave so details are fuzzy but roughly:
|
| - run Bazel build
|
| - expect dependencies to be built
|
| - they weren't built
|
| - run Bazel build again
|
| - something else decides to be built
|
| - repeat ad nauseam
|
| This is within the same project directory, etc. It's
| entirely possible that the project is set up in some
| pathological way but I've encountered this enough times in
| two companies that it's stuck in my head.
| rfoo wrote:
| Whoa, that's tough. Sounds like you have indeed hit a
| case where
|
| > the tool makes far more sense in the context of
| Google's systems and architecture
|
| Bazel should not behave like what you described (at least
| in my experience) for in-repo sources and build rules.
| Except that the world does not work in this way, so they
| added a duct tape called "using workspace rules to fetch
| and potentially build external dependencies", which is as
| fragile as a ./build.sh pulling in all your dependencies.
|
| And Google? They did vendor everything they use at
| //third_party in their monorepo.
| mattnewton wrote:
| This sounds highly broken somehow. Bazel's raison d'etre
| is rebuilding only the necessary dependencies. Maybe this
| is a case of someone trying to force-fit a process that
| doesn't look like bazel's opinions with some custom rules
| that break it at the core?
|
| Pretty much all my bazel/blaze experience has been at
| google or in other side projects built from the start
| with bazel, but I have never encountered anything like
| that. The only complaints I've had is that building
| python slows the scripting loop, and that for side
| projects without google's build infrastructure the
| rebuild-the-world-at-head-from-source method can get very
| expensive. But this sounds really unfortunate and nothing
| like the tool I've used :/
| Difwif wrote:
| That sounds like someone hacked up an existing project
| into Bazel without resolving the opinionated differences.
| I've worked on Bazel projects at multiple companies and
| I've seen things go off the rails like that a couple
| times before someone that actually understands the tool
| rewrites the problematic build process. It's usually some
| nasty stuff where someone tried to work outside of Bazel
| because they didn't understand it and created a bunch of
| impedance mismatches. New people doing this instead of
| the "right way" is the most valid criticism of Bazel IMO.
|
| Sounds like you're working in the worst of both worlds
| right now.
| eklitzke wrote:
| The canonical build system for LLVM is CMake, but you can also
| build LLVM with Bazel now. That might give you a good idea of
| how to do it. The Bazel support code is in
| https://github.com/llvm/llvm-project/tree/main/utils/bazel
| rfoo wrote:
| Having gone through this twice, I'd say it is not that
| difficult, but could take a reasonably huge effort. On par with
| turning your crufty CMakeLists to be so-called "Modern CMake"
| (whatever that means).
|
| But, why? Bazel is very opinionated on how you lay out your C++
| source code in the repository, and it's something which could
| not be retrofitted easily.
| klodolph wrote:
| I have some adjacent experience. Not exactly with that, but I
| have migrated large C++ build systems, migrated systems to
| Bazel, etc. I've also written a bunch of Bazel build scripts
| for various open-source C++ libraries so I can better integrate
| them into projects that use Bazel.
|
| Bazel is opinionated. The tradeoff here is that if you can make
| your project match Bazel's opinions, you get a very good
| experience--but you can have a bad experience if you disagree
| with Bazel. If you have Bazel experience, you can look at a
| project and get a quick sense of the distance between how the
| project is built and how Bazel "wants" to build the project.
|
| The payoff is that once you get your Bazel build system,
| everything seems a lot more trustworthy. No more "make clean".
| When you run a Bazel command, it just gives you the correct
| output, very fast, without worrying about what state your build
| tree is in. You can make any change to your build scripts and
| just "bazel build" and get the correct result immediately, as
| long as you aren't trying to bypass how Bazel works. I never
| have to do anything like run "make" twice. This is something
| I've never gotten with systems like Make or CMake, which put
| more of the onus on individual developers to get things
| correct.
|
| So I can kind of shut off my brain when using Bazel.
|
| Depending on the particulars of your project, the most
| straightforward migration path to Bazel will not be obvious.
| You may need to make certain choices about how much you modify
| your project to fit Bazel's expectations, versus how much you
| adapt Bazel to fit your existing project. One example is
| include paths... do you modify all of your '#include'
| directives to match how Bazel expects you to write them? Or do
| you adapt Bazel to do things your way?
|
| The difficulty and payoff is highly variable. My experience has
| generally been positive, but I also have a lot of familiarity
| with Bazel. It's easy enough to find an example of a project
| where I'd just never bother migrating to Bazel, or to find
| examples of projects (even large ones) where migrating is super
| easy.
___________________________________________________________________
(page generated 2022-06-23 23:02 UTC)