[HN Gopher] Incremental Builds for Haskell with Bazel
       ___________________________________________________________________
        
       Incremental Builds for Haskell with Bazel
        
       Author : ingve
       Score  : 86 points
       Date   : 2022-06-23 11:29 UTC (11 hours ago)
        
 (HTM) web link (www.tweag.io)
 (TXT) w3m dump (www.tweag.io)
        
       | aschleck wrote:
       | I really can't emphasize enough how much I love using Bazel. The
       | ability to tell a less technical user "just run `bazel run
       | //amazing/server`" regardless of language and know that
       | everything will magically work (toolchain installation, future
       | toolchain upgrades, incremental rebuilds) is really freeing. The
       | actions graph with rules and aspects is quite powerful, so you
       | can do things like add Java nullability checks or Python type
       | checking remarkably easily. Recently I put together a simple
       | build rule that strips external dependencies, archives the rest,
       | and uploads it to a cache. Then we can easily run that archive
       | against a pre-built container (which contains the external
       | dependencies) in our cluster, enabling a very fast ML iteration
       | loop on beefy cluster machines. I've also done a lot of work to
       | enable middle-ground environments, so my users can run Python
       | scripts like they're used to (`python script.py`) while _inside_
       | of a Bazel environment, which makes it easy for them to develop
       | quickly and then create a BUILD file when they're ready.
       | 
       | The major downside I've experienced is that any time you're
       | trying to do something in a less-than-Bazel way (for example
       | relying on binaries built outside of Bazel) things can get really
       | hairy. My containers often need various things from apt
       | repositories, so I had to give up on rules_docker and made my own
       | rules for Podman. I think you need someone who understands
       | aspects and rules before adopting it, or else the sharp edges of
       | Bazel will keep cutting you until you drop it.
        
         | toastal wrote:
         | Found it the opposite and it completely put me out of work on
         | NixOS as it starts pulling in a separate JVM and its internal
         | Python was missing libraries and it then pulled unpatchelf'd
         | bins from around the net. The Python-like config language is
         | off-putting too.
        
         | throwaway894345 wrote:
         | I tried Bazel once about 5 years ago for a Python project and
         | it definitely wasn't up to task then, and I kind of wrote it
         | off for a while (it was a bad experience). I do like the idea
         | of tools like Bazel and Nix, and I've since moved toward Go and
         | Rust which I think are more of a happy path for Bazel. I
         | wouldn't mind giving it another try, but I'm not eager to bite
         | off a big learning curve (limited spare time, other hobbies,
         | etc). If anyone has any recommendations for gentle
         | introductions to Bazel (ideally for Go), I would appreciate
         | them.
        
           | liuliu wrote:
           | Python support in Bazel now looks more promising with
           | `rules_python`: https://github.com/bazelbuild/rules_python
           | 
           | `rules_go` to my understanding is great too.
           | 
           | Over the years, Bazel has become less opinionated than before,
           | mostly because adoption across different orgs has forced it to be.
        
         | throwamon wrote:
         | > "just run `bazel run //amazing/server`" regardless of
         | language and know that everything will magically work
         | 
         | > The major downside I've experienced is that any time you're
         | trying to do something in a less-than-Bazel way (for example
         | relying on binaries built outside of Bazel) things can get
         | really hairy
         | 
         | Sounds like Nix
        
           | ris wrote:
           | As a nixpkgs maintainer I can't tell you how painful Bazel is
           | for packagers. The difficulty of substituting dependencies.
           | Unstable hashes of fetched dependency sources. The
           | infeasibility of building bazel itself fully from source...
        
         | jfim wrote:
         | > Recently I put together a simple build rule that strips
         | external dependencies, archives the rest, and uploads it to a
         | cache. Then we can easily run that archive against a pre-built
         | container (which contains the external dependencies) in our
         | cluster, enabling a very fast ML iteration loop on beefy
         | cluster machines.
         | 
         | Can you tell us more about this? We're using bazel at $day_job
         | and it's about as pleasant as gouging my eyes out with a rusty
         | spoon. Building docker images using bazel takes forever.
        
           | aschleck wrote:
           | Sure! So the overall goal is to prebuild a runfiles tree
           | containing all the external dependencies into a Docker
           | container, and then when the user wants to run something we
           | build a runfiles tree with all the non-external code. Then in
           | the cluster we want to extract the user's runfiles tree on
           | top of the prebuilt runfiles tree, and then execute the
           | user's code.
           | 
           | * I have an archive Starlark function that I use for both
           | this and containers. It sets up a folder structure similar to
           | <target>.runfiles with everything symlinked to the actual
           | location, then it tars the whole thing following symlinks. It
           | has a parameter to include files that start with external/ or
           | not.
           | 
           | * This archive function is used by my Bazel container rules,
           | so I simply made a runner.py target that depends on every
           | possible external Python dependency and made a Docker image
           | with it.
           | 
           | * I then made a Bazel rule that uses the archive function to
           | archive a given executable without external/ and uploads it
           | to a shared location.
           | 
           | * At runtime runner.py is given the location as an argument,
           | downloads it, extracts it, and then execv's it.
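
The archive-and-overlay flow described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not aschleck's Starlark rule (which was not shared): the names `build_archive`, `overlay`, and `run` are hypothetical, the runfiles tree is modeled as a simple mapping from archive path to real file, and the download step is left out by assuming the archive is already on local disk.

```python
import os
import tarfile


def build_archive(runfiles: dict[str, str], out_path: str, include_external: bool) -> list[str]:
    """Tar a runfiles-style mapping {path inside archive: real file on disk}.

    Entries whose archive path starts with "external/" are skipped unless
    include_external is True. Returns the archive paths that were written.
    """
    written = []
    # dereference=True stores the files that symlinks point to, not the links.
    with tarfile.open(out_path, "w:gz", dereference=True) as tar:
        for arc_path, real_path in sorted(runfiles.items()):
            if arc_path.startswith("external/") and not include_external:
                continue
            tar.add(real_path, arcname=arc_path)
            written.append(arc_path)
    return written


def overlay(archive_path: str, runfiles_root: str) -> None:
    """Extract the user's archive on top of an existing (prebuilt) tree."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(runfiles_root)


def run(archive_path: str, runfiles_root: str, entry_point: str) -> None:
    """Overlay the archive, then replace the current process with the entry point."""
    overlay(archive_path, runfiles_root)
    target = os.path.join(runfiles_root, entry_point)
    os.execv(target, [target])
```

In this sketch the container image would be built from `build_archive(..., include_external=True)` and the per-run upload from `include_external=False`, so only first-party code moves on each iteration.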
        
         | lbhdc wrote:
         | My impression of bazel has been the same. It makes working with
         | a polyglot tech stack a breeze.
         | 
         | Do you have an example of the rule that strips and archives
         | your dependencies? Or an example of being able to invoke the
         | python interpreter in a bazel context? I haven't seen anything
         | like that, and want to try it out in my own project.
        
           | aschleck wrote:
           | Unfortunately I can't share code at this time but I just
           | described the archiving in more detail in a sibling comment.
           | 
           | The Python interpreter is also quite simple. There's several
           | ways you can do it, but the simplest thing to imagine is if
           | you make a launcher.py script that just invokes Bash as a
           | foreground subprocess. The pstree is kind of funky (bash ->
           | python -> bash), but inside that shell PYTHONPATH will be set
           | approximately correctly. There are reasons to prefer an
           | approach that works with sourcing (eg so you can set PS1),
           | but it's a little harder to describe. You can do some
           | acrobatics to make runfiles (mostly) work, and my
           | recollection is that PATH mostly works but that may require
           | some more work. We do the same thing for Jupyterlab and
           | IPython.
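
A minimal sketch of the launcher idea described above, assuming the script is started by Bazel (for example as a `py_binary`) so that `sys.path` already contains the import roots of its dependencies. The helper name `shell_env` is hypothetical, and the runfiles and `PATH` handling mentioned in the comment is not covered here.

```python
import os
import subprocess
import sys


def shell_env(base_env: dict[str, str], import_paths: list[str]) -> dict[str, str]:
    """Return a copy of base_env whose PYTHONPATH lists import_paths first."""
    env = dict(base_env)
    paths = [p for p in import_paths if p]  # drop empty entries
    existing = env.get("PYTHONPATH")
    if existing:
        paths.append(existing)
    env["PYTHONPATH"] = os.pathsep.join(paths)
    return env


def main() -> int:
    # Run Bash in the foreground; plain `python script.py` inside that shell
    # then sees roughly the same import roots as this launcher did.
    env = shell_env(dict(os.environ), sys.path)
    return subprocess.call(["bash"], env=env)
```

Calling `sys.exit(main())` at the bottom of the launcher script would drop the user into the shell and propagate its exit code.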
        
       | pharmakom wrote:
       | Bazel ergonomics / DX sucks but it _does_ deliver on correct and
       | fast.
        
         | tantalor wrote:
         | Compared to what?
         | 
         | Do you mean writing build rules, or the behind-the-scenes
         | stuff?
        
           | asadawadia wrote:
           | everything
        
             | marcyb5st wrote:
             | I agree on adding non-native build rules, but once you get
             | over that it is pretty epic.
             | 
             | Also, I don't think it's more complex than SBT/Gradle (what
             | I was using before, but admittedly was a long time ago so I
             | don't know now).
        
               | sp33der89 wrote:
               | SBT is a lot less scary these days IMHO.
        
               | vips7L wrote:
               | SBT is the bane of my existence.
        
               | rtpg wrote:
               | For a lot of people the alternatives are things like
               | fabric/invoke.
               | 
               | If you don't care about correctness there are loads of
               | options that work well, and don't have problems
               | supporting esoteric use cases such as ..."let a command
               | generate a folder of data"
        
       | xvilka wrote:
       | Bringing Java into a Haskell project just for building doesn't
       | sound perfect.
        
       | spockz wrote:
       | Cabal and GHC have had incremental compilation since (almost?)
       | the beginning/forever. How is this special?
        
         | the_duke wrote:
         | Reading the article would answer your question...
        
         | ParetoOptimal wrote:
         | A little better when TH is involved but incremental isn't as
         | good as ghc/cabal yet:
         | 
         | > When doing incremental builds, though, both stack and cabal-
         | install can use the recompilation checker, and for changes deep
         | in the dependency graph with little propagation, haskell_module
         | is not able to beat them yet. For changes near the build
         | targets, or which force more recompilation, haskell_module
         | would be more competitive.
        
       | mark_l_watson wrote:
       | This looks really cool. If I remember correctly, when I was a
       | consultant at Google for a while about ten years ago, they used
       | Bazel.
       | 
       | I have my own hack for long Haskell build times: I live in the
       | mountains so I start a long running build process and then go on
       | a walk.
        
         | Kototama wrote:
         | > I have my own hack for long Haskell build times: I live in
         | the mountains so I start a long running build process and then
         | go on a walk.
         | 
         | This is a great unexpected side-effect :-).
        
         | Arcuru wrote:
         | > If I remember correctly, when I was a consultant at Google
         | for a while about ten years ago, they used Bazel.
         | 
         | Bazel is the open source version of Google's internal build
         | tool, so you're correct.
         | 
         | > I live in the mountains so I start a long running build
         | process and then go on a walk.
         | 
         | That's an awesome way to cope with long compiles! Maybe I
         | should buy a slower computer...
        
       | corrral wrote:
       | Anyone got any stories about migrating big, crufty, multiplatform
       | cmake C++ projects to Bazel? "It sucked but was worth it" or "it
       | was surprisingly easy" or "LOL don't even try?"
        
         | elteto wrote:
         | It delivers on its premise of always correct, incremental
         | builds but it is extremely opinionated. I don't blame it for
         | that, maybe having truly hermetic and reproducible builds
         | requires that level of structure. It is almost magical changing
         | a linker flag in some bazel config and see it _only_ relink
         | affected targets.
         | 
         | If you need to do cross-compilation then I feel like it is
         | extremely overengineered with the whole platform/toolchain
         | concepts, and after _years_ the docs are still incredibly
         | lacking on this aspect. I almost prefer the previous approach
         | with the semi-documented protobuf as JSON crosstool file.
         | 
         | If you need the safety guarantees or the reproducibility
         | there's no other build system out there. If you don't then you
         | will be inclined to hate it because you are not extracting
         | value out of it.
        
           | boris wrote:
           | > It is almost magical changing a linker flag in some bazel
           | config and see it _only_ relink affected targets.
           | 
           | I think this should be expected from any modern build system.
           | Now, if you make a whitespace change in your source file and
           | the build system recognizes this and skips recompiling it,
           | that could pass for magic (build2 does this for C/C++
           | sources).
        
             | elteto wrote:
             | If the output .o is identical then I think bazel will also
             | skip recompilation, FWIW.
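
The behaviour discussed in this sub-thread (a flag change re-running only the affected action, and an unchanged output stopping the rebuild from propagating) can be illustrated with a toy content-addressed action cache. This is a sketch of the general idea only, not Bazel's or build2's actual implementation; `ActionCache`, `toy_compile`, and `toy_link` are made-up names, and the "compiler" simply strips whitespace so that a whitespace-only edit yields an identical "object".

```python
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def action_key(command: list[str], input_digests: list[str]) -> str:
    """Key an action by its command line plus the digests of its inputs."""
    h = hashlib.sha256()
    for part in command + input_digests:
        h.update(part.encode())
        h.update(b"\0")
    return h.hexdigest()


class ActionCache:
    def __init__(self) -> None:
        self._outputs: dict[str, bytes] = {}
        self.executed: list[str] = []  # names of actions that actually ran

    def run(self, name, command, inputs, execute) -> bytes:
        key = action_key(command, [digest(i) for i in inputs])
        if key not in self._outputs:
            self.executed.append(name)
            self._outputs[key] = execute(inputs)
        return self._outputs[key]


def toy_compile(inputs: list[bytes]) -> bytes:
    return b"".join(inputs[0].split())  # whitespace does not affect the output


def toy_link(inputs: list[bytes]) -> bytes:
    return b"LINKED:" + b"|".join(inputs)


def build(source: bytes, link_flags: list[str], cache: ActionCache) -> bytes:
    obj = cache.run("compile", ["cc", "-c"], [source], toy_compile)
    return cache.run("link", ["ld"] + link_flags, [obj], toy_link)
```

In this model a whitespace-only edit still re-runs the compile step, because the source digest changed, but the link step is skipped since the object digest is the same. Skipping the compile itself, as described for build2, would need a language-aware comparison of the source.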
        
           | colatkinson wrote:
           | Yeah the cross-compilation thing is definitely a rough spot.
           | I have one project that's able to work around it via
           | _extensive_ hacks with macros, but at some point I'll need
           | to do it "the right way."
           | 
           | Honestly if the docs had a canonical example of e.g. using
           | unix_cc_toolchain_config (example: [0]) + Bootlin to compile
           | for aarch64, it'd probably go a long way to making things
           | understandable. Because say what you will about the old
           | CROSSTOOL approach, at least there was a nice tutorial for
           | it.
           | 
           | [0] https://github.com/grailbio/bazel-toolchain/blob/f14a8a5de8f...
        
             | kldx wrote:
             | Have you come across
             | https://github.com/aspect-build/gcc-toolchain? I use it as a
             | starting point for my toolchains.
        
             | elteto wrote:
             | My normal workflow to bootstrap cross-compilation with
             | bazel is to create a dummy project with some dummy C/C++
             | file and build it. Then go into whatever bazel-X internal
             | folder and extract the autogenerated bzl for the local
             | system's compiler. Then update it with my toolchain and
             | strip it down (I hate the "features" feature) until it is
             | somewhat understandable.
             | 
             | This is a _terrible_ DX.
        
               | klodolph wrote:
               | Yeah, that's the same way I do it.
        
         | lolpython wrote:
         | I migrated a mid-size polyglot project from Makefiles to Bazel
         | and C++ was a large component of the project.
         | 
         | Some obstacles:
         | 
         | 1. Building with QT5 MOC & UI files. There is a great
         | library[0] for it but it has hardcoded paths to the QT binaries
         | and header files assuming a system-wide installation. I had to
         | patch the rule to point to our QT location. Then it worked
         | fine.
         | 
         | 2. There is no rule to build a fully static library[1]. Since
         | we were shipping a static library to clients via our Makefile
         | system, that was somewhat annoying.
         | 
         | 3. We were using system links like
         | `$PROJECT_ROOT/links/GCC/vX.Y.Z/ -> /opt/gcc/...` to point to
         | all the build tools, but these didn't work in Bazel I think
         | because it required absolute paths for any binaries it calls.
         | We ended up putting them in a .bazelrc but we would need a
         | different one for Windows and Linux.
         | 
         | 4. Not good integration with IDEs
         | 
         | 5. (edit) The Bazel toolchain system is confusing and I
         | couldn't understand it after reading all its docs
         | 
         | Ultimately we did not keep using Bazel because we were building
         | Python binaries and py_binary was too slow on Windows. And we
         | didn't have enough time to write a PyInstaller rule.
         | 
         | [0]: https://github.com/justbuchanan/bazel_rules_qt
         | 
         | [1]: https://github.com/bazelbuild/bazel/issues/1920
        
           | klodolph wrote:
           | Regarding #3, my approach to solve problems like this is to
           | make a custom repository rule which creates the desired
           | symlinks. The repository rule can invoke external programs or
           | examine the environment as necessary to figure out how this
           | should be created.
           | 
           | Basically, you create a repository rule that symlinks your
           | $PROJECT_ROOT/links/GCC/vX.Y.Z/ to $repo/... somewhere, and
           | then generates a BUILD file for the repository.
           | 
           | Writing your own repository rule is not especially difficult
           | and they do have a lot of power not available to ordinary
           | rules. This is the API that you can use from within
           | repository rules--you can see that it lets you run arbitrary
           | programs, create files and symlinks, download files, etc.
           | 
           | https://bazel.build/rules/lib/repository_ctx
        
           | rfoo wrote:
           | #2: To be fair it is reasonably easy to make a
           | cc_static_library_binary-ish rule which merges all transitive
           | .a-s (just generate an ar script and call archiver). But I
           | have to admit that I spent non-trivial time on maintaining
           | our "CROSSTOOL in skylark" (forgot the term) for 20+ target
           | platforms before and it helped a lot on understanding the
           | (still incomplete) C++ sandwich.
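
The "generate an ar script and call archiver" approach mentioned above could look roughly like this outside of Bazel. It is a sketch that assumes GNU `ar`, whose `-M` mode reads an MRI-style script from standard input; the function names are hypothetical, paths containing spaces are not handled, and inside a real rule the archive list would come from the target's transitive linker inputs.

```python
import subprocess


def mri_script(output: str, archives: list[str]) -> str:
    """Build an MRI script that merges the given .a files into one archive."""
    lines = [f"create {output}"]
    seen = set()
    for archive in archives:
        if archive not in seen:  # a transitive dep may be reachable twice
            seen.add(archive)
            lines.append(f"addlib {archive}")
    lines += ["save", "end"]
    return "\n".join(lines) + "\n"


def merge_archives(output: str, archives: list[str], ar: str = "ar") -> None:
    """Feed the script to `ar -M`, which reads it from stdin."""
    subprocess.run([ar, "-M"], input=mri_script(output, archives), text=True, check=True)
```

For example, `merge_archives("libeverything.a", ["liba.a", "libb.a"])` would produce a single static library containing the members of both inputs.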
        
         | rufius wrote:
         | I've not got specifically what you're describing but I've
         | encountered Bazel twice in my career - both in very large
         | Java/Scala/Go code bases (distinct repo for Go but a lot of
         | code).
         | 
         | Bazel is extremely underwhelming. I've worked with crusty
         | ancient systems that built huge systems and Bazel is just the
         | most clown shoes build tool in comparison.
         | 
         | Something didn't work? Try typing the same command repeatedly
         | and hope that this time it sticks. Multiple commands to achieve
         | a seemingly straightforward task? Why isn't there a single one
         | that will get us there.
         | 
         | FWIW - I suspect like most tools that Google open sources, the
         | tool makes far more sense in the context of Google's systems
         | and architecture. If you're adhering to that, it's probably
         | coherent.
         | 
         | I'd never choose it willingly though.
        
           | rfoo wrote:
           | > Multiple commands to achieve a seemingly straightforward
           | task? Why isn't there a single one that will get us there.
           | 
             | Could you elaborate on this? I seldom need commands besides
             | `bazel build`, `bazel test` and `bazel run` and am curious
             | about your story.
        
             | rufius wrote:
             | Full admission - I don't own or have responsibility for how
             | the build system I work with is set up.
             | 
             | I'm on parental leave so details are fuzzy but roughly:
             | 
             | - run Bazel build
             | 
             | - expect dependencies to be built
             | 
             | - they weren't built
             | 
             | - run Bazel build again
             | 
             | - something else decides to be built
             | 
             | - repeat ad nauseam
             | 
             | This is within the same project directory, etc. It's
             | entirely possible that the project is set up in some
             | pathological way but I've encountered this enough times in
             | two companies that it's stuck in my head.
        
               | rfoo wrote:
               | Whoa, that's tough. Sounds like you have indeed hit a
               | case where
               | 
               | > the tool makes far more sense in the context of
               | Google's systems and architecture
               | 
               | Bazel should not behave like what you described (at least
               | in my experience) for in-repo sources and build rules.
               | Except that the world does not work in this way, so they
               | added a duct tape called "using workspace rules to fetch
               | and potentially build external dependencies", which is as
               | fragile as a ./build.sh pulling in all your dependencies.
               | 
               | And Google? They did vendor everything they use at
               | //third_party in their monorepo, so (╯`□´)╯︵┻━┻
        
               | mattnewton wrote:
               | This sounds highly broken somehow. Bazel's raison d'etre
               | is rebuilding only the necessary dependencies. Maybe this
               | is a case of someone trying to force-fit a process that
               | doesn't look like bazel's opinions with some custom rules
               | that break it at the core?
               | 
               | Pretty much all my bazel/blaze experience has been at
               | google or in other side projects built from the start
               | with bazel, but I have never encountered anything like
               | that. The only complaints I've had is that building
               | python slows the scripting loop, and that for side
               | projects without google's build infrastructure the
               | rebuild-the-world-at-head-from-source method can get very
               | expensive. But this sounds really unfortunate and nothing
               | like the tool I've used :/
        
               | Difwif wrote:
               | That sounds like someone hacked up an existing project
               | into Bazel without resolving the opinionated differences.
               | I've worked on Bazel projects at multiple companies and
               | I've seen things go off the rails like that a couple
               | times before someone that actually understands the tool
               | rewrites the problematic build process. It's usually some
               | nasty stuff where someone tried to work outside of Bazel
               | because they didn't understand it and created a bunch of
               | impedance mismatches. New people doing this instead of
               | the "right way" is the most valid criticism of Bazel IMO.
               | 
               | Sounds like you're working in the worst of both worlds
               | right now.
        
         | eklitzke wrote:
         | The canonical build system for LLVM is CMake, but you can also
         | build LLVM with Bazel now. That might give you a good idea of
         | how to do it. The Bazel support code is in
         | https://github.com/llvm/llvm-project/tree/main/utils/bazel
        
         | rfoo wrote:
         | Having gone through this twice, I'd say it is not that
         | difficult, but could take reasonably huge effort. On par with
         | turning your crufty CMakeLists to be so-called "Modern CMake"
         | (whatever that means).
         | 
         | But, why? Bazel is very opinionated on how you lay out your C++
         | source code in the repository, and it's something which could
         | not be retrofitted easily.
        
         | klodolph wrote:
         | I have some adjacent experience. Not exactly with that, but I
         | have migrated large C++ build systems, migrated systems to
         | Bazel, etc. I've also written a bunch of Bazel build scripts
         | for various open-source C++ libraries so I can better integrate
         | them into projects that use Bazel.
         | 
         | Bazel is opinionated. The tradeoff here is that if you can make
         | your project match Bazel's opinions, you get a very good
         | experience--but you can have a bad experience if you disagree
         | with Bazel. If you have Bazel experience, you can look at a
         | project and get a quick sense of the distance between how the
         | project is built and how Bazel "wants" to build the project.
         | 
         | The payoff is that once you get your Bazel build system,
         | everything seems a lot more trustworthy. No more "make clean".
         | When you run a Bazel command, it just gives you the correct
         | output, very fast, without worrying about what state your build
         | tree is in. You can make any change to your build scripts and
         | just "bazel build" and get the correct result immediately, as
         | long as you aren't trying to bypass how Bazel works. I never
         | have to do anything like run "make" twice. This is something
         | I've never gotten with systems like Make or CMake, which put
         | more of the onus on individual developers to get things
         | correct.
         | 
         | So I can kind of shut off my brain when using Bazel.
         | 
         | Depending on the particulars of your project, the most
         | straightforward migration path to Bazel will not be obvious.
         | You may need to make certain choices about how much you modify
         | your project to fit Bazel's expectations, versus how much you
         | adapt Bazel to fit your existing project. One example is
         | include paths... do you modify all of your '#include'
         | directives to match how Bazel expects you to write them? Or do
         | you adapt Bazel to do things your way?
         | 
         | The difficulty and payoff is highly variable. My experience has
         | generally been positive, but I also have a lot of familiarity
         | with Bazel. It's easy enough to find an example of a project
         | where I'd just never bother migrating to Bazel, or to find
         | examples of projects (even large ones) where migrating is super
         | easy.
        
       ___________________________________________________________________
       (page generated 2022-06-23 23:02 UTC)