[HN Gopher] At the Mountains of Madness
       ___________________________________________________________________
        
       At the Mountains of Madness
        
       Author : wwilson
       Score  : 97 points
       Date   : 2024-07-10 13:05 UTC (9 hours ago)
        
 (HTM) web link (antithesis.com)
 (TXT) w3m dump (antithesis.com)
        
       | wwilson wrote:
       | Post author here. Feel free to ask me any questions about the
       | piece of software that I most regret having had to write.
        
         | Klaster_1 wrote:
         | No questions about Madness, but I really enjoyed the article
         | tone and playfulness. Thank you.
        
         | jcgrillo wrote:
         | Also not a question, just want to say that "crt glow" (or maybe
         | "Cerenkov glow"?) effect upon hovering over a link is awesome.
        
         | limaoscarjuliet wrote:
         | Been there, done that. In my case, I symlinked myself out of
         | this mess rather than modify ELF.
        
         | dasyatidprime wrote:
         | I would like to half-seriously recommend that you overwrite a
         | _different_ character than the first when mangling the
         | environment variable name. Specifically one beyond the third,
         | so as to stay within the LD_ namespace (not yours exactly, but
         | at least easier to keep track of and more justifiably
         | excludable from random applications) and deny someone ten years
         | from now the exciting journey of figuring out why their
         | MD_PRELOAD environment variable is overwritten with garbage on
         | some systems. How do you feel about LD_PRELOAF?
         | 
         | Also it's probably better to leave LD_PRELOAD properly unset
         | rather than just null if it was unset before; in particular I
         | wonder if empty-but-set might still trip some software's
         | "someone is playing tricks" alarms.
         | 
         | There are probably other ways this is less than robust...
         | 
         | (hi, I kind of have a Thing for GNU and Linux innards
         | sometimes)
        
           | wwilson wrote:
           | Good suggestion on leaving LD_PRELOAD unset if it was
           | previously unset. We will fix that.
           | 
           | I'm torn on whether MD_PRELOAD or LD_PRELOAF is more
           | obnoxious to other programs.
           | 
           | Fun fact: A previous version of this program used an even
           | more inscrutable `env[0][0]+=1`, which is great as a sort of
           | multilingual C/English pun, but terrible in the way that all
           | "clever" code is terrible.
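           | 
           | A minimal Python sketch of the two suggestions in this
           | subthread: mangle a character past `LD_`, and leave
           | LD_PRELOAD unset if it started out unset. It works on a plain
           | dict rather than a real `envp`, the helper names are made up
           | for illustration, and it is not how Madness itself does it:

```python
MANGLED = "LD_PRELOAF"  # hypothetical stash name, taken from the suggestion above


def hide_preload(env):
    """Return a copy of env with LD_PRELOAD moved to MANGLED, if it was set."""
    out = dict(env)
    if "LD_PRELOAD" in out:
        out[MANGLED] = out.pop("LD_PRELOAD")
    return out


def restore_preload(env):
    """Undo hide_preload: put the stashed value back, or leave the name unset."""
    out = dict(env)
    if MANGLED in out:
        out["LD_PRELOAD"] = out.pop(MANGLED)
    else:
        # It was unset to begin with: drop anything set in the meantime
        # instead of leaving an empty-but-set variable behind.
        out.pop("LD_PRELOAD", None)
    return out
```

           | Round-tripping an environment that never had LD_PRELOAD
           | leaves the name absent rather than set to "".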
        
         | foobiekr wrote:
         | It's a shame that DLL hell was never resolved in the obvious
         | way: deduplication of identical libraries through cryptographic
         | hashes. Containers basically threw away any hope of sharing the
         | bytes on disk - and more importantly _in ram_. Disk bytes are
         | cheap, ram bytes are not, let alone TLB space, branch predictor
         | context, and so on.
         | 
         | There was a middle ground possible at one point where
         | containers actually were packaged with all of their
         | dependencies, and a container installer would fragment this
         | assembly into cryptographically verifiable shared
         | dependencies, but we lost that because it was hard.
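         | 
         | A rough Python sketch of what that fragmenting step could
         | look like: hash every library's bytes and group the identical
         | ones so they can be stored (and mapped) once. The function
         | name and the in-memory `blobs` input are assumptions for
         | illustration; a real installer would read files from disk:

```python
import hashlib
from collections import defaultdict


def group_identical(blobs):
    """Map SHA-256 hex digest -> sorted names whose contents are byte-identical.

    `blobs` maps a library name (or path) to its raw bytes.
    """
    groups = defaultdict(list)
    for name, data in blobs.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    # Only digests shared by two or more names are deduplication candidates.
    return {digest: sorted(names) for digest, names in groups.items() if len(names) > 1}
```

         | Each group returned is a set of copies that could collapse
         | into one content-addressed file.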
        
           | xenophonf wrote:
           | > _deduplication of identical libraries through cryptographic
           | hashes_
           | 
           | Isn't that how the .NET CLR's global assembly cache works?
        
           | rbanffy wrote:
           | > deduplication of identical libraries through cryptographic
           | hashes
           | 
           | Or, maybe, adding a version string to the file name, so, if
           | you were compiled with data structures for libFoo1 (which you
           | found on libFoo.h provided by libFoo1-devel) you'll link to
           | libFoo1 and not libFoo or libFoo2.
        
         | foobiekr wrote:
         | As an aside, a lot of people don't know about ldd, and
         | introducing it to them is very cool, but it should almost
         | always come with a warning. Maybe add a note that people
         | should be careful with it: ldd _may_ execute arbitrary code.
         | This is in the ldd man page, but most people never read
         | documentation. It is unsafe to use on any binary you don't
         | otherwise believe to be safe.
        
           | wwilson wrote:
           | Great point! I'll update the post to mention that.
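           | 
           | For untrusted binaries, one alternative is to read the ELF
           | headers directly instead of asking the loader. A minimal
           | Python sketch, assuming a 64-bit little-endian ELF;
           | `read_interp` is a made-up name, and it only reports the
           | interpreter path (PT_INTERP), not the full library list
           | that ldd prints:

```python
import struct

PT_INTERP = 3  # program header type that holds the interpreter path


def read_interp(data):
    """Return the PT_INTERP path from an ELF64 little-endian image, or None."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles ELF64 little-endian")
    # ELF64 header: e_phoff at 0x20, e_phentsize at 0x36, e_phnum at 0x38.
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        # Elf64_Phdr starts with p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz.
        p_type, _flags, p_offset, _vaddr, _paddr, p_filesz = struct.unpack_from(
            "<IIQQQQ", data, off
        )
        if p_type == PT_INTERP:
            raw = data[p_offset : p_offset + p_filesz]
            return raw.rstrip(b"\0").decode()
    return None  # statically linked binaries have no PT_INTERP
```

           | Usage would be something like
           | `read_interp(open("/bin/ls", "rb").read())`. Nothing in the
           | file is executed. `readelf -l` and
           | `patchelf --print-interpreter` report the same field.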
        
         | adamgordonbell wrote:
         | Love it. I came to this same insight about nix and containers
         | being two approaches to a dynamic linking work around, but via
         | a different path, of building my own little container runtime.
         | 
         | Feels like we are building things whose original purpose is now
         | holding us back, but path dependence leaves us stuck wrapping
         | abstractions in other abstractions.
        
         | trod123 wrote:
         | Hi Will, I'm curious what your thoughts are about the Nix
         | uniqueness problem, and the characterization of failures, or
         | lack thereof under undefined behavior's failure domains.
         | Exception handling generally requires a defined and
         | deterministic state which can't be guaranteed given design
         | choices to resolve DLL hell under Nix (i.e. it's a stochastic
         | process).
         | 
         | I mention this since it is a similar form of the problem you
         | mention in writing this piece of software, that can lead to
         | madness.
         | 
         | Also, operationally, the troubleshooting problem space of
         | keeping things running segments nicely into deterministic and
         | non-deterministic regions. The latter ends up costing orders
         | of magnitude more time to resolve, since you can't perturb
         | individual subsystems to test for correct function. Without
         | determinism and time invariance (as system properties),
         | piecemeal testing runs into contradictions in stochastic
         | processes.
         | 
         | Hashing by rigorous definition is non-unique (i.e. it's like
         | navigating a circle), and there is no proof of uniformity. So
         | problems in this space would be in the latter region.
         | 
         | While there are heuristics from cryptography that suggest
         | using fractional cube roots to initialize the fields brings
         | more uniformity to the examined space than not, there is no
         | proof of such.
         | 
         | When building resilient systems, engineers often try to remove
         | any brittle features that promote failures.
         | 
         | Interestingly, as a side note, ldd output injects
         | non-determinism into the pipe by flattening empty columns
         | non-deterministically. If you run ldd on an ssh client, you'll
         | see that the null state for each input-to-output mapping has
         | more than a single meaning/edge on the traversal, depending on
         | object type. This ends up violating the 1:1 unique
         | input-output state graph/map required for determinism as a
         | property, though it won't be evident until you use the output
         | as an input that maps problematically later in automation
         | (i.e. grepping the output with a regex will silently fail,
         | providing what looks like legitimate output if one doesn't
         | look too closely).
         | 
         | PaX ended up forking the project with the fix, because the
         | maintainers refused to admit the problem (reported 2016, forked
         | in 2018). The bug remains in all current versions of ldd (to my
         | knowledge).
         | 
         | While based in theory, these types of problems crop up
         | everywhere in computation and few seem to recognize them.
         | 
         | Working with system properties, and knowing whether they are
         | preserved, informs whether the system can be safely and
         | consistently used in later automated processes, as well as
         | maintained at cheap cost.
         | 
         | Businesses generally need a supportable and defensible
         | infrastructure.
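         | 
         | To make the ldd point above concrete: the load address is
         | always the last field, but the `=> path` column is present
         | for some objects and absent for others, so a single
         | positional regex can quietly mis-assign fields. A small
         | Python sketch that handles the line shapes I know glibc's ldd
         | to print; the function name is made up, and other libcs may
         | format things differently:

```python
import re

# Trailing "(0x...)" load address, when present.
_ADDR = re.compile(r"\s*\((0x[0-9a-fA-F]+)\)\s*$")


def parse_ldd_line(line):
    """Return (name, path, address); path and address are None when absent."""
    line = line.strip()
    addr = None
    match = _ADDR.search(line)
    if match:
        addr = match.group(1)
        line = line[: match.start()]
    if "=>" in line:
        name, _, target = (part.strip() for part in line.partition("=>"))
        # "libfoo.so => not found" has an arrow but no usable path.
        path = None if target in ("", "not found") else target
        return name, path, addr
    # No arrow: a virtual object such as the vDSO, or an absolute path
    # such as the dynamic loader itself.
    name = line.strip()
    return name, (name if name.startswith("/") else None), addr
```

         | Informational lines such as "statically linked" are not
         | handled here and would need their own check.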
        
         | jadbox wrote:
         | I'd like to express that while this article is WAY outside my
         | wheelhouse, I liked the writing style, and the AI
         | illustrations felt like they were emotionally additive to the
         | section rather than just a distraction (to me). Also, my head
         | still hurts trying to understand this cursed thing:
         | https://github.com/antithesishq/madness/blob/main/pkgs/madne...
        
         | swayvil wrote:
         | Dig the purple anteater pix.
        
         | colinsane wrote:
         | > This minimal meta-loader will totally work if you invoke it
         | directly like `$ meta_loader.sh foo`, and it will totally not
         | work if you hardcode its path (or a symlink to it) in the ELF
         | headers of a binary.
         | 
         | why not have `foo` be a shell script which invokes the meta
         | loader on the "real" foo? like:
         | 
         | ```
         | #!/bin/sh
         | # file: /bin/foo
         | 
         | # invoke the real "foo" (renamed e.g. ".foo-wrapped" or
         | # "/libexec/foo" or anything else easy for the loader to locate
         | # but unlikely to be invoked accidentally)
         | exec meta_loader.sh .foo-wrapped "$@"
         | ```
         | 
         | it's a common enough idiom that nixpkgs provides the
         | `wrapProgram` function to generate these kinds of wrapper
         | scripts during your build: even with an option to build a
         | statically-linked binary wrapper instead of a shell-script
         | wrapper (`makeBinaryWrapper`).
        
       | NoraCodes wrote:
       | What is that abominable diffusion output doing at the top of an
       | otherwise interesting article?
        
         | wwilson wrote:
         | Our artist is on vacation, and some fool gave the CEO access to
         | Midjourney.
        
           | imagineerschool wrote:
           | These synthetic artifacts will come to be regarded as
           | psychological asbestos.
           | 
           | Please consider labelling it, and giving provenance data. And
           | protecting public sanity by putting it behind a clickwall.
        
           | bloopernova wrote:
           | In my opinion, generative AI pictures make a blog post feel
           | cheaper and less truthful. Just my view, I fully accept that
           | I'm probably in a minority of opinion.
        
             | pizzalife wrote:
             | I agree with you since it adds absolutely no value to the
             | article. Technical articles don't need unrelated pictures
             | that add huge page breaks.
        
           | didsomeonesay wrote:
           | FWIW, I enjoyed how the pictures were adding a little theme,
           | were consistent and broke up the reading nicely without being
           | too "noisy" (compared to e.g. technical articles full of meme
           | pictures).
        
         | knowaveragejoe wrote:
         | What is this trifling and snobby driveby commentary doing in
         | the comments of an otherwise interesting article?
        
           | NoraCodes wrote:
           | I think it's useful to impose a social cost for using
           | plagiarism machines to make slop.
        
       | DoreenMichele wrote:
       | _tl;dr: we are open-sourcing an internal tool that solves a
       | problem that we think many NixOS shops are likely to run into.
       | The rest of this post is just the story of how we came to write
       | this tool, which is totally a skippable story._
       | 
       | The tool happens to be called Madness, thus the Lovecraftian
       | reference in this piece.
       | 
       |  _Madness enables you to easily run the same binary on NixOS and
       | non-NixOS systems_
       | 
       | https://github.com/antithesishq/madness
        
       | finnh wrote:
       | Every time I look at NixOS, I think that it perfectly solves a
       | problem that I only have once every 5 years, when buying a new
       | computer. I think I even looked into it once to automate that
       | exact process, but that idea fell apart at the first line of Nix
       | syntax. I'll stick with OSX and `brew bundle` I guess...
       | 
       | But then I read a piece like this and remember that some people
       | do have to plumb the depths of C/C++ linkers, and I'm glad I'm
       | not one of them.
       | 
       | Great post! FWIW I always want to know the prompt text when
       | seeing an AI-generated image, I wish there were a convention
       | around that.
        
         | shrx wrote:
         | Some generators like Automatic1111 embed the prompt in the
         | image metadata.
        
       | AdamH12113 wrote:
       | I'm still confused why static linking isn't a more common
       | solution to versioning issues. Software developers normally have
       | no problem using an order of magnitude more resources to solve
       | organizational problems. Is there any technical advantage to
       | dynamic linking other than smaller binaries and maybe slightly
       | faster load times from disk?
        
         | wwilson wrote:
         | Biggest advantages I know of for dynamic linking:
         | 
         | * You can use the LD_PRELOAD trick to override behavior at
         | runtime.
         | 
         | * You can run with entirely different implementations of the
         | dynamically linked library in different places.
         | 
         | * Software can pick up interface-compatible upgrades to its
         | dependencies without being re-compiled and distributed again.
         | 
         | We use all three of these tricks in our SDKs, FWIW. But it is
         | still a giant pain in the ass.
        
         | klodolph wrote:
         | IIRC the underlying implementation may be different on other
         | systems. I think in particular, DNS resolution.
         | 
         | Linux is the only system where static linking all the way
         | really makes any sense. For most systems, you don't get a
         | stable syscall ABI. Instead, you get a stable ABI to the
         | library which does syscalls for you... Windows has
         | kernel32.dll, macOS has libSystem.
         | 
         | Note that on Linux, the vDSO is dynamically linked.
         | 
         | Compilation speed is a big plus. For large projects, linking
         | time can easily dominate the time needed for incremental
         | rebuilds.
         | 
         | Due to tooling issues, PIE is a lot easier with dynamic
         | linking, and this gives you better ASLR. These issues are
         | solvable, but it's a lot easier to get PIE if you use dynamic
         | linking. If you want static PIE, you need to compile all your
         | static libs as PIE--doable, but you don't get it out of the
         | box.
        
         | anyfoo wrote:
         | Static linking essentially freezes not only the ABI to the
         | kernel, but many implementation details of the linked libraries
         | as well. Including, for example, how library client code talks
         | to daemons, or formats of files directly read in by the
         | libraries. The timezone configuration would be one instance, or
         | things related to NSS.
         | 
         | It's really not viable in a lot of cases, unless you like
         | rebuilding (or at least relinking) with every system software
         | update.
         | 
         | And then there's of course the memory savings. macOS and iOS
         | for example have giant "shared caches" which are mapped into
         | all processes and comprise all the system libraries. (Other
         | OSs often do this on the individual shared library level.) With
         | static linking, you'd instead have many copies of lots of
         | potentially-but-not-necessarily identical library code pages in
         | DRAM.
        
           | nyarlathotep_ wrote:
           | Seems it's rare to hear a defense of dynamic linking (aside
           | from vague allusions to "resource use"), especially in light
           | of "successor languages" seemingly moving away from this
           | approach. Thanks for this.
        
         | partdavid wrote:
         | In addition to the problems others mention, dynamic object
         | dependencies are only one slice of a dependency pie; if you can
         | config-manage your way out of all of the others then maybe
         | dynamic libs aren't really much of a problem? I think this is
         | why container images became so popular, because they close over
         | a lot more dependencies than just dynamic libs.
         | 
         | And the scope of the solution isn't very wide. In the Go
         | community it's common to distribute statically-linked binaries
         | because it solves so many problems--but it just kind of moves
         | them to installation or configuration time because you have to
         | pick a platform and platform version and so forth to find the
         | binary you need, if you want your tool to work on more than one
         | of them.
        
       | klodolph wrote:
       | > No such file or directory
       | 
       | Anyone who's run into this problem remembers it! (This isn't a
       | Nix problem--this is just the baffling error you get because
       | a.out exists, but one of the libraries it needs does not, and the
       | error message doesn't distinguish that case.)
       | 
       | Anyway, Nix.
       | 
       | Nix has the Nix way of building things. Nix doesn't give you
       | standard tools. It gives you wrappers around the standard tools
       | that force you to do things a certain way. Part of that is
       | futzing around with RPATH--because Nix stores everything in an
       | unusual location. The user experience around this is awful, if
       | you ever run into a case where Nix's tooling doesn't
       | automatically do the right thing for you. It's not just RPATH,
       | but also other paths.
       | 
       | What's the solution?
       | 
       | Honestly--I think it would make sense for Nix to have a "cross
       | compilation" mode where you tell it to cross-compile for other
       | Linuxes. You know, something like pkgsCross.x86_64-generic-linux.
       | This comes with all the cross-compilation headaches, but you know
       | what? You _are_ cross-compiling.
        
         | throwway120385 wrote:
         | Yeah I wish the ld-linux.so interpreter would actually indicate
         | when a file it's trying to link wasn't found. Something like
         | "unable to locate shared object BLAH" would go a long way. It's
         | like a rite of passage the first time you debug something like
         | that.
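         | 
         | A sketch of the friendlier message, in Python. As far as I
         | understand it, the bare "No such file or directory" comes
         | from the kernel when the ELF interpreter named in the binary
         | is missing, so the loader never gets to run and cannot
         | complain. `explain_enoent` is a hypothetical helper, and the
         | caller is assumed to have already obtained the interpreter
         | path (e.g. from `patchelf --print-interpreter`):

```python
import os


def explain_enoent(binary, interp, exists=os.path.exists):
    """Say which file is actually missing when exec reports ENOENT.

    `interp` is the interpreter path recorded in the binary, or None.
    `exists` is injectable so the logic can be exercised without real files.
    """
    if not exists(binary):
        return f"{binary}: the file itself is missing"
    if interp and not exists(interp):
        return f"{binary}: exists, but its ELF interpreter {interp} is missing"
    return f"{binary}: file and interpreter both exist; look elsewhere"
```

         | The second branch is the confusing case from the article: the
         | program is right there, and the missing path is one the user
         | never typed.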
        
         | nyarlathotep_ wrote:
         | I've only encountered this once and I've not forgotten it.
         | 
         | Downloaded a release of some binary for Linux, but I'd
         | downloaded the FreeBSD built binary. Was lost until I explored
         | in the same fashion as the author.
        
         | lilyball wrote:
         | Nix does have tools for running stuff in an FHS container.
         | Something I have considered but not yet attempted is to use
         | this to wrap the build such that building the binary happens in
         | the FHS container (using the unwrapped versions of the compiler
         | and associated tooling).
        
       | sjburt wrote:
       | It seems like every article about nix goes on and on about DLL
       | hell. I've been using Debian/Ubuntu for 15+ years and never
       | really experienced dependency hell. I guess maybe this is thanks
       | to hard work by Debian maintainers and rarely needing to run a
       | bleeding edge library, but also, why do we need to run bleeding
       | edge versions of everything and then invent an incredibly
       | complicated scheme to keep multiple copies of each library, most
       | of which are completely compatible with each other?
       | 
       | And then when there's a security problem, who goes and checks
       | that every version of every dependency of every application has
       | actually been patched and updated? Why would I want to roll a
       | system back to a (definitely insecure) state of a few months
       | ago?
       | 
       | What problem does Nix solve that SO numbers (properly used)
       | don't?
       | 
       | I have many of the same questions about Snap and even Docker.
        
         | HideousKojima wrote:
         | I run into DLL hell any time I try to install some software
         | that isn't in some sort of package repository or other happy
         | path. Most recent example I can think of was about a year ago
         | when I helped my father-in-law install Klipper on an RPi4 so
         | that he could do input shaping on his 3d prints. All of the
         | guides and documentation seemed to assume that you were using a
         | specific version of Linux on the RPi, and if you weren't (like
         | my FIL) then welcome to dependency hell. Took several hours of
         | pulling my hair out to resolve them all, for a non-developer it
         | would have been impossible.
        
         | klodolph wrote:
         | I'm using Nix for development and generally I agree.
         | 
         | The first catch is that I want to be able to update my system
         | on a regular basis, and keep using exactly the same
         | dependencies in my project after an update. Maybe I'm in the
         | middle of working on a change.
         | 
         | The second catch is that sometimes my development environment
         | is really weird, and the packages I need aren't in Debian. At
         | least, not the versions I want. Nix can handle cross-
         | compilation environments and you can use it for embedded
         | development. You stick your entire development toolchain (arm-
         | none-eabi-gcc, whatever) inside your development environment.
         | 
         | > Why would I want to roll a system back to a (definitely
         | insecure) state of a few months ago?
         | 
         | Periodically, I want to update everything in my development
         | environment to the latest version of everything. Sometimes,
         | something will break. Maybe a new version of GCC reveals
         | previously undiscovered bugs in my code. Maybe a function gets
         | removed from a library (I've seen it happen). In Nix, it's
         | pretty easy to pin my entire development environment to an old
         | version, while I'm still updating the rest of my system. I can
         | also get the same environment on either Linux or macOS with
         | relatively minimal hassle (with the note that I've run into
         | several packages that just don't run on macOS, which required
         | me to make "fixed" versions).
         | 
         | Also keep in mind when I say "Nix", I'm talking about nixpkgs.
         | I'm not using NixOS and I just don't care about NixOS.
         | 
         | Nix also has its pain points. I think of it as being like a
         | coarse-grained Bazel with a ton of packages.
        
           | nyarlathotep_ wrote:
           | My Nix experience is limited, so forgive my ignorance here,
           | but is it possible to create a development environment for an
           | "older" project as well?
           | 
           | Say I need some 3.20 version of CMake and gcc 9/whatever or
           | something--I assume such a thing is possible, but I've not
           | seen a simple way to "pin versions" of things the way you
           | would in say a language's package manager.
        
             | klodolph wrote:
             | My Nix experience is pretty limited, too. Nix is not
             | _great_ at pinning to specific versions.
             | 
             | If your older project was made in Nix, it's no problem. You
             | just check out the old copy of the project and you
             | automatically get the old copy of the dependencies.
             | 
             | If your old project needs some specific major version of
             | GCC, going back to like 4.8, there are specific packages in
             | Nix. You just add "gcc48" to your dependencies and you get
             | GCC 4.8. You still get newer versions of e.g. binutils.
             | 
             | If your old project needs a specific version of CMake, I
             | know two ways to get that, but they're a little ugly.
             | 
             | First method is to import an old <nixpkgs> containing the
             | right version of CMake, and then import that into your
             | environment. You search through Git history of the nixpkgs
             | repository until you find one with the correct version.
             | Yes, this sounds awful. It's not that bad. I'm not sure how
             | to do this with flakes.
             | 
             | You can also copy the CMake derivation into your project
             | and modify it to compile & build the version of CMake you
             | like. This is the approach I would normally use, most of
             | the time.
             | 
             | There may be easier ways to do this. I'm not sure.
        
       | bbor wrote:
       | I love what they're going for, but I couldn't help but react
       | negatively at finding out that I had been hyped up for a post on
       | some small technical topic for an OS I don't know of. Maybe title
       | it "At the Mountains of NixOS Madness"? But then again I'm just a
       | grouch! Well written article regardless, from what I was able to
       | get out of it.
        
       | pizzalife wrote:
       | Calling binaries using ld-linux used to be a popular way to get
       | around noexec on filesystems, since the libraries are usually in
       | a place that is executable.
        
       | dhash wrote:
       | I loved this post, and patchelf is a real gem of a utility.
        
       | georgewsinger wrote:
       | ======= _Technical Summary_ ========
       | 
       | Here's a problem with NixOS:
       | 
       | 1. Suppose we have a `./nixos_binary_program_with_glibc-newer`
       | compiled on a NixOS machine against bleeding edge `glibc-newer`.
       | 
       | 2. `./nixos_binary_program_with_glibc-newer` will have
       | `/nix/store/glibc-newer/ld-linux.so` path hardcoded into its ELF
       | header which will be used when the program launches to find all
       | of the program's shared libraries, and so forth. (And this is a
       | fact that `ldd` will obfuscate!)
       | 
       | 3. When `./nixos_binary_program_with_glibc-newer` is distributed
       | to machines which use `glibc-older` instead of `glibc-newer`, the
       | hardcoded `ld-linux.so` from (2) will fail to be found, leading
       | to a launch error.
       | 
       | 4. (3) will also happen on machines which don't use nix in the
       | first place.
       | 
       | ======= _Will's Solution_ ========
       | 
       | 1. Use `patchelf` to hardcode a standard FHS `ld-linux.so`
       | location into `nixos_binary_program_with_glibc-newer`'s ELF
       | header (using e.g. `/lib64/ld-linux-x86-64.so.2` as the path)
       | 
       | 2. Use a metaloader to launch
       | `nixos_binary_program_with_glibc-newer` with an augmented `RPATH`
       | which has a bunch of different `/nix/store/*glibc-newer*` paths,
       | so that nix machines can find a suitable `ld-linux.so` to launch
       | the program with.
       | 
       | This will make `nixos_binary_program_with_glibc-newer` work on
       | _any_ machine, including both non-nix machines _and_ nix machines
       | (which might be running older versions of glibc by default)!
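       | 
       | The two steps can be sketched in Python under some loud
       | assumptions: `patchelf --set-interpreter` is the real patchelf
       | flag, but `patchelf_command`, `pick_loader`, and the candidate
       | list are invented for illustration and are not how Madness is
       | actually implemented:

```python
import os

FHS_LOADER = "/lib64/ld-linux-x86-64.so.2"  # path from step 1 of the summary


def patchelf_command(binary, interp=FHS_LOADER):
    """Step 1: argv that rewrites the binary's interpreter to an FHS path."""
    return ["patchelf", "--set-interpreter", interp, binary]


def pick_loader(candidates, exists=os.path.exists):
    """Step 2 (hypothetical): return the first candidate loader that exists."""
    for path in candidates:
        if exists(path):
            return path
    raise FileNotFoundError("no usable ld-linux.so among the candidates")
```

       | On a Nix machine the candidates might come from something like
       | `glob.glob("/nix/store/*glibc*/lib/ld-linux-x86-64.so.2")`
       | (again an assumption about the store layout). The chosen
       | loader can then be run directly with the program as its first
       | argument.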
        
       ___________________________________________________________________
       (page generated 2024-07-10 23:01 UTC)