[HN Gopher] At the Mountains of Madness
___________________________________________________________________
At the Mountains of Madness
Author : wwilson
Score : 97 points
Date : 2024-07-10 13:05 UTC (9 hours ago)
(HTM) web link (antithesis.com)
(TXT) w3m dump (antithesis.com)
| wwilson wrote:
| Post author here. Feel free to ask me any questions about the
| piece of software that I most regret having had to write.
| Klaster_1 wrote:
| No questions about Madness, but I really enjoyed the article
| tone and playfulness. Thank you.
| jcgrillo wrote:
| Also not a question, just want to say that "crt glow" (or maybe
| "Cerenkov glow"?) effect upon hovering over a link is awesome.
| limaoscarjuliet wrote:
| Been there, done that. In my case, I symlinked myself out of
| this mess rather than modify ELF.
| dasyatidprime wrote:
| I would like to half-seriously recommend that you overwrite a
| _different_ character than the first when mangling the
| environment variable name. Specifically one beyond the third,
| so as to stay within the LD_ namespace (not yours exactly, but
| at least easier to keep track of and more justifiably
| excludable from random applications) and deny someone ten years
| from now the exciting journey of figuring out why their
| MD_PRELOAD environment variable is overwritten with garbage on
| some systems. How do you feel about LD_PRELOAF?
|
| Also it's probably better to leave LD_PRELOAD properly unset
| rather than just null if it was unset before; in particular I
| wonder if empty-but-set might still trip some software's
| "someone is playing tricks" alarms.
|
| There are probably other ways this is less than robust...
|
| (hi, I kind of have a Thing for GNU and Linux innards
| sometimes)
| wwilson wrote:
| Good suggestion on leaving LD_PRELOAD unset if it was
| previously unset. We will fix that.
|
| I'm torn on whether MD_PRELOAD or LD_PRELOAF is more
| obnoxious to other programs.
|
| Fun fact: A previous version of this program used an even
| more inscrutable `env[0][0]+=1`, which is great as a sort of
| multilingual C/English pun, but terrible in the way that all
| "clever" code is terrible.
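|
| A minimal sketch of the two suggestions above, written in Python
| rather than the C of the real tool and modeling the environment as
| a plain dict. The function names and the LD_PRELOAF spelling are
| illustrative assumptions, not Madness's actual code:

```python
def stash(env, name="LD_PRELOAD", mangled="LD_PRELOAF"):
    """Hide env[name] under a mangled name that stays in the LD_ namespace.

    If the variable was unset, nothing is added, so it stays unset
    (rather than becoming set-but-empty) after a later restore().
    """
    out = dict(env)  # work on a copy; the caller's mapping is untouched
    if name in out:
        out[mangled] = out.pop(name)
    return out


def restore(env, name="LD_PRELOAD", mangled="LD_PRELOAF"):
    """Undo stash(): move the mangled entry back to its real name."""
    out = dict(env)
    if mangled in out:
        out[name] = out.pop(mangled)
    return out
```

| A round trip through stash() and restore() returns the original
| mapping whether or not LD_PRELOAD was set to begin with.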
| foobiekr wrote:
| It's a shame that DLL hell was never resolved in the obvious
| way: deduplication of identical libraries through cryptographic
| hashes. Containers basically threw away any hope of sharing the
| bytes on disk - and more importantly _in ram_. Disk bytes are
| cheap, ram bytes are not, let alone TLB space, branch predictor
| context, and so on.
|
| There was a middle ground possible at one point where
| containers actually were packaged with all of their
| dependencies, but a container installer would fragment this
| assembly into cryptographically verifiable shared dependencies,
| but we lost that because it was hard.
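|
| A rough sketch of the hash-based deduplication idea, assuming
| SHA-256 over file contents and a directory tree to scan. The
| function names are hypothetical. It only finds identical files;
| actually sharing them (for example by hard-linking) is left out:

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_by_digest(root):
    """Map SHA-256 hex digest -> list of regular files with that content."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and symlinks; only real file contents are hashed.
        if path.is_file() and not path.is_symlink():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    return dict(groups)


def duplicates(root):
    """Return the groups that contain more than one byte-identical file."""
    return [paths for paths in group_by_digest(root).values() if len(paths) > 1]
```

| Each returned group is a set of byte-identical files that could in
| principle be stored once.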
| xenophonf wrote:
| > _deduplication of identical libraries through cryptographic
| hashes_
|
| Isn't that how the .NET CLR's global assembly cache works?
| rbanffy wrote:
| > deduplication of identical libraries through cryptographic
| hashes
|
| Or, maybe, adding a version string to the file name, so, if
| you were compiled with data structures for libFoo1 (which you
| found on libFoo.h provided by libFoo1-devel) you'll link to
| libFoo1 and not libFoo or libFoo2.
| foobiekr wrote:
| As an aside, a lot of people don't know about ldd, and
| introducing it to them is very cool, but it should almost
| always come with a warning - maybe add a note that people
| should be careful with ldd - ldd _may_ execute arbitrary code.
| This is in the ldd man page, but most people never read
| documentation. It is unsafe to use on any binary you don't
| otherwise believe to be safe.
| wwilson wrote:
| Great point! I'll update the post to mention that.
| adamgordonbell wrote:
| Love it. I came to this same insight about nix and containers
| being two approaches to a dynamic linking work around, but via
| a different path, of building my own little container runtime.
|
| Feels like we are building things whose original purpose is now
| holding us back, but path dependence leaves us stuck wrapping
| abstractions in other abstractions.
| trod123 wrote:
| Hi Will, I'm curious what your thoughts are about the Nix
| uniqueness problem, and the characterization of failures, or
| lack thereof under undefined behavior's failure domains.
| Exception handling generally requires a defined and
| deterministic state which can't be guaranteed given design
| choices to resolve DLL hell under Nix (i.e. it's a stochastic
| process).
|
| I mention this since it is a similar form of the problem you
| mention in writing this piece of software, that can lead to
| madness.
|
| Also, operationally, the troubleshooting problem-space of
| keeping things running, segments nicely into deterministic and
| non-deterministic regions; which the latter ends up costing
| orders of magnitude more as a function of time to resolve since
| you can't perturb individual subsystems to test for correct
| function, without determinism and time invariance (as system
| properties), testing piecemeal has contradictions in stochastic
| processes.
|
| Hashing by rigorous definition is non-unique (i.e. it's like
| navigating a circle), and there is no proof of uniformity. So
| problems in this space would be in the latter region.
|
| While, there are heuristics from cryptography that suggest
| using fractional cubic roots to initialize the fields brings
| more uniformity to the examined space than not, there is no
| proof of such.
|
| When building resilient systems, engineers often try to remove
| any brittle features that promote failures.
|
| Interestingly, as a side note, ldd output injects
| non-determinism into the pipe by flattening empty columns
| non-deterministically (i.e. if you ldd ssh client, you'll see the
| null state for each input to output has more than a single
| meaning/edge on the traversal depending on object type); this
| ends up violating the 1:1 unique input-output state graph/map
| required for determinism as a property, though it won't be
| evident until you use it as an input that problematically
| maps later in automation (i.e. grepping the output with RegEx
| will silently fail, providing what looks like legitimate output
| if one doesn't look too closely).
|
| PaX ended up forking the project with the fix, because the
| maintainers refused to admit the problem (reported 2016, forked
| in 2018), the bug remains in all current versions of ldd (to my
| knowledge).
|
| While based in theory, these types of problems crop up
| everywhere in computation and few seem to recognize them.
|
| Working with system's properties, and whether they are
| preserved; informs on whether the system can be safely and
| consistently used in later automated processes, as well as
| maintained at cheap cost.
|
| Businesses generally need a supportable and defensible
| infrastructure.
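|
| On the ldd output point, a small sketch of a parser that handles
| each line shape explicitly instead of relying on a single regex.
| The status labels are made up for illustration, and the shapes
| handled are the commonly seen ones, not a complete specification
| of ldd's output:

```python
def parse_ldd_line(line):
    """Classify one line of ldd output as (name, path, status).

    status is "resolved", "missing", or "virtual" (no backing file,
    as with the vDSO entry).
    """
    text = line.strip()
    if " => " in text:
        name, rest = text.split(" => ", 1)
        if rest.startswith("not found"):
            return name, None, "missing"
        # Drop the trailing load address, e.g. " (0x00007f...)".
        path = rest.rsplit(" (", 1)[0].strip()
        if not path:
            # An arrow with an empty target: nothing on disk to point at.
            return name, None, "virtual"
        return name, path, "resolved"
    # No arrow: either an absolute path (the loader) or a virtual object.
    name = text.rsplit(" (", 1)[0]
    if name.startswith("/"):
        return name, name, "resolved"
    return name, None, "virtual"
```

| Returning an explicit status for every shape means an empty column
| cannot be silently mistaken for a resolved library further down a
| pipeline.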
| jadbox wrote:
| I'd like to express that while this article is WAY outside my
| wheelhouse, I liked the writing style, and the AI
| illustrations felt like they were emotionally additive to the
| section rather than just a distraction (to me). Also, my head hurts
| trying to still understand this cursed thing:
| https://github.com/antithesishq/madness/blob/main/pkgs/madne...
| swayvil wrote:
| Dig the purple anteater pix.
| colinsane wrote:
| > This minimal meta-loader will totally work if you invoke it
| directly like `$ meta_loader.sh foo`, and it will totally not
| work if you hardcode its path (or a symlink to it) in the ELF
| headers of a binary.
|
| why not have `foo` be a shell script which invokes the meta
| loader on the "real" foo? like:
|
| ```
| #!/bin/sh
| # file: /bin/foo
|
| # invoke the real "foo" (renamed e.g. ".foo-wrapped" or
| # "/libexec/foo" or anything else easy for the loader to locate
| # but unlikely to be invoked accidentally)
| exec meta_loader.sh .foo-wrapped "$@"
| ```
|
| it's a common enough idiom that nixpkgs provides the
| `wrapProgram` function to generate these kinds of wrapper
| scripts during your build: even with an option to build a
| statically-linked binary wrapper instead of a shell-script
| wrapper (`makeBinaryWrapper`).
| NoraCodes wrote:
| What is that abominable diffusion output doing at the top of an
| otherwise interesting article?
| wwilson wrote:
| Our artist is on vacation, and some fool gave the CEO access to
| Midjourney.
| imagineerschool wrote:
| These synthetic artifacts will come to be regarded as
| psychological asbestos.
|
| Please consider labelling it, and giving provenance data. And
| protecting public sanity by putting it behind a clickwall.
| bloopernova wrote:
| In my opinion, generative AI pictures make a blog post feel
| cheaper and less truthful. Just my view, I fully accept that
| I'm probably in a minority of opinion.
| pizzalife wrote:
| I agree with you since it adds absolutely no value to the
| article. Technical articles don't need unrelated pictures
| that add huge page breaks.
| didsomeonesay wrote:
| FWIW, I enjoyed how the pictures were adding a little theme,
| were consistent and broke up the reading nicely without being
| too "noisy" (compared to e.g. technical articles full of meme
| pictures).
| knowaveragejoe wrote:
| What is this trifling and snobby driveby commentary doing in
| the comments of an otherwise interesting article?
| NoraCodes wrote:
| I think it's useful to impose a social cost for using
| plagiarism machines to make slop.
| DoreenMichele wrote:
| _tl;dr: we are open-sourcing an internal tool that solves a
| problem that we think many NixOS shops are likely to run into.
| The rest of this post is just the story of how we came to write
| this tool, which is totally a skippable story._
|
| The tool happens to be called Madness, thus the Lovecraftian
| reference in this piece.
|
| _Madness enables you to easily run the same binary on NixOS and
| non-NixOS systems_
|
| https://github.com/antithesishq/madness
| finnh wrote:
| Every time I look at NixOS, I think that it perfectly solves a
| problem that I only have once every 5 years, when buying a new
| computer. I think I even looked into it once to automate that
| exact process, but that idea fell apart at the first line of Nix
| syntax. I'll stick with OSX and `brew bundle` I guess...
|
| But then I read a piece like this and remember that some people
| do have to plumb the depths of C/C++ linkers, and I'm glad I'm
| not one of them.
|
| Great post! FWIW I always want to know the prompt text when
| seeing an AI-generated image, I wish there were a convention
| around that.
| shrx wrote:
| Some generators like Automatic1111 embed the prompt in the
| image metadata.
| AdamH12113 wrote:
| I'm still confused why static linking isn't a more common
| solution to versioning issues. Software developers normally have
| no problem using an order of magnitude more resources to solve
| organizational problems. Is there any technical advantage to
| dynamic linking other than smaller binaries and maybe slightly
| faster load times from disk?
| wwilson wrote:
| Biggest advantages I know of for dynamic linking:
|
| * You can use the LD_PRELOAD trick to override behavior at
| runtime.
|
| * You can run with entirely different implementations of the
| dynamically linked library in different places.
|
| * Software can pick up interface-compatible upgrades to its
| dependencies without being re-compiled and distributed again.
|
| We use all three of these tricks in our SDKs, FWIW. But it is
| still a giant pain in the ass.
| klodolph wrote:
| IIRC the underlying implementation may be different on other
| systems. I think in particular, DNS resolution.
|
| Linux is the only system where static linking all the way
| really makes any sense. For most systems, you don't get a
| stable syscall ABI. Instead, you get a stable ABI to the
| library which does syscalls for you... Windows has
| kernel32.dll, macOS has libSystem.
|
| Note that on Linux, the vDSO is dynamically linked.
|
| Compilation speed is a big plus. For large projects, linking
| time can easily dominate the time needed for incremental
| rebuilds.
|
| Due to tooling issues, PIE is a lot easier with dynamic
| linking, and this gives you better ASLR. These issues are
| solvable, but it's a lot easier to get PIE if you use dynamic
| linking. If you want static PIE, you need to compile all your
| static libs as PIE--doable, but you don't get it out of the
| box.
| anyfoo wrote:
| Static linking essentially freezes not only the ABI to the
| kernel, but many implementation details of the linked libraries
| as well. Including, for example, how library client code talks
| to daemons, or formats of files directly read in by the
| libraries. The timezone configuration would be one instance, or
| things related to NSS.
|
| It's really not viable in a lot of cases, unless you like
| rebuilding (or at least relinking) with every system software
| update.
|
| And then there's of course the memory savings. macOS and iOS
| for example have giant "shared caches" which are mapped into
| all processes and consist of all the system libraries. (Other
| OSs often do this on the individual shared library level.) With
| static linking, you'd instead have many copies of lots of
| potentially-but-not-necessarily identical library code pages in
| DRAM.
| nyarlathotep_ wrote:
| Seems it's rare to hear a defense in favor of dynamic linking
| (aside from vague allusions to "resource use"), especially in
| light of "successor languages" seemingly moving away from
| this approach. Thanks for this.
| partdavid wrote:
| In addition to the problems others mention, dynamic object
| dependencies are only one slice of a dependency pie; if you can
| config-manage your way out of all of the others then maybe
| dynamic libs aren't really a lot of problem? I think this is
| why container images became so popular, because they close over
| a lot more dependencies than just dynamic libs.
|
| And the scope of the solution isn't very wide. In the Go
| community it's common to distribute statically-linked binaries
| because it solves so many problems--but it just kind of moves
| them to installation or configuration time because you have to
| pick a platform and platform version and so forth to find the
| binary you need, if you want your tool to work on more than one
| of them.
| klodolph wrote:
| > No such file or directory
|
| Anyone who's run into this problem remembers it! (This isn't a
| Nix problem--this is just the baffling errors you get because
| a.out exists, but one of the libraries it needs does not, and the
| error message doesn't distinguish that case.)
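|
| One common cause of this error is that the loader path recorded
| in the binary's PT_INTERP program header does not exist on the
| machine. A sketch of checking for that without executing the
| file, assuming a 64-bit little-endian ELF (other layouts are
| rejected); the function names are hypothetical:

```python
import os
import struct

PT_INTERP = 3  # program header type that names the requested loader


def read_interp(data):
    """Return the PT_INTERP path from an ELF image, or None if it has none."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("sketch only handles 64-bit little-endian ELF")
    # ELF64 header: e_phoff at byte 32, e_phentsize at 54, e_phnum at 56.
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        # ELF64 program header: p_type at 0, p_offset at 8, p_filesz at 32.
        p_type, _p_flags, p_offset = struct.unpack_from("<IIQ", data, base)
        (p_filesz,) = struct.unpack_from("<Q", data, base + 32)
        if p_type == PT_INTERP:
            raw = data[p_offset:p_offset + p_filesz]
            return raw.split(b"\0", 1)[0].decode()
    return None


def missing_loader(path):
    """Return the requested loader path if it is absent here, else None."""
    with open(path, "rb") as f:
        interp = read_interp(f.read())
    if interp is not None and not os.path.exists(interp):
        return interp
    return None
```

| A non-None result from missing_loader() means the file itself is
| present but the loader it asks for is not.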
|
| Anyway, Nix.
|
| Nix has the Nix way of building things. Nix doesn't give you
| standard tools. It gives you wrappers around the standard tools
| that force you to do things a certain way. Part of that is
| futzing around with RPATH--because Nix stores everything in an
| unusual location. The user experience around this is awful, if
| you ever run into a case where Nix's tooling doesn't
| automatically do the right thing for you. It's not just RPATH,
| but also other paths.
|
| What's the solution?
|
| Honestly--I think it would make sense for Nix to have a "cross
| compilation" mode where you tell it to cross-compile for other
| Linuxes. You know, something like pkgsCross.x86_64-generic-linux.
| This comes with all the cross-compilation headaches, but you know
| what? You _are_ cross-compiling.
| throwway120385 wrote:
| Yeah I wish the ld-linux.so interpreter would actually indicate
| when a file it's trying to link wasn't found. Something like
| "unable to locate shared object BLAH" would go a long way. It's
| like a rite of passage the first time you debug something like
| that.
| nyarlathotep_ wrote:
| I've only encountered this once and I've not forgotten it.
|
| Downloaded a release of some binary for Linux, but I'd
| downloaded the FreeBSD built binary. Was lost until I explored
| in the same fashion as the author.
| lilyball wrote:
| Nix does have tools for running stuff in an FHS container.
| Something I have considered but not yet attempted is to use
| this to wrap the build such that building the binary happens in
| the FHS container (using the unwrapped versions of the compiler
| and associated tooling).
| sjburt wrote:
| It seems like every article about nix goes on and on about DLL
| hell. I've been using Debian/Ubuntu for 15+ years and never
| really experienced dependency hell. I guess maybe this is thanks
| to hard work by Debian maintainers and rarely needing to run a
| bleeding edge library, but also, why do we need to run bleeding
| edge versions of everything and then invent an incredibly
| complicated scheme to keep multiple copies of each library, most
| of which are completely compatible with each other?
|
| And then when there's a security problem, who goes and checks
| that every version of every dependency of every application has
| actually been patched and updated? Why would I want to roll a
| system back to a (definitely insecure) state of a few months
| ago?
|
| What problem does Nix solve that SO numbers (properly used)
| don't?
|
| I have many of the same questions about Snap and even Docker.
| HideousKojima wrote:
| I run into DLL hell any time I try to install some software
| that isn't in some sort of package repository or other happy
| path. Most recent example I can think of was about a year ago
| when I helped my father-in-law install Klipper on an RPi4 so
| that he could do input shaping on his 3d prints. All of the
| guides and documentation seemed to assume that you were using a
| specific version of Linux on the RPi, and if you weren't (like
| my FIL) then welcome to dependency hell. Took several hours of
| pulling my hair out to resolve them all, for a non-developer it
| would have been impossible.
| klodolph wrote:
| I'm using Nix for development and generally I agree.
|
| The first catch is that I want to be able to update my system
| on a regular basis, and keep using exactly the same
| dependencies in my project after an update. Maybe I'm in the
| middle of working on a change.
|
| The second catch is that sometimes my development environment
| is really weird, and the packages I need aren't in Debian. At
| least, not the versions I want. Nix can handle cross-
| compilation environments and you can use it for embedded
| development. You stick your entire development toolchain (arm-
| none-eabi-gcc, whatever) inside your development environment.
|
| > Why would I want to roll a system back to an (definitely
| insecure) state of a few months ago?
|
| Periodically, I want to update everything in my development
| environment to the latest version of everything. Sometimes,
| something will break. Maybe a new version of GCC reveals
| previously undiscovered bugs in my code. Maybe a function gets
| removed from a library (I've seen it happen). In Nix, it's
| pretty easy to pin my entire development environment to an old
| version, while I'm still updating the rest of my system. I can
| also get the same environment on either Linux or macOS with
| relatively minimal hassle (with the note that I've run into
| several packages that just don't run on macOS, which required
| me to make "fixed" versions).
|
| Also keep in mind when I say "Nix", I'm talking about nixpkgs.
| I'm not using NixOS and I just don't care about NixOS.
|
| Nix also has its pain points. I think of it as being like a
| coarse-grained Bazel with a ton of packages.
| nyarlathotep_ wrote:
| My Nix experience is limited, so forgive my ignorance here,
| but is it possible to create a development environment for an
| "older" project as well?
|
| Say I need some 3.20 version of CMake and gcc 9/whatever or
| something--I assume such a thing is possible, but I've not
| seen a simple way to "pin versions" of things the way you
| would in say a language's package manager.
| klodolph wrote:
| My Nix experience is pretty limited, too. Nix is not
| _great_ at pinning to specific versions.
|
| If your older project was made in Nix, it's no problem. You
| just check out the old copy of the project and you
| automatically get the old copy of the dependencies.
|
| If your old project needs some specific major version of
| GCC, going back to like 4.8, there are specific packages in
| Nix. You just add "gcc48" to your dependencies and you get
| GCC 4.8. You still get newer versions of e.g. binutils.
|
| If your old project needs a specific version of CMake, I
| know two ways to get that, but they're a little ugly.
|
| First method is to import an old <nixpkgs> containing the
| right version of CMake, and then import that into your
| environment. You search through Git history of the nixpkgs
| repository until you find one with the correct version.
| Yes, this sounds awful. It's not that bad. I'm not sure how
| to do this with flakes.
|
| You can also copy the CMake derivation into your project
| and modify it to compile & build the version of CMake you
| like. This is the approach I would normally use, most of
| the time.
|
| There may be easier ways to do this. I'm not sure.
| bbor wrote:
| I love what they're going for, but I couldn't help but react
| negatively at finding out that I had been hyped up for a post on
| some small technical topic for an OS I don't know of. Maybe title
| it "At the Mountains of NixOS Madness"? But then again I'm just a
| grouch! Well written article regardless, from what I was able to
| get out of it.
| pizzalife wrote:
| Calling binaries using ld-linux used to be a popular way to get
| around noexec on filesystems, since the libraries are usually in
| a place that is executable.
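|
| For illustration, a sketch of building that kind of invocation:
| the dynamic loader is run as the program and the real binary is
| passed as its first argument. The helper names are hypothetical;
| --library-path is the ld.so option for an explicit search path:

```python
import subprocess


def loader_argv(loader, program, args=(), library_path=None):
    """Build an argv that runs `program` through an explicit dynamic loader."""
    argv = [loader]
    if library_path:
        # Ask the loader to search this path for shared libraries.
        argv += ["--library-path", library_path]
    return argv + [program, *args]


def run_via_loader(loader, program, args=()):
    """Execute the program via the loader and return the CompletedProcess."""
    return subprocess.run(loader_argv(loader, program, args), check=False)
```

| For example, loader_argv("/lib64/ld-linux-x86-64.so.2", "./foo")
| gives the two-element command line described above.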
| dhash wrote:
| I loved this post, and patchelf is a real gem of a utility.
| georgewsinger wrote:
| ======= _Technical Summary_ ========
|
| Here's a problem with NixOS:
|
| 1. Suppose we have a `./nixos_binary_program_with_glibc-newer`
| compiled on a NixOS machine against bleeding edge `glibc-newer`.
|
| 2. `./nixos_binary_program_with_glibc-newer` will have
| `/nix/store/glibc-newer/ld-linux.so` path hardcoded into its ELF
| header which will be used when the program launches to find all
| of the program's shared libraries, and so forth. (And this is a
| fact that `ldd` will obfuscate!).
|
| 3. When `./nixos_binary_program_with_glibc-newer` is distributed
| to machines which use `glibc-older` instead of `glibc-newer`, the
| hardcoded `ld-linux.so` from (2) will fail to be found, leading
| to a launch error.
|
| 4. (3) will also happen on machines which don't use nix in the
| first place.
|
| ======= _Will's Solution_ ========
|
| 1. Use `patchelf` to hardcode a standard FHS `ld-linux.so`
| location into `nixos_binary_program_with_glibc-newer`'s ELF
| header (using e.g. `/lib64/ld-linux-x86-64.so.2` as the path)
|
| 2. Use a metaloader to launch
| `nixos_binary_program_with_glibc-newer` with an augmented `RPATH`
| which has a bunch of different `/nix/store/*glibc-newer*` paths,
| so that nix machines can find a suitable `ld-linux.so` to launch
| the program with.
|
| This will make `nixos_binary_program_with_glibc-newer` work on
| _any_ machine, including both non-nix machines _and_ nix machines
| (which might be running older versions of glibc by default)!
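|
| A hypothetical sketch of the loader-selection idea in step 2:
| try a list of candidate locations (glob patterns allowed) in
| order and take the first that exists. This is a guess at the
| general shape only, not how Madness is actually implemented, and
| the patterns in the usage note are examples:

```python
import glob
import os


def pick_loader(patterns):
    """Return the first existing file matching any pattern, in order."""
    for pattern in patterns:
        # Sort matches so the choice is stable from run to run.
        for candidate in sorted(glob.glob(pattern)):
            if os.path.isfile(candidate):
                return candidate
    return None
```

| A caller might pass something like
| ["/lib64/ld-linux-x86-64.so.2",
| "/nix/store/*glibc*/lib/ld-linux-x86-64.so.2"] and treat a None
| result as a launch error.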
___________________________________________________________________
(page generated 2024-07-10 23:01 UTC)