[HN Gopher] Hidden dependencies in Linux binaries
       ___________________________________________________________________
        
       Hidden dependencies in Linux binaries
        
       Author : thunderbong
       Score  : 102 points
       Date   : 2024-04-14 17:33 UTC (5 hours ago)
        
 (HTM) web link (thelittleengineerthatcould.blogspot.com)
 (TXT) w3m dump (thelittleengineerthatcould.blogspot.com)
        
       | ris wrote:
       | > Meanwhile, when I use CUDA instead of Vulkan, I get serenity
       | back. CUDA FTW!
       | 
       | Just because the complexity is hidden from you doesn't mean it's
       | not there. You have no idea what is statically bundled into the
       | CUDA libs.
        
         | dosshell wrote:
         | I agree with you, hidden is worse.
         | 
          | But we do know what it cannot statically link to: any GPL
          | library, which many indirect dependencies are.
        
           | qwertox wrote:
           | If static and dynamic libraries use the same interface,
           | shouldn't they be detectable in both cases? Or is it removed
           | at compile time?
        
             | dosshell wrote:
             | First IANACC (I'm not a compiler programmer), but this is
             | my understanding:
             | 
             | What do you mean by interface?
             | 
              | A dynamic library is handled very differently from a
              | static one. A dynamic library is loaded into the process's
              | virtual memory address space, where there is a trace of
              | the tree of loaded libraries. (I would guess this program
              | walks that tree, but there may be better ways, which I do
              | not know of, that this program utilizes.)
             | 
              | In the world of GNU/Linux, a static library is more or
              | less a collection of object files. The linker, to the best
              | of my knowledge, will not treat the contents of static
              | libraries any differently from your own code. LTO can take
              | place. In the final ELF, the static library will be
              | indistinguishable from your own code.
             | 
              | My experience with the symbol table in ELF files is
              | limited and I do not know if it could help to unwrap
              | static library dependencies. (A debug symbol table would
              | of course help.)
        
         | shmerl wrote:
          | Exactly. The other use case is just more modular, even if
          | dependencies are sometimes tangled unnecessarily.
        
       | surajrmal wrote:
        | The tool is interesting, but it doesn't account for the fact
        | that some shared libraries are opened via dlopen lazily. So it
        | might miss those if you haven't executed a code path that
        | triggers them to load.
       | 
        | The other side of not accidentally loading more into your
        | process than you thought is breaking shared libraries down into
        | increasingly smaller pieces. In the limit I imagine it would be
        | akin to a function per shared library, which probably defeats
        | the point a bit.
        
         | rwmj wrote:
          | From the article it seems like that might be the point: to
          | visualise what the binary actually uses, e.g. if you use
          | different preferences.
        
         | Retr0id wrote:
         | If the code path that loads the library is never hit, does it
         | really count as a dependency?
        
           | sooperserieous wrote:
           | It does if someone is intentionally sending it down that code
           | path for exactly that reason.
        
           | marcosdumay wrote:
           | If your tests don't exercise a bug, is it still there?
        
         | eqvinox wrote:
         | Lazy binding for dlopen is disabled when the LD_BIND_NOW
         | environment variable is set to a nonempty value, cf.
         | https://man7.org/linux/man-pages/man3/dlopen.3.html
        
           | tedunangst wrote:
            | That doesn't mean the program is going to execute all the
            | dlopen() calls at startup.
        
             | eqvinox wrote:
             | That is true, but the root comment was specifically
             | referring to dependencies of dlopen'ed libs not getting
             | loaded. That one is 'fixable'.
             | 
             | (Btw, I'm pretty sure dlopen itself can't be lazy, due to
             | needing to run constructors; the root comment is a bit
             | vaguely worded... but ofc that only matters after dlopen is
             | called.)
        
       | rwmj wrote:
       | I was debugging a crash in vlc today - actually in the Intel
       | VDPAU driver - and debuginfod (which dynamically downloads the
       | debuginfo for everything in a coredump) took a good 15 minutes to
       | run. If you look at the 'ldd /usr/bin/vlc' output it's only about
       | 10 libraries, but it loads dozens and dozens more dynamically
       | using dlopen, and I think probably those libraries dlopen even
       | more. This tool could be pretty useful to visualise that.
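
The gap between `ldd` output and what actually ends up in the process can be observed directly. As a rough sketch (Linux assumed, with libz.so.1 present): `ldd` only reports link-time (DT_NEEDED) dependencies, while anything pulled in later via dlopen() shows up in the live memory map:

```python
import ctypes

# ldd lists only link-time dependencies; libraries loaded via dlopen()
# at runtime appear only in the process map. Diffing /proc/self/maps
# around a dlopen call shows what was added dynamically.
def loaded_libs():
    with open("/proc/self/maps") as f:
        return {line.split()[-1] for line in f if ".so" in line}

before = loaded_libs()
ctypes.CDLL("libz.so.1")  # dlopen() under the hood
added = loaded_libs() - before
print(sorted(added))
```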
        
       | quotemstr wrote:
       | We're in this situation because we're using a model of dynamic
       | linking that's decades out of date. Why aren't we using process-
       | isolated sandboxed components talking over io_uring-based low-
       | latency IPC to express most software dependencies? The vast
       | majority of these dependencies absolutely do not need to be co-
       | located with their users.
       | 
       | Consider liblzma: would liblzma-as-a-service _really_ be that
       | bad, especially if the service client and service could share
       | memory pages for zero-copy data transfer, just as we already do
       | for, e.g. video decode?
       | 
       | Or consider React Native: RN works by having an application
       | thread send a GUI scene to a renderer thread, which then adjusts
       | a native widget tree to match what the GUI thread wants. Why do
       | these threads have to be in the same process? You're doing a
       | thread switch _anyway_ to jump from the GUI thread to the
       | renderer thread: is switching address spaces at the same time
       | going to kill you? Especially if the two threads live on
        | different cores and nothing has to "switch"?
       | 
       | Both dynamic linking _and_ static linking should be rare in
       | modern software ecosystems. We need to instead reinvigorate the
       | idea of agent-based component systems with strongly isolated
       | components.
        
         | rwmj wrote:
         | Essentially path dependency. It would be almost impossible to
         | change how it works now.
        
           | vlovich123 wrote:
           | I don't think this stuff is as hard as you make it out to be.
           | Consider that companies like Apple regularly change how
           | things are done successfully. It just requires a good plan,
           | time, & budget. Wayland is one example of what that looks
           | like & it's not a good story. Pulse audio followed by
           | pipewire is another example of migrations happening. I
           | suspect this would probably be slightly worse than Wayland
           | unless some kind of transparent shim could be written for
           | each boundary so that it can be slotted in transparently.
        
             | Analemma_ wrote:
             | Apple can regularly change how things are done because they
             | have absolute control over their platform and use an "our
             | way or the highway" approach to breaking changes, where
             | developers have to go along or lose access to a lucrative
             | market. This approach really _really_ would not work on
             | Linux: consider that the rollout of systemd was one-tenth
             | as dictatorial as is SOP for changes from Apple, and it
              | caused legions of Linux users to scream for Poettering's
             | head on a stick.
        
               | vlovich123 wrote:
               | That's not a path dependency though. That's just a
               | critique of the bazaar development model. And honestly I
               | think if the big distros got together and agreed this
               | would be a significant security improvement, they could
               | drag the community kicking & screaming just like they did
               | with systemd (people hated systemd so much at first that
                | they tried to get other init systems to not suck, but
                | over time persistent effort won out).
        
         | dosshell wrote:
          | This is very interesting! Are there any movements toward
          | this?
          | 
          | Wouldn't it open up a new attack vector where processes could
          | read each other's data?
        
         | ajross wrote:
         | > Why aren't we using process-isolated sandboxed components
         | talking over io_uring-based low-latency IPC to express most
         | software dependencies?
         | 
         | To some extent we are, if what you do is work on backend RPC or
         | web app frameworks.
         | 
         | But the better answer is because sometimes what you _actually
         | want_ is the ability to put a C function in a separate file
         | that can be versioned and updated on its own, which is what a
         | shared library captures. Trying to replace a function call of
         | 2-3 instructions with your io_uring monstrosity is...
         | suboptimal for a lot of applications.
         | 
         | And in any case, the protocol parsing you'd need to provide to
         | enable all that RPC is going to need to live somewhere, right?
         | What is that going to be, other than a shared library or
         | equivalent?
        
         | otabdeveloper4 wrote:
         | > is switching address spaces at the same time going to kill
         | you?
         | 
         | The answer is "yes".
         | 
         | I won't stop you, if you want to make React even slower, be my
         | guest. I want off this ride.
        
           | quotemstr wrote:
           | > The answer is "yes".
           | 
           | And if you have one per core anyway so nothing "switches"?
           | Computers aren't single-core 80486es anymore. We have highly
           | parallel machines nowadays and old intuition about what's
           | expensive and what's cheap decays by the year.
        
             | JackSlateur wrote:
              | You _know_ that process A (the caller) is currently
              | running. You _assume_ that process B (the callee) is
              | running on another core on the same NUMA node.
        
             | otabdeveloper4 wrote:
             | Sorry, not dedicating a whole CPU core for your shitty
             | React app.
        
             | josephg wrote:
              | I only have 16 cores. Linux, Windows and macOS already
              | load about 50+ processes at startup. If we moved shared
             | libraries into their own processes, we'd be talking
             | hundreds or thousands of processes running all the time.
             | They don't get a core each.
             | 
              | But, if you're interested in this architecture, Smalltalk
              | did something similar. Fire up a Smalltalk VM and play
              | around!
        
         | jcelerier wrote:
          | I'm currently involved professionally in a software
          | architecture based on pretty much raw shared-memory IPC, and
          | it's still too slow compared to in-process. See also VST
          | hosts that allow grouping plug-ins together in one process or
          | separating them into distinct processes, like Bitwig: for
          | just a few dozen plug-ins you can very easily get a 10+% CPU
          | impact (and CPU is an extremely dire commodity when making
          | pro audio; it's pretty much a constant fight against high CPU
          | usage in larger music-making sessions).
        
           | quotemstr wrote:
           | > it's still too slow compared to in-process
           | 
           | Why? Relative to the in-process case, properly done multi-
           | process data flow pipelines don't necessarily incur extra
           | copies. Sure, switching to a different process is somewhat
           | more expensive than switching to a different thread due to
           | page table changes, but if you're doing bulk data processing,
           | you amortize any process-separation-driven costs across lots
           | of compute anyway --- and in a many-core world, you can run
           | different parts of your system on different cores anyway and
           | get away with not paying context-switch costs at all.
           | 
            | Also, 10% is actually a pretty modest price to pay for
            | increased software robustness and modularity. We're paying
            | more than that for speculative execution vulnerability
            | mitigations anyway. Do you run your fancy low-level audio
            | processing pipeline with "mitigations=off" in
            | /proc/cmdline?
        
         | bno1 wrote:
         | A big issue with IPC is thread scheduling. Thread B needs to
         | get scheduled to see the request from thread A and thread A
         | needs to get scheduled to see the response from thread B. I
          | think there are WIP solutions to deal with this [1] but I'm
          | not up to date.
         | 
         | [1] https://www.phoronix.com/news/Google-User-Thread-Futex-Swap
        
         | rini17 wrote:
          | Someone got a microservice hammer, so everything looks like a
          | nail, eh?
         | 
          | What is really needed is a sane memory model where you can
          | easily call any function with buffers (pointer + size) and
          | have it allowed to access only those buffers and nothing else
          | (note). Not this mess coming from C, where this is difficult
          | by design.
          | 
          | (note) Since HN likes to split hairs: except for its private
          | storage and other well-thought-out exceptions.
        
           | ptx wrote:
           | This would be what's known as software-based fault isolation,
           | right? Here's a paper from 1993:
           | https://dl.acm.org/doi/abs/10.1145/168619.168635
           | 
           | I don't understand why this idea keeps failing to take hold
           | even though it's constantly reintroduced in various forms.
           | Surely now, 30 years after that paper was published, we can
           | bear the "slightly increased execution time for distrusted
           | modules" in return for (as the paper suggests) faster
           | communication between isolated modules?
        
           | o11c wrote:
           | So basically like that one Rust hardware project that gets
           | posted periodically?
           | 
           | You could probably do it pretty decently in C via
           | `pkey_mprotect` (probably with `dlmopen`).
           | 
           | https://www.man7.org/linux/man-pages/man7/pkeys.7.html
        
             | quotemstr wrote:
              | It'd be nice if dlmopen weren't broken: it makes a linker
              | namespace so separate that it can't even share pthreads
              | with another.
        
         | JackSlateur wrote:
          | You would replace function calls with syscalls? Yeah well, if
          | you omit performance and complexity, why not... io_uring is
          | nice. Yet my application has to call that lzma function and
          | get the result now: you add cross-process synchronization
          | (via the kernel => a syscall) as well as inserting the
          | scheduler into the mix.
        
         | pjmlp wrote:
          | Because of hardware resources: 20 years ago, doing something
          | like VSCode with tons of external processes per plugin would
          | drag your computer to a crawl.
          | 
          | Emacs wasn't "Eight Megabytes and Constantly Swapping" only
          | due to Elisp.
        
         | marcosdumay wrote:
         | That's because IPC is not low-latency.
         | 
          | No modern processor architecture has a proper message-passing
          | mechanism. All of them expect you to use interrupts, with
          | their inherent problems of losing cache state, disrupting
          | pipelines, and, well, interrupting your process flow.
          | 
          | All the modern architectures are also so close to having a
          | proper message-passing mechanism that it's unsettling. You
          | actually need this to have uniform memory in a multi-core
          | CPU. They have all the mechanisms for zero-copy sharing of
          | memory, enforcing coherence, atomicity, etc. AFAIK, they just
          | lack a userspace mechanism to signal other processes.
        
         | zbentley wrote:
         | This was roughly the dream of DBus. However, outside of
         | desktop-shaped niches it proved to be extremely difficult to
         | secure, standardize, and debug.
         | 
         | Process-level/address-space-level dependency sharing remains
         | both easier to think about and simpler to implement (and
         | capabilities are taking bites out of the security risks
         | entailed by this model as time goes on).
        
         | josephg wrote:
         | If the goal is to put a security boundary within the process
         | between libraries, there might be better ways to do it than
          | process boundaries. One approach is to wasm-sandbox library
         | code. Firefox apparently does this - compiling some libraries
         | to wasm, then compiling the wasm back to C and linking it. They
         | get all the benefits of wasm but without any need to JIT
         | compile the code.
         | 
          | Another approach would be to leverage a language like Rust.
          | I'd love it if Rust provided a way to deny any sensitive
          | access to part of my dependency tree. I want to pull in a
          | library but deny it the ability to run unsafe code or make
          | any syscalls (or maybe, make syscalls but I'll whitelist what
          | it's allowed to call). Restrictions should be transitive to
          | all of that library's dependencies (optionally with further
          | restrictions).
         | 
         | Both of these approaches would stop the library from doing
         | untoward things. Way more so than you'd get running the library
         | in a separate process.
        
         | magicalhippo wrote:
          | I think you would have loved Singularity OS[1].
         | 
         | It featured software-isolated processes that communicated via
         | contract-based message passing, which allowed for zero-copy
         | exchange of data.
         | 
          | As a research OS it never became a fully-fledged OS[2], but
          | it was an interesting attempt IMHO.
         | 
         | [1]: https://www.microsoft.com/en-us/research/wp-
         | content/uploads/...
         | 
         | [2]:
         | https://en.wikipedia.org/wiki/Singularity_%28operating_syste...
        
         | zokier wrote:
          | Dbus is a thing. Bus1 was its stillborn spiritual
          | kernel-space successor.
        
       | JonChesterfield wrote:
       | The glibc separation into multiple shared libraries is such a
       | weird thing. Anyone happen to know how that happened? See musl
       | for an example where they put it all in one lib and thus avoid a
       | whole pile of failure modes.
        
         | eqvinox wrote:
         | POSIX requires it, cf.
         | https://pubs.opengroup.org/onlinepubs/9699919799/utilities/c...
         | 
         | However, that "requirement" doesn't prevent you from shipping
         | an empty libm (or other libs listed there.)
         | 
         | (The actual reason is probably that glibc is old enough to have
         | lived in a time where you cared about saving time and space by
         | not linking the math functions when you didn't need them...)
        
           | eqvinox wrote:
           | Actually, now that I'm looking at the list... I certainly
           | don't have a "libxnet" on my system ;D
        
         | o11c wrote:
         | A lot of them are stubs nowadays actually.
        
       | ajross wrote:
        | Really the big finding here is that Xlib et al. get pulled into
       | GPU compute tools, because access to GPU contexts has
       | traditionally been mediated by the desktop subsystem, because the
       | GPU was traditionally "owned" by the rendering layers of the
       | device abstraction.
       | 
       | The bug here is much more a changing hardware paradigm than it is
       | an issue with shared library dependencies that recapitulate it.
       | Things moved and the software layers kludged along instead of
       | reworking from scratch.
       | 
       | Obviously what's needed is a layer somewhere in the device stack
       | that "owns" the GPU resources and doles them out to desktop
       | rendering and compute clients as needed, without the two needing
       | to know about each other. But that's a ton more work than just
       | untangling some symbol dependencies!
        
       | RajT88 wrote:
        | On Windows, this is Dependency Walker versus ProcExp. Similar
        | eye-goggling results.
       | 
       | https://www.dependencywalker.com/
       | 
       | https://learn.microsoft.com/en-us/sysinternals/downloads/pro...
        
         | pjmlp wrote:
          | Dependency Walker used to be part of the Windows SDK; I never
          | got why it was removed, given its utility.
        
       | qwertox wrote:
       | When I google for "libvulkan_virtio" I get zero results.
       | 
       | What does it do?
        
         | epilys wrote:
         | virtio-gpu with vulkan command passthrough to host,
         | https://docs.mesa3d.org/drivers/venus.html
        
           | qwertox wrote:
            | Thank you. So is this a Vulkan emulator which does not send
            | the commands to a software renderer but rather to the
            | host's GPU? What reserves the resources on the GPU, also
            | this driver? Can one reserve resources explicitly through
            | the API, or does this happen dynamically, as needed?
            | Because if explicitly, then I'd wonder whether this is part
            | of the library, of the Vulkan spec, or some Mesa offering.
        
       | YarickR2 wrote:
        | Just wait until Lennart pushes his idea of doing linking
        | entirely via dlopen() in systemd (see the story from a few days
        | ago). The last bits of sane and efficient means of tracking
        | dependencies will be gone forever after that. Good luck
        | creating any lean Docker/k8s images without pulling in a
        | systemd-based stack after that.
        
         | nerdponx wrote:
         | > see the story from a few days ago
         | 
         | I missed it, can you add a link or at least the post title I
         | can search for?
        
         | bheadmaster wrote:
         | > Just wait until Lennart pushes his idea of doing linking
         | entirely via dlopen() in systemd (see the story from a few days
         | ago)
         | 
         | Could you paste a link to the story? I haven't been able to
         | find it through search engines, and I'd love to read the
          | rationale of such an idea...
        
           | dsissitka wrote:
           | Looks like https://news.ycombinator.com/item?id=40014724.
        
         | tedunangst wrote:
         | Wait until /lib is split into /lib.d with /lib.d/default and
         | /lib.d/available and....
        
         | dmwilcox wrote:
         | I'm impatiently waiting for the systemd shell and editor /jk
         | (and to be clear I hope this is a joke but I worry sometimes)
        
       | nerdponx wrote:
       | I am a perpetual newbie when it comes to things like this. What's
       | the advantage of dlopen()-ing instead of dynamic linking?
        
         | tux3 wrote:
          | Faster program start, and potentially making it a soft
          | dependency instead of a hard dep (or letting you fall back on
          | something else if it's not there).
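
The soft-dependency pattern described above can be sketched with Python's ctypes (CDLL calls dlopen() under the hood); the library names below are illustrative, and `try_load` is a hypothetical helper:

```python
import ctypes
import ctypes.util

# Soft dependency via dlopen: probe for an optional library at runtime
# and degrade gracefully if it is missing, instead of the loader
# refusing to start the program at all (as a hard link-time dep would).
def try_load(name):
    path = ctypes.util.find_library(name)
    if path is None:
        return None
    try:
        return ctypes.CDLL(path)  # dlopen() under the hood
    except OSError:
        return None

zlib_lib = try_load("z")  # optional feature: present on most systems
bogus = try_load("definitely-not-a-real-library")
print("zlib:", zlib_lib is not None, "| bogus:", bogus is not None)
```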
        
         | d3m0t3p wrote:
          | You can load libs at runtime with dlopen; for example, if you
          | need a feature you can load the corresponding lib. With
          | dynamic linking, by contrast, you'd load everything when the
          | process is launched, slowing startup.
        
         | sgtnoodle wrote:
         | A more obscure use would be for loading multiple instances of a
         | singleton library. This is especially helpful in something like
         | a unit test suite, where you want each test case to start in a
         | cleanly initialized state. If the code under test has a bunch
         | of globally initialized variables, reloading the library at
         | runtime is one of only a few possible ways of doing it.
        
       ___________________________________________________________________
       (page generated 2024-04-14 23:02 UTC)