[HN Gopher] Hidden dependencies in Linux binaries
___________________________________________________________________
Hidden dependencies in Linux binaries
Author : thunderbong
Score : 102 points
Date : 2024-04-14 17:33 UTC (5 hours ago)
(HTM) web link (thelittleengineerthatcould.blogspot.com)
(TXT) w3m dump (thelittleengineerthatcould.blogspot.com)
| ris wrote:
| > Meanwhile, when I use CUDA instead of Vulkan, I get serenity
| back. CUDA FTW!
|
| Just because the complexity is hidden from you doesn't mean it's
| not there. You have no idea what is statically bundled into the
| CUDA libs.
| dosshell wrote:
| I agree with you, hidden is worse.
|
| But we do know what it cannot statically link to: any GPL
| library, and many indirect dependencies are GPL.
| qwertox wrote:
| If static and dynamic libraries use the same interface,
| shouldn't they be detectable in both cases? Or is it removed
| at compile time?
| dosshell wrote:
| First, IANACC (I'm not a compiler programmer), but this is
| my understanding:
|
| What do you mean by interface?
|
| A dynamic library is handled very differently from a static
| one. A dynamic library is loaded into the process's virtual
| address space, and the loader keeps a record there of every
| library it has loaded. (I would guess this program walks
| that structure, but there may be better ways I don't know
| of that this program uses.)
|
| In the GNU/Linux world a static library is more or less a
| collection of object files. The linker, to the best of my
| knowledge, does not treat the contents of static libraries
| any differently from your own code; LTO can take place. In
| the final ELF the static library is indistinguishable from
| your own code.
|
| My experience with the symbol table in ELF files is limited,
| and I do not know whether it could help unwrap static
| library dependencies. (A debug symbol table would of course
| help.)
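|
| For what it's worth, glibc exposes the loader's list
| directly via dl_iterate_phdr; a minimal sketch of
| enumerating everything mapped into the current process (I
| don't know whether the article's tool works this way):
|
|     /* list_loaded.c - print every shared object mapped
|        into this process (glibc) */
|     #define _GNU_SOURCE
|     #include <link.h>
|     #include <stdio.h>
|
|     static int show(struct dl_phdr_info *info, size_t size,
|                     void *data)
|     {
|         /* an empty name is the main executable itself */
|         printf("%s\n", info->dlpi_name[0] ? info->dlpi_name
|                                           : "(main executable)");
|         return 0; /* keep iterating */
|     }
|
|     int main(void)
|     {
|         dl_iterate_phdr(show, NULL);
|         return 0;
|     }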
| shmerl wrote:
| Exactly. And the other case is simply more modular, even if
| dependencies are sometimes tangled unnecessarily.
| surajrmal wrote:
| The tool is interesting, but it doesn't account for the
| fact that some shared libraries are dlopen'ed lazily, so it
| might miss those if you haven't executed a code path that
| triggers them to load.
|
| The other side of not accidentally loading more into your
| process than you thought is breaking shared libraries down
| into ever smaller pieces. In the limit I imagine it would
| be akin to a function per shared library, which probably
| defeats the point a bit.
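|
| One way to catch those late loads would be an LD_PRELOAD
| shim that interposes dlopen and logs each call as it
| happens; a rough sketch (the file names are made up):
|
|     /* dlopen_log.c - log every dlopen() as it happens.
|        Build: cc -shared -fPIC dlopen_log.c -o dlopen_log.so
|        Use:   LD_PRELOAD=./dlopen_log.so ./some_program */
|     #define _GNU_SOURCE
|     #include <dlfcn.h>
|     #include <stdio.h>
|
|     void *dlopen(const char *file, int mode)
|     {
|         /* look up the real dlopen behind this shim */
|         void *(*real)(const char *, int) =
|             dlsym(RTLD_NEXT, "dlopen");
|         fprintf(stderr, "dlopen(%s)\n", file ? file : "NULL");
|         return real(file, mode);
|     }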
| rwmj wrote:
| From the article it seems like that might be the point: to
| visualise what the binary actually uses, e.g. if you use
| different preferences.
| Retr0id wrote:
| If the code path that loads the library is never hit, does it
| really count as a dependency?
| sooperserieous wrote:
| It does if someone is intentionally sending it down that code
| path for exactly that reason.
| marcosdumay wrote:
| If your tests don't exercise a bug, is it still there?
| eqvinox wrote:
| Lazy binding for dlopen is disabled when the LD_BIND_NOW
| environment variable is set to a nonempty value, cf.
| https://man7.org/linux/man-pages/man3/dlopen.3.html
| tedunangst wrote:
| That doesn't mean the program is going to execute all the
| dlopen() calls at start-up.
| eqvinox wrote:
| That is true, but the root comment was specifically
| referring to dependencies of dlopen'ed libs not getting
| loaded. That one is 'fixable'.
|
| (Btw, I'm pretty sure dlopen itself can't be lazy, due to
| needing to run constructors; the root comment is a bit
| vaguely worded... but ofc that only matters after dlopen is
| called.)
| rwmj wrote:
| I was debugging a crash in vlc today - actually in the Intel
| VDPAU driver - and debuginfod (which dynamically downloads the
| debuginfo for everything in a coredump) took a good 15 minutes to
| run. If you look at the 'ldd /usr/bin/vlc' output it's only about
| 10 libraries, but it loads dozens and dozens more dynamically
| using dlopen, and I think probably those libraries dlopen even
| more. This tool could be pretty useful to visualise that.
| quotemstr wrote:
| We're in this situation because we're using a model of dynamic
| linking that's decades out of date. Why aren't we using process-
| isolated sandboxed components talking over io_uring-based low-
| latency IPC to express most software dependencies? The vast
| majority of these dependencies absolutely do not need to be co-
| located with their users.
|
| Consider liblzma: would liblzma-as-a-service _really_ be that
| bad, especially if the service client and service could share
| memory pages for zero-copy data transfer, just as we already do
| for, e.g. video decode?
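|
| A toy sketch of just the shared-page part, with
| memfd_create and fork standing in for the service (a real
| one would signal readiness over io_uring or a futex rather
| than fork/wait):
|
|     #define _GNU_SOURCE
|     #include <sys/mman.h>
|     #include <sys/wait.h>
|     #include <stdio.h>
|     #include <string.h>
|     #include <unistd.h>
|
|     int main(void)
|     {
|         int fd = memfd_create("xfer", 0); /* shared buffer */
|         ftruncate(fd, 4096);
|         char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
|                          MAP_SHARED, fd, 0);
|
|         if (fork() == 0) { /* pretend this is the service */
|             strcpy(buf, "decompressed bytes land here");
|             _exit(0);
|         }
|         wait(NULL); /* client sees the data with no copy */
|         printf("%s\n", buf);
|         return 0;
|     }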
|
| Or consider React Native: RN works by having an application
| thread send a GUI scene to a renderer thread, which then adjusts
| a native widget tree to match what the GUI thread wants. Why do
| these threads have to be in the same process? You're doing a
| thread switch _anyway_ to jump from the GUI thread to the
| renderer thread: is switching address spaces at the same time
| going to kill you? Especially if the two threads live on
| different cores and nothing has to "switch"?
|
| Both dynamic linking _and_ static linking should be rare in
| modern software ecosystems. We need to instead reinvigorate the
| idea of agent-based component systems with strongly isolated
| components.
| rwmj wrote:
| Essentially path dependency. It would be almost impossible to
| change how it works now.
| vlovich123 wrote:
| I don't think this stuff is as hard as you make it out to be.
| Consider that companies like Apple regularly change how
| things are done successfully. It just requires a good plan,
| time, & budget. Wayland is one example of what that looks
| like, & it's not a good story. PulseAudio followed by
| PipeWire is another example of such a migration. I suspect
| this would probably be slightly worse than Wayland unless
| some kind of transparent shim could be written for each
| boundary so that it could be slotted in transparently.
| Analemma_ wrote:
| Apple can regularly change how things are done because they
| have absolute control over their platform and use an "our
| way or the highway" approach to breaking changes, where
| developers have to go along or lose access to a lucrative
| market. This approach really _really_ would not work on
| Linux: consider that the rollout of systemd was one-tenth
| as dictatorial as is SOP for changes from Apple, and it
| caused legions of Linux users to scream for Poettering's
| head on a stick.
| vlovich123 wrote:
| That's not a path dependency though. That's just a
| critique of the bazaar development model. And honestly I
| think if the big distros got together and agreed this
| would be a significant security improvement, they could
| drag the community kicking & screaming just like they did
| with systemd (people hated systemd so much at first that
| they tried to get other init systems to not suck but over
| time persistent effort wins out).
| dosshell wrote:
| This is very interesting! Are there any movements toward
| this?
|
| Wouldn't it open up a new attack vector where processes
| could read each other's data?
| ajross wrote:
| > Why aren't we using process-isolated sandboxed components
| talking over io_uring-based low-latency IPC to express most
| software dependencies?
|
| To some extent we are, if what you do is work on backend RPC or
| web app frameworks.
|
| But the better answer is because sometimes what you _actually
| want_ is the ability to put a C function in a separate file
| that can be versioned and updated on its own, which is what a
| shared library captures. Trying to replace a function call of
| 2-3 instructions with your io_uring monstrosity is...
| suboptimal for a lot of applications.
|
| And in any case, the protocol parsing you'd need to provide to
| enable all that RPC is going to need to live somewhere, right?
| What is that going to be, other than a shared library or
| equivalent?
| otabdeveloper4 wrote:
| > is switching address spaces at the same time going to kill
| you?
|
| The answer is "yes".
|
| I won't stop you, if you want to make React even slower, be my
| guest. I want off this ride.
| quotemstr wrote:
| > The answer is "yes".
|
| And if you have one per core anyway so nothing "switches"?
| Computers aren't single-core 80486es anymore. We have highly
| parallel machines nowadays and old intuition about what's
| expensive and what's cheap decays by the year.
| JackSlateur wrote:
| You _know_ that process A (the caller) is currently
| running. You _assume_ that process B (the callee) is
| running on another core on the same NUMA node.
| otabdeveloper4 wrote:
| Sorry, not dedicating a whole CPU core for your shitty
| React app.
| josephg wrote:
| I only have 16 cores. Linux, Windows, and macOS already
| load 50+ processes at startup. If we moved shared libraries
| into their own processes, we'd be talking hundreds or
| thousands of processes running all the time. They don't get
| a core each.
|
| But if you're interested in this architecture, Smalltalk
| did something similar. Fire up a Smalltalk VM and play
| around!
| jcelerier wrote:
| I'm currently involved professionally in a software
| architecture based on pretty much raw shared-memory IPC, and
| it's still too slow compared to in-process. See also VST
| hosts that allow grouping plug-ins together in one process
| or separating them into distinct processes, like Bitwig: for
| just a few dozen plug-ins you can very easily see a 10+% CPU
| impact (and CPU is an extremely dire commodity in pro audio;
| it's pretty much a constant fight against high CPU usage in
| larger music-making sessions).
| quotemstr wrote:
| > it's still too slow compared to in-process
|
| Why? Relative to the in-process case, properly done multi-
| process data flow pipelines don't necessarily incur extra
| copies. Sure, switching to a different process is somewhat
| more expensive than switching to a different thread due to
| page table changes, but if you're doing bulk data processing,
| you amortize any process-separation-driven costs across lots
| of compute anyway --- and in a many-core world, you can run
| different parts of your system on different cores anyway and
| get away with not paying context-switch costs at all.
|
| Also, 10% is actually a pretty modest price to pay for
| increased software robustness and modularity. We're paying
| more than that for speculative execution mitigations
| anyway. Do you run your fancy low-level audio processing
| pipeline with "mitigations=off" in /proc/cmdline?
| bno1 wrote:
| A big issue with IPC is thread scheduling: thread B needs to
| get scheduled to see the request from thread A, and thread A
| needs to get scheduled to see the response from thread B. I
| think there are WIP solutions to deal with this [1], but I'm
| not up to date.
|
| [1] https://www.phoronix.com/news/Google-User-Thread-Futex-Swap
| rini17 wrote:
| Someone got a microservice hammer, so everything looks like a
| nail eh?
|
| What is really needed is a sane memory model where you can
| easily call any function with buffers (pointer + size) and
| it is allowed to access only those buffers and nothing
| else (note). Not this mess coming from C, where this is
| difficult by design.
|
| (note) Since HN likes to split hairs: except for its private
| storage and other well-thought-out exceptions.
| ptx wrote:
| This would be what's known as software-based fault isolation,
| right? Here's a paper from 1993:
| https://dl.acm.org/doi/abs/10.1145/168619.168635
|
| I don't understand why this idea keeps failing to take hold
| even though it's constantly reintroduced in various forms.
| Surely now, 30 years after that paper was published, we can
| bear the "slightly increased execution time for distrusted
| modules" in return for (as the paper suggests) faster
| communication between isolated modules?
| o11c wrote:
| So basically like that one Rust hardware project that gets
| posted periodically?
|
| You could probably do it pretty decently in C via
| `pkey_mprotect` (probably with `dlmopen`).
|
| https://www.man7.org/linux/man-pages/man7/pkeys.7.html
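|
| Rough sketch of the pkey dance, assuming x86 MPK hardware
| and glibc >= 2.27 (pkey_alloc fails on hardware without
| protection keys):
|
|     #define _GNU_SOURCE
|     #include <sys/mman.h>
|     #include <stdio.h>
|
|     int main(void)
|     {
|         char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
|                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
|         int key = pkey_alloc(0, 0);
|         pkey_mprotect(buf, 4096, PROT_READ | PROT_WRITE, key);
|
|         /* "distrusted" code would run here: any access to
|            buf now faults */
|         pkey_set(key, PKEY_DISABLE_ACCESS);
|
|         pkey_set(key, 0); /* restore access, no syscall */
|         buf[0] = 'x';
|         printf("ok\n");
|         return 0;
|     }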
| quotemstr wrote:
| It'd be nice if dlmopen weren't broken: it makes a linker
| namespace so separate that it can't even share pthreads
| with another.
| JackSlateur wrote:
| You would replace function calls with syscalls? Well, if
| you omit performance and complexity, why not... io_uring is
| nice. Yet my application has to call that lzma function and
| get the result now: you add cross-process synchronization
| (via the kernel, i.e. syscalls) and insert the scheduler
| into the mix.
| pjmlp wrote:
| Because of hardware resources: 20 years ago, doing
| something like VS Code with tons of external processes per
| plugin would drag your computer to a crawl.
|
| Emacs wasn't "Eight Megabytes and Constantly Swapping" only
| due to Elisp.
| marcosdumay wrote:
| That's because IPC is not low-latency.
|
| No modern processor architecture has a proper message-
| passing mechanism. All of them expect you to use
| interrupts, with their inherent problems of losing cache,
| disrupting pipelines, and, well, interrupting your process
| flow.
|
| All the modern architectures are also so close to having a
| proper message-passing mechanism that it's unsettling. You
| actually need this to have uniform memory in a multi-core
| CPU. They have all the mechanisms for zero-copy sharing of
| memory, enforcing coherence, atomicity, etc. AFAIK, they
| just lack a userspace mechanism to signal other processes.
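|
| The closest thing userspace has today is a futex word in
| shared memory, and it illustrates the complaint: the wakeup
| still goes through the kernel. A minimal sketch:
|
|     #define _GNU_SOURCE
|     #include <linux/futex.h>
|     #include <sys/syscall.h>
|     #include <sys/mman.h>
|     #include <stdatomic.h>
|     #include <unistd.h>
|
|     static long futex(atomic_int *addr, int op, int val)
|     {
|         return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
|     }
|
|     int main(void)
|     {
|         atomic_int *flag = mmap(NULL, sizeof *flag,
|                                 PROT_READ | PROT_WRITE,
|                                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
|         if (fork() == 0) {                /* "callee" */
|             atomic_store(flag, 1);
|             futex(flag, FUTEX_WAKE, 1);   /* kernel-mediated signal */
|             _exit(0);
|         }
|         while (atomic_load(flag) == 0)    /* "caller" sleeps in kernel */
|             futex(flag, FUTEX_WAIT, 0);
|         return 0;
|     }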
| zbentley wrote:
| This was roughly the dream of DBus. However, outside of
| desktop-shaped niches it proved to be extremely difficult to
| secure, standardize, and debug.
|
| Process-level/address-space-level dependency sharing remains
| both easier to think about and simpler to implement (and
| capabilities are taking bites out of the security risks
| entailed by this model as time goes on).
| josephg wrote:
| If the goal is to put a security boundary between libraries
| within the process, there might be better ways to do it
| than process boundaries. One approach is to sandbox library
| code with wasm. Firefox apparently does this - compiling
| some libraries to wasm, then compiling the wasm back to C
| and linking it. They get all the benefits of wasm without
| any need to JIT compile the code.
|
| Another approach would be to leverage a language like Rust.
| I'd love it if Rust provided a way to deny sensitive access
| to part of my dependency tree. I want to pull in a library
| but deny it the ability to run unsafe code or make any
| syscall (or maybe, make syscalls but with a whitelist of
| what it's allowed to call). Restrictions should apply
| transitively to all of that library's dependencies
| (optionally with further restrictions).
|
| Both of these approaches would stop the library from doing
| untoward things. Way more so than you'd get running the library
| in a separate process.
| magicalhippo wrote:
| I think you would have loved Singularity OS[1].
|
| It featured software-isolated processes that communicated
| via contract-based message passing, which allowed for
| zero-copy exchange of data.
|
| As a research OS it never became a fully-fledged OS[2], but
| it was an interesting attempt IMHO.
|
| [1]: https://www.microsoft.com/en-us/research/wp-
| content/uploads/...
|
| [2]:
| https://en.wikipedia.org/wiki/Singularity_%28operating_syste...
| zokier wrote:
| D-Bus is a thing. Bus1 was its stillborn spiritual
| kernel-space successor.
| JonChesterfield wrote:
| The glibc separation into multiple shared libraries is such a
| weird thing. Anyone happen to know how that happened? See musl
| for an example where they put it all in one lib and thus avoid a
| whole pile of failure modes.
| eqvinox wrote:
| POSIX requires it, cf.
| https://pubs.opengroup.org/onlinepubs/9699919799/utilities/c...
|
| However, that "requirement" doesn't prevent you from
| shipping an empty libm (or the other libs listed there).
|
| (The actual reason is probably that glibc is old enough to
| have lived in a time when you cared about saving time and
| space by not linking in math functions you didn't need...)
| eqvinox wrote:
| Actually, now that I'm looking at the list... I certainly
| don't have a "libxnet" on my system ;D
| o11c wrote:
| A lot of them are stubs nowadays actually.
| ajross wrote:
| Really the big finding here is that Xlib et al. get pulled into
| GPU compute tools, because access to GPU contexts has
| traditionally been mediated by the desktop subsystem, because the
| GPU was traditionally "owned" by the rendering layers of the
| device abstraction.
|
| The bug here is much more a changing hardware paradigm than it is
| an issue with shared library dependencies that recapitulate it.
| Things moved and the software layers kludged along instead of
| reworking from scratch.
|
| Obviously what's needed is a layer somewhere in the device stack
| that "owns" the GPU resources and doles them out to desktop
| rendering and compute clients as needed, without the two needing
| to know about each other. But that's a ton more work than just
| untangling some symbol dependencies!
| RajT88 wrote:
| On Windows, this is Dependency Walker versus ProcExp.
| Similar eye-popping results.
|
| https://www.dependencywalker.com/
|
| https://learn.microsoft.com/en-us/sysinternals/downloads/pro...
| pjmlp wrote:
| Dependency Walker used to be part of the Windows SDK; I
| never got why it was removed, given its utility.
| qwertox wrote:
| When I google for "libvulkan_virtio" I get zero results.
|
| What does it do?
| epilys wrote:
| virtio-gpu with vulkan command passthrough to host,
| https://docs.mesa3d.org/drivers/venus.html
| qwertox wrote:
| Thank you. So is this a Vulkan emulator which sends the
| commands not into a software renderer but to the host's
| GPU? What reserves the resources on the GPU, this driver
| too? Can one reserve resources explicitly through the API,
| or does this happen dynamically, as needed? Because if
| explicitly, I'd wonder whether that is part of the library,
| of the Vulkan spec, or some Mesa offering.
| YarickR2 wrote:
| Just wait until Lennart pushes his idea of doing linking
| entirely via dlopen() in systemd (see the story from a few
| days ago). The last bits of sane and efficient means to
| track dependencies will be gone forever. Good luck creating
| lean Docker/k8s images without pulling in a systemd-based
| stack after that.
| nerdponx wrote:
| > see the story from a few days ago
|
| I missed it, can you add a link or at least the post title I
| can search for?
| bheadmaster wrote:
| > Just wait until Lennart pushes his idea of doing linking
| entirely via dlopen() in systemd (see the story from a few days
| ago)
|
| Could you paste a link to the story? I haven't been able to
| find it through search engines, and I'd love to read the
| rationale for such an idea...
| dsissitka wrote:
| Looks like https://news.ycombinator.com/item?id=40014724.
| tedunangst wrote:
| Wait until /lib is split into /lib.d with /lib.d/default and
| /lib.d/available and....
| dmwilcox wrote:
| I'm impatiently waiting for the systemd shell and editor /jk
| (and to be clear I hope this is a joke but I worry sometimes)
| nerdponx wrote:
| I am a perpetual newbie when it comes to things like this. What's
| the advantage of dlopen()-ing instead of dynamic linking?
| tux3 wrote:
| Faster program start, and potentially making it a soft
| dependency instead of a hard one (or letting you fall back
| on something else if it's not there).
| d3m0t3p wrote:
| You can load libs at runtime with dlopen; for example, if
| you need a feature, you can load the corresponding lib.
| With dynamic linking you'd load everything when the process
| is launched, slowing the launch.
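|
| A minimal sketch (libfoo.so and foo_render are made-up
| names):
|
|     #include <dlfcn.h>
|     #include <stdio.h>
|
|     int main(void)
|     {
|         /* nothing is loaded until this call */
|         void *h = dlopen("libfoo.so", RTLD_LAZY);
|         if (!h) {
|             fprintf(stderr, "feature unavailable: %s\n",
|                     dlerror());
|             return 1; /* graceful fallback */
|         }
|         void (*render)(void) =
|             (void (*)(void))dlsym(h, "foo_render");
|         if (render)
|             render();
|         dlclose(h);
|         return 0;
|     }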
| sgtnoodle wrote:
| A more obscure use would be for loading multiple instances of a
| singleton library. This is especially helpful in something like
| a unit test suite, where you want each test case to start in a
| cleanly initialized state. If the code under test has a bunch
| of globally initialized variables, reloading the library at
| runtime is one of only a few possible ways of doing it.
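|
| A rough sketch of that trick; it only works if nothing
| pins the library in memory, e.g. RTLD_NODELETE or a leaked
| handle (libsut.so and sut_counter are made-up names, and
| error handling is elided):
|
|     #include <dlfcn.h>
|     #include <assert.h>
|
|     static int bump_counter_once(void)
|     {
|         void *h = dlopen("./libsut.so", RTLD_NOW);
|         int *counter = dlsym(h, "sut_counter");
|         int value = ++*counter;  /* mutate the lib's global */
|         dlclose(h);              /* refcount 0: lib unmapped */
|         return value;
|     }
|
|     int main(void)
|     {
|         /* each open sees freshly initialized globals */
|         assert(bump_counter_once() == 1);
|         assert(bump_counter_once() == 1);
|         return 0;
|     }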
___________________________________________________________________
(page generated 2024-04-14 23:02 UTC)