[HN Gopher] Speeding up ELF relocations for store-based systems
___________________________________________________________________
Speeding up ELF relocations for store-based systems
Author : setheron
Score : 43 points
Date : 2024-05-03 21:48 UTC (3 days ago)
(HTM) web link (fzakaria.com)
(TXT) w3m dump (fzakaria.com)
| kreetx wrote:
| This article confuses static linking and deterministic builds.
| I.e., Nix, a "store-based system" (author's term), still very much
| dynamically links. Static linking means copying actual program
| code from libraries into the executable itself, such that
| external .so files don't need to be loaded.
| geddawm wrote:
| I believe you're referring to:
|
| > Store-based systems, however, are static in nature, with all
| dependencies being resolved at build time.
|
| I think the author is saying that the shared libraries (.so)
| are available at build time on store-based systems and never
| change. Thus, the dynamic linker can speed up symbol resolution
| by doing the symbol resolution at build time and sticking the
| result in the output binary. This is distinct from static linking
| which sticks the entire library (.a) into the output binary.
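
A toy illustration of the distinction drawn above, not the article's implementation: libraries are modelled as plain sets of exported names, and `LIBRARIES`, `SEARCH_ORDER`, `flat_lookup` and `resolve_at_build_time` are hypothetical names invented for this sketch. It only shows why running the search once at build time and keeping the answers saves work at load time when the libraries never change.

```python
# Toy model: each "library" is just the set of symbol names it exports.
LIBRARIES = {
    "libfoo.so": {"foo_init", "foo_run"},
    "libbar.so": {"bar_open", "bar_close"},
    "libc.so": {"printf", "malloc", "free"},
}
SEARCH_ORDER = ["libfoo.so", "libbar.so", "libc.so"]


def flat_lookup(symbol):
    """Walk the search order until some library exports the symbol."""
    probes = 0
    for lib in SEARCH_ORDER:
        probes += 1
        if symbol in LIBRARIES[lib]:
            return lib, probes
    raise LookupError(symbol)


def resolve_at_build_time(needed):
    """Run the same search once and keep the answers as a table."""
    return {sym: flat_lookup(sym)[0] for sym in needed}


needed = ["foo_run", "bar_open", "malloc"]

# Without pre-resolution, every program start repeats the search.
load_time_probes = sum(flat_lookup(sym)[1] for sym in needed)

# With pre-resolution, the table is computed once and stored in the binary.
bindings = resolve_at_build_time(needed)
```

With the table in hand, a loader would need one lookup per symbol in a known library instead of a walk over the whole search order. No library code is copied into the binary, which is what separates this from static linking.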
| setheron wrote:
| You clarified correctly (author)
| kreetx wrote:
| Thank you both, clearly it was _I_ who was confused!
| lgg wrote:
| Windows and macOS both use a form of two level name-spacing,
| which does the same sort of direct binding to a target library
| for each symbol. Retrofitting that into a binary format is pretty
| simple, but retrofitting it into an ecosystem that depends on the
| existing flat namespace lookup semantics is not. I think it is
| pretty clever that the author noticed the static nature of the
| nix store allows them to statically evaluate the symbol
| resolutions and get the launch time benefits of two level
| namespaces.
|
| I do wonder if it might make more sense to rewrite the binaries
| to use Direct Binding[1]. That is an existing encoding of library
| targets for symbols in ELF that has been used by Solaris for a
| number of years.
|
| 1: https://en.wikipedia.org/wiki/Direct_binding
| JonChesterfield wrote:
| That is much better than the Linux model!
|
| Not only is there less crawling around looking for symbols,
| you're no longer in trouble when two libraries export the same
| symbol.
|
| Especially given libraries are found by name, and symbols by
| name, where "type information" or "is that actually the library
| I wanted" are afterthoughts.
| cryptonector wrote:
| > you're no longer in trouble when two libraries export the
| same symbol.
|
| Whether you use direct binding or symbol versioning, either
| way you don't have a problem with multiple libraries
| exporting the same symbol.
|
| By the way, this is the fundamental problem with static
| linking for C: it's still stuck with 1970s semantics and you
| can't get the same symbol conflict resolution semantics as
| with ELF because the static linker-editors do not record
| dependencies in static link archives.
|
| The key insight is that when you link-edit your libraries and
| programs you should provide only the direct dependencies, and
| the linker-editor should then record in its output which one
| of those provided which symbol. Compare to static linking
| where only the final edit gets the dependency information
| _and_ that dependency tree has to get flattened (because it
| has to fit on a command-line, which is linear in nature).
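
A rough sketch of the idea in the comment above, under the assumption that a link-editor sees only an object's direct dependencies and writes down which one supplied each symbol. `link_edit` and the library names are hypothetical; no real linker exposes this interface.

```python
def link_edit(name, undefined, direct_deps):
    """Record, for each undefined symbol, which direct dependency provides it.

    direct_deps is an ordered list of (soname, exported_symbols) pairs.
    """
    providers = {}
    for sym in undefined:
        for soname, exports in direct_deps:
            if sym in exports:
                providers[sym] = soname
                break
        else:
            raise LookupError(f"{name}: undefined symbol {sym}")
    return providers


# Two libraries that both export log_msg.
liba = ("liba.so.1", {"log_msg", "a_work"})
libb = ("libb.so.1", {"log_msg", "b_work"})

# Two consumers, each linked only against its own direct dependency.
plugin_x = link_edit("plugin_x.so", ["log_msg", "a_work"], [liba])
plugin_y = link_edit("plugin_y.so", ["log_msg", "b_work"], [libb])
```

Because each consumer carries its own record, `log_msg` is bound to `liba.so.1` in one and `libb.so.1` in the other, even if both end up loaded into the same process. A flattened static link line has no place to keep that per-object information.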
| glandium wrote:
| I think you can get an effect similar to direct binding with
| symbol versioning.
| cryptonector wrote:
| You can get the same semantics as for direct binding using
| symbol versioning, but direct binding is _faster_.
|
| Also, symbol versioning is only really better than direct
| binding if you end up having multiple versions of the same
| symbol provided by the _same_ object, but that's relatively
| hard to use, so it's really only ever used for things like
| the C library itself. Mind you, that is a very valuable
| feature when you need it. In Solaris itself when we needed to
| deal with the various different behaviors of snprintf() there
| just wasn't a good way to do it, and only symbol versioning
| with support for multiple versions of a symbol would have
| helped.
| lgg wrote:
| Not really... symbol versioning is a form of namespacing, but
| it is somewhat orthogonal to this.
|
| Symbol versioning allows you to have multiple symbols with
| the same name namespaced by version, but you still have no
| control over what library in the search path they will be
| found in. So it does not improve the speed of the runtime
| searching (since they could be in any library on the search
| path and you still need to search for them in order), and it
| does not provide the same binary compatibility support
| and dylib hijacking protection (since again, any dylibs
| earlier in the search path could declare a symbol with the
| same name).
|
| One could use symbol versioning to construct a system where
| you had the same binary protection guarantees, but it would
| involve every library declaring a unique version string, and
| guaranteeing there are no collisions. The obvious way to do
| that would be to use the file path as the symbol version, at
| which point you have reinvented mach-o install names, except:
|
| 1. You still do not get the runtime speed ups unless you
| change the dynamic linker behavior to use the version string
| as the search path, which would require ecosystem wide
| changes.
|
| 2. You can't actually use symbol versioning to do versioned
| symbols any more, since you overloaded the use of version
| strings (mach-o binaries end up accomplishing symbol
| versioning through header tricks with `asmname`, so it is not
| completely intractable to do even without explicit support).
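
To make the argument above concrete, here is a small model (an assumption-laden sketch, not how any real dynamic linker is written). Exports are `(name, version)` pairs, `SEARCH_PATH` is an ordered list of made-up libraries, and `versioned_lookup` and `direct_lookup` are hypothetical helpers.

```python
# Each library exports (name, version) pairs; list order is the search order.
SEARCH_PATH = [
    ("libevil.so", {("snprintf", "V_2")}),
    ("libc.so", {("snprintf", "V_1"), ("snprintf", "V_2")}),
]


def versioned_lookup(name, version):
    """Flat search that also matches on the version string."""
    for probes, (lib, exports) in enumerate(SEARCH_PATH, start=1):
        if (name, version) in exports:
            return lib, probes
    raise LookupError((name, version))


def direct_lookup(name, version, target):
    """Direct binding: only the recorded target library is consulted."""
    exports = dict(SEARCH_PATH)[target]
    if (name, version) not in exports:
        raise LookupError((name, version, target))
    return target


# An earlier library declaring the same name and version wins the search.
hijacked = versioned_lookup("snprintf", "V_2")

# A version only libc.so provides still costs a walk down the path.
old = versioned_lookup("snprintf", "V_1")

# With a recorded target, the earlier library is never considered.
bound = direct_lookup("snprintf", "V_2", "libc.so")
```

The version string narrows which definition matches, but says nothing about where to look, so both the search cost and the hijacking risk remain in this model.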
| cryptonector wrote:
| > Symbol versioning allows you to have multiple symbols
| with the same name namespaced by version, but you still
| have no control over what library in the search path they
| will be found in.
|
| Yes, but since the convention is to use the SONAME and
| SOVERSION in the symbol version, in practice the symbol
| version does (when adhering to this convention) help in
| binding symbols to objects.
|
| Still, because this is an indirect scheme it does not help
| speed up relocation processing.
|
| As you say, direct binding _is_ better for safety and
| speed.
| cryptonector wrote:
| Another option is the Solaris/Illumos "direct binding" scheme
| where each object stores for each external symbol the SONAME of
| the object meant to provide it. It's a lot like pre-linking, but
| a) less intrusive, b) less fast (because it's less intrusive).
|
| EDIT: Ay, https://news.ycombinator.com/item?id=40268546 mentions
| this.
| cryptonector wrote:
| I don't like the term "store-based" for what Nix does. Nix uses a
| partial transitive universe hash to compute deploy-time locations
| for built artifacts at build configuration time so that those can
| be hard-coded into built artifacts -- "store-based" _hardly_
| captures this. I don't know what to call this, but "store-based"
| is insufficiently suggestive.
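
A deliberately simplified sketch of the idea described above: hashing a package's build inputs, including the already-hashed locations of its dependencies, to get an install path that is known before anything is built. This is not Nix's actual algorithm; `store_path`, the `/store/` prefix, the recipe strings and the 32-character truncation are all assumptions made for illustration.

```python
import hashlib


def store_path(name, recipe, dep_paths):
    """Derive an install location from a package's build inputs.

    Not Nix's real scheme: just a hash over the recipe text and the
    already-computed paths of the dependencies.
    """
    h = hashlib.sha256()
    for field in [name, recipe, *sorted(dep_paths)]:
        h.update(field.encode())
        h.update(b"\0")  # separator so adjacent fields cannot run together
    return f"/store/{h.hexdigest()[:32]}-{name}"


libc = store_path("libc", "build libc v1", [])
app = store_path("app", "build app", [libc])

# Changing anything upstream changes every downstream location.
libc2 = store_path("libc", "build libc v2", [])
app2 = store_path("app", "build app", [libc2])
```

Since `app`'s path depends on `libc`'s path, which in turn depends on `libc`'s inputs, the hash covers the transitive inputs without enumerating them, and the resulting path can be hard-coded into the artifact at build time.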
___________________________________________________________________
(page generated 2024-05-06 23:00 UTC)