[HN Gopher] Speeding up ELF relocations for store-based systems
       ___________________________________________________________________
        
       Speeding up ELF relocations for store-based systems
        
       Author : setheron
       Score  : 43 points
       Date   : 2024-05-03 21:48 UTC (3 days ago)
        
 (HTM) web link (fzakaria.com)
 (TXT) w3m dump (fzakaria.com)
        
       | kreetx wrote:
       | This article confuses static linking and deterministic builds.
       | I.e., Nix, a "store-based system" (author's term), still very much
       | dynamically links. Static linking means copying actual program
       | code from libraries into the executable itself, such that
       | external .so files don't need to be loaded.
        
         | geddawm wrote:
         | I believe you're referring to:
         | 
         | > Store-based systems, however, are static in nature, with all
         | dependencies being resolved at build time.
         | 
         | I think the author is saying that the shared libraries (.so)
         | are available at build time on store-based systems and never
         | change. Thus, the dynamic linker can speed up symbol resolution
         | by doing the symbol resolution at build time and sticking the
         | result in the output binary. This is distinct from static linking
         | which sticks the entire library (.a) into the output binary.
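
To make the distinction above concrete, here is a minimal sketch that models libraries as sets of exported names. The library names, the load order, and the helper functions are all hypothetical; this is not the article's implementation, only an illustration of searching at load time versus reusing an answer recorded at build time.

```python
# Hypothetical model: each "library" is a name plus the set of symbols it exports.
LOAD_ORDER = [
    ("libfoo.so", {"foo_init", "foo_run"}),
    ("libbar.so", {"bar_open"}),
    ("libc.so.6", {"printf", "malloc"}),
]

def resolve_at_load(symbol, load_order):
    """Flat lookup: scan the libraries in order; return (library, probes)."""
    for probes, (lib, exports) in enumerate(load_order, start=1):
        if symbol in exports:
            return lib, probes
    raise LookupError(symbol)

def precompute_bindings(needed, load_order):
    """Build-time pass: run the same search once and record each answer."""
    return {sym: resolve_at_load(sym, load_order)[0] for sym in needed}

# Because the set of libraries never changes, this table could be stored
# alongside the binary instead of being recomputed on every launch.
BINDINGS = precompute_bindings(["printf", "bar_open"], LOAD_ORDER)

def resolve_precomputed(symbol):
    """Launch time: a single table lookup replaces the search."""
    return BINDINGS[symbol]
```

The library code itself still lives in the separate .so files; only the answer to "which library provides this symbol" is fixed ahead of time.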
        
           | setheron wrote:
           | You clarified correctly (author)
        
             | kreetx wrote:
             | Thank you both, clearly it was _I_ who was confused!
        
       | lgg wrote:
       | Windows and macOS both use a form of two level name-spacing,
       | which does the same sort of direct binding to a target library
       | for each symbol. Retrofitting that into a binary format is pretty
       | simple, but retrofitting it into an ecosystem that depends on the
       | existing flat namespace lookup semantics is not. I think it is
       | pretty clever that the author noticed the static nature of the
       | nix store allows them to statically evaluate the symbol
       | resolutions and get the launch time benefits of two level
       | namespaces.
       | 
       | I do wonder if it might make more sense to rewrite the binaries
       | to use Direct Binding[1]. That is an existing encoding of library
       | targets for symbols in ELF that has been used by Solaris for a
       | number of years.
       | 
       | 1: https://en.wikipedia.org/wiki/Direct_binding
        
         | JonChesterfield wrote:
         | That is much better than the Linux model!
         | 
         | Not only is there less crawling around looking for symbols,
         | you're no longer in trouble when two libraries export the same
         | symbol.
         | 
         | Especially given libraries are found by name, and symbols by
         | name, where "type information" or "is that actually the library
         | I wanted" are afterthoughts.
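
The collision problem mentioned above can be illustrated with a small sketch. The two libraries and their contents are made up for the example; the point is only that a flat lookup depends on search order, while a reference that names its library does not.

```python
# Hypothetical libraries that both export a symbol named "log".
LIBS = {
    "libm.so.6": {"log": "natural logarithm", "sin": "sine"},
    "liblogging.so": {"log": "write a log line"},
}

def flat_lookup(symbol, search_order):
    """Flat namespace: the first library in search order exporting the name wins."""
    for lib in search_order:
        if symbol in LIBS[lib]:
            return lib, LIBS[lib][symbol]
    raise LookupError(symbol)

def direct_lookup(lib, symbol):
    """Two-level namespace: the reference names its library, so order is irrelevant."""
    return lib, LIBS[lib][symbol]
```

With `flat_lookup`, swapping the order of the two libraries silently changes which `log` is bound; `direct_lookup("libm.so.6", "log")` always yields the same definition.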
        
           | cryptonector wrote:
           | > you're no longer in trouble when two libraries export the
           | same symbol.
           | 
           | Whether you use direct binding or symbol versioning, either
           | way you don't have a problem with multiple libraries
           | exporting the same symbol.
           | 
           | By the way, this is the fundamental problem with static
           | linking for C: it's still stuck with 1970s semantics and you
           | can't get the same symbol conflict resolution semantics as
           | with ELF because the static linker-editors do not record
           | dependencies in static link archives.
           | 
           | The key insight is that when you link-edit your libraries and
           | programs you should provide only the direct dependencies, and
           | the linker-editor should then record in its output which one
           | of those provided which symbol. Compare to static linking
           | where only the final edit gets the dependency information
           | _and_ that dependency tree has to get flattened (because it
           | has to fit on a command-line, which is linear in nature).
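
A rough sketch of the idea described above: each link-edit step sees only the direct dependencies and records which one supplied each symbol. The function name, the record layout, and the choice to reject ambiguous providers are assumptions made for illustration; real link editors typically take the first match in command-line order.

```python
def link_edit(name, undefined, direct_deps):
    """Return a record noting which direct dependency supplies each symbol.

    `direct_deps` maps a dependency name to the set of symbols it exports.
    """
    providers = {}
    for sym in undefined:
        owners = [dep for dep, exports in direct_deps.items() if sym in exports]
        if len(owners) != 1:
            # Assumption for this sketch: a missing or ambiguous provider is an error.
            raise ValueError(f"{sym}: expected exactly one provider, got {owners}")
        providers[sym] = owners[0]
    return {"name": name, "needed": sorted(direct_deps), "providers": providers}
```

For example, `link_edit("app", ["SSL_connect"], {"libssl.so": {"SSL_connect"}})` records only `libssl.so`; whatever `libssl.so` itself depends on was recorded when `libssl.so` was linked, so the dependency tree never has to be flattened into one list.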
        
         | glandium wrote:
         | I think you can get an effect similar to direct binding with
         | symbol versioning.
        
           | cryptonector wrote:
           | You can get the same semantics as for direct binding using
           | symbol versioning, but direct binding is _faster_.
           | 
           | Also, symbol versioning is only really better than direct
           | binding if you end up having multiple versions of the same
           | symbol provided by the _same_ object, but that's relatively
           | hard to use, so it's really only ever used for things like
           | the C library itself. Mind you, that is a very valuable
           | feature when you need it. In Solaris itself when we needed to
           | deal with the various different behaviors of snprintf() there
           | just wasn't a good way to do it, and only symbol versioning
           | with support for multiple versions of a symbol would have
           | helped.
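
As an illustration of several versions of one symbol living in the same object, here is a minimal sketch. The version tags, the export table, and the simplified snprintf stand-ins are hypothetical and do not reproduce any real C library; they only show how a (name, version) pair can select between behaviors.

```python
def snprintf_old(limit, text):
    """Hypothetical legacy behavior: report -1 when the output is truncated."""
    return -1 if len(text) >= limit else len(text)

def snprintf_c99(limit, text):
    """C99-style behavior: report the length the full output would have had."""
    return len(text)

# One object exporting two versions of the same name; the tags are made up.
LIBC_EXPORTS = {
    ("snprintf", "LIBC_1.0"): snprintf_old,
    ("snprintf", "LIBC_2.0"): snprintf_c99,
}
DEFAULT_VERSION = {"snprintf": "LIBC_2.0"}

def bind(symbol, version=None):
    """Resolve a reference; an unversioned reference gets the default version."""
    return LIBC_EXPORTS[(symbol, version or DEFAULT_VERSION[symbol])]
```

A binary linked long ago keeps asking for `("snprintf", "LIBC_1.0")` and sees the old return value, while newly linked code gets the default.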
        
           | lgg wrote:
           | Not really... symbol versioning is a form of namespacing, but
           | it is somewhat orthogonal to this.
           | 
           | Symbol versioning allows you to have multiple symbols with
           | the same name namespaced by version, but you still have no
           | control over what library in the search path they will be
           | found in. So it does not improve the speed of the runtime
           | searching (since they could be in any library in the search
           | path and you still need to search for them in order), and it
           | does not provide the same binary compatibility support
           | and dylib hijacking protection (since again, any dylibs
           | earlier in the search path could declare a symbol with the
           | same name).
           | 
           | One could use symbol versioning to construct a system where
           | you had the same binary protection guarantees, but it would
           | involve every library declaring a unique version string, and
           | guaranteeing there are no collisions. The obvious way to do
           | that would be to use the file path as the symbol version, at
           | which point you have reinvented mach-o install names, except:
           | 
           | 1. You still do not get the runtime speed ups unless you
           | change the dynamic linker behavior to use the version string
           | as the search path, which would require ecosystem wide
           | changes.
           | 
           | 2. You can't actually use symbol versioning to do versioned
           | symbols any more, since you overloaded the use of version
           | strings (mach-o binaries end up accomplishing symbol
           | versioning through header tricks with `asmname`, so it is not
           | completely intractable to do even without explicit support).
        
             | cryptonector wrote:
             | > Symbol versioning allows you to have multiple symbols
             | with the same name namespaced by version, but you still
             | have no control over what library in the search path they
             | will be found in.
             | 
             | Yes, but since the convention is to use the SONAME and
             | SOVERSION in the symbol version, in practice the symbol
             | version does help bind symbols to objects when this
             | convention is followed.
             | 
             | Still, because this is an indirect scheme it does not help
             | speed up relocation processing.
             | 
             | As you say, direct binding _is_ better for safety and
             | speed.
        
       | cryptonector wrote:
       | Another option is the Solaris/Illumos "direct binding" scheme
       | where each object stores for each external symbol the SONAME of
       | the object meant to provide it. It's a lot like pre-linking, but
       | a) less intrusive, b) less fast (because it's less intrusive).
       | 
       | EDIT: Ay, https://news.ycombinator.com/item?id=40268546 mentions
       | this.
        
       | cryptonector wrote:
       | I don't like the term "store-based" for what Nix does. Nix uses a
       | partial transitive universe hash to compute deploy-time locations
       | for built artifacts at build configuration time so that those can
       | be hard-coded into built artifacts -- "store-based" _hardly_
       | captures this. I don't know what to call this, but "store-based"
       | is insufficiently suggestive.
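
The idea of deriving a deploy-time location from a hash of the build inputs can be sketched as follows. This is a simplified illustration and not Nix's real hashing scheme; the function, the package names, and the recipe strings are hypothetical.

```python
import hashlib

def store_path(name, build_recipe, dep_paths, store="/nix/store"):
    """Derive an install location from a hash of the recipe and dependency paths.

    Simplified illustration only: this is not Nix's real hashing scheme.
    """
    h = hashlib.sha256()
    # A NUL separator keeps adjacent fields from running together.
    for field in [name, build_recipe, *sorted(dep_paths)]:
        h.update(field.encode())
        h.update(b"\0")
    return f"{store}/{h.hexdigest()[:32]}-{name}"

# The path is known before anything is built, so it can be hard-coded into
# the artifacts that depend on it (package names here are made up).
libc = store_path("glibc-2.39", "configure && make", [])
app = store_path("hello-2.12", "cc hello.c", [libc])
```

Because each dependency's path already encodes that dependency's own inputs, changing anything upstream changes every path downstream of it.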
        
       ___________________________________________________________________
       (page generated 2024-05-06 23:00 UTC)