[HN Gopher] The glibc s390 ABI break (2014)
       ___________________________________________________________________
        
       The glibc s390 ABI break (2014)
        
       Author : rdpintqogeogsaa
       Score  : 64 points
       Date   : 2022-03-17 15:27 UTC (7 hours ago)
        
 (HTM) web link (lwn.net)
 (TXT) w3m dump (lwn.net)
        
       | somat wrote:
       | I always feel the Linux people learn the wrong lesson from these
       | events.
       | 
       | The difference between the Linux and the OpenBSD mentality in one
       | example.
       | 
       | The ABI must be changed.
       | 
       | Linux: That was really hard, we are never going to do that again.
       | 
       | OpenBSD: That was really hard, we better get good at it.
       | 
       | My opinion: When your deliverable is made up of source code (like
       | these _open source_ projects are), the ABI and ABI stability is
       | not that important, it is the API (the source interface) that is
       | critical.
        
         | ggreg84 wrote:
         | If you can re-compile all your source code, then ABI stability
         | is not that important.
         | 
         | In the real world,
         | 
         | - people use operating systems to get work done,
         | 
         | - such work is often done with proprietary software that people
         | buy, and that a lot of people get paid to develop
         | 
         | - breaking such software is not acceptable, because it means
         | that suddenly tens of thousands of people can't work
         | 
         | - telling all those people to "stop working" until their
         | software is fixed is not acceptable either.
         | 
         | People use Linux for work in the real world, and that's why
         | Linux tries very hard not to break ABIs.
         | 
         | Microsoft has had a stable ABI for 30 years, and people still
         | run software compiled 30 years ago on the latest Windows
         | version today.
         | 
         | What this tells you about OpenBSD users, is another story.
        
           | moonbug wrote:
           | no one uses Debian on s/390 for anything important anyway
        
           | ece wrote:
           | The source-based Linux distros (Gentoo in my experience) do
           | try to get better at keeping things working and moving
           | forward when ABI breaks occur. Recompiling downstream
           | dependencies (FFIs, libraries/packages), keeping the old
           | library/packages around until new ones are available, and
           | keeping track of breaks/automating fixes with versioning all
           | happen when, for example, a new Boost or Qt version comes out.
           | All of this automated by the package manager. Proprietary
           | packages that require fixed ABIs from certain libraries have
           | *-compat packages in the gentoo repos, and other distros have
           | this too.
           | 
           | Ultimately, it is about the APIs when you're compiling from
           | source, a package that isn't using a new API is going to need
           | the old library, and both might need to be compiled with and
           | linked against the same toolchain. I think Linux and BSDs are
           | closer than one might think here. Packaging and upstreams
           | have both gotten better here over time I think, at least over
           | the last couple of decades I've used Linux. I've only played
           | around with the BSDs in VMs.
        
           | alerighi wrote:
           | And even if you can recompile everything, it's still a pain
           | to do so, and it's better to avoid doing so. Also, recompiling
           | the software may not be that simple: it can, for example, require
           | modifications of its source code to adapt it to newer
           | versions of the compiler or other libraries that changed, and
           | of course you can't build it on an older system since the ABI
           | changed.
           | 
           | This is also the reason why containers are so popular these
           | days, ship a software with all its dependencies to avoid
           | having to recompile stuff each time you upgrade or change the
           | operating system.
        
         | leeter wrote:
         | This forgets MS:
         | 
         | MS: "Let's design it so either it never breaks or if we can't
         | avoid that we'll just add a FooEx or Foo2 method call"
         | 
         | This is why so many Win32 methods take structs that have a size
         | parameter that must be filled in. That acts as a version
         | parameter. That's not to say they haven't had ABI fun
         | moments... _glares at COM_.
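
The size-as-version idea described above can be sketched in portable C. This is a minimal illustration, not real Win32 code: `WIDGET_INFO_V1`, `WIDGET_INFO_V2`, `widget_area`, and `WIDGET_DEFAULT_HEIGHT` are hypothetical names invented for the example. The callee reads the leading size field to work out which revision of the struct the caller was compiled against.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical structs, not taken from any Windows header. */
typedef struct {
    uint32_t cbSize;  /* caller sets this to sizeof the struct it was built with */
    int32_t  width;
} WIDGET_INFO_V1;

typedef struct {
    uint32_t cbSize;
    int32_t  width;
    int32_t  height;  /* appended in "version 2"; older fields keep their offsets */
} WIDGET_INFO_V2;

#define WIDGET_DEFAULT_HEIGHT 1

/* Returns width * height, or -1 if cbSize matches no known revision. */
int widget_area(const void *info)
{
    uint32_t cb;

    assert(info != NULL);
    memcpy(&cb, info, sizeof cb);  /* the size field is always the first member */

    if (cb == sizeof(WIDGET_INFO_V2)) {
        const WIDGET_INFO_V2 *v2 = info;
        return v2->width * v2->height;
    }
    if (cb == sizeof(WIDGET_INFO_V1)) {
        const WIDGET_INFO_V1 *v1 = info;
        return v1->width * WIDGET_DEFAULT_HEIGHT;  /* old caller: supply a default */
    }
    return -1;  /* unknown revision: refuse rather than guess the layout */
}
```

A caller built against the old header passes a `WIDGET_INFO_V1` and keeps working after the library learns about `height`, because the library never reads past the size the caller declared.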
        
           | secondcoming wrote:
           | One flaw with that is that the struct size won't change if
           | you do something like reordering the members (not that anyone
           | would ever do that).
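
A small sketch of the flaw mentioned above, using two hypothetical structs (`OptsOld` and `OptsReordered` are made up for the example): swapping two members leaves `sizeof` unchanged, so a size check cannot tell the layouts apart, while `offsetof` shows that the fields have moved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" and "after" layouts of the same options struct. */
typedef struct {
    uint32_t cbSize;
    uint32_t flags;
    uint32_t count;
} OptsOld;

typedef struct {
    uint32_t cbSize;
    uint32_t count;  /* swapped with flags */
    uint32_t flags;
} OptsReordered;

/* The size field cannot detect this change: both structs are the same size. */
static_assert(sizeof(OptsOld) == sizeof(OptsReordered),
              "reordering members does not change the struct size");

/* Returns 1 if the sizes match, i.e. a cbSize check would accept either layout. */
int same_size(void)
{
    return sizeof(OptsOld) == sizeof(OptsReordered);
}

/* Returns 1 only if `flags` lives at the same offset in both layouts. */
int same_flags_offset(void)
{
    return offsetof(OptsOld, flags) == offsetof(OptsReordered, flags);
}
```

An old binary writing `flags` would silently be setting `count` in the reordered layout, which is why append-only changes are the only ones a size field can version.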
        
             | leeter wrote:
             | Which they explicitly prohibit, in fact if you look at how
             | they design the APIs they'll deliberately add reserved
             | members to ensure they hold the space so that it's much
             | harder to break.
        
             | asveikau wrote:
             | Yes, it takes some discipline to keep that working. You
             | need to internalize which changes cause ABI breaks, look
             | for them in code reviews, evangelize that culture to new
             | hires that aren't familiar with the rules (which becomes
             | harder over time as people coming up now tend to spend less
             | time learning C), etc.
        
           | Someone wrote:
           | Example: https://gankra.github.io/blah/c-isnt-a-
           | language/#case-study-...
        
             | leeter wrote:
             | NGL... that inspired my comment XD
        
           | stouset wrote:
           | While Microsoft's backwards-compatibility guarantees have
           | historically been a strong competitive advantage for them...
           | every time I get a glance into Windows APIs I'm blown away by
           | how much historical garbage they seem to be stuck with.
           | 
           | I get this feeling every time I read a post on The Old New
           | Thing (Raymond Chen). He'll patiently explain why some weird
           | wart is the way it is, or how misusing some API is bad, and
           | it's always extremely interesting. But at the same time I'm
           | sitting there just thanking my lucky stars I've managed to
           | avoid ever having to write software for that environment,
           | because it's always absolutely bonkers.
        
             | _3u10 wrote:
             | It's so much nicer than Linux.
             | 
             | Whatever version of windows you make your software work on
             | it just works on all future versions.
             | 
             | If you don't like the old APIs don't use them.
             | 
             | Being able to put software on a website and have people
             | download and use it immediately is kind of amazing compared
             | to Linux.
             | 
             | COM is very annoying tho. I'll give you that.
        
             | leeter wrote:
             | Honestly? Having written software for both, Win32 is
             | actually pretty nice to work in. Better in many ways than
             | the alternatives. Most of the things Raymond goes on about
             | are weird quirks you can ignore for the most part. Whereas
             | there are some very serious hazards running around POSIX if
             | you're not careful that can bite hard. _Glares at signals_
             | 
             | I much prefer the EVENT based system Win32 goes with. You
             | can just suspend on the event and not worry about it. The
             | kernel will wake you up when it matters. Also MS seems to
             | have very serious API design rules that help keep usage
             | patterns consistent, which is important in making sure
             | things aren't accidentally misused.
        
             | hnlmorg wrote:
             | While I much prefer Linux as a platform, it's fair to say
             | Linux has more than its fair share of historical baggage
             | as well. The entire design of the console for starters:
             | TTYs, ANSI escape sequences used for encoding data in band,
             | typeless byte streams, job control being a weird cross
             | responsibility between the kernel, shell and application,
             | etc.
             | 
             | ...And that's without addressing any other parts of the
             | system beyond the console.
             | 
             | I'd still take Linux over Windows every day of the week
             | though.
        
           | Asooka wrote:
           | Also "make everyone carry their dependencies down to the
           | standard C library". The churn that the userland on Linux
           | distros has is very unfortunate, but it's probably not going
           | to change any time soon. It's not _that_ hard to ship
           | software with minimal dependencies if you know what you're
           | doing, but it sure is easier for Windows.
        
           | ithkuil wrote:
           | AmigaOS2.0 had an interesting API: tag arrays: a variable
           | length key/value pair of parameters
        
           | orra wrote:
           | > This is why so many Win32 methods take structs that have a
           | size parameter that must be filled in. That acts as a version
           | parameter
           | 
           | That said, I couldn't name you a Win32 struct that actually has more
           | than one valid value for said cbSize member. Are there any?
        
             | MaulingMonkey wrote:
             | um/winuser.h has some, although they haven't changed in
             | a while. WNDCLASSEX gained hIconSm after Windows 3.x.
             | MENUITEMINFO gained hbmpItem in Windows 2000.
             | NONCLIENTMETRICS gained iPaddedBorderWidth in Windows
             | Vista.
             | 
             | https://devblogs.microsoft.com/oldnewthing/20031212-00/?p=4
             | 1...
             | 
             | um/webauthn.h has a bunch of structures that explicitly
             | document fields added in various dwVersions. However, no
             | individual structure has changed since being published in
             | the Windows SDK last I checked. e.g.:
             | 
             |     //
             |     // The following fields have been added in WEBAUTHN_AUTHENTICATOR_MAKE_CREDENTIAL_OPTIONS_VERSION_2
             |     //
             | 
             |     // Cancellation Id - Optional - See WebAuthNGetCancellationId
             |     GUID *pCancellationId;
             | 
             | um/wincrypt.h also has a few structures chopped up with
             | #ifdef ..._HAS_EXTRA_FIELDS blocks that are presumably
             | differentiated via cbSize.
             | 
             | um/ShlObj_core.h has COMPONENT, which extends and is
             | differentiated from IE4COMPONENT - presumably by dwSize.
             | 
             | It also wouldn't surprise me if cbSize is used to
             | differentiate between 32-bit and 64-bit versions of the
             | same structure as well, with WoW64 blindly forwarding said
             | structures.
        
             | pitterpatter wrote:
             | It's hard to notice sometimes because a struct FOO is
             | often a typedef to whatever the latest version is in the
             | SDK you're using (FOO1, FOO2, etc).
        
             | jborean93 wrote:
             | A good example of a really important one is CreateProcess
             | for lpStartupInfo. The value is either a STARTUPINFO or
             | STARTUPINFOEX and the first member of both structs is cb
             | which is set to the size of the struct that is being used.
             | This allows the code to understand what struct is actually
             | used in the call.
        
             | coldpie wrote:
             | Sure, you can find a fair number with a grep of the Wine
             | source. Try "git grep if.*cbSize", there's a few obvious
             | ones and some less obvious, too.
             | 
             | One example: https://source.winehq.org/git/wine.git/blob/62
             | df608d3ed84aac...
        
         | idealmedtech wrote:
         | The issue, based on my read, is that lots of existing
         | applications, many of which are proprietary and may not be
         | supported by their vendors, _depend_ on ABI stability, and a
         | break causes these applications to simply fail, for what appears
         | like no good reason (how many businesses care that their legacy
         | app broke because of supporting a niche CPU architecture?).
         | That's the big issue here, and the historical reason for
         | caution.
        
           | rwmj wrote:
           | This specific issue affected s390x which is pretty niche but
           | the general principle of not breaking ABI affects all
           | architectures.
        
         | matheusmoreira wrote:
         | > My opinion: When your deliverable is made up of source
         | code (like these open source projects are), the ABI and ABI
         | stability is not that important, it is the API (the source
         | interface) that is critical.
         | 
         | It is absolutely important. ABI instability is the number one
         | reason why packaging software on Linux is difficult. Breaking
         | ABI causes pain even for maintainers of free software projects
         | and packages. The Linux kernel is the only project in a Linux
         | distribution that seems to take it seriously.
        
         | marcodiego wrote:
         | > My opinion: When your deliverable is made up of source
         | code (like these open source projects are), the ABI and ABI
         | stability is not that important, it is the API (the source
         | interface) that is critical.
         | 
         | I think ISV's disagree.
        
       | AnssiH wrote:
       | FWIW, this ABI break was reverted 2 weeks after the LWN article
       | was released:
       | 
       |     commit 2f438e20ab591641760e97458d5d1569942eced5
       |     Author: Stefan Liebler <___.ibm.com>
       |     Date:   Thu Jul 31 20:04:54 2014 +0200
       | 
       |         S/390: Revert the jmp_buf/ucontext_t ABI change.
        
       | schemescape wrote:
       | > Debian's developers ... considered rebuilding all of Perl and
       | then, perhaps, all (500 or so) packages depending on the PNG
       | library
       | 
       | libpng using setjmp.h for error handling was always my least
       | favorite part of libpng, especially since the libpng
       | documentation indicates it was just done for convenience (tell
       | that to the authors of the OP!):
       | 
       | > The motivation behind using setjmp() and longjmp() is the C++
       | throw and catch exception handling methods. This makes the code
       | much easier to write, as there is no need to check every return
       | code of every function call.
        
         | kragen wrote:
         | It's not just convenience for the authors of libpng; it's also
         | convenience for libpng's users, because it means they can
         | handle errors at a single location instead of checking the
         | return code of every function call.
         | 
         | It is true that changing the size of jmp_buf breaks binary
         | compatibility, but that's hardly a unique fatal flaw in
         | setjmp/longjmp; it's also true of fd_set, struct timeval,
         | struct tm, struct sockaddr, struct stat, and all the other
         | exposed memory-layout interfaces mentioned in the article:
         | __pthread_unwind_buf_t, PerlInterpreter, png_struct_def, etc.
         | Indeed, you'd think it would be much less of a problem for
         | jmp_buf, because normally the things stored in a jmp_buf are
         | precisely the callee-saved registers in your ABI; changing that
         | involves comprehensively breaking your ABI anyway.
         | 
         | There is a big advantage in C to exposing the memory layout of
         | a struct in this way: it doesn't need to be heap-allocated, so
         | it doesn't introduce a dependency on heap allocation (which
         | wasn't part of the standard library at all when setjmp was
         | defined, and is still forbidden in many contexts), and you can
         | statically bound your program's memory use, so you can be sure
         | it won't fail. Heap allocation can always fail, so you can only
         | ever use it in programs where failure is an option. You don't
         | want your antilock braking system to raise an exception and
         | reboot because its heap has become fragmented.
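
The two points above (one error-handling site, and a `jmp_buf` embedded in a caller-owned struct with no heap allocation) can be sketched as follows. This is an illustration only and not libpng's actual API: `Decoder`, `decoder_fail`, `read_header`, `read_body`, and `decode` are hypothetical names, and the "format" being decoded is invented.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* The jmp_buf is stored inline, so the caller can put a Decoder on the stack
 * or in static storage. The price is that sizeof(jmp_buf) becomes part of
 * this struct's binary layout. */
typedef struct {
    jmp_buf on_error;
    int     last_error;
} Decoder;

/* Records the error and jumps back to the setjmp site; does not return. */
static void decoder_fail(Decoder *d, int code)
{
    d->last_error = code;
    longjmp(d->on_error, 1);
}

/* The helpers report failure by jumping, so callers check no return codes. */
static void read_header(Decoder *d, const unsigned char *buf, size_t len)
{
    if (len < 2)       decoder_fail(d, 1);  /* input too short */
    if (buf[0] != 'P') decoder_fail(d, 2);  /* wrong magic byte */
}

static void read_body(Decoder *d, const unsigned char *buf)
{
    if (buf[1] == 0)   decoder_fail(d, 3);  /* empty body */
}

/* Returns 0 on success or the error code recorded by decoder_fail. */
int decode(Decoder *d, const unsigned char *buf, size_t len)
{
    assert(d != NULL && buf != NULL);
    d->last_error = 0;
    if (setjmp(d->on_error) != 0)
        return d->last_error;   /* the single place where errors are handled */
    read_header(d, buf, len);
    read_body(d, buf);
    return 0;
}
```

The `Decoder` is owned by the caller rather than being a local of `decode`, which sidesteps the rule that non-volatile locals of the function calling `setjmp` are indeterminate after a `longjmp` if they were modified in between.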
        
       | mrlonglong wrote:
       | I felt the pain transitioning from libc5 to glibc 2 a long long
       | time ago. Glad I never had to do that again.
        
       | jancsika wrote:
       | > libc.so.6.1
       | 
       | My brain hurts-- if libc.so.6.1 is a nightmare, then what is the
       | utility of having the libc.so.6 soname numbering at all?
       | 
       | I'll put it in a more effective wrong-thing-on-the-internet style
       | for receiving responses: There's no point in using NixOS. Just
       | use lib.so.versionNumber. (Sorry) :)
        
         | jcranmer wrote:
         | The problem with sonames for libc is that libc is too
         | foundational of a library to really have multiple incompatible
         | versions running around [1]. There is no way for an application
         | to have multiple versions of libc in its memory space. When you
         | live in a modern world where applications will pull in multiple
         | third-party packages that all depend on core packages, those
         | core packages better be extremely good about backwards
         | compatibility or there is immense pain on adopting new versions
         | (see also Python 2->3 transition).
         | 
         | [1] Or, perhaps more accurately, an incompatible version of
         | libc requires the creation of an entire distinct target triple.
         | You can have multiple incompatible versions if you've got a
         | multiarch setup of some kind (like 32-bit and 64-bit x86 code
         | on the same system), but an application can't simultaneously
         | use both in the same process in such systems still.
        
           | jancsika wrote:
           | In keeping with my confidently-wrong-on-the-internet theme:
           | 
           | Nobody is trying to run multiple incompatible versions of
           | glibc in a single application. The point is to run multiple
           | different versions of the same application on a single OS,
           | where each version uses a different version of glibc.
           | 
           | NixOS is wildly over-engineered for this purpose. Just use
           | lib.so.versionNumber, full stop.
           | 
           | And, scene. :)
        
             | jcelerier wrote:
             | > Nobody is trying to run multiple incompatible versions of
             | glibc in a single application
             | 
             | on windows multiple incompatible versions of the C runtime
             | in the same application are mostly fine as far as I know
             | and fairly useful, one can load a DLL built in 2003 in a
             | program built in 2022.
        
               | dfox wrote:
               | For win32 style code, multiple different versions of
               | msvcrt in the same process are mostly fine, except when some
               | of the DLLs involved are built without an SxS manifest
               | (it will probably still work, but produces a modal
               | ShowMessage saying that you should not do that).
               | 
               | The issue arises when porting stuff from the unix world,
               | which expects that you can pass things like FILE* between
               | different modules or that you can return a malloc()'d thing
               | that can be free()'d by the caller.
               | 
               | Edit: another thing is that on windows there are CRT
               | implementations that do not implement C++ exceptions and
               | setjmp/longjmp in terms of SEH, which creates an additional
               | dimension of hard-to-debug ABI incompatibilities (and
               | sidestepping this issue involves being sure that an SEH
               | unwind will not happen through SEH-unaware code, which is
               | essentially impossible to do unless you handle that as a
               | fatal error).
        
               | jancsika wrote:
               | What's the use case, loading plugins?
        
               | jcelerier wrote:
               | yes, it's pretty common in audio and I guess in other
               | creative fields.
        
               | jancsika wrote:
               | It's funny you mention audio. For example, going back to
               | the early 2000s I can't think of a single case of a user
               | reporting such a need on the Pure Data mailing list. And
               | there are a few thousand plugins for it.
               | 
               | I guess Csound would be the grandaddy to check. But I
               | have a feeling this would essentially be limited to
               | running old proprietary plugins for which no source is
               | available?
        
               | detaro wrote:
               | With all respect for it, neither Pure Data nor Csound is
               | exactly representative of audio software with plugins
               | overall.
        
               | jcelerier wrote:
               | > But I have a feeling this would essentially be limited
               | to running old proprietary plugins for which no source is
               | available?
               | 
               | Yes, I have some songs from circa 2007-2009 which depend
               | on windows 32-bit versions of freeware plug-ins whose
               | developers have been MIA for 15 years now. Well, now I
               | know better and try to only use software that I can
               | recompile or write for my music (including pd :-)). But
               | then I also spent 10 years not making much music because
               | of that.
               | 
               | I also work with artists who keep old 10.6 Macs around in
               | case they'd have to perform one of their past songs.
        
               | Asooka wrote:
               | Most often, yes. As a consequence, you'll often see DLLs
               | with "createFoo" and "deleteFoo" functions for
               | creating and deleting Foo objects. The reason is that you
               | can't free a pointer allocated from one DLL in a
               | different DLL, since they can use different runtimes and
               | thus different heaps. This is one of the sources of
               | really fun bugs when writing for Windows :) .
        
             | assbuttbuttass wrote:
             | If some library e.g. libpng depends on a particular version
             | of glibc, and your application depends on libpng and glibc,
             | you better hope they want the same version
        
               | jancsika wrote:
               | I'm still not getting it. Doesn't libpng use the
               | soname.someVersion, too?
        
               | assbuttbuttass wrote:
               | Well, suppose libpng depends on glibc version 6.0, but
               | when you try to link your app it will find the latest
               | glibc 6.1. Only one of these versions is going to get
               | loaded at runtime. The compiler's not going to check
               | compatibility between all the different libraries.
        
               | jancsika wrote:
               | If the highest number always gets chosen then what's the
               | point of libname.so.versionNumber in the first place?
        
               | rlpb wrote:
               | The point is for runtime. At build time generally the
               | latest available version is used (via a symlink) and then
               | that version gets locked in to the resulting binary.
        
           | Arnavion wrote:
           | It's partly a Linux-specific problem. Windows programs are
           | fine with having multiple copies of MSVCRT loaded into the
           | same process. It's often unavoidable; any process that loads
           | plugin libraries will have its own CRT plus other CRTs pulled
           | in by the plugins in its address space.
           | 
           | Just open up some process in procexp and see how many
           | msvcrt##.dll you find loaded.
           | 
           | It means every library has to be careful about providing both
           | malloc and free calls in their API. Eg if a library has a
           | `Foo *create_foo(void)` it _has_ to provide a `void
           | delete_foo(Foo*)`, instead of expecting the user to call
           | `free(the_foo)` because that `free` might be from the wrong
           | CRT. With Linux libraries I often find that you have a
           | `create_foo` but no `delete_foo`, and you're expected to
           | just `free` it.
           | 
           | I assume Linux libraries never took this much care
           | historically because software was generally open-source, so
           | everyone was getting their software from distro repos that
           | rebuilt the world to link to a single libc.
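
The paired-allocator convention described above can be sketched like this. The names `create_foo` and `delete_foo` come from the comment; the struct contents and the `foo_value` accessor are assumptions added for illustration. The library that allocates the object also exports the function that frees it, so the caller never has to guess which runtime's `free` matches.

```c
#include <assert.h>
#include <stdlib.h>

/* In a real library only this forward declaration would be public. */
typedef struct Foo Foo;

/* Private definition: callers never see the layout or the allocator used. */
struct Foo {
    int value;
};

/* Allocates with the library's own C runtime. Returns NULL on failure. */
Foo *create_foo(void)
{
    Foo *f = malloc(sizeof *f);
    if (f != NULL)
        f->value = 42;  /* arbitrary initial state for the example */
    return f;
}

/* Accessor, so callers do not need the struct definition. */
int foo_value(const Foo *f)
{
    assert(f != NULL);
    return f->value;
}

/* Frees with the same runtime that allocated. Accepts NULL, like free(). */
void delete_foo(Foo *f)
{
    free(f);
}
```

Keeping `Foo` opaque has a second benefit in the context of this thread: since callers never see its size or member offsets, the library can change the layout without breaking their binaries.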
        
             | yjftsjthsd-h wrote:
             | You should know that HN lets you prefix stars with a
             | backslash to escape them, so your text ends up saying, ex.
             | foo*bar and bar*baz rather than foo _bar and bar_ baz
        
               | Arnavion wrote:
               | I know. I fixed it shortly after I posted the comment.
        
             | jcranmer wrote:
             | glibc serves multiple purposes, and these purposes are
             | broken up into separate libraries on Windows. One of those
             | purposes is to act as the C standard library, which is what
             | is provided by msvcrt.dll. But another purpose is to act as
             | the library of system services, which is provided by
             | kernel32.dll instead.
             | 
             | The actual functionality that glibc broke in this specific
             | ABI break is really the sort of functionality that is
             | provided by kernel32.dll (i.e., that's where the SEH
             | functions live) on Windows. And you can't provide multiple
             | copies of kernel32.dll on Windows, just like you can't have
             | multiple copies of glibc.
        
               | Arnavion wrote:
               | setjmp, longjmp, jmp_buf are defined in the CRT, not in
               | kernel32. Of course their implementations will eventually
               | depend on kernel32, but that's not relevant to the
               | callers of setjmp/longjmp.
        
               | jcranmer wrote:
               | If glibc were set up to have a kernel32/CRT divide like
               | Windows does, I suspect the ABI breakage would have
               | impacted the kernel32 side of the divide, given that the
               | rationale was for better support of hardware stuff in
               | s390. Of course, in Windows, setjmp/longjmp/jmp_buf being
               | the C way of doing things, the implementations live in
               | CRT-land, but the underlying implementation (being
               | essentially SEH/unwind) actually lives in kernel32.
        
               | Arnavion wrote:
               | Sure? My point is, even if it impacted the CRT side of
               | the divide, it would likely not have been a problem,
               | because it would be a new version of the CRT with the
               | breaking API and would not have affected existing code
               | linking to the old CRT. And because libraries are
               | generally more hygienic about not leaking CRT details
               | across their own API boundaries, it would not have been a
               | problem for old and new CRTs to be mixed in the same
               | process.
        
         | rwmj wrote:
         | I remember using Linux with libc.so.4 & .5 (which were not
         | based on glibc code). libc 4 used a pre-ELF scheme called
         | a.out, and libc 5 used ELF. When Linux distros started to adopt
         | GNU libc 2.x, the soname was 6 (to be obviously higher than
         | previous versions) and they promised the new ELF symbol
         | versioning scheme which meant that the version number would
         | never have to change again. So here we are.
        
           | yjftsjthsd-h wrote:
           | > I remember using Linux with libc.so.4 & .5 (which were not
           | based on glibc code).
           | 
           | What libc implementation was that, then? I thought glibc was
           | the first to support Linux.
        
             | asveikau wrote:
             | This hasn't been a thing since the late 90s. Linux used to
             | use a fork of an earlier, pre-linux gnu libc. Glibc2 was a
             | big release where upstream gnu libc was what Linux distros
             | used for the first time.
             | 
             | Despite being version 2.0, they set the so version to 6
             | because the Linux specific fork had reached version 5. That
             | is why libc6 is synonymous with glibc2.
             | 
             | Here is what I found on Google trying to confirm these
             | memories: https://lwn.net/Articles/417848/
             | 
             | There were other examples from the gnu world where an
             | experimental fork ends up with rapid development and then
             | gets superseded by an upstream release. Egcs vs gcc is
             | another I remember from that time period.
        
             | rwmj wrote:
             | So apparently Linux libc (ie. soname 4 & 5) was a fork of
             | glibc 1.0: https://man.archlinux.org/man/glibc.7#Linux_libc
        
           | marcodiego wrote:
           | I've been using linux since late 90's and the only related
           | thing I remember was to have to install an "old" libc to be
           | able to run kylix. Do you have more info on this event? Seems
           | to have completely vanished from my memory.
        
             | rwmj wrote:
             | I guess you may not remember the a.out to ELF transition,
             | which was a "Big Bang" event. Before ELF, shared libraries
             | did not use position independent code and so load addresses
             | had to be globally coordinated (with extra space left for a
             | library to grow in size!). That was libc.so.4 (old libc,
             | a.out) -> libc.so.5 (old libc, ELF) and Linux 1.2. I don't
             | recall if there were libc < 4. Google says this happened in
             | around 1995. The libc.so.5 (old libc, ELF) to libc.so.6
             | (glibc 2.0) transition happened a little later in 1997.
        
               | marcodiego wrote:
               | I remember the a.out to elf transition because the first
               | distro I used had Doom in its repos. When I installed it
               | complained that a.out format wasn't supported anymore.
               | 
               | IIRC I tried kylix in 1999. It still requiring libc5 is
               | strange. Well... those were not the best Inprise/Borland days
               | indeed.
        
       ___________________________________________________________________
       (page generated 2022-03-17 23:01 UTC)