[HN Gopher] The glibc s390 ABI break (2014)
___________________________________________________________________
The glibc s390 ABI break (2014)
Author : rdpintqogeogsaa
Score : 64 points
Date : 2022-03-17 15:27 UTC (7 hours ago)
| somat wrote:
| I always feel the Linux people learn the wrong lesson from these
| events.
|
| The difference between the Linux and the OpenBSD mentality in one
| example.
|
| The ABI must be changed.
|
| Linux: That was really hard, we are never going to do that again.
|
| OpenBSD: That was really hard, we better get good at it.
|
| My opinion: When your deliverable is made up of source code (like
| these _open source_ projects are), the ABI and ABI stability is
| not that important, it is the API (the source interface) that is
| critical.
| ggreg84 wrote:
| If you can re-compile all your source code, then ABI stability
| is not that important.
|
| In the real world,
|
| - people use operating systems to get work done,
|
| - such work is often done with proprietary software that people
| buy, and that a lot of people get paid to develop
|
| - breaking such software is not acceptable, because it means
| that suddenly tens of thousands of people can't work
|
| - telling all those people to "stop working" until their
| software is fixed is not acceptable either.
|
| People use Linux for work in the real world, and that's why
| Linux tries very hard not to break ABIs.
|
| Microsoft has had a stable ABI for 30 years, and people still
| run software compiled 30 years ago on the latest Windows
| version today.
|
| What this tells you about OpenBSD users, is another story.
| moonbug wrote:
| no one uses Debian on s/390 for anything important anyway
| ece wrote:
| The source-based Linux distros (Gentoo in my experience) do
| try to get better at keeping things working and moving
| forward when ABI breaks occur. Recompiling downstream
| dependencies (FFIs, libraries/packages), keeping the old
| library/packages around until new ones are available, and
| keeping track of breaks/automating fixes with versioning all
| happen when, for example, a new Boost or Qt version comes out.
| All of this is automated by the package manager. Proprietary
| packages that require fixed ABIs from certain libraries have
| *-compat packages in the gentoo repos, and other distros have
| this too.
|
| Ultimately, it is about the APIs when you're compiling from
| source: a package that isn't using a new API is going to need
| the old library, and both might need to be compiled with and
| linked against the same toolchain. I think Linux and BSDs are
| closer than one might think here. Packaging and upstreams
| have both gotten better here over time I think, at least over
| the last couple of decades I've used Linux. I've only played
| around with the BSDs in VMs.
| alerighi wrote:
| And even if you can recompile everything, it's still a pain
| to do so, and it's better to avoid doing so. Also, recompiling
| the software may not be that simple: for example, it can
| require modifications to its source code to adapt it to newer
| versions of the compiler or other libraries that changed, and
| of course you can't build it on an older system since the ABI
| changed.
|
| This is also the reason why containers are so popular these
| days: you ship software with all its dependencies to avoid
| having to recompile stuff each time you upgrade or change the
| operating system.
| leeter wrote:
| This forgets MS:
|
| MS: "Let's design it so either it never breaks or if we can't
| avoid that we'll just add a FooEx or Foo2 method call"
|
| This is why so many Win32 methods take structs that have a size
| parameter that must be filled in. That acts as a version
| parameter. That's not to say they haven't had ABI fun
| moments... _glares at COM_.
| secondcoming wrote:
| One flaw with that is that the struct size won't change if
| you do something like reordering the members (not that anyone
| would ever do that).
| leeter wrote:
| Which they explicitly prohibit. In fact, if you look at how
| they design the APIs, they'll deliberately add reserved
| members to ensure they hold the space, so that it's much
| harder to break.
| asveikau wrote:
| Yes, it takes some discipline to keep that working. You
| need to internalize which changes cause ABI breaks, look
| for them in code reviews, evangelize that culture to new
| hires that aren't familiar with the rules (which becomes
| harder over time as people coming up now tend to spend less
| time learning C), etc.
| Someone wrote:
| Example: https://gankra.github.io/blah/c-isnt-a-
| language/#case-study-...
| leeter wrote:
| NGL... that inspired my comment XD
| stouset wrote:
| While Microsoft's backwards-compatibility guarantees have
| historically been a strong competitive advantage for them...
| every time I get a glance into Windows APIs I'm blown away by
| how much historical garbage they seem to be stuck with.
|
| I get this feeling every time I read a post on The Old New
| Thing (Raymond Chen). He'll patiently explain why some weird
| wart is the way it is, or how misusing some API is bad, and
| it's always extremely interesting. But at the same time I'm
| sitting there just thanking my lucky stars I've managed to
| avoid ever having to write software for that environment,
| because it's always absolutely bonkers.
| _3u10 wrote:
| It's so much nicer than Linux.
|
| Whatever version of windows you make your software work on
| it just works on all future versions.
|
| If you don't like the old APIs don't use them.
|
| Being able to put software on a website and have people
| download and use it immediately is kind of amazing compared
| to Linux.
|
| COM is very annoying tho. I'll give you that.
| leeter wrote:
| Honestly? Having written software for both, Win32 is
| actually pretty nice to work in. Better in many ways than
| the alternatives. Most of the things Raymond goes on about
| are weird quirks you can ignore for the most part. Whereas
| there are some very serious hazards running around POSIX if
| you're not careful that can bite hard. _Glares at signals_
|
| I much prefer the EVENT based system Win32 goes with. You
| can just suspend on the event and not worry about it. The
| kernel will wake you up when it matters. Also MS seems to
| have very serious API design rules that help keep usage
| patterns consistent, which is important in making sure
| things aren't accidentally misused.
| hnlmorg wrote:
| While I much prefer Linux as a platform, it's fair to say
| Linux has more than its fair share of historical baggage
| as well. The entire design of the console for starters:
| TTYs, ANSI escape sequences used for encoding data in band,
| typeless byte streams, job control being a weird cross
| responsibility between the kernel, shell and application,
| etc.
|
| ...And that's without addressing any other parts of the
| system beyond the console.
|
| I'd still take Linux over Windows every day of the week
| though.
| Asooka wrote:
| Also "make everyone carry their dependencies down to the
| standard C library". The churn that the userland on Linux
| distros has is very unfortunate, but it's probably not going
| to change any time soon. It's not _that_ hard to ship
| software with minimal dependencies if you know what you're
| doing, but it sure is easier for Windows.
| ithkuil wrote:
| AmigaOS2.0 had an interesting API: tag arrays: a variable
| length key/value pair of parameters
| orra wrote:
| > This is why so many Win32 methods take structs that have a
| size parameter that must be filled in. That acts as a version
| parameter
|
| That said, I couldn't name you a Win32 struct that actually has
| more than one valid value for said cbSize member. Are there any?
| MaulingMonkey wrote:
| um/winuser.h has some, although they haven't changed in
| a while. WNDCLASSEX gained hIconSm after Windows 3.x.
| MENUITEMINFO gained hbmpItem in Windows 2000.
| NONCLIENTMETRICS gained iPaddedBorderWidth in Windows
| Vista.
|
| https://devblogs.microsoft.com/oldnewthing/20031212-00/?p=4
| 1...
|
| um/webauthn.h has a bunch of structures that explicitly
| document fields added in various dwVersion values. However, no
| individual structure has changed since being published in
| the Windows SDK last I checked. e.g.:
|
|     //
|     // The following fields have been added in
|     // WEBAUTHN_AUTHENTICATOR_MAKE_CREDENTIAL_OPTIONS_VERSION_2
|     //
|
|     // Cancellation Id - Optional - See WebAuthNGetCancellationId
|     GUID *pCancellationId;
|
| um/wincrypt.h also has a few structures chopped up with
| #ifdef ..._HAS_EXTRA_FIELDS blocks that are presumably
| differentiated via cbSize.
|
| um/ShlObj_core.h has COMPONENT, which extends and is
| differentiated from IE4COMPONENT - presumably by dwSize.
|
| It also wouldn't surprise me if cbSize is used to
| differentiate between 32-bit and 64-bit versions of the
| same structure as well, with WoW64 blindly forwarding said
| structures.
| pitterpatter wrote:
| It's hard to notice sometimes because a struct FOO is
| often a typedef for whatever the latest version is in the
| SDK you're using (FOO1, FOO2, etc).
| jborean93 wrote:
| A good example of a really important one is the lpStartupInfo
| parameter of CreateProcess. The value is either a STARTUPINFO
| or a STARTUPINFOEX, and the first member of both structs is cb,
| which is set to the size of the struct that is being used.
| This allows the code to understand what struct is actually
| used in the call.
| coldpie wrote:
| Sure, you can find a fair number with a grep of the Wine
| source. Try "git grep if.*cbSize", there's a few obvious
| ones and some less obvious, too.
|
| One example: https://source.winehq.org/git/wine.git/blob/62
| df608d3ed84aac...
| idealmedtech wrote:
| The issue, based on my read, is that lots of existing
| applications, many of which are proprietary and may not be
| supported by their vendors, _depend_ on ABI stability, and a
| break causes these applications to simply fail, for what appears
| to be no good reason (how many businesses care that their legacy
| app broke because of supporting a niche CPU architecture?).
| That's the big issue here, and the historical reason for
| caution.
| rwmj wrote:
| This specific issue affected s390x which is pretty niche but
| the general principle of not breaking ABI affects all
| architectures.
| matheusmoreira wrote:
| > My opinion: When your deliverable is made up of source
| code (like these open source projects are), the ABI and ABI
| stability is not that important, it is the API (the source
| interface) that is critical.
|
| It is absolutely important. ABI instability is the number one
| reason why packaging software on Linux is difficult. Breaking
| ABI causes pain even for maintainers of free software projects
| and packages. The Linux kernel is the only project in a Linux
| distribution that seems to take it seriously.
| marcodiego wrote:
| > My opinion: When your deliverable is made up of source
| code (like these open source projects are), the ABI and ABI
| stability is not that important, it is the API (the source
| interface) that is critical.
|
| I think ISVs disagree.
| AnssiH wrote:
| FWIW, this ABI break was reverted 2 weeks after the LWN article
| was released:
|
|     commit 2f438e20ab591641760e97458d5d1569942eced5
|     Author: Stefan Liebler <___.ibm.com>
|     Date:   Thu Jul 31 20:04:54 2014 +0200
|
|         S/390: Revert the jmp_buf/ucontext_t ABI change.
| schemescape wrote:
| > Debian's developers ... considered rebuilding all of Perl and
| then, perhaps, all (500 or so) packages depending on the PNG
| library
|
| libpng using setjmp.h for error handling was always my least
| favorite part of libpng, especially since the libpng
| documentation indicates it was just done for convenience (tell
| that to the authors of the OP!):
|
| > The motivation behind using setjmp() and longjmp() is the C++
| throw and catch exception handling methods. This makes the code
| much easier to write, as there is no need to check every return
| code of every function call.
| kragen wrote:
| It's not just convenience for the authors of libpng; it's also
| convenience for libpng's users, because it means they can
| handle errors at a single location instead of checking the
| return code of every function call.
|
| It is true that changing the size of jmp_buf breaks binary
| compatibility, but that's hardly a unique fatal flaw in
| setjmp/longjmp; it's also true of fd_set, struct timeval,
| struct tm, struct sockaddr, struct stat, and all the other
| exposed memory-layout interfaces mentioned in the article:
| __pthread_unwind_buf_t, PerlInterpreter, png_struct_def, etc.
| Indeed, you'd think it would be much less of a problem for
| jmp_buf, because normally the things stored in a jmp_buf are
| precisely the callee-saved registers in your ABI; changing that
| involves comprehensively breaking your ABI anyway.
|
| There is a big advantage in C to exposing the memory layout of
| a struct in this way: it doesn't need to be heap-allocated, so
| it doesn't introduce a dependency on heap allocation (which
| wasn't part of the standard library at all when setjmp was
| defined, and is still forbidden in many contexts), and you can
| statically bound your program's memory use, so you can be sure
| it won't fail. Heap allocation can always fail, so you can only
| ever use it in programs where failure is an option. You don't
| want your antilock braking system to raise an exception and
| reboot because its heap has become fragmented.
| mrlonglong wrote:
| I felt the pain transitioning from libc5 to glibc 2 a long long
| time ago. Glad I never had to do that again.
| jancsika wrote:
| > libc.so.6.1
|
| My brain hurts-- if libc.so.6.1 is a nightmare, then what is the
| utility of having the libc.so.6 soname numbering at all?
|
| I'll put it in a more effective wrong-thing-on-the-internet style
| for receiving responses: There's no point in using NixOS. Just
| use lib.so.versionNumber. (Sorry) :)
| jcranmer wrote:
| The problem with sonames for libc is that libc is too
| foundational of a library to really have multiple incompatible
| versions running around [1]. There is no way for an application
| to have multiple versions of libc in its memory space. When you
| live in a modern world where applications will pull in multiple
| third-party packages that all depend on core packages, those
| core packages better be extremely good about backwards
| compatibility or there is immense pain on adopting new versions
| (see also Python 2->3 transition).
|
| [1] Or, perhaps more accurately, an incompatible version of
| libc requires the creation of an entire distinct target triple.
| You can have multiple incompatible versions if you've got a
| multiarch setup of some kind (like 32-bit and 64-bit x86 code
| on the same system), but an application can't simultaneously
| use both in the same process in such systems still.
| jancsika wrote:
| In keeping with my confidently-wrong-on-the-internet theme:
|
| Nobody is trying to run multiple incompatible versions of
| glibc in a single application. The point is to run multiple
| different versions of the same application on a single OS,
| where each version uses a different version of glibc.
|
| NixOS is wildly over-engineered for this purpose. Just use
| lib.so.versionNumber, full stop.
|
| And, scene. :)
| jcelerier wrote:
| > Nobody is trying to run multiple incompatible versions of
| glibc in a single application
|
| on windows multiple incompatible versions of the C runtime
| in the same application are mostly fine as far as I know
| and fairly useful, one can load a DLL built in 2003 in a
| program built in 2022.
| dfox wrote:
| For Win32-style code, multiple different versions of
| msvcrt in the same process are mostly fine, except when some
| of the DLLs involved are built without an SxS manifest
| (it will probably still work, but produces a modal
| ShowMessage saying that you should not do that).
|
| The issue then arises when porting stuff from the Unix world,
| which expects that you can pass things like FILE* between
| different modules or that you can return a malloc()'d thing
| that can be free()'d by the caller.
|
| Edit: another thing is that on Windows there are CRT
| implementations that do not implement C++ exceptions and
| setjmp/longjmp in terms of SEH, which creates an additional
| dimension of hard-to-debug ABI incompatibilities (and
| sidestepping this issue involves being sure that an SEH
| unwind will not happen through SEH-unaware code, which is
| essentially impossible to do unless you handle that as a
| fatal error).
| jancsika wrote:
| What's the use case, loading plugins?
| jcelerier wrote:
| yes, it's pretty common in audio and I guess in other
| creative fields.
| jancsika wrote:
| It's funny you mention audio. For example, going back to
| the early 2000s I can't think of a single case of a user
| reporting such a need on the Pure Data mailing list. And
| there are a few thousand plugins for it.
|
| I guess Csound would be the grandaddy to check. But I
| have a feeling this would essentially be limited to
| running old proprietary plugins for which no source is
| available?
| detaro wrote:
| With all due respect, neither Pure Data nor Csound is
| exactly representative of audio software with plugins
| overall.
| jcelerier wrote:
| > But I have a feeling this would essentially be limited
| to running old proprietary plugins for which no source is
| available?
|
| Yes, I have some songs from circa 2007-2009 which depend
| on windows 32-bit versions of freeware plug-ins whose
| developers have been MIA for 15 years now. Well, now I
| know better and try to only use software that I can
| recompile or write for my music (including pd :-)). But
| then I also spent 10 years not making much music because
| of that.
|
| I also work with artists who keep old 10.6 Macs around in
| case they'd have to perform one of their past songs.
| Asooka wrote:
| Most often, yes. As a consequence, you'll often see DLLs
| with "createFoo" and "deleteFoo" functions for
| creating and deleting Foo objects. The reason is that you
| can't free a pointer allocated from one DLL in a
| different DLL, since they can use different runtimes and
| thus different heaps. This is one of the sources of
| really fun bugs when writing for Windows :) .
| assbuttbuttass wrote:
| If some library e.g. libpng depends on a particular version
| of glibc, and your application depends on libpng and glibc,
| you better hope they want the same version
| jancsika wrote:
| I'm still not getting it. Doesn't libpng use the
| soname.someVersion, too?
| assbuttbuttass wrote:
| Well, suppose libpng depends on glibc version 6.0, but
| when you try to link your app it will find the latest
| glibc 6.1. Only one of these versions is going to get
| loaded at runtime. The compiler's not going to check
| compatibility between all the different libraries.
| jancsika wrote:
| If the highest number always gets chosen then what's the
| point of libname.so.versionNumber in the first place?
| rlpb wrote:
| The point is for runtime. At build time generally the
| latest available version is used (via a symlink) and then
| that version gets locked in to the resulting binary.
| Arnavion wrote:
| It's partly a Linux-specific problem. Windows programs are
| fine with having multiple copies of MSVCRT loaded into the
| same process. It's often unavoidable; any process that loads
| plugin libraries will have its own CRT plus other CRTs pulled
| in by the plugins in its address space.
|
| Just open up some process in procexp and see how many
| msvcrt##.dll you find loaded.
|
| It means every library has to be careful about providing both
| malloc and free calls in their API. Eg if a library has a
| `Foo *create_foo(void)` it _has_ to provide a `void
| delete_foo(Foo*)`, instead of expecting the user to call
| `free(the_foo)` because that `free` might be from the wrong
| CRT. With Linux libraries I often find that you have a
| `create_foo` but no `delete_foo`, and you're expected to
| just `free` it.
|
| I assume Linux libraries never took this much care
| historically because software was generally open-source, so
| everyone was getting their software from distro repos that
| rebuilt the world to link to a single libc.
| yjftsjthsd-h wrote:
| You should know that HN lets you prefix stars with a
| backslash to escape them, so your text ends up saying, ex.
| foo*bar and bar*baz rather than foo _bar and bar_ baz
| Arnavion wrote:
| I know. I fixed it shortly after I posted the comment.
| jcranmer wrote:
| glibc serves multiple purposes, and these purposes are
| broken up into separate libraries on Windows. One of those
| purposes is to act as the C standard library, which is what
| is provided by msvcrt.dll. But another purpose is to act as
| the library of system services, which is provided by
| kernel32.dll instead.
|
| The actual functionality that glibc broke in this specific
| ABI break is really the sort of functionality that is
| provided by kernel32.dll (i.e., that's where the SEH
| functions live) on Windows. And you can't provide multiple
| copies of kernel32.dll on Windows, just like you can't have
| multiple copies of glibc.
| Arnavion wrote:
| setjmp, longjmp, jmp_buf are defined in the CRT, not in
| kernel32. Of course their implementations will eventually
| depend on kernel32, but that's not relevant to the
| callers of setjmp/longjmp.
| jcranmer wrote:
| If glibc were set up to have a kernel32/CRT divide like
| Windows does, I suspect the ABI breakage would have
| impacted the kernel32 side of the divide, given that the
| rationale was for better support of hardware stuff in
| s390. Of course, in Windows, setjmp/longjmp/jmp_buf being
| the C way of doing things, the implementations live in
| CRT-land, but the underlying implementation (being
| essentially SEH/unwind) actually lives in kernel32.
| Arnavion wrote:
| Sure? My point is, even if it impacted the CRT side of
| the divide, it would likely have not been a problem,
| because it would be a new version of the CRT with the
| breaking API and would not have affected existing code
| linking to the old CRT. And because libraries are
| generally more hygienic about not leaking CRT details
| across their own API boundaries, it would not have been a
| problem for old and new CRTs to be mixed in the same
| process.
| rwmj wrote:
| I remember using Linux with libc.so.4 & .5 (which were not
| based on glibc code). libc 4 used a pre-ELF scheme called
| a.out, and libc 5 used ELF. When Linux distros started to adopt
| GNU libc 2.x, the soname was 6 (to be obviously higher than
| previous versions) and they promised the new ELF symbol
| versioning scheme which meant that the version number would
| never have to change again. So here we are.
| yjftsjthsd-h wrote:
| > I remember using Linux with libc.so.4 & .5 (which were not
| based on glibc code).
|
| What libc implementation was that, then? I thought glibc was
| the first to support Linux.
| asveikau wrote:
| This hasn't been a thing since the late 90s. Linux used to
| use a fork of an earlier, pre-linux gnu libc. Glibc2 was a
| big release where upstream gnu libc was what Linux distros
| used for the first time.
|
| Despite being version 2.0, they set the so version to 6
| because the Linux specific fork had reached version 5. That
| is why libc6 is synonymous with glibc2.
|
| Here is what I found on Google trying to confirm these
| memories: https://lwn.net/Articles/417848/
|
| There were other examples from the gnu world where an
| experimental fork ends up with rapid development and then
| gets superseded by an upstream release. Egcs vs gcc is
| another I remember from that time period.
| rwmj wrote:
| So apparently Linux libc (ie. soname 4 & 5) was a fork of
| glibc 1.0: https://man.archlinux.org/man/glibc.7#Linux_libc
| marcodiego wrote:
| I've been using Linux since the late '90s and the only related
| thing I remember was having to install an "old" libc to be
| able to run Kylix. Do you have more info on this event? Seems
| to have completely vanished from my memory.
| rwmj wrote:
| I guess you may not remember the a.out to ELF transition,
| which was a "Big Bang" event. Before ELF, shared libraries
| did not use position independent code and so load addresses
| had to be globally coordinated (with extra space left for a
| library to grow in size!). That was libc.so.4 (old libc,
| a.out) -> libc.so.5 (old libc, ELF) and Linux 1.2. I don't
| recall if there were libc < 4. Google says this happened in
| around 1995. The libc.so.5 (old libc, ELF) to libc.so.6
| (glibc 2.0) transition happened a little later in 1997.
| marcodiego wrote:
| I remember the a.out to elf transition because the first
| distro I used had Doom in its repos. When I installed it
| complained that a.out format wasn't supported anymore.
|
| IIRC I tried Kylix in 1999. It still requiring libc5 is
| strange. Well... those were not the best Inprise/Borland days
| indeed.
___________________________________________________________________
(page generated 2022-03-17 23:01 UTC)