[HN Gopher] Stdio(3) change: FILE is now opaque (OpenBSD)
___________________________________________________________________
Stdio(3) change: FILE is now opaque (OpenBSD)
Author : gslin
Score : 80 points
Date : 2025-07-20 18:18 UTC (4 hours ago)
(HTM) web link (undeadly.org)
(TXT) w3m dump (undeadly.org)
| abnercoimbre wrote:
| Can someone elaborate? I always treated FILE as opaque, but I never
| imagined people would poke into it.
| pjmlp wrote:
| People use reflection for monkey patching and complain when
| using compiled languages less supportive of such approaches.
|
| So it wouldn't surprise me if a few folks played some tricks
| with FILE internals.
| recipe19 wrote:
| The standard doesn't specify any serviceable parts, and I don't
| think there are any internals of the struct defined in musl
| libc on Linux (glibc may be a different story). However, on
| OpenBSD, it did seem to have some user-visible bits:
|
| https://github.com/openbsd/src/commit/b7f6c2eb760a2da367dd51...
|
| If you expose it, someone will probably sooner or later use it,
| but probably not in any sane / portable code. On the face of
| it, it doesn't seem like a consequential change, but maybe
| they're mopping up after some vulnerability in that one weird
| package that did touch this.
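For context, "opaque" here means the public header only declares the type, and its fields are defined inside the library, so callers can hold pointers and call functions but cannot name fields or take `sizeof`. The sketch below illustrates the general pattern with a hypothetical `toy_stream` type; it is not OpenBSD's actual code. In a real library the two halves would live in separate files.

```c
#include <stdlib.h>
#include <string.h>

/* ---- "public header" half: callers only see an incomplete type ---- */
typedef struct toy_stream toy_stream;   /* layout deliberately hidden */
toy_stream *toy_stream_open(const char *data);
int toy_stream_getc(toy_stream *s);     /* accessors must be real functions */
void toy_stream_close(toy_stream *s);

/* ---- "implementation" half: only the library sees the layout ---- */
struct toy_stream {
    const char *buf;  /* data being read */
    size_t pos;       /* current offset into buf */
    size_t len;       /* total length of buf */
};

toy_stream *toy_stream_open(const char *data) {
    toy_stream *s = malloc(sizeof *s);  /* only the library knows the size */
    if (s == NULL) return NULL;
    s->buf = data;
    s->pos = 0;
    s->len = strlen(data);
    return s;
}

int toy_stream_getc(toy_stream *s) {
    if (s->pos >= s->len) return -1;    /* end of data, like EOF */
    return (unsigned char)s->buf[s->pos++];
}

void toy_stream_close(toy_stream *s) { free(s); }
```

Because the fields are private, the library can reorder or resize them without recompiling callers; the cost is that even trivial accessors become function calls.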
| fweimer wrote:
| In gnulib, there is code that patches FILE internals for
| various platforms to modify behavior of <stdio.h> functions, or
| implement new functionality.
|
| https://cgit.git.savannah.gnu.org/cgit/gnulib.git/tree/lib/s...
|
| Yes, it's not a good idea to do this. There are more
| questionable pieces in gnulib, like closing stdin/stdout/stderr
| (because fflush and fsync are deemed too slow, and regular close
| reports some errors on NFS on some systems that would otherwise
| go unreported).
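The reason closing matters is that with buffered stdio a write error (a full disk, an NFS failure) may only surface when the buffer is flushed at close time. A minimal sketch of checking for it, using a hypothetical `close_stream_checked` helper (this is not gnulib's actual code or API):

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: flush and close a stream, reporting any deferred
   write error instead of silently dropping it. */
int close_stream_checked(FILE *fp, const char *name) {
    int had_error = ferror(fp);           /* did an earlier write fail? */
    int close_failed = fclose(fp) != 0;   /* final flush + close may fail now */
    if (had_error || close_failed) {
        fprintf(stderr, "error writing %s: %s\n", name,
                close_failed ? strerror(errno) : "stream error");
        return -1;
    }
    return 0;
}
```

A program might call something like this on stdout from an `atexit` handler so that a failed final flush turns into a nonzero exit status.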
| collinfunk wrote:
| Yes, that part of Gnulib has caused some problems previously.
| It is mostly used to implement <stdio_ext.h> functions on
| non-glibc systems. However, it is also needed for some buggy
| implementations of ftello, fseeko, and fflush.
|
| P.S. Hi Florian :)
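On glibc (and musl), `<stdio_ext.h>` already exists, so code can ask libc about a stream's buffer state instead of peeking at FILE fields; the gnulib replacement code mentioned above exists for systems that lack these functions. A small sketch using `__fwriting` and `__fpending` from that header. The `pending_output` wrapper is a made-up name, and the header is not part of ISO C or POSIX.

```c
#include <stdio.h>
#include <stdio_ext.h>  /* glibc/musl extension header, not ISO C */

/* Number of bytes sitting in fp's output buffer, obtained through libc
   rather than by reading FILE internals directly. */
size_t pending_output(FILE *fp) {
    return __fwriting(fp) ? __fpending(fp) : 0;
}
```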
| quotemstr wrote:
| > Yes, it's not a good idea to do this. There are more
| questionable pieces in gnulib, like closing
| stdin/stdout/stderr (because fflush and fsync are deemed too
| slow, and regular close reports some errors on NFS on some
| systems that would otherwise go unreported).
|
| Hyrum's law strikes again. People cast dl_info and poke at
| internal bits all the time too.
|
| glibc and others should be using kernel-style compiler-driven
| struct layout randomization to fight it.
| jancsika wrote:
| > Hyrum's law strikes again.
|
| Is there a name for APIs that are drawn directly from some
| subset of observed behaviors?
|
| Like Crockford going, "Hey, there's a nice little data
| format buried in these JS objects. _Schloink_ "
| ksherlock wrote:
| *BSD stdio.h used to include macro versions of some stdio
| functions (feof, ferror, clearerr, fileno, getc, putc) so they
| would be inlined:
|
|     /*
|      * This has been tuned to generate reasonable code on the vax
|      * using pcc.
|      */
| pm215 wrote:
| The MH and nmh mail clients used to directly look into FILE
| internals. If you look for LINUX_STDIO in this old version of
| the relevant file you can see the kind of ugliness that
| resulted:
|
| https://cgit.git.savannah.gnu.org/cgit/nmh.git/tree/sbr/m_ge...
|
| It's basically searching an email file to find the contents of
| either a given header or the mail body. These days there is no
| need to go under the hood of libc for this (and this code got
| ripped out over a decade ago), but back when the mail client
| was running on elderly VAXen this ate up significant time.
| Sneaking in and reading directly from the internal stdio buffer
| lets you avoid copying all the data the way an fread would. The
| same function also used to have a bit of inline vax assembly
| for string searching...
|
| The only reason this "works" is that traditionally the FILE
| struct is declared in a public header so libc can implement some
| of its own functions as macros for speed, and that when this hack
| was originally added in the 1980s there was not yet much
| divergence in libc implementations.
| loeg wrote:
| Historically, some FILE designs exposed the structure somewhere
| so that some of the f* functions could be implemented as macros
| or inline functions (e.g., `fileno()`).
| bitwize wrote:
| Hyrum's Law applies: the API of any software component is the
| entire exposed surface, not just what you've documented. Hence,
| if you have FILE well-defined somewhere in a programmer-
| accessible header, somebody somewhere _can_ and _will_ poke at
| the internal bits in order to achieve some hack or
| optimization.
| krylon wrote:
| On one hand, yes.
|
| On the other hand, when coding, I consider FILE to be
| effectively opaque, in the sense that its layout is probably
| not portable and the implementers might change it at any time.
|
| I am reminded of this fine article by Raymond Chen, which
| covers a similar situation on Windows way back when:
| https://devblogs.microsoft.com/oldnewthing/20031015-00/?p=42...
| brokencode wrote:
| Yes, it would not be sane to depend on implementation
| details of something like this.
|
| But the sad reality is that many developers (myself
| included earlier in my career) will do insane things to fix
| a critical bug or performance problem when faced with a
| tight deadline.
| zahlman wrote:
| I always assumed that people could poke into it, but shuddered
| at the thought.
| asveikau wrote:
| I've seen old code do this over the years. Consider, for
| example, that snprintf() wasn't standardized until the late
| 1990s. People would mock up a fake FILE* and use fprintf.
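Today neither trick needs a hand-built FILE: `snprintf` covers the common case, and POSIX.1-2008 `fmemopen()` returns a genuine FILE* backed by a caller's buffer, so existing `fprintf`-based code can target memory. A minimal sketch; `format_into` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Format into buf through a real FILE*: the portable stand-in for the old
   "fake FILE" trick. Returns 0 on success. */
int format_into(char *buf, size_t size, int n) {
    FILE *fp = fmemopen(buf, size, "w");
    if (fp == NULL) return -1;
    fprintf(fp, "n=%d", n);
    return fclose(fp);  /* flushing writes a terminating NUL if there is room */
}
```

For a one-off string, `snprintf(buf, size, "n=%d", n)` gives the same result more simply.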
| p0w3n3d wrote:
| But who should really rely on the internals of FILE? Isn't that
| bad practice?
| vitaut wrote:
| In general, it is bad practice. However, it can be useful for
| some low-level libraries. For example,
| https://github.com/fmtlib/fmt provides a type-safe replacement
| for `printf` that can write directly to the FILE buffer, with
| performance comparable to or better than native stdio.
| Retr0id wrote:
| Doesn't fwrite more or less write directly to the FILE
| buffer, if buffering is enabled?
|
| I'm curious to take a closer look at fmtlib/fmt, which APIs
| treat FILE as non-opaque?
|
| Edit: ah, found some of the magic, I think:
| https://github.com/fmtlib/fmt/blob/35dcc58263d6b55419a5932bd...
|
| I'm curious how much speedup is gained from this.
| vitaut wrote:
| With fwrite, that would be another level of buffering in
| addition to FILE's buffer. If you are interested in what
| {fmt} is doing, a good starting point is
| https://github.com/fmtlib/fmt/blob/35dcc58263d6b55419a5932bd....
| It is also possible to bypass stdio completely and get even
| faster output
| (https://vitaut.net/posts/2020/optimal-file-buffer-size/), and
| while it is great for files, it may introduce interleaving
| problems with things like stdout.
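To illustrate what "bypass stdio completely" can mean, here is a minimal sketch of a private output buffer flushed with POSIX `write(2)`. The `outbuf` type, its functions, and the 4096-byte size are hypothetical choices for illustration, not what {fmt} actually does; there is no locking and no retry on `EINTR`.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical private buffer that never touches FILE. */
struct outbuf {
    int fd;           /* destination file descriptor */
    size_t len;       /* bytes currently buffered */
    char data[4096];  /* the buffer itself */
};

/* Write out everything buffered, looping on short writes. */
int outbuf_flush(struct outbuf *b) {
    size_t off = 0;
    while (off < b->len) {
        ssize_t n = write(b->fd, b->data + off, b->len - off);
        if (n < 0) return -1;  /* no EINTR retry in this sketch */
        off += (size_t)n;
    }
    b->len = 0;
    return 0;
}

/* Append n bytes, flushing whenever the buffer fills up. */
int outbuf_put(struct outbuf *b, const char *s, size_t n) {
    while (n > 0) {
        if (b->len == sizeof b->data && outbuf_flush(b) != 0) return -1;
        size_t room = sizeof b->data - b->len;
        size_t chunk = n < room ? n : room;
        memcpy(b->data + b->len, s, chunk);
        b->len += chunk;
        s += chunk;
        n -= chunk;
    }
    return 0;
}
```

If the same descriptor is also written through stdout, bytes held in stdio's buffer and in this one can reach the file in either order unless both are flushed at well-defined points. That is the interleaving problem described in the comment.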
| cryptonector wrote:
| In SunOS 4.x, `FILE` was not opaque, `int fileno(FILE *)` was
| a macro rather than a function, and the struct field that held
| the fd number was a `char`. Yeah, that sucked for ages,
| especially since it bled into the Solaris 2.x 32-bit ABI.
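Why a one-byte fd field hurts: any descriptor above 255 silently loses its high bits when stored. A sketch with a hypothetical `old_file` struct. It uses `unsigned char` so the wraparound is well defined; with a plain `char` field on platforms where `char` is signed, values above 127 would come back wrong as well.

```c
/* Hypothetical layout that stores the descriptor in a single byte. */
struct old_file {
    unsigned char fd;
};

/* Store fd in the struct and read it back. */
int store_and_reload_fd(int fd) {
    struct old_file f;
    f.fd = (unsigned char)fd;  /* keeps only the low 8 bits */
    return f.fd;
}
```

With 8-bit bytes, descriptor 19 round-trips intact, while 256 comes back as 0 and 300 as 44.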
| bodyfour wrote:
| Indeed, that was the way it originally worked in all UNIXes:
| https://github.com/dspinellis/unix-history-repo/blob/Researc...
|
| It was a then-important optimization to do the most common
| operations with macros since calling a function for every
| getc()/putc() would have slowed I/O down too much.
|
| That's why there is also fgetc()/fputc() -- they're the same
| as getc()/putc() but they're always defined as functions so
| calling them generated less code at the callsite at the
| expense of always requiring a function call. A classic speed-
| vs-space tradeoff.
|
| But, yeah, it was a mistake that it originally used a "char"
| to store the file descriptor. Back then it was typical to
| limit processes to 20 open files
| (https://github.com/dspinellis/unix-history-repo/blob/Researc...),
| so I'm sure a "char" felt like plenty.
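The macro-versus-function split described above can be sketched as follows. `toy_file` is a hypothetical two-field struct, far simpler than the real Research UNIX or BSD layouts, and the refill step is omitted. The point is that the macro needs the struct fields, which is why the layout had to live in a public header.

```c
#include <stdio.h>  /* for EOF */

/* Hypothetical, heavily simplified exposed-FILE layout. */
struct toy_file {
    int cnt;                   /* bytes left in the buffer */
    const unsigned char *ptr;  /* next byte to hand out */
};

/* Function version (like fgetc): small call site, always a call. */
int toy_fgetc(struct toy_file *f) {
    if (f->cnt <= 0) return EOF;  /* a real libc would refill the buffer here */
    f->cnt--;
    return *f->ptr++;
}

/* Macro version (like the old getc): the fast path is expanded inline at
   every call site and only calls the function when the buffer is empty.
   Note that it evaluates f more than once, as getc is permitted to. */
#define toy_getc(f) \
    ((f)->cnt > 0 ? ((f)->cnt--, (int)*(f)->ptr++) : toy_fgetc(f))
```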
| somat wrote:
| To misquote the Street Fighter movie, OpenBSD to Linux:
|
| "For you the day you changed your ABI was the most important day
| in your life, but for me? It was Tuesday"
|
| I enjoy the contrast between how bad the Linux project is at
| changing its ABI and how good OpenBSD is at the same task.
|
| For the most part, Linux just decides to live with a bad ABI
| forever, and if they do decide it actually needs to be changed,
| it is a multi-year drama with much crying and many missteps.
|
| I mean, sure, Linux has additional considerations that make
| breaking the ABI very scary for them. The big one is the corpus
| of closed-source software, but being an orders-of-magnitude
| bigger project with looser overall integration does not help
| either.
| viraptor wrote:
| This has nothing to do with Linux-the-project. An equivalent
| change would be in glibc / musl / ...
| ioasuncvinvaer wrote:
| I think the difference is just the amount of people using the
| technology.
| loeg wrote:
| I think FreeBSD tried to make FILE opaque [1], but the change
| was reverted [2], and FILE is still non-opaque in main [3].
|
| [1]: https://github.com/freebsd/freebsd-src/commit/c17bf9a9a5a3b5...
|
| [2]: https://github.com/freebsd/freebsd-src/commit/19e03ca8038019...
|
| [3]: https://github.com/freebsd/freebsd-src/blob/main/include/std...
| asveikau wrote:
| OpenBSD tends to commit to breaking changes much more
| aggressively than others. Something tells me they're not
| reverting.
| loeg wrote:
| I think FreeBSD is also more concerned with performance
| regression than OpenBSD is.
| notepad0x90 wrote:
| I don't know if I agree, but this is one shining example of what
| makes the *BSDs great: not being afraid of change. Linux should
| take note. So many of Windows' headaches stem from not wanting
| to break things and needing to support old client code.
| justincormack wrote:
| There isn't really much "Linux" here: the equivalent code is in
| libc, i.e. glibc, which was built for portability and isn't very
| Linux-specific. Linux doesn't have an all-encompassing community
| for userspace.
| cperciva wrote:
| In addition to "some code frobs internals", non-opaque FILE also
| allows for compatibility with code which puts FILE into a
| structure, since an opaque FILE doesn't have a size.
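Concretely, code that embedded a FILE by value in its own struct stops compiling once the type is incomplete, and has to hold a pointer instead. A sketch with a hypothetical `opaque_stream` type standing in for an opaque FILE:

```c
#include <stdlib.h>

/* Hypothetical opaque type: declared, never defined for callers. */
typedef struct opaque_stream opaque_stream;

struct logger {
    /* opaque_stream s;  <- would not compile: incomplete type, unknown size */
    opaque_stream *s;    /* a pointer is fine: its size is always known */
    int level;
};

struct logger *logger_new(opaque_stream *s, int level) {
    struct logger *lg = malloc(sizeof *lg);  /* sizeof works on the wrapper */
    if (lg != NULL) {
        lg->s = s;
        lg->level = level;
    }
    return lg;
}
```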
| quotemstr wrote:
| CHERI would defend against access to internal data structures
| without having to bounce between address spaces, FWIW.
| mcculley wrote:
| Please elaborate.
| JdeBP wrote:
| If you've ever done this to a C library, the first thing that
| you'll look at when someone else does it is not the FILE type,
| but how stdin, stdout, and stderr have changed.
|
| The big breaking change is usually the historical implementation
| of the standard streams as addresses of elements of an array
| rather than as named pointers. (Plauger's example implementation
| had them as elements 0, 1, and 2 of a _Files[] array, for
| example.) It's possible to retain binary compatibility with
| unrecompiled code that uses the old
| getc/putc/feof/ferror/clearerr/&c. macros by preserving
| structure layouts, but changing stdin, stdout, and stderr can
| make things fail to link.
|
| And indeed that has happened here.
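The two styles of defining the standard streams can be contrasted in a small sketch. All names here are hypothetical toys; the array style mirrors the Plauger-style `_Files[]` approach described in the comment. In the array style, compiled callers reference the array symbol and bake element offsets (and therefore the struct's size) into their code. In the pointer style, they reference one symbol per stream and load its value at run time. Switching from the first to the second changes which symbols old binaries need, which is how they can fail to link.

```c
/* Hypothetical stream type for illustration only. */
struct toy_file { int fd; };

/* Style 1: standard streams are addresses of elements of one array. */
struct toy_file toy_files[3] = { {0}, {1}, {2} };
#define toy_stdin  (&toy_files[0])
#define toy_stdout (&toy_files[1])

/* Style 2: standard streams are named pointer variables. */
struct toy_file toy_out_obj = { 1 };
struct toy_file *toy_stdout_ptr = &toy_out_obj;
```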
___________________________________________________________________
(page generated 2025-07-20 23:00 UTC)