[HN Gopher] Stdio(3) change: FILE is now opaque (OpenBSD)
       ___________________________________________________________________
        
       Stdio(3) change: FILE is now opaque (OpenBSD)
        
       Author : gslin
       Score  : 80 points
       Date   : 2025-07-20 18:18 UTC (4 hours ago)
        
 (HTM) web link (undeadly.org)
 (TXT) w3m dump (undeadly.org)
        
       | abnercoimbre wrote:
       | Can someone elaborate? I always treated FILE as opaque, but never
       | imagined people could poke into it?
        
         | pjmlp wrote:
         | People use reflection for monkey patching and complain when
         | using compiled languages less supportive of such approaches.
         | 
         | So it wouldn't surprise me, that a few folks would do some
         | tricks with FILE internals.
        
         | recipe19 wrote:
         | The standard doesn't specify any serviceable parts, and I don't
         | think there are any internals of the struct defined in musl
         | libc on Linux (glibc may be a different story). However, on
         | OpenBSD, it did seem to have some user-visible bits:
         | 
         | https://github.com/openbsd/src/commit/b7f6c2eb760a2da367dd51...
         | 
         | If you expose it, someone will probably sooner or later use it,
         | but probably not in any sane / portable code. On the face of
         | it, it doesn't seem like a consequential change, but maybe
         | they're mopping up after some vulnerability in that one weird
         | package that did touch this.
        
         | fweimer wrote:
         | In gnulib, there is code that patches FILE internals for
         | various platforms to modify behavior of <stdio.h> functions, or
         | implement new functionality.
         | 
         | https://cgit.git.savannah.gnu.org/cgit/gnulib.git/tree/lib/s...
         | 
         | Yes, it's not a good idea to do this. There are more
         | questionable pieces in gnulib, like closing stdin/stdout/stderr
         | (because fflush and fsync is deemed too slow, and regular close
         | reports some errors on NFS on some systems that would otherwise
         | go unreported).
        
           | collinfunk wrote:
           | Yes, that part of Gnulib has caused some problems previously.
           | It is mostly used to implement <stdio_ext.h> functions on
           | non-glibc systems. However, it is also needed for some buggy
           | implementations of ftello, fseeko, and fflush.
           | 
           | P.S. Hi Florian :)
        
           | quotemstr wrote:
           | > Yes, it's not a good idea to do this. There are more
           | questionable pieces in gnulib, like closing
           | stdin/stdout/stderr (because fflush and fsync is deemed too
           | slow, and regular close reports some errors on NFS on some
           | systems that would otherwise go unreported).
           | 
           | Hyrum's law strikes again. People cast dl_info and poke at
           | internal bits all the time too.
           | 
           | glibc and others should be using kernel-style compiler-driven
           | struct layout randomization to fight it.
        
             | jancsika wrote:
             | > Hyrum's law strikes again.
             | 
             | Is there a name for APIs that are drawn directly from some
             | subset of observed behaviors?
             | 
             | Like Crockford going, "Hey, there's a nice little data
             | format buried in these JS objects. _Schloink_ "
        
         | ksherlock wrote:
         | *BSD stdio.h used to include macro versions of some stdio
         | functions (feof, ferror, clearerr, fileno, getc, putc) so they
         | would be inlined.
         | 
         |     /*
         |      * This has been tuned to generate reasonable code on the vax
         |      * using pcc.
         |      */
        
         | pm215 wrote:
         | The MH and nmh mail clients used to directly look into FILE
         | internals. If you look for LINUX_STDIO in this old version of
         | the relevant file you can see the kind of ugliness that
         | resulted:
         | 
         | https://cgit.git.savannah.gnu.org/cgit/nmh.git/tree/sbr/m_ge...
         | 
         | It's basically searching an email file to find the contents of
         | either a given header or the mail body. These days there is no
         | need to go under the hood of libc for this (and this code got
         | ripped out over a decade ago), but back when the mail client
         | was running on elderly VAXen this ate up significant time.
         | Sneaking in and reading directly from the internal stdio buffer
         | lets you avoid copying all the data the way an fread would. The
         | same function also used to have a bit of inline vax assembly
         | for string searching...
         | 
         | The only reason this "works" is that traditionally the FILE
         | struct is declared in a public header so libc can have some of
         | its own functions implemented as macros for speed, and that
         | there was not (when this hack was originally put in in the
         | 1980s) yet much divergence in libc implementations.
        
         | loeg wrote:
         | Historically some FILE designs exposed the structure somewhere
         | so that some of the f* methods could be implemented as macros
         | or inline functions (e.g., `fileno()`).
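
A minimal sketch of the macro-accessor pattern described above, using a made-up `demo_file` struct rather than any real libc's FILE layout. Every name here (`demo_file`, `DEMO_EOF`, `demo_fileno`, `demo_feof`, `demo_make`) is hypothetical and only illustrates why the struct had to be visible in a public header.

```c
#include <assert.h>

/* Hypothetical stand-in for a non-opaque FILE; not any real libc layout. */
struct demo_file {
    int flags;   /* status bits such as DEMO_EOF */
    int fd;      /* underlying file descriptor */
};

#define DEMO_EOF 0x01

/* Accessors as macros: no call overhead, but they need the full struct
 * definition in the public header, and every caller ends up with the
 * field offsets compiled in. */
#define demo_fileno(fp) ((fp)->fd)
#define demo_feof(fp)   (((fp)->flags & DEMO_EOF) != 0)

/* Helper that builds a stream value for demonstration purposes. */
static struct demo_file demo_make(int fd, int flags)
{
    struct demo_file f;
    assert(fd >= 0);
    f.flags = flags;
    f.fd = fd;
    return f;
}
```

Code compiled against macros like these reads `fp->fd` directly, so reordering or resizing fields would break existing binaries. With an opaque type, the same accessors have to be real function calls.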
        
         | bitwize wrote:
         | Hyrum's Law applies: the API of any software component is the
         | entire exposed surface, not just what you've documented. Hence,
         | if you have FILE well-defined somewhere in a programmer-
         | accessible header, somebody somewhere _can_ and _will_ poke at
         | the internal bits in order to achieve some hack or
         | optimization.
        
           | krylon wrote:
           | On the one hand, yes.
           | 
           | On the other hand, when coding, I consider FILE to be effectively opaque
           | in the sense that it probably is not portable, and that the
           | implementers might change it at any time.
           | 
           | I am reminded of this fine article by Raymond Chen, which
           | covers a similar situation on Windows way back when:
           | https://devblogs.microsoft.com/oldnewthing/20031015-00/?p=42...
        
             | brokencode wrote:
             | Yes, it would not be sane to depend on implementation
             | details of something like this.
             | 
             | But the sad reality is that many developers (myself
             | included earlier in my career) will do insane things to fix
             | a critical bug or performance problem when faced with a
             | tight deadline.
        
         | zahlman wrote:
         | I always assumed that people could poke into it, but shuddered
         | at the thought.
        
         | asveikau wrote:
         | I've seen old code do this over the years. Consider, for
         | example, that snprintf() wasn't standardized until the late
         | 1990s. People would mock up a fake FILE* and use fprintf.
        
       | p0w3n3d wrote:
       | However, who should really rely on internals of FILE? Isn't this
       | a bad practice?
        
         | vitaut wrote:
         | In general, it is a bad practice. However, it can be useful for
         | some low-level libraries. For example,
         | https://github.com/fmtlib/fmt provides a type-safe replacement
         | for `printf` that can write directly to the FILE buffer
         | providing comparable or better performance to native stdio.
        
           | Retr0id wrote:
           | Doesn't fwrite more or less write directly to the FILE
           | buffer, if buffering is enabled?
           | 
           | I'm curious to take a closer look at fmtlib/fmt, which APIs
           | treat FILE as non-opaque?
           | 
           | Edit: ah, found some of the magic, I think:
           | https://github.com/fmtlib/fmt/blob/35dcc58263d6b55419a5932bd...
           | 
           | I'm curious how much speedup is gained from this.
        
             | vitaut wrote:
             | With fwrite that would be another level of buffering in
             | addition to FILE's buffer. If you are interested in what
             | {fmt} is doing, a good starting point is
             | https://github.com/fmtlib/fmt/blob/35dcc58263d6b55419a5932bd....
             | It is also possible to bypass stdio completely and get even
             | faster output
             | (https://vitaut.net/posts/2020/optimal-file-buffer-size/)
             | and while it is great for files, it may introduce
             | interleaving problems with things like stdout.
        
         | cryptonector wrote:
         | In SunOS 4.x `FILE` was not opaque, and `int fileno(FILE *)`
         | was a macro, not a function, and the field of the struct that
         | held the fd number was a `char`. Yeah, that sucked for ages,
         | especially since it bled into the Solaris 2.x 32-bit ABI.
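
To illustrate why a one-byte descriptor field hurts, here is a small sketch under two assumptions: the field is an `unsigned char` (the comment above only says `char`) and bytes are 8 bits wide. The struct and function names are hypothetical, not the SunOS definitions.

```c
#include <assert.h>

/* Hypothetical old-style stream with an 8-bit descriptor field.
 * Assumption: unsigned char; the thread only says `char`. */
struct demo_old_file {
    unsigned char file;
};

/* Store fd in the narrow field and read it back, the way a
 * fileno()-style macro would. Values above 255 wrap modulo 256
 * (assuming 8-bit bytes). */
static int demo_store_fd(int fd)
{
    struct demo_old_file f;
    assert(fd >= 0);
    f.file = (unsigned char)fd;
    return f.file;
}
```

Under those assumptions, descriptor 255 survives the round trip, but 256 reads back as 0 and 300 as 44. A stream sitting on a high-numbered descriptor would therefore silently refer to a different file.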
        
           | bodyfour wrote:
           | Indeed, that was the way it originally worked in all UNIXes:
           | https://github.com/dspinellis/unix-history-repo/blob/Researc...
           | 
           | It was a then-important optimization to do the most common
           | operations with macros since calling a function for every
           | getc()/putc() would have slowed I/O down too much.
           | 
           | That's why there is also fgetc()/fputc() -- they're the same
           | as getc()/putc() but they're always defined as functions so
           | calling them generated less code at the callsite at the
           | expense of always requiring a function call. A classic speed-
           | vs-space tradeoff.
           | 
           | But, yeah, it was a mistake that it originally used a "char"
           | to store the file descriptor. Back then it was typical to
           | limit processes to 20 open files
           | (https://github.com/dspinellis/unix-history-repo/blob/Researc...)
           | so a "char" I'm sure felt like plenty.
        
       | somat wrote:
       | To misquote the Street Fighter movie: OpenBSD to Linux:
       | 
       | "For you the day you changed your ABI was the most important day
       | in your life, but for me? It was Tuesday"
       | 
       | I enjoy the dichotomy between how bad the Linux project is at
       | changing their ABI and how good OpenBSD is at the same task.
       | 
       | For the most part, Linux just decides to live with the bad ABI
       | forever, and if they do decide it actually needs to be changed,
       | it is a multi-year drama with much crying and many missteps.
       | 
       | I mean sure, Linux has additional considerations that make
       | breaking the ABI very scary for them. The big one is the corpus
       | of closed-source software, but being an orders-of-magnitude bigger
       | project and their overall looser integration does not help any.
        
         | viraptor wrote:
         | This has nothing to do with Linux-the-project. An equivalent
         | change would be in glibc / musl / ...
        
         | ioasuncvinvaer wrote:
         | I think the difference is just the amount of people using the
         | technology.
        
       | loeg wrote:
       | I think FreeBSD tried to make FILE opaque[1], but it was
       | reverted[2] and FILE is still non-opaque in main[3].
       | 
       | [1]: https://github.com/freebsd/freebsd-src/commit/c17bf9a9a5a3b5...
       | 
       | [2]: https://github.com/freebsd/freebsd-src/commit/19e03ca8038019...
       | 
       | [3]: https://github.com/freebsd/freebsd-src/blob/main/include/std...
        
         | asveikau wrote:
         | OpenBSD tends to commit to breaking changes much more
         | aggressively than others. Something tells me they're not
         | reverting.
        
           | loeg wrote:
           | I think FreeBSD is also more concerned with performance
           | regression than OpenBSD is.
        
       | notepad0x90 wrote:
       | I don't know if I agree, but this is one shining example of what
       | makes the *BSDs great: not being afraid of change. Linux should
       | take note. So many of Windows' headaches stem from not wanting to
       | break things, and needing to support old client code.
        
         | justincormack wrote:
         | There isn't really much of "Linux" here - this code is in libc,
         | so glibc, but that was built from portability, it isn't very
         | Linux specific. Linux doesn't have an all-encompassing
         | community for userspace.
        
       | cperciva wrote:
       | In addition to "some code frobs internals", non-opaque FILE also
       | allows for compatibility with code which puts FILE into a
       | structure, since an opaque FILE doesn't have a size.
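
The size point can be sketched with any incomplete type. This example uses a hypothetical `demo_handle` (none of these names come from a real libc): client code can hold a pointer to it, but cannot embed it by value or take its `sizeof`.

```c
#include <assert.h>
#include <stdlib.h>

/* --- what a public header would contain (all names hypothetical) --- */
struct demo_handle;                       /* incomplete type: no size */
typedef struct demo_handle demo_handle;

demo_handle *demo_open(int fd);
int demo_fd(const demo_handle *h);
void demo_close(demo_handle *h);

/* Client code can store a pointer to the opaque type... */
struct demo_client {
    demo_handle *h;
};
/* ...but embedding it by value does not compile at this point:
 *     struct demo_bad { demo_handle h; };   // incomplete type
 *     size_t n = sizeof(demo_handle);       // also rejected
 */

/* --- what stays private to the library --- */
struct demo_handle {
    int fd;
};

demo_handle *demo_open(int fd)
{
    demo_handle *h = malloc(sizeof *h);
    if (h != NULL)
        h->fd = fd;
    return h;
}

int demo_fd(const demo_handle *h)
{
    assert(h != NULL);
    return h->fd;
}

void demo_close(demo_handle *h)
{
    free(h);
}
```

Existing code that declares a `FILE` member by value (rather than a `FILE *`) is in the position of the commented-out `demo_bad` struct: it stops compiling once the type becomes incomplete.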
        
       | quotemstr wrote:
       | CHERI would defend against access to internal data structures
       | without having to bounce between address spaces, FWIW.
        
         | mcculley wrote:
         | Please elaborate.
        
       | JdeBP wrote:
       | If you've ever done this to a C library, the first thing that
       | you'll look at when someone else does it is not the FILE type,
       | but how stdin, stdout, and stderr have changed.
       | 
       | The big breaking change is usually the historical implementation
       | of the standard streams as addresses of elements of an array
       | rather than as named pointers. (Plauger's example implementation
       | had them as elements 0, 1, and 2 of a _Files[] array, for
       | example.) It's possible to retain binary compatibility with
       | unrecompiled code that uses the old
       | getc/putc/feof/ferror/clearerr/&c. macros by preserving
       | structure layouts, but changing stdin, stdout, and stderr can
       | make things not link.
       | 
       | And indeed that has happened here.
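
A rough sketch of the two styles contrasted above, with hypothetical names (`demo_stream`, `demo_streams`, `demo_old_stdout`, `demo_new_stdout`); these are not OpenBSD's actual symbols or layout. It shows what a caller ends up depending on in each case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stream type and table; not OpenBSD's actual definitions. */
struct demo_stream {
    int fd;
    int flags;
};

struct demo_stream demo_streams[3] = { {0, 0}, {1, 0}, {2, 0} };

/* Historical style: standard streams are addresses of array elements.
 * Each caller computes base + index * sizeof(struct demo_stream), so
 * the array symbol and the struct's size are baked into compiled code. */
#define demo_old_stdout (&demo_streams[1])

/* Named-pointer style: callers only load a pointer value, so the struct
 * can change size or become opaque without affecting them. */
struct demo_stream *const demo_new_stdout = &demo_streams[1];

/* Byte offset that a caller of the macro form effectively hard-codes. */
static size_t demo_old_stdout_offset(void)
{
    return (size_t)((char *)demo_old_stdout - (char *)demo_streams);
}
```

In this sketch both forms name the same object, but only the macro form makes callers reference the array symbol and the element size. Presumably that is why moving away from the array form can leave old objects unable to link even when structure layouts are preserved.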
        
       ___________________________________________________________________
       (page generated 2025-07-20 23:00 UTC)