[HN Gopher] I thought I found a bug
___________________________________________________________________
I thought I found a bug
Author : MBCook
Score : 116 points
Date : 2024-12-25 19:55 UTC (1 days ago)
(HTM) web link (www.os2museum.com)
(TXT) w3m dump (www.os2museum.com)
| userbinator wrote:
| I was a bit disappointed that the article didn't go into the
| system calls themselves, since AFAIK those have always supported
| interleaved reads and writes with no problems even on early
| Unices. E.g. POSIX has this:
|
| https://pubs.opengroup.org/onlinepubs/9699919799/functions/w...
|
| _After a write() to a regular file has successfully returned:
|
| Any successful read() from each byte position in the file that
| was modified by that write shall return the data specified by the
| write() for that position until such byte positions are again
| modified._
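|
| A minimal sketch of the guarantee quoted above, assuming a POSIX
| system. The helper name `write_then_read` and the scratch file path
| are hypothetical. It writes two bytes with write() and reads them
| straight back with pread(), with no flush or seek in between:
|

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if the bytes just written are read back. */
int write_then_read(const char *path)
{
    char buf[2] = {0, 0};
    int fd, ok;

    assert(path != NULL);
    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* No stdio buffer sits between us and the file here, so a read
     * issued right after a successful write sees the new data. */
    ok = write(fd, "AB", 2) == 2
      && pread(fd, buf, 2, 0) == 2
      && memcmp(buf, "AB", 2) == 0;

    close(fd);
    unlink(path);   /* remove the scratch file */
    return ok ? 0 : -1;
}
```

| Calling `write_then_read("scratch.tmp")` should return 0 on a
| conforming system. The stdio restriction discussed in the article
| does not apply at this level.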
| PeterWhittaker wrote:
| Perhaps because the article is specifically about the buffered
| f*() calls in stdio, and not the system calls?
|
| Though, as I offer that thought, the divergence between C and
| the system calls is definitely curious.
| mystified5016 wrote:
| I get a real kick out of the different ways people pluralize
| Unixen. Unices is a good one.
| anonymousiam wrote:
| While reading this, I realized that I have a copy of the elusive
| usr/group Standard that the author mentions. I just pulled it off
| an image of my early DOS hard drive (before I migrated to Linux
| in 1991). I should probably post it somewhere.
|
| # ls -altr
|
| total 540
|
| -rw-r--r-- 1 root root 22606 Apr 12 1990 NOTES
|
| -rw-r--r-- 1 root root 172645 Apr 12 1990 LIB
|
| -rw-r--r-- 1 root root 102349 Apr 12 1990 APP
|
| -rw-r--r-- 1 root root 224037 Apr 12 1990 C
| codetrotter wrote:
| > I should probably post it somewhere.
|
| Upload it to the Internet Archive! :D
|
| https://help.archive.org/help/uploading-a-basic-guide/
| anonymousiam wrote:
| Thanks. It looks like the document itself is still
| copyrighted. The files can be uploaded, so I will.
| xelxebar wrote:
| > before I migrated to Linux in 1991
|
| What a mic drop. That must have been a fun ride from then until
| now! Would love to hear some of your battle stories.
| jsjohnst wrote:
| My first Linux machine was in 1993, what would you like to
| know? Pre-1.0 kernels were an adventure, that's for sure.
| icedchai wrote:
| My first Linux box ran 0.99.10. I was running SLS, which
| installed off of a dozen or so floppies. I eventually moved
| to Slackware a year or so later.
| xenophonf wrote:
| I remember downloading and installing one of the MCC
| Interim releases in 1993? 1994? before switching to
| Slackware. Early *BSD and Linux were certainly an
| adventure back then. I don't miss it.
| anonymousiam wrote:
| SLS was also my first distro. I also played with
| Yggdrasil Linux (bootable CD) for a while, because at the
| time, nobody could afford a hard drive with as much
| capacity as a CD-ROM.
|
| Those early Linux distros borrowed a lot from SunOS
| (Solaris 1), so it was easy to adapt between work/home.
| purplesyringa wrote:
| I must be missing something.
|
| The article lists three libcs (Open Watcom, Microsoft Visual C++
| 6.0, IBM C/C++ 3.6 for Windows) from the good old times. Does the
| emulator link to Open Watcom, i.e., does it emulate DOS on
| machines about as old as DOS itself? What's the point here?
| AntiRush wrote:
| The article is about compiling and running a program inside the
| emulator. When the unexpected behavior occurred, the author
| assumed it was a bug in the emulator.
| purplesyringa wrote:
| So if it's not a bug in the emulator, then it's a bug in
| COMMAND.COM? I don't think that's the case, surely it
| couldn't have been missed by Microsoft at the time. The
| article goes on to talk about fread/fwrite calls, but
| COMMAND.COM was written in assembly, I'm pretty sure it
| didn't link to any libc, and certainly not to Open Watcom --
| why would MS use it instead of their own library?
| grodriguez100 wrote:
| It is not a bug. The article explains that this is the
| expected behaviour.
| purplesyringa wrote:
| What is expected behavior? Surely `echo AB> foo.txt; echo
| CD>> foo.txt` producing `ABBC` is either a bug in
| COMMAND.COM, the emulator, or something else? That can't
| be correct.
| justin_ wrote:
| I believe it is a bug in the emulator's implementation of
| COMMAND.COM. Often, these DOS "emulators" re-implement the
| standard commands of DOS, including the shell[1]. This is in
| addition to emulating weird 16-bit environment stuff and the
| BIOS.
|
| The bug can pop up in any C program using stdio that assumes
| it's fine to do `fread` followed immediately by `fwrite`. The
| spec forbids this. To make matters more confusing, this
| behavior does _not_ seem to be in modern libc implementations.
| Or at least, it works on my machine. I bet modern
| implementations are able to be more sane about managing
| different buffers for reading and writing.
|
| The original COMMAND.COM from MS-DOS probably did not have this
| problem, since at least in some versions it was written in
| assembly[2]. Even for a shell written in C, the fix is pretty
| easy: seek the file before switching between reading/writing.
|
| The title of this post is confusing, since it clearly _is_ a
| bug somewhere. But I think the author was excited about
| possibly finding a bug in libc:
|
| > Sitting down with a debugger, I could just see how the C run-
| time library (Open Watcom) could be fixed to avoid this
| problem.
|
| [1] Here's DOSBox, for example: https://github.com/dosbox-staging/dosbox-staging/blob/main/s...
|
| [2] MS-DOS 4.0: https://github.com/microsoft/MS-DOS/tree/main/v4.0/src/CMD/C...
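|
| The fix mentioned above (seek before switching from reading to
| writing) can be sketched as follows. The function name
| `append_after_read` and the file path are hypothetical, and this is
| not the emulator's actual code. ISO C requires a file positioning
| call between input and output on an update stream unless the read
| hit end-of-file, so a no-op fseek() is enough:
|

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 0 if the file ends up containing "ABCD". */
int append_after_read(const char *path)
{
    char buf[8] = {0};
    size_t n;
    FILE *fp;

    assert(path != NULL);

    /* Create a file holding "AB". */
    fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    fputs("AB", fp);
    fclose(fp);

    /* Reopen in update mode, read both bytes, then switch to writing. */
    fp = fopen(path, "r+");
    if (fp == NULL)
        return -1;
    if (fread(buf, 1, 2, fp) != 2) {
        fclose(fp);
        return -1;
    }
    /* The positioning call that makes the read-to-write switch legal.
     * It does not move the file position. */
    if (fseek(fp, 0L, SEEK_CUR) != 0) {
        fclose(fp);
        return -1;
    }
    fwrite("CD", 1, 2, fp);
    fclose(fp);

    /* Read the result back and clean up. */
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    n = fread(buf, 1, sizeof buf - 1, fp);
    fclose(fp);
    remove(path);

    return (n == 4 && memcmp(buf, "ABCD", 4) == 0) ? 0 : -1;
}
```

| Without the fseek() call the behavior is undefined. Many current
| libcs still produce "ABCD", which matches the "works on my machine"
| observation, but a single-buffer implementation may not.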
| rep_lodsb wrote:
| The article is very vague about which emulator and
| COMMAND.COM it is about, and if they're integrated with each
| other. Can't be DOSBox, since it handles it correctly:
| C:\> echo AB> foo.txt
| C:\> echo CD>> foo.txt
| C:\> type foo.txt
| AB
| CD
|
| (Note that echo adds a newline, same as on real DOS, or even
| UNIX without "-n". This other shell doesn't for some reason.)
|
| The "real" COMMAND.COM, and all other essential parts of
| MS-/PC-/DR-DOS, have _always_ been written in asm, where none
| of this libc nonsense matters.
|
| Also it annoys me greatly when people talk about " _the_ C
| Library " as if it exists in some Platonic realm, and is
| essential to all software ever written.
| stevage wrote:
| There's a lot of weird missing details.
| raldi wrote:
| I'm having trouble following whether the problem occurs with any
| append or only when it's two consecutive commands like this.
| cryptonector wrote:
| In the stdio implementations that don't support free intermixing
| of reads and writes the issue typically is that there is only one
| buffer for both reading and writing. You have to reset the buffer
| in order to switch from reading to writing or vice-versa, else
| you will have a dirty, non-empty buffer that does not correspond.
| The functions `fflush()`, `fseek()`, `rewind()`, and `fsetpos()`
| happen to clear the buffer, which is why you have to use them
| before switching from reading to writing or vice-versa!
|
| Without an indicator in `struct FILE` of whether the last
| operation was a read or a write, the stdio implementation has no
| way to detect the problem and correct the situation by
| automatically flushing and resetting the buffer, say. An
| alternative would be to have two buffers, naturally. But you can
| see how a pre-update version could be trivially made to support
| update modes without adding a second buffer or automatic buffer
| flushing. And that's almost certainly what happened when update
| mode was added. My guess is someone got bitten by that and then
| the maintainer decided to just document the problem rather than
| fix it, probably because by then fixing the problem was hard.
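|
| The "indicator in `struct FILE`" idea can be sketched with a toy
| stream. Everything here is hypothetical (`toy_stream`, `toy_getc`,
| `toy_putc`, `toy_sync`, `toy_demo`), and the "file" is an in-memory
| array so the sketch stays self-contained. It is not how any
| particular libc does it. One shared buffer is used, and a `last`
| field lets the stream reset the buffer itself when the direction
| changes:
|

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum last_op { OP_NONE, OP_READ, OP_WRITE };

/* Hypothetical toy stream: an in-memory "disk" plus ONE shared buffer. */
struct toy_stream {
    char   disk[64];    /* stand-in for the file contents */
    size_t disk_len;    /* current file size */
    size_t disk_pos;    /* OS-level file position */
    char   buf[8];      /* the single stdio-style buffer */
    size_t buf_len;     /* bytes read ahead, or bytes pending output */
    size_t buf_pos;     /* next unread byte while reading */
    enum last_op last;  /* direction of the most recent operation */
};

/* Reset the buffer: flush pending output, or drop unread read-ahead. */
void toy_sync(struct toy_stream *s)
{
    if (s->last == OP_WRITE) {
        memcpy(s->disk + s->disk_pos, s->buf, s->buf_len);
        s->disk_pos += s->buf_len;
        if (s->disk_pos > s->disk_len)
            s->disk_len = s->disk_pos;
    } else if (s->last == OP_READ) {
        /* Seek back over the bytes that were buffered but never consumed. */
        s->disk_pos -= s->buf_len - s->buf_pos;
    }
    s->buf_len = s->buf_pos = 0;
    s->last = OP_NONE;
}

int toy_getc(struct toy_stream *s)
{
    if (s->last == OP_WRITE)            /* direction change detected */
        toy_sync(s);
    if (s->buf_pos == s->buf_len) {     /* buffer empty: refill */
        size_t n = s->disk_len - s->disk_pos;
        if (n > sizeof s->buf)
            n = sizeof s->buf;
        if (n == 0)
            return -1;                  /* end of file */
        memcpy(s->buf, s->disk + s->disk_pos, n);
        s->disk_pos += n;
        s->buf_len = n;
        s->buf_pos = 0;
    }
    s->last = OP_READ;
    return (unsigned char)s->buf[s->buf_pos++];
}

int toy_putc(struct toy_stream *s, char c)
{
    if (s->last == OP_READ)             /* direction change detected */
        toy_sync(s);
    if (s->buf_len == sizeof s->buf)    /* buffer full: flush */
        toy_sync(s);
    if (s->disk_pos + s->buf_len >= sizeof s->disk)
        return -1;                      /* toy disk is full */
    s->buf[s->buf_len++] = c;
    s->last = OP_WRITE;
    return 0;
}

/* Read nread bytes, then write text, with no explicit seek in between.
 * out must have room for 65 bytes. Returns the final file size. */
size_t toy_demo(const char *initial, size_t nread, const char *text, char *out)
{
    struct toy_stream s = {0};
    size_t i;

    assert(strlen(initial) <= sizeof s.disk);
    s.disk_len = strlen(initial);
    memcpy(s.disk, initial, s.disk_len);
    for (i = 0; i < nread; i++)
        toy_getc(&s);
    for (i = 0; text[i] != '\0'; i++)
        toy_putc(&s, text[i]);
    toy_sync(&s);
    memcpy(out, s.disk, s.disk_len);
    out[s.disk_len] = '\0';
    return s.disk_len;
}
```

| With this sketch, `toy_demo("AB", 2, "CD", out)` leaves "ABCD" in
| `out`, and `toy_demo("AB", 1, "X", out)` leaves "AX". Removing the
| two `last` checks brings back the stale-buffer problem described
| above. The cost is an extra comparison on every character.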
| fweimer wrote:
| Historically, before mandatory locking, getc and putc have been
| implemented as macros, and an extra check for stream state
| likely mattered from a performance perspective.
|
| To avoid the extra check, you don't actually need two buffers,
| just separate buffer pointers for reading and writing. (This is
| probably how most libcs implement this today.) I suppose memory
| was really scarce back then.
| cryptonector wrote:
| Separate non-overlapping pointers into one buffer is not that
| different from two buffers, notionally, but yeah.
| fweimer wrote:
| The idea is that for the non-active mode, the current/end
| pointers are equal, signifying that the buffer is
| exhausted. This forces entering the slow path, where the
| mode can be switched.
|
| I don't think an implementation with two active, non-empty
| buffers is all that useful because you can't tell which
| buffer's progress should be used for the file pointer
| adjustment in ftell.
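|
| A rough sketch of the separate-pointer scheme described above. All
| names (`ptr_stream`, `ps_getc`, `ps_putc`, `ps_settle`, `ps_tell`)
| are hypothetical and the "file" is an in-memory array, so this only
| illustrates the idea and is not any real libc's layout. The inactive
| mode keeps its current and end pointers equal, so the fast path fails
| and the slow path performs the mode switch:
|

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy stream: one buffer, separate read and write pointers. */
struct ptr_stream {
    char   disk[64];        /* stand-in for the file contents */
    size_t disk_len;        /* current file size */
    size_t base;            /* file offset that buf[0] corresponds to */
    char   buf[8];
    char  *rd_cur, *rd_end; /* equal whenever reading is not active */
    char  *wr_cur, *wr_end; /* equal whenever writing is not active */
};

void ps_init(struct ptr_stream *s, const char *initial)
{
    assert(strlen(initial) <= sizeof s->disk);
    memset(s, 0, sizeof *s);
    s->disk_len = strlen(initial);
    memcpy(s->disk, initial, s->disk_len);
    s->rd_cur = s->rd_end = s->wr_cur = s->wr_end = s->buf;
}

/* Only one mode is ever active, so the inactive pointer adds zero. */
size_t ps_tell(const struct ptr_stream *s)
{
    return s->base + (size_t)(s->rd_cur - s->buf) + (size_t)(s->wr_cur - s->buf);
}

/* Commit the active mode and leave both modes inactive. */
void ps_settle(struct ptr_stream *s)
{
    size_t n = (size_t)(s->wr_cur - s->buf);

    if (n > 0) {                         /* pending output: flush it */
        memcpy(s->disk + s->base, s->buf, n);
        s->base += n;
        if (s->base > s->disk_len)
            s->disk_len = s->base;
    } else {                             /* drop unread read-ahead */
        s->base += (size_t)(s->rd_cur - s->buf);
    }
    s->rd_cur = s->rd_end = s->wr_cur = s->wr_end = s->buf;
}

static int ps_getc_slow(struct ptr_stream *s)
{
    size_t n;

    ps_settle(s);                        /* leaves write mode if needed */
    n = s->disk_len - s->base;
    if (n > sizeof s->buf)
        n = sizeof s->buf;
    if (n == 0)
        return -1;                       /* end of file */
    memcpy(s->buf, s->disk + s->base, n);
    s->rd_end = s->buf + n;
    return (unsigned char)*s->rd_cur++;
}

int ps_getc(struct ptr_stream *s)
{
    if (s->rd_cur < s->rd_end)           /* fast path: no mode check */
        return (unsigned char)*s->rd_cur++;
    return ps_getc_slow(s);
}

static int ps_putc_slow(struct ptr_stream *s, char c)
{
    size_t room;

    ps_settle(s);                        /* leaves read mode if needed */
    room = sizeof s->disk - s->base;
    if (room > sizeof s->buf)
        room = sizeof s->buf;
    if (room == 0)
        return -1;                       /* toy disk is full */
    s->wr_end = s->buf + room;
    *s->wr_cur++ = c;
    return 0;
}

int ps_putc(struct ptr_stream *s, char c)
{
    if (s->wr_cur < s->wr_end) {         /* fast path: no mode check */
        *s->wr_cur++ = c;
        return 0;
    }
    return ps_putc_slow(s, c);
}
```

| Because at most one pointer pair is ever non-empty, `ps_tell` has an
| unambiguous answer, which is the ftell concern raised above. The
| fast paths are the part that a getc or putc macro would have
| contained.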
| cryptonector wrote:
| I get that. One buffer that can be maximized by the path
| that most needs it (read or write). I'm just saying that
| notionally it's two independent buffers, which solves the
| problem of not having to force a buffer flush between
| mode change.
|
| > I don't think an implementation with two active, non-
| empty buffers is all that useful because you can't tell
| which buffer's progress should be used for the file
| pointer adjustment in ftell.
|
| Oh, interesting. The other problem is that two buffers use
| memory less efficiently.
___________________________________________________________________
(page generated 2024-12-26 23:01 UTC)