[HN Gopher] Uncovering a 24-year-old bug in the Linux Kernel
___________________________________________________________________
Uncovering a 24-year-old bug in the Linux Kernel
Author : greenonion
Score : 243 points
Date : 2021-02-11 15:12 UTC (7 hours ago)
(HTM) web link (engineering.skroutz.gr)
(TXT) w3m dump (engineering.skroutz.gr)
| guenthert wrote:
| Impressive detective work and documentation.
| mooman219 wrote:
| It's like watching a murder mystery unfold. It feels really
| daunting to dive this deep into a bug based on its vague
| symptoms alone. It's probably selection bias for what gets on
| the HN front page, but it feels like a large minority here
| could tackle something like this. I have trouble imagining
| having that much of a handle on Linux to feel comfortable
| hot-patching the kernel because I suspect something is wrong
| in the networking stack.
| rancor wrote:
| I have, admittedly old and very vague, memories of people talking
| about rsync being "hard on networks" or "dealing poorly with
| congestion." I'd put good odds that this bug is why those
| statements existed.
| jandrese wrote:
| This seems to be the opposite. You only see it when
| transferring titanic amounts of data over a pristine
| connection. If your network had congestion you wouldn't trigger
| this bug.
|
| But this also explains a bit why rsync is "hard on networks".
| Most bulk data transfers end up with breaks in the data that
| give more breathing room to other protocols. Not rsync, it
| tries as hard as it can to keep the pipe full 100% of the time,
| making it hard for other TCP slow starters to get a foothold.
| tinus_hn wrote:
| BitTorrent does the same thing and used to be a lot more
| common, just typically not between hosts close to each other.
| bonzini wrote:
| The bug requires transferring over 2GB of data without reading
| from the socket, so it's unlikely; also, a hang is the opposite
| of being hard on the network. ;) However, the uncommon
| characteristics of rsync traffic are probably why some
| congestion control algorithms may not deal well with rsync.
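The condition bonzini describes (a sender filling the pipe while the peer never reads) can be sketched in userspace. The following is a minimal localhost demonstration, an assumption-laden illustration rather than a reproduction of the actual bug: when the receiver stops calling recv(), its kernel buffer fills, TCP advertises a shrinking (eventually zero) window, and the sender's own buffer fills too, so a non-blocking send() stops making progress.

```python
import socket

srv = socket.socket()
# Shrink the receive buffer before listen() so the accepted socket
# inherits it and the demo fills up quickly (the kernel treats the
# requested size as advisory).
srv.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4096)
srv.bind(("127.0.0.1", 0))
srv.listen(1)

sender = socket.socket()
sender.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 4096)
sender.connect(srv.getsockname())
receiver, _ = srv.accept()

# The receiver never calls recv(), so data accumulates in its kernel
# buffer and the advertised TCP window shrinks toward zero. From the
# sender's side, the visible userspace symptom is BlockingIOError on
# a non-blocking send() once all the buffers are full.
sender.setblocking(False)
sent = 0
blocked = False
try:
    while sent < 50_000_000:  # safety cap; we expect to block long before this
        sent += sender.send(b"x" * 4096)
except BlockingIOError:
    blocked = True

print(blocked, sent)

sender.close()
receiver.close()
srv.close()
```

This is only the ordinary flow-control mechanism at work; the bug in the article concerned what happened to the window bookkeeping after more than 2GB had accumulated this way.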
| leesalminen wrote:
| I wonder if this is the cause of a nasty NFSv3 issue I was
| having years ago, where clients would periodically hang
| indefinitely, necessitating a system reboot. At the time, we
| were ingesting large videos on the client and transferring
| them to a shared volume via NFS.
| jandrese wrote:
| I'd suspect a bug in the NFS implementation. That would hardly
| be unheard of.
|
| NFS's failure mode of freezing up your system and requiring a
| full reboot to clear is purestrain NFS though. I never
| understood why the idea of an eventual soft failure (returning
| a socket error) was considered unacceptable in NFS land.
| toast0 wrote:
| > I never understood why the idea of an eventual soft failure
| (returning a socket error) was considered unacceptable in NFS
| land.
|
| Problems like this are usually the result of being unable to
| decide on an appropriate timeout, so no timeout is chosen at
| all. To get past that, I like to suggest rather long timeouts,
| like one day or one week, rather than forever. Very few people
| are going to say, after a read has been retrying for a whole
| day, that it should have tried longer.
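The "long timeout rather than no timeout" suggestion above maps directly onto socket APIs. A minimal sketch (the one-day value is purely illustrative):

```python
import socket

# Cap blocking socket operations at a generous deadline so a stuck
# peer eventually produces an error the application can handle,
# instead of hanging the process forever.
ONE_DAY = 24 * 60 * 60  # seconds

sock = socket.socket()
sock.settimeout(ONE_DAY)  # recv()/send() now raise socket.timeout on expiry
```

The point is not the specific number but that any finite deadline turns an indefinite hang into a reportable, recoverable error.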
|
| Another issue is that POSIX file i/o doesn't have great error
| indicators; so it can be tricky to plumb things through in
| clearly correct ways.
| StillBored wrote:
| NFS is notorious for breaking kernel and application
| assumptions about posix. Linux falls into this trap in
| various ways too in an effort to simplify the common cases.
| Timeouts might be appropriate for read/open/etc calls but
| in a way the problems are worse on the write/close/etc
| side.
|
| Reading the close() manpage hints at some of those
| problems, but fundamentally posix sync file io isn't well
| suited to handling space and io errors which are deferred
| from the originating call. Consider write()'s which are
| buffered by the kernel but can't be completed due to
| network or out of space consideration. A naive reading of
| write() would imply that errors should be immediately
| returned so that the application can know the latest
| write/record update failed. Yet what really happens is that
| for performance reasons the data from those calls is
| allowed to be buffered. Leading to a situation where an IO
| call may return a failure as a result of failure at some
| point in the past. Given all the ways this can happen, the
| application cannot accurately determine what was actually
| written, if anything, since the last serialization event
| (which is itself another set of problems).
|
| edit: this also gets into the ugly case about the state of
| the fd being unspecified (per posix) following close
| failures. So per posix the correct response is to retry
| close(), while simultaneously assuring that open()s aren't
| happening anywhere. Linux simplifies this a bit by implying
| the FD is closed, but that has its own issues.
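The deferred-error problem described above is why careful writers check every step, not just write(). This is a hedged sketch of the usual mitigation; the function name and structure are illustrative, not from the thread:

```python
import os
import tempfile

def careful_write(path, data):
    # A successful write() may only mean the kernel buffered the data;
    # a network or out-of-space error can surface later, at fsync() or
    # even close() (notably on NFS). Checking all three steps narrows
    # the window in which a failure goes unnoticed.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            n = os.write(fd, view)  # success here != durable on disk
            view = view[n:]
        os.fsync(fd)  # deferred errors (EIO, ENOSPC) tend to show up here
    finally:
        os.close(fd)  # on NFS, close() itself can still report an error

with tempfile.TemporaryDirectory() as d:
    careful_write(os.path.join(d, "out.bin"), b"some payload")
```

Even this doesn't fully solve the problem the comment raises: if fsync() or close() fails, the application still can't tell which of the buffered writes were lost.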
| jandrese wrote:
| I understand the reasoning, but at the same time wonder
| if this isn't a case of perfect being the enemy of good.
| Since there is no case where a timeout/error-style exit can
| be guaranteed to never lose data, we instead lock the entire
| box up when an NFS server goes AWOL. This still causes the
| data to be lost, but also brings down everything else.
| StillBored wrote:
| Well, soft mounts should keep the entire machine from
| dying, unless you're running critical processes off the NFS
| mount. Reporting/debugging these cases can be fruitful.
|
| OTOH, PXE/HTTPS+NFS root is a valid config, and there
| isn't really any way to avoid machine/client death when
| the NFS server goes offline for an extended period. Even
| without NFS, Linux has gotten better at dealing with full
| filesystems, but even that is still hit or miss.
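For reference, the soft-mount behavior mentioned above is controlled by mount options documented in nfs(5). The specific values below are illustrative, not taken from the thread:

```shell
# With "soft", the client retries a stalled request retrans times,
# waiting timeo deciseconds (with backoff) per attempt, then returns
# an error to the application instead of retrying forever ("hard",
# the default). Here: 60s per attempt, 3 attempts.
mount -t nfs -o soft,timeo=600,retrans=3 server:/export /mnt/data
```

The trade-off is exactly the one debated above: "soft" bounds the hang, but an application that ignores the resulting I/O error can silently lose data.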
| bjeds wrote:
| Wow, superb writeup, thank you author for writing it and the
| submitter for posting it!
| rob74 wrote:
| Great writeup, and also thoroughly answers the first question
| that popped into my mind: "how on earth could a bug in the Linux
| network stack that causes the whole data transfer to get stuck
| stay undiscovered for so long?"
| tryauuum wrote:
| I have seen an ancient "drop packets with a zero-length TCP
| window" rule in iptables at my company. Funny to learn that a
| zero-length TCP window can occur in normal, non-malicious
| packets!
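A rule like the one described might have looked something like this reconstruction using the iptables u32 match. The exact expression is an assumption (the original rule isn't shown); the offsets follow the TCP header layout, where the 16-bit window field sits at bytes 14-15 of the TCP header:

```shell
# Hypothetical reconstruction of a "drop zero-window TCP packets" rule.
# "0>>22&0x3C@" skips the variable-length IP header (IHL * 4); the
# 32-bit word at TCP offset 12 holds data-offset/flags/window, and
# masking with 0xFFFF isolates the window field.
iptables -A INPUT -p tcp -m u32 --u32 "0>>22&0x3C@12&0xFFFF=0" -j DROP
```

As the comment notes, such a rule is dangerous: zero-window segments are legitimate flow control (a receiver saying "wait"), so dropping them breaks normal connections, not just attacks.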
| dokem wrote:
| > she
|
| Can we just stop pretending.
| lykr0n wrote:
| Wow. We've run into rsync bugs like this and just chalked it
| up to "things happen".
| qchris wrote:
| Besides being a great technical write-up, this does an absolutely
| fantastic job of doing low-key recruitment for Skroutz. It shows
| some of the main tools and technologies that the company uses on
| a day-to-day basis, provides a window into the way that they
| approach problems, makes a compelling case that you'd be working
| with some talented engineers, and showcases a culture willing to
| engage with the open source community.
|
| The hiring pitch isn't in your face, but there's a "We're
| hiring!" button in the banner, which fairly unobtrusively follows
| you down the page, and then ends with a "hey, if you're
| interested in working with us, reach out." Overall, it just feels
| really well done.
| db48x wrote:
| Great find, and it sounds like a great place to work.
| falseprofit wrote:
| Just learned that Skroutz is pronounced 'Scrooge' in Greek, and
| this isn't a coincidence!
| unwind wrote:
| Since the article never describes the context: Skroutz seems
| to be the dominant online price-comparison site for Greece.
| Which, I agree, would make the name make sense.
|
| I had never heard the name before, and I felt the article
| lacked some context. Googling it, there seems to be very
| little content about them in English, which makes the nice
| blog post almost surprising. :)
| kafrofrite wrote:
| Correct. Greeks (and a few other markets) use it to compare
| prices and buy stuff. That being said, they are in the
| process of expanding their business right now with new
| products/different offerings.
|
| There are some growing pains there, but it's an interesting
| company overall, and they experiment with work/life balance
| (e.g. they did 4-day weeks at some stage; unsure if they
| went ahead with that or reverted to 5-day weeks).
___________________________________________________________________
(page generated 2021-02-11 23:01 UTC)