[HN Gopher] Uncovering a 24-year-old bug in the Linux Kernel
       ___________________________________________________________________
        
       Uncovering a 24-year-old bug in the Linux Kernel
        
       Author : greenonion
       Score  : 243 points
       Date   : 2021-02-11 15:12 UTC (7 hours ago)
        
 (HTM) web link (engineering.skroutz.gr)
 (TXT) w3m dump (engineering.skroutz.gr)
        
       | guenthert wrote:
       | Impressive detective work and documentation.
        
         | mooman219 wrote:
          | It's like watching a murder mystery unfold. It feels really
          | daunting to dive this deep into a bug based on such vague
          | symptoms. It's probably selection bias for what gets on the
          | HN front page, but it feels like a large minority here can
          | tackle something like this. I have trouble imagining having
          | enough of a handle on Linux to feel comfortable hot-patching
          | the kernel because I suspect something is wrong in the
          | networking stack.
        
       | rancor wrote:
        | I have admittedly old and very vague memories of people
        | talking about rsync being "hard on networks" or "dealing
        | poorly with congestion." I'd put good odds that this bug is
        | why those statements existed.
        
         | jandrese wrote:
         | This seems to be the opposite. You only see it when
         | transferring titanic amounts of data over a pristine
         | connection. If your network had congestion you wouldn't trigger
         | this bug.
         | 
         | But this also explains a bit why rsync is "hard on networks".
         | Most bulk data transfers end up with breaks in the data that
          | give more breathing room to other protocols. Not rsync: it
          | tries as hard as it can to keep the pipe full 100% of the
          | time, making it hard for other TCP slow starters to get a
          | foothold.
        
           | tinus_hn wrote:
           | BitTorrent does the same thing and used to be a lot more
           | common, just typically not between hosts close to each other.
        
         | bonzini wrote:
          | The bug requires transferring over 2GB of data without
          | reading from the socket, so it's unlikely; also, a hang is
          | the opposite of being hard on the network. ;) However, the
          | uncommon characteristics of rsync traffic are probably why
          | some congestion control algorithms may not deal well with
          | rsync.
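          | 
          | For illustration, here's a minimal toy sketch (mine, not
          | from the article; the loopback port is arbitrary) of that
          | condition: one side keeps writing while the other never
          | reads, so the receiver's advertised window drops to zero
          | and the sender's write() eventually just blocks. tcpdump
          | will show the receiver advertising `win 0`.
          | 
          |     /* Sender keeps writing; the peer never reads, so the
          |      * peer's receive buffer fills and its advertised TCP
          |      * window goes to 0. */
          |     #include <arpa/inet.h>
          |     #include <netinet/in.h>
          |     #include <stdio.h>
          |     #include <string.h>
          |     #include <sys/socket.h>
          |     #include <unistd.h>
          | 
          |     int main(void)
          |     {
          |         int lfd = socket(AF_INET, SOCK_STREAM, 0);
          |         struct sockaddr_in a = { 0 };
          |         a.sin_family = AF_INET;
          |         a.sin_port = htons(5555);
          |         a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
          |         bind(lfd, (struct sockaddr *)&a, sizeof(a));
          |         listen(lfd, 1);
          | 
          |         if (fork() == 0) {        /* child: the sender */
          |             int fd = socket(AF_INET, SOCK_STREAM, 0);
          |             connect(fd, (struct sockaddr *)&a, sizeof(a));
          |             char buf[65536];
          |             memset(buf, 'x', sizeof(buf));
          |             for (;;) {
          |                 /* blocks once the peer's window is 0 and
          |                  * the local send buffer fills up */
          |                 if (write(fd, buf, sizeof(buf)) < 0) {
          |                     perror("write");
          |                     break;
          |                 }
          |             }
          |             return 0;
          |         }
          | 
          |         int cfd = accept(lfd, NULL, NULL);  /* never read */
          |         pause();                  /* just hold it open */
          |         close(cfd);
          |         return 0;
          |     }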
        
       | leesalminen wrote:
        | I wonder if this is the cause of a nasty NFSv3 issue I was
        | having years ago where clients would periodically hang
        | indefinitely, necessitating a system reboot. At the time, we
        | were ingesting large videos on the client and transferring
        | them to a shared volume via NFS.
        
         | jandrese wrote:
         | I'd suspect a bug in the NFS implementation. That would hardly
         | be unheard of.
         | 
          | NFS's failure mode of freezing up your system and requiring a
          | full reboot to clear is pure-strain NFS, though. I never
         | understood why the idea of an eventual soft failure (returning
         | a socket error) was considered unacceptable in NFS land.
        
           | toast0 wrote:
           | > I never understood why the idea of an eventual soft failure
           | (returning a socket error) was considered unacceptable in NFS
           | land.
           | 
            | Problems like this are usually the result of being unable
            | to decide on an appropriate timeout, so no timeout is
            | chosen at all. To get beyond that, I like to suggest
            | rather long timeouts, like one day or one week, instead
            | of forever. Very few people are going to say, after a
            | read has been trying for a whole day, that it should have
            | tried longer.
            | 
            | Another issue is that POSIX file I/O doesn't have great
            | error indicators, so it can be tricky to plumb things
            | through in clearly correct ways.
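            | 
            | A sketch of that "long but finite" idea with a
            | hypothetical read_with_deadline() helper on an ordinary
            | pollable descriptor (a socket or pipe; this doesn't map
            | directly onto NFS's in-kernel retry machinery): bound the
            | read with a one-day deadline instead of blocking forever.
            | 
            |     #include <errno.h>
            |     #include <poll.h>
            |     #include <unistd.h>
            | 
            |     /* Wait up to one day for data, then give up with
            |      * ETIMEDOUT instead of hanging indefinitely. */
            |     ssize_t read_with_deadline(int fd, void *buf,
            |                                size_t len)
            |     {
            |         struct pollfd pfd = { .fd = fd, .events = POLLIN };
            |         int ready = poll(&pfd, 1, 24 * 60 * 60 * 1000);
            | 
            |         if (ready < 0)
            |             return -1;          /* poll() itself failed */
            |         if (ready == 0) {
            |             errno = ETIMEDOUT;  /* give up after a day */
            |             return -1;
            |         }
            |         return read(fd, buf, len);
            |     }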
        
             | StillBored wrote:
              | NFS is notorious for breaking kernel and application
              | assumptions about POSIX. Linux falls into this trap in
              | various ways too, in an effort to simplify the common
              | cases. Timeouts might be appropriate for read/open/etc.
              | calls, but in a way the problems are worse on the
              | write/close side.
              | 
              | Reading the close() manpage hints at some of those
              | problems, but fundamentally POSIX synchronous file I/O
              | isn't well suited to handling out-of-space and I/O
              | errors that are deferred from the originating call.
              | Consider write()s that are buffered by the kernel but
              | can't be completed due to network or out-of-space
              | conditions. A naive reading of write() would imply that
              | errors should be returned immediately, so that the
              | application knows the latest write/record update
              | failed. Yet what really happens is that, for
              | performance reasons, the data from those calls is
              | allowed to be buffered, leading to a situation where an
              | I/O call may return a failure as a result of a failure
              | at some point in the past. Given all the ways this can
              | happen, the application cannot accurately determine
              | what was actually written, if anything, since the last
              | serialization event (which is itself another set of
              | problems).
              | 
              | edit: this also gets into the ugly case of the state of
              | the fd being unspecified (per POSIX) following a
              | close() failure. So per POSIX the correct response is
              | to retry close(), while simultaneously ensuring that no
              | open()s are happening anywhere. Linux simplifies this a
              | bit by implying the FD is closed, but that has its own
              | issues.
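              | 
              | A rough sketch of that sequence (hypothetical helper,
              | nothing NFS-specific): every write() "succeeds" because
              | the kernel merely buffers the data, and the ENOSPC/EIO
              | may only surface later, at fsync() or even at close().
              | 
              |     #include <fcntl.h>
              |     #include <unistd.h>
              | 
              |     int save_record(const char *path, const void *data,
              |                     size_t len)
              |     {
              |         int fd = open(path,
              |                       O_WRONLY | O_CREAT | O_TRUNC, 0644);
              |         if (fd < 0)
              |             return -1;
              | 
              |         /* usually just copies into the page cache */
              |         if (write(fd, data, len) != (ssize_t)len) {
              |             close(fd);
              |             return -1;
              |         }
              |         /* deferred errors often land here... */
              |         if (fsync(fd) < 0) {
              |             close(fd);
              |             return -1;
              |         }
              |         /* ...or even here; what actually got written? */
              |         if (close(fd) < 0)
              |             return -1;
              |         return 0;
              |     }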
        
               | jandrese wrote:
               | I understand the reasoning, but at the same time wonder
               | if this isn't perfect being the enemy of good? Since
                | there is no case where a timeout/error style exit can be
                | guaranteed to never lose data, we instead lock the
                | entire box up when an NFS server goes AWOL. This still
                | causes the data to be lost, but also brings down
                | everything else.
        
               | StillBored wrote:
                | Well, soft mounts should keep the entire machine from
                | dying, unless you're running critical processes off
                | the NFS mount. Reporting/debugging these cases can be
                | fruitful.
                | 
                | OTOH, PXE/HTTPS+NFS root is a valid config, and there
                | isn't really any way to avoid machine/client death
                | when the NFS goes offline for an extended period.
                | Even without NFS, Linux has gotten better at dealing
                | with full filesystems, but even that is still hit or
                | miss.
        
       | bjeds wrote:
       | Wow, superb writeup, thank you author for writing it and the
       | submitter for posting it!
        
       | rob74 wrote:
       | Great writeup, and also thoroughly answers the first question
       | that popped into my mind: "how on earth could a bug in the Linux
       | network stack that causes the whole data transfer to get stuck
       | stay undiscovered for so long?"
        
       | tryauuum wrote:
        | I have seen an ancient "drop packets with zero-length TCP
        | window" rule in iptables at my company. Funny to learn that a
        | zero-length TCP window can be found in normal, non-malicious
        | packets!
        
       | dokem wrote:
       | > she
       | 
       | Can we just stop pretending.
        
       | lykr0n wrote:
        | Wow. We've run into rsync bugs like this and just chalked it
        | up to "things happen".
        
       | qchris wrote:
       | Besides being a great technical write-up, this does an absolutely
       | fantastic job of doing low-key recruitment for Skroutz. It shows
       | some of the main tools and technologies that the company uses on
       | a day-to-day basis, provides a window into the way that they
       | approach problems, makes a compelling case that you'd be working
       | with some talented engineers, and showcases a culture willing to
       | engage with the open source community.
       | 
        | The hiring pitch isn't in your face: there's a "We're
        | hiring!" button in the banner, which fairly unobtrusively
        | follows you down the page, and the post then ends with a
        | "hey, if you're interested in working with us, reach out."
        | Overall, it just feels really well done.
        
       | db48x wrote:
       | Great find, and it sounds like a great place to work.
        
         | falseprofit wrote:
         | Just learned that Skroutz is pronounced 'Scrooge' in Greek, and
         | this isn't a coincidence!
        
           | unwind wrote:
            | Since they never describe the context: Skroutz seems to
            | be the dominant online price-comparison site for Greece,
            | which, I agree, would make the name make sense.
           | 
           | I had never heard the name before, and I felt the article
           | lacked some context. Googling it, there seems to be very
           | little content about them in English, which makes the nice
           | blog post almost surprising. :)
        
             | kafrofrite wrote:
             | Correct. Greeks (and a few other markets) use it to compare
             | prices and buy stuff. That being said, they are in the
             | process of expanding their business right now with new
             | products/different offerings.
             | 
              | There are some growing pains there, but it's an
              | interesting company overall, and they experiment with
              | work/life balance (e.g. they did 4-day weeks at some
              | stage; unsure if they went ahead with that or reverted
              | to 5-day weeks).
        
       ___________________________________________________________________
       (page generated 2021-02-11 23:01 UTC)