[HN Gopher] Uptime related server crashes (2011)
       ___________________________________________________________________
        
       Uptime related server crashes (2011)
        
       Author : luu
       Score  : 31 points
       Date   : 2023-12-13 20:42 UTC (2 days ago)
        
 (HTM) web link (barry.blog)
 (TXT) w3m dump (barry.blog)
        
       | forbiddenlake wrote:
       | (2011)
        
         | macNchz wrote:
         | Rather relevant in this case... I was like "ok, interesting
         | debugging, but have you considered just running a kernel that
         | was released in the last decade??" until I got to the date at
         | the bottom.
        
           | readingnews wrote:
           | Right. When I saw "Debian kernels ranging from 2.6.32-21 to
           | 2.6.32-24..." I was like, uhhhh.
           | 
           | Someone should change the title of the HN post to say 2011.
        
       | withzombies wrote:
       | > _sds.total_pwr is the sum of the power of all CPUs in the
       | scheduling domain. This sum ends up being zero and that's what's
       | causing the crash - division by zero._
       | 
       | > _The "CPU power" is used to take into account how much
       | calculating capabilities a CPU has compared to the other CPUs and
       | the main factors for calculating it are:_
       | 
       | > _1. Whether the CPU is shared, for example by using
       | multithreading._
       | 
       | > _2. How many real-time tasks the CPU is processing._
       | 
       | > _3. In newer kernels, how much time the CPU had spent
       | processing IRQs._
       | 
       | > _The current suggested fix for this bug is relying on the
       | theory that while taking into account the real-time tasks (#2
       | above), scale_rt_power() could return a negative value, and thus
       | the sum of all CPU powers may end up being zero._
       | 
       | The author doesn't really describe how this panic is related to
       | uptime. Do long-running kernels accumulate a lot of real-time
       | tasks, or is it a leak of some kind?
       | 
       | The suggested fix link doesn't provide any extra context as to
       | why it's uptime-related either.
        
         | theandrewbailey wrote:
         | The post doesn't explain why scale_rt_power() isn't in the code
         | snippet, or how it factors in.
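
For readers trying to picture the mechanism in the quoted passage, here is a minimal sketch in C. It is a toy model, not the kernel's code: every name below (toy_rt_scaled_power, toy_total_power, toy_avg_load, POWER_SCALE) is hypothetical, and the arithmetic is only assumed to resemble what the post describes. It shows how one negative per-CPU power can cancel the others so that the sum becomes zero, and how a guard avoids the division. It does not attempt to explain why uptime would trigger this.

```c
#include <stddef.h>

/* Hypothetical "full power" of one idle CPU in this toy model. */
#define POWER_SCALE 1024L

/*
 * Toy stand-in for a real-time scaling step: the share of a period
 * that is left over after real-time work. If rt_time is larger than
 * period (an inconsistent input), the result is negative.
 * Assumes period != 0.
 */
static long toy_rt_scaled_power(long period, long rt_time)
{
    return (period - rt_time) * POWER_SCALE / period;
}

/* Sum of per-CPU powers, analogous in spirit to a "total power". */
static long toy_total_power(const long *power, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += power[i];
    return sum;
}

/*
 * Guarded division: returns 0 and stores the average load in *out,
 * or returns -1 without dividing when total_power is not positive.
 */
static int toy_avg_load(long total_load, long total_power, long *out)
{
    if (total_power <= 0)
        return -1;
    *out = total_load * POWER_SCALE / total_power;
    return 0;
}
```

With two CPUs, one idle and one whose real-time time is twice the period, the per-CPU values in this model are 1024 and -1024, so the total is 0 and an unguarded division would fault.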
        
       | krallja wrote:
       | Uptime-related crashes seem fairly common. Here's one of my
       | stories, from Thanksgiving 2012:
       | https://jacob.jkrall.net/turkey-day-down-time
       | I've seen a couple of others since, but they had the same general
       | shape, so I didn't bother writing the same story again.
        
       | keep_reading wrote:
       | I know of a FreeBSD 6.0 server with 13+ years of uptime. What the
       | heck was Linux doing back then?
        
       ___________________________________________________________________
       (page generated 2023-12-15 23:02 UTC)