[HN Gopher] Why is my CPU usage always 100%?
       ___________________________________________________________________
        
       Why is my CPU usage always 100%?
        
       Author : pncnmnp
       Score  : 409 points
       Date   : 2025-01-09 21:15 UTC (4 days ago)
        
 (HTM) web link (www.downtowndougbrown.com)
 (TXT) w3m dump (www.downtowndougbrown.com)
        
       | veltas wrote:
        | It doesn't feel like reading 4 times is necessarily a portable
        | solution: there may be more versions at different speeds with
        | different I/O architectures, it's unclear how this will behave
        | under more load, and the original change may have been made to
        | fix some other performance problem OP is not aware of. But I'm
        | not sure what else can be done. Unfortunately many vendors like
        | Marvell can seriously under-document crucial features like this.
        | If anything it would be good to put some of this info in the
        | comment itself; not very elegant, but how else practically are
        | we meant to keep track of this? Is the mailing list part of the
        | documentation?
       | 
       | Doesn't look like there's a lot of discussion on the mailing
       | list, but I don't know if I'm reading the thread view correctly.
        
         | _nalply wrote:
          | I also wondered about this, but there's a crucial difference,
          | no idea if it matters: in that loop it reads the register, so
          | the register is read at least 4 times.
        
         | adrian_b wrote:
         | This is a workaround for a hardware bug of a certain CPU.
         | 
         | Therefore it cannot really be portable, because other timers in
         | other devices will have different memory maps and different
         | commands for reading.
         | 
         | The fault is with the designers of these timers, who have
         | failed to provide a reliable way to read their value.
         | 
          | It is hard to believe that this still happens in this century,
          | because reading correct values despite the fact that the timer
          | is incremented or decremented continuously is an essential goal
          | in the design of any timer that may be read, and how to do it
          | has been well known for more than three quarters of a century.
         | 
         | The only way to make such a workaround somewhat portable is to
         | parametrize it, e.g. with the number of retries for direct
         | reading or with the delay time when reading the auxiliary
         | register. This may be portable between different revisions of
         | the same buggy timer, but the buggy timers in other unrelated
         | CPU designs will need different workarounds anyway.
        
           | stkdump wrote:
            | > how to do it has been well known for more than three
            | quarters of a century
           | 
           | Don't leave me hanging! How to do it?
        
             | adrian_b wrote:
              | Direct reading without the risk of reading incorrect
              | values is possible only when the timer is implemented
              | using a synchronous counter instead of an asynchronous
              | counter, the synchronous counter is fast enough to ensure
              | a stable correct value by the time it is read, and the
              | reading signal is synchronized with the timer clock
              | signal.
             | 
             | Synchronous counters are more expensive in die area than
             | asynchronous counters, especially at high clock
             | frequencies. Moreover, it may be difficult to also
             | synchronize the reading signal with the timer clock.
             | Therefore the second solution may be preferable, which uses
             | a separate capture register for reading the timer value.
             | 
             | This was implemented in the timer described in TFA, but it
             | was done in a wrong way.
             | 
             | The capture register must either ensure that the capture is
             | already complete by the time when it is possible to read
             | its value after giving a capture command, or it must have
             | some extra bit that indicates when its value is valid.
             | 
             | In this case, one can read the capture register until the
             | valid bit is on, having a complete certainty that the end
             | value is correct.
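              | 
              | As a minimal sketch in Linux-style C, assuming a layout
              | where the top bit of the capture register is the valid
              | flag (names and bit position are hypothetical, and as
              | noted, this buggy timer lacks such a bit):
              | 
              |   #include <linux/io.h>    /* readl() */
              |   #include <linux/types.h> /* u32 */
              | 
              |   #define CAP_VALID (1u << 31) /* "capture done" flag */
              | 
              |   static u32 timer_read_capture(void __iomem *cap_reg)
              |   {
              |           u32 v;
              | 
              |           /* Spin until the hardware marks the value
              |            * valid; no guessed delay is needed. */
              |           do {
              |                   v = readl(cap_reg);
              |           } while (!(v & CAP_VALID));
              | 
              |           return v & ~CAP_VALID; /* keep counter bits */
              |   }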
             | 
             | When adding some arbitrary delay between the capture
             | command and reading the capture register, you can never be
             | certain that the delay value is good.
             | 
             | Even when the chosen delay is 100% effective during
             | testing, it can result in failures on other computers or
             | when the ambient temperature is different.
        
           | veltas wrote:
           | > This is a workaround for a hardware bug of a certain CPU.
           | 
           | What about different variants, revisions, and speeds of this
           | CPU?
        
         | Karliss wrote:
          | The related part of the doc has one more note: "This request
          | requires up to three timer clock cycles. If the selected timer
          | is working at slow clock, the request could take longer." From
          | the way the doc is formatted it's not fully clear what "this
          | request" refers to. It might explain where the 3-5 attempts
          | come from, and that they might not be pulled completely out of
          | thin air. But the part about taking up to three clock cycles,
          | but sometimes more, makes it impossible to have a "proper"
          | solution without guesswork or further clarification from the
          | vendor.
         | 
         | "working at slow clock" part, might explain why some other
         | implementations had different code path for 32.768 KHz clocks.
         | According to docs there are two available clock sources "Fast
         | clock" and "32768 Hz" which could mean that "slow clock" refers
         | to specific hardware functionality is not just a vague phrase.
         | 
          | As for portability concerns, this is already low-level,
          | hardware-specific register access. If Marvell releases a new
          | SoC, not only is there no assurance that it will require the
          | same timing, it might as well have a different set of
          | registers requiring a completely different read and setup
          | procedure, not just different timing.
         | 
          | One thing that slightly confuses me - the old implementation
          | had 100 cycles of "cpu_relax()", which is unrelated to the
          | specific timer clock, but neither is reading the TMR_CVWR
          | register. Since 3-5 cycles of that worked better than 100
          | cycles of cpu_relax(), it clearly takes more time, unless the
          | cpu_relax() part got completely optimized out. At least I
          | didn't find any references mentioning that the timer clock
          | affects the read time of TMR_CVWR.
        
           | veltas wrote:
           | It sounds like this is an old CPU(?), so no need to worry
           | about the future here.
           | 
           | > I didn't find any references mentioning that timer clock
           | affects read time of TMR_CVWR.
           | 
           | Reading the register might be related to the timer's internal
           | clock, as it would have to wait for the timer's bus to
           | respond. This is essentially implied if Marvell recommend re-
           | reading this register, or if their reference implementation
           | did so. My main complaint is it's all guesswork, because
           | Marvell's docs aren't that good.
        
             | MBCook wrote:
             | The Chumby hardware I'm thinking of is from 2010 or so. So
             | if that's it, it would certainly be old. And it would
             | explain a possible relation with the OLPC having a similar
             | chip.
             | 
             | https://en.wikipedia.org/wiki/Chumby
        
       | begueradj wrote:
       | Oops, this is not valid.
        
         | M95D wrote:
         | I'm sure a few more software updates will take care of this
         | little problem...
        
         | zaik wrote:
         | You're probably thinking about memory and caching. There are no
         | advantages to keeping the CPU at 100% when no workload needs to
         | be done.
        
         | josephg wrote:
         | Only when your computer actually has work to do. Otherwise your
         | CPU is just a really expensive heater.
         | 
         | Modern computers are designed to idle at 0% then temporarily
         | boost up when you have work to do. Then once the task is done,
         | they can drop back to idle and cool down again.
        
           | PUSH_AX wrote:
           | Not that I disagree, but when exactly in modern operating
           | systems are there moments where there are zero instructions
           | being executed? Surely there are always processes doing
           | background things?
        
             | pintxo wrote:
             | With multi-core cpus, some of them can be fully off, while
             | others handle any background tasks.
        
             | _flux wrote:
             | There are a lot of such moments, but they are just short.
             | When you're playing music, you download a bit of data from
             | the network or the SSD/HDD by first issuing a request and
             | then waiting (i.e. doing nothing) to get the short piece of
             | data back. Then you decode it and upload a short bit of the
             | sound to your sound card and then again you wait for new
             | space to come up, before you send more data.
             | 
             | One of the older ways (in x86 side) to do this was to
             | invoke the HLT instruction
             | https://en.wikipedia.org/wiki/HLT_(x86_instruction) : you
             | stop the processor, and then the processor wakes up when an
             | interrupt wakes it up. An interrupt might come from the
             | sound card, network card, keyboard, GPU, timer (e.g. 100
              | times a second to schedule another process, if some
             | process exists that is waiting for CPU), and during the
             | time you wait for the interrupt to happen you just do
             | nothing, thus saving energy.
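              | 
              | A minimal sketch of such an idle loop (x86, freestanding
              | C; a textbook illustration, not what any particular OS
              | ships):
              | 
              |   /* "sti; hlt" enables interrupts and halts atomically,
              |    * so a wakeup interrupt cannot slip in between the
              |    * two instructions and be lost. */
              |   static void idle_loop(void)
              |   {
              |           for (;;)
              |                   __asm__ __volatile__("sti; hlt");
              |   }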
             | 
             | I suspect things are more complicated in the world of
             | multiple CPUs.
        
             | johannes1234321 wrote:
              | From a human's perception there will "always" be work on
              | a "normal" system.
             | 
             | However for a CPU with multiple cores, each running at 2+
             | GHz, there is enough room for idling while seeming active.
        
             | reshlo wrote:
             | > Timer Coalescing attempts to enforce some order on all
             | this chaos. While on battery power, Mavericks will
             | routinely scan all upcoming timers that apps have set and
             | then apply a gentle nudge to line up any timers that will
             | fire close to each other in time. This "coalescing"
             | behavior means that the disk and CPU can awaken, perform
             | timer-related tasks for multiple apps at once, and then
             | return to sleep or idle for a longer period of time before
             | the next round of timers fire.[0]
             | 
             | > Specify a tolerance for the accuracy of when your timers
             | fire. The system will use this flexibility to shift the
             | execution of timers by small amounts of time--within their
             | tolerances--so that multiple timers can be executed at the
             | same time. Using this approach dramatically increases the
             | amount of time that the processor spends idling...[1]
             | 
             | [0] https://arstechnica.com/gadgets/2013/06/how-os-x-
             | mavericks-w...
             | 
             | [1] https://developer.apple.com/library/archive/documentati
             | on/Pe...
        
               | miki123211 wrote:
               | Modern Macs also have two different kinds of cores, slow
               | but energy-efficient e-cores and high-performance
               | p-cores.
               | 
               | The p cores can be activated and deactivated very
               | quickly, on the order of microseconds IIRC, which means
               | the processor always "feels" fast while still conserving
               | battery life.
        
             | Someone wrote:
             | We're not talking about what humans call "a moment". For a
             | (modern) computer, a millisecond is "a moment", possibly
             | even "a long moment". It can run millions of instructions
             | in such a time frame.
             | 
             | A modern CPU also has multiple cores not all of which may
             | be needed, and will be supported by hardware that can do
             | lots of tasks.
             | 
             | For example, sending out an audio signal isn't typically
             | done by the main CPU. It tells some hardware to send a
             | buffer of data at some frequency, then prepares the next
             | buffer, and can then sleep or do other stuff until it has
             | to send the new buffer.
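              | 
              | A sketch of that double-buffering pattern (the driver
              | API names here are hypothetical; a real system would go
              | through ALSA or similar):
              | 
              |   extern void decode_next(short *buf, int n);
              |   extern void submit_to_hw(short *buf, int n); /* DMA */
              |   extern void wait_buffer_done(void); /* CPU sleeps */
              | 
              |   static short buf[2][4096];
              | 
              |   void playback_loop(void)
              |   {
              |           int i = 0;
              | 
              |           for (;;) {
              |                   decode_next(buf[i], 4096); /* burst */
              |                   submit_to_hw(buf[i], 4096);
              |                   wait_buffer_done(); /* idle here */
              |                   i ^= 1; /* swap buffers */
              |           }
              |   }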
        
             | nejsjsjsbsb wrote:
             | My processor gets several whole nanoseconds to rest up, I
             | am not a slave driver.
        
         | homebrewer wrote:
         | This feels like the often-repeated "argument" that Electron
         | applications are fine because "unused memory is wasted memory".
         | What Linus meant by that is that the operating system should
         | strive to use as much of the _free_ RAM as possible for things
         | like file and dentry caches. Not that memory should be wasted
         | on millions of layers of abstraction and too-high resolution
            | images. But it's often misunderstood that way.
        
           | Culonavirus wrote:
            | Eeeh, the Electron issue is overblown.
           | 
           | These days the biggest hog of memory is the browser. Not
           | everyone does this, but a lot of people, myself included,
           | have tens of tabs open at a time (with tab groups and all of
           | that)... all day. The browser is the primary reason I
           | recommend a minimum of 16gb ram to F&F when they ask "the it
           | guy" what computer to buy.
           | 
           | When my Chrome is happily munching on many gigabytes of ram I
           | don't think a few hundred megs taken by your average Electron
           | app is gonna move the needle.
           | 
           | The situation is a bit different on mobile, but Electron is
           | not a mobile framework so that's not relevant.
           | 
            | PS: Can I rant a bit about how useless the new(ish) Chrome
            | memory saver thing is? What is the point of having tabs open
            | if you're
           | gonna remove them from memory and just reload on activation?
           | In the age of fast consumer ssds I'd expect you to
           | intelligently hibernate the tabs on disk, otherwise what you
           | have are silly bookmarks.
        
             | smolder wrote:
             | Your argument against electron being a memory hog is that
             | chrome is a bigger one? You are aware that electron is an
             | instance of chromium, right?
        
               | rbanffy wrote:
               | This is a good point, but it would be interesting if we
               | had a "just enough" rendering engine for UI elements that
               | was a subset of a browser with enough functionality to
               | provide a desktop app environment and that could be
               | driven by the underlying application (or by the GUI,
               | passing events to the underlying app).
        
               | nejsjsjsbsb wrote:
               | Problem there is Electron devs do it for convenience.
               | That means esbuild, npm install react this that. If it
               | ain't a full browser this won't work.
        
               | caspper69 wrote:
               | Funny thing about all of this is that it's just such
               | oppressive overkill.
               | 
               | Most GUI toolkits can do layout / graphics / fonts in a
               | much simpler (and sane) way. "Reactive" layout is not a
               | new concept.
               | 
               | HTML/CSS/JS is not an efficient or clean way to do layout
               | in an application. It only exists to shoehorn UI layout
               | into a rich text DOCUMENT format.
               | 
               | Can you imagine if Microsoft or Apple had insisted that
               | GUI application layout be handled the way we do it today
               | back in the 80s and 90s? Straight up C was easier to grok
                | than this garbage we have today. The industry as a whole
               | should be ashamed. It's not easier, it doesn't make
               | things look better, and it wastes billions in developer
               | time and user time, not to mention slowly making the
               | oceans boil.
               | 
               | Every time I have to use a web-based application (which
               | is most of the time nowadays), it infuriates me. The
               | latency is atrocious. The UIs are slow. There's
               | mysterious errors at least once or twice daily. WTF are
               | we doing? When a Windows 95 application ran faster and
               | was more responsive and more reliable than something
               | written 30 years later, we have a serious problem.
               | 
               | Here's some advice: stop throwing your web code into
               | Electron, and start using a cross-platform GUI toolkit.
               | Use local files and/or sqlite databases for storage, and
               | then sync to the cloud in the background. Voila, non-shit
               | applications that stop wasting everybody's effing time.
               | 
               | If your only tool is a hammer, something, something,
               | nails...
        
             | eadmund wrote:
             | > Eeeh, the Electron issue is oveblown.
             | 
             | > These days the biggest hog of memory is the browser.
             | 
             | That's the problem: Electron is another browser instance.
             | 
             | > I don't think a few hundred megs taken by your average
             | Electron app is gonna move the needle.
             | 
             | Low-end machines even in 2025 still come with single-digit
             | GB RAM sizes. A few hundred MB is a substantial portion of
             | an 8GB RAM bank.
             | 
             | Especially when it's just waste.
        
               | p0w3n3d wrote:
                | And then there's the company that says: let's push to
                | the users the installer of our brand new app, made in
                | Electron, that will reside in their tray. Poof. 400MB
                | taken for a tray notifier that also accidentally adds a
                | browser to the memory.
                | 
                | My computer: starts 5 seconds slower
                | 
                | 1 million computers in the world: start cumulatively 5
                | million seconds slower
               | 
               | Meanwhile a Microsoft programmer whose postgres via ssh
               | starts 500ms slower: "I think this is a rootkit installed
               | in ssh"
        
             | Dalewyn wrote:
             | >otherwise what you have are silly bookmarks.
             | 
             | My literal _several hundreds_ of tabs are silly bookmarks
             | in practice.
        
           | ack_complete wrote:
           | It's so annoying when that line is used to defend
           | applications with poor memory usage, ignoring the fact that
           | all modern OSes already put unallocated memory to use for
           | caching.
           | 
           | "Task Manager doesn't report memory usage correctly" is
           | another B.S. excuse heard on Windows. It's actually true, but
           | the other way around -- Task Manager _underreports_ the
           | memory usage of most programs.
        
         | TonyTrapp wrote:
          | What you are probably thinking of is "race to idle". A CPU
          | should process everything it can, as quickly as it can (using
          | all the power), and then go to an idle state, instead of
          | processing everything slowly (potentially consuming less
          | energy at that time) but taking more time.
        
         | j16sdiz wrote:
         | > computer architecture courses.
         | 
          | I guess it was some _theoretical_ task scheduling stuff....
          | When you are doing task scheduling, yes, maybe, depending on
          | what you optimize for.
          | 
          | .... but this bug has nothing to do with that. This bug is
          | about an accounting error.
        
       | g-b-r wrote:
       | I expected it to be about holding down the spacebar :/
        
         | labster wrote:
         | Spacebar heating was great for my workflow, please re-enable
        
           | smidgeon wrote:
           | For the confused: https://www.xkcd.com/1172/
        
         | lohfu wrote:
          | He must be running version 10.17 or newer
        
         | g-b-r wrote:
         | Not to argue, but I don't understand why someone downvoted it
        
       | sneela wrote:
       | This is a wonderful write-up and a very enjoyable read. Although
       | my knowledge about systems programming on ARM is limited, I know
       | that it isn't easy to read hardware-based time counters; at the
       | very least, it's not as simple as the x86 rdtsc [1]. This is
       | probably why the author writes:
       | 
       | > This code is more complicated than what I expected to see. I
       | was thinking it would just be a simple register read. Instead, it
       | has to write a 1 to the register, and then delay for a while, and
       | then read back the same register. There was also a very
       | noticeable FIXME in the comment for the function, which
       | definitely raised a red flag in my mind.
       | 
       | Regardless, this was a very nice read and I'm glad they got down
       | to the issue and the problem fixed.
       | 
        | [1]: https://www.felixcloutier.com/x86/rdtsc
        
         | pm215 wrote:
         | Bear in mind that the blog post is about a 32 bit SoC that's
         | over a decade old, and the timer it is reading is specific to
         | that CPU implementation. In the intervening time both timers
         | and performance counters have been architecturally
         | standardised, so on a modern CPU there is a register roughly
         | equivalent to the one x86 rdtsc uses and which you can just
         | read; and kernels can use the generic timer code for timers and
         | don't need to have board specific functions to do it.
         | 
         | But yeah, nice writeup of the kinds of problem you can run into
         | in embedded systems programming.
        
       | InsomniacL wrote:
       | > Chumby's kernel did a total of 5 reads of the CVWR register.
       | The other two kernels did a total of 3 reads.
       | 
       | > I opted to use 4 as a middle ground
       | 
       | reminded me of xkcd: Standards
       | 
       | https://xkcd.com/927/
        
       | thrdbndndn wrote:
       | I don't get the fix.
       | 
        | Why does reading it multiple times fix the issue?
        | 
        | Is it just because reading takes time, so reading multiple
        | times lets the needed time pass between the write and the read?
        | If so, it sounds like a worse solution than just extending the
        | waiting delay like the author did initially.
       | 
       | If not, then I would like to know the reason.
       | 
       | (Needless to say, a great article!)
        
         | rep_lodsb wrote:
         | It's possible that actually reading the register takes
         | (significantly) more time than an empty countdown loop. A
         | somewhat extreme example of that would be on x86, where
         | accessing legacy I/O ports for e.g. the timer goes through a
         | much lower-clocked emulated ISA bus.
         | 
         | However, a more likely explanation is the use of "volatile"
         | (which only appears in the working version of the code).
         | Without it, the compiler might even have completely removed the
         | loop?
        
           | deng wrote:
           | > However, a more likely explanation is the use of "volatile"
           | (which only appears in the working version of the code).
           | Without it, the compiler might even have completely removed
           | the loop?
           | 
           | No, because the loop calls cpu_relax(), which is a compiler
           | barrier. It cannot be optimized away.
           | 
           | And yes, reading via the memory bus is much, much slower than
           | a barrier. It's absolutely likely that reading 4 times from
           | main memory on such an old embedded system takes several
           | hundred cycles.
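              | 
              | For reference, the classic GCC compiler barrier is just
              | an empty asm with a "memory" clobber (on some ARM
              | configurations cpu_relax() is defined as exactly this):
              | 
              |   /* Emits no instructions, but the compiler may not
              |    * cache memory values across it or delete a loop
              |    * around it. Each iteration still costs only a few
              |    * CPU-speed instructions - far cheaper than an
              |    * uncached MMIO read. */
              |   #define barrier() __asm__ __volatile__("" : : : "memory")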
        
             | rep_lodsb wrote:
             | You're right, didn't account for that. Though even when
             | declared volatile, the counter variable would be on the
             | stack, and thus already in the CPU cache (at least 32K
             | according to the datasheet)?
             | 
             | Looking at the assembly code for both versions of this
             | delay loop might clear it up.
        
               | deng wrote:
                | The only thing volatile does is to ensure that the value
                | is read from memory each time (which implicitly also
               | forbids optimizations). Whether that memory is in a CPU
               | cache is purely a hardware issue and outside the C
               | specification. If you read something like a hardware
               | register, you yourself need to take care in some way that
               | a hardware cache will not give you old values (by mapping
               | it into a non-cached memory area, or by forcing a cache
               | update). If you for-loop over something that acts as a
               | compiler barrier, all that 'volatile' on the counter
               | variable will do is potentially make the for-loop slower.
               | 
               | There's really just very few reasons to ever use
               | 'volatile'. In fact, the Linux kernel even has its own
               | documentation why you should usually not use it:
               | 
               | https://www.kernel.org/doc/html/latest/process/volatile-
               | cons...
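                | 
                | To illustrate the one thing it does guarantee, a small
                | sketch (the MMIO address is made up):
                | 
                |   #define TIMER_COUNT \
                |           (*(volatile unsigned int *)0xd4014028)
                | 
                |   unsigned int wait_for_tick(void)
                |   {
                |           unsigned int start = TIMER_COUNT;
                | 
                |           /* Each test is a real load; without
                |            * volatile the compiler could hoist one
                |            * read out and spin forever on a stale
                |            * register copy. */
                |           while (TIMER_COUNT == start)
                |                   ;
                |           return TIMER_COUNT;
                |   }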
        
               | sim7c00 wrote:
                | doesn't volatile also ensure the address is not changed
                | for the read by the compiler (as it might optimise data
                | layout otherwise)? (so you can be sure when using mmio
                | etc. it won't read from the wrong place?)
        
               | deng wrote:
               | "volatile", according to the standard, simply is: "An
               | object that has volatile-qualified type may be modified
               | in ways unknown to the implementation or have other
               | unknown side effects. Therefore any expression referring
               | to such an object shall be evaluated strictly according
               | to the rules of the abstract machine."
               | 
               | Or simpler: don't assume anything what you think you
               | might know about this object, just do as you're told.
               | 
               | And yes, that for instance prohibits putting a value from
               | a memory address into a register for further use, which
               | would be a simple case of data optimization. Instead, a
               | fresh retrieval from memory must be done on each access.
               | 
                | However, whether your system has caching or an MMU is
                | outside of the spec. The compiler does not care. If you
                | tell the
               | compiler to give you the byte at address 0x1000, it will
               | do so. 'volatile' just forbids the compiler to deduce the
               | value from already available knowledge. If a hardware
               | cache or MMU messes with that, that's your problem, not
               | the compiler's.
        
             | Karliss wrote:
             | From what I understand the timer registers should be on
             | APB(1) bus which operates at fixed 26MHz clock. That should
             | be much closer to the scale of fast timer clocks compared
             | to cpu_relax() and main CPU clock running somewhere in the
             | range of 0.5-1GHz and potentially doing some dynamic
             | frequency scaling for power saving purpose.
             | 
             | The silliest part of this mess is that 26Mhz clock for APB1
             | bus is derived from the same source as 13Mhz, 6.5Mhz
             | 3.25Mhz, 1Mhz clocks usable by fast timers.
        
         | deng wrote:
          | > Is it just because reading takes time, so reading multiple
          | times lets the needed time pass between the write and the
          | read?
         | 
         | Yes.
         | 
          | > If so, it sounds like a worse solution than just extending
          | the waiting delay like the author did initially.
         | 
         | Yeah, it's a judgement call. Previously, the code called
         | cpu_relax() for waiting, which is also dependent on how this is
         | defined (can be simply NOP or barrier(), for instance). The
          | reading of the timer register may have the advantage that it
         | is dependent on the actual memory bus speed, but I wouldn't
         | know for sure. Hardware at that level is just messy, and
         | especially niche platforms have their fair share of bugs where
         | you need to do ugly workarounds like these.
         | 
         | What I'm rather wondering is why they didn't try the other
         | solution that was mentioned by the manufacturer: reading the
         | timer directly two times and compare it, until you get a stable
         | output.
        
         | adrian_b wrote:
         | The article says that the buggy timer has 2 different methods
         | for reading.
         | 
         | When reading directly, the value may be completely wrong,
         | because the timer is incremented continuously and the updating
         | of its bits is not synchronous with the reading signal.
         | Therefore any bit in the value that is read may be wrong,
         | because it has been read exactly during a transition between
         | valid values.
         | 
         | The workaround in this case is to read multiple times and
         | accept as good a value that is approximately the same for
         | multiple reads. The more significant bits of the timer value
         | change much less frequently than the least significant bits, so
          | on most read attempts only a few bits can be wrong. Only
          | seldom is the read value complete garbage, and comparing it
          | with the other read values will reject it.
         | 
         | The second reading method was to use a separate capture
         | register. After giving a timer capture command, reading an
         | unchanging value from the capture register should have caused
         | no problems. Except that in this buggy timer, it is
         | unpredictable when the capture is actually completed. This
         | requires the insertion of an empirically determined delay time
         | before reading the capture register, hopefully allowing enough
         | time for the capture to be complete.
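          | 
          | A sketch of that first method in Linux-style C: keep reading
          | until two consecutive values agree (exact equality here; a
          | small tolerance would also work if the counter ticks faster
          | than the bus can read it):
          | 
          |   #include <linux/io.h>
          |   #include <linux/types.h>
          | 
          |   static u32 timer_read_stable(void __iomem *count_reg)
          |   {
          |           u32 a, b;
          | 
          |           do {
          |                   a = readl(count_reg);
          |                   b = readl(count_reg);
          |           } while (a != b); /* mid-transition reads differ */
          | 
          |           return b;
          |   }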
        
           | Dylan16807 wrote:
           | > The workaround in this case is to read multiple times and
           | accept as good a value that is approximately the same for
           | multiple reads.
           | 
           | It's only incrementing at 3.25MHz, right? Shouldn't you be
           | able to get exactly the same value for multiple reads? That
           | seems both simpler and faster than using this very slow
           | capture register, but maybe I'm missing something.
        
         | dougg3 wrote:
         | Author here. Thanks! I believe the register reads are just
         | extending the delay, although the new approach does have a side
         | effect of reading from the hardware multiple times. I don't
         | think the multiple reads really matter though.
         | 
         | I went with the multiple reads because that's what Marvell's
         | own kernel fork does. My reasoning was that people have been
         | using their fork, not only on the PXA168, but on the newer
         | PXAxxxx series, so it would be best to retain Marvell's
         | approach. I could have just increased the delay loop, but I
         | didn't have any way of knowing if the delay I chose would be
         | correct on newer PXAxxx models as well, like the chip used in
         | the OLPC. Really wish they had more/better documentation!
        
         | mastax wrote:
         | Karliss above found docs which mention:
         | 
         | > This request requires up to three timer clock cycles. If the
         | selected timer is working at slow clock, the request could take
         | longer.
         | 
         | Let's ignore the weirdly ambiguous second sentence and say for
         | pedagogical purposes it takes up to three timer clock cycles
         | full stop. Timer clock cycles aren't CPU clock cycles, so we
         | can't just do `nop; nop; nop;`. How do we wait three timer
          | clock cycles? Well, a timer register read is handled by the
         | timer peripheral which runs at the timer clock, so reading (or
         | writing) a timer register will take until at least the end of
         | the next timer clock.
         | 
         | This is a very common pattern when dealing with memory mapped
         | peripheral registers.
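          | 
          | A sketch of the pattern as it applies here, based on the
          | post's description (register offset handling simplified):
          | 
          |   #include <linux/io.h>
          |   #include <linux/types.h>
          | 
          |   static u32 timer_read_latched(void __iomem *cvwr_reg)
          |   {
          |           u32 val = 0;
          |           int i;
          | 
          |           writel(1, cvwr_reg); /* request a capture */
          | 
          |           /* The dummy reads double as the delay: each one
          |            * must complete on the timer's slow bus clock,
          |            * unlike a CPU-speed nop loop. 4 reads matches
          |            * the article's fix. */
          |           for (i = 0; i < 4; i++)
          |                   val = readl(cvwr_reg);
          | 
          |           return val; /* the last read returns the latch */
          |   }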
         | 
         | ---
         | 
         | I'm making some reasonable assumptions about how the clock
         | peripheral works. I haven't actually dug into the Marvell
         | documentation.
        
       | TrickyReturn wrote:
       | Probably running Slack...
        
       | rbanffy wrote:
       | In the late 1990's I worked in a company that had a couple
       | mainframes in their fleet and once I looked into a resource usage
       | screen (Omegamon, perhaps? Is it that old?) and noticed the CPU
       | was pegged at 100%. I asked the operator if that was normal. His
       | answer was "Of course. We paid for that CPU, might as well use
       | it". Funny though that mainframes are designed for that - most,
       | if not all, non-application work is offloaded to other processors
       | in the system so that the CPU can run applications as fast as it
       | can.
        
         | defrost wrote:
          | Having a number of running processes take the CPU usage to
          | 100% is one thing; having an under-utilised CPU with almost no
          | processes running _report_ that usage is at 100% is another
          | thing, and the latter is the subject of the article here.
        
           | rbanffy wrote:
           | I didn't intend this as an example of the issue the article
           | mentions (a misreporting of usage because of a hardware
           | design issue). It was just a fun example of how different
           | hardware behaves differently.
           | 
           | One can also say Omegamon (or whatever tool) was
           | misreporting, because it didn't account for the processor
           | time of the various supporting systems that dealt with
           | peripheral operations. After all, they also paid for the disk
           | controllers, disks, tape drives, terminal controllers and so
           | on, so they could want to drive those to close to 100% as
           | well.
        
             | defrost wrote:
             | Sure, no drama - I came across as a little dry and clipped
             | as I was clarifying on the fly as it were.
             | 
             | I had my time squeezing the last cycle possible from a
             | Cyber 205 waaaay back in the day.
        
           | datadrivenangel wrote:
           | Some mainframes have the ability to lock clock speed and
           | always run at exactly 100%, so you can often have hard
           | guarantees about program latency and performance.
        
       | WediBlino wrote:
        | An old manager of mine once spent the day trying to kill a
        | process that was running at 99% on a Windows box.
        | 
        | When I finally got round to seeing what he was doing, I was
        | disappointed to find he was attempting to kill the 'system idle'
        | process.
        
         | belter wrote:
         | Did he have a pointy hair?
        
         | cassepipe wrote:
          | I abandoned Windows 8 for Linux because of a bug (?) where my
          | HDD was showing it was 99% busy all the time. I had removed
          | every startup program that could be removed and analysed
          | thoroughly for any viruses, to no avail. I had no debugging
          | skills at the time and wasn't sure the hardware could stand
          | Windows 10. That's how Linux got me.
        
           | saintfire wrote:
           | I had this happen with an nvme drive. Tried changing just
           | about every setting that affected the slot.
           | 
           | Everything worked fine on my Linux install ootb
        
           | margana wrote:
           | Why is this such a huge issue if it merely shows it's busy,
           | but the performance of it indicates that it actually isn't?
           | Switching to Linux can be a good choice for a lot of people,
           | the reason just seems a bit odd here. Maybe it was simply the
           | straw that broke the camel's back.
        
             | RHSeeger wrote:
              | 1. I expect that a HD that is actually doing things 100% of
              | the time is going to have its lifespan significantly
              | reduced, and
             | 
              | 2. If it isn't doing anything and is just lying to you...
             | when there IS a problem, your tools to diagnose the problem
             | are limited because you can't trust what they're telling
             | you
        
             | ddingus wrote:
              | Over the years I have used top and friends to profile
              | machines and identify expensive bottlenecks. Once one comes
              | to count on those tools, the idea of one being wrong - and
              | actually really wrong! - is just a bad rub.
             | 
             | Fixing it would be gratifying and reassuring too.
        
           | ryandrake wrote:
           | Recent Linux distributions are quickly catching up to Windows
           | and macOS. Do a fresh install of your favorite distribution
           | and then use 'ps' to look at what's running. Dozens of
           | processes doing who knows what? They're probably not pegging
           | your CPU at 100%, which is good, but it seems that gone are
           | the days when you could turn on your computer and it was
           | truly idle until you commanded it to actually do something.
           | That's a special use case now, I suppose.
        
             | ndriscoll wrote:
             | IME on Linux the only things that use random CPU while idle
             | are web browsers. Otherwise, there's dbus and
             | NetworkManager and bluez and oomd and stuff, but most
             | processes have a fraction of a second used CPU over months.
             | If they're not using CPU, they'll presumably swap out if
             | needed, so they're using ~nothing.
        
             | johnmaguire wrote:
             | this is why I use arch btw
        
               | rirze wrote:
               | this guy arches
        
               | diggan wrote:
                | Add Gnome3 and you can have that too! Source: me, an
                | arch+gnome user, who recently had to turn off the search
               | indexer as it was stuck processing countless multi-GB
               | binary files...
        
               | johnisgood wrote:
               | Exactly, or Void, or Alpine, but I love pacman.
        
             | craftkiller wrote:
             | This is one the reasons I love FreeBSD. You boot up a fresh
             | install of FreeBSD and there are only a couple processes
             | running and I know what each of them does / why they are
             | there.
        
             | m3047 wrote:
             | At least under some circumstances Linux shows (schedulable)
             | threads as separate processes. Just be aware of that.
        
           | BizarroLand wrote:
            | Windows 8/8.1/10 had an issue for a while where, when run
            | on a spinning rust HDD, it would peg the disk and slow the
            | system to a crawl.
           | 
           | The only solution was to swap over to a SSD.
        
         | m463 wrote:
         | That's what managers do.
         | 
         | Silly idle process.
         | 
         | If you've got time for leanin', you've got time for cleanin'
        
         | marcosdumay wrote:
          | Windows used to have that habit of making processes CPU-
          | starved, and yet claiming the CPU was idle all the time.
         | 
         | Since the Microsoft response to the bug was denying and
         | gaslighting the affected people, we can't tell for sure what
         | caused it. But several people were in a situation where their
         | computer couldn't finish any work, and the task-manager claimed
         | all of the CPU time was spent on that line item.
        
           | gruez wrote:
           | I've never heard of this. How do you know it's windows
           | "gaslighting" users, and not something dumb like thermal
           | throttling or page faults?
        
             | belter wrote:
             | Well this is one possible scenario. Power management....
             | 
             | "Windows 10 Task Manager shows 100% CPU but Performance
             | Monitor Shows less than 2%" -
             | https://answers.microsoft.com/en-
             | us/windows/forum/all/window...
        
             | marcosdumay wrote:
              | It's gaslighting because it consists of people from
              | Microsoft explicitly saying that it is impossible, that
              | it's not how Windows behaves, and that the user's system
              | is idle instead of overloaded.
              | 
              | Gaslighting customers was the standard Microsoft reaction
              | to bugs until at least 2007, when I last oversaw somebody
              | interacting with them.
        
           | RajT88 wrote:
           | > Since the Microsoft response to the bug was denying and
           | gaslighting the affected people
           | 
           | Well. I wouldn't go that far. Any busy dev team is
           | incentivized to make you run the gauntlet:
           | 
           | 1. It's not an issue (you have to prove to me it's an issue)
           | 
            | 2. It's not _my_ issue (you have to prove to me it's my
           | issue)
           | 
           | 3. It's not that important (you have to prove it has
           | significant business value to fix it)
           | 
           | 4. It's not that time sensitive (you have to prove it's worth
           | fixing soon)
           | 
           | It was exactly like this at my last few companies. Microsoft
           | is quite a lot like this as well.
           | 
           | If you have an assigned CSAM, they can help run the gauntlet.
           | That's what they are there for.
           | 
           | See also: The 6 stages of developer realization:
           | 
           | https://www.amazon.com/Panvola-Debugging-Computer-
           | Programmer...
        
             | ziddoap wrote:
             | > _If you have an assigned CSAM_
             | 
             | That's an unfortunate acronym. I assume you mean Customer
             | Service Account Manager.
        
               | RajT88 wrote:
               | Customer Success Account Manager. And I would agree - it
               | is very unfortunate.
               | 
               | Definitely in my top 5 questionable acronym choices from
               | MSFT.
        
             | thatfunkymunki wrote:
             | Your reticence to accept the term gaslighting clearly
             | indicates you've never had to interact with MSFT support.
        
               | RajT88 wrote:
               | On the contrary, I have spent thousands of hours
               | interacting with MSFT support.
               | 
                | What I'm getting at with my post is the dev teams that
                | support has to talk to - support just forwards along
                | their responses verbatim.
               | 
               | A lot of MSFT support does suck. There are also some
               | really amazing engineers in the support org.
               | 
               | I did my time in support early in my career (not at
               | MSFT), and so I understand well it's extremely hard to
               | hire good support engineers, and even harder to keep
               | them. The skills they learn on the job makes them
               | attractive to other parts of the org, and they get
               | poached.
               | 
               | There is also an industry-wide tendency for developers to
               | treat support as a bunch of knuckle-dragging idiots, but
               | at the same time they don't arm them with detailed
               | information on _how stuff works_.
        
               | RHSeeger wrote:
               | > What I'm getting at with my post is the dev teams
               | support has to talk to, which they just forward along
               | their responses verbatim.
               | 
               | But the "support" that the end user sees is that
               | combination, not two different teams (even if they know
               | it's two or more different teams). The point is that the
               | end user reached out for help and was told their own
               | experiences weren't true. The fact that Dave had Doug
               | actually tell them that is irrelevant.
        
               | RajT88 wrote:
               | I guess I see your point.
               | 
               | If we're going to call it gaslighting, then gaslighting
               | is typical dev team behavior, which of course flows back
               | down to support. It's a problem with Microsoft just like
               | it is a problem for any other company which makes
               | software.
        
               | marcosdumay wrote:
               | I've never seen the same behavior from any other software
               | supplier.
               | 
                | Almost every software company out there will jump on
                | their customers' complaints, and try to fix the issue
                | even when the root cause is not in their software.
        
               | RajT88 wrote:
                | I can't say I've seen it with every vendor, or even every
                | internal dev team I've been an internal customer of - but
                | I've seen it around a lot.
               | 
               | You might be lucky in that you've worked at companies
               | where you are a big enough customer they bend over
               | backwards for you. For example: If you work for Wal-Mart,
               | you probably get this less often. They are usually the
               | biggest fish in whatever pond they are swimming in.
        
             | Twirrim wrote:
             | Even when you have an expensive contract with Microsoft and
             | a direct account manager to help you run the gauntlet you
             | _still_ end up having to deal with awful support people.
             | 
             | Years ago at a job we were seeing issues with a network
             | card on a VM. One of my coworkers spent 2-3 days working
             | his way through support engineer after support engineer
             | until they got into a call with one. He talked the engineer
             | through what was happening. Remote VM, can only access over
             | RDP (well, we could VNC too, but that idea just confuses
             | Microsoft support people for some reason.)
             | 
             | The support engineer decided that the way to resolve the
             | problem was to uninstall and re-install the network card
             | driver. Coworker decided to give the support engineer
             | enough rope to hang themselves with, hoping it'd help him
             | escalate faster: "Won't that break the RDP connection?" "No
             | sir, I've done this many times before, trust me" "Okay
             | then...."
             | 
             | Unsurprisingly enough, when you uninstall the network card
             | driver and cause the instance to have no network cards, RDP
             | stops working. Go figure.
             | 
             | Co-worker let the support engineer know that he'd now lost
             | access, and a guess why. "Oh, yeah. I can see why that
             | might have been a problem"
             | 
             | Co-worker was right though, it did finally let us escalate
             | further up the chain....
        
           | nerdile wrote:
           | As a former Windows OS engineer, based on the short statement
           | here, my assumption would be that your programs are IO-bound,
           | not CPU-bound, and that the next step would be to gather data
           | (using a profiler) to investigate the bottlenecks. This is
           | something any Win32 developer should learn how to do.
           | 
           | Although I can understand how "Please provide data to
           | demonstrate that this is an OS scheduling issue since app
           | bottlenecks are much more likely in our experience" could
           | come across as "denying and gaslighting" to less experienced
           | engineers and layfolk
        
         | Twirrim wrote:
         | Years ago I worked for a company that provided managed hosting
         | services. That included some level of alarm watching for
         | customers.
         | 
         | We used to rotate the "person of contact" (POC) each shift, and
         | they were responsible for reaching out to customers, and doing
         | initial ticket triage.
         | 
         | One customer kept having a CPU usage alarm go off on their
         | Windows instances not long after midnight. The overnight POC
         | reached out to the customer to let them know that they had
         | investigated and noticed that "system idle processes" were
         | taking up 99% of CPU time and the customer should probably
         | investigate, and then closed the ticket.
         | 
         | I saw the ticket within a minute or two of it reopening as the
         | customer responded with a barely diplomatic message to the tune
         | of "WTF". I picked up that ticket, and within 2 minutes had
         | figured out the high CPU alarm was being caused by the backup
         | service we provided, apologised to the customer and had that
         | ticket closed... but not before someone not in the team saw the
         | ticket and started sharing it around.
         | 
         | I would love to say that particular support staff never lived
         | that incident down, but sadly that particular incident was par
         | for the course with them, and the team spent inordinate amount
         | of time doing damage control with customers.
        
           | panarky wrote:
           | In the 90s I worked for a retail chain where the CIO proposed
           | to spend millions to upgrade the point-of-sale hardware. The
           | old hardware was only a year old, but the CPU was pegged at
           | 100% on every device and scanning barcodes was very sluggish.
           | 
           | He justified the capex by saying if cashiers could scan
           | products faster, customers would spend less time in line and
           | sales would go up.
           | 
           | A little digging showed that the CIO wrote the point-of-sale
           | software himself in an ancient version of Visual Basic.
           | 
           | I didn't know VB, but it didn't take long to find the loops
           | that do nothing except count to large numbers to soak up CPU
           | cycles since VB didn't have a sleep() function.
        
             | jimt1234 wrote:
             | That's hilarious. I had a similar situation, also back in
             | the 90s, when a developer shipped some code that kept
             | pegging the CPU on a production server. He insisted it was
             | the server, and the company should spend $$$ on a new one
             | to fix the problem. We went back-and-forth for a while: his
             | code was crap versus the server hardware was inadequate,
             | and I was losing the battle, because I was just a lowly
             | sysadmin, while he was a great software engineer. Also, it
             | was Java code, and back then, Java was kinda new, and
             | everyone thought it could do no wrong. I wasn't a developer
             | at all back then, but I decided to take a quick look at his
             | code. It was basically this:
             | 
             | 1. take input from a web form
             | 
             | 2. do an expensive database lookup
             | 
             | 3. do an expensive network request, wait for response
             | 
             | 4. do another expensive network request, wait for response
             | 
             | 5. and, of course, another expensive network request, wait
             | for response
             | 
             | 6. fuck it, another expensive network request, wait for
             | response
             | 
             | 7. a couple more database lookups for customer data
             | 
             | 8. store the data in a table
             | 
             | 9. store the same data in another table. and, of course,
             | another one.
             | 
             | 10. now, check to see if the form was submitted with valid
             | data. if not, repeat all steps above to back-out the data
             | from where it was written.
             | 
             | 11. finally, check to see if the customer is a valid/paying
             | customer. if not, once again, repeat all the steps above to
             | back-out the data.
             | 
             | I looked at the logs, and something like 90% of the
             | requests were invalid data from the web form or
             | invalid/non-paying customers (this service was provided
             | only to paying customers).
             | 
             | I was so upset from this dude convincing management that my
             | server was the problem that I sent an email to pretty much
             | everyone that said, basically, "This code sucks. Here's the
             | problem: check for invalid data/customers first.", and I
             | included a snippet from the code. The dude replied-to-all
             | immediately, claiming I didn't know anything about Java
             | code, and I should stay in my lane. Well, throughout the
             | day, other emails started to trickle in, saying, "Yeah, the
              | code is the problem here. Please fix it ASAP." The dude was
             | so upset that he just left, he went completely AWOL, he
             | didn't show up to work for a week or so. We were all
             | worried, like he jumped off a bridge or something. It
             | turned into an HR incident. When he finally returned, he
             | complained to HR that I stabbed him in the back, that he
             | couldn't work with me because I was so rude. I didn't
             | really care; I was a kid. Oh yeah, his nickname became AWOL
             | Wang. LOL
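              | 
              | The fix amounts to guard clauses: reject the ~90% of bad
              | requests before doing any of the expensive work. A
              | minimal sketch in C (the names are invented stand-ins,
              | not the original Java):
              | 
              |     #include <stdio.h>
              |     
              |     struct request { int form_valid; int paying; };
              |     
              |     /* Invented stubs for the two cheap checks. */
              |     static int form_ok(const struct request *r)
              |     { return r->form_valid; }
              |     static int is_paying(const struct request *r)
              |     { return r->paying; }
              |     
              |     static int do_expensive_work(const struct request *r)
              |     {
              |         (void)r;   /* ...db lookups, network calls... */
              |         return 0;
              |     }
              |     
              |     int handle_request(const struct request *req)
              |     {
              |         if (!form_ok(req))   return -1;  /* cheap, local */
              |         if (!is_paying(req)) return -1;  /* one cheap lookup */
              |         return do_expensive_work(req);   /* valid requests only */
              |     }
              |     
              |     int main(void)
              |     {
              |         struct request bad = {0, 0}, good = {1, 1};
              |         printf("%d %d\n", handle_request(&bad),
              |                handle_request(&good));
              |         return 0;
              |     }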
        
               | eludwig wrote:
               | Hehe, being a Java dev since the late 90's meant seeing a
               | lot of bad code. My favorite was when I was working for a
               | large life insurance company.
               | 
               | The company's customer-facing website was servlet based.
               | The main servlet was performing horribly, time outs,
               | spinners, errors etc. Our team looked at the code and
               | found that the original team implementing the logic had a
               | problem they couldn't figure out how to solve, so they
               | decided to apply the big hammer: they synchronized the
               | doService() method... oh dear...
        
               | foobazgt wrote:
               | For those not familiar with servlets, this means
               | serializing every single request to the server that hits
               | that servlet. And a single servlet can serve many
               | different pages. In fact, in the early days, servlet
               | filters didn't exist, so you would often implement cross-
               | cutting functionality like authentication using a
               | servlet.
               | 
               | TBF, I don't think a lot of developers at the time (90's)
               | were used to the idea of having to write MT-safe callback
               | code. Nowadays thousands of object allocations per second
               | is nothing to sweat over, so a framework might make a
               | different decision to instantiate callbacks per request
               | by default.
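                | 
                | For the curious, a synchronized doService() boils down
                | to something like the following, sketched in C with a
                | pthread mutex instead of Java (the 100ms of "work" is
                | invented): every request queues on one global lock, so
                | eight concurrent requests take ~8x the single-request
                | latency.
                | 
                |     #include <pthread.h>
                |     #include <stdio.h>
                |     #include <unistd.h>
                |     
                |     static pthread_mutex_t service_lock =
                |         PTHREAD_MUTEX_INITIALIZER;
                |     
                |     static void *handle_request(void *arg)
                |     {
                |         /* One lock around the whole handler means
                |          * requests run strictly one at a time. */
                |         pthread_mutex_lock(&service_lock);
                |         usleep(100 * 1000);  /* stand-in for real work */
                |         printf("request %ld done\n", (long)arg);
                |         pthread_mutex_unlock(&service_lock);
                |         return NULL;
                |     }
                |     
                |     int main(void)
                |     {
                |         pthread_t t[8];
                |         for (long i = 0; i < 8; i++)
                |             pthread_create(&t[i], NULL,
                |                            handle_request, (void *)i);
                |         for (int i = 0; i < 8; i++)
                |             pthread_join(t[i], NULL);
                |         return 0;
                |     }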
        
         | nullhole wrote:
         | To be fair, it is a really poorly named "process". The computer
         | equivalent of the "everything's ok" alarm.
        
           | chowells wrote:
            | Long enough ago (Win95 era), Windows didn't sleep the CPU
            | when there was no work to be done; it always assigned some
            | task to the CPU. The system idle process was a way to do
            | this that played nicely with all of the other process
            | management systems. I don't remember when they finally
            | added CPU power management. SP3? Win98? Win98SE? Eh, it
            | was somewhere in there.
        
             | drsopp wrote:
             | I remember listening on FM radio to my 100MHz computer
             | running FreeBSD, which sounded like calm rain, and to
             | Windows 95, which sounded like a screaming monster.
        
         | fifilura wrote:
         | To be fair, there are worse mistakes. It does say 99% CPU.
        
         | Agentus wrote:
          | Reminds me of when I was a kid and noticed a virus had
          | taken over the registry. From that point forward I
          | attempted to delete every single registry file, not quite
          | understanding what I was doing. Between that and excessive
          | bad-website viewing, I dunno how I ever managed to not
          | brick my operating system, unlike my grandma, who seemed to
          | brick her desktop in a timely fashion before each of my
          | many monthly visits to her place.
        
           | bornfreddy wrote:
           | The things grandmas do to see their grandsons regularly.
           | Smart. :-)
        
         | mrmuagi wrote:
          | I wonder, if you made a process with "idle" in its name,
          | whether you could end up with the reverse problem, where
          | users ignore it. Is there anything preventing an executable
          | from being named "System Idle"?
        
         | jsight wrote:
         | I worked at a government site with a government machine at one
         | time. I had an issue, so I took it to the IT desk. They were
         | able to get that sorted, but then said I had another issue.
         | "Your CPU is running at 100% all the time, because some sort of
         | unkillable process is consuming all your cpu".
         | 
         | Yep, that was "System Idle" that was doing it. They had the
         | best people.
        
         | kernal wrote:
         | You're keeping us in suspense. Did he ever manage to kill the
         | System Idle process?
        
       | a1o wrote:
        | This was very well written; I somehow read every single line and
       | didn't skip to the end. Great work too!
        
       | amelius wrote:
       | To diagnose, why not run "time top" and look at the user and sys
       | outputs?
        
       | RajT88 wrote:
        | TIL there are still Chumbys alive in the wild. My Insignia
       | Chumby 8 didn't last.
        
       | evanjrowley wrote:
       | This headline reminded me of Mumptris, an implementation of
        | Tetris in the old mainframe-oriented language MUMPS, which,
        | by design, uses 100% CPU to reduce latency:
       | https://news.ycombinator.com/item?id=4085593
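        | 
        | The tradeoff in a minimal C sketch (input_ready() here is an
        | invented stand-in for polling the real input source): instead
        | of blocking and letting the CPU idle, the loop spins until
        | input arrives, so worst-case latency is one iteration.
        | 
        |     #include <stdbool.h>
        |     #include <stdio.h>
        |     
        |     /* Invented: a real version would poll the keyboard
        |      * or other input device. */
        |     static bool input_ready(void) { return true; }
        |     
        |     int main(void)
        |     {
        |         while (!input_ready())
        |             ;   /* spin: 100% CPU, minimal latency */
        |         puts("input!");
        |         return 0;
        |     }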
        
       | Suppafly wrote:
       | Isn't this one of those problems that switching to linux is
       | supposed to fix?
        
         | DougN7 wrote:
         | He's on linux
        
           | Suppafly wrote:
           | Exactly, that's the joke. If it had been an issue on Windows
           | the default response from folks here would be to switch to
           | Linux instead of trying to get to the root of the issue.
           | Guess I should have included an /s on my comment.
        
       | NotYourLawyer wrote:
       | That's an awful lot of effort to deal with an issue that was
       | basically just cosmetic. I suspect at some point the author was
       | just nerd sniped though.
        
         | dougg3 wrote:
         | To be fair, other non-cosmetic stuff uses the CPU percentage.
         | This same bug was preventing fast user suspend on the OLPC
         | until they worked around it. It was also a fun challenge.
        
       | ndesaulniers wrote:
       | Great read! Eerily similar to some bugs I've had, but the root
       | cause has been a compiler bug. Debugging a kernel that doesn't
       | boot is... interesting. QEMU+GDB to the rescue.
        
       | dmitrygr wrote:
        | Curiously, the "read reg twice, until same result is
        | obtained" approach is ignored in favor of "set capture reg,
        | wait for clock edge, read". This is strange, as it is usually
        | much faster - reading a 3.25MHz counter at 200MHz+ twice is
        | very likely to see the same value twice. For a 32KHz counter,
        | it is basically
        | guaranteed.
        | 
        |     u32 val;
        |     do {
        |         val = readl(...);
        |     } while (val != readl(...));
        |     return val;
        | 
        | compiles to a nice 6-instr little function on arm/thumb too,
        | with no delays:
        | 
        |     readclock:
        |             LDR  R2, =...
        |     1:      LDR  R0, [R2]
        |             LDR  R1, [R2]
        |             CMP  R0, R1
        |             BNE  1b
        |             BX   LR
        
       | markhahn wrote:
       | very nice investigation.
       | 
       | shame about the unnecessary use of cat :)
        
       | askvictor wrote:
       | My recurring issue (on a variety of laptops, both Linux and
       | Windows): the fans will start going full-blast, everything slows
       | down, then as soon as I open a task manager CPU usage drops from
       | 100% to something negligible.
        
         | crazydoggers wrote:
          | You, my friend, most likely have mining malware on your
          | systems. They'll shut down when they detect a task manager
          | is open, so you don't notice them.
        
       ___________________________________________________________________
       (page generated 2025-01-13 23:00 UTC)