[HN Gopher] How quickly do CPUs change clock speeds?
       ___________________________________________________________________
        
        
       Author : zdw
       Score  : 42 points
       Date   : 2022-09-16 00:04 UTC (22 hours ago)
        
 (HTM) web link (chipsandcheese.com)
        
       | LarsAlereon wrote:
       | It's interesting how the HP OEM system didn't have SpeedShift
       | working, which is absolutely critical to system responsiveness. I
       | would like to see if OEMs have this sorted out on modern systems
       | or if there's still brand-to-brand variation. Semi-related, but
       | there can be a surprising amount of performance variation between
       | motherboards that can be difficult to benchmark. DPC latency can
        | vary from <50us to ~250us board-to-board:
       | https://www.anandtech.com/show/16973/the-asrock-x570s-pg-rip...
        
       | grapescheesee wrote:
       | I think it's great the author made an account today on HN, and is
        | replying to questions from his post. This is one of the best
        | parts of the community here.
       | 
        | I would love to see some tests done on newer CPUs, and also
        | whether any BIOS tweaks can be used to dramatically cut the time
        | frame.
       | 
        | It could also be interesting to log power draw at the wall or
        | at the AC-to-DC adapter as a metric. It wouldn't be directly
        | comparable system to system, yet it might show whether the draw
        | ramps linearly or exponentially.
        
         | clamchowder wrote:
         | > I think it's great the author made an account today on HN,
          | and is replying to questions from his post. This is one of the
          | best parts of the community here.
         | 
         | :)
         | 
          | > I would love to see some tests done on newer CPUs
         | 
          | So, the site is a free-time thing run by several people, all of
          | whom are either students or have other full-time jobs (not
          | full-time reviewers with samples of the latest hardware). No
          | one happened to have an ICL/TGL/ADL CPU available. I do have an
          | ADL result from earlier today showing full boost after 0.5 ms,
          | but that person also overclocked and may have used a static
          | voltage. Maybe we can get an addendum out with more results but
          | I wouldn't bet on it.
         | 
         | And yeah probably. In an extreme scenario you could just lock
         | the CPU to its highest clock. Then you'll never see a
         | transition period at all.
         | 
          | > logged the PSU
          | 
          | I don't have a power meter or PSU capable of logging.
        
           | jeffbee wrote:
           | Big fan of your site. Travis Downs has done some work on
           | characterizing hardware transition latency. I believe his
           | tools are on GitHub.
           | https://travisdowns.github.io/blog/2020/01/17/avxfreq1.html
        
             | clamchowder wrote:
             | Yeah, he has some pretty interesting stuff.
             | 
             | But AVX512 transition latency is pretty different from
             | clocking up from idle. The CPU is already running at full
             | tilt, and is making a relatively small frequency/voltage
             | change. You can see from his writeup that it's well under a
             | millisecond. I believe newer Intel CPUs have done away with
             | AVX-512 downclocking completely.
             | 
             | I probably won't be able to look into that specific issue
             | since I don't have any AVX-512 capable CPUs.
        
       | johntb86 wrote:
       | > Sadly, Qualcomm was very bad at implementing caches, but that's
       | a story for another day.
       | 
       | Anyone know what that story is?
        
         | clamchowder wrote:
         | Big cores each have a 512 KB L2 cache with a latency of around
         | 24 cycles, while the little ones each have a 256 KB L2 with ~23
         | cycle latency. It's really terrible considering the low
         | 2.34/1.6 GHz clock speeds. Then there's no multi-megabyte last
         | level cache, so it's going out to memory a lot.
         | 
         | They do have 3 cycle L1D latency, but again that's at low
         | clocks, and it's a 24 KB L1D so it's 25% smaller than most L1Ds
         | we see today. Not great considering you're eating 20+ cycle L2
         | latency once you miss the L1D, unlike most CPUs today that have
         | 12 cycle latency for 256 or 512 KB L2s.
        
       | theevilsharpie wrote:
       | This article isn't accurate, in the sense that it's actually
       | measuring how long the operating system takes to reach the CPU's
       | maximum clock speed, rather than how long it physically takes the
       | CPU to reach that speed.
       | 
        | Modern processors can switch frequencies very fast -- generally
        | within microseconds. An example from the machine I'm currently
        | using:
        | 
        |     $ grep 'model name' /proc/cpuinfo | uniq; sudo cpupower frequency-info -y -m
        |     model name : Intel(R) Core(TM) i5-4310U CPU @ 2.00GHz
        |     analyzing CPU 0:
        |       maximum transition latency: 20.0 us
       | 
       | Even very deep CPU idle states can be exited in <1 ms.
       | 
       | With respect to the operating system, the amount of time it takes
       | to reach maximum frequency from idle depends on:
       | 
       | - How frequently the OS checks to see if the frequency should be
       | increased
       | 
       | - Whether the OS will step through increasing frequencies, or go
       | straight to the max frequency
       | 
       | - If the OS is stepping through increasing frequencies, how many
       | it needs to step through
       | 
       | - Whether the core is already active, or if it needs to be
       | awakened from a sleep state first
       | 
       | It looks like the OP is using Windows. Windows has a number of
       | hidden (and poorly documented) tunables that control the above
       | settings, and would have a significant impact on how quickly a
       | CPU can reach max frequency. Microsoft used to have an extensive
       | document (specifically, a Word document for Windows Server 2008)
       | describing how to tune these parameters using the `powercfg` CLI
       | tool and provided a detailed reference for each parameter, which
       | I unfortunately can't find anymore. It looks like Microsoft's
       | modern server performance tuning documentation describes a subset
       | of these parameters.[1]
       | 
        | Linux has similar tunables for its `ondemand` CPU frequency
        | governor.[2]
       | It looks like the default sampling rate to determine whether to
       | adjust the frequency is 1000 times greater than the transition
       | latency, or about 20 milliseconds on my machine.
       | 
       | I'm not familiar with macOS, but it likely has something similar
       | (although it may not be user-accessible).
       | 
       | [1] https://docs.microsoft.com/en-us/windows-
       | server/administrati...
       | 
       | [2] https://www.kernel.org/doc/html/v5.15/admin-
       | guide/pm/cpufreq...
        
         | vardump wrote:
         | When a comment has the insight you expected from the article.
         | 
         | Thanks!
        
         | clamchowder wrote:
         | Hey, author here. The Snapdragon 670, Snapdragon 821, and
         | i5-6600K tests were done on Linux, and the rest were done on
         | Windows. If Windows is delaying boost, it doesn't seem to be
          | any different from Linux. And the lack of intermediate states
          | between the lowest clock and max clock on all three of those
          | (not considering the little cores) indicates the OS is not
          | stepping through frequencies.
         | 
         | Since people don't usually write their own OS, I don't think
         | it's correct to take the "maximum transition latency" reported
         | by the CPU to mean anything, because users will never observe
         | that transition speed. Also, processors that rely on the OS (no
         | "Speed Shift") can transition very fast if their voltages are
         | held high to start with, suggesting most of the latency comes
         | from waiting for voltages to increase.
         | 
         | Please also read about Intel's "Speed Shift". While it's fairly
         | new (only debuted in 2015), it means the CPU can clock up by
         | itself without waiting for a transition command from the OS.
        
           | theevilsharpie wrote:
           | > Since people don't usually write their own OS, I don't
           | think it's correct to take the "maximum transition latency"
           | reported by the CPU to mean anything, because users will
           | never observe that transition speed.
           | 
            | The Linux CPU frequency governor _literally_ uses it as part
            | of the algorithm for calculating its sampling rate (i.e., how
            | frequently it checks whether to adjust the processor's
            | frequency).
           | 
           | > Please also read about Intel's "Speed Shift". While it's
           | fairly new (only debuted in 2015), it means the CPU can clock
           | up by itself without waiting for a transition command from
           | the OS.
           | 
            | Hardware-managed P-states don't need to wait for the OS to
            | send a command to change the frequency, but the processor
            | still performs periodic sampling to determine what the
            | frequency should be (this happens every millisecond on modern
            | AMD and Intel hardware), the processor isn't necessarily
            | going to choose the maximum frequency right away, and it's
            | still subject to delays from the OS (e.g., Windows core
            | parking).
           | 
           | In any case, a multi-millisecond delay in switching frequency
           | isn't because the processor is waiting for the voltage to
           | increase.
        
             | clamchowder wrote:
             | > The Linux CPU frequency governor literally uses it as
             | part of the algorithm for calculating its sampling rate
             | 
             | Yes, the governor can play a role. It's visible to the
             | user, which is the point. Also, the ondemand governor is
             | actually irrelevant to the article as the S821 and S670
             | used the interactive and schedutil governors respectively,
             | and the i5-6600K was using speed shift.
             | 
             | I think we're disagreeing because I really don't care about
             | how fast a CPU could pull off frequency transitions if it's
             | never observable to users. I'm looking at how it's
             | observable to user programs, and how fast the transition
             | happens in practice.
             | 
             | > Processor still performs periodic sampling...
             | 
             | Same as the above, that's not the point of the article. I'm
             | not measuring "what could theoretically happen if you
             | ignore half the steps involved in a frequency transition
             | even though a user realistically cannot avoid them without
             | serious downsides" (like artificially holding the idle
             | voltage high and drawing more idle power, as in the
              | Piledriver example).
             | 
             | > In any case, a multi-millisecond delay in switching
             | frequency isn't because the processor is waiting for the
             | voltage to increase.
             | 
             | Yes, there are other factors involved besides the voltage
             | increase. I never said it was the only factor, and did
             | mention speed shift taking OS transition commands out of
             | the picture (implying that requiring OS commands does
             | introduce a delay in CPUs without such a feature).
             | 
             | If you want to test how fast a CPU can clock up, without
             | influence from OS/processor polling, please do so and
             | publish your results along with your methodology. I think
             | it'd be interesting to see.
        
               | rz30221 wrote:
                | I read the article in full, and the data & information
                | were interesting, but I have to say a lot of the points
                | you're making in the comments now were not clear from
                | the article text alone.
                | 
                | Another important point is that you're measuring the
                | default behavior of the various control systems. One can
                | change those defaults, which would let the user observe
                | something else.
        
               | clamchowder wrote:
                | Yeah, I didn't want to start the article with a five-
                | paragraph essay, especially when wordpress pagination
                | doesn't work, so I can't get an Anandtech-style multi-
                | page article up.
               | 
               | And yep. You can even run a CPU at full clock all the
               | time, meaning you will never observe a clock transition
               | time. Cloud providers seem to do that.
        
               | clamchowder wrote:
               | Thanks :) I guess I can't reply to a 7th level comment,
               | so hopefully this one shows up in the right place.
               | 
               | I agree, there are multiple factors at play. But I don't
               | think it's basically an implementation choice. Certainly
               | it looks like it in some cases (S821 on battery, HSW-E
               | and SNB-E). But it doesn't seem to be the case elsewhere.
               | For example, speed shift lowers clock transition time by
               | taking the OS out of the control loop.
        
               | bee_rider wrote:
               | For some reason, this site hides the reply button after a
               | certain reply-chain length, but you can just click on the
               | person's name. This will show all their posts, including
               | the one you want to reply to (you may have to look for
               | it), with the reply buttons present.
               | 
               | I guess they must be trying to softly dis-incentivize
               | really long chains, but not block them outright? It
               | doesn't really make sense to me...
        
               | sokoloff wrote:
                | Faster: click on the timestamp of the post and you can
                | reply directly in one click.
        
               | rz30221 wrote:
               | I actually liked the article and I think the first image
               | is really informative.
               | 
                | I guess my point is that the "clock frequency ramp time"
                | is really due to the interplay of a bunch of different
                | control systems, some in the OS and some not. And when
                | those systems get mixed together in a somewhat
                | uncontrolled way (which is the case for most PCs), the
                | result is a huge amount of variability, which the article
                | did a good job of quantifying but IMHO didn't make
                | clear.
               | 
               | But at the time scales in your plots "how quickly CPUs
               | change clock speeds" is basically an implementation
               | choice.
               | 
               | Just my $0.02
        
               | vardump wrote:
               | > I think we're disagreeing because I really don't care
               | about how fast a CPU could pull off frequency transitions
               | if it's never observable to users.
               | 
               | I think we should care. What about interrupts that occur
               | during that time? There are hardware devices that will
               | just not work if it takes too long. Too long is usually
               | 0.5 ms or so.
               | 
               | However, 20 microseconds is just fine.
        
               | clamchowder wrote:
               | That's a different and unrelated topic. If you're
               | concerned about how fast device driver code can respond,
               | well you can get _a lot_ done in 0.5 ms even with the CPU
               | running at 800 MHz or whatever the idle clock is.
        
               | vardump wrote:
                | Says someone who hasn't debugged a slow Windows graphics-
                | related ISR (interrupt) hogging the same CPU core where
                | your interrupt was supposed to run. (Also, whatever
                | happened to that 50 us ISR execution-time limit? I guess
                | it doesn't apply if you're Microsoft. Then again, there
                | was some GPU vendor code running in the call stack as
                | well...)
               | 
               | 0.5 ms is not really much on Windows.
        
               | clamchowder wrote:
               | I meant it's unrelated to how fast a CPU clocks up.
               | 
               | If something is taking longer than 0.5 ms, you shouldn't
               | be doing it in the ISR. Queue up a DPC and do your longer
               | running processing there, or send it to user space. And
               | yeah it might not be your fault if another driver's ISR
               | was hogging the CPU core. That's just a case of a badly
               | written driver screwing up the world for everyone,
               | because they're not supposed to be doing long running
               | stuff in an ISR in the first place.
               | 
               | https://docs.microsoft.com/en-us/windows-
               | hardware/drivers/de... says an ISR shouldn't run longer
               | than 25 microseconds. 0.5 ms is an order of magnitude
               | off. Not something going from 800 MHz to locked 4 GHz
               | will fix.
        
               | vardump wrote:
               | My ISRs execute under 15 us, some are as fast as 2 us.
               | I'm well aware of the DPC queueing.
               | 
               | > ISR shouldn't run longer than 25 microseconds
               | 
               | Weird, I think I read 50 microseconds somewhere else.
               | Maybe I just remember it wrong?
               | 
               | > 0.5 ms is an order of magnitude off. Not something
               | going from 800 MHz to locked 4 GHz will fix.
               | 
                | 0.5 ms is actually not that far-fetched with higher-
                | priority interrupts masking yours and the delay of
                | Windows ISR dispatching. There are also occasional SMM
                | "missing time" black holes.
               | 
                | Windows isn't a real-time OS, for sure! Although you
                | can't blame SMMs on Windows.
        
       ___________________________________________________________________
       (page generated 2022-09-16 23:01 UTC)