[HN Gopher] How quickly do CPUs change clock speeds?
___________________________________________________________________
How quickly do CPUs change clock speeds?
Author : zdw
Score : 42 points
Date : 2022-09-16 00:04 UTC (22 hours ago)
(HTM) web link (chipsandcheese.com)
(TXT) w3m dump (chipsandcheese.com)
| LarsAlereon wrote:
| It's interesting how the HP OEM system didn't have SpeedShift
| working, which is absolutely critical to system responsiveness. I
| would like to see if OEMs have this sorted out on modern systems
| or if there's still brand-to-brand variation. Semi-related, but
| there can be a surprising amount of performance variation between
| motherboards that can be difficult to benchmark. DPC latency can
| vary between <50us to ~250us board-to-board:
| https://www.anandtech.com/show/16973/the-asrock-x570s-pg-rip...
| grapescheesee wrote:
| I think it's great the author made an account today on HN, and is
| replying to questions from his post. This is one of the best
| parts of the community here.
|
| I would love to see some tests done on newer CPUs, and also
| whether any BIOS tweaks can be used to dramatically cut the time
| frame.
|
| It could also be interesting to log power draw at the wall or at
| the AC-to-DC adapter as a metric. It wouldn't be directly
| comparable system to system, but it might show whether the draw
| ramps linearly or exponentially.
| clamchowder wrote:
| > I think it's great the author made an account today on HN,
| and is replying to questions from his post. This is one of the
| best parts of the community here.
|
| :)
|
| > I would love to see some tests done on newer CPUs
|
| So, the site is a free-time thing run by several people, all of
| whom are either students or have other full-time jobs (not
| full-time reviewers with samples of the latest hardware). No
| one happened to have an ICL/TGL/ADL CPU available. I do have an
| ADL result from earlier today showing full boost after 0.5 ms,
| but that person also overclocked and may have used a static
| voltage. Maybe we can get an addendum out with more results,
| but I wouldn't bet on it.
|
| And yeah probably. In an extreme scenario you could just lock
| the CPU to its highest clock. Then you'll never see a
| transition period at all.
|
| > logged the PSU
|
| I don't have a power meter or PSU capable of logging.
| jeffbee wrote:
| Big fan of your site. Travis Downs has done some work on
| characterizing hardware transition latency. I believe his
| tools are on GitHub.
| https://travisdowns.github.io/blog/2020/01/17/avxfreq1.html
| clamchowder wrote:
| Yeah, he has some pretty interesting stuff.
|
| But AVX512 transition latency is pretty different from
| clocking up from idle. The CPU is already running at full
| tilt, and is making a relatively small frequency/voltage
| change. You can see from his writeup that it's well under a
| millisecond. I believe newer Intel CPUs have done away with
| AVX-512 downclocking completely.
|
| I probably won't be able to look into that specific issue
| since I don't have any AVX-512 capable CPUs.
| johntb86 wrote:
| > Sadly, Qualcomm was very bad at implementing caches, but that's
| a story for another day.
|
| Anyone know what that story is?
| clamchowder wrote:
| Big cores each have a 512 KB L2 cache with a latency of around
| 24 cycles, while the little ones each have a 256 KB L2 with ~23
| cycle latency. It's really terrible considering the low
| 2.34/1.6 GHz clock speeds. Then there's no multi-megabyte last
| level cache, so it's going out to memory a lot.
|
| They do have 3 cycle L1D latency, but again that's at low
| clocks, and it's a 24 KB L1D so it's 25% smaller than most L1Ds
| we see today. Not great considering you're eating 20+ cycle L2
| latency once you miss the L1D, unlike most CPUs today that have
| 12 cycle latency for 256 or 512 KB L2s.
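|
| As a rough illustration (the 4.5 GHz "typical desktop" figure
| below is my own assumption, not from the comment above), those
| cycle counts translate into wall-clock latency like this:

```python
# Back-of-the-envelope conversion of cache latency in cycles to
# nanoseconds at a given clock speed: at F GHz, one cycle is 1/F ns.
def cycles_to_ns(cycles: int, ghz: float) -> float:
    """Latency in nanoseconds for a given cycle count at a given clock."""
    return cycles / ghz  # cycles divided by cycles-per-nanosecond

# Snapdragon big core: ~24-cycle L2 at 2.34 GHz
sd_big_l2 = cycles_to_ns(24, 2.34)   # ~10.3 ns
# Hypothetical modern desktop core: 12-cycle L2 at an assumed 4.5 GHz
desktop_l2 = cycles_to_ns(12, 4.5)   # ~2.7 ns

print(f"Snapdragon big-core L2: {sd_big_l2:.1f} ns")
print(f"Assumed desktop L2:     {desktop_l2:.1f} ns")
```

| So even at equal cycle counts, the low clocks alone would make
| the latency hurt; with double the cycles it hurts twice over.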
| theevilsharpie wrote:
| This article isn't accurate, in the sense that it's actually
| measuring how long the operating system takes to reach the CPU's
| maximum clock speed, rather than how long it physically takes the
| CPU to reach that speed.
|
| Modern processors can switch frequencies very fast -- generally
| within microseconds. An example from the machine I'm currently
| using:
|
|     $ grep 'model name' /proc/cpuinfo | uniq; sudo cpupower frequency-info -y -m
|     model name : Intel(R) Core(TM) i5-4310U CPU @ 2.00GHz
|     analyzing CPU 0:
|       maximum transition latency: 20.0 us
|
| Even very deep CPU idle states can be exited in <1 ms.
|
| With respect to the operating system, the amount of time it takes
| to reach maximum frequency from idle depends on:
|
| - How frequently the OS checks to see if the frequency should be
| increased
|
| - Whether the OS will step through increasing frequencies, or go
| straight to the max frequency
|
| - If the OS is stepping through increasing frequencies, how many
| it needs to step through
|
| - Whether the core is already active, or if it needs to be
| awakened from a sleep state first
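|
| A toy model (all numbers hypothetical) shows how much the last
| three factors matter: with a fixed sampling interval, stepping
| through intermediate frequencies multiplies the ramp time, while
| jumping straight to max costs only a single sample:

```python
# Toy governor model: the OS samples load every SAMPLE_MS and either
# steps one P-state at a time or jumps straight to the maximum.
# All numbers are made up for illustration.
FREQS_MHZ = [800, 1600, 2400, 3200, 4000]  # hypothetical P-states
SAMPLE_MS = 20                              # hypothetical sampling interval

def time_to_max_ms(step_through: bool) -> int:
    """Milliseconds until the core reaches max frequency under load."""
    idx, elapsed = 0, 0
    while idx < len(FREQS_MHZ) - 1:
        elapsed += SAMPLE_MS  # wait for the next load sample
        idx = idx + 1 if step_through else len(FREQS_MHZ) - 1
    return elapsed

print(time_to_max_ms(step_through=True))   # four steps of 20 ms each
print(time_to_max_ms(step_through=False))  # one sample, straight to max
```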
|
| It looks like the OP is using Windows. Windows has a number of
| hidden (and poorly documented) tunables that control the above
| settings, and would have a significant impact on how quickly a
| CPU can reach max frequency. Microsoft used to have an extensive
| document (specifically, a Word document for Windows Server 2008)
| describing how to tune these parameters using the `powercfg` CLI
| tool and provided a detailed reference for each parameter, which
| I unfortunately can't find anymore. It looks like Microsoft's
| modern server performance tuning documentation describes a subset
| of these parameters.[1]
|
| Linux has similar tunables for its `ondemand` CPU scheduler.[2]
| It looks like the default sampling rate to determine whether to
| adjust the frequency is 1000 times greater than the transition
| latency, or about 20 milliseconds on my machine.
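|
| A quick sanity check of that relationship, using the 20 us
| transition latency reported by cpupower above:

```python
# The ondemand governor derives its sampling interval from the
# hardware transition latency (factor of 1000, per the kernel docs
# cited below).
transition_latency_us = 20  # as reported on this machine
sampling_rate_us = transition_latency_us * 1000
print(sampling_rate_us)     # 20000 us, i.e. 20 ms between checks
```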
|
| I'm not familiar with macOS, but it likely has something similar
| (although it may not be user-accessible).
|
| [1] https://docs.microsoft.com/en-us/windows-
| server/administrati...
|
| [2] https://www.kernel.org/doc/html/v5.15/admin-
| guide/pm/cpufreq...
| vardump wrote:
| When a comment has the insight you expected from the article.
|
| Thanks!
| clamchowder wrote:
| Hey, author here. The Snapdragon 670, Snapdragon 821, and
| i5-6600K tests were done on Linux, and the rest were done on
| Windows. If Windows is delaying boost, it doesn't seem to be
| any different from Linux. And the lack of intermediate states
| between the lowest clock and max clock on all three of those
| (not considering the little cores) indicates the OS is not
| stepping through frequencies.
|
| Since people don't usually write their own OS, I don't think
| it's correct to take the "maximum transition latency" reported
| by the CPU to mean anything, because users will never observe
| that transition speed. Also, processors that rely on the OS (no
| "Speed Shift") can transition very fast if their voltages are
| held high to start with, suggesting most of the latency comes
| from waiting for voltages to increase.
|
| Please also read about Intel's "Speed Shift". While it's fairly
| new (only debuted in 2015), it means the CPU can clock up by
| itself without waiting for a transition command from the OS.
| theevilsharpie wrote:
| > Since people don't usually write their own OS, I don't
| think it's correct to take the "maximum transition latency"
| reported by the CPU to mean anything, because users will
| never observe that transition speed.
|
| The Linux CPU frequency governor _literally_ uses it as part
| of the algorithm for calculating its sampling rate (i.e., how
| frequently it checks whether to adjust the processor's
| frequency).
|
| > Please also read about Intel's "Speed Shift". While it's
| fairly new (only debuted in 2015), it means the CPU can clock
| up by itself without waiting for a transition command from
| the OS.
|
| Hardware-managed P-states don't need to wait for the OS to
| send a command to change the frequency, but the processor
| still performs periodic sampling to determine what the
| frequency should be (this happens every millisecond on
| modern AMD and Intel hardware). The processor isn't
| necessarily going to choose the maximum frequency right
| away, and it's still subject to delays from the OS (e.g.,
| Windows core parking).
|
| In any case, a multi-millisecond delay in switching frequency
| isn't because the processor is waiting for the voltage to
| increase.
| clamchowder wrote:
| > The Linux CPU frequency governor literally uses it as
| part of the algorithm for calculating its sampling rate
|
| Yes, the governor can play a role. It's visible to the
| user, which is the point. Also, the ondemand governor is
| actually irrelevant to the article as the S821 and S670
| used the interactive and schedutil governors respectively,
| and the i5-6600K was using speed shift.
|
| I think we're disagreeing because I really don't care about
| how fast a CPU could pull off frequency transitions if it's
| never observable to users. I'm looking at how it's
| observable to user programs, and how fast the transition
| happens in practice.
|
| > Processor still performs periodic sampling...
|
| Same as the above, that's not the point of the article. I'm
| not measuring "what could theoretically happen if you
| ignore half the steps involved in a frequency transition
| even though a user realistically cannot avoid them without
| serious downsides" (like artificially holding the idle
| voltage high and drawing more idle power, as in the
| Piledriver example).
|
| > In any case, a multi-millisecond delay in switching
| frequency isn't because the processor is waiting for the
| voltage to increase.
|
| Yes, there are other factors involved besides the voltage
| increase. I never said it was the only factor, and did
| mention speed shift taking OS transition commands out of
| the picture (implying that requiring OS commands does
| introduce a delay in CPUs without such a feature).
|
| If you want to test how fast a CPU can clock up, without
| influence from OS/processor polling, please do so and
| publish your results along with your methodology. I think
| it'd be interesting to see.
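|
| For reference, here is a crude user-space sketch of the
| observable-ramp measurement (chunk sizes are hypothetical, and
| Python timing is noisy, so treat it as an illustration of the
| method rather than a tool):

```python
# Sketch of measuring the observable frequency ramp from user space:
# after idling, time fixed-size chunks of work back to back. Chunks
# run faster as the core clocks up, so chunk duration traces the ramp.
import time

CHUNK_ITERS = 100_000  # hypothetical fixed amount of work per chunk

def run_chunk() -> float:
    """Time one fixed-size busy loop, returning its duration in seconds."""
    start = time.perf_counter()
    x = 0
    for i in range(CHUNK_ITERS):
        x += i
    return time.perf_counter() - start

def measure_ramp(idle_s: float = 1.0, chunks: int = 20) -> list:
    """Idle for a while, then time consecutive chunks of work."""
    time.sleep(idle_s)  # let the core drop back to its idle clocks
    return [run_chunk() for _ in range(chunks)]

durations = measure_ramp()
# Early chunks (low clocks) should generally be slower than late ones
# (boosted); in practice you'd plot durations against elapsed time.
print(f"first: {durations[0]*1e3:.2f} ms, last: {durations[-1]*1e3:.2f} ms")
```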
| rz30221 wrote:
| I read the article in full, and the data & information
| was interesting, but I have to say a lot of the points
| you're making in the comments now was not clear from the
| article text alone.
|
| Another important point is you're measuring the default
| behavior of the various control systems. One can change
| that which would allow the user to observe something
| else.
| clamchowder wrote:
| Yeah, didn't want to start an article with a five-paragraph
| essay, especially when WordPress pagination doesn't work, so
| I can't get an Anandtech-style multi-page article up.
|
| And yep. You can even run a CPU at full clock all the
| time, meaning you will never observe a clock transition
| time. Cloud providers seem to do that.
| clamchowder wrote:
| Thanks :) I guess I can't reply to a 7th level comment,
| so hopefully this one shows up in the right place.
|
| I agree, there are multiple factors at play. But I don't
| think it's basically an implementation choice. Certainly
| it looks like it in some cases (S821 on battery, HSW-E
| and SNB-E). But it doesn't seem to be the case elsewhere.
| For example, speed shift lowers clock transition time by
| taking the OS out of the control loop.
| bee_rider wrote:
| For some reason, this site hides the reply button after a
| certain reply-chain length, but you can just click on the
| person's name. This will show all their posts, including
| the one you want to reply to (you may have to look for
| it), with the reply buttons present.
|
| I guess they must be trying to softly dis-incentivize
| really long chains, but not block them outright? It
| doesn't really make sense to me...
| sokoloff wrote:
| Faster: click on the timestamp of the post and you can
| reply directly in one click.
| rz30221 wrote:
| I actually liked the article and I think the first image
| is really informative.
|
| I guess my point is that the "clock frequency ramp time" is
| really due to the interplay of a bunch of different control
| systems, some in the OS and some not. When those systems
| get mixed together in a somewhat uncontrolled way (which is
| the case for most PCs), the result is a huge amount of
| variability, which the article did a good job quantifying
| but IMHO didn't make clear.
|
| But at the time scales in your plots "how quickly CPUs
| change clock speeds" is basically an implementation
| choice.
|
| Just my $0.02
| vardump wrote:
| > I think we're disagreeing because I really don't care
| about how fast a CPU could pull off frequency transitions
| if it's never observable to users.
|
| I think we should care. What about interrupts that occur
| during that time? There are hardware devices that will
| just not work if it takes too long. Too long is usually
| 0.5 ms or so.
|
| However, 20 microseconds is just fine.
| clamchowder wrote:
| That's a different and unrelated topic. If you're
| concerned about how fast device driver code can respond,
| well, you can get _a lot_ done in 0.5 ms even with the CPU
| running at 800 MHz or whatever the idle clock is.
| vardump wrote:
| Says someone who hasn't debugged a slow Windows graphics
| related ISR (interrupt) hogging the same CPU core where
| your interrupt was supposed to run. (Also, whatever
| happened to that 50 us ISR execution time limit? I guess
| it doesn't apply if you're Microsoft. Then again, there
| was some GPU vendor code running in the call stack as
| well...)
|
| 0.5 ms is not really much on Windows.
| clamchowder wrote:
| I meant it's unrelated to how fast a CPU clocks up.
|
| If something is taking longer than 0.5 ms, you shouldn't
| be doing it in the ISR. Queue up a DPC and do your longer
| running processing there, or send it to user space. And
| yeah it might not be your fault if another driver's ISR
| was hogging the CPU core. That's just a case of a badly
| written driver screwing up the world for everyone,
| because they're not supposed to be doing long running
| stuff in an ISR in the first place.
|
| https://docs.microsoft.com/en-us/windows-
| hardware/drivers/de... says an ISR shouldn't run longer
| than 25 microseconds. 0.5 ms is an order of magnitude
| off. Not something going from 800 MHz to locked 4 GHz
| will fix.
| vardump wrote:
| My ISRs execute under 15 us, some are as fast as 2 us.
| I'm well aware of the DPC queueing.
|
| > ISR shouldn't run longer than 25 microseconds
|
| Weird, I think I read 50 microseconds somewhere else.
| Maybe I just remember it wrong?
|
| > 0.5 ms is an order of magnitude off. Not something
| going from 800 MHz to locked 4 GHz will fix.
|
| 0.5 ms is actually not that far-fetched, with higher
| priority interrupts masking yours and the delays of
| Windows ISR dispatching. There are also occasional SMM
| "time black holes" where time goes missing.
|
| Windows isn't a real-time OS for sure! Although can't
| blame SMMs on Windows.
___________________________________________________________________
(page generated 2022-09-16 23:01 UTC)