[HN Gopher] How Jeff Dean's "Latency Numbers Everybody Should Kn...
___________________________________________________________________
How Jeff Dean's "Latency Numbers Everybody Should Know" decreased
from 1990-2020
Author : isaacimagine
Score : 186 points
Date : 2022-03-03 21:06 UTC (1 hour ago)
(HTM) web link (colin-scott.github.io)
(TXT) w3m dump (colin-scott.github.io)
| dustingetz wrote:
| How can network be faster than memory?
| zamadatix wrote:
| The memory number is measuring access time while the network
| number is measuring average bandwidth. The two values can't be
| compared even though they are presented using the same unit.
| the-dude wrote:
| The slider is very bad UX : I missed it too at first. It is not
| pronounced enough, partly because it is all the way to the right.
|
| A former boss would say : _make it red_.
| wolpoli wrote:
| It's really hard to notice the grey slider when the content is
| already red, green, blue, and black.
| the-dude wrote:
| Blinking Magenta then.
| nhoughto wrote:
| Oh right
|
| I was trying to see where the comparison was, totally missed
| the slider, thanks!
| greggsy wrote:
| I sympathise that the site probably wasn't designed with mobile
| in mind, but it's impossible to go beyond 2015 without hitting
| the GitHub link.
| ygra wrote:
| You can also drag on the main view instead of the slider.
| lamontcg wrote:
| How are people practically taking advantage of the increase in
| speed of SSDs these days compared to network latencies? It looks
| like disk caches directly at the edge with hot data would be the
| fastest way of doing things.
|
| I'm more familiar with the 2001-2006 era, where Redis-like RAM
| caches for really hot data made a lot of sense, but with
| spinning-rust disk drives it made more sense to go over the
| network to a microservice that was effectively a big sharded RAM
| cache than to go to disk.
|
| Seems like you could push more hot data to the very edge these
| days and utilize SSDs like a very large RAM cache (and how does
| that interact with containers)?
|
| I guess the cost there might still be prohibitive if you have a
| lot of edge servers and consolidation would still be a big price
| win even if you take the latency hit across the network.
| gameswithgo wrote:
| I don't know. In my workloads (booting, game loads, and building
| programs) I have observed that super fast SSDs make almost no
| difference compared to cheap slow SSDs. But any SSD is
| miraculous compared to a spinny drive.
|
| Presumably video editing or something might get more of a win
| but I don't know.
| noizejoy wrote:
| When I got my first NVMe SSD, I was disappointed that it
| wasn't significantly faster than my SATA SSD.
|
| But soon I realized that it was Samsung's Magician software
| that made the SATA SSD competitive with an NVMe SSD via RAM
| caching.
| rvr_ wrote:
| 20 years without meaningful improvements in memory access?
| gameswithgo wrote:
| yep, got any ideas?
| not2b wrote:
| It takes at least one clock cycle to do anything, and clock
| frequency stopped increasing in the 2003-2005 time frame,
| mainly because of the horrible effects on power with very small
| feature size.
| aidenn0 wrote:
| Good news is that SSDs are only 160x slower for random reads,
| so maybe we should just beef up L3 or L4 cache and get rid of
| ram? /s
| ohazi wrote:
| Diminishing returns over the last decade, as expected. It would
| be interesting to look at the energy consumed by each of these
| operations across the same time periods.
| swolchok wrote:
| The source displayed at the bottom of this page clearly shows
| it's just extrapolating from numbers that are older than 2020.
| gregwebs wrote:
| According to this, all latencies improved dramatically except for
| SSD random read (disk seek, too, only improved by 10x). Reading
| 1 million bytes sequentially from SSD improved ~1000x and is now
| only 2-3x slower than a single random read, and for disk, reading
| 1 million bytes sequentially is faster than a seek. Conclusion:
| avoid random IO where performance matters.
|
| CPU and RAM latencies stopped improving in 2005 but storage and
| network kept improving.
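|
| A quick way to see that gap on your own hardware: time sequential
| 4 KB reads against random 4 KB reads on the same file. A rough
| sketch in C (the file path is a placeholder, and page-cache
| effects are ignored, so treat the output as an illustration only):
|
|     /* Rough sketch: sequential vs. random 4 KB reads from one
|        file. Assumes the file is much bigger than 4 KB. */
|     #include <fcntl.h>
|     #include <stdio.h>
|     #include <stdlib.h>
|     #include <time.h>
|     #include <unistd.h>
|
|     #define BLK 4096
|     #define N   4096                    /* reads per access pattern */
|
|     static double now_us(void) {
|         struct timespec ts;
|         clock_gettime(CLOCK_MONOTONIC, &ts);
|         return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
|     }
|
|     int main(void) {
|         int fd = open("/tmp/testfile", O_RDONLY); /* placeholder path */
|         if (fd < 0) { perror("open"); return 1; }
|         off_t blocks = lseek(fd, 0, SEEK_END) / BLK;
|         char buf[BLK];
|
|         double t0 = now_us();
|         for (long i = 0; i < N; i++)              /* sequential */
|             pread(fd, buf, BLK, (i % blocks) * (off_t)BLK);
|         double seq = now_us() - t0;
|
|         t0 = now_us();
|         for (long i = 0; i < N; i++)              /* random */
|             pread(fd, buf, BLK, (rand() % blocks) * (off_t)BLK);
|         double rnd = now_us() - t0;
|
|         printf("sequential: %.1f us/read, random: %.1f us/read\n",
|                seq / N, rnd / N);
|         close(fd);
|         return 0;
|     }
|
| On an NVMe SSD the random reads typically come back in tens of
| microseconds each; on spinning rust they are milliseconds.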
| csours wrote:
| It looks like almost everything is blazing fast now. I'm not sure
| how long the first X takes though - how long does it take to
| establish a TCP/IP connection? How long does it take an actual
| program to start reading from disk?
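|
| For the TCP question, a minimal sketch that just times connect();
| the address and port below are placeholders. The handshake costs
| roughly one round trip plus kernel overhead, so on a LAN it is
| typically well under a millisecond:
|
|     /* Time a single TCP connect(). Address/port are placeholders. */
|     #include <arpa/inet.h>
|     #include <netinet/in.h>
|     #include <stdio.h>
|     #include <sys/socket.h>
|     #include <time.h>
|     #include <unistd.h>
|
|     int main(void) {
|         struct sockaddr_in addr = {0};
|         addr.sin_family = AF_INET;
|         addr.sin_port = htons(80);                       /* placeholder */
|         inet_pton(AF_INET, "192.0.2.1", &addr.sin_addr); /* placeholder */
|
|         int fd = socket(AF_INET, SOCK_STREAM, 0);
|         struct timespec t0, t1;
|         clock_gettime(CLOCK_MONOTONIC, &t0);
|         if (connect(fd, (struct sockaddr *)&addr, sizeof addr) != 0) {
|             perror("connect");
|             return 1;
|         }
|         clock_gettime(CLOCK_MONOTONIC, &t1);
|         printf("connect took %.0f us\n",
|                (t1.tv_sec - t0.tv_sec) * 1e6 +
|                (t1.tv_nsec - t0.tv_nsec) / 1e3);
|         close(fd);
|         return 0;
|     }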
| bob1029 wrote:
| Latency is everything.
|
| I believe that sometime around 2010 we peaked on the best
| software solution for high performance, low-latency processing of
| business items when working with the style of computer
| architecture we have today.
|
| https://lmax-exchange.github.io/disruptor/disruptor.html
|
| I have been building systems using this kind of technique for a
| few years now and I still fail to wrap my brain around just how
| fast you can get 1 thread to go if you are able to get out of its
| way. I caught myself trying to micro-optimize a data import
| method the other day and made myself do it the "stupid" way
| first. Turns out I was definitely wasting my time. Being able to
| process and put to disk _millions of things per second_ is some
| kind of superpower.
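|
| (For anyone wanting the flavor of that "get out of the thread's
| way" style without reading the Disruptor docs: the core of it is
| a ring buffer where a single producer and a single consumer each
| own their own index and never take a lock. A minimal SPSC sketch
| in C11 follows; it is just an illustration of the idea, not the
| Disruptor itself.)
|
|     /* Minimal single-producer/single-consumer ring buffer sketch
|        (C11 atomics). One writer, one reader, no locks. */
|     #include <stdatomic.h>
|     #include <stdbool.h>
|     #include <stdint.h>
|
|     #define RING_SIZE 1024            /* must be a power of two */
|
|     typedef struct {
|         uint64_t slots[RING_SIZE];
|         _Atomic uint64_t head;        /* next slot to write */
|         _Atomic uint64_t tail;        /* next slot to read  */
|     } ring_t;
|
|     /* Producer side: returns false if the ring is full. */
|     static bool ring_push(ring_t *r, uint64_t v) {
|         uint64_t h = atomic_load_explicit(&r->head, memory_order_relaxed);
|         uint64_t t = atomic_load_explicit(&r->tail, memory_order_acquire);
|         if (h - t == RING_SIZE) return false;          /* full */
|         r->slots[h & (RING_SIZE - 1)] = v;
|         atomic_store_explicit(&r->head, h + 1, memory_order_release);
|         return true;
|     }
|
|     /* Consumer side: returns false if the ring is empty. */
|     static bool ring_pop(ring_t *r, uint64_t *out) {
|         uint64_t t = atomic_load_explicit(&r->tail, memory_order_relaxed);
|         uint64_t h = atomic_load_explicit(&r->head, memory_order_acquire);
|         if (h == t) return false;                      /* empty */
|         *out = r->slots[t & (RING_SIZE - 1)];
|         atomic_store_explicit(&r->tail, t + 1, memory_order_release);
|         return true;
|     }
|
| Each side only ever writes its own counter, so the hot path is a
| couple of atomic loads and one store, with no locks and no
| contention - which is where "millions of items per second on one
| thread" comes from.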
| _pastel wrote:
| These numbers focus on reads. How does writing speed to cache,
| main memory, or disk compare? Anyone have some ballparks to help
| me build intuition?
| bmitc wrote:
| Today I learned that I don't know any of these numbers that
| "every" programmer should know. Where do I turn in my programmer
| card, Jeff Dean?
| morelisp wrote:
| You could take this instead as an opportunity to learn them,
| instead of reveling in your ignorance.
| bmitc wrote:
| There is plenty I don't know. It's not me reveling in my
| ignorance.
|
| My point is that programming is an incredibly diverse field
| and yet people, even people who supposedly should know
| better, are obsessed with making global laws of programming.
| I know relative comparisons of speeds that have been useful
| in my day jobs, but I'd wager that needing to know the
| details of these numbers, how they've evolved, etc. is a
| relatively niche area.
|
| Regarding learning, I try to constantly learn. This is driven
| by two things: (1) need, such as one finds in their day job
| or to complete some side project; (2) interest. If something
| hits either need or interest or hopefully both, I learn it.
| zachberger wrote:
| I don't think it's important to know the absolute numbers but
| rather the relative values and rough orders of magnitude.
|
| I can't tell you how many times I've had to explain to
| developers why their network-attached storage has higher
| latency than their locally attached NVMe SSD.
| morelisp wrote:
| The absolute numbers are also important. I can't tell you how
| many times I've had someone coming from a front-end world tell
| me that 5ms for some trivial task (e.g. sorting a 1000ish
| element list) is "fast" just because it happened faster than
| their reaction time.
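|
| For calibration, a sketch that times qsort on 1000 random ints;
| on typical current hardware this lands in the tens of
| microseconds, which is why 5ms for that kind of task should
| register as slow:
|
|     /* How long does sorting ~1000 ints actually take? */
|     #include <stdio.h>
|     #include <stdlib.h>
|     #include <time.h>
|
|     static int cmp_int(const void *a, const void *b) {
|         int x = *(const int *)a, y = *(const int *)b;
|         return (x > y) - (x < y);
|     }
|
|     int main(void) {
|         enum { N = 1000 };
|         int v[N];
|         for (int i = 0; i < N; i++) v[i] = rand();
|
|         struct timespec t0, t1;
|         clock_gettime(CLOCK_MONOTONIC, &t0);
|         qsort(v, N, sizeof v[0], cmp_int);
|         clock_gettime(CLOCK_MONOTONIC, &t1);
|
|         printf("sorting %d ints took %.1f us\n", N,
|                (t1.tv_sec - t0.tv_sec) * 1e6 +
|                (t1.tv_nsec - t0.tv_nsec) / 1e3);
|         return 0;
|     }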
| kragen wrote:
| Does anyone have a plot of these on a log-linear scale? Where
| does the data come from?
|
| http://worrydream.com/MagicInk/
| ChuckMcM wrote:
| Not an intuitive thing, but the data is fascinating. A couple of
| notes for people who are confused by it:
|
| 1) The 'ns' next to the box is a graph legend not a data label
| (normally that would be in a box labeled legend to distinguish it
| from graph data)
|
| 2) The weird box and rectangle thing on the top is a slider, I
| didn't notice that until I was looking at the code and said "what
| slider?"
|
| 3) The _only_ changes from 2005 to present are storage and
| networking speeds.
|
| What item #3 tells you is that any performance gains in the last
| decade and a half you've experienced have been driven by multi-
| core, not faster processors. And _that_ means Amdahl's Law is
| more important than Moore's Law these days.
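|
| For reference, Amdahl's Law caps the speedup from N cores at
| 1 / ((1 - p) + p/N), where p is the fraction of the work that
| can be parallelized. A tiny sketch to see how fast it flattens:
|
|     /* Amdahl's Law: speedup(N) = 1 / ((1 - p) + p / N). */
|     #include <stdio.h>
|
|     static double amdahl(double p, double n) {
|         return 1.0 / ((1.0 - p) + p / n);
|     }
|
|     int main(void) {
|         for (int n = 1; n <= 64; n *= 2)
|             printf("p=0.95, %2d cores -> %5.2fx\n", n, amdahl(0.95, n));
|         return 0;
|     }
|
| Even with 95% of the work parallelized, 64 cores only buy about a
| 15x speedup; the serial 5% dominates.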
| DonHopkins wrote:
| At what point did Joy's Law -- "2^(year-1984) MIPS" -- break
| with reality?
|
| https://medium.com/@donhopkins/bill-joys-law-2-year-1984-mil...
|
| https://en.wikipedia.org/wiki/Joy%27s_law_(computing)
| pjc50 wrote:
| It also tells us that the speed of light has not increased.
|
| (well, signal speed on a PCB track is roughly 2/3 the speed of
| light, determined by the transmission line geometry and the
| dielectric constant, but you all knew that)
| thfuran wrote:
| Which latency are you suggesting is limited by the speed of
| light?
| not2b wrote:
| It wasn't the speed of light, it was the size of atoms that
| was the issue here. As old-style scaling (the kind used up
| until about 2003) continued, leakage power was increasing
| rapidly because charge carriers (electrons / holes) would
| tunnel through gates (I'm simplifying a bit here, other bad
| effects were also a factor). It was no longer possible to
| keep increasing clock frequency while scaling down feature
| size. Further reduction without exploding the power
| requirement meant that the clock frequency had to be left
| the same and transistors needed to change shape.
| chubot wrote:
| _What item #3 tells you is that any performance gains in the
| last decade and a half you've experienced have been driven by
| multi-core, not faster processors. And that means Amdahl's Law
| is more important than Moore's Law these days._
|
| Uh or storage and networking? Not sure why you would leave that
| out, since they're the bottleneck in many programs.
|
| The slowest things are the first things you should optimize
| horsawlarway wrote:
| Yeah... SSDs are so much faster than spinning disk it's not
| even funny.
|
| I literally refuse to run a machine that boots its main OS
| from spinning disk anymore. The 60 bucks to throw an SSD into
| it is so incredibly cheap for what you get.
|
| My wife's work gave her a (fairly basic but still fine)
| thinkpad - except they left the main drive as a fucking
| 5400rpm hdd. Then acted like assclowns when we repeatedly
| showed them that the machine is stalling on disk IO, while
| the rest of the system is doing diddly squat waiting around.
| I finally got tired of it and we "accidentally" spilled water
| on it, and somehow just the hdd stopped working (I left out
| the part where I'd removed it from the laptop first...). Then
| I just had her expense a new SSD and she no longer hates her
| work laptop.
|
| Long story short - Storage speeds are incredible compared to
| what they were when I went to school (when 10k rpm was
| considered exorbitant)
| capitainenemo wrote:
| The living room media/gaming machine at home is an 8
| terabyte spinning rust. I didn't bother with a separate SSD
| boot partition.
|
| It's currently been running for 23 days. Booting takes ~15
| seconds even on spinning rust for a reasonable linux
| distro, so I'm not going to stress about those 15 seconds
| every couple of months.
|
|              total   used   free  shared  buff/cache  available
| Mem:          31Gi  4.6Gi   21Gi   158Mi       5.1Gi       25Gi
| Swap:         37Gi  617Mi   36Gi
|
| 5.1 gigabytes mostly just file cache. As a result,
| everything opens essentially instantly. For a bit better
| experience, I did a:
|
| find ~/minecraft/world -type f -exec cat {} > /dev/null \;
|
| to forcibly cache that, but that was all I did.
| horsawlarway wrote:
| Hah, if you can fit the whole OS plus running
| applications easily in RAM, and you don't boot often -
| fine. But you're basically doing the same thing but with
| extra steps :P
| capitainenemo wrote:
| Well, RAM is significantly faster than even SSD, and now
| I don't have to muck about w/ a 2nd drive :)
|
| Not to mention the spinning rust is cheaper.
| dekhn wrote:
| (your hard drive story is the story of my life, up to about
| 10 years ago. I have eliminated all but one hard drive from
| my house and that one doesn't spin most of the time)
|
| Lately my vendor discussions have centered around how much
| work you can get done with a machine that has half a
| gigabyte of RAM, 96 cores, and 8 NVME SSDs (it's a lot). my
| college box: 40MB disk, 4MB RAM, 1 66MHz CPU.
| ianai wrote:
| "And that means Amdahl's Law is more important than Moore's Law
| these days."
|
| idk, sure seems like we could have 1-2 cores (permanently
| pegged?) at 5 ghz for UI/UX then ($money / $costPerCores)
| number of cores for showing off/"performance" by now. But the
| OEMs haven't gone that way.
| ChuckMcM wrote:
| We probably see things differently. As I understand it, this
| is exactly the use case for "big/little" microarchitectures.
| Take a number of big fast cores that are running full bore,
| and a bunch of little cores that can do things for them when
| they get tasked. So far they've been symmetric but with
| chiplets they needn't be.
| ianai wrote:
| Yes, for 'computational' loads. I've read though UI/UX
| benefits the most from fastest response times. I'm talking
| about the cores which actually draw the GUI the user
| sees/uses being optimized for the task at the highest
| possible rate. Then have a pool of cores for the rest of
| it.
| moonchild wrote:
| UI should be drawn on the GPU. Absent rendering, slow
| cores are more than sufficient to do layout/etc.
| interactively.
| ChuckMcM wrote:
| You are talking about the GPU? Okay, really random tidbit
| here; when I worked at Intel I was a validation engineer
| for the 82786 (which most people haven't heard of), a
| graphics chip focused on building responsive, windowed
| user interfaces: it used hardware features to display
| separate windows (so moving a window moved no actual
| memory, just updated a couple of registers), to draw the
| mouse, and to handle character/font processing for faster
| updates. Intel killed it, but if you find an
| old "Number9 video card" you might find one to play with.
| It had an embedded RISC engine that did bitblit and other
| UI type things on chip.
|
| EVERYTHING that chip did, could in fact be done with a
| GPU today. It isn't, for the most part, because window
| systems evolved to be CPU driven, although a lot of
| phones these days do the UI in the GPU, not the CPU for
| this same reason. There is a fun program for HW engineers
| called "glscopeclient" which basically renders its UI via
| the GPU.
|
| So I'm wondering if I misread what you said, and whether
| you are actually advocating for a different GPU
| microarchitecture, or perhaps a more general integrated
| architecture on the chip that could also do UI, like APUs?
| bee_rider wrote:
| I would rather reserve the thermal headroom for actual
| computations, rather than having those cores pegged at 5Ghz.
| stuartmscott wrote:
| > And that means Amdahl's Law is more important than Moore's
| Law these days.
|
| 100%, we can no longer rely on faster processors to make our
| code faster, and must instead write code that can take
| advantage of the hardware's parallelism.
|
| For those interested in learning more about Why Amdahl's Law is
| Important, my friend wrote an interesting article on this very
| topic - https://convey.earth/conversation?id=41
| gameswithgo wrote:
| There is some improvement from processors being faster, as more
| instructions are done at once and more instructions get down
| towards that 1ns latency that l1 caches provide. You see it
| happen in real life but the gains are small.
| [deleted]
| muh_gradle wrote:
| I would never have realized the slider functionality until I
| read this comment.
| raisedbyninjas wrote:
| I noticed the year was an editable field but didn't change
| the data before I noticed the slider.
| jll29 wrote:
| The time to open a Web browser seems roughly constant since 1993.
| bb123 wrote:
| This site is completely unusable on mobile
| [deleted]
| dweez wrote:
| Okay since we're not going to improve the speed of light any time
| soon, here's my idea for speeding up CA to NL roundtrip: let's
| straight shot a cable through the center of the earth.
| DonHopkins wrote:
| We could really use some decent Mexican food here in the
| Netherlands.
|
| https://idlewords.com/2007/04/the_alameda_weehawken_burrito_...
| almog wrote:
| What's more, we can run that cable along a gravity train from
| CA to NL, saving the costs of digging another tunnel. :)
| Archelaos wrote:
| From CA you will end up off the coast of Madagaskar, and from
| the NL somewhere near New Zealand. You do not have to go very
| deep inside the earth to get straight from CA to NL.
| banana_giraffe wrote:
| Assuming my math is right, it'd be a 10% faster trip, but I'd
| be all for seeing that tunnel!
| jeffbee wrote:
| I doubt that same-facility RTT has been fixed at 500us for 30
| years. In EC2 us-east-1 I see < 100us same-availability-zone RTT
| on TCP sockets, and those have a lot of very unoptimized software
| in the loop.
|
|     function getDCRTT() {
|         // Assume this doesn't change much?
|         return 500000; // ns
|     }
| genewitch wrote:
| I show 180-350us between various machines on my network, all of
| which have some fiber between them. devices with only a switch
| and copper between them somehow perform worse, but this is
| anecdotal because i'm not running something like smokeping!
|
| Oh, additionally between VMs i'm getting 180us, so that looks
| to be my lower bound, for whatever reason. my main switches are
| very old, so maybe that's why.
| jeffbee wrote:
| Are you measuring that with something like ICMP ping? I think
| the way to gauge the actual network speed is to look at the
| all-time minimum RTT on a long-established TCP socket. The
| Linux kernel maintains this stat for normal TCP connections.
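|
| On Linux you can pull that stat out of an existing socket with
| getsockopt(TCP_INFO). A minimal sketch, assuming you already
| have a connected TCP socket fd and a kernel recent enough to
| report tcpi_min_rtt (values are in microseconds):
|
|     #include <linux/tcp.h>
|     #include <netinet/in.h>
|     #include <stdio.h>
|     #include <sys/socket.h>
|
|     /* fd must be a connected TCP socket. */
|     void print_rtt(int fd) {
|         struct tcp_info ti;
|         socklen_t len = sizeof ti;
|         if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) == 0)
|             printf("smoothed rtt: %u us, min rtt: %u us\n",
|                    ti.tcpi_rtt, ti.tcpi_min_rtt);
|     }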
| gameswithgo wrote:
| An instructive thing here is that a lot of stuff has not improved
| since ~2004 or so. Working around the things that have not
| improved (memory latency, from RAM all the way down to L1 cache)
| requires fine control of memory layout and minimizing cache
| pollution. That is difficult to do in our popular garbage-
| collected languages, even harder in languages that don't offer
| memory layout controls, and JITs and interpreters add further
| difficulty.
|
| To get the most out of modern hardware you need to:
|
| * minimize memory usage/hopping to fully leverage the CPU caches
|
| * control data layout in memory to leverage the good throughput
| you can get when you access data sequentially
|
| * be able to fully utilize multiple cores without too much
| overhead and with minimal risk of error
|
| For programs to run faster on new hardware, you need to be able
| to do at least some of those things.
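|
| A small sketch of the first two points: same array, same amount
| of arithmetic, but one loop walks memory sequentially while the
| other jumps by a large stride. The exact ratio depends on the
| machine, but the strided walk is usually several times slower:
|
|     /* Sequential vs. strided traversal of a 64 MB array. */
|     #include <stdio.h>
|     #include <stdlib.h>
|     #include <time.h>
|
|     #define N (1L << 24)      /* 16M ints, ~64 MB: bigger than L3 */
|     #define STRIDE 4096       /* jump far enough to defeat prefetch */
|
|     static double now_us(void) {
|         struct timespec ts;
|         clock_gettime(CLOCK_MONOTONIC, &ts);
|         return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
|     }
|
|     int main(void) {
|         int *a = malloc(N * sizeof *a);
|         if (!a) return 1;
|         for (long i = 0; i < N; i++) a[i] = (int)i;
|         long sum = 0;
|
|         double t0 = now_us();
|         for (long i = 0; i < N; i++) sum += a[i];     /* sequential */
|         double seq = now_us() - t0;
|
|         t0 = now_us();
|         for (long s = 0; s < STRIDE; s++)             /* strided */
|             for (long i = s; i < N; i += STRIDE) sum += a[i];
|         double str = now_us() - t0;
|
|         printf("sequential %.1f ms, strided %.1f ms (sum=%ld)\n",
|                seq / 1e3, str / 1e3, sum);
|         free(a);
|         return 0;
|     }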
| greggsy wrote:
| It's interesting that L2 cache has basically been steady at
| 2MB/core since 2004 as well. It hasn't changed speed in that
| time, but is still an order of magnitude faster than memory
| across that whole timeframe. Does this suggest that the memory
| speed bottleneck means there simply hasn't been a need to
| increase the availability of that faster cache?
| gameswithgo wrote:
| The bigger the cache, the longer it takes to address it, and
| fairly fundamental physics prevents it from being faster.
| formerly_proven wrote:
| Some of these numbers are clearly wrong. Some of the old
| latency numbers seem somewhat optimistic (e.g. 100 ns main
| memory ref in 1999), some of the newer ones are pessimistic
| (e.g. 100 ns main memory ref in 2020). The bandwidth for
| disks is clearly wrong, as it claims ~1.2 GB/s for a hard
| drive in 2020. The seek time is also wrong: it crosses 10 ms
| in 2000, drops to 5 ms by 2010, and is 2 ms for 2020.
| Seems like linear interpolation to me. It's also unclear what
| the SSD data is supposed to mean before ~2008 as they were
| not really a commercial product before then. Also, for 2020
| the SSD transfer rate is given as over 20 GB/s. Main memory
| bandwidth is given as 300+ GB/s.
|
| Cache performance has increased massively. Especially
| bandwidth, not reflected in a latency chart. Bandwidth and
| latency are of course related; just transferring a cache line
| over a PC66 memory bus takes a lot longer than 100 ns. The
| same transfer on DDR5 takes a nanosecond or so, which leaves
| almost all of the latency budget for existential latency.
|
| edit: https://github.com/colin-scott/interactive_latencies/blob/ma...
|
| The data on this page is simply extrapolated using formulas
| and guesses.
| throwawaylinux wrote:
| Bigger caches could help, but as a rule of thumb cache hit rate
| improves only with roughly the square root of cache size, so the
| benefit diminishes. And the bigger you make a cache, the slower
| it tends to be, so at some point you could make your system
| slower by making your cache bigger and slower.
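|
| Back-of-envelope version of that rule of thumb, in its usual
| "miss rate scales like 1/sqrt(size)" form:
|
|     /* Quadrupling the cache roughly halves the miss rate. */
|     #include <math.h>
|     #include <stdio.h>
|
|     static double scaled_miss(double base_miss, double size_ratio) {
|         return base_miss / sqrt(size_ratio);  /* ratio = new/old */
|     }
|
|     int main(void) {
|         printf("5%% misses, 4x bigger cache -> %.1f%% misses\n",
|                100.0 * scaled_miss(0.05, 4.0));
|         return 0;
|     }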
| TillE wrote:
| It's pretty remarkable that, for efficient data processing,
| it's super super important to care about memory layout / cache
| locality in intimate detail, and this will probably be true
| until something fundamental changes about our computing model.
|
| Yet somehow this is fairly obscure knowledge unless you're into
| serious game programming or a similar field.
| efuquen wrote:
| > Yet somehow this is fairly obscure knowledge unless you're
| into serious game programming or a similar field.
|
| Because the impact of optimizing for the hardware like that is
| not so important in many applications. Getting the absolute
| most out of your hardware is very clearly important in game
| programming, but in web apps where the scale being served is
| not huge (the vast majority)? Not so much. And in that context
| developer time is more valuable when you can throw hardware at
| the problem for less.
|
| In traditional game programming you had to run on the hardware
| people played on; you are constrained by the client's
| capabilities. Cloud gaming might(?) be changing some of that,
| but GPUs are super expensive too compared to the rest of the
| computing hardware. And even in that case, with the amount of
| data you are pushing, you need to be efficient within the
| context of the GPU; my feeling is it's not easily scaled
| horizontally.
| Gigachad wrote:
| TBH I don't think cloud gaming is a long-term solution. It
| might be a medium-term solution for people with cheap laptops,
| but eventually the chip in cheap laptops will be able to
| produce photorealistic graphics and there will be no point
| going any further than that.
| sigstoat wrote:
| the code appears to just do a smooth extrapolation from some past
| value.
|
| it claims that (magnetic) disk seek is 2ms these days. since when
| did we get sub-4ms average seek time drives?
|
| it also seems to think we're reading 1.115GiB/s off of drives
| now. transfer rate on even the largest drives hasn't exceeded
| 300MiB/s or so, last i looked.
|
| ("but sigstoat, nvme drives totally are that fast or faster!"
| yes, and i assume those fall under "SSD" on the page, not
| "drive".)
| CSSer wrote:
| Is a commodity network a local network?
| kens wrote:
| Amazing performance improvements, except no improvement at all on
| the packet roundtrip time to Netherlands. Someone should really
| work on that.
| [deleted]
| skunkworker wrote:
| Maybe if hollow core fiber is deployed we could see roughly a
| third knocked off the latency (signal speed going from .66c to
| .99c).
|
| Past that physics take over, and unfortunately the speed of
| light is pretty slow.
| aeyes wrote:
| Could LEO satellite networks like Starlink with inter-
| satellite links reduce the roundtrip time?
| genewitch wrote:
| the radius of the arc in low earth orbit (or whatever) is
| going to be larger than the arc across the atlantic ocean.
|
| As no one has ever said: "I'll take glass over gas,
| thanks."
| toqy wrote:
| We really need to go back to 1 supercontinent
| dsr_ wrote:
| Direct point-to-point conduits carrying fiber would reduce
| latency to a worst case of 21ms, but that requires a fiber that
| doesn't melt at core temperatures (around 5200C).
| warmwaffles wrote:
| Now we just need to either beat the speed of light, or speed
| light up. (thanks futurama)
| mrfusion wrote:
| How did it decrease?
| csours wrote:
| There's a slider at the top. It took me 2 minutes to find it.
| bryanrasmussen wrote:
| It used to take 3 minutes, that's quite an improvement.
| genewitch wrote:
| 6x10^10 ns (that's a lot of zeros!)
| chairmanwow1 wrote:
| Yeah this link isn't that useful without a comparison.
| dekhn wrote:
| latencies generally got smaller but spinning rust is still slow
| and the speed of light didn't change
| tejtm wrote:
| lots till 2005, then not much since
| [deleted]
| jandrese wrote:
| The "commodity network" thing is kind of weird. I'd expect that
| to make a 10x jump when switches went from Fast Ethernet to
| Gigabit (mid-late 2000s?) and then nothing. I certainly don't
| feel like they've been smoothly increasing in speed year after
| year.
|
| I'm also curious about those slow 1990s SSDs.
___________________________________________________________________
(page generated 2022-03-03 23:00 UTC)