[HN Gopher] Are efficiency and horizontal scalability at odds?
___________________________________________________________________
Are efficiency and horizontal scalability at odds?
Author : todsacerdoti
Score : 30 points
Date : 2025-02-12 18:27 UTC (4 hours ago)
(HTM) web link (buttondown.com)
(TXT) w3m dump (buttondown.com)
| xzyyyz wrote:
| Not convincing. (Horizontal) scalability comes at a cost, but it
| changes the size of the problem we can handle considerably.
| datadrivenangel wrote:
| "The downside is that for the past couple of decades computers
| haven't gotten much faster, except in ways that require recoding
| (like GPUs and multicore)."
|
| This is false? Computers have gotten a lot faster, even if the
| clock speed is not that much higher. A single modern CPU core
| turboing at ~5 GHz is going to be significantly faster than a
| 20-year-old CPU overclocked to ~4.5 GHz.
| jeffbee wrote:
| Yeah, that detail sinks the rest of it. Even if we restrict
| ourselves to datacenter CPUs, where the market preference has
| been for more cores operating at the same ~2400 MHz speed for a
| long time, what you get for 1 CPU-second these days is
| ridiculous compared to what you could have gotten 20 years ago.
| We're talking about NetBurst Xeons as a baseline.
| gopalv wrote:
| > Computers have gotten a lot faster, even if the clock speed
| is not that much faster
|
| We're not stagnating, but the same code I thought was too slow
| in 1998 was good enough in 2008; that's probably not true for
| code I would've thrown away in 2015.
|
| The only place where that has happened in the last decade is
| for IOPS - old IOPS-heavy code that would have been rewritten
| with group-commit tricks is probably slower now than a naive
| implementation that fsync'd all the time. A 2015 first cut of
| IO code probably beats the spinning-disk-optimized version from
| the same year on modern hardware.
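|
| A minimal sketch of the group-commit trick in Python; the file
| path and API shape here are invented for illustration, not
| taken from any particular database:
|
|     import os, queue, threading
|
|     def append_naive(path, record):
|         # Naive version: one fsync per record - the per-write
|         # disk round-trip that group commit was invented to
|         # avoid on spinning disks.
|         with open(path, "ab") as f:
|             f.write(record)
|             f.flush()
|             os.fsync(f.fileno())
|
|     class GroupCommitLog:
|         # Group commit: a writer thread drains whatever has
|         # queued up and pays one fsync for the whole batch.
|         def __init__(self, path):
|             self.q = queue.Queue()
|             self.f = open(path, "ab")
|             threading.Thread(target=self._writer,
|                              daemon=True).start()
|
|         def append(self, record):
|             done = threading.Event()
|             self.q.put((record, done))
|             done.wait()  # returns once the batch is durable
|
|         def _writer(self):
|             while True:
|                 batch = [self.q.get()]
|                 while True:
|                     try:
|                         batch.append(self.q.get_nowait())
|                     except queue.Empty:
|                         break
|                 for record, _ in batch:
|                     self.f.write(record)
|                 self.f.flush()
|                 os.fsync(self.f.fileno())  # one fsync per batch
|                 for _, done in batch:
|                     done.set()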
|
| The clock-speed comment is totally on the money though - a lot
| of those clocks were spent waiting on memory latency, and that
| has improved significantly over the years, particularly with
| Apple Silicon-style memory, which is physically closer (in
| light-cone terms) than the DIMMs of the past.
| Legend2440 wrote:
| A lot of clocks are _still_ spent waiting for memory. GPUs in
| particular are limited by memory bandwidth despite a memory
| bus that runs at terabytes per second.
|
| Back when I started programming, it was reasonable to
| precompute lookup tables for multiplications and trig
| functions. Now you'd never do that - it's far cheaper to
| recompute it than to look it up from memory.
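|
| A rough illustration of the trade-off - a hypothetical micro-
| benchmark, not from this thread; in Python the interpreter
| overhead dominates, so the effect is much clearer in a compiled
| language, but the shape of the comparison is the same:
|
|     import math, random, timeit
|
|     # Precomputed lookup table: 4096 sine values over one period.
|     N = 4096
|     TWO_PI = 2 * math.pi
|     table = [math.sin(TWO_PI * i / N) for i in range(N)]
|     xs = [random.random() * TWO_PI for _ in range(10_000)]
|
|     def via_table():
|         # Index into the precomputed table (memory traffic).
|         return [table[int(x * N / TWO_PI) % N] for x in xs]
|
|     def via_recompute():
|         # Just recompute the value every time.
|         return [math.sin(x) for x in xs]
|
|     print("table:    ", timeit.timeit(via_table, number=100))
|     print("recompute:", timeit.timeit(via_recompute, number=100))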
| paulsutter wrote:
| Could you share some numbers on this? Lots of folks would be
| interested, I'm sure.
| PaulKeeble wrote:
| An Intel 12900K (Gen 12) compared to a 2600K (Gen 2, launched
| in 2011) is about 120% faster, or a bit over 2 times, in
| single-threaded applications. Those +5-15% uplifts every
| generation add up over time, but it's nothing like the earlier
| years, when performance might double in a single generation.
|
| It really depends on whether the application uses AES-256 and
| other modern instructions. The 12900K has 16 cores vs the
| 2600K's 4, although 8 of those extra cores are E-cores. This
| performance increase doesn't necessarily come for free, since
| the application may need to be adjusted to utilise the extra
| cores - especially when half of them are slower - to ensure the
| workload is distributed properly.
|
| Even when scaling vertically by getting a new processor for
| purely single-threaded applications, it's interesting that much
| of the big benefit comes from targeting the new instructions
| and then the new cores, both of which may require source
| updates to get a significant performance uplift.
|
| https://www.cpu-monkey.com/en/compare_cpu-intel_core_i7_1270...
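|
| The compounding arithmetic is easy to check; the per-generation
| uplift range below comes from the comment above, nothing here
| is measured:
|
|     # +5% to +15% per generation, compounded over the ten
|     # generations between the 2600K (Gen 2) and 12900K (Gen 12).
|     gens = 10
|     for uplift in (0.05, 0.08, 0.15):
|         print(f"{uplift:.0%}/gen -> {(1 + uplift) ** gens:.1f}x")
|     # ~1.6x, ~2.2x and ~4.0x respectively; the observed "a bit
|     # over 2 times" sits inside that range.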
| einpoklum wrote:
| > is about 120% faster, or a bit over 2 times, in
| single-threaded applications
|
| 1. Doesn't that figure also reflect speedups in memory and I/O?
|
| 2. Even if the app is single-threaded, the OS isn't, so unless
| the system is almost entirely idle apart from the foreground
| application (which is possible), there might still be an effect
| from the higher core count.
| jaggederest wrote:
| Unless you're multitasking, the OS on a separate thread
| gets you about 5-10% speedup. It's not really noteworthy.
|
| Unless you lived through the 1990s, I don't think you
| understand how fast things were improving. Routine doubling of
| scores every 18 months is an insane thing. In 1990 the state of
| the art was an 8 MHz chip. By 2002, the state of the art was a
| 5 GHz chip. So almost a thousand times faster in a bit over a
| decade.
|
| Are chips now a thousand times faster than they were in 2015?
| No, they are not.
| sidewndr46 wrote:
| What does "the OS on a separate thread" mean? I'm also not
| aware of any consumer chips running at 5 GHz in 2002.
| no_wizard wrote:
| Funnily enough, most apps aren't taking full advantage of the
| multi-core, multi-threaded environments that are common across
| all major platforms.
|
| The single biggest bottleneck to improvement is the general
| failure of developers to use those APIs to the fullest extent
| when designing applications. It's not really hardware anymore.
|
| Though, to the points being made, we aren't seeing the 18-month
| doubling we saw in the earlier decades of computing.
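|
| As a minimal sketch of the kind of API usage being described -
| standard-library Python, with a made-up CPU-bound task:
|
|     from concurrent.futures import ProcessPoolExecutor
|     import math
|
|     def work(n):
|         # Stand-in for a CPU-bound task.
|         return sum(math.sqrt(i) for i in range(n))
|
|     if __name__ == "__main__":
|         jobs = [2_000_000] * 8
|         # Serial: one core does everything.
|         serial = [work(n) for n in jobs]
|         # Parallel: the pool spreads the same jobs across
|         # whatever cores are available.
|         with ProcessPoolExecutor() as pool:
|             parallel = list(pool.map(work, jobs))
|         assert serial == parallel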
| bee_rider wrote:
| I think it is often the case that people want to describe the
| problem as "single core performance has stagnated for decades"
| because it makes it look like their solution is _necessary to
| make any progress at all_.
|
| Actually, single-core performance has been improving. Not as
| fast as it was in the '90s, maybe, but it is improving.
|
| However, we can speed things up even more by using multiple
| computers. And it is a really interesting problem where you get
| to worry about all sorts of fun things, like hiding MPI
| communication behind compute.
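|
| A minimal sketch of that overlap with mpi4py non-blocking
| send/receive; the neighbour pattern, buffer size and local work
| are made up for illustration:
|
|     from mpi4py import MPI
|     import numpy as np
|
|     comm = MPI.COMM_WORLD
|     rank, size = comm.Get_rank(), comm.Get_size()
|
|     send_buf = np.full(1_000_000, rank, dtype=np.float64)
|     recv_buf = np.empty_like(send_buf)
|
|     # Start the exchange without blocking...
|     reqs = [comm.Isend(send_buf, dest=(rank + 1) % size),
|             comm.Irecv(recv_buf, source=(rank - 1) % size)]
|
|     # ...and do local work that doesn't need the incoming data
|     # while the network is busy.
|     local = np.sin(send_buf).sum()
|
|     MPI.Request.Waitall(reqs)  # exchanged data is now usable
|     print(rank, local, recv_buf[0])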
|
| Nobody wants to say, "I have found that I can make an already
| fast process even faster by putting in a lot of effort, which I
| will do because my job is actually really fun." Technical jobs
| are supposed to be stressful and serious. The world is doomed
| and science will stop... unless I come up with a magic trick!
| Legend2440 wrote:
| Single-core performance looks pretty stagnant on this graph,
| especially in the last ten years: https://imgur.com/DrOvPZt
|
| Transistor count has continued to increase exponentially, but
| single-threaded performance has improved slowly and appears
| to be leveling off. We may never get another 100x or even 10x
| improvement in single-threaded performance.
|
| It is going to be necessary to parallelize to see gains in
| the future.
| achierius wrote:
| But it's not flat? 10% growth a year is still growth.
| Ygg2 wrote:
| > This is false? Computers have gotten a lot faster
|
| Depends what you mean by "much". Single-threaded performance is
| no longer 2x as fast after a year. I mean, even in the GPU
| space, you get graphics that look slightly better for 2-4x the
| cost (see street prices of the 2080 vs 3080 vs 4080).
|
| Computing has hit the point of diminishing returns; exponential
| growth for linear prices is no longer possible.
| foota wrote:
| I think this is meant to be read as "over the past decade, you
| haven't been able to wait a year and buy a new CPU to solve
| your vertical scalability issues", not necessarily as a claim
| that there hasn't been significant growth over the entire
| window.
| jeeyoungk wrote:
| DuckDB would've been a good example to include, because it
| tries to head off the need for horizontal scalability with an
| efficient implementation. If your use case stays below the
| point where horizontal scalability is truly needed (which, in
| the modern world, a mixture of clever implementation and crazy
| powerful computers does allow), you can tackle quite a large
| workload.
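|
| For instance, a minimal in-process sketch (the parquet file
| name is hypothetical):
|
|     import duckdb
|
|     con = duckdb.connect()  # in-process, no cluster to operate
|     rows = con.execute("""
|         SELECT user_id, count(*) AS n
|         FROM 'events.parquet'
|         GROUP BY user_id
|         ORDER BY n DESC
|         LIMIT 10
|     """).fetchall()
|     print(rows)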
| memhole wrote:
| And even then you have things like this:
|
| https://www.boilingdata.com/
| awkward wrote:
| I suppose if you're doing one you're not doing the other - the
| promise of future horizontal scale definitely justifies a lot of
| arguments about premature optimization.
|
| However, they aren't necessarily opposed. Optimization is
| usually subtractive - it's slicing parts off the total runtime.
| Horizontal scale is multiplicative - you're doing the same
| thing more times. Outside some very specific limits, efficiency
| usually makes horizontal scaling more effective: a slightly
| shorter runtime, repeated many times over, adds up to a much
| shorter total runtime.
| Joel_Mckay wrote:
| Depends on what you are optimizing, and whether your design
| uses application-layer implicit load balancing. Avoiding
| constraints within the design patterns before they hit the
| routers can often reduce traditional design cost by 37 times or
| more.
|
| YMMV, depends if your data stream states are truly separable. =3
| einpoklum wrote:
| I'd say they're not fundamentally at odds, but they're at odds
| with a "greedy approach". That is, it is much easier to scale out
| when you're willing to make constraining assumptions about your
| program; and willing to pay a lot of overhead for distributed
| resource management, migrating pieces of work etc. If you want to
| scale while maintaining efficiency, you have to be aware of more
| things about the work that's being distributed; you have to
| struggle much harder to avoid different kinds of overhead and
| idleness; and if you really want to go the extra mile you need to
| think of how to turn the distribution partially to your _benefit_
| (example: putting the overhead you already pay for fault
| tolerance or high availability to use by storing the copies of
| your data in different formats, letting different computations
| prefer one format over the other; on a single machine you
| wouldn't even have the extra copies).
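|
| A toy sketch of that last idea, with invented names and
| formats: keep each replica of the same data in a different
| layout and route each access to the layout that suits it.
|
|     # Row-oriented replica: good for point lookups.
|     rows = [
|         {"id": 1, "city": "Oslo",   "temp": 3.1},
|         {"id": 2, "city": "Lisbon", "temp": 17.4},
|     ]
|     # Column-oriented replica of the same data: good for scans
|     # and aggregates.
|     columns = {
|         "id":   [1, 2],
|         "city": ["Oslo", "Lisbon"],
|         "temp": [3.1, 17.4],
|     }
|
|     def lookup(record_id):
|         # Point lookup prefers the row replica.
|         return next(r for r in rows if r["id"] == record_id)
|
|     def average_temp():
|         # Aggregation prefers the column replica.
|         return sum(columns["temp"]) / len(columns["temp"])
|
|     print(lookup(2), average_temp())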
___________________________________________________________________
(page generated 2025-02-12 23:00 UTC)