[HN Gopher] Scalability, but at What Cost [pdf]
___________________________________________________________________
Scalability, but at What Cost [pdf]
Author : wglb
Score : 75 points
Date : 2021-04-24 15:14 UTC (7 hours ago)
(HTM) web link (www.usenix.org)
(TXT) w3m dump (www.usenix.org)
| jorangreef wrote:
| > "You can have a second computer once you've shown you know how
| to use the first one."
| Item_Boring wrote:
| "While nearly all such publi- cations detail their system's
| impressive scalability, few directly evaluate their absolute
| performance against reasonable benchmarks. To what degree are
| these systems truly improving performance, as opposed to
| parallelizing overheads that they themselves introduce?"
|
| That's one of the things that stuck with me after having taken a
| class in high performance computing. What matters is the absolute
| performance, not so much the speed-up one achieves through
| parallelisation.
| samsquire wrote:
| Interesting paper. I was recently reading the Wikipedia article
| on graph databases (spurred on by Neo4j), where there is a
| glaring statement:
|
| "Research demonstrated that there is no benefit to using a
| graph-centric execution engine and storage manager" - which I
| take to refer to the kind of systems that the paper is critical
| of.
|
| I would have supposed that graph execution engines should
| outperform a single-threaded "think like a vertex" implementation.
|
| The article links to this paper, "The case against specialized
| graph analytics engines":
| http://cidrdb.org/cidr2015/Papers/CIDR15_Paper20.pdf
|
| They use a relational database to outperform dedicated graph
| databases.
|
| Is part of the takeaway that hand-coded, single-threaded "think
| like a vertex" code beats a distributed system that involves
| communication or parallelisation?
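|
| For a sense of what a hand-coded, single-threaded "think like a
| vertex" baseline looks like, here is a minimal sketch (label
| propagation for connected components) -- illustrative only, not
| the code from the paper:
|
|     # Toy single-threaded label propagation over an edge list.
|     def connected_components(edges, num_nodes):
|         # Each vertex starts with its own id as its label; the
|         # minimum label is propagated across edges until nothing
|         # changes.
|         labels = list(range(num_nodes))
|         changed = True
|         while changed:
|             changed = False
|             for src, dst in edges:
|                 lo = min(labels[src], labels[dst])
|                 if labels[src] != lo or labels[dst] != lo:
|                     labels[src] = labels[dst] = lo
|                     changed = True
|         return labels
|
|     print(connected_components([(0, 1), (1, 2), (3, 4)], 5))
|     # -> [0, 0, 0, 3, 3]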
| cratermoon wrote:
| I think this paper shows that Amdahl's Law[1] is just as relevant
| when discussing distributed systems as it is when talking about
| multi-core single machines.
|
| 1. https://en.wikipedia.org/wiki/Amdahl%27s_law
| chubot wrote:
| It's related to Amdahl's law but not identical. As I understand
| it, Amdahl's law is talking about the situation where you have
| say 10 units of work, and 5 of them can be parallelized, and 5
| can't and must be run serially.
|
| The COST paper is talking about the situation where you have 10
| units of work, but it takes 10 more units of work to distribute
| it across multiple computers.
|
| Sometimes that's a win and sometimes it isn't, thus the phrase
| "parallelizing your overhead". If all you did was parallelize
| the additional overhead, then you didn't gain anything by using
| a distributed system.
|
| IOW, the overhead for distribution in distributed systems can
| be very large. It's still there in parallel computing, but
| shared memory makes it less pronounced. (Although NUMA makes
| your single computer like a distributed system again.)
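|
| A back-of-the-envelope sketch of the difference, with made-up
| numbers (not taken from the paper):
|
|     def amdahl_speedup(parallel_fraction, workers):
|         # Amdahl: the serial fraction caps the speedup no matter
|         # how many workers you add.
|         return 1.0 / ((1 - parallel_fraction)
|                       + parallel_fraction / workers)
|
|     def distributed_speedup(work, overhead_per_worker, workers):
|         # COST-style view: distribution adds overhead on every
|         # worker, so extra workers first have to win back the
|         # overhead they introduce.
|         distributed_time = work / workers + overhead_per_worker
|         return work / distributed_time
|
|     print(amdahl_speedup(0.5, 1000))        # ~2x, capped
|     print(distributed_speedup(10, 10, 2))   # ~0.67x, a net loss
|     print(distributed_speedup(10, 10, 50))  # ~0.98x, still a loss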
| cratermoon wrote:
| > it takes 10 more units of work to distribute it across
| multiple computers.
|
| That's just serialized work in a different form: in order for
| a computer to do a unit of work, it must be sent a unit of
| work. That's strictly serial -- a computer can't process a
| unit of work it doesn't have. So no matter how many units of
| work it takes to distribute the load, that's still 10 serial
| operations.
|
| Amdahl wasn't only seeing things as separate chunks, with the
| parallelized work over one part of the code, and the serial
| work in a different part. The operations can be interleaved.
| benlivengood wrote:
| I think the more important factor is the actual cost of
| communication. Networks are slow: 100 Gbit/s is still a couple
| of times slower than the memory controllers in our phones.
|
| https://en.m.wikipedia.org/wiki/Bulk_synchronous_parallel is a
| useful model for incorporating communication costs into design
| vs. PRAM.
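|
| The usual BSP accounting: each superstep costs w + h*g + l (max
| local work, plus h messages at a per-message cost g, plus the
| barrier latency l). A tiny sketch with made-up numbers:
|
|     def bsp_superstep_cost(local_work, h, g, l):
|         # local_work: max computation on any processor
|         # h: max messages sent/received by any processor
|         # g: cost per message, l: barrier latency
|         return local_work + h * g + l
|
|     supersteps = [(1000, 50), (800, 200), (1200, 10)]
|     total = sum(bsp_superstep_cost(w, h, g=4, l=100)
|                 for w, h in supersteps)
|     print(total)  # 4340, vs 3000 units of pure compute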
| cratermoon wrote:
| BSP is a specific model for parallel computation, but it
| still operates on the fundamentals of Amdahl's Law, and the
| whole reason it exists is to find the optimal divide-and-conquer
| solution given a set of real-world limitations.
|
| One might even say that the _cost_ of implementing your
| algorithm with BSP is higher than PRAM, because of the extra
| layers. But you can't ignore some of the things that you can
| in a strict PRAM model, so you have to incorporate those into
| the cost as well.
|
| Given Gustafson's law, if you have enough data and enough work
| to do, you can sort of ignore the un-parallelizable fraction by
| throwing enough processors at the computation (see the sketch at
| the end of this comment).
|
| What starts to trip things up at this scale is
| synchronization.
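|
| For reference, Gustafson's scaled speedup is S(N) = N - s*(N - 1)
| for serial fraction s; a tiny sketch with illustrative numbers:
|
|     def gustafson_speedup(serial_fraction, processors):
|         # S(N) = N - s * (N - 1): grow the problem with the
|         # machine and the serial fraction stops being the cap.
|         return processors - serial_fraction * (processors - 1)
|
|     for n in (1, 16, 256, 4096):
|         print(n, gustafson_speedup(0.05, n))
|     # 1 1.0, 16 15.25, 256 243.25, 4096 3891.25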
| emodendroket wrote:
| Interesting. But breaking stuff up in this way isn't purely a
| performance optimization either; it's also intended to be a
| method of organizational management, where someone is clearly
| responsible for failures in any part of the system, and where the
| parts of the program that form the contract are very clearly
| delineated.
| nice2meetu wrote:
| This reminds me of my previous job where we had an optimized
| single-server solution, whereas the competition scaled
| horizontally. They couldn't out-scale us at what we did until
| they got to 20 servers or so, which only the biggest customers
| needed anyway, and even then they would be really slow. You can
| achieve some impressive speedups when you really focus on the
| problem.
|
| Still, it wasn't that helpful from a business perspective,
| because apparently it is easier to justify a markup on solutions
| with lots of servers.
| dilyevsky wrote:
| What happens when your single-server solution is disconnected
| from the network for whatever reason, or you outgrow the current
| configuration? Bad times, I bet. Cost of components isn't the
| only factor here.
| jdc wrote:
| Isn't this a solved problem? Just use a failover, right?
| tartoran wrote:
| That's only a problem if there is no contingency plan and no
| backup server in place. Usually these things are thought out
| beforehand. And it's not as if you're excused from this type
| of problem when you're running your solution in the cloud:
| cloud outages, configuration nightmares, data inconsistencies,
| not knowing what is happening because pinpointing issues in a
| complex infrastructure takes more time, and so on. Sometimes
| the cloud is the way to go, but other times it is not
| justified.
| benlivengood wrote:
| I wonder if a lot of these algorithms/libraries are a decade or
| more old, were optimized for older generations of clusters with
| different interconnects, and modern CPUs and RAM have since
| caught up with problems that used to be intractable on a single
| machine. Modern CPU packages incorporate a lot of features from
| world-class supercomputers of a couple of decades ago.
|
| In theoretical CS classes there were discussions of the tradeoffs
| between networked, NUMA, and other fabrics. Analysis of what
| actually ran the fastest, beyond Big O notation, was only touched
| on briefly, but there is a definite advantage to making problems
| tractable that otherwise wouldn't be. In the FAANGs it was mostly
| embarrassingly parallel algorithms with a skin of distributed
| computing for synchronization/coordination, and so the focus had
| always been on absolute speed or efficiency.
| Chris_Newton wrote:
| (2015), but still a paper I'd encourage developers and technical
| managers to read as a counterpoint to the cloud advocacy of
| recent years.
|
| Note to mods: The title is correctly written "COST"; it's short
| for "Configuration that Outperforms a Single Thread", the central
| concept being discussed.
___________________________________________________________________
(page generated 2021-04-24 23:01 UTC)