[HN Gopher] Scalability, but at What Cost [pdf]
       ___________________________________________________________________
        
       Scalability, but at What Cost [pdf]
        
       Author : wglb
       Score  : 75 points
       Date   : 2021-04-24 15:14 UTC (7 hours ago)
        
 (HTM) web link (www.usenix.org)
 (TXT) w3m dump (www.usenix.org)
        
       | jorangreef wrote:
       | > "You can have a second computer once you've shown you know how
       | to use the first one."
        
       | Item_Boring wrote:
       | "While nearly all such publi- cations detail their system's
       | impressive scalability, few directly evaluate their absolute
       | performance against reasonable benchmarks. To what degree are
       | these systems truly improving performance, as opposed to
       | parallelizing overheads that they themselves introduce?"
       | 
        | That's one of the things that stuck with me after having
        | taken a class in high performance computing. What matters is
        | the absolute performance, not so much the speedup one
        | achieves through parallelisation.
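        | 
        | A minimal sketch of the difference, with made-up runtimes
        | (none of these numbers are from the paper):
        | 
        |     # Hypothetical runtimes in seconds, for illustration only.
        |     single_thread = 100.0               # tuned single-threaded run
        |     distributed = {1: 500.0, 16: 40.0}  # same system, 1 and 16 nodes
        | 
        |     # "Scalability": the system vs. itself on one node.
        |     scalability = distributed[1] / distributed[16]   # 12.5x
        |     # Absolute speedup: the system vs. the single thread.
        |     speedup = single_thread / distributed[16]        # 2.5x
        |     print(scalability, speedup)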
        
       | samsquire wrote:
        | Interesting paper. I was recently reading the Wikipedia
        | article on graph databases, spurred on by Neo4j, where there
        | is a glaring statement:
       | 
       | "Research demonstrated that there is no benefit to using a graph-
       | centric execution engine and storage manager" - which I take to
       | be the kind of systems that the paper is critical of.
       | 
        | Whereas I would have supposed graph execution engines should
        | outperform a single-threaded "think like a vertex" approach.
       | 
        | Which links to this paper, "The case against specialized
        | graph analytics engines":
        | http://cidrdb.org/cidr2015/Papers/CIDR15_Paper20.pdf
       | 
       | They use a relational database to outperform dedicated graph
       | databases.
       | 
        | Is part of the takeaway that hand-coded, single-threaded
        | "think like a vertex" code beats a distributed system that
        | involves communication and parallelisation?
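        | 
        | For a sense of what "hand coded single threaded" looks like,
        | here is a toy label-propagation sketch for connected
        | components, roughly in the spirit of the paper's single-
        | threaded baselines (the edge list is invented):
        | 
        |     # Toy single-threaded label propagation for connected
        |     # components; the edge list is made up for illustration.
        |     edges = [(0, 1), (1, 2), (3, 4)]
        |     num_nodes = 5
        | 
        |     label = list(range(num_nodes))  # each node is its own component
        |     changed = True
        |     while changed:                  # sweep until labels converge
        |         changed = False
        |         for u, v in edges:
        |             lo = min(label[u], label[v])
        |             if label[u] != lo or label[v] != lo:
        |                 label[u] = label[v] = lo
        |                 changed = True
        | 
        |     print(label)  # [0, 0, 0, 3, 3]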
        
       | cratermoon wrote:
        | I think this paper shows that Amdahl's Law[1] is just as
       | relevant when discussing distributed systems as it is when
       | talking about multi-core single machines.
       | 
       | 1. https://en.wikipedia.org/wiki/Amdahl%27s_law
        
         | chubot wrote:
         | It's related to Amdahl's law but not identical. As I understand
         | it, Amdahl's law is talking about the situation where you have
         | say 10 units of work, and 5 of them can be parallelized, and 5
         | can't and must be run serially.
         | 
         | The COST paper is talking about the situation where you have 10
         | units of work, but it takes 10 more units of work to distribute
         | it across multiple computers.
         | 
         | Sometimes that's a win and sometimes it isn't, thus the phrase
         | "parallelizing your overhead". If all you did was parallelize
         | the additional overhead, then you didn't gain anything by using
         | a distributed system.
         | 
         | IOW, the overhead for distribution in distributed systems can
         | be very large. It's still there in parallel computing, but
         | shared memory makes it less pronounced. (Although NUMA makes
         | your single computer like a distributed system again.)
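          | 
          | A back-of-the-envelope sketch of the two models (the
          | fractions and overheads below are invented):
          | 
          |     # Amdahl: fixed work, a fraction s of it is inherently serial.
          |     def amdahl_speedup(s, n):
          |         return 1.0 / (s + (1.0 - s) / n)
          | 
          |     # COST-style view: distribution adds overhead on top of the
          |     # original work, and then everything is parallelized.
          |     def speedup_with_overhead(work, overhead, n):
          |         return work / ((work + overhead) / n)
          | 
          |     print(amdahl_speedup(0.5, 8))            # ~1.8x: serial half dominates
          |     print(speedup_with_overhead(10, 10, 2))  # 1.0x: parallelized the overhead
          |     print(speedup_with_overhead(10, 10, 8))  # 4.0x: wins with enough machines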
        
           | cratermoon wrote:
           | > it takes 10 more units of work to distribute it across
           | multiple computers.
           | 
           | That's just serialized work in a different form: in order for
           | a computer to do a unit of work, it must be sent a unit of
           | work. That's strictly serial -- a computer can't process a
           | unit of work it doesn't have. So no matter how many units of
           | work it takes to distribute the load, that's still 10 serial
           | operations.
           | 
           | Amdahl wasn't only seeing things as separate chunks, with the
           | parallelized work over one part of the code, and the serial
           | work in a different part. The operations can be interleaved.
        
         | benlivengood wrote:
          | I think the more important factor is the actual cost of
          | communication. Networks are slow: 100 Gbit/s is still a
          | couple of times slower than the memory controllers in our
          | phones.
         | 
         | https://en.m.wikipedia.org/wiki/Bulk_synchronous_parallel is a
         | useful model for incorporating communication costs into design
         | vs. PRAM.
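          | 
          | A sketch of the standard BSP per-superstep cost, which
          | makes the communication term explicit (parameter values
          | below are invented):
          | 
          |     # BSP superstep cost: local compute + communication + barrier.
          |     # w = max local work, h = max words sent or received per
          |     # processor, g = cost per word, l = barrier latency.
          |     def bsp_superstep_cost(w, h, g, l):
          |         return w + h * g + l
          | 
          |     # On shared memory g and l are tiny; over a network they
          |     # can dominate the local work entirely.
          |     print(bsp_superstep_cost(w=10**6, h=10**4, g=50, l=10**5))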
        
           | cratermoon wrote:
           | BSP is a specific model for parallel computation, but it
           | still operates on the fundamentals of Amdahl's Law, and the
            | whole reason it exists is to find the optimal
            | divide-and-conquer solution given a set of real-world
            | limitations.
           | 
           | One might even say that the _cost_ of implementing your
            | algorithm with BSP is higher than with PRAM, because of
            | the extra layers. But you can't ignore some of the things
            | that you can in a strict PRAM model, so you have to
            | incorporate those into the cost as well.
           | 
            | Given Gustafson's law, if you have enough data and enough
            | work to do, you can more or less ignore the
            | un-parallelizable fraction by throwing enough processors
            | at the computation (sketched below).
           | 
           | What starts to trip things up at this scale is
           | synchronization.
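            | 
            | For reference, a sketch of Gustafson's scaled speedup
            | (the serial fraction below is arbitrary):
            | 
            |     # Gustafson: grow the problem with the machine; s is the
            |     # serial fraction measured on the scaled run.
            |     def gustafson_speedup(s, n):
            |         return n - s * (n - 1)
            | 
            |     print(gustafson_speedup(0.05, 64))  # ~60.9x with 5% serial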
        
       | emodendroket wrote:
       | Interesting. But breaking stuff up in this way isn't purely a
       | performance optimization either; it's also intended to be a
       | method of organizational management, where someone is clearly
        | responsible for failures in any part of the system, and where
        | the parts of the program that form the contract are clearly
        | delineated.
        
       | nice2meetu wrote:
       | This reminds me of my previous job where we had an optimized
       | single-server solution, whereas the competition scaled
       | horizontally. They couldn't out-scale us at what we did until
       | they got to 20 servers or so, which only the biggest customers
       | needed anyway, and even then they would be really slow. You can
       | achieve some impressive speedups when you really focus on the
       | problem.
       | 
        | Still, it wasn't that helpful from a business perspective,
        | because apparently it is easier to justify a markup on
        | solutions with lots of servers.
        
         | dilyevsky wrote:
          | What happens when your single-server solution is
          | disconnected from the network for whatever reason, or you
          | outgrow the current configuration? Bad times, I bet. Cost of
          | components isn't the only factor here.
        
           | jdc wrote:
           | Isn't this a solved problem? Just use a failover, right?
        
           | tartoran wrote:
            | That's only a problem if there is no contingency plan and
            | no backup server in place. Usually these things are
            | thought out beforehand. And it's not like you're excused
            | from this type of problem when you're running your
            | solution in the cloud: from cloud outages to configuration
            | nightmares, data inconsistencies, and not knowing what is
            | happening because pinpointing issues in a complex
            | infrastructure takes more time, and so on. Sometimes the
            | cloud way is the way to go, but other times it is not
            | justified.
        
       | benlivengood wrote:
        | I wonder whether a lot of these algorithms/libraries are a
        | decade or more old, whether modern CPUs and RAM have caught up
        | with problems that used to be intractable on a single machine,
        | and whether the libraries were optimized for older generations
        | of clusters with different interconnects. Modern CPU packages
        | incorporate a lot of
       | features from world-class supercomputers from a couple decades
       | ago.
       | 
       | In theoretical CS classes there were discussions of the tradeoffs
        | between networked, NUMA, and other fabrics. Analysis of what
        | actually ran fastest, beyond Big O notation, was only touched
        | on briefly, but there is a definite advantage to making
        | problems tractable that otherwise wouldn't be. In the FAANGs
        | it was mostly
       | embarrassingly parallel algorithms with a skin of distributed
       | computing for synchronization/coordination, and so the focus had
       | always been on absolute speed or efficiency.
        
       | Chris_Newton wrote:
       | (2015), but still a paper I'd encourage developers and technical
       | managers to read as a counterpoint to the cloud advocacy of
       | recent years.
       | 
       | Note to mods: The title is correctly written "COST"; it's short
       | for "Configuration that Outperforms a Single Thread", the central
       | concept being discussed.
        
       ___________________________________________________________________
       (page generated 2021-04-24 23:01 UTC)