       [HN Gopher] Understanding Software Dynamics [book review]
       ___________________________________________________________________
        
       Understanding Software Dynamics [book review]
        
       Author : signa11
       Score  : 32 points
       Date   : 2024-07-08 15:09 UTC (7 hours ago)
        
 (HTM) web link (www.usenix.org)
        
       | eatonphil wrote:
       | A gang of us are reading through this book right now and with
       | fortune, Dick Sites has joined along too. It's quite an
       | interesting book and quite challenging too. I love the
       | performance archaeology Sites has done and also like the emphasis
       | on 1) understanding his stated five fundamental resources (disk,
       | network, cpu, memory, and software critical sections) and 2) how
       | profiling (with hardware performance counters) can be cheap and
       | effective but will only help you out with average performance and
       | not p99 behavior. For that you need tracing.
       | 
       | We're halfway through the book so my takeaways may differ by the
       | end. But it's possibly the most densely packed book I've read.
       | Will definitely require future rereading.
        
         | nsguy wrote:
         | Book looks very interesting.
         | 
         | random nit (obviously not having read the book): I would say
         | p99 behaviour can be captured in profiling, it's just going to
         | be the p99 of the profiles. E.g. if you sample 10,000 stack
         | traces out of your executable, 1% of those are going to be in
         | that p99, sort of by definition. Tracing through requests
         | (e.g.) is useful, but based on my experience I wouldn't make as
         | strong a statement as saying it's the only way of understanding
         | p99 performance.
        
           | eatonphil wrote:
           | Yeah, bad choice of words. Good thing I didn't write the
           | book. :) How about: the long tail of bad behavior can hide
           | behind sampling profilers.
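
A minimal sketch of how a long tail can hide in an aggregated sampling profile. It assumes a made-up workload (990 requests at 10 ms, 10 requests at 50 ms) and an idealized profiler that attributes samples strictly in proportion to time spent; none of these numbers come from the book or the thread.

```python
# Hypothetical workload: 1000 requests, 1% of them five times slower.
NORMAL_MS, SLOW_MS = 10, 50
requests = [NORMAL_MS] * 990 + [SLOW_MS] * 10

# Aggregate view: an idealized sampling profiler sees time, not requests.
total_ms = sum(requests)
tail_extra_ms = sum(r - NORMAL_MS for r in requests if r > NORMAL_MS)
share_of_samples = tail_extra_ms / total_ms  # fraction of samples in the extra tail work

# Per-request view: what a trace (or per-request latency log) would show.
latencies = sorted(requests)
p50 = latencies[len(latencies) // 2]
p99 = latencies[len(latencies) * 99 // 100]

print(f"tail work share of profile samples: {share_of_samples:.1%}")
print(f"p50 = {p50} ms, p99 = {p99} ms")
```

Under these assumptions the extra work in the slow requests is under 4% of all samples, easy to dismiss in a flat profile, while the per-request view shows the p99 at five times the median.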
        
           | pclmulqdq wrote:
           | While this is correct in a literal sense, you are missing
           | something that the profiler tells you about your p99 (and
           | p99.9) tails of the end-to-end system: sources of "slowness"
           | are often correlated in these requests. Some systems I have
           | worked on have p99 times that are built out of a combination
           | of 90th percentile events that you would find in a profiler.
           | In this case, a profiler doesn't tell you about anything
           | being particularly bad.
           | 
           | Profilers also say nothing about queueing, and can very much
           | mislead you if you care about latency specifically.
           | 
           | If your "slowness" is driven by a single function (or made of
           | truly uncorrelated events), you can accurately measure your
           | tails with a profile. If not, a trace will give you
           | meaningfully more information.
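
The point about a p99 built from 90th-percentile events can be sketched with a toy model. The two stages, the 1 ms / 5 ms latencies, and the 10% slow rate are all hypothetical; the grid of 100 requests simply enumerates every combination so the stages are exactly independent.

```python
from itertools import product

FAST_MS, SLOW_MS = 1, 5  # hypothetical per-stage latencies

def stage_latency(slot):
    # Slot 0 out of 10 is slow, so each stage is slow on 10% of calls.
    return SLOW_MS if slot == 0 else FAST_MS

# 100 requests covering every (stage A slot, stage B slot) combination.
requests = [(stage_latency(a), stage_latency(b))
            for a, b in product(range(10), repeat=2)]

# Profiler-style aggregate: total time attributed to each stage.
profile_a = sum(a for a, _ in requests)
profile_b = sum(b for _, b in requests)

# Trace-style view: end-to-end latency per request.
totals = sorted(a + b for a, b in requests)
worst = max(requests, key=sum)
slow_both = sum(1 for a, b in requests if a == SLOW_MS and b == SLOW_MS)

print("profile:", {"stage_a": profile_a, "stage_b": profile_b})
print("worst request (a_ms, b_ms):", worst, "requests slow in both:", slow_both)
print("two slowest totals:", totals[-2:])
```

In this model the aggregate splits evenly between the two stages and neither looks bad, yet the single slowest request in 100 is the one where both stages hit their ordinary one-in-ten slow case. Only the per-request view shows that combination.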
        
             | nsguy wrote:
             | Sure. The profiler is going to give you information related
             | to what it is looking at. If your bottleneck is disk I/O
             | then you need to look at disk I/O. If your bottleneck is
             | some other mechanism that's not purely cycles then you need
             | to look at the relevant metrics.
             | 
             | Your slowness is always a function of the underlying
             | building blocks, their performance distribution and
             | bottleneck. And sure, two 90th percentiles can make for the
             | 99th percentile. A profiler won't magically convey the
             | information about what sequence of operations a request is
             | doing under the hood.
             | 
             | I agree that having visibility into the requests via
             | tracing can help zoom in on the problem. But so can having
             | metrics on the underlying systems, e.g. if you have a queue
             | in your system you could look at the performance of that
             | queue.
             | 
             | I'll admit that most of my experience is tuning for maximal
             | throughput rather than a given percentile. As a rule of
             | thumb, systems with high performance/throughput yield a much
             | flatter distribution of latencies at a given workload. I
             | also tend to think about my "budget" in the various parts of
             | the system to get to my desired performance characteristics,
             | a luxury you don't have on "legacy" systems where you need
             | to troubleshoot some behaviour. There, tracing also lets you
             | get a "cut" through the stack that shows you what's going
             | on.
        
       | pclmulqdq wrote:
       | I read through this book last year, and it was a very good book.
       | I loved the first half, and it hits on something that I tell
       | other engineers about performance, which is that while your
       | performance tool will tell you the truth, it will tell you the
       | truth about a very narrow question. That means that you should be
       | using it as a hypothesis testing tool, which means developing the
       | hypothesis _first_ based on an understanding of how the computer
       | works. I'm pretty sure it's the first half, on how to think
       | about software to generate these hypotheses, that sells the book.
       | 
       | However, the last third/half lost me. It primarily discusses (and
       | advertises) the usage of one tracing tool that the author built.
       | All performance tools, particularly the tracing tools which tend
       | to be very heavy, have strengths and weaknesses, and you are
       | going to need to mix your tools if you want to really understand
       | things.
       | 
       | It's well worth the money and the time for the first half of the
       | book, though.
        
       | CalChris wrote:
       | TL;DR? Well his article _Benchmarking "Hello, World!"_ develops a
       | lot of the ideas which show up in his book.
       | 
       | https://queue.acm.org/detail.cfm?id=3291278
        