[HN Gopher] Understanding Software Dynamics [book review]
___________________________________________________________________
Understanding Software Dynamics [book review]
Author : signa11
Score : 32 points
Date : 2024-07-08 15:09 UTC (7 hours ago)
(HTM) web link (www.usenix.org)
(TXT) w3m dump (www.usenix.org)
| eatonphil wrote:
| A gang of us are reading through this book right now and,
| fortunately, Dick Sites has joined in too. It's quite an
| interesting book and quite challenging too. I love the
| performance archaeology Sites has done and also like the emphasis
| on 1) understanding his stated five fundamental resources (disk,
| network, cpu, memory, and software critical sections) and 2) how
| profiling (with hardware performance counters) can be cheap and
| effective but will only help you out with average performance and
| not p99 behavior. For that you need tracing.
|
| We're halfway through the book so my takeaways may differ by the
| end. But it's possibly the most densely packed book I've read.
| Will definitely require future rereading.
| nsguy wrote:
| Book looks very interesting.
|
| random nit (obviously not having read the book): I would say
| p99 behaviour can be captured in profiling, it's just going to
| be the p99 of the profiles. E.g. if you sample 10,000 stack
| traces out of your executable, 1% of those are going to be in
| that p99, sort of by definition. Tracing through requests
| (e.g.) is useful, but based on my experience I wouldn't make as
| strong a statement as saying that it's the only way of
| understanding p99 performance.
| eatonphil wrote:
| Yeah, bad choice of words. Good thing I didn't write the
| book. :) How about: the long tail of bad behavior can hide
| behind sampling profilers.
| pclmulqdq wrote:
| While this is correct in a literal sense, you are missing
| something that the profiler tells you about your p99 (and
| p99.9) tails of the end-to-end system: sources of "slowness"
| are often correlated in these requests. Some systems I have
| worked on have p99 times that are built out of a combination
| of 90th percentile events that you would find in a profiler.
| In this case, a profiler doesn't tell you about anything
| being particularly bad.
|
| Profilers also say nothing about queueing, and can very much
| mislead you if you care about latency specifically.
|
| If your "slowness" is driven by a single function (or made of
| truly uncorrelated events), you can accurately measure your
| tails with a profile. If not, a trace will give you
| meaningfully more information.
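
The point above about tail requests being built from several individually unremarkable slow events can be illustrated with a small simulation. This is only a sketch under assumptions that are not from the thread: three hypothetical stages (`parse`, `lookup`, `render`), each independently taking 10 ms instead of 1 ms about 10% of the time. The aggregate per-stage shares stand in for what a sampling profiler would roughly report; the per-request records stand in for a trace.

```python
import random

# All names and numbers here are illustrative assumptions.
FAST_MS, SLOW_MS, P_SLOW = 1, 10, 0.10
STAGES = ("parse", "lookup", "render")  # hypothetical stage names


def simulate(n_requests, seed=0):
    """Return one dict per request mapping stage name -> time in ms."""
    rng = random.Random(seed)
    return [
        {s: SLOW_MS if rng.random() < P_SLOW else FAST_MS for s in STAGES}
        for _ in range(n_requests)
    ]


def percentile(values, q):
    """Nearest-rank style percentile on a sorted copy of values."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]


def profile_view(traces):
    """Share of total time per stage, roughly what a profiler aggregates."""
    total = sum(sum(t.values()) for t in traces)
    return {s: sum(t[s] for t in traces) / total for s in STAGES}


def trace_view(traces, q=0.99):
    """End-to-end cutoff at quantile q and the fewest slow stages seen
    in any request at or above that cutoff."""
    cutoff = percentile([sum(t.values()) for t in traces], q)
    tail = [t for t in traces if sum(t.values()) >= cutoff]
    slow_counts = [sum(1 for v in t.values() if v == SLOW_MS) for t in tail]
    return cutoff, min(slow_counts)


traces = simulate(10_000)
shares = profile_view(traces)
cutoff, min_slow = trace_view(traces)
print("time share per stage:", shares)
print("end-to-end p99 (ms):", cutoff)
print("fewest slow stages in a tail request:", min_slow)
```

In this toy model the aggregate view shows three stages with roughly equal shares and nothing that stands out, while the per-request view shows that every request in the tail had at least two stages go slow at once. Real systems add correlation between stages, which this sketch deliberately leaves out.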
| nsguy wrote:
| Sure. The profiler is going to give you information related
| to what it is looking at. If your bottleneck is disk I/O
| then you need to look at disk I/O. If your bottleneck is
| some other mechanism that's not purely cycles then you need
| to look at the relevant metrics.
|
| Your slowness is always a function of the underlying
| building blocks, their performance distribution and
| bottleneck. And sure, two 90th percentiles can make for the
| 99th percentile. A profiler won't magically convey the
| information about what sequence of operations a request is
| doing under the hood.
|
| I agree that having visibility into the requests via
| tracing can help zoom in on the problem. But so can having
| metrics on the underlying systems, e.g. if you have a queue
| in your system you could look at the performance of that
| queue.
|
| I'll admit that most of my experience is in tuning for maximal
| throughput rather than for a given percentile. As a rule of
| thumb, systems with high performance/throughput yield a much
| flatter distribution of latencies at a given workload. I also
| tend to think about my "budget" in the various parts of the
| system to get to my desired performance characteristics. That
| is a luxury you don't have on "legacy" systems whose behaviour
| you need to troubleshoot, where tracing also lets you get a
| "cut" through the stack that shows you what's going on.
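
The queueing point raised in this subthread can be made concrete with a tiny single-server FIFO model. It is a sketch, not anything taken from the book or the thread: the burst size, interval, and service time are made-up numbers, and `fifo_latencies` is a hypothetical helper. Every request costs the same CPU time, so a CPU profile would look identical for all of them, yet latency varies with position in the burst.

```python
def fifo_latencies(arrivals_ms, service_ms):
    """Single-server FIFO queue.

    Returns a list of (wait_ms, latency_ms) per request, where latency
    is queue wait plus service time. Arrivals must be sorted by time.
    """
    server_free_at = 0
    results = []
    for arrival in arrivals_ms:
        start = max(arrival, server_free_at)  # wait if the server is busy
        server_free_at = start + service_ms
        results.append((start - arrival, server_free_at - arrival))
    return results


# Assumed workload: bursts of 5 requests every 20 ms, 2 ms of work each,
# so the server is busy only half the time.
SERVICE_MS = 2
arrivals = [burst * 20 for burst in range(100) for _ in range(5)]

results = fifo_latencies(arrivals, SERVICE_MS)
waits = [w for w, _ in results]
latencies = [l for _, l in results]

print("service time per request (ms):", SERVICE_MS)
print("mean queue wait (ms):", sum(waits) / len(waits))
print("max latency (ms):", max(latencies))
```

With these assumed numbers the server is idle half the time, but the last request in each burst waits four times its own service time. That waiting shows up in a metric on the queue or in a per-request trace, not in where CPU cycles were spent.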
| pclmulqdq wrote:
| I read through this book last year, and it was a very good book.
| I loved the first half, and it hits on something that I tell
| other engineers about performance, which is that while your
| performance tool will tell you the truth, it will tell you the
| truth about a very narrow question. That means that you should be
| using it as a hypothesis testing tool, which means developing the
| hypothesis _first_ based on an understanding of how the computer
| works. I'm pretty sure it's the first half, on how to think
| about software to generate these hypotheses, that sells the book.
|
| However, the last third/half lost me. It primarily discusses (and
| advertises) the usage of one tracing tool that the author built.
| All performance tools, particularly the tracing tools which tend
| to be very heavy, have strengths and weaknesses, and you are
| going to need to mix your tools if you want to really understand
| things.
|
| It's well worth the money and the time for the first half of the
| book, though.
| CalChris wrote:
| TL;DR? Well, his article _Benchmarking "Hello, World!"_ develops a
| lot of the ideas which show up in his book.
|
| https://queue.acm.org/detail.cfm?id=3291278
___________________________________________________________________
(page generated 2024-07-08 23:01 UTC)