https://www.usenix.org/publications/loginonline/understanding-software-dynamics

;login: Online

Understanding Software Dynamics
by Dick Sites

March 3, 2022
Book review
Authors: Rik Farrow
Article shepherded by: Rik Farrow

I started reading this book in December, and am still reading it as of March 2022. I needed that much time as there is a lot to digest in Sites' book. Also, I've enjoyed reading it, and like other books I enjoy reading, I often put it down when I've finished a section I want to spend more time thinking about.

While you might think that a book with this title would only be important to programmers, its audience should be a lot wider. SREs, operating systems designers, realtime systems designers and hardware designers will all find much useful information in this book. The author's focus is on uncovering the subtle causes of long-tail latencies, but there is much more to learn here.

The book is divided into four parts.
The first two parts, more than half the book, explain measurement and observation: the tools and techniques needed to understand the design of KUTrace, while also providing great advice for SREs and programmers.

In the first seven chapters, Sites demonstrates the importance of measuring the four major components of computer systems: CPU, memory, disks/SSDs, and network. He includes Jeff Dean's famous chart depicting the approximate time for completing various system activities, such as reading from L1 cache, reading from main memory on a cache miss, or reading from a disk. Sites adds to this a column providing the order of magnitude for each of the times given.

Sites strongly encourages readers to estimate how long their systems should take to complete transactions. He starts with simple arithmetic program examples, running long loops so that operations requiring nanoseconds take long enough to measure easily, then points out where compiler optimizations will completely wipe out loops that do nothing further with their variables. In chapter five, he starts with a matrix operation, one that is memory bound, and shows how interference with how cache lines get chosen slows down the matrix multiplication. In the end, Sites reorganizes how the data is accessed, improving performance by an order of magnitude. He then challenges readers, as an exercise, to improve his sample program and squeeze out another 20% performance gain.

In Part two, observations, Sites provides clear information about how to log, collect, and display information. Following his theme of measurement, he points out how best to log data so as to avoid slowing down the very systems you need to observe. The focus is on being able to instrument systems in production, where the maximum acceptable slowdown is 1% or less. Doing so depends on how data is collected, how often, and how it is stored, always with a strict focus on usability and efficiency.
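
The 1% budget mentioned above lends itself to a quick back-of-the-envelope check. The sketch below is not taken from the book. It assumes a hypothetical fixed cost per recorded event and asks how many events per second one CPU can log while staying within the budget. The function names and the 50 ns per-event cost are illustrative assumptions, not KUTrace measurements.

```python
NS_PER_SECOND = 1_000_000_000


def max_events_per_second(cost_ns_per_event: float,
                          budget_fraction: float = 0.01) -> float:
    """Highest per-CPU event rate that keeps logging within the budget.

    cost_ns_per_event is a hypothetical, fixed cost of recording one event.
    budget_fraction is the acceptable slowdown (0.01 means 1%).
    """
    if cost_ns_per_event <= 0:
        raise ValueError("cost_ns_per_event must be positive")
    budget_ns_per_second = budget_fraction * NS_PER_SECOND
    return budget_ns_per_second / cost_ns_per_event


def overhead_fraction(events_per_second: float,
                      cost_ns_per_event: float) -> float:
    """Fraction of each CPU-second spent recording events."""
    return events_per_second * cost_ns_per_event / NS_PER_SECOND


if __name__ == "__main__":
    # Assumed cost of 50 ns per event; substitute a measured value.
    rate = max_events_per_second(50)
    print(f"max rate within a 1% budget: {rate:,.0f} events/s per CPU")
    print(f"overhead at that rate: {overhead_fraction(rate, 50):.2%}")
```

Under these assumptions the arithmetic runs in both directions: a measured per-event cost gives the highest sustainable event rate, and an expected event rate gives the per-event cost the logging code has to meet.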
He goes as far as describing how data best gets used in dashboards. While this might seem to have strayed far from programming, Sites points out that you need the ability to accurately measure the systems you are observing, and that monitoring which distorts what you are measuring is useless for finding the source of long-tail latency. Profiling, for example, can show you where your code executes most often based on timer interrupts, yet miss the rare occasions that cause the very long tail latency you are striving to uncover.

Part three describes the design of KUTrace, kernel-user trace. KUTrace is a complete tool chain that includes kernel patches, a kernel module, and tools for converting the millions of data points into comprehensible figures. The toolchain takes the trace output, inserts log observations, adjusts for time skew between systems, converts the result into JSON, and creates an SVG and HTML page that displays the data in useful form in a browser.

Part four provides examples of using KUTrace. You can get a feel for these chapters by reading Sites' June 2020 ;login: article that incorporates examples from the book (https://www.usenix.org/system/files/login/articles/login_summer20_05_sit...). This final section, entitled Reasoning, covers execution, slow instruction execution, waiting for CPU, memory, disk, network, software locks, queues, and timers.

As in other places in the book, I found words of wisdom here that anyone interested in improving the performance of software services can learn from. When considering whether to run multiple instances of the same program, Sites writes: "Mixing programs that run well against themselves likely will encounter little if any interference." That was in a chapter examining interference between compute-bound programs. It appears obvious, but so do a lot of things you can read in this book.
And they aren't really obvious until they have been explained.

The HTML files created using the toolchain contain an incredible amount of information. The diagrams use symbols, text labels, 256 colors, and even Morse code, and it takes practice to make sense out of what you are seeing. The illustrations in both the print and electronic versions of the book are in color, but I sometimes need to magnify figures so I can see the details I was missing. For example, small, pointed triangles indicating the IPC at different points overlapped so closely that I couldn't make them out without magnification. Younger eyes may not have any trouble with the illustrations. When working with the HTML files in a browser, you can zoom in, mark areas of interest, as well as select other ways to display the data.

In the Preface, Sites mentions that he got many helpful suggestions while teaching graduate-level courses after retiring from Google. Unpacking this a bit, you can imagine that this is a book for graduate students and advanced, professional programmers, written by an older man who worked at Google for many years. I think that any senior CS student or professional can benefit by reading this book. While all the material in the first half of the book leads up to the use of KUTrace, the first two parts are worth reading on their own by anyone who wants to better understand the systems they are building and using.

Understanding Software Dynamics
by Richard L. Sites
Addison-Wesley, 2022, 465 pages
ISBN-13: 978-0-13-758973-9

Article Categories: Operating Systems, Programming, Hardware
Last updated February 8, 2023

Authors:
Rik Farrow has been a consultant for 40 years. He has written two books, as well as worked as the technical editor for a UNIX magazine and for two editions of a popular operating system book.
He also taught UNIX system administration and Internet security during the 90s internationally, and worked as a volunteer for USENIX program and steering committees. Rik has been the editor of ;login: since 2005. rik@rikfarrow.com