https://www.computerenhance.com/p/table-of-contents

Computer, Enhance!

Table of Contents
Every entry in every series, listed for quick navigation.
Casey Muratori
Jan 27, 2023

Performance-Aware Programming Series

This series is designed for programmers who know how to write programs, but don't know how hardware runs those programs. It's designed to bring you up to speed on how modern CPUs work, how to estimate the expected speed of performance-critical code, and the basic optimization techniques every programmer should know.

The course is broken into parts, with the first part (the "prologue") being strictly a demonstration with no associated homework. Later parts feature weekly homework.

Q&A session videos are posted every Monday. If you have a question you'd like answered, please put it in the comments of the most recent Q&A video.

Homework listings are available from github.

Prologue: The Five Multipliers (3 1/2 hours, no homework)

This part of the course gives simple demonstrations of how seemingly minor code changes can produce dramatically different software performance, even for very simple operations.

1. Welcome to the Performance-Aware Programming Series! (22:05)
2. Waste (32:56)
3. Instructions Per Clock (25:05)
4. Single Instruction, Multiple Data (35:31)
5. Caching (22:55)
6. Multithreading (32:11)
7. Python Revisited (36:22)

Interlude (1 hour, no homework)

1. The Haversine Distance Problem (30:28)
2. "Clean" Code, Horrible Performance (22:40)

Part 1: Reading ASM (7 hours, plus homework)

This part of the course is designed to ensure that everyone taking the course has a solid understanding of how a CPU works at the assembly-language level.

1. Instruction Decoding on the 8086 (28:28)
2. Decoding Multiple Instructions and Suffixes (43:51)
3. Opcode Patterns in 8086 Arithmetic (20:01)
4. 8086 Decoder Code Review (1:17:49)
5. Using the Reference Decoder as a Shared Library (8:48)
6. Simulating Non-memory MOVs (18:00)
7. Simulating ADD, SUB, and CMP (25:56)
8. Simulating Conditional Jumps (19:41)
9. Simulating Memory (26:32)
10. Simulating Real Programs (16:02)
11. Other Common Instructions (19:43)
12. The Stack (26:58)
13. Estimating Cycles (23:56)
14. From 8086 to x64 (26:21)
15. 8086 Simulation Code Review (33:05)

Part 2: Basic Profiling (4 hours, plus homework)

In this part of the course, we learn about how to measure time, and instrument programs to automatically determine where time is being spent.

1. Generating Haversine Input JSON (15:40)
2. Writing a Simple Haversine Distance Processor (12:09)
3. Initial Haversine Processor Code Review (29:22)
4. Introduction to RDTSC (48:05)
5. How does QueryPerformanceCounter measure time? (31:43)
6. Instrumentation-Based Profiling (18:01)
7. Profiling Nested Blocks (26:12)
8. Profiling Recursive Blocks (30:44)
9. A First Look at Profiling Overhead (18:37)
10. Comparing the Overhead of RDTSC and QueryPerformanceCounter (13:00)

Part 3: Moving Data (currently in progress)

Using our knowledge from parts 1 and 2, in Part 3 we look at how data moves into the CPU, and how to estimate the upper performance limits of our software imposed by the need to move data.

1. Measuring Data Throughput (21:54)
2. Repetition Testing (27:57)
3. Monitoring OS Performance Counters (20:25)
4. Page Faults (38:52)
   1. Probing OS Page Fault Behavior* (33:05)
   2. Four-Level Paging* (31:23)
   3. Analyzing Page Fault Anomalies* (31:44)
   4. Powerful Page Mapping Techniques* (39:20)
   5. Faster Reads with Large Page Allocations (25:52)
   6. Memory-Mapped Files* (20:46)
5. Inspecting Loop Assembly (32:31)
6. Intuiting Latency and Throughput (22:57)
7. Analyzing Dependency Chains (29:06)
8. Linking Directly to ASM for Experimentation (48:07)
9. CPU Front End Basics (31:09)
10. Branch Prediction (42:03)
11. Code Alignment (32:03)
12. The RAT and the Register File (45:21)
13. Execution Ports and the Scheduler (34:51)
14. Increasing Read Bandwidth with SIMD Instructions (37:52)
15. Cache Size and Bandwidth Testing (34:00)
16. Non-Power-of-Two Cache Size Testing (35:15)

* Entries with an asterisk were "bonus" entries that can be skipped.

Part 3 is still in progress - more videos will be added here as they are scheduled. Additional parts will follow after Part 3 is complete.

1994 Internship Interview Series

1. The Four Programming Questions from My 1994 Microsoft Internship Interview (19:02)
2. Question #1: Rectangle Copy (24:50)
3. Question #2: String Copy (14:50)
4. Question #3: Flood Fill Detection (23:58)
5. Question #4: Outline a Circle (1:09:01)

Comments

Daniel V, Jan 28, 2023 (liked by Casey Muratori)
I became a paid subscriber solely for this course. I am SUPER stoked!!

Max, Jan 27, 2023 (liked by Casey Muratori)
Can't wait for this!

(c) 2024 Casey Muratori