[HN Gopher] Linux Performance Analysis (2015)
___________________________________________________________________
Linux Performance Analysis (2015)
Author : benjacksondev
Score : 163 points
Date : 2025-07-29 13:15 UTC (9 hours ago)
(HTM) web link (netflixtechblog.com)
(TXT) w3m dump (netflixtechblog.com)
| emmelaich wrote:
| Nice list. sar/sysstat is underrated imho.
| mmh0000 wrote:
| Oh man. There's a blast from the past.
|
| Today, you'd want something like:
|
| Prometheus + Node Exporter [1]
|
| [1] https://github.com/prometheus/node_exporter
| mortar wrote:
| 2015
|
| Previous discussions:
| https://news.ycombinator.com/item?id=10654681
| https://news.ycombinator.com/item?id=10652076
| danieldk wrote:
| Yeah, I skipped the date and then saw Linux 3.13 in the
| examples.
| whalesalad wrote:
| I quite like `iotop` as an alternative to iostat.
| https://linux.die.net/man/1/iotop
| CodeCompost wrote:
| > At Netflix we have a massive EC2 Linux cloud
|
| Wait a minute. I thought Netflix famously ran FreeBSD.
| craftkiller wrote:
| My understanding was their CDN ran on FreeBSD, but not their
| API servers. But I don't work for Netflix.
| diab0lic wrote:
| Your understanding is correct.
| achierius wrote:
| Why did they not choose to use it for both (or neither)?
| I.e., what reasons for using FreeBSD on CDN servers would
| not also apply to using them for API servers?
| seabrookmx wrote:
| They are extremely different workloads so.. everything?
|
| The CDN servers are basically appliances, and are often
| embedded in various data centers (including those run by
| ISPs) to aggressively cache content. They care about
| high throughput and run a single workload. Being able to
| fine tune the entire stack, right down to the TCP/IP
| implementation is very valuable in this case. Since they
| ship the hardware and software, they can tightly
| integrate the two.
|
| By contrast, API workloads are very heterogeneous. I'd
| have to imagine the ability to run any standard Linux
| software there would also be a big plus. Linux clearly
| has much more vetting on cloud providers than FreeBSD as
| well.
| aflag wrote:
| Can't you fine tune linux as well? Does FreeBSD perform
| better somehow on a CDN workload? I find it difficult to
| imagine that the reason is performance. But I don't know
| what the reason is.
| craftkiller wrote:
| Netflix discusses their reasons starting at 18:20:
| https://www.youtube.com/watch?v=veQwkG0WdN8&t=18m20s
|
| tl;dw: the performance, the efficiency of development,
| the community, FreeBSD is a complete operating system,
| the code base is smaller, the ports system, and the
| license.
|
| and this video covers the optimizations Netflix has made
| to FreeBSD: https://www.youtube.com/watch?v=36qZYL5RlgY
|
| Also potentially a reason: According to drewg123, Linux's
| kTLS was broken. Which I see drewg123 also commenting in
| this thread. Is he the "Drew on my team" mentioned in the
| first video? Is he the speaker in the 2nd video? Idk
| https://news.ycombinator.com/item?id=28585008
| drewg123 wrote:
| The CDN runs FreeBSD. Linux is used for nearly everything else.
| __turbobrew__ wrote:
| If you like this post, I would recommend "BPF Performance Tools"
| and "Systems Performance: Enterprise and the Cloud" by Brendan
| Gregg.
|
| I have pulled out a few miracles using these tools (identifying
| kernel bottlenecks or profiling programs using ebpf) and it has
| been well worth the investment to read through the books.
| yankcrime wrote:
| Agreed, highly recommended reading. A slightly more up-to-date
| post of his which recommends tools in such situations is:
| https://www.brendangregg.com/blog/2024-03-24/linux-crisis-to...
| wcunning wrote:
| Literally did miracles at my last job with the first book and
| that got me my current job, where I also did some impressive
| proving which libraries had what performance with it again...
| Seriously valuable stuff.
| __turbobrew__ wrote:
| Yea it is kind of cheating. I was helping someone debug why
| their workload was soft locking. I ran the profiling tools
| and found that cgroup accounting for the workload was taking
| nearly all the cpu time on locks. From searches through linux
| git logs I found that cgroup accounting in older kernels had
| global locks. I saw that newer kernels didn't have this, so
| we moved to a newer kernel and all the issues went away.
|
| People thought I was a wizard lol.
| babuloseo wrote:
| he forgot about rusttop
| AnyTimeTraveler wrote:
| I'm pretty sure that that didn't exist in 2015 ;)
| janvdberg wrote:
| My first command is always 'w'. And I always urge young engineers
| to do the same.
|
| There is no shorter command to show uptime, load averages (1/5/15
| minutes), logged in users. Essential for quick system health
| checks!
| Propelloni wrote:
| Me too! So much so that I add it to my .bashrc everywhere.
| mmh0000 wrote:
| It should also be mentioned, Linux Load Average is a complex
| beast[1]. However, a general rule of thumb that works for most
| environments is:
|
| You always want the load average to be less than the total
| number of CPU cores. If higher, you're likely experiencing a
| lot of waits and context switching.
|
| [1] https://www.brendangregg.com/blog/2017-08-08/linux-load-aver...
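The rule of thumb above can be sketched in a few lines of Python. This is a minimal illustration, not a diagnostic tool: the helper names `load_per_core` and `is_overloaded` are hypothetical, and the 1.0 threshold is the commenter's heuristic. On Linux the load average also counts tasks in uninterruptible sleep (as the linked post explains), so a high ratio does not necessarily mean CPU saturation.

```python
import os


def load_per_core(load_1m: float, cores: int) -> float:
    """Return the 1-minute load average divided by the number of cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_1m / cores


def is_overloaded(load_1m: float, cores: int) -> bool:
    """Rule of thumb only: load above the core count suggests queuing."""
    return load_per_core(load_1m, cores) > 1.0


if __name__ == "__main__":
    # os.getloadavg() returns the same 1/5/15-minute values that `w` prints.
    load_1m, load_5m, load_15m = os.getloadavg()
    cores = os.cpu_count() or 1
    ratio = load_per_core(load_1m, cores)
    print(f"load1={load_1m:.2f} cores={cores} ratio={ratio:.2f}")
    print("above threshold" if is_overloaded(load_1m, cores) else "within threshold")
```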
| chasil wrote:
| Glances is nice. I think it is a clone of HP-UX Glance.
|
| https://nicolargo.github.io/glances/
|
| I have also hacked basic top to add database login details to
| server processes.
| louwrentius wrote:
| The iostat command has always been important to observe HDD/SSD
| latency numbers.
|
| Especially SSDs are treated like _magic_ storage devices with
| _infinite_ IOPS at Planck-scale latency.
|
| Until you discover that SSDs that can do 10GB/s don't do nearly
| so well (not even close) when you access them in a single thread
| with random IOPS, with queue depth of 1.
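A back-of-the-envelope sketch of why queue depth 1 hurts: with only one request in flight, IOPS cannot exceed 1 / latency, whatever the drive's headline bandwidth. The function names are hypothetical, and the 100 µs latency and 4 KiB block size in the usage lines are assumed illustrative figures, not measurements of any drive.

```python
def qd1_iops(latency_us: float) -> float:
    """Upper bound on IOPS when exactly one request is in flight at a time."""
    if latency_us <= 0:
        raise ValueError("latency_us must be positive")
    return 1_000_000 / latency_us


def qd1_throughput_mb_s(latency_us: float, block_bytes: int) -> float:
    """Throughput in MB/s (10**6 bytes) implied by the QD1 IOPS bound."""
    return qd1_iops(latency_us) * block_bytes / 1_000_000


if __name__ == "__main__":
    # Assumed example values: 100 microseconds per random read, 4 KiB blocks.
    latency_us, block_bytes = 100.0, 4096
    print(f"{qd1_iops(latency_us):.0f} IOPS")
    print(f"{qd1_throughput_mb_s(latency_us, block_bytes):.2f} MB/s")
```

Under those assumed numbers the bound is 10,000 IOPS and roughly 41 MB/s, far below a 10 GB/s sequential rating.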
| wcunning wrote:
| That's where you start down the eBPF rabbit hole with
| bcc/biolatency and other block device histogram tools. Further,
| the cache hit rate and block size behavior of the SSD/NVME
| drive can really affect things if, say, your autonomous vehicle
| logging service uses MCAP with a chunk size much smaller than a
| drive block... Ask me how I know
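For readers who have not seen a block-latency histogram, here is a small sketch of the power-of-two bucketing such tools display (ranges like 0-1, 2-3, 4-7, 8-15). It only mimics the bucketing in user space; the real bcc tool gathers latencies in the kernel via eBPF. The function names and the sample latencies are made up for illustration.

```python
from collections import Counter


def log2_bucket(latency_us: int) -> tuple:
    """Return the (low, high) power-of-two bucket that contains latency_us."""
    if latency_us < 0:
        raise ValueError("latency_us must be non-negative")
    if latency_us < 2:
        return (0, 1)
    low = 1 << (latency_us.bit_length() - 1)
    return (low, 2 * low - 1)


def histogram(latencies_us) -> dict:
    """Count samples per bucket, ordered by bucket lower bound."""
    counts = Counter(log2_bucket(v) for v in latencies_us)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    samples = [1, 3, 5, 6, 100, 120]  # hypothetical latencies in microseconds
    for (low, high), count in histogram(samples).items():
        print(f"{low:>6} -> {high:<6} : {count}")
```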
| rkachowski wrote:
| it's 10 years later - what's the 60 second equivalent in 2025?
| wcunning wrote:
| @yankcrime posted it above:
| https://www.brendangregg.com/blog/2024-03-24/linux-crisis-to...
| BlackLotus89 wrote:
| PSI (pressure stall information) is missing.
|
| I always use a configured (F2) htop (not mentioned as well).
| Always enable PSI information in htop (some red hat systems I
| work with still don't offer them...).
|
| If you have ZFS, enable those meters as well. htop also has an
| I/O tab, use it!
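PSI can also be read without htop from the files under `/proc/pressure/`. The sketch below parses that format; `parse_psi` and `read_psi` are hypothetical helper names. It assumes a kernel built with PSI support (4.20 or later) and enabled at boot, and it falls back to a message when the files cannot be read.

```python
from pathlib import Path


def parse_psi(text: str) -> dict:
    """Parse PSI text into {"some": {"avg10": ...}, "full": {...}}."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        # Each remaining field looks like "avg10=0.00" or "total=12345".
        result[kind] = {k: float(v) for k, v in (p.split("=", 1) for p in pairs)}
    return result


def read_psi(resource: str) -> dict:
    """Read /proc/pressure/<resource>, where resource is cpu, memory or io."""
    return parse_psi(Path(f"/proc/pressure/{resource}").read_text())


if __name__ == "__main__":
    for resource in ("cpu", "memory", "io"):
        try:
            print(resource, read_psi(resource))
        except OSError as exc:
            print(f"{resource}: PSI not available ({exc})")
```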
| ImPostingOnHN wrote:
| Maybe I missed it, but checking available disk space is often a
| good step in diagnosing misbehaving systems.
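A quick way to script the disk space check, as a minimal sketch: it uses the standard library's `shutil.disk_usage` on the root filesystem. The helper `percent_used` is a hypothetical name, and the choice of "/" is an assumption; other mount points would need their own call.

```python
import shutil


def percent_used(total: int, used: int) -> float:
    """Return used space as a percentage of total (0.0 if total is zero)."""
    return 100.0 * used / total if total else 0.0


if __name__ == "__main__":
    # Assumed path: the root filesystem. Pass another mount point as needed.
    usage = shutil.disk_usage("/")
    gib = 1024 ** 3
    print(f"total={usage.total / gib:.1f} GiB free={usage.free / gib:.1f} GiB")
    print(f"used={percent_used(usage.total, usage.used):.1f}%")
```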
| fduran wrote:
| shameless plug: you can practice this in a free VM
| https://docs.sadservers.com/docs/scenario-guides/practical-l...
| (there's a typo there to keep you on your feet)
| ch33zer wrote:
| Almost all of these have been replaced for me with below:
| https://developers.facebook.com/blog/post/2021/09/21/below-t...
|
| It is excellent and contains most things you could need. Downside
| is that it isn't yet a standard tool so you need to get it
| installed across your fleet
| benreesman wrote:
| Oh man nostalgia city. I vividly remember meeting atop time
| travel debugging at 3am in Menlo Park in 2012, wild times.
| 5pl1n73r wrote:
| After this article was written, `free -m` on many systems started
| to have an "available" column that shows the sum of reclaimable
| and free memory. It's nicer than the "-/+" section shown in this
| old article.
|
|   $ free -m
|                 total        used        free      shared  buff/cache   available
|   Mem:           3915        2116        1288          41         769        1799
|   Swap:           974           0         974
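The "available" column comes from the `MemAvailable` field in `/proc/meminfo` (present since kernel 3.14), which is the kernel's estimate of memory usable without swapping. A minimal sketch of reading it directly follows; `parse_meminfo` and `available_mib` are hypothetical helper names.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict:
    """Map /proc/meminfo field names to their integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            values[name.strip()] = int(parts[0])
    return values


def available_mib(meminfo: dict) -> int:
    """Convert MemAvailable from kB to MiB, rounding down."""
    return meminfo["MemAvailable"] // 1024


if __name__ == "__main__":
    info = parse_meminfo(Path("/proc/meminfo").read_text())
    print(f"available: {available_mib(info)} MiB")
```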
| tomhow wrote:
| Previously:
|
| _Linux Performance Analysis in 60,000 Milliseconds_ -
| https://news.ycombinator.com/item?id=10652076 - Nov 2015 (11
| comments)
|
| _Linux Performance Analysis_ -
| https://news.ycombinator.com/item?id=10654681 - Dec 2015 (82
| comments)
|
| _Linux Performance Analysis in 60k Milliseconds (2015) [pdf]_ -
| https://news.ycombinator.com/item?id=44070741 - May 2025 (1
| comment)
___________________________________________________________________
(page generated 2025-07-29 23:01 UTC)