# Goroutines Are Not Significantly Lighter Than Threads

Mar 12, 2021

The most commonly cited drawback of OS-level threads is that they use a lot of RAM. This is not true on Linux. Let's compare the memory footprint of 10_000 Linux threads with that of 10_000 goroutines. We spawn 10k workers, each of which sleeps for about 10 seconds, waking up every 10 milliseconds. Each worker is staggered by a pseudorandom delay of up to 200 milliseconds to avoid the thundering herd problem.

main.rs:

```rust
use std::{thread, time::Duration};

fn main() {
    let mut threads = Vec::new();
    for i in 0u32..10_000 {
        let t = thread::spawn(move || {
            // Stagger the start by a pseudorandom delay of up to 200ms.
            let bad_hash = i.wrapping_mul(2654435761) % 200_000;
            thread::sleep(Duration::from_micros(bad_hash as u64));
            for _ in 0..1000 {
                thread::sleep(Duration::from_millis(10));
            }
        });
        threads.push(t);
    }

    for t in threads {
        t.join().unwrap()
    }
}
```

main.go:

```go
package main

import (
	"sync"
	"time"
)

func main() {
	var wg sync.WaitGroup
	for i := uint32(0); i < 10_000; i++ {
		wg.Add(1)
		// Pass i by value so each goroutine gets its own copy
		// (loop variables were shared across iterations before Go 1.22).
		go func(i uint32) {
			defer wg.Done()
			// Stagger the start by a pseudorandom delay of up to 200ms.
			bad_hash := (i * 2654435761) % 200_000
			time.Sleep(time.Duration(bad_hash) * time.Microsecond)
			for j := 0; j < 1000; j++ {
				time.Sleep(10 * time.Millisecond)
			}
		}(i)
	}
	wg.Wait()
}
```

We use the `time` utility to measure memory usage:

t:

```sh
#!/bin/sh
command time --format 'real %es\nuser %Us\nsys %Ss\nrss %Mk' "$@"
```

The results:

```console
$ rustc main.rs -C opt-level=3 && ./t ./main
real 10.35s
user 4.96s
sys 16.06s
rss 94472k

$ go build main.go && ./t ./main
real 10.92s
user 13.30s
sys 0.55s
rss 34924k
```

A thread is only about three times as large as a goroutine (94472k vs 34924k, a ratio of roughly 2.7). The absolute numbers are also significant: 10k threads require only about 100 megabytes of overhead. If the application does 10k concurrent things, 100mb might be negligible.

---

Note that it would be wrong to use this benchmark to compare the performance of threads and goroutines. The workload is representative for measuring absolute memory overhead, but not for measuring time overhead. That being said, it is possible to explain why the threads need 21 seconds of CPU time while the goroutines need only 14. The Go runtime spawns a thread per CPU core and tries hard to keep each goroutine tied to a specific thread (and, by extension, to a specific CPU). OS threads, by default, migrate between CPUs, which incurs synchronization overhead. Pinning threads to cores in a round-robin fashion removes this overhead (a rough sketch of such pinning appears at the end of this post):

```console
$ cargo build --release && ./t ./target/release/main --pin-to-core
    Finished release [optimized] target(s) in 0.00s
real 10.36s
user 3.01s
sys 9.08s
rss 94856k
```

The total CPU time is now approximately the same, but its distribution is different. On this workload, the goroutine scheduler spends roughly as many cycles in userspace as the thread scheduler spends in the kernel.

Code for the benchmarks is available here: matklad/10k_linux_threads.
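
The `--pin-to-core` path is not reproduced above. As a rough illustration only (the actual implementation lives in the linked repository and may differ), round-robin pinning via `sched_setaffinity` from the `libc` crate could look something like the sketch below; the `pin_to_core` helper is hypothetical:

```rust
use std::{mem, thread, time::Duration};

// Sketch only: pin the calling thread to a single core via sched_setaffinity.
// The real --pin-to-core code in matklad/10k_linux_threads may differ.
fn pin_to_core(core: usize) {
    unsafe {
        let mut set: libc::cpu_set_t = mem::zeroed();
        libc::CPU_SET(core, &mut set);
        // A pid of 0 means "the calling thread"; the return value is ignored here.
        libc::sched_setaffinity(0, mem::size_of::<libc::cpu_set_t>(), &set);
    }
}

fn main() {
    // Number of online cores; threads are assigned to cores round-robin.
    let n_cores = unsafe { libc::sysconf(libc::_SC_NPROCESSORS_ONLN) }.max(1) as usize;

    let mut threads = Vec::new();
    for i in 0u32..10_000 {
        let t = thread::spawn(move || {
            pin_to_core(i as usize % n_cores);
            let bad_hash = i.wrapping_mul(2654435761) % 200_000;
            thread::sleep(Duration::from_micros(bad_hash as u64));
            for _ in 0..1000 {
                thread::sleep(Duration::from_millis(10));
            }
        });
        threads.push(t);
    }
    for t in threads {
        t.join().unwrap();
    }
}
```

Each spawned thread pins itself to core `i % n_cores`, which keeps every worker on one CPU for its whole lifetime, loosely analogous to how the Go runtime keeps goroutines on a fixed set of per-core threads.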