* * * * * Stupid multithreaded benchmarks Dan Lyke ran a stupid benchmark [1]—just incrementing a variable to a billion, one just flat out, and one between some pthreads locking primitives. I wanted to see what stupid results I would get if I used spin locks [2]. First, the relevant bit of code (used nasm [3] to compile it under a 2.6GHz (gigaHertz) dual-core Pentium running Linux 2.6): > bits 32 > global t1 > global t2 > > section .data > gv dd 0 > glock dd 0 > > section .text > > align 16 > t1: mov eax,[gv] > inc eax > mov [gv],eax > cmp eax,1000000000 > jl t1 > > ; ret > > ; the following implements _exit() > ; for the multithreaded version > > xor ebx,ebx > mov eax,252 > int $80 > mov eax,1 > int $80 > hlt > > align 16 > t2: mov al,1 > t2.wait: xchg al,[glock] > or al,al > jne t2.wait > t2.here: mov eax,[gv] > inc eax > mov [gv],eax > mov byte [glock],0 > cmp eax,1000000000 > jl t2 > > ;ret > > xor ebx,ebx > mov eax,252 > int $80 > mov eax,1 > int $80 > hlt > Straightforward implementations here. t1() is the straight through counting routine, while t2() is the one with the spin lock. Running single threaded yielded these results: Table: Counting, single threaded routine time to execute ------------------------------ t1() 2.454s t2() 39.752s While I expected the spin lock to be faster than the pthread locking, I wasn't expecting it to be this slow. But maybe, just maybe, I'll get some of that speed back by running dual threads. At the very least, it should be a bit faster than single core, right? Right? Bueller? [4] Bueller? [5] Table: Counting, dual-threaded routine time to execute ------------------------------ t1() 0m10.334s t2() 2m31.307s Um … Wow. I didn't expect spinlocks to be so expensive. Ouch. [1] http://www.flutterby.com/archives/comments/10723.html [2] gopher://gopher.conman.org/0Phlog:1999/12/08.2 [3] http://nasm.sourceforge.net/ [4] http://www.imdb.com/title/tt0091042/ [5] http://www.idiotsavant.com/ftp/sounds/bueller.wav Email Sean Conner at sean@conman.org .