[HN Gopher] Twenty years of Valgrind
       ___________________________________________________________________
        
       Twenty years of Valgrind
        
       Author : nnethercote
       Score  : 584 points
        Date   : 2022-07-26 22:59 UTC (1 day ago)
        
 (HTM) web link (nnethercote.github.io)
 (TXT) w3m dump (nnethercote.github.io)
        
       | [deleted]
        
       | appleflaxen wrote:
       | What other great tools are there in the vein of valgrind and AFL?
        
         | tux3 wrote:
         | rr, for record and replay
         | 
         | I'm also a fan of systemtap, for when your probing problems
         | push into peeking at the kernel
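          | 
          | A minimal sketch of the rr workflow (./myprog is just a
          | placeholder; rr also needs access to CPU performance
          | counters, so kernel.perf_event_paranoid may need lowering):
          | 
          |   $ rr record ./myprog args   # run once, recording all
          |                               # nondeterministic inputs
          |   $ rr replay                 # replay the recording under gdb
          |   (rr) reverse-continue       # run backwards at will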
        
         | tialaramex wrote:
          | In my obviously biased opinion: very specialised, but
          | sometimes exactly what you need (I have used this in anger
          | maybe 2-3 times in my career since then, which is why I
          | wrote the C version):
         | 
         | https://github.com/tialaramex/leakdice (or
         | https://github.com/tialaramex/leakdice-rust)
         | 
         | Leakdice implements some of Raymond Chen's "The poor man's way
         | of identifying memory leaks" for you. On Linux at least.
         | 
         | https://bytepointer.com/resources/old_new_thing/20050815_224...
         | 
          | All leakdice does is this: you pick a running process that
          | you own, and leakdice picks a random heap page belonging to
          | that process and shows you that page as hex + ASCII.
         | 
         | The Raymond Chen article explains why you might ever want to do
         | this.
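          | 
          | Hypothetically, using it looks something like this (the
          | exact command line is a guess on my part; check the README):
          | 
          |   $ pidof myprog          # hypothetical target process
          |   12345
          |   $ leakdice 12345        # dump one random heap page as
          |                           # hex + ASCII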
        
         | cjbprime wrote:
         | Starting to stretch, but would have to pick strace next. Can't
         | believe macOS devs don't get to use it (at least without hoops
         | like disabling SIP).
        
         | yaantc wrote:
         | Seconding `rr` as suggested by @tux3, it's great for debugging.
         | 
         | Also, the sanitizers for GCC and Clang
         | (https://github.com/google/sanitizers), and the Clang static
         | analyzer (and tidy too) through CodeChecker
         | (https://codechecker.readthedocs.io/).
         | 
         | For the Clang static analyzer, make sure your LLVM toolchain
         | has the Z3 support enabled (OK in Debian stable for example),
         | and enable cross translation units (CTU) analysis too for
         | better results.
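          | 
          | For the sanitizer side, a minimal sketch of the usual
          | compile flags (foo.c and the exact flag mix are
          | placeholders; ASan and UBSan shown here, MSan and TSan need
          | separate builds):
          | 
          |   $ gcc -g -O1 -fno-omit-frame-pointer \
          |         -fsanitize=address,undefined foo.c -o foo
          |   $ ./foo      # reports are printed as the bugs are hit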
        
       | ahartmetz wrote:
       | Valgrind is fantastic.
       | 
       | Memcheck decreases the memory safety problem of C++ by about 80%
       | in my experience - it really is a big deal. The compiler-based
       | tools that require recompiling every library used are a bit
       | impractical for large stacks such as the ones under Qt-based GUI
       | applications. Several libraries, several build systems. But I
       | hear that they are popular for CI systems in large projects such
       | as web browsers, which probably have dedicated CI developers.
       | There are also some IME rare problems that these tools can find
       | that Memcheck can't, which is due to information unavailable in
       | compiled code. Still, Memcheck has the largest coverage by far.
       | 
       | Callgrind and Cachegrind give very precise, repeatable results,
       | complementary to but not replacing perf and AMD / Intel tooling
       | which use hardware performance counters. I tend to use all of
       | them. They all work without recompiling.
        
       | ssrs wrote:
       | ive used valgrind quite extensively. a big thank you to the folks
       | behind this!
        
       | cjbprime wrote:
       | I wish I hadn't read this article because now I know that I've
       | been mispronouncing Valgrind for nearly 20 years but I'm not
       | going to stop.
       | 
       | (Kidding. Thanks for Valgrind! I still use it for assessing
       | memory corruption vulnerabilities along with ASan.)
        
         | hgs3 wrote:
         | It's giving me flashbacks to the hard G vs soft G in gif image
         | format.
        
         | stormbrew wrote:
         | Fwiw I've literally worked with Nicholas (but not on valgrind)
         | and I only learned this today somehow.
        
         | klyrs wrote:
         | I learned of the tool from a native German speaker who
         | pronounced it wall-grinned, which is apparently half-right.
         | Like latex, I can't keep the pronunciation straight from one
         | sentence to the next.
        
         | dtgriscom wrote:
         | I've been promoting proper pronunciation of Valgrind at work,
          | and am making passable progress...
        
           | quickthrower2 wrote:
            | "Valarie smiled" is how I will remember it.
           | 
           | That said I sometimes get the "V" tools mixed up (Vagrant,
           | Valgrind, Varnish)
        
         | koolba wrote:
         | What other ways are there to (mis)pronounce it?
        
           | dahart wrote:
           | There are so many amazing ways! ;)
           | 
           | Since it's an old Norse word, try using Google Translate to
           | hear what happens in Danish, Dutch, German, Icelandic,
           | Norwegian, and Swedish. I don't know if it's a modern word in
           | those languages, but Translate is showing translations
           | "election gate" for several languages, and "fall gravel" for
           | Swedish.
           | 
           | According to the audio pronunciations on Translate...
           | 
           | Danish: "vale grint", long a, hard tapped r, hard d sounds
           | like t
           | 
           | Dutch: sounds like "fall hint" but there's a slight throaty r
           | in there hard to hear for English speakers, so maybe "hrint"
           | 
           | German: "val grinned", val like value, grinned with the
           | normal German r
           | 
           | Icelandic: "vall grint", vall like fall, hard tapped r
           | 
           | Norwegian: "vall grin", hard tapped r, almost "vall g'din",
           | silent or nearly silent d/t at the end.
           | 
           | Swedish: "voll grint / g'dint", hard tapped r, hard d
           | 
           | German is the only one that has "Val" like "value", all the
           | rest sound more like "fall". The word valgrind is the door to
           | Valhalla, which means literally "fall hall", as in hall of
           | the fallen. For that reason, I suspect it makes the most
           | sense to pronounce valgrind like "fall grinned", but Old
           | Norse might have used val like value, I'm not sure.
           | 
           | BTW Valhalla has an equally amusing number of ways to
           | pronounce it across Germanic languages, "val" sometimes turns
           | into what sound like "fell" instead of "fall", and in
           | Icelandic the double ell makes it fall-hat-la.
           | 
           | Languages are cool!
        
           | meowface wrote:
           | Pronouncing the "-grind" like the word "grind". I think
           | that's probably how most English-speakers first assume it's
           | pronounced.
        
             | Buttons840 wrote:
             | Safe to assume many pronounce grind as "grind".
        
         | opan wrote:
         | How do you pronounce it? I hoped it'd be near the start, but
         | several paragraphs in and I'm still not sure.
         | 
         | edit: val as in value + grinned
        
         | dietr1ch wrote:
         | Now you are really on track to mispronounce Valgrind for nearly
         | 21 years :P
        
         | galangalalgol wrote:
          | Our pipelines have ASan (and cppcheck, clang-tidy, Coverity,
          | and coverage stuff) but no Valgrind. Is there something it's
          | good at that we are missing?
        
           | gkfasdfasdf wrote:
            | If your tests can take the performance hit, Valgrind would
            | tell you about uninitialized memory reads, which aren't
            | covered by the tools you mentioned. If, however, you are
            | able to add MSan (i.e. able to rebuild the entire product,
            | including dependencies, with -fsanitize=memory) to the
            | pipeline, then you would have the same coverage as Valgrind.
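            | 
            | For illustration, a minimal (hypothetical) reproducer of
            | the kind of bug in question; Memcheck flags it on the
            | unmodified binary, while MSan needs the -fsanitize=memory
            | rebuild:
            | 
            |   /* uninit.c */
            |   #include <stdio.h>
            |   #include <stdlib.h>
            | 
            |   int main(void) {
            |       int *p = malloc(4 * sizeof *p); /* never written */
            |       if (p && p[2] > 0)    /* branch on uninit value */
            |           puts("positive");
            |       free(p);
            |       return 0;
            |   }
            | 
            |   $ gcc -g uninit.c -o uninit && valgrind ./uninit
            |     (reports something like "Conditional jump or move
            |      depends on uninitialised value(s)")
            |   $ clang -g -fsanitize=memory uninit.c -o uninit && ./uninit
            |     (reports a use-of-uninitialized-value error)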
        
           | cjbprime wrote:
           | The main reason for Valgrind would be if you're working with
           | a binary that you can't recompile to add the ASAN
           | instrumentation.
        
           | Jason_Gibson wrote:
           | ASAN on its own doesn't detect uninitialized memory. MSAN
           | can, though. Valgrind is also more than just the memcheck
           | sub-tool - there are others, like Cachegrind, which is a
           | cache and branch-prediction profiler.
           | 
           | https://github.com/google/sanitizers/wiki/AddressSanitizerCo.
           | .. https://github.com/google/sanitizers/wiki/MemorySanitizer
           | https://valgrind.org/docs/manual/manual.html
        
             | [deleted]
        
           | glouwbug wrote:
            | Yeah, valgrind can report L1/L2 cache misses and the
            | percentage of branch mispredictions. It also reports the
            | exact number of instructions executed, and how many of
            | those instructions missed in the cache. It's great for
            | improving small code that needs to be performant.
           | 
           | I'd use asan over valgrind only for memory leaks. It's
           | faster.
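            | 
            | A sketch of getting those numbers with the cachegrind tool
            | (./myprog is a placeholder); --branch-sim=yes adds the
            | branch-prediction simulation on top of the cache
            | simulation:
            | 
            |   $ valgrind --tool=cachegrind --branch-sim=yes ./myprog
            |   $ cg_annotate cachegrind.out.<pid>   # per-function and
            |                                        # per-line breakdown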
        
             | Sesse__ wrote:
             | If you only want memory leaks, LSan will do that for you.
             | 
             | In general, I tend to use ASan for nearly everything I used
             | Valgrind for back in the day; it's faster and usually more
             | precise (Valgrind cannot reliably detect small overflows
             | between stack variables). Valgrind if I cannot recompile,
              | or if ASan doesn't find the issue. Callgrind and Cachegrind
             | never; perf does a much better job, much faster. DHAT
             | never; Heaptrack gives me what I want.
             | 
             | Valgrind was and is a fantastic tool; it became part of my
             | standard toolkit together with the editor, compiler,
             | debugger and build system. But technology has moved on for
             | me.
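              | 
              | For that leaks-only case, a sketch of a standalone LSan
              | build (leaky.c is a placeholder; this skips ASan's
              | redzone checks and most of its overhead):
              | 
              |   $ clang -g -fsanitize=leak leaky.c -o leaky
              |   $ ./leaky    # leak report is printed at process exit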
        
               | gpderetta wrote:
               | Amen. Between the various sanitizers and perf, I stopped
               | needing valgrind a few years ago.
               | 
               | But when it was the only option it was fantastically
               | useful.
        
             | themulticaster wrote:
             | If I understand correctly valgrind (cachegrind) reports
             | L1/L2 cache misses based on a simulated CPU/cache model.
             | 
             | On Linux, you can easily instrument real cache events using
             | the very powerful perf suite. There is an overwhelming
             | number of events you can instrument (use perf-list(1) to
             | show them), but a simple example could look like this:
              | $ perf stat -d -- sh -c 'find ~ -type f -print | wc -l'
              | ^Csh: Interrupt
              | 
              |  Performance counter stats for 'sh -c find ~ -type f -print | wc -l':
              | 
              |        47,91 msec task-clock            #  0,020 CPUs utilized
              |          599      context-switches      # 12,502 K/sec
              |           81      cpu-migrations        #  1,691 K/sec
              |          569      page-faults           # 11,876 K/sec
              |  185.814.947      cycles                #  3,878 GHz  (28,71%)
              |  105.650.405      instructions          #  0,57 insn per cycle  (46,15%)
              |   22.991.322      branches              # 479,863 M/sec  (46,72%)
              |      643.767      branch-misses         #  2,80% of all branches  (46,14%)
              |   26.010.223      L1-dcache-loads       # 542,871 M/sec  (36,80%)
              |    2.449.173      L1-dcache-load-misses #  9,42% of all L1-dcache accesses  (29,62%)
              |      517.052      LLC-loads             # 10,792 M/sec  (22,53%)
              |      133.152      LLC-load-misses       # 25,75% of all LL-cache accesses  (16,02%)
              | 
              |  2,403975646 seconds time elapsed
              | 
              |  0,005972000 seconds user
              |  0,046268000 seconds sys
             | 
              | Ignore the command, it's just a placeholder to get
              | meaningful values. The -d flag adds basic cache events;
              | by adding another -d you also get load and load-miss
              | events for the dTLB, iTLB and L1i cache.
             | 
             | But as mentioned, you can instrument any event supported by
             | your system. Including very obscure events such as
             | uops_executed.cycles_ge_2_uops_exec (Cycles where at least
             | 2 uops were executed per-thread) or
             | frontend_retired.latency_ge_2_bubbles_ge_2 (Retired
             | instructions that are fetched after an interval where the
             | front-end had at least 2 bubble-slots for a period of 2
             | cycles which was not interrupted by a back-end stall).
             | 
             | You can also record data using perf-record(1) and inspect
             | them using perf-report(1) or - my personal favorite - the
             | Hotspot tool (https://github.com/KDAB/hotspot).
             | 
             | Sorry for hijacking the discussion a little, but I think
             | perf is an awesome little tool and not as widely known as
             | it should be. IMO, when using it as a profiler (perf-
             | record), it is vastly superior to any language-specific
             | built-in profiler. Unfortunately some languages (such as
             | Python or Haskell) are not a good fit for profiling using
             | perf instrumentation as their stack frame model does not
             | quite map to the C model.
        
         | harry8 wrote:
         | I was introduced to valgrind by Andrew Tridgell during the main
         | content of a vaguely famous lecture he gave that finished with
         | the audience collectively writing a shellscript bitkeeper
         | client [1] demonstrating beyond doubt that Tridge had not in
         | any way acted like a "git" when bitkeeper's licenseholder
         | pulled the license for the linux kernel community.
         | 
          | Tridge said words to the effect of "if you program in C and
          | you aren't using valgrind you flipping should be!" and went
          | on to talk about how some projects like to have a "valgrind
          | clean" build the same way they compile without warnings, and
          | that it's a really useful thing. As ever, well expressed
          | with examples from Samba development.
         | 
         | He was obviously right and I started using valgrind right there
         | in the lecture theatre. apt-get install is a beautiful thing.
         | 
         | He pronounced it val grind like the first part of "value" and
         | "grind" as in grinding coffee beans. I haven't been able to
         | change my pronunciation since then regardless of it being
         | "wrong".
         | 
         | [1] https://lwn.net/Articles/132938/
         | 
          | Corbet's account of this is actually wrong in the lwn link
          | above. Noted by akumria in the comments below it. Every
          | single command and suggestion came from the audience,
          | starting with telnetting to Ted Ts'o's bitkeeper IP & port
          | that he made available for the demo. Typing "help" came from
          | the audience, as did using netcat and the entire nc command.
          | The audience wrote the bitkeeper client in 2 minutes, with
          | Tridge doing no more than encouraging, typing and pointing
          | out that the "tridge is a wizard reverse engineer who has
          | used his powers for evil" take was clearly just some "wrong
          | thinking." Linus claimed thereafter that Git was named after
          | himself and not Tridge.
        
           | throwawaylinux wrote:
           | Tridgell is possibly the most intelligent person I've ever
           | met, and I've met Torvalds and a bunch of other Linux
           | developers -- not that they aren't intelligent too, among
           | them might be a challenger to that title.
           | 
            | Tridge has a way of explaining complicated ideas that pares
            | them down to their essence and helps you understand them,
            | and that just really struck me (a smart person is able to
            | talk about a complicated thing in a way that makes you feel
            | dumb; a _really_ smart person is able to talk about a
            | complicated thing in a way that makes you feel like a
            | genius). He also has the ability and intellectual curiosity
            | to jump seemingly effortlessly across disciplines.
           | 
           | And he's a fantastic and very entertaining public speaker.
           | Highly recommend any talk he gives.
        
         | glandium wrote:
         | I've known the right pronunciation for about 10 years. I still
         | say it wrong.
        
       | mynegation wrote:
        | I am old enough that I started with Purify, and I used Valgrind
        | starting from version 1.0, because Purify was commercial and
        | Solaris-only. It saved my behind multiple, multiple times.
        
         | cpeterso wrote:
         | And BoundsChecker was also great!
         | 
         | https://en.m.wikipedia.org/wiki/BoundsChecker
        
           | sumtechguy wrote:
            | That tool saved me tons of time tracking down bugs. It also
            | taught me to be a better C/C++ programmer. Run-time
            | sanitizers like Purify/Valgrind/BoundsChecker do not
            | tolerate poor C code. What is kind of cool is that you can
            | find whole classes of bugs in your code. Because once we
            | devs get something working, we tend to copy and paste that
            | pattern everywhere. So if you find a bug in one place, you
            | will probably find it in a few dozen other places in your
            | codebase.
        
         | hn_go_brrrrr wrote:
         | I worked at a company 11 years ago that was still using Purify!
        
           | unmole wrote:
           | I used Purify 8 years ago. On Windows. I don't remember the
           | specifics but the company kept a few XP machines around just
           | so they could continue using Purify.
        
         | pjmlp wrote:
         | Purify fanboy over here.
        
         | atgreen wrote:
         | Purify was an amazing tool. I recently noticed that one of my
         | libraries (libffi) still has an --enable-purify configure
         | option, although it probably hasn't been exercised in.. 20
         | years? A Purify patent prevented work-alikes for many years,
         | but valgrind eventually emerged as a more-than-worthy
         | successor.
         | 
         | Fun fact: the creator of Purify went on to found Netflix and is
         | still their CEO.
        
           | mynegation wrote:
              | Ha! And I thought that the same person having written
              | bzip2 and Valgrind was my surprise for the day.
        
             | snovv_crash wrote:
             | And Ardupilot
        
       | edsiper2 wrote:
        | First of all, congratulations to Valgrind and the team behind
        | it! This is an essential tool that has helped me personally
        | over the years while developing.
        | 
        | What needs to be done to get Valgrind binaries available for
        | macOS (M1)? From a company perspective we are happy to support
        | this work. If you know who's interested and can accomplish
        | this, please drop me an email at eduardo at calyptia dot com.
        
       | RustyRussell wrote:
       | I once submitted a bug fix for an obscure issue to valgrind. They
       | asked for a test case, which I managed to provide, but I was a
        | bit nervous as I couldn't immediately see how to fit it into
        | their test suite.
       | 
       | The response from Julian Seward was so nice it set a permanently
       | high bar for me when random people I don't know report bugs on my
       | projects!
       | 
       | We still run our entire testsuite under valgrind in CI. Amazing
       | tool!
        
         | sealeck wrote:
         | What was the response?
        
       | vlmutolo wrote:
       | > I still use Cachegrind, Callgrind, and DHAT all the time. I'm
       | amazed that I'm still using Cachegrind today, given that it has
       | hardly changed in twenty years. (I only use it for instruction
       | counts, though. I wouldn't trust the icache/dcache results at all
       | given that they come from a best-guess simulation of an AMD
       | Athlon circa 2002.)
       | 
       | I'm pretty sure I've seen people using the icache/dcache miss
       | counts from valgrind for profiling. I wonder how unreliable these
       | numbers are.
        
         | andrewf wrote:
         | https://sqlite.org/cpu.html#microopt -
         | 
         |  _Cachegrind is used to measure performance because it gives
         | answers that are repeatable to 7 or more significant digits. In
         | comparison, actual (wall-clock) run times are scarcely
         | repeatable beyond one significant digit [...] The high
         | repeatability of cachegrind allows the SQLite developers to
         | implement and measure "microoptimizations"._
         | 
         | There's a bunch of ways for caches to behave differently but
         | have they changed much over the past 20 years? i.e. is the
         | difference between [2022 AMD cache, 2002 AMD cache]
         | significantly greater than the difference between [2002 PowerPC
         | G4 cache, 2002 AMD cache, 2002 Intel cache] ?
        
           | BeefWellington wrote:
            | I would guess yes, just based on the L1/L2 (later L3) use
            | and sizing between all those systems. 2002 vs 2022 is K8 vs
            | 5800X3D for AMD, so you're looking at having 1 core with
            | 64+64KB of L1 cache and 512KB of L2 cache[1] vs 8 cores
            | (+HT) with 32+32KB L1 _per core_, 512KB L2 _per core_, and
            | 96MB L3.
            | 
            | Just managing the cache access between L2 and L3 would, I
            | think, be an additional consideration, but then you have to
            | consider the actual architectural differences, and on
            | server chips locality will matter quite a bit.
           | 
           | [1]: https://en.wikipedia.org/wiki/Athlon_64
        
           | tux3 wrote:
            | I don't know how sophisticated the streaming/prefetch/
            | access-pattern prediction in the 2002 CPUs was.
            | 
            | I'm speculating, but if that's not modeled, cachegrind may
            | pessimize some less simple but still predictable patterns
            | and report a lot of expected misses where the CPU would
            | have been able to prefetch the data.
        
             | andrewf wrote:
             | Agreed, I suspect it'd be most accurate to say the SQLite
             | folks are minimizing their working set.
             | 
             | I picked a couple of random performance commits out of
             | their code repo, and they look like they might keep 1 or 2
             | lines out of i-cache:
             | https://sqlite.org/src/info/f48bd8f85d86fd93
             | https://sqlite.org/src/info/390717e68800af9b
        
       | t43562 wrote:
       | I was working on an application for Symbian mobile phones and I
       | was able to implement large parts of it as a portable library -
       | the bits which compressed results using a dictionary to make them
       | tiny enough to fit into an SMS message or a UDP frame. This was
       | before the days of flat-rate charges for internet access and we
       | were trying to be very economical with data.
       | 
       | I was able to build and debug them on Linux with Valgrind finding
       | many stupid mistakes and the library worked flawlessly on
       | Symbian.
       | 
       | It's just one of the many times that Valgrind has saved my bacon.
       | It's awesome.
        
       | nicoburns wrote:
       | Well damn, no wonder he's so good at optimising the Rust
       | compiler. He literally has a PhD in profiling tools!
        
       | bayindirh wrote:
       | I still use Valgrind memcheck for memory leak verification of a
       | large piece of code I have developed, with a long end-to-end
       | test.
       | 
       | Also, it has a nice integration with Eclipse which reflects the
       | Valgrind memcheck output to the source files directly, enabling
       | you to see where problems are rooted.
       | 
       | All in all, Valgrind is a great toolset.
       | 
       | P.S.: I was pronouncing Valgrind correctly! :)
        
       | gkhartman wrote:
       | Many thanks for Valgrind. I can honestly say that it helped me
       | become a better C++ programmer.
        
       | compiler-guy wrote:
       | I sort of owe callgrind a big chunk of my career.
       | 
       | I was working at a company full of PhDs and well seasoned
       | veterans, who looked at me as a new kid, kind of underqualified
       | to be working in their tools group. I had been at the firm for a
       | while, and they were nice enough, but didn't really have me down
       | as someone who was going to contribute as anything other than a
       | very junior engineer.
       | 
       | We had a severe problem with a program's performance, and no one
       | really had any idea why. And as it was clearly not a
       | sophisticated project, I got assigned to figure something out.
       | 
       | I used the then very new callgrind and the accompanying
       | flamegraph, and discovered that we were passing very large bit
       | arrays for register allocation _by value_. Very, very large. They
       | had started small enough to fit in registers, but over time had
       | grown so large that a function call to manipulate them
       | effectively flushed the cache, and the rest of the code assumed
       | these operations were cheap.
       | 
       | Profiling tools at the time were quite primitive, and the
       | application was a morass of shared libraries, weird dynamic
       | allocations and JIT, and a bunch of other crap.
       | 
       | Valgrind was able to get the profiles after failing with
       | everything else I could try.
       | 
       | The presentation I made on that discovery, and my proposed fixes
       | (which eventually sped everything up greatly), finally earned the
        | respect of my colleagues, and having no PhD wasn't a big deal after
       | that. Later on, those colleagues who had left the company invited
       | me to my next gig. And the one after that.
       | 
       | So thanks!
        
         | LAC-Tech wrote:
         | I love this story. I'm becoming an older dev now and I've often
         | been blindsided by some insight or finding by juniors - it's
         | really great to see & you've always got to make sure they get
         | credit!
        
         | intelVISA wrote:
          | Always find it weird when people berate C++ tooling; Valgrind
         | and adjacent friends are legitimately best in class and
         | incredibly useful. Between RAII and a stack of robust static
         | analyzers you'd have to deliberately write unsafe code these
         | days.
        
           | nicoburns wrote:
           | That sounds great until you realise in other languages you
           | get that by default without any tooling. And with better
           | guarantees too (C++ static analysers aren't foolproof).
           | 
           | Where C++ tooling really lacks is around library management
           | and build tooling. The problem is less that any of the
           | individual tools don't work and more that there are many of
           | them and they don't interoperate nicely.
        
             | bluGill wrote:
                | What language has anything like cachegrind, which
                | is the topic of this thread? Cache misuse is one of
                | the largest causes of bad performance these days,
                | and I can't think of any language that has anything
                | built in for that.
                | 
                | Sure, other languages have some nice tools to do
                | garbage collection (so does C++, but it is optional,
                | and reference counting does have drawbacks), but
                | there is a lot more to tooling than just garbage
                | collection. Even Rust's memory model has places
                | where it can't do what C++ can (you can't use
                | atomics to write data from two different threads at
                | the same time).
                | 
                | No language has good tools around libraries and
                | builds. So long as you stick to exactly one language
                | with the build system of that language, things seem
                | nice. However, in the real world we have a lot of
                | languages, and a lot of libraries that already
                | exist. Let me know when I can use any build/library
                | tool with this library that builds with autotools,
                | this other one with cmake, and here is one with
                | qmake (though at least Qt is switching to cmake,
                | which is becoming the de-facto C++ standard), just
                | to name a couple that handle dependencies in very
                | different ways.
        
               | njs12345 wrote:
               | > Even rust's memory model has places where it can't do
               | what C++ can. (you can't use atomic to write data from
               | two different threads at the same time)
               | 
               | Perhaps not in safe Rust, but can you provide an example
               | of something Rust can't do that C++ can? It has the same
               | memory model as C++20: https://doc.rust-
               | lang.org/nomicon/atomics.html
        
               | seoaeu wrote:
               | You totally can in safe rust: https://doc.rust-
               | lang.org/std/sync/atomic/struct.AtomicU64.h...
        
               | intelVISA wrote:
                | To be fair, as an outsider to both Rust and JS, they
                | seem to have pretty robust package management
                | between cargo and npm, although npm is kinda
                | cheating, as collating scripts isn't quite as
                | complex as building binaries, whereas pip is
                | absolutely unbearable with all the virtual env
                | stuff.
               | 
               | I've been quite lucky with CMake, after the initial
               | learning period I've found everything "just works" as it
               | is quite well supported by modern libs.
        
         | jerf wrote:
         | I've mentioned this before on HN as a way for a "newbie" to
         | look like a superhero in a job very quickly; nice to hear a
         | story of it actually working!
         | 
         | There is _so much_ code in the world that nobody has even so
         | much as _glanced_ at a profile of, and any non-trivial,
         | unprofiled code base is virtually guaranteed to have some kind
         | of massive performance problem that is also almost trivial to
         | fix like this.
         | 
         | Put this one in your toolbelt, folks. It's also so fast that
         | you can easily try it without having to "schedule" it, and if
         | I'm wrong and there aren't any easy profiling wins, hey, nobody
         | has to know you even looked. Although in that case, you just
         | learned something about the quality of the code base; if there
         | aren't any profiling quick wins, that means someone else
         | claimed them. As the codebase grows the probability of a quick
         | win being available quickly goes to 1.
        
         | nullify88 wrote:
         | I have a similar experience with xdebug for a PHP shop I used
         | to work at. It feels very similar to being a nerd back at
          | school, rescuing people's homework, and being rewarded with
         | some respect.
        
         | azurezyq wrote:
         | I have a very similar experience, but with a different
         | profiling tool. When I first graduated from school and joined a
          | big internet company, I wasn't that "different". The serving
         | stack was all in C++. My colleagues were really capable but not
         | that into "tools", they'd rather depend on themselves (guess,
         | tune, measure).
         | 
         | But I, as a fresh member in the team, learned and introduced
         | Google perftools to the team and did a presentation of the
         | breakdown of the running time of the big binary. I have to say
         | that presentation was a life-changing moment in my career.
         | 
          | So together with you, I really want to thank those who put
          | so much into building these tools. When I was doing the
          | presentation, I really felt I was standing on the shoulders
          | of giants, and those giants were helping me.
         | 
         | And over years, I used more and more tools like valgrind,
         | pahole, asan, tsan.
         | 
         | Much appreciated!
        
         | dijonman2 wrote:
         | I'm surprised to see the attribution to the tools and not your
         | proposed fixes. Sure the discovery was the first step in the
         | order of operations, but can you elaborate on what enabled you
         | to understand the problem statement and subsequent resolution?
         | 
         | There has to be a deeper understanding I think
        
           | imetatroll wrote:
           | Sounds like the solution probably had something to do with
           | switching to passing by reference + other changes I would
           | assume.
        
             | intelVISA wrote:
              | A big pain point when using coroutines is having to pass
              | by value more frequently due to uncertain lifetimes...
              | it's jarring when you come from zero-copy programming.
        
             | cbrogrammer wrote:
              | That is what many people fail to understand about why we
              | C programmers dislike C++.
        
               | pjmlp wrote:
                | Indeed, because languages with reference parameters
                | precede C by about 15 years, and they are present in
                | most ALGOL-derived dialects.
        
           | azurezyq wrote:
            | I can share mine. It's an ads retrieval system. Latency is
            | very sensitive and it has to be efficient. To avoid memory
            | allocations, special hashtables with a fixed number of
            | buckets (and open addressing) are used in multiple places
            | in query processing. The default is 1000 buckets. However,
            | there are cases where the number of elements is only a
            | handful, and then it fails to utilize the cache, hence it
            | is slower.
            | 
            | The solution is to tune the number of buckets using info
            | derived from the pprof callgraph.
           | 
           | There were others too, like redundant serialization, etc. But
           | this one is the most interesting.
        
             | alexott wrote:
              | I also heavily used callgrind/cachegrind to tune critical
              | paths in our high-performance web proxy, where each
              | microsecond/millisecond counts... For example, in media
              | type detection, which is called multiple times per
              | request (minimum twice, for request/response), etc.
        
             | zasdffaa wrote:
             | That's surprising. If I was writing this I'd have
             | instrumented the code for the buckets to (optionally) log
             | the use, and probably add an alert.
             | 
             | (being an armchair expert is easy though)
        
       | mukundesh wrote:
       | Using Cachegrind to get hardware independent performance numbers
       | (https://pythonspeed.com/articles/consistent-benchmarking-in-...)
       | 
        | Also used by SQLite in their performance measurement workflow
        | (https://sqlite.org/cpu.html#performance_measurement)
        
       | [deleted]
        
       | anewpersonality wrote:
       | Is Valgrind any use in Rust?
        
         | pjmlp wrote:
          | Depends on how many unsafe code blocks you make use of.
        
         | jackosdev wrote:
         | I work full-time with Rust, use it all the time to see how much
         | memory is being allocated to the heap, make a change and then
         | see if there's a difference, and also for cache misses:
         | 
          | valgrind target/debug/rustbinary
          | 
          | ==10173== HEAP SUMMARY:
          | ==10173==     in use at exit: 854,740 bytes in 175 blocks
          | ==10173==   total heap usage: 2,046 allocs, 1,871 frees, 3,072,309 bytes allocated
          | ==10173==
          | ==10173== LEAK SUMMARY:
          | ==10173==    definitely lost: 0 bytes in 0 blocks
          | ==10173==    indirectly lost: 0 bytes in 0 blocks
          | ==10173==      possibly lost: 1,175 bytes in 21 blocks
          | ==10173==    still reachable: 853,565 bytes in 154 blocks
          | ==10173==         suppressed: 0 bytes in 0 blocks
          | ==10173== Rerun with --leak-check=full to see details of leaked memory
          | 
          | valgrind --tool=cachegrind target/debug/rustbinary
          | 
          | ==146711== I   refs:      1,054,791,445
          | ==146711== I1  misses:       11,038,023
          | ==146711== LLi misses:           62,896
          | ==146711== I1  miss rate:          1.05%
          | ==146711== LLi miss rate:          0.01%
          | ==146711==
          | ==146711== D   refs:        793,113,817  (368,907,959 rd + 424,205,858 wr)
          | ==146711== D1  misses:          757,883  (    535,230 rd +     222,653 wr)
          | ==146711== LLd misses:          119,285  (     49,251 rd +      70,034 wr)
          | ==146711== D1  miss rate:           0.1% (        0.1%   +         0.1%  )
          | ==146711== LLd miss rate:           0.0% (        0.0%   +         0.0%  )
          | ==146711==
          | ==146711== LL refs:          11,795,906  ( 11,573,253 rd +     222,653 wr)
          | ==146711== LL misses:           182,181  (    112,147 rd +      70,034 wr)
          | ==146711== LL miss rate:            0.0% (        0.0%   +         0.0%  )
        
         | rwmj wrote:
         | Not used it with Rust, but have used it with OCaml, Perl, Ruby,
         | Tcl successfully. In managed languages it's mainly useful for
         | detecting problems in C bindings rather than the language
         | itself. Languages where it doesn't work well: Python and
         | Golang.
        
       | pjmlp wrote:
       | > Speaking of software quality, I think it's fitting that I now
       | work full time on Rust, a systems programming language that
       | didn't exist when Valgrind was created, but which basically
       | prevents all the problems that Memcheck detects.
       | 
       | Just like Ada has been doing since 1983.
        
         | oconnor663 wrote:
         | My understanding is that dynamically freeing memory is an
         | unsafe operation in Ada, do I have that right?
        
           | pjmlp wrote:
           | Depends on which dynamic memory you are talking about.
           | 
           | Ada can manage dynamic stacks, strings and arrays on its own.
           | 
            | For example, Ada has what one could call type-safe VLAs:
            | instead of corrupting the stack like C, you get an
            | exception and can redo the call with a smaller size.
            | 
            | As for explicit heap types and _Ada.Unchecked_Deallocation_,
            | yes, if we are speaking about Ada 83.
           | 
           | Ada 95 introduced controlled types, which via Initialize,
           | Adjust, and Finalize, provide the basis of RAII like features
           | in Ada.
           | 
           | Here is an example on how to implement smart pointers with
           | controlled types,
           | 
           | https://www.adacore.com/gems/gem-97-reference-counting-in-
           | ad...
           | 
            | There is also the possibility to wrap heap allocation
            | primitives with safe interfaces exposed via storage pools,
            | as in this tutorial:
            | https://blog.adacore.com/header-storage-pools
            | 
            | Finally, thanks to SPARK, nowadays integrated into Ada
            | 2012[0], you can also have formal proofs that it is safe
            | to release heap memory.
            | 
            | On top of all this, Ada is in the process of integrating
            | affine types as well.
           | 
           | [0] - Supported in PTC and GNAT, remaining Ada compilers have
           | a mix of Ada 95 - 2012 features, see
           | https://news.ycombinator.com/item?id=27603292
        
             | touisteur wrote:
              | That said, I still use valgrind because we have to
              | integrate C libraries sometimes (libpcl is my favorite
              | culprit, only because I keep trying it), and there's
              | still the possibility to blow the stack (yeah, you can
              | use gnatstack to get a good idea of your maximum stack
              | size, but it doesn't cover the whole Ada featureset, and
              | stack canaries / -fstack-check don't catch everything).
             | 
              |  _edit_ Also, Massif, Callgrind/Cachegrind and Helgrind
              | have saved our bacon many, many times.
             | 
              | Even more interesting is writing your own tools with
              | valgrind. Here
              | https://github.com/AdaCore/gnatcoverage/tree/master/tools/gn...
              | is the code of a branch-trace adapter for valgrind (it
              | outputs all branches taken/not-taken in 'qemu' format).
              | Very useful if you can't run a pintool or Intel Processor
              | Trace just for that.
             | 
              | And if you keep digging, the angr symbolic execution
              | toolkit uses (used?) VEX as an intermediate
              | representation.
              |  _end of edit_
             | 
             | Ada doesn't catch uninitialized variables by default
             | (although warnings are getting better). You can either go
             | Spark 'bronze level' (dataflow proof, every variable is
             | initialized) or use 'pragma Initialize_Scalars' combined
             | with -gnatVa.
             | 
              | Some of these techniques are described in that now-old
              | blog post full of links,
              | https://blog.adacore.com/running-american-fuzzy-lop-on-your-...
              | (shameless plug), where one can infer that even proof of
              | absence of runtime errors isn't a panacea and fuzzing
              | still has its use even on fully-proved SPARK code.
        
       | lma21 wrote:
        | When we moved to Linux, Valgrind was THE tool that saved our
        | as*s day after day after day. An issue in production? Rollback,
        | valgrind, fix, push, repeat. Thank you for all the hard work;
        | in fact I don't think I can thank you enough.
        
       | junon wrote:
        | Valgrind's maintainers are super pleasant and have been quite
        | helpful in a number of cases where I've personally had to reach
        | out to them.
        | 
        | Lovely piece of software, to which I owe a lot of gratitude.
        
       | amelius wrote:
       | Are people using Valgrind on Python packages?
       | 
       | It seems some packages (even basic ones) are not compatible with
       | Valgrind, thereby spoiling the entire debugging experience.
        
       | Olumde wrote:
       | Happy birthday Valgrind. Next year you'll be able to drink in the
       | US!
       | 
        | Being a UK PhD holder, a sentence that stood out to me was the
        | commentary/comparison between UK and US PhDs: "This was a three
        | year UK PhD, rather than a brutal six-or-more year US PhD."
        | 
        | My cousin has a US PhD and, judging from what he tells me, it
        | is a lot more rigorous than UK PhDs.
        
         | wenc wrote:
         | The UK PhD is 3 yrs, after a 1 yr Masters and 3 yr bachelors.
         | (7 years)
         | 
         | The US PhD is usually 4-5 years after a 4 year bachelors (8-9
         | years). It is a little bit longer with more graduate-level
         | coursework.
         | 
         | That said, the US bachelors starts at age 17 while a UK
         | bachelors starts after 2 years of A-levels. So in terms of
         | length it's a wash.
        
           | pbhjpbhj wrote:
           | FWIW, you have to be slightly careful as Scotland has a
           | different post-16 education provision.
           | 
           | AIUI you can do Highers (equivalent to GCSE, at 16) and enter
           | Uni then with sufficiently high grades (aged 16/17). Or, stay
           | on for one more year to do Advanced Higher (most common). Uni
           | courses can then be 4 or occasionally 3 years. Don't quote
           | me!
        
           | piker wrote:
           | US college starts around age 18, which I understand is about
           | the time A-levels are completed, so I believe there are 2
           | more years of education associated with a US PhD.
        
         | not2b wrote:
         | It took me four years for my US PhD, but I had a masters and
         | industrial experience which might have helped speed things up.
        
       | nneonneo wrote:
       | Hah, I teach my students to use Valgrind, _and_ I've been
       | pronouncing it wrong this whole time. Guess I'll have to make
       | sure to get that right next semester :)
       | 
       | The magic of Valgrind really lies in its ability to detect errors
       | without recompiling the code. Sure, there's a performance hit,
       | but sometimes all you have is a binary. It's damn solid on Linux,
       | and works even with the custom threading library we use for the
       | course; shame the macOS port is barely maintained (last I
       | checked, it only worked on OSes from a few years back - anything
       | more recent will execute syscalls during process startup that
       | Valgrind doesn't handle).
        
       | amelius wrote:
       | One problem with Valgrind is that the thing you're debugging
       | should have been tested with Valgrind from the start, otherwise
       | you're just going to be flooded with false triggers.
       | 
       | Now imagine that you're developing a new application and you want
        | to use some library, and it _hasn't_ been tested with valgrind
       | and generates tons of false messages. Should you then use it? Or
       | look for an alternative library?
        
       | Sesse__ wrote:
       | I live not far from Valgrindvegen (Valgrind road); I've always
       | wondered whether the developers knew it existed. :-)
        
       | j1elo wrote:
       | Valgrind is an amazingly useful tool. The biggest pain point,
       | though, has always been to read through and process the huge
       | amount of false positives that typically come from 3rd-party
       | support libraries, such as GLib. It provides some suppression
       | files to be used with Valgrind, but still, GLib has its own
       | memory allocator, so things tend to go awry.
       | 
       | Running Helgrind or DRD (for threading issues) with GLib has been
       | a bit frustrating, too. If anyone has some advice to share about
       | this, I'm all ears!
       | 
       | (EDIT: I had mistakenly left out the phrase about suppression
       | files)
        
       | whimsicalism wrote:
       | It's unfortunate that so many of these great tools (like `perf`
       | and I believe `valgrind`) are basically not available locally on
       | the Mac.
       | 
       | And running in a container is not really a solution for most of
       | these.
        
         | wyldfire wrote:
          | Sanitizers and Electric Fence are ultra portable; they're
          | definitely available on macOS. The feature set from valgrind
          | is a bit richer, but not by much.
        
           | whimsicalism wrote:
           | I am not familiar with electric fence but I remember from my
           | experience that there are definitely important things that I
           | got from `perf` and `valgrind` that the alternative
           | sanitizers did not provide. Can't recall what now of course.
        
             | nyanpasu64 wrote:
             | asan/ubsan do not detect uninitialized memory reads (though
             | ubsan can detect when bools take on invalid bit patterns
             | from uninitialized memory), and msan requires rebuilding
             | the standard library or something, so I've never used msan.
             | Valgrind is slow, but detects uninitialized memory reads
             | properly, and doesn't require rebuilding the app (which is
             | useful when running a complex or prebuilt app for short
             | periods of time).
             | 
             | On the topic of profiling, callgrind can count exact
             | function calls and generate accurate call graphs, which I
             | find useful for not only profiling, but tracing the
              | execution of unfamiliar code. I just wish rr had similarly
             | fast tooling (pernosco is close enough to be useful, but I
             | think there's value in exploring different workflows than
             | what they picked).
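              | 
              | For reference, a sketch of that callgrind workflow
              | (./myprog is a placeholder):
              | 
              |   $ valgrind --tool=callgrind ./myprog
              |   $ callgrind_annotate callgrind.out.<pid>  # exact call
              |                                             # counts
              |   $ kcachegrind callgrind.out.<pid>   # browse the call
              |                                       # graph interactively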
        
               | 1over137 wrote:
               | >msan requires rebuilding the standard library or
               | something
               | 
               | Yes, which is a PITA. But even then, macOS is not
               | supported anyway:
               | 
               | https://clang.llvm.org/docs/MemorySanitizer.html#supporte
               | d-p...
        
           | dwroberts wrote:
           | Valgrind does a lot of low level trickery so it hasn't always
           | supported the latest macOS releases straight away (or
           | sometimes would support them with serious
           | gotchas/limitations)
        
         | glandium wrote:
         | valgrind is available on mac. From the homepage: "It runs on
         | the following platforms: (...) X86/Darwin and AMD64/Darwin (Mac
         | OS X 10.12).". There's a notable omission of ARM64/Darwin in
         | there, and I don't think it's an oversight.
         | 
         | What Mac is definitely lacking, though, is reverse debugging.
         | Linux has rr, Windows has Time Travel Debugging. macOS still
         | doesn't have an equivalent.
        
           | saagarjha wrote:
           | Valgrind, as I understand it, was essentially maintained by
           | one engineer at Apple who has since left the company, so
           | nobody has really updated it.
        
             | plorkyeran wrote:
             | He's still at Apple, but he works on the Swift runtime
             | these days rather than C/C++ tooling.
        
             | 1over137 wrote:
              | That's my understanding too, and I believe you're referring to
             | Greg Parker:
             | 
             | http://www.sealiesoftware.com/valgrind/
        
           | 1over137 wrote:
           | There have been 6 major releases since 10.12 (which was from
           | late 2016). In other words, valgrind has basically stopped
           | supporting macOS.
        
             | glandium wrote:
             | I don't think it means it doesn't work with newer versions.
        
               | 1over137 wrote:
                | I'm afraid you're wrong. It does _not_ work with newer
                | macOS versions, I've tried.
        
       | syockit wrote:
       | There are times when LeakSanitizer (in gcc-8.2) would not give me
       | the full backtrace of a leak, while valgrind would, so to me it's
        | still an indispensable tool for debugging leaks. One caveat is
        | that it's orders of magnitude slower than LeakSanitizer. Now,
        | if only I knew how to make valgrind run as fast as
        | LeakSanitizer... (command line options?)
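        | 
        | One knob that can help, assuming you don't need the
        | uninitialised-value tracking in that run (that tracking is a
        | big part of Memcheck's overhead), though it will still be far
        | slower than LeakSanitizer (./myprog being a placeholder):
        | 
        |   $ valgrind --leak-check=full --undef-value-errors=no ./myprog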
        
         | rigtorp wrote:
         | You might need to add -fno-omit-frame-pointer to help ASAN
         | unwind the stack.
        
           | abbeyj wrote:
           | This is definitely an option you want to be using when using
           | ASan or LSan. You may also want to consider additionally
           | using -momit-leaf-frame-pointer to skip frame pointers only
           | in leaf functions while keeping frame pointers for non-leaf
           | functions. This can make small leaf functions significantly
           | shorter, limiting some of the negative impact of using -fno-
           | omit-frame-pointer alone.
           | 
           | Sometimes even -fno-omit-frame-pointer won't help, like if
           | the stack is being unwound through a system library that was
           | built without frame pointers. In that case you can switch to
           | the slow unwinder. Set the environment variable
           | `ASAN_OPTIONS=fast_unwind_on_malloc=0` when running your
           | program. But note that this will make most programs run
           | significantly slower so you probably want to use it only when
           | you really need it and not as the default setting for all
           | runs.
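            | 
            | Putting that together, a sketch (foo.c and the exact flag
            | mix are placeholders):
            | 
            |   $ clang -g -O1 -fsanitize=address \
            |         -fno-omit-frame-pointer -momit-leaf-frame-pointer \
            |         foo.c -o foo
            |   $ ASAN_OPTIONS=fast_unwind_on_malloc=0 ./foo
            |     # slower, but unwinds through frame-pointer-less code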
        
       | tarasglek wrote:
       | Beyond raw technical ability, Nick and Julian were the kindest,
       | most reasonable developers I've ever interacted with. I think a
        | lot of Valgrind's success stems from the combination of
        | sophisticated tech and the approachability of the core team.
        
       ___________________________________________________________________
       (page generated 2022-07-27 23:02 UTC)