hngopher.com

       [HN Gopher] Bytehound: Memory Profiler for Linux
       ___________________________________________________________________
        
       Bytehound: Memory Profiler for Linux
        
       Author : klaussilveira
       Score  : 87 points
       Date   : 2024-05-23 14:40 UTC (8 hours ago)
        
 (HTM) web link (github.com)
        
       | kouteiheika wrote:
       | Didn't expect to see this on the front page.
       | 
       | Hi, it's my project. Feel free to ask me anything.
        
         | vardump wrote:
         | Custom fast stack unwinding sounds interesting. How's the
         | performance on ARMv8?
        
           | quotemstr wrote:
           | > fast stack unwinding sounds interesting
           | 
           | Frame-pointer-based, I imagine?
        
             | kouteiheika wrote:
             | No. It's DWARF based.
             | 
             | The two main tricks are: it preprocesses all of the DWARF
             | info at startup for faster lookups, and it dynamically
             | patches the return addresses of functions on the stack,
             | injecting the address of its own trampoline, which allows it
             | to skip going through the whole stack trace every time it
             | needs to dump a backtrace. For example, if you're running a
             | function nested 100 stack frames deep and that function
             | calls malloc 100 times then Bytehound will only go through
             | ~300 stack frames in total (~100 frames for the first call,
             | then only ~2 frames for each successive call, if my math is
             | right), while other similar tools will go through ~10,000
             | stack frames (going through all ~100 frames to the very
             | bottom for every call).
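
The frame-count arithmetic in the comment above can be checked with a toy
model. This is only an illustrative sketch, not Bytehound's actual code: the
function names are hypothetical, a Python set stands in for "frames whose
return address was patched to point at the trampoline", and `depth` is
assumed to count every frame seen in one backtrace, including the
allocator's own frame.

```python
def naive_frames_visited(depth: int, calls: int) -> int:
    """Every backtrace walks all `depth` frames to the bottom of the stack."""
    return depth * calls


def cached_frames_visited(depth: int, calls: int) -> int:
    """Count frames visited when already-unwound frames are remembered.

    `marked` is a stand-in for frames with a patched return address:
    reaching one means the rest of the backtrace is already known.
    """
    caller_frames = list(range(depth - 1))  # frames below the allocator, bottom first
    marked = set()
    visited = 0
    for call in range(calls):
        # Each allocator call pushes one fresh, unmarked frame on top.
        alloc_frame = ("alloc", call)
        stack = caller_frames + [alloc_frame]
        for frame in reversed(stack):  # walk from the top of the stack down
            visited += 1
            if frame in marked:
                break  # cache hit: the tail of the trace is reused
            marked.add(frame)
        # The allocator frame returns, so its mark is dropped.
        marked.discard(alloc_frame)
    return visited


if __name__ == "__main__":
    print(naive_frames_visited(100, 100))   # 10000
    print(cached_frames_visited(100, 100))  # 298
```

Under these assumptions the first call visits 100 frames and each of the
remaining 99 calls visits 2, giving 298 in total, which matches the ~300
versus ~10,000 estimate. The model ignores frames returning between
allocations, which in a real unwinder would invalidate part of the cache.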
        
               | vlovich123 wrote:
               | Any plans to extend this idea into a performance
               | profiler?
               | 
               | Also nice use of Gimli - did something similar to make
               | creating stack traces on crash cheaper to symbolicate.
        
               | felixge wrote:
               | Dynamic patching of return addresses is a very cool
               | trick. I don't think I've seen this before. Have you run
               | into any situations where this crashes programs or
               | otherwise interferes with their execution?
        
               | peterfirefly wrote:
               | Turbo Pascal used it for the overlay implementation (for
               | DOS) -- overlays = virtual memory at home.
               | 
               | TP 5.0 from 1988 was the first version that had it.
               | 
               | The idea was to make sure the code the CPU returned to
               | would actually be in memory.
               | 
               | I'm pretty sure Windows 1.0 did something very similar.
        
               | tdullien wrote:
               | It's going to play poorly when C++ exceptions are
               | thrown/caught.
        
               | felixge wrote:
               | Looking at the code [1] it seems like the library is
               | actively trying to handle this problem.
               | 
               | [1] https://github.com/koute/not-perf/blob/master/nwind/src/loca...
        
               | tubs wrote:
               | Any way this can work on arm64 without DWARF info at
               | runtime? Would be very interested.
        
           | kouteiheika wrote:
           | Haven't used it on ARM in a very long time, but it should
           | work just as well as on AMD64. (As long as you disable pointer
           | authentication/CFI/whatever it was called on ARM.)
        
         | AymanB wrote:
         | I was wondering, any way to use it with distributed systems for
         | data analytics?
         | 
         | Imagine a set of workers that ingest data in parallel, would
         | that work?
         | 
         | Currently it's pretty simple: I spawn a process within the
         | worker that reads some stuff such as memory usage, CPU usage,
         | etc. But I would like to improve it.
        
         | superleaf wrote:
         | If I am, for example, running a test on an Android device
         | connected to my Linux machine as the host to send adb commands
         | and whatnot, can I use this profiler to profile Android app
         | memory consumption?
        
         | klaussilveira wrote:
         | Thank you for helping me fix a nasty leak! :)
        
       | j1elo wrote:
       | I'd like to learn more about the dual license "MIT OR
       | Apache-2.0": is there any practical advantage of using one over
       | the other? Are there any expected use cases where Apache-2.0
       | wouldn't be appropriate but MIT would?
       | 
       | I had always assumed that if the time came to choose a permissive
       | OSS license, I'd just go with Apache-2.0 for the more complete
       | legal ground that it provides (especially wrt. patents). Didn't
       | even occur to me that it would as much as _make sense_ to offer
       | MIT too (like, why not also BSD now that we're at it?)
        
         | nicbn wrote:
         | It's common among Rust projects (the standard library also uses
         | it).
         | 
         | Apache 2 has a patent grant so it's preferred by companies, but
         | is not compatible with GPLv2, and MIT is compatible with GPLv2.
         | 
         | Source: https://prev.rust-lang.org/id-ID/faq.html#why-a-dual-mit-asl...
        
           | o11c wrote:
           | For completeness:
           | 
           | Apache 2 _is_ compatible with GPL 3, which outside the kernel
           | most of the world uses.
        
       ___________________________________________________________________
       (page generated 2024-05-23 23:01 UTC)