[HN Gopher] Show HN: Tracexec - TUI for tracing execve and pre-e...
       ___________________________________________________________________
        
       Show HN: Tracexec - TUI for tracing execve and pre-exec behavior
        
       tracexec helps you figure out which programs get executed, and how,
       when you run a command. It's useful for debugging build systems,
       catching fd leaks, understanding what shell scripts actually do,
       figuring out which programs a piece of proprietary software runs,
       etc.
        
       Author : kxxt
       Score  : 37 points
       Date   : 2024-05-08 12:04 UTC (10 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | _flux wrote:
       | Pretty nice! I recall having used strace for this, and this is
       | certainly nicer for that use case. I think I used it to determine
       | how a certain file was built. Somehow it was easier to do it that
       | way than understand the build system :).
       | 
       | What I did though was generate dot-files for graphviz to process.
       | To that graph it's easy to add information about opened/created
       | files as well.
        
         | _flux wrote:
         | Found the script, so here it is for posterity!
         |     #!/bin/sh
         |     if [ -z "$1" ]; then
         |        echo "usage (in zsh): 'strace -f program -o logi; okular =(dot -Tpdf =($0 logi))'"
         |        exit 0
         |     fi
         |     LOG=$1
         |     echo 'digraph {'
         |     echo 'rankdir="LR";'
         |     sed 's/^\([0-9]*\) \(<... \)\?\(clone\|vfork\).*= \([0-9]*\)/\1 -> \4;/; t; d' "$LOG"
         |     sed '/ENOENT/ d; s/^\([0-9]*\) execve("[^"]*\/\([^"]*\)".*/\1 [label="\1: \2"];/; t; d' "$LOG" | sort | uniq
         |     sed '/ENOENT/ d; s/^\([0-9]*\) openat([^,]*, "\([^"]*\)", .*\(O_WRONLY\|O_RDWR\).*/\1 -> "\2"; "\2" [shape=rectangle];/; t; d' "$LOG" | sort | uniq
         |     echo '}'
         | 
         | I _imagine_ it makes a graph of executed programs along with the
         | files they created; perhaps it even works in some cases.
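
The same idea can be sketched in Python, which is easier to extend than a chain of sed expressions. This is a minimal sketch, not a port verified against real logs: it assumes the log was written by `strace -f -o LOG`, so every line starts with a PID, and the names `strace_to_dot`, `FORK_RE`, `EXEC_RE` and `OPEN_RE` are made up for illustration. It only sees calls that strace printed on a single line, plus resumed clone/fork lines; an execve split into "unfinished"/"resumed" halves is skipped.

```python
import re

# Assumed input: lines from `strace -f -o LOG`, each prefixed with the PID.
# A successful clone/fork/vfork returns the child PID at the end of the line.
FORK_RE = re.compile(r'^(\d+) (?:<\.\.\. )?(?:clone3?|fork|vfork)\b.*= (\d+)$')
# A successful execve returns 0; failed attempts (ENOENT etc.) do not match.
EXEC_RE = re.compile(r'^(\d+) execve\("([^"]*)".*= 0$')
# A successful openat with write access returns a non-negative fd.
OPEN_RE = re.compile(
    r'^(\d+) openat\([^,]*, "([^"]*)", [^)]*\b(?:O_WRONLY|O_RDWR)\b.*= \d+$'
)


def strace_to_dot(lines):
    """Turn strace log lines into Graphviz DOT text (process and file graph)."""
    edges = []      # (parent pid, child pid)
    labels = {}     # pid -> basename of the last program it exec'd
    writes = set()  # (pid, path opened for writing)
    for line in lines:
        line = line.rstrip("\n")
        if m := FORK_RE.match(line):
            edges.append((m[1], m[2]))
        elif m := EXEC_RE.match(line):
            labels[m[1]] = m[2].rsplit("/", 1)[-1]
        elif m := OPEN_RE.match(line):
            writes.add((m[1], m[2]))

    out = ["digraph {", 'rankdir="LR";']
    out += [f"{parent} -> {child};" for parent, child in edges]
    out += [f'{pid} [label="{pid}: {name}"];' for pid, name in sorted(labels.items())]
    for pid, path in sorted(writes):
        out.append(f'{pid} -> "{path}";')
        out.append(f'"{path}" [shape=rectangle];')
    out.append("}")
    return "\n".join(out)
```

To use it, pass an open log file to `strace_to_dot` and feed the returned text to `dot -Tpdf`. Unlike the shell version, a PID that execs several times keeps only its last label.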
        
       | nh2 wrote:
       | Ah yes, very good!
       | 
       | More such tools are needed that make common strace tasks more
       | convenient.
       | 
       | strace is the ultimate debugging tool to dive into almost every
       | problem where a program doesn't do what you want, but its
       | stream-based nature and lack of scripting don't make all tasks
       | convenient. Most importantly, you can't really "program" with it.
       | 
       | Some time ago, I started implementing "strace as a library" in
       | Haskell, to get an easy-to-process stream of syscalls/signals as
       | an ADT, so that one can easily build features on top of it, e.g.
       | a tree of fork()ed/execve()'d child processes. That's point
       | "special run modes tailored to specific tasks (e.g. execve tree)"
       | from https://github.com/nh2/hatrace. I haven't gotten as far as
       | I'd like yet, because I'd like to first explicitly support all
       | Linux syscalls, and build the features on top of that afterwards.
       | And some features are not so easy, e.g. reading memory of the
       | traced process (which strace does to show, e.g., what data was
       | `read()`).
       | 
       | From that experience, I applaud anybody who makes good use of
       | ptrace() -- that man page is not the easiest to read!
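
To illustrate what "a tree of fork()ed/execve()'d child processes" built on a typed event stream could look like, here is a minimal Python sketch. It is not hatrace's API (hatrace is a Haskell library). The `Fork` and `Exec` event types, `Proc`, `build_tree` and `render` are hypothetical names, and the events are assumed to have already been decoded by some tracer.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Union


@dataclass
class Fork:
    """Hypothetical event: `parent` created `child` (fork/vfork/clone)."""
    parent: int
    child: int


@dataclass
class Exec:
    """Hypothetical event: `pid` successfully called execve with `argv`."""
    pid: int
    argv: List[str]


Event = Union[Fork, Exec]


@dataclass
class Proc:
    pid: int
    execs: List[List[str]] = field(default_factory=list)
    children: List["Proc"] = field(default_factory=list)


def build_tree(events: Iterable[Event], root_pid: int) -> Proc:
    """Fold a stream of events into a process tree rooted at root_pid."""
    procs: Dict[int, Proc] = {root_pid: Proc(root_pid)}
    for ev in events:
        if isinstance(ev, Fork):
            child = procs.setdefault(ev.child, Proc(ev.child))
            procs.setdefault(ev.parent, Proc(ev.parent)).children.append(child)
        elif isinstance(ev, Exec):
            procs.setdefault(ev.pid, Proc(ev.pid)).execs.append(ev.argv)
    return procs[root_pid]


def render(proc: Proc, depth: int = 0) -> List[str]:
    """Render the tree as indented lines, showing each process's last exec."""
    cmd = " ".join(proc.execs[-1]) if proc.execs else "(no exec)"
    lines = [f"{'  ' * depth}{proc.pid} {cmd}"]
    for child in proc.children:
        lines.extend(render(child, depth + 1))
    return lines
```

Once the events are plain data, features such as this tree become ordinary folds over a list, which is the kind of "programming with strace" the comment asks for.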
        
         | nh2 wrote:
         | After playing with the software a bit:
         | 
         | The view that diffs environment variables of a process with the
         | ones from the parent process is very nice.
         | 
         | Could you add a feature that makes it possible to search/filter
         | the "Environment" panel, maybe even the events panel? Often one
         | wants to figure out how a certain argument, environment
         | variable, or assigned value of each of those finds its way into
         | a process. Being able to filter that quickly would be great!
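
The environment diff and the requested filter are both simple dictionary operations. The sketch below shows the idea in Python; it is not tracexec's implementation, and `diff_env` and `filter_env` are hypothetical helper names.

```python
from typing import Dict, Tuple

Env = Dict[str, str]


def diff_env(parent: Env, child: Env) -> Tuple[Env, Env, Dict[str, Tuple[str, str]]]:
    """Compare a child's environment with its parent's.

    Returns (added, removed, changed), where `changed` maps a variable
    name to its (parent value, child value) pair.
    """
    added = {k: child[k] for k in child.keys() - parent.keys()}
    removed = {k: parent[k] for k in parent.keys() - child.keys()}
    changed = {
        k: (parent[k], child[k])
        for k in parent.keys() & child.keys()
        if parent[k] != child[k]
    }
    return added, removed, changed


def filter_env(env: Env, needle: str) -> Env:
    """Keep only variables whose name or value contains `needle`."""
    return {k: v for k, v in env.items() if needle in k or needle in v}
```

Running `filter_env` over each process's environment in exec order would show where a given value first appears.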
        
         | nh2 wrote:
         | I packaged your program for Nix [1].
         | 
         | If possible, have a look at the build, or even better, become
         | co-maintainer if you can!
         | 
         | If anybody wants to try it, on any Linux distro with `nix`
         | installed:
         | 
         |     NIX_PATH=nixpkgs=https://github.com/nh2/nixpkgs/archive/e5943da391879c12ed8d6477a06805c8de4b27da.tar.gz nix-shell -p tracexec
         | 
         | This will drop you into a shell where `tracexec` is installed.
         | 
         | [1]: https://github.com/NixOS/nixpkgs/pull/310158
        
       | abathur wrote:
       | We have some common cause / need in the Nix/nixpkgs ecosystem,
       | where finding missing package dependencies (whether they're bare
       | invocations or absolute paths) is a bit of an ongoing problem.
       | 
       | The golden path stuff isn't too hard to find during packaging,
       | but there's a long tail of stuff that only shows up when you hold
       | something exactly right.
       | 
       | We can (with some effort) handle a decent fraction of posix/bash
       | shell statically with https://github.com/abathur/resholve. I
       | think a similar approach could generalize to most interpreted
       | languages (but I haven't gotten around to it myself and I think
       | it's broadly rarer beyond Shell).
       | 
       | I have taken one or two steps into a lower-fidelity approach
       | (https://github.com/abathur/binlore) that uses YARA rules to scan
       | for signs of ~exec in both binary formats and interpreted
       | languages. For now this is mostly just a top-N formats thing, but
       | there's no reason we can't keep going. One of the big advantages
       | is that it's fairly cross-platform, even if accuracy is
       | suboptimal.
       | 
       | The scanning approach isn't exactly amazing _for resholve's
       | specific needs_. I'm stepping down this path to try to identify
       | executables that likely exec their command-line arguments--
       | because this is a sign that resholve needs to know how to check
       | invocations of that command for other executables. But it's a
       | good bit better at flagging things that have _any_ exec behavior,
       | and that's a decent-enough first step for letting us focus on a
       | smaller set.
       | 
       | Once we find these, though, we're generally left with human
       | analysis of the help, manpage, and/or source to figure out what
       | exec the scanner is hitting on and whether it's part of the CLI.
       | 
       | I've been ~snoozing on this front for a while, but I feel like a
       | decent fraction of the components for building something like an
       | 80% toolkit exist in at least a primitive form. It could probably
       | benefit from some more static source analysis bits. I also
       | vaguely intend to look into whether LLMs can do a reasonably-good
       | job of making it easier to answer these questions from the
       | help/man/source. Unless something out in that space can be
       | reliably automated, it also needs some way to ~pin assertions
       | about the arguments to the relevant code in a way that they'll
       | break if that specific corner of the codebase changes. (I've
       | wondered if semgrep expressions could handle this.)
       | 
       | There are also some other bits around that might fit in, though
       | cross-platform is a common-ish problem. We've looked a little at
       | using a FUSE, but that is hard to operationalize on macOS. I also
       | poked at 3 or so less-interactive dynamic approaches a few years
       | back in https://github.com/abathur/commandeer/ and I'm sure a few
       | more have cut their teeth since then. I also wrote a ~shelved-
       | for-now WIP (https://github.com/abathur/faffer) for using some
       | cursed shell metaprogramming to do something like fuzz/tree-shake
       | a shell script for loose dependencies. That isn't directly viable
       | for many languages, but its focus on ~overloading flow-control
       | structures to make it easier to force branch coverage does seem
       | like something that might be feasible with rewriting tools.
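
The lower-fidelity scanning approach described in this comment can be illustrated with a small Python sketch. Note the substitution: this uses plain byte regexes rather than YARA, and the patterns in `EXEC_HINTS` are invented for illustration; they are not binlore's actual rules. Like any such scan, it only flags files that might exec something and will produce false positives and negatives.

```python
import re
from typing import List

# Hypothetical indicators of exec-like behavior. These are NOT binlore's
# YARA rules, just a few byte patterns that tend to show up in binaries
# (libc symbol names) and scripts (common spawning calls).
EXEC_HINTS = {
    "libc exec family": re.compile(rb"\bexec(?:ve|vp|vpe|v|le|lp|l)\b"),
    "posix_spawn": re.compile(rb"\bposix_spawnp?\b"),
    "system() call": re.compile(rb"\bsystem\s*\("),
    "python subprocess": re.compile(
        rb"\bsubprocess\.(?:run|Popen|call|check_call|check_output)\b"
    ),
}


def scan_bytes(data: bytes) -> List[str]:
    """Return the sorted names of all hints found anywhere in `data`."""
    return sorted(name for name, pat in EXEC_HINTS.items() if pat.search(data))


def scan_file(path: str) -> List[str]:
    """Scan a file's raw bytes; works the same for binaries and scripts."""
    with open(path, "rb") as f:
        return scan_bytes(f.read())
```

Because it only reads bytes, a scan like this runs the same way on any platform, which matches the cross-platform advantage mentioned above. Deciding whether a hit means "execs its command-line arguments" still needs the human analysis the comment describes.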
        
       | ranger_danger wrote:
       | Does this use eBPF? Can it be expanded to work with other
       | syscalls besides exec? I find myself using not only execsnoop.bt
       | but also opensnoop.bt, statsnoop.bt, tcpconnect.bt etc.
        
       ___________________________________________________________________
       (page generated 2024-05-08 23:01 UTC)