https://rwmj.wordpress.com/2023/02/14/frame-pointers-vs-dwarf-my-verdict/

Richard WM Jones
February 14, 2023 * 10:52 am

Frame pointers vs DWARF - my verdict

A couple of weeks ago I wrote a blog posting here about Fedora having
frame pointers (LWN backgrounder, HN thread). I made some mistakes in
that blog posting and retracted it, but I wasn't wrong about the
conclusions, just wrong about how I reached them. Frame pointers are
much better than DWARF. DWARF unwinding might have some theoretical
advantages but it's worse in every practical respect.

In particular:

 1. Frame pointers give you much faster profiling with much less
    overhead. In practice this means you can do continuous
    performance collection and analysis, which would be impossible
    with DWARF.
 2. DWARF unwinding has foot-guns which make it easy to screw up and
    collect insufficient data for analysis. You cannot know in
    advance how much stack data to collect. The defaults are much too
    small, and even increasing the collection size to unreasonably
    large values isn't enough.
 3. The overhead of collecting DWARF callgraph data adversely affects
    what you're trying to analyze.
 4. Frame pointers have some corner cases which they don't handle
    well (certain leaf and most inlined functions aren't collected),
    but these don't matter a great deal in reality.
 5. DWARF unwinding can show inlined functions as if they are
    separate stack frames. (Opinions differ as to whether or not this
    is an advantage.)

Below I'll try to demonstrate some of the issues, but first a little
bit of background is necessary about how all this works.

When you run perf record -a on a workload, the kernel fires a timer
interrupt on every CPU hundreds or thousands of times a second. Each
interrupt must collect a stack trace for that CPU at that moment,
which is then sent up to the userspace perf process that writes it to
a perf.data file in the current directory. Obviously collecting this
stack trace and writing it to the file must be done as quickly as
possible with the least overhead.

Also the stack trace may start inside the kernel and go all the way
out to userspace (unless the CPU was running userspace code at the
moment it was interrupted in which case it just collects userspace).
That involves unwinding the two different stacks.

For the kernel stack, the kernel has its own unwinding information
called ORC. For the userspace stack you choose (with the perf record
--call-graph option) whether to use frame pointers or DWARF.

For frame pointers the kernel is able to immediately walk up the
userspace stack all the way to the top (assuming everything was
compiled with frame pointers, but that is now true for Fedora 38).

For DWARF however the format is complicated and the kernel cannot
unwind it immediately. Instead the kernel just copies the raw user
stack into the sample, and perf unwinds it later in userspace. But
copying the whole stack would consume far too much storage, so by
default it only copies the 8K nearest the current stack pointer (the
innermost frames). Many userspace stacks will be larger than this, in
which case the data collection will simply be incomplete - it will
never be possible to recover the full stack trace. You can adjust the
size of stack collected, but that massively bloats the perf.data file
as we'll see below.
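
The difference between the two approaches can be sketched with a toy
model. This is only an illustration, not how the kernel or perf is
implemented: the stack is a Python dict, the function names (main, f1,
f2, f3) and the 4096 byte frame size are made up, and the truncated
walk reuses the frame-pointer chain to stand in for the real DWARF
unwinder (which uses CFI tables). The point it shows is the same
though: whatever lies beyond the copied bytes cannot be recovered.

```python
# Toy model of a downward-growing user stack. Every frame stores
# [saved frame pointer, return address] at its base, which is the
# layout that -fno-omit-frame-pointer gives on x86-64.
WORD = 8

def build_stack(functions, frame_bytes, top=0x7fff0000):
    """Lay out one frame per function, outermost first.

    Returns (memory, fp, sp): memory maps address -> value, fp is the
    innermost frame pointer and sp is the lowest address in use.
    """
    memory = {}
    fp = 0                           # 0 terminates the chain in this model
    addr = top
    for name in functions:
        addr -= frame_bytes          # reserve space for the frame
        memory[addr] = fp            # saved frame pointer of the caller
        memory[addr + WORD] = name   # stands in for the return address
        fp = addr
    return memory, fp, addr

def walk_frame_pointers(memory, fp):
    """Follow the saved-frame-pointer chain: two reads per frame."""
    trace = []
    while fp != 0:
        trace.append(memory[fp + WORD])
        fp = memory[fp]
    return trace

def walk_truncated_copy(memory, fp, sp, copy_bytes):
    """Same walk, but only the copied window [sp, sp + copy_bytes) is readable."""
    limit = sp + copy_bytes
    trace = []
    while fp != 0 and fp + WORD < limit:
        trace.append(memory[fp + WORD])
        fp = memory[fp]
    return trace

# Hypothetical call chain main -> f1 -> f2 -> f3 with 4096 byte frames.
memory, fp, sp = build_stack(["main", "f1", "f2", "f3"], 4096)
print(walk_frame_pointers(memory, fp))             # ['f3', 'f2', 'f1', 'main']
print(walk_truncated_copy(memory, fp, sp, 8192))   # ['f3', 'f2']
```

With an 8192 byte copy the two outermost frames are simply missing,
while the frame-pointer walk reads two words per frame no matter how
large the frames are.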

To demonstrate what I mean, I collected a set of traces using fio and
nbdkit on Fedora 38, using both frame pointers and DWARF. The command
is:

sudo perf record -a -g [--call-graph=...] -- nbdkit -U - null 1G --run 'export uri; fio nbd.fio'

with the nbd.fio file from fio's examples.

I used no --call-graph option for collecting frame pointers (as it is
the default), and --call-graph=dwarf,{4096,8192,16384,32768} to
collect the DWARF examples with 4 different stack sizes.

I converted the resulting data into flame graphs using Brendan
Gregg's tools.

Everything was run on my idle 12 core / 24 thread AMD development
machine.

Type           Size of perf.data Lost chunks Flame graph
Frame pointers 934 MB            0           Link
DWARF (4K)     10,104 MB         425         Link
DWARF (8K)     18,733 MB         1,643       Link
DWARF (16K)    35,149 MB         5,333       Link
DWARF (32K)    57,590 MB         545,024     Link

The first and most obvious thing is that even with the smallest stack
data collection, DWARF's perf.data is over 10 times larger, and it
balloons even larger once you start to collect more reasonable stack
sizes. For a single minute of data collection, collecting tens of
gigabytes of data is not very practical even on high end machines,
and continuous performance analysis would be impossible at these data
rates.
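
To put numbers on "over 10 times larger", here is a small sketch that
divides each DWARF file size by the frame-pointer size. The figures
are copied from the table above. The helper name ratios_to_baseline
is just something made up for this example.

```python
# perf.data sizes in MB, copied from the table above.
sizes_mb = {
    "Frame pointers": 934,
    "DWARF (4K)": 10_104,
    "DWARF (8K)": 18_733,
    "DWARF (16K)": 35_149,
    "DWARF (32K)": 57_590,
}

def ratios_to_baseline(sizes, baseline="Frame pointers"):
    """Return each non-baseline size as a multiple of the baseline size."""
    base = sizes[baseline]
    return {name: size / base
            for name, size in sizes.items() if name != baseline}

for name, ratio in ratios_to_baseline(sizes_mb).items():
    print(f"{name:12s} {ratio:5.1f}x the frame-pointer file")
```

This works out to roughly 11x, 20x, 38x and 62x for the 4K, 8K, 16K
and 32K runs respectively.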

Related to this, the overhead of perf increases. It is about 0.1% for
frame pointers. For DWARF the overhead goes: 0.8% (4K), 1.5% (8K),
2.8% (16K), 2.7% (32K). But this understates the true overhead because
it doesn't count the cost of writing to disk. Unfortunately on this
machine I have full disk encryption enabled (which does add a lot to
the overhead of writing nearly 60 GB of perf data), but you can see
the overhead of all that encryption separate from perf in the flame
graph. The total overhead of perf + writing + encryption is about
20%.

[perf-overh]

This may also be the reason for seeing so many "lost chunks" even on
this very fast machine. All of the DWARF tests even at the smallest
size printed:

Check IO/CPU overload!

But is the DWARF data accurate? Clearly not. This is to be expected:
collecting a partial user stack is not going to be enough to
reconstruct a stack trace. But remember that even with 4K of stack,
the perf.data is already more than 10 times larger than for frame
pointers. Zooming in to the nbdkit process only and comparing the
flame graphs shows significant amounts of incomplete stack traces,
even when collecting 32K of stack.

On the left, nbdkit with frame pointers (correct). On the right,
nbdkit with DWARF and 32K collection size. Notice on the right the
large number of unattached frames. nbdkit main() does not directly
call Unix domain socket send and receive functions!

[framepointer-nbdkit][dwarf-32k-nbdkit]
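
Why do incomplete stacks show up as unattached frames? A rough sketch,
assuming the semicolon-separated "folded" stack format that the flame
graph tools consume. The function names main, serve, handle and send
are hypothetical and not taken from nbdkit, and truncate() only
imitates the effect of a short stack copy: it keeps the process name
plus the innermost frames.

```python
from collections import Counter

def fold(samples):
    """Collapse stack samples (outermost frame first) into folded lines."""
    counts = Counter(";".join(stack) for stack in samples)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]

def truncate(stack, max_frames):
    """Keep the process name plus only the innermost max_frames frames."""
    return stack[:1] + stack[1:][-max_frames:]

# One hypothetical call chain, sampled three times. The third sample
# lost its outer frames because too little stack was copied.
full = ["nbdkit", "main", "serve", "handle", "send"]
samples = [full, full, truncate(full, 2)]

for line in fold(samples):
    print(line)
# nbdkit;handle;send 1
# nbdkit;main;serve;handle;send 2
```

The truncated sample becomes a separate tower rooted in the wrong
place, so the same work is split across two unrelated parts of the
flame graph.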

If 8K (the default) is insufficient, and even 32K is not enough, how
large do we need to make the DWARF stack collection? I couldn't find
out because I don't have enough space for the expected 120 GB
perf.data file at the next size up.
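
As a rough sanity check on that estimate, this sketch looks at how
much the file grew at each doubling of the stack size in the table
above and projects one more doubling. It is a naive extrapolation, not
a measurement, and the 32K run lost a lot of chunks so its file size
probably understates the real data volume.

```python
# DWARF perf.data sizes in MB keyed by stack dump size in bytes,
# copied from the table above.
dwarf_sizes_mb = {4096: 10_104, 8192: 18_733, 16384: 35_149, 32768: 57_590}

def growth_factors(sizes):
    """Ratio of each file size to the one at the previous stack size."""
    keys = sorted(sizes)
    return [sizes[b] / sizes[a] for a, b in zip(keys, keys[1:])]

factors = growth_factors(dwarf_sizes_mb)
largest_mb = dwarf_sizes_mb[max(dwarf_sizes_mb)]

# Project the 64K run: slowest observed growth as a lower guess,
# plain doubling as an upper guess (both in decimal GB).
low_gb = largest_mb * min(factors) / 1000
high_gb = largest_mb * 2 / 1000

print([round(f, 2) for f in factors])
print(f"64K projection: about {low_gb:.0f} to {high_gb:.0f} GB")
```

Each doubling multiplied the file size by between about 1.6 and 1.9,
which puts a 64K run at very roughly 94 to 115 GB, the same ballpark
as the figure above.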

Let's have a look at one thing which DWARF can do: show inlined and
leaf functions. The stack trace for these is more accurate as you can
see below. (To reproduce, zoom in on the nbd_poll function.) On the
left, frame pointers. On the right, DWARF with 32K stacks, showing the
extra enter_* frames which are inlined.

[nbdpoll-bo]

My final summary here is that for most purposes you would be better
off using frame pointers, and it's a good thing that Fedora 38 now
compiles everything with frame pointers. It should result in easier
performance analysis, and even makes continuous performance analysis
more plausible.

Filed under Uncategorized

Tagged as dwarf, fedora, fio, frame pointers, nbdkit, performance,
performance analysis
