https://lwn.net/SubscriberLink/885729/c495a793abeee387/

Better visibility into packet-dropping decisions

By Jonathan Corbet
February 25, 2022

Dropped packets are a fact of life in networking; there can be any
number of reasons why a packet may not survive the journey to its
destination. Indeed, there are so many ways that a packet can meet
its demise that it can be hard for an administrator to tell why
packets are being dropped. That, in turn, can make life difficult in
times when users are complaining about high packet-loss rates.
Starting with 5.17, the kernel is getting some improved
instrumentation that should shed some light on why the kernel decides
to route packets into the bit bucket.

This problem is not new, and neither are attempts to address it. The
kernel currently contains a "drop_monitor" functionality that was
introduced in the 2.6.30 kernel back in 2009. Over the years, it has
gained some functionality but has managed to remain thoroughly and
diligently undocumented. This feature appears to support a netlink
API that can deliver notifications when packets are dropped. Those
notifications include an address within the kernel showing where the
decision to drop the packet was made, and can optionally include the
dropped packets themselves. User-space code can turn the addresses
into function names; desperate administrators can then dig through
the kernel source to try to figure out what is going on.

It seems like there should be a better way. As it happens, the
beginning of the infrastructure to provide that better way was
contributed to 5.17 by Menglong Dong. The internal kernel function
that frees the memory holding a packet is kfree_skb(); in 5.17, that
function has become:

    void kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason);

The reason argument is new; it is intended to say why the packet
passed as skb has reached the end of the line. This information is
not actually useful to the kernel, but it has been added to the
existing kfree_skb tracepoint, making it available to any program
that connects to that tracepoint. Analysis scripts can quickly print
out why packets are being dropped; administrators can also attach BPF
programs to, for example, create a histogram of reasons for dropped
packets.

A new version of kfree_skb() has also been added; it simply calls
kfree_skb_reason() with "unspecified" as the reason.
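
The arrangement described above can be sketched in user space. This is only an illustration, not kernel code: struct packet, enum drop_reason, free_packet_reason(), free_packet(), and trace_drop() are hypothetical stand-ins for struct sk_buff, enum skb_drop_reason, kfree_skb_reason(), kfree_skb(), and a program attached to the kfree_skb tracepoint. The reason names and values are assumptions made for the sketch.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Illustrative subset of drop reasons. These names and values are
 * assumptions made for this sketch, not copied from <linux/skbuff.h>.
 */
enum drop_reason {
    DROP_NOT_SPECIFIED,
    DROP_NO_SOCKET,
    DROP_PKT_TOO_SMALL,
    DROP_CSUM,
    DROP_SOCKET_FILTER,
    DROP_REASON_MAX     /* number of reasons; not a reason itself */
};

/* Stand-in for struct sk_buff; only a length is needed here. */
struct packet {
    int len;
};

/* One counter per reason, as a histogram-building tool might keep. */
static unsigned long drop_histogram[DROP_REASON_MAX];

/* Stand-in for a program attached to the kfree_skb tracepoint. */
static void trace_drop(const struct packet *pkt, enum drop_reason reason)
{
    (void)pkt;
    assert((int)reason >= 0 && reason < DROP_REASON_MAX);
    drop_histogram[reason]++;
}

/* Drop a packet and record why; mirrors the shape of kfree_skb_reason(). */
static void free_packet_reason(struct packet *pkt, enum drop_reason reason)
{
    if (!pkt)
        return;
    trace_drop(pkt, reason);
    /* A real implementation would release the packet's memory here. */
}

/* Callers that have not been annotated yet report no specific reason. */
static void free_packet(struct packet *pkt)
{
    free_packet_reason(pkt, DROP_NOT_SPECIFIED);
}

static const char *reason_name(enum drop_reason reason)
{
    static const char *const names[DROP_REASON_MAX] = {
        "not specified", "no socket", "packet too small",
        "checksum failure", "socket filter",
    };
    return names[reason];
}

/* Print one line per reason with the number of drops recorded for it. */
static void print_histogram(FILE *out)
{
    for (int r = 0; r < DROP_REASON_MAX; r++)
        fprintf(out, "%-18s %lu\n",
                reason_name((enum drop_reason)r), drop_histogram[r]);
}
```

Dropping one packet through free_packet() and two through free_packet_reason() with DROP_CSUM leaves one "not specified" entry and two checksum failures in the histogram. Every call site that has not been annotated lands in the same "not specified" bucket.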

In 5.17, the use of this infrastructure is relatively limited. There
are a few TCP-level drop locations that have been instrumented with
the new call, including code that drops packets for being smaller
than the TCP header size, not being associated with an existing TCP
socket, exhibiting checksum failures, or having been explicitly
dropped by an add-on socket filter program. The UDP subsystem has
also been enhanced to note those same reasons for dropped packets.

The situation is set to improve considerably in 5.18; patches already
in linux-next add a number of new reasons. These cover packets that
are dropped by the netfilter subsystem, that contain IP-header errors,
or that have been identified as spoofed by the reverse-path filter
(rp_filter) mechanism. Administrators will be able to see IP packets
that have been dropped due to an unsupported higher-level protocol.
Reasons have also been added for UDP packets dropped by the IPSec
XFRM policy or a lack of memory within the kernel.

There is yet another set of reason annotations that has been
accepted, but which has not yet found its way into linux-next;
chances are that these will show up in 5.18 as well. They extend the
XFRM-policy annotation to TCP, note packets dropped due to missing or
incorrect MD5 hashes (which are evidently still a thing in 2022), as
well as those containing invalid TCP flags or sequence numbers
outside of the current TCP window. These patches also add new
instances of the other reasons noted above; some situations can be
detected in multiple places.

While the above set of reasons may seem long, this work could be seen
as having just begun. In current linux-next, there are over 2,700
calls to kfree_skb(), compared to 18 to kfree_skb_reason(). That
suggests that a lot of packets will still be dropped for unspecified
reasons. Still, this work represents a useful step forward, one that
should make many of the reasons for packet loss more readily
available to system administrators.

The part that remains missing, of course, is the user-space side. The
current reason codes are all defined in <linux/skbuff.h>, which is
not part of the externally available kernel API. Moving them to a
separate file under the uapi directory would make them more
accessible to developers. Also helpful, of course, would be to have
some documentation for this mechanism and how to use it (and
interpret the results), but even your editor, often cited for naive
optimism, will not be holding his breath for that to show up.

Meanwhile, though, an important piece of the kernel's network
functionality is becoming a little more transparent to users. That
should make life easier for system administrators who will be able to
spend less time trying to figure out why packets aren't making it
through the system. Unfortunately, though, this work offers no help
for users who are wondering why their packets are disappearing
somewhere in the far reaches of the Internet.

-----------------------------------------

Better visibility into packet-dropping decisions

Posted Feb 25, 2022 20:29 UTC (Fri) by atnot (subscriber, #124910)

Has this been considered for other things too? I regularly find
myself wishing something like this existed for figuring out which of
the many mechanisms an EPERM/EACCES was caused by (unix permissions,
acl, selinux and other LSMs, file systems, dm layers, cgroups,
namespaces, seccomp, capabilities, API misuse, ...)

Better visibility into packet-dropping decisions

Posted Feb 26, 2022 2:04 UTC (Sat) by shemminger (subscriber, #5739)

Netlink was enhanced to provide error messages (not just errno).
Many places have it, but lots still need work -- volunteers wanted.

Better visibility into packet-dropping decisions

Posted Feb 26, 2022 5:52 UTC (Sat) by tititou (subscriber, #75162)

Hi,
Can you provide a link or an example of it?

Better visibility into packet-dropping decisions

Posted Feb 26, 2022 19:03 UTC (Sat) by johill (subscriber, #25196)

Check out commit 2d4bc93368f5a ("netlink: extended ACK reporting")
which added the bare minimum infrastructure a long time ago, and you
can find many users of NL_SET_ERR_MSG/GENL_SET_ERR_MSG (and similar
macros) these days.

It supports reporting a string (error message), a pointer to a bad
attribute, and if NL_SET_ERR_MSG_ATTR_POL was used (which it is in
the general policy-based parsing) will even return the policy for the
attribute back to userspace to explain why the attribute failed (e.g.
if it's NLA_RANGE(U32, 1,2) and you gave a value 3).
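
The idea of returning an explanation alongside the errno can be mocked up in user space. The sketch below is not the kernel's API: struct mock_ext_ack, MOCK_SET_ERR_MSG(), and mock_set_level() are hypothetical names invented for illustration, loosely following the pattern of recording a message string and the offending attribute when a range check like the 1..2 case above fails.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's extended-ACK structure. */
struct mock_ext_ack {
    const char *msg;        /* human-readable explanation */
    const char *bad_attr;   /* name of the offending attribute */
};

/* Record a message if the caller supplied somewhere to put it. */
#define MOCK_SET_ERR_MSG(extack, text) do {         \
    struct mock_ext_ack *ack_ = (extack);           \
    if (ack_)                                       \
        ack_->msg = (text);                         \
} while (0)

/* Hypothetical request handler: "level" must be 1 or 2. */
static int mock_set_level(unsigned int level, struct mock_ext_ack *extack)
{
    if (level < 1 || level > 2) {
        MOCK_SET_ERR_MSG(extack, "level out of range (expected 1..2)");
        if (extack)
            extack->bad_attr = "level";
        return -EINVAL;     /* errno alone would not say which check failed */
    }
    return 0;
}
```

A caller that passes a mock_ext_ack gets both -EINVAL and the reason for it; a caller that passes NULL still gets the plain error code, so callers that do not want the extra detail keep working unchanged.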
return -Exxxxx;

Posted Feb 26, 2022 15:20 UTC (Sat) by jreiser (subscriber, #11027)

There is a need for a facility to locate at run time every failed
subroutine call. The source code can be edited with sed so that return
-Exxxxx; becomes return ErrorCode(Exxxxx); with a default macro
definition something like

     #ifndef ErrorCode
     #define ErrorCode(errnum) -(errnum)
     #endif

Then the determined investigator can re-compile selected source files
with something like

     #define ErrorCode(errnum) myErrorDiagnostic(errnum, __builtin_return_address(0), __FUNCTION__, __LINE__)

and supply a definition for the added subroutine myErrorDiagnostic.
Of course there are a handful of cases where error numbers are
variables or the syntax is complex, and also a few places where
simple automated editing fails. Rate limiting the reporting can be a
problem. But I did this once, and got the answer I wanted.
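
The scheme in this comment can be tried out in user space. The sketch below assumes GCC or Clang (for __builtin_return_address()). check_limit() and the error_reports counter are hypothetical names added for illustration, and __func__ is used as the standard spelling of __FUNCTION__.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical counter so callers can tell how many errors were reported. */
static int error_reports;

/* Log where the error originated, then return the usual negative errno. */
static int myErrorDiagnostic(int errnum, void *caller, const char *func, int line)
{
    error_reports++;
    fprintf(stderr, "errno %d from %s() line %d (return address %p)\n",
            errnum, func, line, caller);
    return -errnum;
}

/* Diagnostic override; remove it to fall back to the default below. */
#define ErrorCode(errnum) \
    myErrorDiagnostic((errnum), __builtin_return_address(0), __func__, __LINE__)

/* Default definition: behaves exactly like "return -Exxxxx;". */
#ifndef ErrorCode
#define ErrorCode(errnum) (-(errnum))
#endif

/* Hypothetical function whose "return -EINVAL;" was rewritten by sed. */
static int check_limit(int value)
{
    if (value < 0)
        return ErrorCode(EINVAL);
    return 0;
}
```

With the override in place, check_limit(-1) still returns -EINVAL, but it also logs the function, line, and return address of the failure. Without the override, the default macro compiles back to the original return statement.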
return -Exxxxx;

Posted Feb 26, 2022 19:05 UTC (Sat) by johill (subscriber, #25196)

In most files you can even just

#define EINVAL ({printk(...); 22;})

if you really want :-)

return -Exxxxx;

Posted Feb 27, 2022 3:21 UTC (Sun) by roc (subscriber, #30627)

That would surely fail to build with EINVAL being used in a case
label.

return -Exxxxx;

Posted Feb 27, 2022 9:17 UTC (Sun) by jengelh (subscriber, #33263)

Good thing the main kernel has just two `case EINVAL` across its ~30
million lines.

Better visibility into packet-dropping decisions

Posted Feb 26, 2022 4:49 UTC (Sat) by alison (subscriber, #63752)

Assuredly knowing when packets are dropped because NAPI polling isn't
keeping up with what's incoming would be valuable. Yeah, I'm sure
that patches and test data would be welcome.

Better visibility into packet-dropping decisions

Posted Feb 27, 2022 21:26 UTC (Sun) by shemminger (subscriber, #5739)

In order to see packets dropping because the CPU can't keep up, you
have to look at the hardware statistics.
This is reported in rx_missed. Not sure if there is more that the HW
can tell you.
There are lots of rx_dropped places in drivers; these could/should be
instrumented.

Better visibility into packet-dropping decisions

Posted Feb 27, 2022 23:43 UTC (Sun) by amarao (subscriber, #87073)

MD5 for TCP is really the single good protection against RST attacks
on BGP. You can filter ingress, but there is always a risk of missing
something. Having MD5 allows month-long TCP sessions without the risk
of a malicious RST.

Better visibility into packet-dropping decisions

Posted Mar 2, 2022 3:25 UTC (Wed) by MaZe (subscriber, #53908)

eh, most uses of tcp md5 are pretty pointless because they just use
well known passwords...

Better visibility into packet-dropping decisions

Posted Mar 2, 2022 9:58 UTC (Wed) by amarao (subscriber, #87073)

I do understand you. When a new session is agreed with a party, a
password is provided together with IP and AS number. Even though md5
is considered hopelessly broken, for the sake of RST protection it is
more than enough, because even 32 additional bits push the attack from
the `feasible` to the `unfeasible` realm.

                  Copyright (c) 2022, Eklektix, Inc.
   Comments and public postings are copyrighted by their creators.
          Linux is a registered trademark of Linus Torvalds