https://lwn.net/SubscriberLink/885729/c495a793abeee387/

Better visibility into packet-dropping decisions

By Jonathan Corbet
February 25, 2022

Dropped packets are a fact of life in networking; there can be any number of reasons why a packet may not survive the journey to its destination. Indeed, there are so many ways that a packet can meet its demise that it can be hard for an administrator to tell why packets are being dropped. That, in turn, can make life difficult in times when users are complaining about high packet-loss rates. Starting with 5.17, the kernel is getting some improved instrumentation that should shed some light on why the kernel decides to route packets into the bit bucket.

This problem is not new, and neither are attempts to address it. The kernel currently contains a "drop_monitor" functionality that was introduced in the 2.6.30 kernel back in 2009. Over the years, it has gained some functionality but has managed to remain thoroughly and diligently undocumented. This feature appears to support a netlink API that can deliver notifications when packets are dropped. Those notifications include an address within the kernel showing where the decision to drop the packet was made, and can optionally include the dropped packets themselves.
User-space code can turn the addresses into function names; desperate administrators can then dig through the kernel source to try to figure out what is going on. It seems like there should be a better way.

As it happens, the beginning of the infrastructure to provide that better way was contributed to 5.17 by Menglong Dong. The internal kernel function that frees the memory holding a packet is kfree_skb(); in 5.17, that function has become:

    void kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason);

The reason argument is new; it is intended to say why the packet passed as skb has reached the end of the line. This information is not actually useful to the kernel, but it has been added to the existing kfree_skb tracepoint, making it available to any program that connects to that tracepoint. Analysis scripts can quickly print out why packets are being dropped; administrators can also attach BPF programs to, for example, create a histogram of reasons for dropped packets. A new version of kfree_skb() has also been added; it simply calls kfree_skb_reason() with "unspecified" as the reason.

In 5.17, the use of this infrastructure is relatively limited. There are a few TCP-level drop locations that have been instrumented with the new call, including code that drops packets for being smaller than the TCP header size, not being associated with an existing TCP socket, exhibiting checksum failures, or having been explicitly dropped by an add-on socket filter program. The UDP subsystem has also been enhanced to note those same reasons for dropped packets.

The situation is set to improve considerably in 5.18; patches already in linux-next add a number of new reasons. These document packets dropped by the netfilter subsystem, that contain IP-header errors, or have been identified as a spoofed packet by the reverse-path filter (rp_filter) mechanism. Administrators will be able to see IP packets that have been dropped due to an unsupported higher-level protocol.
Reasons have also been added for UDP packets dropped by the IPSec XFRM policy or a lack of memory within the kernel.

There is yet another set of reason annotations that has been accepted, but which has not yet found its way into linux-next; chances are that these will show up in 5.18 as well. They extend the XFRM-policy annotation to TCP, note packets dropped due to missing or incorrect MD5 hashes (which are evidently still a thing in 2022), as well as those containing invalid TCP flags or sequence numbers outside of the current TCP window. These patches also add new instances of the other reasons noted above; some situations can be detected in multiple places.

While the above set of reasons may seem long, this work could be seen as having just begun. In current linux-next, there are over 2,700 calls to kfree_skb(), compared to 18 to kfree_skb_reason(). That suggests that a lot of packets will still be dropped for unspecified reasons. Still, this work represents a useful step forward, one that should make many of the reasons for packet loss more readily available to system administrators.

The part that remains missing, of course, is the user-space side. The current reason codes are all defined in a kernel-internal header file, which is not part of the externally available kernel API. Moving them to a separate file under the uapi directory would make them more accessible to developers. Also helpful, of course, would be to have some documentation for this mechanism and how to use it (and interpret the results), but even your editor, often cited for naive optimism, will not be holding his breath for that to show up.

Meanwhile, though, an important piece of the kernel's network functionality is becoming a little more transparent to users. That should make life easier for system administrators who will be able to spend less time trying to figure out why packets aren't making it through the system.
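
As a rough sketch of the kind of analysis script mentioned earlier, the following Python fragment tallies drop reasons from text records of the kfree_skb tracepoint. The record layout it expects (a "kfree_skb:" event whose text ends in "reason: NAME") is an assumption about the trace output and should be checked against the running kernel; the function name and the sample records are hypothetical and exist only for illustration.

```python
import re
from collections import Counter

# Assumed layout of a kfree_skb tracepoint record in the trace output;
# check the event's "format" file on the running kernel before relying on it.
REASON_RE = re.compile(r"kfree_skb:.*\breason:\s*(\w+)")


def drop_reason_histogram(lines):
    """Count kfree_skb records by the drop reason they report."""
    counts = Counter()
    for line in lines:
        match = REASON_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical records, written by hand; not captured from a real system.
SAMPLE = [
    "<idle>-0 [001] 100.000001: kfree_skb: skbaddr=00000000aaaa0001 "
    "protocol=2048 location=00000000bbbb0001 reason: NO_SOCKET",
    "<idle>-0 [001] 100.000002: kfree_skb: skbaddr=00000000aaaa0002 "
    "protocol=2048 location=00000000bbbb0002 reason: TCP_CSUM",
    "<idle>-0 [002] 100.000003: consume_skb: skbaddr=00000000aaaa0003",
    "<idle>-0 [001] 100.000004: kfree_skb: skbaddr=00000000aaaa0004 "
    "protocol=2048 location=00000000bbbb0001 reason: NO_SOCKET",
]

if __name__ == "__main__":
    # Print the reasons, most frequent first.
    for reason, count in drop_reason_histogram(SAMPLE).most_common():
        print(f"{reason:20} {count}")
```

In practice the input would come from wherever the tracepoint output is being collected rather than from a hard-coded list; records from other events are ignored by the pattern.
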
Unfortunately, though, this work offers no help for users who are wondering why their packets are disappearing somewhere in the far reaches of the Internet.

Index entries for this article: Kernel Networking

-----------------------------------------

Better visibility into packet-dropping decisions
Posted Feb 25, 2022 20:29 UTC (Fri) by atnot (subscriber, #124910)

Has this been considered for other things too? I regularly find myself wishing something like this existed for figuring out which of the many mechanisms an EPERM/EACCES was caused by (unix permissions, acl, selinux and other LSMs, file systems, dm layers, cgroups, namespaces, seccomp, capabilities, API misuse, ...)

Better visibility into packet-dropping decisions
Posted Feb 26, 2022 2:04 UTC (Sat) by shemminger (subscriber, #5739)

Netlink was enhanced to provide error messages (not just errno). Many places have it, but lots still need work -- volunteers wanted.

Better visibility into packet-dropping decisions
Posted Feb 26, 2022 5:52 UTC (Sat) by tititou (subscriber, #75162)

Hi,
Can you provide a link or an example about it?

Better visibility into packet-dropping decisions
Posted Feb 26, 2022 19:03 UTC (Sat) by johill (subscriber, #25196)

Check out commit 2d4bc93368f5a ("netlink: extended ACK reporting") which added the bare minimum infrastructure a long time ago, and you can find many users of NL_SET_ERR_MSG/GENL_SET_ERR_MSG (and similar macros) these days. It supports reporting a string (error message), a pointer to a bad attribute, and if NL_SET_ERR_MSG_ATTR_POL was used (which it is in the general policy-based parsing) will even return the policy for the attribute back to userspace to explain why the attribute failed (e.g. if it's NLA_RANGE(U32, 1,2) and you gave a value 3).

return -Exxxxx;
Posted Feb 26, 2022 15:20 UTC (Sat) by jreiser (subscriber, #11027)

There is a need for a facility to locate at run time every failed subroutine call. The source code can be edited with sed so that

    return -Exxxxx;

becomes

    return ErrorCode(Exxxxx);

with a default macro definition something like

    #ifndef ErrorCode
    #define ErrorCode(errnum) -(errnum)
    #endif

Then the determined investigator can re-compile selected source files with something like

    #define ErrorCode(errnum) myErrorDiagnostic(errnum, __builtin_return_address(0), __FUNCTION__, __LINE__)

and supply a definition for the added subroutine myErrorDiagnostic. Of course there are a handful of cases where error numbers are variables or the syntax is complex, and also a few places where simple automated editing fails. Rate limiting the reporting can be a problem. But I did this once, and got the answer I wanted.

return -Exxxxx;
Posted Feb 26, 2022 19:05 UTC (Sat) by johill (subscriber, #25196)

In most files you can even just

    #define EINVAL ({printk(...); 22;})

if you really want :-)

return -Exxxxx;
Posted Feb 27, 2022 3:21 UTC (Sun) by roc (subscriber, #30627)

That would surely fail to build with EINVAL being used in a case label.

return -Exxxxx;
Posted Feb 27, 2022 9:17 UTC (Sun) by jengelh (subscriber, #33263)

Good thing the main kernel has just two `case EINVAL` across its ~30 million lines.

Better visibility into packet-dropping decisions
Posted Feb 26, 2022 4:49 UTC (Sat) by alison (subscriber, #63752)

Assuredly knowing when packets are dropped because NAPI polling isn't keeping up with what's incoming would be valuable. Yeah, I'm sure that patches and test data would be welcome.

Better visibility into packet-dropping decisions
Posted Feb 27, 2022 21:26 UTC (Sun) by shemminger (subscriber, #5739)

In order to see packets dropping because the CPU can't keep up you have to look at the hardware statistics. This is reported in rx_missed. Not sure if there is more that HW can tell you. There are lots of rx_dropped places in drivers; these could/should be instrumented.

Better visibility into packet-dropping decisions
Posted Feb 27, 2022 23:43 UTC (Sun) by amarao (subscriber, #87073)

MD5 for TCP is really the single good protection against RST attacks on BGP. You can filter ingress, but there is always a risk of missing something. Having MD5 allows month-long TCP sessions without the risk of a malicious RST.

Better visibility into packet-dropping decisions
Posted Mar 2, 2022 3:25 UTC (Wed) by MaZe (subscriber, #53908)

eh, most uses of tcp md5 are pretty pointless because they just use well known passwords...

Better visibility into packet-dropping decisions
Posted Mar 2, 2022 9:58 UTC (Wed) by amarao (subscriber, #87073)

I do understand you. When a new session is agreed with a party, a password is provided together with the IP and AS number. Even if md5 is considered hopelessly broken, for the sake of RST protection it is more than enough, because even 32 additional bits push the attack from the `feasible` to the `unfeasible` realm.

Copyright (c) 2022, Eklektix, Inc. Comments and public postings are copyrighted by their creators. Linux is a registered trademark of Linus Torvalds