From sthaug@nethelp.no  Sun Jun 22 06:37:05 1997
Received: from verdi.nethelp.no (verdi.nethelp.no [195.1.171.130])
          by hub.freebsd.org (8.8.5/8.8.5) with SMTP id GAA01872
          for <FreeBSD-gnats-submit@freebsd.org>; Sun, 22 Jun 1997 06:37:03 -0700 (PDT)
Received: (qmail 29392 invoked by uid 1001); 22 Jun 1997 13:36:54 +0000 (GMT)
Message-Id: <19970622133654.29391.qmail@verdi.nethelp.no>
Date: 22 Jun 1997 13:36:54 +0000 (GMT)
From: sthaug@nethelp.no
Reply-To: sthaug@nethelp.no
To: FreeBSD-gnats-submit@freebsd.org
Cc: sthaug@nethelp.no
Subject: SO_SNDLOWAT of 0 causes kernel to use 99% of CPU time on TCP send
X-Send-Pr-Version: 3.2

>Number:         3925
>Category:       kern
>Synopsis:       SO_SNDLOWAT of 0 causes kernel to use 99% of CPU time on TCP send
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    wollman
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Jun 22 06:40:01 PDT 1997
>Closed-Date:    Sun Apr 12 03:51:45 PDT 1998
>Last-Modified:  Sun Apr 12 03:52:07 PDT 1998
>Originator:     Steinar Haug
>Release:        FreeBSD 2.2-961014-SNAP i386
>Organization:
Nethelp Consulting
>Environment:

This may be a generic 4.4BSD networking bug. It affects NetBSD-1.2 and FreeBSD
(all versions I've checked, e.g. 2.2-BETA and 3.0-970124-SNAP). It was discovered
because BIND-8.1.1-T2B named sets SO_SNDLOWAT to 0 if SO_SNDLOWAT is defined.

>Description:

Setting SO_SNDLOWAT to 0 and then sending data over TCP to a site which is more
than a few milliseconds away (e.g. connected over a 28.8 kbit/s link) causes the
kernel to use 99% of the CPU time, as reported by vmstat or top. Setting
SO_SNDLOWAT to a value > 0 makes the problem disappear.

It's quite possible that this effect *always* occurs, but it's much more visible
against a slow site.

>How-To-Repeat:

Compile the following program, which sends a number of buffers to the discard
port at a given address. Run the program with

% tstlowat 0 50 ip-address

i.e. send 50 buffers to the given address with SO_SNDLOWAT set to 0. Observe with
vmstat how the kernel uses all of the CPU.

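#include <stdio.h>
#include <stdlib.h>		/* atoi(), exit() */
#include <string.h>		/* memset() */
#include <unistd.h>		/* write() */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int
main(int argc, char *argv[])
{
	struct sockaddr_in sin;
	int i, n, s, sndlowat;
	static char buf[65536];		/* zero-filled payload */

	if (argc != 4) {
		fprintf(stderr, "usage: tstlowat low-water-mark "
		    "number-of-buffers ip-address\n");
		exit(1);
	}
	sndlowat = atoi(argv[1]);
	n = atoi(argv[2]);

	if ((s = socket(PF_INET, SOCK_STREAM, 0)) < 0) {
		perror("socket"); exit(1);
	}
	/* Set the send low-water mark before connecting */
	if (setsockopt(s, SOL_SOCKET, SO_SNDLOWAT, (char *)&sndlowat,
		       sizeof sndlowat) < 0) {
		perror("setsockopt"); exit(1);
	}
	memset(&sin, 0, sizeof sin);
	sin.sin_port = htons(9);	/* Discard port */
	sin.sin_family = AF_INET;
	if (inet_aton(argv[3], &sin.sin_addr) == 0) {
		fprintf(stderr, "inet_aton: invalid address\n"); exit(1);
	}
	if (connect(s, (struct sockaddr *)&sin, sizeof sin) < 0) {
		perror("connect"); exit(1);
	}

	/* Send n buffers; with a low-water mark of 0 the kernel spins here */
	for (i=0; i<n; i++) {
		if (write(s, buf, sizeof buf) < 0) {
			perror("write"); exit(1);
		}
	}
	return 0;
}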


>Fix:
	
Sorry, no known fix. Easy workaround is to use a SO_SNDLOWAT which is > 0.
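
The workaround can be applied in the application before the option is set. The
following is a minimal sketch in C, not a tested fix. The helper names
clamp_sndlowat() and set_sndlowat() are hypothetical and not part of any system
API, and raising a nonpositive request to 1 byte is an assumption about what the
caller wants.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: raise a nonpositive low-water mark to 1 byte. */
int
clamp_sndlowat(int requested)
{
	return (requested < 1 ? 1 : requested);
}

/*
 * Hypothetical wrapper: never hands a value below 1 to the kernel.
 * Returns whatever setsockopt() returns.
 */
int
set_sndlowat(int s, int requested)
{
	int lowat = clamp_sndlowat(requested);

	return (setsockopt(s, SOL_SOCKET, SO_SNDLOWAT, (char *)&lowat,
	    sizeof lowat));
}
```

In the test program above, calling set_sndlowat(s, sndlowat) in place of the
direct setsockopt() call would apply the clamp.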

>Release-Note:
>Audit-Trail:

From: sthaug@nethelp.no
To: freebsd-gnats-submit@freebsd.org
Cc: sthaug@nethelp.no
Subject: Re: kern/3925: SO_SNDLOWAT of 0 causes kernel to use 99% of CPU time on TCP send
Date: Sun, 22 Jun 1997 16:17:13 +0200

 A quick followup to my bug report: The bug appears in NetBSD also. One of
 the NetBSD persons here made a more thorough analysis of the problem. Here
 is the corresponding NetBSD problem report.
 
 Steinar Haug, Nethelp consulting, sthaug@nethelp.no
 ----------------------------------------------------------------------
 Subject: Setting SO_SNDLOWAT to 0 causes busy-wait inside kernel
 From: Havard Eidnes <he@vader.runit.sintef.no>
 To: gnats-bugs@gnats.netbsd.org
 Date: Sun, 22 Jun 1997 15:45:56 +0200 (MEST)
 
 
 >Submitter-Id:	net
 >Originator:	Havard Eidnes
 >Organization:	SINTEF RUNIT
 >Confidential:	no
 >Synopsis:	Setting SO_SNDLOWAT to 0 causes busy-wait inside kernel
 >Severity:	critical
 >Priority:	high
 >Category:	kern
 >Class:		sw-bug
 >Release:	NetBSD-1.2.1 (and newer versions too)
 >Environment:   System: NetBSD vader.runit.sintef.no 1.2G NetBSD 1.2G (VADER) #2: Mon Jun 16 21:58:48 MEST 1997 he@vader.runit.sintef.no:/usr/src/sys/arch/i386/compile/VADER i386
 
 
 >Description:
 	Setting SO_SNDLOWAT to 0 on a TCP socket and sending to a non-
 	local host (reachable with some delay and/or with limited bandwidth)
 	will cause the sending machine to go into busy-wait inside the kernel.
 
 	What appears to happen is this:
 
 	 o select() will always return that the socket is writable, even when
 	   sbspace() returns 0 due to the >= comparison in the sowritable()
 	   macro.
 
 	 o once the user writes, it appears that sosend() will loop internally
 	   since sbspace() is 0, no data will be added to the outgoing buffer,
 	   and the residue for each turn of the loop stays the same.  The test
 	   for the low-water mark does not kick in, so sbwait() will not be
 	   called.
 
 	It can clearly be argued that the program setting SO_SNDLOWAT to 0
 	is buggy, but the robustness against this mis-setting should be better
 	to prevent denial-of-service attacks by local users.
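
The two observations above can be modelled in user space. This is a simplified
sketch of the comparisons as described in this report, not the kernel source:
the function names are hypothetical, and the sleep condition is reduced to the
two tests that matter here (an assumption about the rest of sosend()).

```c
/* Simplified stand-in for the space test in sowritable(): note the >=. */
int
model_writable(long space, long lowat)
{
	return (space >= lowat);
}

/*
 * Simplified stand-in for the decision to call sbwait(): sleep only
 * if the request does not fit and the free space is below the
 * low-water mark.
 */
int
model_would_sleep(long space, long resid, long lowat)
{
	return (space < resid && space < lowat);
}

/*
 * Count the passes a sender makes without sleeping while the buffer
 * space stays fixed.  max_turns caps the loop so the model terminates;
 * the kernel has no such cap.
 */
long
model_spin(long space, long resid, long lowat, long max_turns)
{
	long chunk, turns = 0;

	while (resid > 0 && turns < max_turns) {
		if (model_would_sleep(space, resid, lowat))
			break;		/* the kernel would block here */
		chunk = space < resid ? space : resid;
		resid -= chunk;		/* with space == 0 nothing is copied */
		turns++;
	}
	return (turns);
}
```

With space 0 and a low-water mark of 0, model_writable() reports the socket as
writable and model_spin() runs until its cap. With a low-water mark of 1 the
model sleeps on the first pass.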
 
 >How-To-Repeat:
 	Run the attached program towards a non-local host reachable via
 	a "thin" line or over some delay by calling it:
 
 	% tstlowat low-water-mark number-of-buffers ip-address
 
 	and trying low-water-mark of 0.
 
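 --- snip, snip ---
 #include <stdio.h>
 #include <stdlib.h>		/* atoi(), exit() */
 #include <string.h>		/* memset() */
 #include <unistd.h>		/* write() */
 #include <sys/types.h>
 #include <sys/socket.h>
 #include <netinet/in.h>
 #include <arpa/inet.h>
 
 int
 main(int argc, char *argv[])
 {
 	struct sockaddr_in sin;
 	int i, n, s, sndlowat;
 	static char buf[65536];		/* zero-filled payload */
 
 	if (argc != 4) {
 		fprintf(stderr, "usage: tstlowat low-water-mark "
 		    "number-of-buffers ip-address\n");
 		exit(1);
 	}
 	sndlowat = atoi(argv[1]);
 	n = atoi(argv[2]);
 
 	if ((s = socket(PF_INET, SOCK_STREAM, 0)) < 0) {
 		perror("socket"); exit(1);
 	}
 	/* Set the send low-water mark before connecting */
 	if (setsockopt(s, SOL_SOCKET, SO_SNDLOWAT, (char *)&sndlowat,
 		       sizeof sndlowat) < 0) {
 		perror("setsockopt"); exit(1);
 	}
 	memset(&sin, 0, sizeof sin);
 	sin.sin_port = htons(9);	/* Discard port */
 	sin.sin_family = AF_INET;
 	if (inet_aton(argv[3], &sin.sin_addr) == 0) {
 		fprintf(stderr, "inet_aton: invalid address\n"); exit(1);
 	}
 	if (connect(s, (struct sockaddr *)&sin, sizeof sin) < 0) {
 		perror("connect"); exit(1);
 	}
 
 	/* Send n buffers; with a low-water mark of 0 the kernel spins here */
 	for (i=0; i<n; i++) {
 		if (write(s, buf, sizeof buf) < 0) {
 			perror("write"); exit(1);
 		}
 	}
 	return 0;
 }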
 --- snip, snip ---
 
 >Fix:
 	Sorry, I don't know.
 
 	Does a socket low-water mark of 0 really make sense?
 	If not, return EINVAL or something like that on an attempt at
 	setting it to 0?

From: Peter Wemm <peter@spinner.dialix.com.au>
To: sthaug@nethelp.no
Cc: FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: kern/3925: SO_SNDLOWAT of 0 causes kernel to use 99% of CPU time on TCP send 
Date: Sun, 22 Jun 1997 23:20:44 +0800

 When setting the parameters, there is no validity checking:
                         case SO_SNDBUF:
                         case SO_RCVBUF:
                                 if (sbreserve(optname == SO_SNDBUF ?   
                                     &so->so_snd : &so->so_rcv,
                                     (u_long) *mtod(m, int *)) == 0) {
                                         error = ENOBUFS;
                                         goto bad;
                                 }
                                 break;
 
                         case SO_SNDLOWAT:
                                 so->so_snd.sb_lowat = *mtod(m, int *);
                                 break;
                         case SO_RCVLOWAT:
                                 so->so_rcv.sb_lowat = *mtod(m, int *);
                                 break;
                         }
 
 However, soreserve() clips the results:
 soreserve(so, sndcc, rcvcc)
 {
         if (sbreserve(&so->so_snd, sndcc) == 0)
                 goto bad;
         if (sbreserve(&so->so_rcv, rcvcc) == 0)
                 goto bad2;
         if (so->so_rcv.sb_lowat == 0)
                 so->so_rcv.sb_lowat = 1;
         if (so->so_snd.sb_lowat == 0)
                 so->so_snd.sb_lowat = MCLBYTES;
         if (so->so_snd.sb_lowat > so->so_snd.sb_hiwat)
                 so->so_snd.sb_lowat = so->so_snd.sb_hiwat;
         return (0);
 [..]
 
 sbreserve() also clips:
 
 sbreserve(sb, cc)
 {
 [..]
         sb->sb_hiwat = cc;
         sb->sb_mbmax = min(cc * sb_efficiency, sb_max);
         if (sb->sb_lowat > sb->sb_hiwat)
                 sb->sb_lowat = sb->sb_hiwat;
         return (1);
 [..]
 
 It seems to me that SO_SNDLOWAT = 0 is an error..  Depending on the timing 
 of the setsockopt() calls relative to when soreserve() has been called, a 
 value of zero is set to MCLBYTES or left at zero.
 
 I suspect the reason that there is no parameter checking at set time is so 
 that the sanity checking is done after _all_ the parameters are set..  
 Otherwise, doing a series of setsockopt()'s could be a bit hairy if all 
 combinations along the way were sanity checked.  For example, if lowat is 
 1 and hiwat is 1024, and you wanted to change it to 2048/8192, under the 
 present system setting lowat to 2048 first then hiwat next would work, but 
 checking lowat > hiwat along the way would leave you with 1/8192 as a 
 result which is not what would be expected.
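
The ordering problem in this example can be shown with a small user-space model.
This is only a sketch: struct marks and the three functions are hypothetical,
and the eager check is assumed to reject an out-of-range value (the text above
does not say whether it would reject or clip).

```c
#include <errno.h>

struct marks {
	long lowat;
	long hiwat;
};

/* Hypothetical eager check: refuse a lowat above the current hiwat. */
int
set_lowat_eager(struct marks *m, long lowat)
{
	if (lowat > m->hiwat)
		return (EINVAL);
	m->lowat = lowat;
	return (0);
}

/* Behaviour described for the present system: store the value unchecked. */
void
set_lowat_deferred(struct marks *m, long lowat)
{
	m->lowat = lowat;
}

/* Set hiwat, then clip lowat to it, in the manner of sbreserve(). */
void
set_hiwat(struct marks *m, long hiwat)
{
	m->hiwat = hiwat;
	if (m->lowat > m->hiwat)
		m->lowat = m->hiwat;
}
```

Starting from 1/1024, set_lowat_eager(2048) followed by set_hiwat(8192) ends at
1/8192, while set_lowat_deferred(2048) followed by set_hiwat(8192) ends at
2048/8192.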
 
 I guess the question is, when is soreserve() called to do the sanity 
 check?  Does this mean that soreserve() is being missed somewhere along 
 the way and allowing nonsensical hi/low values to be used on a different 
 size buffer?
 
 Cheers,
 -Peter
 
 

From: Bill Fenner <fenner@parc.xerox.com>
To: freebsd-gnats-submit@freebsd.org, peter@spinner.dialix.com.au
Cc:
Subject: Re: kern/3925: SO_SNDLOWAT of 0 causes kernel to use 99% of CPU time on TCP send
Date: Sun, 22 Jun 1997 10:05:35 PDT

 soreserve() is usually only called when creating a new socket.
 
 Sanity-checking 0 is clearly acceptable when doing the setsockopt().
 Since soreserve() silently "fixes" it, perhaps setsockopt() should
 too.  I don't know what to think about sanity-checking in other
 situations.
 
   Bill

From: sthaug@nethelp.no
To: freebsd-gnats-submit@freebsd.org
Cc:
Subject: Re: kern/3925: SO_SNDLOWAT of 0 causes kernel to use 99% of CPU time on TCP send
Date: Wed, 25 Jun 1997 10:58:20 +0200

 The problem is now fixed in NetBSD, here is the log message.
 
 Steinar Haug, Nethelp consulting, sthaug@nethelp.no
 ----------------------------------------------------------------------
 thorpej
 Tue Jun 24 13:04:46 PDT 1997
 Update of /cvsroot/src/sys/kern
 In directory netbsd1:/var/slash-tmp/cvs-serv25765
 
 Modified Files:
         uipc_socket.c
 Log Message:
 In sosetopt():
 - Disallow < 1 values for SO_SNDBUF, SO_RCVBUF, SO_SNDLOWAT, and
   SO_RCVLOWAT; return EINVAL if the user attempts to set <= 0.
   Inspired by PR #3770, from Havard Eidnes <he@vader.runit.sintef.no>.
 - For SO_SNDLOWAT and SO_RCVLOWAT, don't let the low-water mark get
   set above the high-water mark.  Behavior is now consistent with
   BSD/OS: If such an attempt is made, silently truncate to the high-water
   value.
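
The two rules in the log message can be sketched as a stand-alone validation
routine. This is an illustration of the described behaviour only, not the
NetBSD change itself; check_lowat() and its parameters are hypothetical.

```c
#include <errno.h>

/*
 * Hypothetical validation following the log message: values below 1
 * are rejected with EINVAL, and a low-water mark above the high-water
 * mark is silently truncated to the high-water value.
 * On success the accepted value is stored in *lowat.
 */
int
check_lowat(int requested, long hiwat, long *lowat)
{
	if (requested < 1)
		return (EINVAL);
	*lowat = (requested > hiwat) ? hiwat : requested;
	return (0);
}
```

A request of 0 leaves *lowat untouched and returns EINVAL, so the busy-wait
case from this report can no longer be configured.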
State-Changed-From-To: open->feedback 
State-Changed-By: fenner 
State-Changed-When: Sat Jul 5 11:55:17 PDT 1997 
State-Changed-Why:  
Could you verify that rev 1.27 of kern/uipc_socket.c fixes this problem? 
Responsible-Changed-From-To: freebsd-bugs->wollman 
Responsible-Changed-By: phk 
Responsible-Changed-When: Sat Apr 11 14:41:21 PDT 1998 
Responsible-Changed-Why:  
somebody has to do it :-) 
State-Changed-From-To: feedback->closed 
State-Changed-By: phk 
State-Changed-When: Sun Apr 12 03:51:45 PDT 1998 
State-Changed-Why:  
fixed in version 1.27 of uipc_socket.c 
>Unformatted:
