From nobody@FreeBSD.org  Fri Sep  1 15:34:58 2006
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id C1F2816A4DA
	for <freebsd-gnats-submit@FreeBSD.org>; Fri,  1 Sep 2006 15:34:58 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [216.136.204.117])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 84BA243D45
	for <freebsd-gnats-submit@FreeBSD.org>; Fri,  1 Sep 2006 15:34:58 +0000 (GMT)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.13.1/8.13.1) with ESMTP id k81FYvAv045215
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 1 Sep 2006 15:34:57 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.13.1/8.13.1/Submit) id k81FYv5n045214;
	Fri, 1 Sep 2006 15:34:57 GMT
	(envelope-from nobody)
Message-Id: <200609011534.k81FYv5n045214@www.freebsd.org>
Date: Fri, 1 Sep 2006 15:34:57 GMT
From: Václav Haisman <v.haisman@sh.cvut.cz>
To: freebsd-gnats-submit@FreeBSD.org
Subject: malloc(M_WAITOK) of "g_bio", forcing M_NOWAIT with non-sleepable locks held
X-Send-Pr-Version: www-2.3

>Number:         102752
>Category:       kern
>Synopsis:       malloc(M_WAITOK) of "g_bio", forcing M_NOWAIT with non-sleepable locks held
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    rwatson
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Sep 01 15:40:20 GMT 2006
>Closed-Date:    Sun Aug 03 16:17:09 UTC 2008
>Last-Modified:  Sun Aug 03 16:17:09 UTC 2008
>Originator:     Václav Haisman
>Release:        6.1
>Organization:
SU SH
>Environment:
FreeBSD logout.sh.cvut.cz 6.1-STABLE FreeBSD 6.1-STABLE #0: Thu Aug 10 00:33:03 CEST 2006     root@logout.sh.cvut.cz:/usr/obj/usr/src/sys/LOGOUT  i386
>Description:
+malloc(M_WAITOK) of "g_bio", forcing M_NOWAIT with the following non-sleepable locks held:
+exclusive sleep mutex inp (tcpinp) r = 0 (0xc50c5d38) locked @ /usr/src/sys/netinet/tcp_usrreq.c:1029
+KDB: stack backtrace:
+kdb_backtrace(c08eef84,e78be89c,1,c45752c0,c1035380) at kdb_backtrace+0x2f
+witness_warn(5,0,c0819ce0,c07fc905,c05eaca1) at witness_warn+0x1ac
+uma_zalloc_arg(c1035380,0,102,e78be8e4,c07562a1) at uma_zalloc_arg+0x3d
+g_alloc_bio(8,c0819777,c4575400,c45752c0,d367aac8) at g_alloc_bio+0x23
+swapgeom_strategy(d367aac8,c45752c0,c0819777,271) at swapgeom_strategy+0x3a
+swp_pager_strategy(d367aac8,0,c0819777,437,c0760ee7) at swp_pager_strategy+0x88
+swap_pager_getpages(c728e948,e78be9ec,1,0,e78be9b0) at swap_pager_getpages+0x382
+vm_fault(c4f9b128,806b000,1,0,c4c0b300) at vm_fault+0xb13
+trap_pfault(e78beaa8,0,806b2c0,c08a9c60,806b2c0) at trap_pfault+0xf4
+trap(c0800008,28,c4c00028,e78beb28,806b2c0) at trap+0x33e
+calltrap() at calltrap+0x5
+--- trap 0xc, eip = 0xc07b9b16, esp = 0xe78beae8, ebp = 0xe78beb08 ---
+generic_copyin(e78bec84,e78beb28,4,4,c50c5ca8) at generic_copyin+0x32
+tcp_ctloutput(c4d1dc84,e78bec84,0,c589b400,e78bec68) at tcp_ctloutput+0x182
+sosetopt(c4d1dc84,e78bec84,e78bec80,c08eef80,c4bd15a0) at sosetopt+0x38
+kern_setsockopt(c4c0b300,7,6,1,806b2c0) at kern_setsockopt+0xd6
+setsockopt(c4c0b300,e78bed04,14,28279000,5) at setsockopt+0x3e
+syscall(3b,2808003b,bfbf003b,1,806b2c0) at syscall+0x295
+Xint0x80_syscall() at Xint0x80_syscall+0x1f
+--- syscall (105, FreeBSD ELF32, setsockopt), eip = 0x2827959b, esp = 0xbf9fea3c, ebp = 0xbf9fea68 ---
+Sleeping on "swread" with the following non-sleepable locks held:
+exclusive sleep mutex inp (tcpinp) r = 0 (0xc50c5d38) locked @ /usr/src/sys/netinet/tcp_usrreq.c:1029
+KDB: stack backtrace:
+kdb_backtrace(c08eef84,e78be8e0,1,1,0) at kdb_backtrace+0x2f
+witness_warn(5,c08fe2a0,c0802eb1,c0819833,c08fe2a0) at witness_warn+0x1ac
+msleep(c1bb6fe8,c08fe2a0,40,c0819833,4e20) at msleep+0x58
+swap_pager_getpages(c728e948,e78be9ec,1,0,e78be9b0) at swap_pager_getpages+0x400
+vm_fault(c4f9b128,806b000,1,0,c4c0b300) at vm_fault+0xb13
+trap_pfault(e78beaa8,0,806b2c0,c08a9c60,806b2c0) at trap_pfault+0xf4
+trap(c0800008,28,c4c00028,e78beb28,806b2c0) at trap+0x33e
+calltrap() at calltrap+0x5
+--- trap 0xc, eip = 0xc07b9b16, esp = 0xe78beae8, ebp = 0xe78beb08 ---
+generic_copyin(e78bec84,e78beb28,4,4,c50c5ca8) at generic_copyin+0x32
+tcp_ctloutput(c4d1dc84,e78bec84,0,c589b400,e78bec68) at tcp_ctloutput+0x182
+sosetopt(c4d1dc84,e78bec84,e78bec80,c08eef80,c4bd15a0) at sosetopt+0x38
+kern_setsockopt(c4c0b300,7,6,1,806b2c0) at kern_setsockopt+0xd6
+setsockopt(c4c0b300,e78bed04,14,28279000,5) at setsockopt+0x3e
+syscall(3b,2808003b,bfbf003b,1,806b2c0) at syscall+0x295
+Xint0x80_syscall() at Xint0x80_syscall+0x1f
+--- syscall (105, FreeBSD ELF32, setsockopt), eip = 0x2827959b, esp = 0xbf9fea3c, ebp = 0xbf9fea68 ---

>How-To-Repeat:

>Fix:

>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->rwatson 
Responsible-Changed-By: rwatson 
Responsible-Changed-When: Fri Sep 1 16:26:32 UTC 2006 
Responsible-Changed-Why:  
Grab ownership of this PR, since it is of interest to me.  The fix here 
is to drop the inpcb lock before going near copyin/copyout, but the 
tricky bit is that the connection may change state while the lock is 
dropped, so the path from the socket to the inpcb/tcpcb must be 
re-evaluated, and the timewait state checked for.  In HEAD this is 
easier since the inpcb can't go away, but in -STABLE we need to check 
that so_pcb is still non-NULL also, as the connection could have been 
reset. 

I'll work on a patch for this in the near future. 
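
For illustration only, the locking pattern described above can be sketched in
userland C.  This is a minimal sketch under stated assumptions, not the kernel
code: struct fake_inpcb, fake_copyin() and fake_set_nodelay() are hypothetical
stand-ins, and a pthread mutex plays the role of the inpcb lock.  As a
simplification, the sketch performs the copy before taking the lock at all,
rather than dropping and re-acquiring it; the point is that nothing that can
fault runs with the lock held, and that connection state is checked only
after the lock is taken.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's inpcb; not the real structure. */
struct fake_inpcb {
	pthread_mutex_t	lock;		/* plays the role of the inpcb mutex */
	int		dropped;	/* plays the role of INP_TIMEWAIT | INP_DROPPED */
	int		nodelay;	/* plays the role of TF_NODELAY */
};

/*
 * Hypothetical stand-in for sooptcopyin().  In the kernel the copy may take
 * a page fault and sleep, which is why it must not run under a mutex.
 */
static int
fake_copyin(const void *uaddr, void *kaddr, size_t len)
{
	if (uaddr == NULL)
		return (EFAULT);
	memcpy(kaddr, uaddr, len);
	return (0);
}

/* Set the "no delay" flag using the copy-unlocked, revalidate-locked order. */
int
fake_set_nodelay(struct fake_inpcb *inp, const int *user_optval)
{
	int error, optval;

	assert(inp != NULL);

	/* Step 1: copy the option value in with no lock held. */
	error = fake_copyin(user_optval, &optval, sizeof(optval));
	if (error != 0)
		return (error);

	/* Step 2: take the lock, then re-check state that may have changed. */
	pthread_mutex_lock(&inp->lock);
	if (inp->dropped) {
		pthread_mutex_unlock(&inp->lock);
		return (ECONNRESET);
	}

	/* Step 3: apply the already-copied value while holding the lock. */
	inp->nodelay = (optval != 0);
	pthread_mutex_unlock(&inp->lock);
	return (0);
}
```

A caller would initialise the mutex with pthread_mutex_init() and then call
fake_set_nodelay(&inp, &value).  The function returns ECONNRESET if the
stand-in connection is marked dropped by the time the lock is held.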


http://www.freebsd.org/cgi/query-pr.cgi?pr=102752 

From: Robert Watson <rwatson@FreeBSD.org>
To: freebsd-gnats-submit@FreeBSD.org
Cc:  
Subject: Re: kern/102752: malloc(M_WAITOK) of "g_bio", forcing M_NOWAIT with
 non-sleepable locks held
Date: Fri, 10 Nov 2006 17:54:42 +0000 (GMT)

 The attached patch likely fixes this problem in -CURRENT; pending review, once 
 committed, I'll look at a back-port.
 
 Robert N M Watson
 Computer Laboratory
 University of Cambridge
 
 Index: tcp_usrreq.c
 ===================================================================
 RCS file: /home/ncvs/src/sys/netinet/tcp_usrreq.c,v
 retrieving revision 1.141
 diff -u -2 -0 -r1.141 tcp_usrreq.c
 --- tcp_usrreq.c	17 Sep 2006 13:39:35 -0000	1.141
 +++ tcp_usrreq.c	10 Nov 2006 17:28:11 -0000
 @@ -1240,188 +1240,179 @@
   		ti->tcpi_snd_wscale = tp->snd_scale;
   		ti->tcpi_rcv_wscale = tp->rcv_scale;
   	}
   	ti->tcpi_snd_ssthresh = tp->snd_ssthresh;
   	ti->tcpi_snd_cwnd = tp->snd_cwnd;
 
   	/*
   	 * FreeBSD-specific extension fields for tcp_info.
   	 */
   	ti->tcpi_rcv_space = tp->rcv_wnd;
   	ti->tcpi_snd_wnd = tp->snd_wnd;
   	ti->tcpi_snd_bwnd = tp->snd_bwnd;
   }
 
   /*
    * The new sockopt interface makes it possible for us to block in the
    * copyin/out step (if we take a page fault).  Taking a page fault at
    * splnet() is probably a Bad Thing.  (Since sockets and pcbs both now
    * use TSM, there probably isn't any need for this function to run at
    * splnet() any more.  This needs more examination.)
 - *
 - * XXXRW: The locking here is wrong; we may take a page fault while holding
 - * the inpcb lock.
    */
   int
   tcp_ctloutput(so, sopt)
   	struct socket *so;
   	struct sockopt *sopt;
   {
   	int	error, opt, optval;
   	struct	inpcb *inp;
   	struct	tcpcb *tp;
   	struct	tcp_info ti;
 
   	error = 0;
   	inp = sotoinpcb(so);
   	KASSERT(inp != NULL, ("tcp_ctloutput: inp == NULL"));
   	INP_LOCK(inp);
   	if (sopt->sopt_level != IPPROTO_TCP) {
   		INP_UNLOCK(inp);
   #ifdef INET6
   		if (INP_CHECK_SOCKAF(so, AF_INET6))
   			error = ip6_ctloutput(so, sopt);
   		else
   #endif /* INET6 */
   		error = ip_ctloutput(so, sopt);
   		return (error);
   	}
 +	if (sopt->sopt_dir == SOPT_SET) {
 +		INP_UNLOCK(inp);
 +		error = sooptcopyin(sopt, &optval, sizeof optval,
 +				    sizeof optval);
 +		if (error)
 +			return (error);
 +		INP_LOCK(inp);
 +	}
   	if (inp->inp_vflag & (INP_TIMEWAIT | INP_DROPPED)) {
 -		error = ECONNRESET;
 -		goto out;
 +		INP_UNLOCK(inp);
 +		return (ECONNRESET);
   	}
   	tp = intotcpcb(inp);
 
   	switch (sopt->sopt_dir) {
   	case SOPT_SET:
   		switch (sopt->sopt_name) {
   #ifdef TCP_SIGNATURE
   		case TCP_MD5SIG:
 -			error = sooptcopyin(sopt, &optval, sizeof optval,
 -					    sizeof optval);
 -			if (error)
 -				break;
 -
   			if (optval > 0)
   				tp->t_flags |= TF_SIGNATURE;
   			else
   				tp->t_flags &= ~TF_SIGNATURE;
   			break;
   #endif /* TCP_SIGNATURE */
   		case TCP_NODELAY:
   		case TCP_NOOPT:
 -			error = sooptcopyin(sopt, &optval, sizeof optval,
 -					    sizeof optval);
 -			if (error)
 -				break;
 -
   			switch (sopt->sopt_name) {
   			case TCP_NODELAY:
   				opt = TF_NODELAY;
   				break;
   			case TCP_NOOPT:
   				opt = TF_NOOPT;
   				break;
   			default:
   				opt = 0; /* dead code to fool gcc */
   				break;
   			}
 
   			if (optval)
   				tp->t_flags |= opt;
   			else
   				tp->t_flags &= ~opt;
   			break;
 
   		case TCP_NOPUSH:
 -			error = sooptcopyin(sopt, &optval, sizeof optval,
 -					    sizeof optval);
 -			if (error)
 -				break;
 -
   			if (optval)
   				tp->t_flags |= TF_NOPUSH;
   			else {
   				tp->t_flags &= ~TF_NOPUSH;
   				error = tcp_output(tp);
   			}
   			break;
 
   		case TCP_MAXSEG:
 -			error = sooptcopyin(sopt, &optval, sizeof optval,
 -					    sizeof optval);
 -			if (error)
 -				break;
 -
   			if (optval > 0 && optval <= tp->t_maxseg &&
   			    optval + 40 >= tcp_minmss)
   				tp->t_maxseg = optval;
   			else
   				error = EINVAL;
   			break;
 
   		case TCP_INFO:
   			error = EINVAL;
   			break;
 
   		default:
   			error = ENOPROTOOPT;
   			break;
   		}
 +		INP_UNLOCK(inp);
   		break;
 
   	case SOPT_GET:
   		switch (sopt->sopt_name) {
   #ifdef TCP_SIGNATURE
   		case TCP_MD5SIG:
   			optval = (tp->t_flags & TF_SIGNATURE) ? 1 : 0;
 +			INP_UNLOCK(inp);
   			error = sooptcopyout(sopt, &optval, sizeof optval);
   			break;
   #endif
   		case TCP_NODELAY:
   			optval = tp->t_flags & TF_NODELAY;
 +			INP_UNLOCK(inp);
   			error = sooptcopyout(sopt, &optval, sizeof optval);
   			break;
   		case TCP_MAXSEG:
   			optval = tp->t_maxseg;
 +			INP_UNLOCK(inp);
   			error = sooptcopyout(sopt, &optval, sizeof optval);
   			break;
   		case TCP_NOOPT:
   			optval = tp->t_flags & TF_NOOPT;
 +			INP_UNLOCK(inp);
   			error = sooptcopyout(sopt, &optval, sizeof optval);
   			break;
   		case TCP_NOPUSH:
   			optval = tp->t_flags & TF_NOPUSH;
 +			INP_UNLOCK(inp);
   			error = sooptcopyout(sopt, &optval, sizeof optval);
   			break;
   		case TCP_INFO:
   			tcp_fill_info(tp, &ti);
 +			INP_UNLOCK(inp);
   			error = sooptcopyout(sopt, &ti, sizeof ti);
   			break;
   		default:
 +			INP_UNLOCK(inp);
   			error = ENOPROTOOPT;
   			break;
   		}
   		break;
   	}
 -out:
 -	INP_UNLOCK(inp);
   	return (error);
   }
 
   /*
    * tcp_sendspace and tcp_recvspace are the default send and receive window
    * sizes, respectively.  These are obsolescent (this information should
    * be set by the route).
    */
   u_long	tcp_sendspace = 1024*32;
   SYSCTL_ULONG(_net_inet_tcp, TCPCTL_SENDSPACE, sendspace, CTLFLAG_RW,
       &tcp_sendspace , 0, "Maximum outgoing TCP datagram size");
   u_long	tcp_recvspace = 1024*64;
   SYSCTL_ULONG(_net_inet_tcp, TCPCTL_RECVSPACE, recvspace, CTLFLAG_RW,
       &tcp_recvspace , 0, "Maximum incoming TCP datagram size");
 
   /*
    * Attach TCP protocol to socket, allocating
    * internet protocol control block, tcp control block,
    * bufer space, and entering LISTEN state if to accept connections.
    */
State-Changed-From-To: open->analyzed 
State-Changed-By: rwatson 
State-Changed-When: Sun Jan 13 19:11:20 UTC 2008 
State-Changed-Why:  
Moved to analyzed state.  Apparently I never applied this patch, but will 
work on it shortly and look to merge it to 7.x and possibly 6.x. 


http://www.freebsd.org/cgi/query-pr.cgi?pr=102752 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/102752: commit references a PR
Date: Fri, 18 Jan 2008 12:19:57 +0000 (UTC)

 rwatson     2008-01-18 12:19:51 UTC
 
   FreeBSD src repository
 
   Modified files:
     sys/netinet          tcp_usrreq.c 
   Log:
   In tcp_ctloutput(), don't hold the inpcb lock over sooptcopyin(), rather,
   drop the lock and then re-acquire it, revalidating TCP connection state
   assumptions when we do so.  This avoids a potential lock order reversal
   (and potential deadlock, although none have been reported) due to the
   inpcb lock being held over a page fault.
   
   MFC after:      1 week
   PR:             102752
   Reviewed by:    bz
   Reported by:    Václav Haisman <v dot haisman at sh dot cvut dot cz>
   
   Revision  Changes    Path
   1.166     +55 -25    src/sys/netinet/tcp_usrreq.c
 _______________________________________________
 cvs-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/cvs-all
 To unsubscribe, send any mail to "cvs-all-unsubscribe@freebsd.org"
 
State-Changed-From-To: analyzed->patched 
State-Changed-By: rwatson 
State-Changed-When: Fri Jan 18 12:42:02 UTC 2008 
State-Changed-Why:  
Fix committed to HEAD, will MFC in a couple of weeks. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=102752 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/102752: commit references a PR
Date: Sat,  1 Mar 2008 11:50:10 +0000 (UTC)

 rwatson     2008-03-01 11:50:00 UTC
 
   FreeBSD src repository
 
   Modified files:        (Branch: RELENG_7)
     sys/netinet          tcp_usrreq.c 
   Log:
   Merge tcp_usrreq.c:1.166 from HEAD to RELENG_7:
   
     In tcp_ctloutput(), don't hold the inpcb lock over sooptcopyin(), rather,
     drop the lock and then re-acquire it, revalidating TCP connection state
     assumptions when we do so.  This avoids a potential lock order reversal
     (and potential deadlock, although none have been reported) due to the
     inpcb lock being held over a page fault.
   
     PR:             102752
     Reviewed by:    bz
     Reported by:    Václav Haisman <v dot haisman at sh dot cvut dot cz>
   
   Revision   Changes    Path
   1.163.2.3  +55 -25    src/sys/netinet/tcp_usrreq.c
 
State-Changed-From-To: patched->closed 
State-Changed-By: rwatson 
State-Changed-When: Sun Aug 3 16:16:33 UTC 2008 
State-Changed-Why:  
The fix has now been MFC'd; close the PR.  If you experience further 
problems along these lines, please let me know.  Thanks for the report. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=102752 
>Unformatted:
