From nobody@FreeBSD.org  Thu Feb 18 14:23:13 2010
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 6C1EB1065670
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 18 Feb 2010 14:23:13 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id 5005E8FC13
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 18 Feb 2010 14:23:13 +0000 (UTC)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.3/8.14.3) with ESMTP id o1IEND12077445
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 18 Feb 2010 14:23:13 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.3/8.14.3/Submit) id o1IENDMJ077444;
	Thu, 18 Feb 2010 14:23:13 GMT
	(envelope-from nobody)
Message-Id: <201002181423.o1IENDMJ077444@www.freebsd.org>
Date: Thu, 18 Feb 2010 14:23:13 GMT
From: Mikolaj Golub <to.my.trociny@gmail.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: race on unix socket close
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         144061
>Category:       kern
>Synopsis:       [socket] race on unix socket close
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    rwatson
>State:          patched
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Feb 18 14:30:02 UTC 2010
>Closed-Date:    
>Last-Modified:  Tue Jun  1 14:10:01 UTC 2010
>Originator:     Mikolaj Golub
>Release:        8.0-STABLE, 7.2-RELEASE-p6
>Organization:
>Environment:
FreeBSD zhuzha.ua1 8.0-STABLE FreeBSD 8.0-STABLE #8: Thu Feb 18 15:48:46 EET 2010     root@zhuzha.ua1:/usr/obj/usr/src/sys/GENERIC  i386
>Description:
This issue was discussed on freebsd-hackers@ under the subject "unix socket: race on close?":

http://lists.freebsd.org/pipermail/freebsd-hackers/2010-February/030741.html

Below is a simple test program using unix sockets: the client does
connect()/close() in a loop and the server does accept()/close().

--------------------------------------------------------------------------------

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <strings.h>
#include <string.h>
#include <unistd.h>
#include <sys/select.h>
#include <err.h>

#define UNIXSTR_PATH "/tmp/mytest.socket"
#define USLEEP  100

int main(int argc, char **argv)
{
	int			listenfd, connfd, pid;
	struct sockaddr_un	servaddr;
	
	pid = fork();
	if (-1 == pid)
		errx(1, "fork(): %d", errno);

	if (0 != pid) { /* parent */

		if ((listenfd = socket(AF_LOCAL, SOCK_STREAM, 0)) < 0)
			errx(1, "parent: socket error: %d", errno);

		unlink(UNIXSTR_PATH);
		bzero(&servaddr, sizeof(servaddr));
		servaddr.sun_family = AF_LOCAL;
		strcpy(servaddr.sun_path, UNIXSTR_PATH);

		if (bind(listenfd, (struct sockaddr *) &servaddr, sizeof(servaddr)) < 0)
			errx(1, "parent: bind error: %d", errno);

		if (listen(listenfd, 1024) < 0)
			errx(1, "parent: listen error: %d", errno);
		
		for ( ; ; ) {
			if ((connfd = accept(listenfd, (struct sockaddr *) NULL, NULL)) < 0)
				errx(1, "parent: accept error: %d", errno);

			//usleep(USLEEP / 2); // (I) uncomment this or (II) below to avoid the race
			
			if (close(connfd) < 0)
				errx(1, "parent: close error: %d", errno);
		}
		
	} else { /* child */

		sleep(1); /* give the parent some time to create the socket */

		for ( ; ; ) {

			if ((connfd = socket(AF_LOCAL, SOCK_STREAM, 0)) < 0)
				errx(1, "child: socket error: %d", errno);

			bzero(&servaddr, sizeof(servaddr));
			servaddr.sun_family = AF_LOCAL;
			strcpy(servaddr.sun_path, UNIXSTR_PATH);

			if (connect(connfd, (struct sockaddr *) &servaddr, sizeof(servaddr)) < 0)
				errx(1, "child: connect error %d", errno);
			
			// usleep(USLEEP); // (II) uncomment this or (I) above to avoid the race

			if (close(connfd) != 0) 
				errx(1, "child: close error: %d", errno);

			usleep(USLEEP);
		}
	}

	return 0;
}

--------------------------------------------------------------------------------

Sometimes close() fails with a 'Socket is not connected' error (ENOTCONN, errno 57):

a.out: parent: close error: 57

or

a.out: child: close error: 57

It looks like a race in close(). Looking at soclose() in uipc_socket.c:

int
soclose(struct socket *so)
{
        int error = 0;

        KASSERT(!(so->so_state & SS_NOFDREF), ("soclose: SS_NOFDREF on enter"));

        CURVNET_SET(so->so_vnet);
        funsetown(&so->so_sigio);
        if (so->so_state & SS_ISCONNECTED) {
                if ((so->so_state & SS_ISDISCONNECTING) == 0) {
                        error = sodisconnect(so);
                        if (error)
                                goto drop;
                }

so_state is checked without holding a lock, and then sodisconnect() is called,
which disconnects both sockets of the connection. So if close() is called on
both ends simultaneously, both callers can see SS_ISCONNECTED set and both can
call sodisconnect(); the one that loses the race finds its socket already
disconnected and gets ENOTCONN.
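
The check-then-act sequence described above can be replayed in a small
userland model. This is only a sketch: struct model_sock, model_connect(),
model_sodisconnect() and model_racy_close_pair() are hypothetical names made
up for illustration, not kernel code, and the interleaving is forced by hand
rather than produced by real concurrency.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland stand-in for struct socket; not the kernel type. */
struct model_sock {
	bool connected;			/* models SS_ISCONNECTED in so_state */
	struct model_sock *peer;	/* other end of the connection */
};

/* Connect two model endpoints to each other. */
static void
model_connect(struct model_sock *a, struct model_sock *b)
{
	a->connected = b->connected = true;
	a->peer = b;
	b->peer = a;
}

/* Models sodisconnect(): tears down both ends, or fails with ENOTCONN. */
static int
model_sodisconnect(struct model_sock *so)
{
	if (!so->connected)
		return (ENOTCONN);
	so->connected = false;
	if (so->peer != NULL) {
		so->peer->connected = false;
		so->peer->peer = NULL;
		so->peer = NULL;
	}
	return (0);
}

/*
 * Replays the losing interleaving by hand: both closers sample the
 * connected flag before either one disconnects.  Each closer's result
 * is stored in *err_a and *err_b.
 */
static void
model_racy_close_pair(int *err_a, int *err_b)
{
	struct model_sock a, b;
	bool a_saw_connected, b_saw_connected;

	model_connect(&a, &b);

	/* Step 1: unlocked checks, as in soclose(). */
	a_saw_connected = a.connected;
	b_saw_connected = b.connected;

	/* Step 2: both act on the now-stale observation. */
	*err_a = a_saw_connected ? model_sodisconnect(&a) : 0;
	*err_b = b_saw_connected ? model_sodisconnect(&b) : 0;
}
```

Calling model_racy_close_pair(&ea, &eb) leaves ea at 0 and eb at ENOTCONN:
the first closer disconnects both ends, and the second one still calls
model_sodisconnect() because its check was made before that happened.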

I made the following modifications (suggested by Robert Watson) to the code to confirm this:

1) Log the error when sodisconnect() returns an error:

--- uipc_socket.c.orig	2010-02-18 14:25:25.000000000 +0200
+++ uipc_socket.c	2010-02-18 14:55:26.000000000 +0200
@@ -120,6 +120,7 @@ __FBSDID("$FreeBSD: src/sys/kern/uipc_so
 #include <sys/proc.h>
 #include <sys/protosw.h>
 #include <sys/socket.h>
+#include <sys/syslog.h>
 #include <sys/socketvar.h>
 #include <sys/resourcevar.h>
 #include <net/route.h>
@@ -136,6 +137,7 @@ __FBSDID("$FreeBSD: src/sys/kern/uipc_so
 
 #include <vm/uma.h>
 
+
 #ifdef COMPAT_IA32
 #include <sys/mount.h>
 #include <sys/sysent.h>
@@ -657,7 +659,7 @@ soclose(struct socket *so)
 		if ((so->so_state & SS_ISDISCONNECTING) == 0) {
 			error = sodisconnect(so);
 			if (error)
-				goto drop;
+				log(LOG_INFO, "soclose: sodisconnect error: %d\n", error);
 		}
 		if (so->so_options & SO_LINGER) {
 			if ((so->so_state & SS_ISDISCONNECTING) &&


Then on every error exit of the test application, such as

a.out: parent: close error: 57

the message log contains:

Feb 18 15:35:32 zhuzha kernel: soclose: sodisconnect error: 57

2) Log the error when sodisconnect() returns an error, and ignore it if it is ENOTCONN:

--- uipc_socket.c.orig	2010-02-18 14:25:25.000000000 +0200
+++ uipc_socket.c	2010-02-18 15:41:07.000000000 +0200
@@ -120,6 +120,7 @@ __FBSDID("$FreeBSD: src/sys/kern/uipc_so
 #include <sys/proc.h>
 #include <sys/protosw.h>
 #include <sys/socket.h>
+#include <sys/syslog.h>
 #include <sys/socketvar.h>
 #include <sys/resourcevar.h>
 #include <net/route.h>
@@ -136,6 +137,7 @@ __FBSDID("$FreeBSD: src/sys/kern/uipc_so
 
 #include <vm/uma.h>
 
+
 #ifdef COMPAT_IA32
 #include <sys/mount.h>
 #include <sys/sysent.h>
@@ -656,8 +658,11 @@ soclose(struct socket *so)
 	if (so->so_state & SS_ISCONNECTED) {
 		if ((so->so_state & SS_ISDISCONNECTING) == 0) {
 			error = sodisconnect(so);
-			if (error)
-				goto drop;
+			if (error) {
+				log(LOG_INFO, "soclose: sodisconnect error: %d\n", error);
+				if (error == ENOTCONN)
+					error = 0;
+			}
 		}
 		if (so->so_options & SO_LINGER) {
 			if ((so->so_state & SS_ISDISCONNECTING) &&

After this the test application no longer exits, and I see these errors in the message log:

Feb 18 16:02:37 zhuzha kernel: soclose: sodisconnect error: 57
Feb 18 16:03:31 zhuzha kernel: soclose: sodisconnect error: 57
Feb 18 16:05:49 zhuzha last message repeated 4 times
Feb 18 16:15:50 zhuzha last message repeated 13 times
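
Where the kernel cannot be patched, the same idea could be approximated in
the application. The sketch below is an assumption, not something tested in
this report: close_ignoring_enotconn() is a hypothetical helper that treats
ENOTCONN from close() as success. It relies on the behaviour described in
FreeBSD's close(2), where the descriptor is deallocated on any error other
than EBADF, so the call must not be retried.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of any library: close a descriptor and
 * treat ENOTCONN as success, mirroring the kernel change above.
 * Returns 0 on success and -1 on any other failure.
 */
static int
close_ignoring_enotconn(int fd)
{
	if (close(fd) == 0)
		return (0);
	if (errno == ENOTCONN)
		return (0);	/* peer closed first; nothing left to do */
	return (-1);		/* errno still describes the real failure */
}
```

In the test program above, replacing the two close(connfd) calls with
close_ignoring_enotconn(connfd) would hide the spurious error while still
reporting genuine failures such as EBADF.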
>How-To-Repeat:

>Fix:


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->rwatson 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Thu Feb 18 14:45:06 UTC 2010 
Responsible-Changed-Why:  
IIRC rwatson expressed an interest in this one. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=144061 
State-Changed-From-To: open->analyzed 
State-Changed-By: rwatson 
State-Changed-When: Mon Feb 22 16:25:49 UTC 2010 
State-Changed-Why:  
This diagnosis appears accurate, and we can reproduce the problem in the 
Netperf cluster.  Ignoring the return value of sodisconnect() fixes the 
problem, but we should think at a larger level about what the best fix is; 
completely fixing so_state locking would come at a significant complexity 
(and likely error rate) cost, but perhaps there's something a bit more 
substantive we can do here than ignoring this ENOTCONN. 


http://www.freebsd.org/cgi/query-pr.cgi?pr=144061 

From: Matt Reimer <mreimer@vpop.net>
To: bug-followup@FreeBSD.org,
 to.my.trociny@gmail.com
Cc:  
Subject: Re: kern/144061: [socket] race on unix socket close
Date: Fri, 7 May 2010 14:31:06 -0700

 We just hit this same bug, and Mikolaj's patch fixed it for us.
 
 Is the patch suitable to commit, at least temporarily? If not, what
 would be the proper fix?
 
 Matt

From: "Robert N. M. Watson" <rwatson@FreeBSD.org>
To: Matt Reimer <mreimer@vpop.net>
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/144061: [socket] race on unix socket close
Date: Sat, 8 May 2010 20:21:09 +0100

 On 7 May 2010, at 22:40, Matt Reimer wrote:
 
 > The following reply was made to PR kern/144061; it has been noted by
 > GNATS.
 >
 > From: Matt Reimer <mreimer@vpop.net>
 > To: bug-followup@FreeBSD.org,
 > to.my.trociny@gmail.com
 > Cc:
 > Subject: Re: kern/144061: [socket] race on unix socket close
 > Date: Fri, 7 May 2010 14:31:06 -0700
 >
 > We just hit this same bug, and Mikolaj's patch fixed it for us.
 >
 > Is the patch suitable to commit, at least temporarily? If not, what
 > would be the proper fix?
 
 Hi Matt--
 
 Yes, committing a version that simply ignores ENOTCONN there is probably
 a good interim workaround.
 
 Robert

From: Nikolay Denev <ndenev@gmail.com>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/144061: [socket] race on unix socket close
Date: Wed, 26 May 2010 10:38:16 +0300

 Hi,
 
 I used a modified version of the patch:
 
 --- uipc_socket.c	2010-04-07 04:24:41.000000000 +0200
 +++ uipc_socket.c.patched	2010-05-26 09:34:15.000000000 +0200
 @@ -656,6 +656,10 @@
  	if (so->so_state & SS_ISCONNECTED) {
  		if ((so->so_state & SS_ISDISCONNECTING) == 0) {
  			error = sodisconnect(so);
 +			if (error == ENOTCONN) {
 +				printf("soclose : sodisconnect ignoring ENOTCONN\n");
 +				error = 0;
 +			}
  			if (error)
  				goto drop;
  		}
 
 This seems to have fixed a problem with nginx + rubygem-passenger's
 HelperServer crashing.
 
 Can we hope for a temporary fix to be committed before 8.1?
 
 Regards,
 Niki Denev
 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/144061: commit references a PR
Date: Wed, 26 May 2010 10:46:17 +0000 (UTC)

 Author: rwatson
 Date: Wed May 26 10:46:03 2010
 New Revision: 208562
 URL: http://svn.freebsd.org/changeset/base/208562
 
 Log:
   Add unix_close_race, a regression test to catch ENOTCONN being returned
   improperly from one of two instances of close(2) being called
   simultaneously on both ends of a connected UNIX domain socket.  The test
   tool is slightly tweaked to improve failure modes, and while it often does
   trigger the problem, it doesn't do so consistently due to the nature of the
   race.
   
   PR:		kern/144061
   Submitted by:	Mikolaj Golub <to.my.trociny@gmail.com>
   MFC after:	3 days
 
 Added:
   head/tools/regression/sockets/unix_close_race/
   head/tools/regression/sockets/unix_close_race/Makefile   (contents, props changed)
   head/tools/regression/sockets/unix_close_race/unix_close_race.c   (contents, props changed)
 
 Added: head/tools/regression/sockets/unix_close_race/Makefile
 ==============================================================================
 --- /dev/null	00:00:00 1970	(empty, because file is newly added)
 +++ head/tools/regression/sockets/unix_close_race/Makefile	Wed May 26 10:46:03 2010	(r208562)
 @@ -0,0 +1,7 @@
 +# $FreeBSD$
 +
 +PROG=	unix_close_race
 +NO_MAN=
 +WARNS?=	3
 +
 +.include <bsd.prog.mk>
 
 Added: head/tools/regression/sockets/unix_close_race/unix_close_race.c
 ==============================================================================
 --- /dev/null	00:00:00 1970	(empty, because file is newly added)
 +++ head/tools/regression/sockets/unix_close_race/unix_close_race.c	Wed May 26 10:46:03 2010	(r208562)
 @@ -0,0 +1,132 @@
 +/*-
 + * Copyright (c) 2010 Mikolaj Golub
 + * All rights reserved.
 + *
 + * Redistribution and use in source and binary forms, with or without
 + * modification, are permitted provided that the following conditions
 + * are met:
 + * 1. Redistributions of source code must retain the above copyright
 + *    notice, this list of conditions and the following disclaimer.
 + * 2. Redistributions in binary form must reproduce the above copyright
 + *    notice, this list of conditions and the following disclaimer in the
 + *    documentation and/or other materials provided with the distribution.
 + *
 + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
 + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 + * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
 + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 + * SUCH DAMAGE.
 + *
 + * $FreeBSD$
 + */
 +
 +/*
 + * This regression test attempts to trigger a race that occurs when both
 + * endpoints of a connected UNIX domain socket are closed at once.  The two
 + * close paths may run concurrently leading to a call to sodisconnect() on an
 + * already-closed socket in kernel.  Before it was fixed, this might lead to
 + * ENOTCONN being returned improperly from close().
 + *
 + * This race is fairly timing-dependent, so it effectively requires SMP, and
 + * may not even trigger then.
 + */
 +
 +#include <sys/types.h>
 +#include <sys/select.h>
 +#include <sys/socket.h>
 +#include <sys/sysctl.h>
 +#include <sys/un.h>
 +#include <netinet/in.h>
 +#include <arpa/inet.h>
 +#include <errno.h>
 +#include <fcntl.h>
 +#include <signal.h>
 +#include <stdlib.h>
 +#include <stdio.h>
 +#include <strings.h>
 +#include <string.h>
 +#include <unistd.h>
 +#include <err.h>
 +
 +#define	UNIXSTR_PATH	"/tmp/mytest.socket"
 +#define	USLEEP	100
 +#define	LOOPS	100000
 +
 +int
 +main(int argc, char **argv)
 +{
 +	struct sockaddr_un servaddr;
 +	int listenfd, connfd, pid;
 +	u_int counter, ncpus;
 +	size_t len;
 +
 +	len = sizeof(ncpus);
 +	if (sysctlbyname("kern.smp.cpus", &ncpus, &len, NULL, 0) < 0)
 +		err(1, "kern.smp.cpus");
 +	if (len != sizeof(ncpus))
 +		errx(1, "kern.smp.cpus: invalid length");
 +	if (ncpus < 2)
 +		warnx("SMP not present, test may be unable to trigger race");
 +
 +	/*
 +	 * Create a UNIX domain socket that the parent will repeatedly
 +	 * accept() from, and that the child will repeatedly connect() to.
 +	 */
 +	if ((listenfd = socket(AF_LOCAL, SOCK_STREAM, 0)) < 0)
 +		err(1, "parent: socket error");
 +	(void)unlink(UNIXSTR_PATH);
 +	bzero(&servaddr, sizeof(servaddr));
 +	servaddr.sun_family = AF_LOCAL;
 +	strcpy(servaddr.sun_path, UNIXSTR_PATH);
 +	if (bind(listenfd, (struct sockaddr *) &servaddr,
 +	    sizeof(servaddr)) < 0)
 +		err(1, "parent: bind error");
 +	if (listen(listenfd, 1024) < 0)
 +		err(1, "parent: listen error");
 +
 +	pid = fork();
 +	if (pid == -1)
 +		err(1, "fork()");
 +	if (pid != 0) {
 +		/*
 +		 * In the parent, repeatedly connect and disconnect from the
 +		 * socket, attempting to induce the race.
 +		 */
 +		close(listenfd);
 +		sleep(1);
 +		bzero(&servaddr, sizeof(servaddr));
 +		servaddr.sun_family = AF_LOCAL;
 +		strcpy(servaddr.sun_path, UNIXSTR_PATH);
 +		for (counter = 0; counter < LOOPS; counter++) {
 +			if ((connfd = socket(AF_LOCAL, SOCK_STREAM, 0)) < 0)
 +				err(1, "child: socket error");
 +			if (connect(connfd, (struct sockaddr *)&servaddr,
 +			    sizeof(servaddr)) < 0)
 +				err(1, "child: connect error");
 +			if (close(connfd) < 0)
 +				err(1, "child: close error");
 +			usleep(USLEEP);
 +		}
 +		(void)kill(pid, SIGTERM);
 +	} else {
 +		/*
 +		 * In the child, loop accepting and closing.  We may pick up
 +		 * the race here so report errors from close().
 +		 */
 +		for ( ; ; ) {
 +			if ((connfd = accept(listenfd,
 +			    (struct sockaddr *)NULL, NULL)) < 0)
 +				err(1, "parent: accept error");
 +			if (close(connfd) < 0)
 +				err(1, "parent: close error");
 +		}
 +	}
 +	printf("OK\n");
 +	exit(0);
 +}
 _______________________________________________
 svn-src-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/svn-src-all
 To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
 

From: Robert Watson <rwatson@FreeBSD.org>
To: Matt Reimer <mreimer@vpop.net>, Nikolay Denev <ndenev@gmail.com>, 
    Mikolaj Golub <to.my.trociny@gmail.com>
Cc: bug-followup@FreeBSD.org
Subject: kern/144061: commit candidate
Date: Wed, 26 May 2010 12:02:14 +0100 (BST)

 Dear all:
 
 Thanks for your reports of the ENOTCONN socket close() race.  As those of you 
 following the PR have likely by now seen, I've committed a version of 
 Mikolaj's regression test to the 9-CURRENT tree.  I'd now like to commit the 
 attached useful workaround, which is similar to the ones proposed by several 
 of you except that it skips linger-handling in the event of ENOTCONN, which 
 might otherwise sleep indefinitely (linger is rarely used with UNIX domain 
 sockets but could happen).
 
 Could I ask you to give it a spin and make sure it resolves the problem for 
 you, as well as not introducing new problems?  If all goes well, I'll commit 
 this in the next day or so.  We should then be able to merge for 8.1.  Thanks 
 for your patience...
 
 (I'm currently unable to reproduce the problem on two boxes, but was 
 previously, so it seems fairly timing-sensitive).
 
 Robert N M Watson
 Computer Laboratory
 University of Cambridge
 
 Index: uipc_socket.c
 ===================================================================
 --- uipc_socket.c	(revision 208558)
 +++ uipc_socket.c	(working copy)
 @@ -665,8 +665,11 @@
   	if (so->so_state & SS_ISCONNECTED) {
   		if ((so->so_state & SS_ISDISCONNECTING) == 0) {
   			error = sodisconnect(so);
 -			if (error)
 +			if (error) {
 +				if (error == ENOTCONN)
 +					error = 0;
   				goto drop;
 +			}
   		}
   		if (so->so_options & SO_LINGER) {
   			if ((so->so_state & SS_ISDISCONNECTING) &&
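
The difference between this commit candidate and a variant that clears
ENOTCONN and then continues can be seen in a simplified control-flow model.
This is a sketch under stated assumptions: struct close_result and both
model_close_*() functions are hypothetical, and the SO_LINGER block is
reduced to a flag that records whether it was entered.

```c
#include <errno.h>
#include <stdbool.h>

/* Outcome of one modelled soclose() call. */
struct close_result {
	int error;		/* value that close() would report */
	bool entered_linger;	/* whether the SO_LINGER block was entered */
};

/* Variant that clears ENOTCONN, then continues into linger handling. */
static struct close_result
model_close_fallthrough(int disconnect_error, bool linger_set)
{
	struct close_result r = { disconnect_error, false };

	if (r.error == ENOTCONN)
		r.error = 0;
	if (r.error != 0)
		return (r);			/* goto drop */
	if (linger_set)
		r.entered_linger = true;	/* may sleep in the kernel */
	return (r);
}

/* Commit candidate: clear ENOTCONN but still jump to drop. */
static struct close_result
model_close_candidate(int disconnect_error, bool linger_set)
{
	struct close_result r = { disconnect_error, false };

	if (r.error != 0) {
		if (r.error == ENOTCONN)
			r.error = 0;
		return (r);			/* goto drop, linger skipped */
	}
	if (linger_set)
		r.entered_linger = true;
	return (r);
}
```

With disconnect_error set to ENOTCONN and linger_set true, both variants
report error 0, but only model_close_fallthrough() enters the linger block;
model_close_candidate() goes straight to the drop path.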

From: Mikolaj Golub <to.my.trociny@gmail.com>
To: Robert Watson <rwatson@FreeBSD.org>
Cc: Matt Reimer <mreimer@vpop.net>,  Nikolay Denev <ndenev@gmail.com>,  bug-followup@FreeBSD.org
Subject: Re: kern/144061: commit candidate
Date: Wed, 26 May 2010 16:47:29 +0300

 On Wed, 26 May 2010 12:02:14 +0100 (BST) Robert Watson wrote:
 
  RW> Could I ask you to give it a spin and make sure it resolves the
  RW> problem for you, as well as not introducing new problems?  If all goes
  RW> well, I'll commit this in the next day or so.  We should then be able
  RW> to merge for 8.1.  Thanks for your patience...
 
 I can't modify the kernel on production servers (where the problem with "real"
 software was observed), but I applied your patch to my 8-STABLE box and ran
 tests (the test program from this patch and some other tests for our applications
 that use unix sockets) for more than an hour -- no issues were detected.
 
 I will keep running the patched kernel and will let you know immediately if any
 issues that might be related are detected (so no news from me will mean that
 everything is ok).
 
 Thanks,
 
 -- 
 Mikolaj Golub

From: Nikolay Denev <ndenev@gmail.com>
To: Robert Watson <rwatson@FreeBSD.org>
Cc: Matt Reimer <mreimer@vpop.net>,
 Mikolaj Golub <to.my.trociny@gmail.com>,
 bug-followup@FreeBSD.org
Subject: Re: kern/144061: commit candidate
Date: Thu, 27 May 2010 14:55:20 +0300

 On May 26, 2010, at 2:02 PM, Robert Watson wrote:
 
 >
 > Dear all:
 >
 > Thanks for your reports of the ENOTCONN socket close() race.  As those
 > of you following the PR have likely by now seen, I've committed a
 > version of Mikolaj's regression test to the 9-CURRENT tree.  I'd now
 > like to commit the attached useful workaround, which is similar to the
 > ones proposed by several of you except that it skips linger-handling in
 > the event of ENOTCONN, which might otherwise sleep indefinitely (linger
 > is rarely used with UNIX domain sockets but could happen).
 >
 > Could I ask you to give it a spin and make sure it resolves the
 > problem for you, as well as not introducing new problems?  If all goes
 > well, I'll commit this in the next day or so.  We should then be able to
 > merge for 8.1.  Thanks for your patience...
 >
 > (I'm currently unable to reproduce the problem on two boxes, but was
 > previously, so it seems fairly timing-sensitive).
 >
 > Robert N M Watson
 > Computer Laboratory
 > University of Cambridge
 >
 
 
 24 hours after the patch everything seems ok. No crashes or coredumps of
 the nginx passenger HelperServer.
 
 Thanks
 
 --
 Niki Denev
 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/144061: commit references a PR
Date: Thu, 27 May 2010 15:27:40 +0000 (UTC)

 Author: rwatson
 Date: Thu May 27 15:27:31 2010
 New Revision: 208601
 URL: http://svn.freebsd.org/changeset/base/208601
 
 Log:
   When close() is called on a connected socket pair, SS_ISCONNECTED might be
   set but be cleared before the call to sodisconnect().  In this case,
   ENOTCONN is returned: suppress this error rather than returning it to
   userspace so that close() doesn't report an error improperly.
   
   PR:		kern/144061
   Reported by:	Matt Reimer <mreimer at vpop.net>,
   		Nikolay Denev <ndenev at gmail.com>,
   		Mikolaj Golub <to.my.trociny at gmail.com>
   MFC after:	3 days
 
 Modified:
   head/sys/kern/uipc_socket.c
 
 Modified: head/sys/kern/uipc_socket.c
 ==============================================================================
 --- head/sys/kern/uipc_socket.c	Thu May 27 15:17:06 2010	(r208600)
 +++ head/sys/kern/uipc_socket.c	Thu May 27 15:27:31 2010	(r208601)
 @@ -665,8 +665,11 @@ soclose(struct socket *so)
  	if (so->so_state & SS_ISCONNECTED) {
  		if ((so->so_state & SS_ISDISCONNECTING) == 0) {
  			error = sodisconnect(so);
 -			if (error)
 +			if (error) {
 +				if (error == ENOTCONN)
 +					error = 0;
  				goto drop;
 +			}
  		}
  		if (so->so_options & SO_LINGER) {
  			if ((so->so_state & SS_ISDISCONNECTING) &&
 
State-Changed-From-To: analyzed->patched 
State-Changed-By: rwatson 
State-Changed-When: Thu May 27 15:30:58 UTC 2010 
State-Changed-Why:  
Fix committed to head as r208601; will MFC to 8-STABLE within a few days. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=144061 

From: Robert Watson <rwatson@FreeBSD.org>
To: Nikolay Denev <ndenev@gmail.com>
Cc: Matt Reimer <mreimer@vpop.net>, Mikolaj Golub <to.my.trociny@gmail.com>, 
    bug-followup@FreeBSD.org
Subject: Re: kern/144061: commit candidate
Date: Thu, 27 May 2010 16:36:17 +0100 (BST)

 On Thu, 27 May 2010, Nikolay Denev wrote:
 
 > 24 hours after the patch everything seems ok. No crashes or coredumps of the 
 > nginx passenger HelperServer.
 
 Thanks for the testing -- I've now committed this to svn as r208601, and will 
 merge to 8.x within a few days (subject to RE approval).
 
 Robert

From: Philip Murray <pmurray@nevada.net.nz>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/144061: [socket] race on unix socket close
Date: Fri, 28 May 2010 10:31:39 +1200

 Hi Robert,
 
 Thank you for committing this fix, really appreciate it. Seems to solve
 most of my Nginx+Passenger problems.
 
 I am curious however, if this PR/Patch has any effect on this old PR?
 
 http://www.freebsd.org/cgi/query-pr.cgi?pr=79138
 
 It may not be, but it does seem to be a case of ENOTCONN leaking back to
 userspace programs.

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/144061: commit references a PR
Date: Tue,  1 Jun 2010 14:00:14 +0000 (UTC)

 Author: rwatson
 Date: Tue Jun  1 13:59:48 2010
 New Revision: 208692
 URL: http://svn.freebsd.org/changeset/base/208692
 
 Log:
   Merge r208601 from head to stable/8:
   
     When close() is called on a connected socket pair, SS_ISCONNECTED might be
     set but be cleared before the call to sodisconnect().  In this case,
     ENOTCONN is returned: suppress this error rather than returning it to
     userspace so that close() doesn't report an error improperly.
   
     PR:		kern/144061
     Reported by:	Matt Reimer <mreimer at vpop.net>,
   		Nikolay Denev <ndenev at gmail.com>,
   		Mikolaj Golub <to.my.trociny at gmail.com>
   
   Approved by:	re (kib)
 
 Modified:
   stable/8/sys/kern/uipc_socket.c
 Directory Properties:
   stable/8/sys/   (props changed)
   stable/8/sys/amd64/include/xen/   (props changed)
   stable/8/sys/cddl/contrib/opensolaris/   (props changed)
   stable/8/sys/contrib/dev/acpica/   (props changed)
   stable/8/sys/contrib/pf/   (props changed)
   stable/8/sys/dev/xen/xenpci/   (props changed)
   stable/8/sys/geom/sched/   (props changed)
 
 Modified: stable/8/sys/kern/uipc_socket.c
 ==============================================================================
 --- stable/8/sys/kern/uipc_socket.c	Tue Jun  1 13:57:58 2010	(r208691)
 +++ stable/8/sys/kern/uipc_socket.c	Tue Jun  1 13:59:48 2010	(r208692)
 @@ -656,8 +656,11 @@ soclose(struct socket *so)
  	if (so->so_state & SS_ISCONNECTED) {
  		if ((so->so_state & SS_ISDISCONNECTING) == 0) {
  			error = sodisconnect(so);
 -			if (error)
 +			if (error) {
 +				if (error == ENOTCONN)
 +					error = 0;
  				goto drop;
 +			}
  		}
  		if (so->so_options & SO_LINGER) {
  			if ((so->so_state & SS_ISDISCONNECTING) &&
 
>Unformatted:
