From nobody@FreeBSD.org  Sat Jan 10 17:39:07 2009
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C6D141065672
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 10 Jan 2009 17:39:07 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id B3DF08FC2C
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 10 Jan 2009 17:39:07 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.3/8.14.3) with ESMTP id n0AHd7Zw049482
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 10 Jan 2009 17:39:07 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.3/8.14.3/Submit) id n0AHd7t3049481;
	Sat, 10 Jan 2009 17:39:07 GMT
	(envelope-from nobody)
Message-Id: <200901101739.n0AHd7t3049481@www.freebsd.org>
Date: Sat, 10 Jan 2009 17:39:07 GMT
From: Ivan Shcheklein <shcheklein@gmail.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: [socket] accept() prematurely allocates an inheritable descriptor
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         130348
>Category:       kern
>Synopsis:       [socket] accept() prematurely allocates an inheritable descriptor [regression]
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    rwatson
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Jan 10 17:40:01 UTC 2009
>Closed-Date:    Mon Mar 02 11:19:31 UTC 2009
>Last-Modified:  Mon Mar 02 11:19:31 UTC 2009
>Originator:     Ivan Shcheklein
>Release:        FreeBSD 7.1
>Organization:
ISP RAS
>Environment:
FreeBSD freebsd2.localdomain 7.1-RELEASE FreeBSD 7.1-RELEASE #0: Fri Jan  9 23:36:55 MSK 2009     modis@freebsd2.localdomain:/usr/obj/usr/src/sys/GENERIC  i386

>Description:
kern_accept() allocates a file descriptor before it blocks waiting for a connection. That descriptor can be unexpectedly inherited if a different thread of the process calls fork() (and then exec()) while accept() is still blocked.

As a result, the child process may obtain a connected descriptor it knows nothing about. Moreover, the parent process does not expect any other references to this descriptor to exist in the system.

This behaviour seems to have first appeared in revision 1.186 of uipc_syscalls.c:
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/uipc_syscalls.c#rev1.186:

 "Reorganize the optimistic concurrency behavior in accept1() to
  always allocate a file descriptor with falloc() so that if we do
  find a socket, we don't have to encounter the "Oh, there wasn't
  a socket" race that can occur if falloc() sleeps in the current
  code, which broke inbound accept() ordering, not to mention
  requiring backing out socket state changes in a way that raced
  with the protocol level.  We may want to add a lockless read of
  the queue state if polling of empty queues proves to be important
  to optimize."
>How-To-Repeat:
1. Build the following code (cc -Wall server.c -o server):

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <errno.h>
#include <string.h>
#include <netinet/in.h>

int main()
{
	int fd, error = 0;
	struct sockaddr_in in;
	struct hostent *hp;

	if ((fd = socket(AF_INET, SOCK_STREAM, 0)) == -1) {
		perror("socket");
		error = -1;
		goto done;	
	}
	if ((hp = gethostbyname("0.0.0.0")) == NULL) {
		herror("gethostbyname");	/* failure is reported via h_errno */
		error = -2;
		goto done;
	}
		
	memset(&in, 0, sizeof(in));
	in.sin_family = AF_INET;
	in.sin_port = htons(5050);
	memcpy(&in.sin_addr, hp->h_addr, hp->h_length);

	if (bind(fd, (struct sockaddr *)&in, sizeof(in)) < 0) {
		perror("bind");
		error = -3;
		goto done;	
	}
	if (listen(fd, 10) < 0) {
		perror("listen");
		error = -4;
		goto done;	
	}
	if (accept(fd,0,0) < 0) {
		perror("accept");
		error = -5;
	}

done:	close(fd);
	return error;
}

2. Start ./server, then run "lsof | grep server" while it is blocked in accept(). Among the listed file descriptors there should be entries like these:

server    1076  root    3u    IPv4 0xc50ec910        0t0     TCP *:mmcc (LISTEN)
server    1076  root    4                                        0xc584b090 file struct, ty=0, op=0xc0979ec0

The first one (3u) is the listening descriptor we call accept() on. The second one (4) is a file struct allocated by falloc() in kern_accept() before any connection has arrived. It is inheritable, so a child process may obtain a connection it does not expect.
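
Besides lsof, a descriptor slot can also be probed from inside a process. The following is only a minimal sketch, assuming that fcntl(2) with F_GETFD fails with EBADF for a slot that is not open; fd_is_open() is a hypothetical helper name and is not part of this report. Whether it reports the placeholder slot as occupied on 7.1 has not been verified here.

```c
#include <errno.h>
#include <fcntl.h>

/*
 * fd_is_open() is a hypothetical helper, not part of the PR.  Returns 1
 * if fd refers to an occupied descriptor slot in the calling process, 0
 * if the slot is free (EBADF), and -1 on any other error.
 */
static int
fd_is_open(int fd)
{

	if (fcntl(fd, F_GETFD) != -1)
		return (1);
	return (errno == EBADF ? 0 : -1);
}
```

A child created while the server above is blocked in accept() could call fd_is_open(4) to see whether the slot was inherited.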
>Fix:


>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->analyzed 
State-Changed-By: rwatson 
State-Changed-When: Tue Feb 10 09:33:38 UTC 2009 
State-Changed-Why:  
This race condition is inherent to all system calls that allocate 
file descriptors, especially where file descriptor allocation may 
block (such as open with O_EXLOCK, open on a fifo, etc) -- 
threaded user applications must synchronize around such calls to 
deterministically prevent this result.  However, I will review 
both the code and other operating systems to see whether (a) this 
application race can be narrowed without affecting performance/ 
reliability, and (b) other systems provide a fix or workaround 
such that portable applications might expect not to see this 
race. 




Responsible-Changed-From-To: freebsd-bugs->rwatson 
Responsible-Changed-By: rwatson 
Responsible-Changed-When: Tue Feb 10 09:33:38 UTC 2009 
Responsible-Changed-Why:  
Grab ownership. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=130348 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/130348: commit references a PR
Date: Wed, 11 Feb 2009 13:44:41 +0000 (UTC)

 Author: rwatson
 Date: Wed Feb 11 13:44:27 2009
 New Revision: 188483
 URL: http://svn.freebsd.org/changeset/base/188483
 
 Log:
   Add a regression test to determine whether or not a file descriptor is
   allocated in a fork(2)-inheritable way at the beginning or end of an
   accept(2) system call.  This test creates a test thread and blocks it
   in accept(2), then forks a child process which tests to see if the
   next available file descriptor is defined or not (EBADF vs EINVAL for
   ftruncate(2)).
   
   This detects a regression introduced during the network stack locking
   work: a very narrow race, in which fork(2) from one thread during
   accept(2) in a second thread led to an extra inherited file
   descriptor, turned into a very wide race ensuring that a descriptor
   was leaked into the child even though it hadn't been returned.
   
   PR:		kern/130348
 
 Added:
   head/tools/regression/file/newfileops_on_fork/
   head/tools/regression/file/newfileops_on_fork/Makefile   (contents, props changed)
   head/tools/regression/file/newfileops_on_fork/newfileops_on_fork.c   (contents, props changed)
 
 Added: head/tools/regression/file/newfileops_on_fork/Makefile
 ==============================================================================
 --- /dev/null	00:00:00 1970	(empty, because file is newly added)
 +++ head/tools/regression/file/newfileops_on_fork/Makefile	Wed Feb 11 13:44:27 2009	(r188483)
 @@ -0,0 +1,8 @@
 +# $FreeBSD$
 +
 +PROG=	newfileops_on_fork
 +NO_MAN=
 +WARNS?=	6
 +LDFLAGS=	-lpthread
 +
 +.include <bsd.prog.mk>
 
 Added: head/tools/regression/file/newfileops_on_fork/newfileops_on_fork.c
 ==============================================================================
 --- /dev/null	00:00:00 1970	(empty, because file is newly added)
 +++ head/tools/regression/file/newfileops_on_fork/newfileops_on_fork.c	Wed Feb 11 13:44:27 2009	(r188483)
 @@ -0,0 +1,121 @@
 +/*-
 + * Copyright (c) 2009 Robert N. M. Watson
 + * All rights reserved.
 + *
 + * This software was developed at the University of Cambridge Computer
 + * Laboratory with support from a grant from Google, Inc. 
 + *
 + * Redistribution and use in source and binary forms, with or without
 + * modification, are permitted provided that the following conditions
 + * are met:
 + * 1. Redistributions of source code must retain the above copyright
 + *    notice, this list of conditions and the following disclaimer.
 + * 2. Redistributions in binary form must reproduce the above copyright
 + *    notice, this list of conditions and the following disclaimer in the
 + *    documentation and/or other materials provided with the distribution.
 + *
 + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
 + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 + * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
 + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 + * SUCH DAMAGE.
 + *
 + * $FreeBSD$
 + */
 +
 +/*
 + * When a multi-threaded application calls fork(2) from one thread while
 + * another thread is blocked in accept(2), we prefer that the file descriptor
 + * to be returned by accept(2) not appear in the child process.  Test this by
 + * creating a thread blocked in accept(2), then forking a child and seeing if
 + * the fd it would have returned is defined in the child or not.
 + */
 +
 +#include <sys/socket.h>
 +#include <sys/wait.h>
 +
 +#include <netinet/in.h>
 +
 +#include <err.h>
 +#include <errno.h>
 +#include <pthread.h>
 +#include <signal.h>
 +#include <stdlib.h>
 +#include <string.h>
 +#include <unistd.h>
 +
 +#define	PORT	9000
 +
 +static int listen_fd;
 +
 +static void *
 +do_accept(__unused void *arg)
 +{
 +	int accept_fd;
 +
 +	accept_fd = accept(listen_fd, NULL, NULL);
 +	if (accept_fd < 0)
 +		err(-1, "accept");
 +
 +	return (NULL);
 +}
 +
 +static void
 +do_fork(void)
 +{
 +	int pid;
 +
 +	pid = fork();
 +	if (pid < 0)
 +		err(-1, "fork");
 +	if (pid > 0) {
 +		waitpid(pid, NULL, 0);
 +		exit(0);
 +	}
 +
 +	/*
 +	 * We will call ftruncate(2) on the next available file descriptor,
 +	 * listen_fd+1, and get back EBADF if it's not a valid descriptor,
 +	 * and EINVAL if it is.  This (currently) works fine in practice.
 +	 */
 +	if (ftruncate(listen_fd + 1, 0) < 0) {
 +		if (errno == EBADF)
 +			exit(0);
 +		else if (errno == EINVAL)
 +			errx(-1, "file descriptor still open in child");
 +		else
 +			err(-1, "unexpected error");
 +	} else
 +		errx(-1, "ftruncate succeeded");
 +}
 +
 +int
 +main(__unused int argc, __unused char *argv[])
 +{
 +	struct sockaddr_in sin;
 +	pthread_t accept_thread;
 +
 +	listen_fd = socket(PF_INET, SOCK_STREAM, 0);
 +	if (listen_fd < 0)
 +		err(-1, "socket");
 +	bzero(&sin, sizeof(sin));
 +	sin.sin_family = AF_INET;
 +	sin.sin_len = sizeof(sin);
 +	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
 +	sin.sin_port = htons(PORT);
 +	if (bind(listen_fd, (struct sockaddr *)&sin, sizeof(sin)) < 0)
 +		err(-1, "bind");
 +	if (listen(listen_fd, -1) < 0)
 +		err(-1, "listen");
 +	if ((errno = pthread_create(&accept_thread, NULL, do_accept, NULL)) != 0)
 +		err(-1, "pthread_create");
 +	sleep(1);	/* Easier than using a CV. */
 +	do_fork();
 +	exit(0);
 +}
 _______________________________________________
 svn-src-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/svn-src-all
 To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/130348: commit references a PR
Date: Wed, 11 Feb 2009 15:22:10 +0000 (UTC)

 Author: rwatson
 Date: Wed Feb 11 15:22:01 2009
 New Revision: 188485
 URL: http://svn.freebsd.org/changeset/base/188485
 
 Log:
   Modify fdcopy() so that, during fork(2), it won't copy file descriptors
   from the parent to the child process if they have an operation vector
   of &badfileops.  This narrows a set of races involving system calls that
   allocate a new file descriptor, potentially block for some extended
   period, and then return the file descriptor, when invoked by a threaded
   program that concurrently invokes fork(2).  Similar approaches are used
   in both Solaris and Linux, and the wideness of this race was introduced
   in FreeBSD when we moved to a more optimistic implementation of
   accept(2) in order to simplify locking.
   
   A small race necessarily remains because the fork(2) might occur after
   the finit() in accept(2) but before the system call has returned, but
   that appears unavoidable using current APIs.  However, this race is
   vastly narrower.
   
   The fix can be validated using the newfileops_on_fork regression test.
   
   PR:		kern/130348
   Reported by:	Ivan Shcheklein <shcheklein at gmail dot com>
   Reviewed by:	jhb, kib
   MFC after:	1 week
 
 Modified:
   head/sys/kern/kern_descrip.c
 
 Modified: head/sys/kern/kern_descrip.c
 ==============================================================================
 --- head/sys/kern/kern_descrip.c	Wed Feb 11 14:25:09 2009	(r188484)
 +++ head/sys/kern/kern_descrip.c	Wed Feb 11 15:22:01 2009	(r188485)
 @@ -1583,7 +1583,8 @@ fdcopy(struct filedesc *fdp)
  	newfdp->fd_freefile = -1;
  	for (i = 0; i <= fdp->fd_lastfile; ++i) {
  		if (fdisused(fdp, i) &&
 -		    fdp->fd_ofiles[i]->f_type != DTYPE_KQUEUE) {
 +		    fdp->fd_ofiles[i]->f_type != DTYPE_KQUEUE &&
 +		    fdp->fd_ofiles[i]->f_ops != &badfileops) {
  			newfdp->fd_ofiles[i] = fdp->fd_ofiles[i];
  			newfdp->fd_ofileflags[i] = fdp->fd_ofileflags[i];
  			fhold(newfdp->fd_ofiles[i]);
 
State-Changed-From-To: analyzed->patched 
State-Changed-By: rwatson 
State-Changed-When: Wed Feb 11 15:38:28 UTC 2009 
State-Changed-Why:  
Fix committed to 8.x; transition to patched until MFC. 

Hi Ivan: 

Thanks for this bug report; per commentary in the commit and the PR, there 
is an unavoidable race here due to the nature of the API, but I have made 
a change to our fork(2) code so that it is quite a narrow race (consistent 
with that found previously on FreeBSD and on other platforms) rather than 
a wide one.  I will merge this fix to 7.x in a week or so once it has 
settled.  If you're able to apply the patch manually to your local tree 
and confirm it fixes the problem you were seeing, that would be helpful. 

Thanks, 


http://www.freebsd.org/cgi/query-pr.cgi?pr=130348 

From: Ivan Shcheklein <shcheklein@gmail.com>
To: bug-followup@FreeBSD.org, shcheklein@gmail.com
Cc:  
Subject: Re: kern/130348: [socket] accept() prematurely allocates an 
	inheritable descriptor [regression]
Date: Thu, 12 Feb 2009 12:53:02 +0300

 --001636c5b1f4a9ef4b0462b5af69
 Content-Type: text/plain; charset=ISO-8859-1
 Content-Transfer-Encoding: 7bit
 
 On Wed, Feb 11, 2009 at 6:40 PM, <rwatson@freebsd.org> wrote:
 
 > Synopsis: [socket] accept() prematurely allocates an inheritable descriptor
 > [regression]
 >
 > State-Changed-From-To: analyzed->patched
 > State-Changed-By: rwatson
 > State-Changed-When: Wed Feb 11 15:38:28 UTC 2009
 > State-Changed-Why:
 > Fix committed to 8.x; transition to patched until MFC.
 >
 > Hi Ivan:
 >
 > Thanks for this bug report; per commentary in the commit and the PR, there
 > is an unavoidable race here due to the nature of the API, but I have made
 > a change to our fork(2) code so that it is quite a narrow race (consistent
 > with that found previously on FreeBSD and on other platforms) rather than
 > a wide one.  I will merge this fix to 7.x in a week or so once it has
 > settled.  If you're able to apply the patch manually to your local tree
 > and confirm it fixes the problem you were seeing, that would be helpful.
 >
 > Thanks,
 >
 >
 > http://www.freebsd.org/cgi/query-pr.cgi?pr=130348
 >
 
 Works fine on 7.1. Thank you, Robert.
 
 Also, I think we will use select() to avoid this race condition altogether.
 
 --001636c5b1f4a9ef4b0462b5af69--

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/130348: commit references a PR
Date: Wed, 25 Feb 2009 15:04:46 +0000 (UTC)

 Author: rwatson
 Date: Wed Feb 25 15:04:30 2009
 New Revision: 189044
 URL: http://svn.freebsd.org/changeset/base/189044
 
 Log:
   Merge r188485 from head to stable/7:
   
     Modify fdcopy() so that, during fork(2), it won't copy file descriptors
     from the parent to the child process if they have an operation vector
     of &badfileops.  This narrows a set of races involving system calls that
     allocate a new file descriptor, potentially block for some extended
     period, and then return the file descriptor, when invoked by a threaded
     program that concurrently invokes fork(2).  Similar approaches are used
     in both Solaris and Linux, and the wideness of this race was introduced
     in FreeBSD when we moved to a more optimistic implementation of
     accept(2) in order to simplify locking.
   
     A small race necessarily remains because the fork(2) might occur after
     the finit() in accept(2) but before the system call has returned, but
     that appears unavoidable using current APIs.  However, this race is
     vastly narrower.
   
     The fix can be validated using the newfileops_on_fork regression test.
   
     PR:           kern/130348
     Reported by:  Ivan Shcheklein <shcheklein at gmail dot com>
     Reviewed by:  jhb, kib
 
 Modified:
   stable/7/sys/   (props changed)
   stable/7/sys/contrib/pf/   (props changed)
   stable/7/sys/dev/ath/ath_hal/   (props changed)
   stable/7/sys/dev/cxgb/   (props changed)
   stable/7/sys/kern/kern_descrip.c
 
 Modified: stable/7/sys/kern/kern_descrip.c
 ==============================================================================
 --- stable/7/sys/kern/kern_descrip.c	Wed Feb 25 15:01:26 2009	(r189043)
 +++ stable/7/sys/kern/kern_descrip.c	Wed Feb 25 15:04:30 2009	(r189044)
 @@ -1613,7 +1613,8 @@ fdcopy(struct filedesc *fdp)
  	newfdp->fd_freefile = -1;
  	for (i = 0; i <= fdp->fd_lastfile; ++i) {
  		if (fdisused(fdp, i) &&
 -		    fdp->fd_ofiles[i]->f_type != DTYPE_KQUEUE) {
 +		    fdp->fd_ofiles[i]->f_type != DTYPE_KQUEUE &&
 +		    fdp->fd_ofiles[i]->f_ops != &badfileops) {
  			newfdp->fd_ofiles[i] = fdp->fd_ofiles[i];
  			newfdp->fd_ofileflags[i] = fdp->fd_ofileflags[i];
  			fhold(newfdp->fd_ofiles[i]);
 
State-Changed-From-To: patched->closed 
State-Changed-By: rwatson 
State-Changed-When: Mon Mar 2 11:18:43 UTC 2009 
State-Changed-Why:  
The patch has now been merged to 7.x and should appear in 7.2-RELEASE 
later this year.  Thanks for the report! 

http://www.freebsd.org/cgi/query-pr.cgi?pr=130348 
>Unformatted:
