From nobody@FreeBSD.org  Fri Jan 21 05:32:13 2005
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 37D2216A4CE
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 21 Jan 2005 05:32:13 +0000 (GMT)
Received: from www.freebsd.org (www.freebsd.org [216.136.204.117])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 09F4743D31
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 21 Jan 2005 05:32:13 +0000 (GMT)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.13.1/8.13.1) with ESMTP id j0L5WCkI022859
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 21 Jan 2005 05:32:12 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.13.1/8.13.1/Submit) id j0L5WC8r022857;
	Fri, 21 Jan 2005 05:32:12 GMT
	(envelope-from nobody)
Message-Id: <200501210532.j0L5WC8r022857@www.freebsd.org>
Date: Fri, 21 Jan 2005 05:32:12 GMT
From: Daniel Fuller / Greg Ward <defuller@lbl.gov>
To: freebsd-gnats-submit@FreeBSD.org
Subject: Subsequent calls to select() a FIFO after a previous FIFO EOF causes calling process to hang
X-Send-Pr-Version: www-2.3

>Number:         76525
>Category:       kern
>Synopsis:       [fifo] select() hangs on EOF from named pipe (FIFO)
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    jilles
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Jan 21 05:40:15 GMT 2005
>Closed-Date:    Sun Dec 01 22:18:33 UTC 2013
>Last-Modified:  Sun Dec 01 22:18:33 UTC 2013
>Originator:     Daniel Fuller / Greg Ward
>Release:        5.3 Release
>Organization:
Berkeley Lab
>Environment:
FreeBSD render08.lbl.gov 5.3-STABLE FreeBSD 5.3-STABLE #2: Tue Jan 18 15:11:54 PST 2005     root@render08.lbl.gov:/farm/FreeBSD/releases/amd64/RELENG_5/obj/farm/FreeBSD/releases/amd64/RELENG_5/src/sys/SMP  amd64
>Description:
(Taken from an email by Greg Ward gward@lmi.net)

Hi Danny,

Well, it only took me 8 hours, but I found the problem with the latest version of [Free]BSD.  I tracked down several false leads before I got on the right track -- it's only named FIFO's that seem to exhibit this problem.  I depend on them for the -P and -PP option of rtrace, which is needed for memory sharing as I've set it up in Radiance.  I don't think named FIFO's are used very often, which might explain why this has gone undetected (or at least unfixed).

There are two test programs that demonstrate the problem.  The first is called pipe.c, and on OS X, it produces the following (correct) output:

pipe available for read
Read 4 bytes from pipe: 'TEST'
pipe available for read
Read 0 bytes from pipe: ''

Under FreeBSD 5.3, for some reason I get an exception condition on my pipe every time, which is strange but not fatal:

Exception on pipe
pipe available for read
Read 4 bytes from pipe: 'TEST'
Exception on pipe
pipe available for read
Read 0 bytes from pipe: ''

On FreeBSD 4.10, I only get an exception at EOF, which I might expect:

pipe available for read
Read 4 bytes from pipe: 'TEST'
Exception on pipe
pipe available for read
Read 0 bytes from pipe: ''

The real trouble begins with the second FIFO test in fifo.c.  Under OS X, I get the correct output:

FIFO available for read
Read 4 bytes from FIFO: 'TEST'
FIFO available for read
Read 0 bytes from FIFO: ''

Under FreeBSD 4.10, I get exactly the same output -- even the exception condition is gone:

FIFO available for read
Read 4 bytes from FIFO: 'TEST'
FIFO available for read
Read 0 bytes from FIFO: ''

However, under FreeBSD 5.3-STABLE, the poor thing hangs at the EOF, and select(2) never returns:

FIFO available for read
Read 4 bytes from FIFO: 'TEST'
(process hangs in second call to select)

Keep in mind that there should be no difference in the behavior between a named FIFO and a pipe -- the only difference is how they are mechanically connected by the two processes.  Having the select() call hang when an EOF condition exists is not acceptable.

I hope you can forward this to the appropriate FreeBSD gurus.

Thanks,
-Greg

>How-To-Repeat:
source code for test programs described above:

... pipe.c: ...

/*
 * Check pipe behavior
 *
 * Greg Ward <gward@lmi.net>
 * Compare also fifo.c
 */

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <unistd.h>
#include <fcntl.h>

void
look(int fd)
{
	fd_set	readfds, excepfds;

	FD_ZERO(&readfds);
	FD_ZERO(&excepfds);
	FD_SET(fd, &readfds);
	FD_SET(fd, &excepfds);
	if (select(fd+1, &readfds, NULL, &excepfds, NULL) < 0) {
		perror("select");
		exit(1);
	}
	if (FD_ISSET(fd, &excepfds))
		puts("Exception on pipe");
	if (FD_ISSET(fd, &readfds))
		puts("pipe available for read");
}

void
spit(int fd)
{
	char	buf[512];
	/* leave room for the terminating '\0' */
	int	n = read(fd, buf, sizeof(buf) - 1);

	if (n < 0) {
		perror("read");
		exit(1);
	}
	buf[n] = '\0';
	printf("Read %d bytes from pipe: '%s'\n", n, buf);
}


int
main(void)
{
	int	pp[2];

	pipe(pp);
	if (fork() == 0) {
		close(pp[0]);
		write(pp[1], "TEST", 4);
		close(pp[1]);
		_exit(0);
	}
	close(pp[1]);
	look(pp[0]);
	spit(pp[0]);
	look(pp[0]);
	spit(pp[0]);
	return(0);
}


..fifo.c:...


/*
 * Reproduce bug in FreeBSD 5.3-STABLE
 *
 * Greg Ward	<gward@lmi.net>
 * See also pipe.c for comparison.
 */

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <unistd.h>
#include <fcntl.h>

const char	FIFO[] = "/tmp/fifo";

void
look(int fd)
{
	fd_set	readfds, excepfds;

	FD_ZERO(&readfds);
	FD_ZERO(&excepfds);
	FD_SET(fd, &readfds);
	FD_SET(fd, &excepfds);
	if (select(fd+1, &readfds, NULL, &excepfds, NULL) < 0) {
		perror("select");
		exit(1);
	}
	if (FD_ISSET(fd, &excepfds))
		puts("Exception on FIFO");
	if (FD_ISSET(fd, &readfds))
		puts("FIFO available for read");
}

void
spit(int fd)
{
	char	buf[512];
	/* leave room for the terminating '\0' */
	int	n = read(fd, buf, sizeof(buf) - 1);

	if (n < 0) {
		perror("read");
		exit(1);
	}
	buf[n] = '\0';
	printf("Read %d bytes from FIFO: '%s'\n", n, buf);
}

int
main(void)
{
	int	fifo_fd;

	unlink(FIFO);
	mkfifo(FIFO, 0666);
	if (fork() == 0) {
		fifo_fd = open(FIFO, O_WRONLY);
		write(fifo_fd, "TEST", 4);
		close(fifo_fd);
		_exit(0);
	}
	fifo_fd = open(FIFO, O_RDONLY);
	look(fifo_fd);
	spit(fifo_fd);
	look(fifo_fd);
	spit(fifo_fd);
	return(0);
}
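
When the reproduction is run unattended, a timeout on select() makes the
hang observable instead of blocking forever.  The following is a minimal
sketch of such a variant of look(); the name look_timeout() and its
seconds parameter are hypothetical and are not part of the original test
programs.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Hypothetical variant of look() (not in the original report): wait at
 * most `seconds` for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on select() error.
 */
int
look_timeout(int fd, int seconds)
{
	fd_set		readfds;
	struct timeval	tv;
	int		rc;

	FD_ZERO(&readfds);
	FD_SET(fd, &readfds);
	tv.tv_sec = seconds;
	tv.tv_usec = 0;
	rc = select(fd + 1, &readfds, NULL, NULL, &tv);
	if (rc < 0) {
		perror("select");
		return (-1);
	}
	if (rc == 0) {
		puts("select timed out (would have hung)");
		return (0);
	}
	puts("FIFO available for read");
	return (1);
}
```

Replacing the second look(fifo_fd) call in fifo.c with
look_timeout(fifo_fd, 5) would be expected to print the timeout message
on an affected kernel and "FIFO available for read" elsewhere.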

..END...
      

>Fix:
None known for FreeBSD 5.3X.
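A possible application-side workaround, offered only as a sketch: avoid
depending on select() to wake up at EOF by putting the descriptor in
non-blocking mode and retrying read() on a short select() timeout.  This
assumes that a non-blocking read() on a FIFO whose writer has gone away
returns 0 immediately, which the audit trail below says read() is
required to do; it has not been verified on 5.3-STABLE.  The name
read_or_eof() and the interval_ms parameter are hypothetical and are not
part of the original test programs.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Hypothetical helper (not in the original report): read from fd
 * without relying on select() to report EOF.  Returns the byte count,
 * 0 at EOF, or -1 on error.  Leaves fd in non-blocking mode.
 */
ssize_t
read_or_eof(int fd, void *buf, size_t len, long interval_ms)
{
	int	flags = fcntl(fd, F_GETFL, 0);

	if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
		return (-1);
	for (;;) {
		ssize_t		n = read(fd, buf, len);
		fd_set		readfds;
		struct timeval	tv;

		if (n >= 0)
			return (n);	/* data, or 0 for EOF */
		if (errno != EAGAIN && errno != EINTR)
			return (-1);
		/* No data yet: sleep in select() for at most interval_ms. */
		FD_ZERO(&readfds);
		FD_SET(fd, &readfds);
		tv.tv_sec = interval_ms / 1000;
		tv.tv_usec = (interval_ms % 1000) * 1000;
		if (select(fd + 1, &readfds, NULL, NULL, &tv) < 0 &&
		    errno != EINTR)
			return (-1);
	}
}
```

In fifo.c, spit() could call read_or_eof(fd, buf, sizeof(buf) - 1, 100)
in place of read(), and the look() calls could then be dropped.  Note
that a reader that has never had a writer would also see an immediate
0 from this helper.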
>Release-Note:
>Audit-Trail:

From: Bruce Evans <bde@zeta.org.au>
To: Daniel Fuller / Greg Ward <defuller@lbl.gov>
Cc: freebsd-gnats-submit@freebsd.org, freebsd-bugs@freebsd.org
Subject: Re: kern/76525: Subsequent calls to select() a FIFO after a previous
 FIFO EOF causes calling process to hang
Date: Sun, 23 Jan 2005 12:05:18 +1100 (EST)

 On Fri, 21 Jan 2005, Daniel Fuller / Greg Ward wrote:
 
 > Well, it only took me 8 hours, but I found the problem with the latest version of [Free]BSD.  I tracked down several false leads before I got on the right track -- it's only named FIFO's that seem to exhibit this problem.  I depend on them for the -P and -PP option of rtrace, which is needed for memory sharing as I've set it up in Radiance.  I don't think named FIFO's are used very often, which might explain why this has gone undetected (or at least unfixed).
 
 Please limit line lengths to considerably less than 463 characters.
 
 This is essentially the same bug as the one for poll() that was reported
 recently in PR 76144.  The problem is that some users want poll() and
 select() on a FIFO with no writer (and no data) to block in some contexts,
 and some systems implement this.  FreeBSD used to never block, but was
 changed to always block.  I knew that this might break things and tried
 to limit the breakage, but somehow missed that it broke the most important
 case of a writer going away.  It seems that I only limited the breakage
 for read() but made things worse for select() and poll().
 
 Other OS's apparently have more context (and associated races) so that
 they can handle the EOF from a writer going away differently from EOF
 when there has "never" been a writer.  A hangup flag is obviously
 needed to implement POLLHUP for poll(), but select() cannot report
 hangups properly.  The semantics of "never" are unclear.  I think the
 flag should be per-file so that new opens don't see the hangup/EOF
 condition just because another reader has seen a hangup.  The
 implementation only has a per-device hangup flag, so fixing the bug
 involves more than just making the behaviour depend on that flag.
 
 History of related bugs, all from wrong setting and wrong use of the
 per-device hangup flag:
 (1) in rev.1.1 of fifo_vnops.c, the flag was set on the wrong half of
     the socket pair at open() time, so everything starting with read()
     was potentially broken for the context where there has never been
     a writer.  read() was plain broken -- it blocked waiting for a
     writer, but must return 0 immediately in the O_NONBLOCK case.
     select() accidentally blocked, which is apparently what is wanted.
     After fixing this:
 (2) in rev.1.1, the flag was cleared on the first successful read().  This
     broke subsequent reads in the case of no writer (nonblocking reads must
     return 0 immediately, but only did so for the first read).
 (3) the above 2 bugs were fixed in rev.1.40 in December 1997.  FIFOs are
     apparently not often used, since no one seemed to notice the presence
     or absence of these bugs.  I only noticed because some POSIX conformance
     test programs reported the bugs.
 (4) bug (2) was reimplemented in rev.1.56 in November 2001 by undoing part
     of 1.40.
 (5) bug (1) was reimplemented in rev.1.57 in November 2001 by deleting the
     initialization of the hangup flag instead of by initializing the flag
     in the wrong place.  Thus after (4) and (5), read() was broken but
     select() sort of worked, as in rev.1.1, and the new poll() syscall
     sort of worked, like select().  POLLHUP has never been implemented
     for FIFOs or sockets, so poll()'s reporting of the hangup condition
     has never worked (see PR 76144).
 (6) read() was unbroken in rev.1.62 in January 2002 by restoring the fixes
     for (1) and (2), but select() and poll() were broken by ignoring the
     flag for them.  For poll(), it is possible to get the old behaviour (3)
     using a new poll flag to request not ignoring the hangup flag.
 (7) 3 years later, some PRs about (6) were filed.  FIFOs are apparently
     still not often used.
 
 > There are two test programs that demonstrate the problem.  The first is called pipe.c, and on OS X, it produces the following (correct) output:
 >
 > pipe available for read
 > Read 4 bytes from pipe: 'TEST'
 > pipe available for read
 > Read 0 bytes from pipe: ''
 
 I believe this stuff is all implemented correctly for nameless pipes,
 but only in old versions of FreeBSD, apparently including the one that
 OS X is based on.  The case of a reader with no writer doesn't occur
 initially and writers don't come back after they are closed, so things
 are simpler.
 
 > Under FreeBSD 5.3, for some reason I get an exception condition on my pipe every time, which is strange but not fatal:
 >
 > Exception on pipe
 > pipe available for read
 > Read 4 bytes from pipe: 'TEST'
 > Exception on pipe
 > pipe available for read
 > Read 0 bytes from pipe: ''
 >
 > On FreeBSD 4.10, I only get an exception at EOF, which I might expect:
 >
 > pipe available for read
 > Read 4 bytes from pipe: 'TEST'
 > Exception on pipe
 > pipe available for read
 > Read 0 bytes from pipe: ''
 
 I think the exception is for hangup.  select() and poll() use the same low-
 level interface.  This interface gives poll() semantics, and select()
 semantics are derived.  After a hangup, poll() always sets POLLHUP and
 doesn't block.  Also, POLLHUP is actually implemented for nameless pipes,
 unlike for named pipes.  After a hangup on a pipe, select() on an
 exception descriptor cannot block so it must report an exception.
 
 The extra exception in 5.3 is a bug.  I've debugged it before in
 connection with piping input to gdb.  5.3 returns POLLHUP from poll()
 (and exceptions for select()) as soon as the writer is closed, despite
 there being data in the pipe.  This breaks applications like gdb which
 stop reading input when they see POLLHUP.  See PR 53447 for more
 details.  PR 53447 is also primarily about select/poll hangup handling.
 Your test shows that 4.10 is somehow missing this bug.
 
 > The real trouble begins with the second FIFO test in fifo.c.  Under OS X, I get the correct output:
 >
 > FIFO available for read
 > Read 4 bytes from FIFO: 'TEST'
 > FIFO available for read
 > Read 0 bytes from FIFO: ''
 >
 > Under FreeBSD 4.10, I get exactly the same output -- even the exception condition is gone:
 >
 > FIFO available for read
 > Read 4 bytes from FIFO: 'TEST'
 > FIFO available for read
 > Read 0 bytes from FIFO: ''
 
 This is because these systems are based on versions of fifo_vnops.c older
 than rev.1.56 (so only POLLHUP for hangup and arguably blocking for no
 writer && no data && no hangup are broken).
 
 > However, under FreeBSD 5.3-STABLE, the poor thing hangs at the EOF, and select(2) never returns:
 >
 > FIFO available for read
 > Read 4 bytes from FIFO: 'TEST'
 > (process hangs in second call to select)
 
 This is because not just POLLHUP for hangup is broken; not blocking for
 hangup is broken too.
 
 > Keep in mind that there should be no difference in the behavior between a named FIFO and a pipe -- the only difference is how they are mechanically connected by the two processes.  Having the select() call hang when an EOF condition exists is not acceptable.
 
 I agree for select(), but this is nonstandard for the EOF that occurs
 when there is no writer && no data && no hangup, and for poll() the
 hangup condition can be reported separately so there is no need to
 overload the EOF condition.  For poll(), the difficulty is clearing
 the hangup condition: if it is cleared on open() of a reader, then
 open() races with clearing the flag and select() after open() may block
 when we don't want it to after losing a race, but if the hangup condition
 is not cleared until all readers and writers are closed, then select()
 after open() may return immediately when we want it to block.
 
 Bruce

From: "Dorr H. Clark" <dclark@applmath.scu.edu>
To: freebsd-gnats-submit@FreeBSD.org, defuller@lbl.gov
Cc:  
Subject: NAB - Re: kern/76525: select() hangs on EOF from named pipe (FIFO)
Date: Sat, 09 Apr 2005 13:49:10 -0700

 We believe that the behavior of select() on a FIFO
 in the absence of a writer is the desired behavior in FreeBSD 5.3
 and is therefore not a bug.
 
 We propose the following fix to the bug author's source code, fifo.c:
 
 --- fifo_orig.c Sun Mar 13 08:54:14 2005
 +++ fifo.c      Sun Mar 13 09:45:30 2005
 @@ -4,26 +4,21 @@
   #include <sys/time.h>
   #include <unistd.h>
   #include <fcntl.h>
 +#include <sys/poll.h>
 
   const char      FIFO[] = "/tmp/fifo";
 
   void
   look(int fd)
   {
 -        fd_set  readfds, excepfds;
 +        struct pollfd poll_list[1];
 
 -        FD_ZERO(&readfds);
 -        FD_ZERO(&excepfds);
 -        FD_SET(fd, &readfds);
 -        FD_SET(fd, &excepfds);
 -        if (select(fd+1, &readfds, NULL, &excepfds, NULL) < 0) {
 -                perror("select");
 +        poll_list[0].fd = fd;
 +        poll_list[0].events = POLLIN|POLLINIGNEOF;
 +        if (poll(poll_list , 1, -1) < 0) {
 +                perror("poll");
                   exit(1);
           }
 -        if (FD_ISSET(fd, &excepfds))
 -                puts("Exception on FIFO");
 -        if (FD_ISSET(fd, &readfds))
 -                puts("FIFO available for read");
   }
 
 In FreeBSD 4.x, select() on a FIFO always returned when there was no
 writer.  In FreeBSD 5.x, select() on a FIFO was changed to block
 in the absence of a writer.
 
 Historically, some users have desired select() to block waiting for a
 writer to appear.  They want fifo to behave as a data source without
 worrying about connections coming and going, but there are also
 users who want select() to return in the absence of a writer.
 
 This disagreement has played out in CVS, specifically
 /usr/src/sys/fs/fifofs/fifo_vnops.c has experienced
 a series of changes, alternating between blocking
 and non-blocking behavior for select().
 
 The following CVS revisions are relevant:
 In 1.40, select() on fifo with no writer did not block.
 In 1.56 it was restored to blocking.  In revision 1.62,
 the fixes of 1.40 were restored but select() on fifo was
 made to block waiting for a writer to appear.
 
 Also, a new event bitmask, POLLINIGNEOF for poll() has been
 implemented.  If the users want non-blocking behavior
 when there is no writer, they can call poll() instead,
 setting the event bitmask POLLINIGNEOF.  To illustrate,
 we have implemented this change in the bug author's test program.
 
 We also believe that the current 5.x behavior is consistent
 with the POSIX.1 standard as well as the overall intent
 of select(), but we are aware that this interpretation
 is not universally shared.
 
 If the current behavior of select() on a FIFO
 is not desirable, the following patch can be applied to
 fifo_poll() in fifo_vnops.c, which reverses the select()
 behavior for FIFOs by reverting it to non-blocking.  Users
 then need to set the event bitmask POLLINIGNEOF to get blocking
 behavior.
 
 While offering the change, we would like to reiterate
 that we believe the change is inappropriate (inconsistent
 with POSIX.1) and should not be captured into CVS.
 
 --- fifo_vnops_orig.c   Wed Mar 16 16:13:24 2005
 +++ fifo_vnops.c        Wed Mar 16 16:25:12 2005
 @@ -531,7 +531,7 @@
                   * set POLLINIGNEOF to get non-blocking behavior.
                   */
                  if (events & (POLLIN | POLLRDNORM) &&
 -                   !(events & POLLINIGNEOF)) {
 +                   (events & POLLINIGNEOF)) {
                          events &= ~(POLLIN | POLLRDNORM);
                          events |= POLLINIGNEOF;
                  }
 @@ -544,7 +544,7 @@
 
                  /* Reverse the above conversion. */
                  if ((revents & POLLINIGNEOF) &&
 -                   !(ap->a_events & POLLINIGNEOF)) {
 +                   (ap->a_events & POLLINIGNEOF)) {
                          revents |= (ap->a_events & (POLLIN | POLLRDNORM));
                          revents &= ~POLLINIGNEOF;
                  }
 
 
 
 Shikha Shrivastava, engineer
 Dorr H. Clark, advisor
 COEN 284 - Operating Systems Case Study
 Santa Clara University,
 Santa Clara CA.
State-Changed-From-To: open->closed 
State-Changed-By: jilles 
State-Changed-When: Sun Dec 1 22:16:08 UTC 2013 
State-Changed-Why:  
The test program works with supported FreeBSD versions such as 8.4-STABLE. 
It seems to have been fixed as a result of PR kern/94772. 


Responsible-Changed-From-To: freebsd-bugs->jilles 
Responsible-Changed-By: jilles 
Responsible-Changed-When: Sun Dec 1 22:16:08 UTC 2013 
Responsible-Changed-Why:  
Track replies. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=76525 
>Unformatted:
