From olli@lurza.secnetix.de  Tue Mar 21 09:56:57 2006
Return-Path: <olli@lurza.secnetix.de>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 1457416A400
	for <FreeBSD-gnats-submit@freebsd.org>; Tue, 21 Mar 2006 09:56:57 +0000 (UTC)
	(envelope-from olli@lurza.secnetix.de)
Received: from lurza.secnetix.de (lurza.secnetix.de [83.120.8.8])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 77C1943D49
	for <FreeBSD-gnats-submit@freebsd.org>; Tue, 21 Mar 2006 09:56:56 +0000 (GMT)
	(envelope-from olli@lurza.secnetix.de)
Received: from lurza.secnetix.de (ncvkbc@localhost [127.0.0.1])
	by lurza.secnetix.de (8.13.4/8.13.4) with ESMTP id k2L9untZ094655;
	Tue, 21 Mar 2006 10:56:54 +0100 (CET)
	(envelope-from oliver.fromme@secnetix.de)
Received: (from olli@localhost)
	by lurza.secnetix.de (8.13.4/8.13.1/Submit) id k2L9ungW094654;
	Tue, 21 Mar 2006 10:56:49 +0100 (CET)
	(envelope-from olli)
Message-Id: <200603210956.k2L9ungW094654@lurza.secnetix.de>
Date: Tue, 21 Mar 2006 10:56:49 +0100 (CET)
From: Oliver Fromme <olli@secnetix.de>
Reply-To: Oliver Fromme <olli@secnetix.de>
To: FreeBSD-gnats-submit@freebsd.org
Cc: Oliver Fromme <olli@secnetix.de>
Subject: FIFOs (named pipes) + select() == broken
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         94772
>Category:       kern
>Synopsis:       [fifo] [patch] FIFOs (named pipes) + select() == broken
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Mar 21 10:00:29 GMT 2006
>Closed-Date:    Wed Aug 04 14:05:28 UTC 2010
>Last-Modified:  Wed Aug 04 14:05:28 UTC 2010
>Originator:     Oliver Fromme
>Release:        FreeBSD 6.1-PRERELEASE i386
>Organization:
secnetix GmbH & Co. KG
		http://www.secnetix.de/bsd
>Environment:
System: FreeBSD epia.fromme.com 6.1-PRERELEASE FreeBSD 6.1-PRERELEASE #0: Tue Mar 21 10:21:23 CET 2006 olli@epia.fromme.com:/usr/src/sys/i386/compile/EPIA i386

I'm using the latest RELENG_6 from today (March 21, 2006).

>Description:

I recently wondered why several of my scripts that use
a named pipe (FIFO) don't work on FreeBSD.

After some debugging it turned out that select() seems
to be broken when used with FIFOs on FreeBSD 6.
Particularly, this is the bug I'm suffering from:

When a FIFO has been opened for reading and the writing
process closes it, the reading process blocks in select(),
even though the descriptor is ready for read().  If the
select() call is omitted, read() returns 0 immediately
indicating EOF.  But as soon as you use select(), it blocks
and there is no chance to detect the EOF condition.

That's clearly a serious violation of POSIX, SUSv3,
Stevens APUE and all other documentation about select()
and named pipes that I'm aware of.  It needs to be fixed.
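
A minimal sketch of the readiness check in question, assuming a
zero-timeout select() on a single descriptor.  The helper name
fd_readable_now() is hypothetical and not part of the test program
below; it only illustrates what a conforming select() is expected
to report.

```c
#include <sys/select.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of fifotest.c): ask select() with a
 * zero timeout whether fd is readable right now.  Returns 1 if it is
 * reported readable, 0 if not, -1 on error.  At EOF the descriptor is
 * expected to be reported readable, so that the following read() can
 * return 0.
 */
static int
fd_readable_now(int fd)
{
        fd_set rfds;
        struct timeval tv;

        if (fd < 0 || fd >= FD_SETSIZE)
                return (-1);
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = 0;
        tv.tv_usec = 0;
        if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0)
                return (-1);
        return (FD_ISSET(fd, &rfds) ? 1 : 0);
}
```

With the bug described here, such a check would report 0 for a FIFO
whose writer has closed, although read() on the same descriptor
returns 0 immediately.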

>How-To-Repeat:

Please see the small test program below.  Compile it like this:
cc -O -o fifotest fifotest.c

Then create a named pipe, e.g.:  $ mkfifo fifo
And run the test program:  ./fifotest fifo
It will block on the open(), which is to be expected
(correct behaviour so far).

Then open another shell (e.g. second terminal window)
and type:   echo foo > fifo

You will see from the output of the fifotest program
that the open() succeeds, the select() returns 1, and
the read() returns 4 bytes ("foo\n").  But then the
next call to select() blocks, even though there is
an EOF condition!

The same test program (with "err()" replaced by a small
self-made function) runs without error on all other UNIX
systems that I've tried:  Linux 2.4.32, Solaris 10, and
DEC UNIX 4.0 (predecessor of Tru64).  By the way, it's
even sufficient to do "cat /dev/null > fifo", i.e. not
writing anything at all, but issuing EOF immediately.
Under FreeBSD, nothing happens at all in that case.
All other UNIX systems recognize EOF (select() returns).

The source contains a #define WITH_SELECT.  When you
undefine it, select() won't be called, only read().
Then the program runs fine and detects the EOF condition
correctly.

Here's the source code.  In case it is mangled somehow
by send-pr, I've put a copy on this web page:
http://www.secnetix.de/~olli/tmp/fifotest.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/select.h>
#include <fcntl.h>
#include <err.h>

#define WITH_SELECT

int
main (int argc, char *argv[])
{
        int fd_in, result;
        char buffer[4096];

        if (argc != 2)
                errx (1, "Usage:  %s <fifo>", argv[0]);

        fprintf (stderr, "Opening FIFO for reading (might block) ...\n");

        if ((fd_in = open(argv[1], O_RDONLY, 0666)) < 0)
                err (1, "%s", argv[1]);

        fprintf (stderr, "FIFO opened successfully.\n");

        for (;;) {

#ifdef WITH_SELECT
                fd_set fds_r;

                FD_ZERO (&fds_r);
                FD_SET (fd_in, &fds_r);

                fprintf (stderr, "Calling select(read FD %d) ...\n", fd_in);

                if ((result = select(fd_in + 1, &fds_r, NULL, NULL, NULL)) < 0)
                        err (1, "select()");

                fprintf (stderr, "... return value is %d.\n", result);

                if (! FD_ISSET(fd_in, &fds_r))
                        continue;
#endif
                result = read(fd_in, buffer, 4096);
                fprintf (stderr, "read() returned %d bytes.\n", result);
                if (result < 1) {
                        if (result < 0)
                                err (1, "read()");
                        break;
                }
        }

        fprintf (stderr, "Got EOF!\n");
        close (fd_in);
        return 0;
}

/* END OF SOURCE */

>Fix:

None known, unfortunately.  :-(

>Release-Note:
>Audit-Trail:

From: Bruce Evans <bde@zeta.org.au>
To: Oliver Fromme <olli@secnetix.de>
Cc: FreeBSD-gnats-submit@freebsd.org, freebsd-bugs@freebsd.org
Subject: Re: kern/94772: FIFOs (named pipes) + select() == broken
Date: Wed, 22 Mar 2006 12:09:16 +1100 (EST)

 On Tue, 21 Mar 2006, Oliver Fromme wrote:
 
 >> Description:
 >
 > I recently wondered why several of my scripts that use
 > a named pipe (FIFO) don't work on FreeBSD.
 >
 > After some debugging it turned out that select() seems
 > to be broken when used with FIFOs on FreeBSD 6.
 > Particularly, this is the bug I'm suffering from:
 >
 > When a FIFO had been opened for reading and the writing
 > process closes it, the reading process blocks in select(),
 > even though the descriptor is ready for read().  If the
 > select() call is omitted, read() returns 0 immediately
 > indicating EOF.  But as soon as you use select(), it blocks
 > and there is no chance to detect the EOF condition.
 
 See also:
 
 PR 76125 (about the same bug)
 PR 76144 (about a related bug in poll())
 PR 53447 (about another, or the same, related bug in poll())
 PR 34020 (about the inverse of the bug (return on non-EOF) for select()
            and poll())
 
 The fix for PR 34020 inverted (a symptom of) a bug for poll() to
 give a worse bug, and gave the (inverted) bug for select() where there
 was no bug before.  fifo_poll() now instructs lower layers to ignore
 EOF by setting POLLINIGNEOF.  This moves the bug for poll() and isn't
 directly harmful for poll(), but it is directly harmful for select()
 and it makes the real bug for poll() more harmful.  The real bug for
 poll() is that POLLHUP needs to be set to indicate EOF, but it isn't
 actually set for many file types including named pipes.  So we now
 always get no indication of EOF where we should get POLLHUP or
 POLLIN | POLLHUP.  Previously, we always got EOF indicated by POLLIN,
 and we also got EOF indicated by POLLIN in the tricky case where other
 systems don't indicate EOF.
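 
 The flag combinations described above can be summarised in a small
 sketch.  It assumes the convention argued for here (POLLHUP alone for
 pure EOF, POLLIN | POLLHUP for EOF-with-data), which is not what
 FreeBSD returns at present.  The enum and the function name
 classify_revents() are made up for illustration.
 
 ```c
 #include <poll.h>
 
 /* Illustrative names only; nothing here is an existing API. */
 enum rd_state {
         RD_NOTHING,             /* neither data nor hangup reported */
         RD_DATA,                /* data readable, no hangup */
         RD_EOF_WITH_DATA,       /* hangup, but data is still readable */
         RD_PURE_EOF             /* hangup and nothing left to read */
 };
 
 /* Map the revents of one pollfd to the reader's view of the pipe. */
 static enum rd_state
 classify_revents(short revents)
 {
         int in = (revents & POLLIN) != 0;
         int hup = (revents & POLLHUP) != 0;
 
         if (in && hup)
                 return (RD_EOF_WITH_DATA);
         if (hup)
                 return (RD_PURE_EOF);
         if (in)
                 return (RD_DATA);
         return (RD_NOTHING);
 }
 ```
 
 A reader following this convention would stop only on RD_PURE_EOF
 (or on read() returning 0), never on RD_EOF_WITH_DATA.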
 
 The tricky case is for a named pipe that has not had any writers during
 the lifetime of current readers or thereabouts (races seem to be a
 problem).  Such a pipe can never have had a connection on it, so it
 cannot be in the hangup state and it is clear that poll() should not
 return POLLHUP for it.  It is less clear that select() and poll()
 should block waiting for a writer, but that is what other systems do
 for poll() at least.  I fixed FreeBSD long ago to not always block in
 read() on a named pipe when there are no current writers, since
 blocking in read() is just wrong in the O_NONBLOCK case.  This had the
 side effect of making select() never block when there are no current
 writers.  poll() inherited this behaviour from select().  This behaviour
 is wrong since select()/poll() are the only reasonable ways to block
 waiting for a writer, but it doesn't cause many problems.
 
 >> How-To-Repeat:
 >
 > Please see the small test program below.  Compile it like this:
 > cc -O -o fifotest fifotest.c
 > ...
 > The same test program (with "err()" replaced by a small
 > self-made function) runs without error on all other UNIX
 > systems that I've tried:  Linux 2.4.32, Solaris 10, and
 > DEC UNIX 4.0 (predecessor of Tru64).  By the way, it's
 > even sufficient to do "cat /dev/null > fifo", i.e. not
 > writing anything at all, but issuing EOF immediately.
 > Under FreeBSD, nothing happens at all in that case.
 > All other UNIX systems recognize EOF (select() returns).
 
 Here is a program that tests more cases.  I made it give no output
 (for no errors) under Linux-2.6.10.  It also gives no output for
 the nameless pipe case under FreeBSD-4.10 and FreeBSD-oldcurrent
 and for the named pipe case under FreeBSD-4.10, but it fails with
 (only) the error in this PR under FreeBSD-oldcurrent.  Please test
 it on Solaris etc.  Compile it with -DNAMEDPIPE for the named pipe
 case.  In this case, it creates and leaves a fifo "p" in the current
 directory (or doesn't handle the error if "p" exists but is not
 an accessible fifo) but shouldn't have any other side effects.
 
 %%%
 #include <sys/select.h>
 #include <sys/stat.h>
 
 #include <err.h>
 #include <errno.h>
 #include <fcntl.h>
 #include <signal.h>
 #include <stdlib.h>
 #include <unistd.h>
 
 static pid_t cpid;
 static pid_t ppid;
 static volatile sig_atomic_t state;
 
 static void
 catch(int sig)
 {
  	state++;
 }
 
 static void
 child(int fd)
 {
  	fd_set rfds;
  	struct timeval tv;
  	char buf[256];
 
 #ifdef NAMEDPIPE
  	fd = open("p", O_RDONLY | O_NONBLOCK);
  	if (fd < 0)
  		err(1, "open for read");
 #endif
  	kill(ppid, SIGUSR1);
 
  	/* XXX should check that fd fits in rfds. */
 
  	usleep(1);
  	while (state != 1)
  		;
 #ifndef NAMEDPIPE
  	/*
  	 * The connection cannot be reestablished.  Use the code that delays
  	 * the read until after the writer disconnects since that case is
  	 * more interesting.
  	 */
  	state = 4;
  	goto state4;
 #endif
  	FD_ZERO(&rfds);
  	FD_SET(fd, &rfds);
  	tv.tv_sec = 0;
  	tv.tv_usec = 0;
  	if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0)
  		err(1, "select");
  	if (FD_ISSET(fd, &rfds))
  		warnx("state 1: expected clear; got set");
  	kill(ppid, SIGUSR1);
 
  	usleep(1);
  	while (state != 2)
  		;
  	FD_ZERO(&rfds);
  	FD_SET(fd, &rfds);
  	tv.tv_sec = 0;
  	tv.tv_usec = 0;
  	if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0)
  		err(1, "select");
  	if (!FD_ISSET(fd, &rfds))
  		warnx("state 2: expected set; got clear");
  	if (read(fd, buf, sizeof buf) != 1)
  		err(1, "read");
  	FD_ZERO(&rfds);
  	FD_SET(fd, &rfds);
  	tv.tv_sec = 0;
  	tv.tv_usec = 0;
  	if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0)
  		err(1, "select");
  	if (FD_ISSET(fd, &rfds))
  		warnx("state 2a: expected clear; got set");
  	kill(ppid, SIGUSR1);
 
  	usleep(1);
  	while (state != 3)
  		;
  	FD_ZERO(&rfds);
  	FD_SET(fd, &rfds);
  	tv.tv_sec = 0;
  	tv.tv_usec = 0;
  	if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0)
  		err(1, "select");
  	if (!FD_ISSET(fd, &rfds))
  		warnx("state 3: expected set; got clear");
  	kill(ppid, SIGUSR1);
 
  	/*
  	 * Now we expect a new writer, and a new connection too since
  	 * we read all the data.  The only new point is that we didn't
  	 * start quite from scratch since the read fd is not new.  Check
  	 * startup state as above, but don't do the read as above.
  	 */
  	usleep(1);
  	while (state != 4)
  		;
 state4:
  	FD_ZERO(&rfds);
  	FD_SET(fd, &rfds);
  	tv.tv_sec = 0;
  	tv.tv_usec = 0;
  	if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0)
  		err(1, "select");
  	if (FD_ISSET(fd, &rfds))
  		warnx("state 4: expected clear; got set");
  	kill(ppid, SIGUSR1);
 
  	usleep(1);
  	while (state != 5)
  		;
  	FD_ZERO(&rfds);
  	FD_SET(fd, &rfds);
  	tv.tv_sec = 0;
  	tv.tv_usec = 0;
  	if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0)
  		err(1, "select");
  	if (!FD_ISSET(fd, &rfds))
  		warnx("state 5: expected set; got clear");
  	kill(ppid, SIGUSR1);
 
  	usleep(1);
  	while (state != 6)
  		;
  	/*
  	 * Now we have no writer, but should still have data from the old
  	 * writer. Check that we have both a data condition and a hangup
  	 * condition, and that the data can be read in the usual way.
  	 * Since Linux does this, programs must not quit reading when they
  	 * see POLLHUP; they must see POLLHUP without POLLIN (or another
  	 * input condition) before they decide that there is EOF.  gdb-6.1.1
  	 * is an example of a broken program that quits on POLLHUP only --
  	 * see its event-loop.c.
  	 */
  	FD_ZERO(&rfds);
  	FD_SET(fd, &rfds);
  	tv.tv_sec = 0;
  	tv.tv_usec = 0;
  	if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0)
  		err(1, "select");
  	if (!FD_ISSET(fd, &rfds))
  		warnx("state 6: expected set; got clear");
  	if (read(fd, buf, sizeof buf) != 1)
  		err(1, "read");
  	FD_ZERO(&rfds);
  	FD_SET(fd, &rfds);
  	tv.tv_sec = 0;
  	tv.tv_usec = 0;
  	if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0)
  		err(1, "select");
  	if (!FD_ISSET(fd, &rfds))
  		warnx("state 6a: expected set; got clear");
  	close(fd);
  	kill(ppid, SIGUSR1);
  	exit(0);
 }
 
 static void
 parent(int fd)
 {
  	usleep(1);
  	while (state != 1)
  		;
 #ifdef NAMEDPIPE
  	fd = open("p", O_WRONLY | O_NONBLOCK);
  	if (fd < 0)
  		err(1, "open for write");
 #endif
  	kill(cpid, SIGUSR1);
 
  	usleep(1);
  	while (state != 2)
  		;
  	if (write(fd, "", 1) != 1)
  		err(1, "write");
  	kill(cpid, SIGUSR1);
 
  	usleep(1);
  	while (state != 3)
  		;
  	if (close(fd) != 0)
  		err(1, "close for write");
  	kill(cpid, SIGUSR1);
 
  	usleep(1);
  	while (state != 4)
  		;
 #ifndef NAMEDPIPE
  	return;
 #endif
  	fd = open("p", O_WRONLY | O_NONBLOCK);
  	if (fd < 0)
  		err(1, "open for write");
  	kill(cpid, SIGUSR1);
 
  	usleep(1);
  	while (state != 5)
  		;
  	if (write(fd, "", 1) != 1)
  		err(1, "write");
  	kill(cpid, SIGUSR1);
 
  	usleep(1);
  	while (state != 6)
  		;
  	if (close(fd) != 0)
  		err(1, "close for write");
  	kill(cpid, SIGUSR1);
 
  	usleep(1);
  	while (state != 7)
  		;
 }
 
 int
 main(void)
 {
  	int fd[2];
  	int i;
 
 #ifdef NAMEDPIPE
  	if (mkfifo("p", 0666) != 0 && errno != EEXIST)
  		err(1, "mkfifo");
 #endif
  	signal(SIGUSR1, catch);
  	ppid = getpid();
  	for (i = 0; i < 2; i++) {
 #ifndef NAMEDPIPE
  		if (pipe(fd) != 0)
  			err(1, "pipe");
 #else
  		fd[0] = -1;
  		fd[1] = -1;
 #endif
  		state = 0;
  		switch (cpid = fork()) {
  		case -1:
  			err(1, "fork");
  		case 0:
  			(void)close(fd[1]);
  			child(fd[0]);
  			break;
  		default:
  			(void)close(fd[0]);
  			parent(fd[1]);
  			break;
  		}
  	}
  	return (0);
 }
 %%%
 
 The error output from this is:
 
 %%%
 selectp: state 3: expected set; got clear
 selectp: state 6a: expected set; got clear
 selectp: state 3: expected set; got clear
 selectp: state 6a: expected set; got clear
 %%%
 
 These messages are all caused by the same bug.  "state 3" is EOF without
 any data ever having been readable.  "state 6" is EOF with data readable
 (the test for this passed so there is no output for it).  "state 6a" is
 EOF after having read the data available in state 6.  It is good that
 state 6a fails in the same way as state 3 -- at least the bug doesn't
 seem to involve races.  The duplicate messages are caused by iterating
 the test to see if the bug depends on previous activity on the pipe
 (but the program is probably too careful cleaning up for iteration to
 show problems).
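 
 For comparison, here is a rough sketch of the reader loop that
 applications typically build on top of these states: it passes through
 state 6 (EOF with data) and then state 6a (pure EOF).  The helper name
 drain_until_eof() is hypothetical, the descriptor is assumed to be
 blocking, and EINTR is not handled.  With the bug in this PR, the
 blocking poll() would never return once the writer has gone away.
 
 ```c
 #include <poll.h>
 #include <unistd.h>
 
 /*
  * Hypothetical helper: read fd until EOF.  Returns the number of
  * bytes consumed, or -1 on error.  POLLHUP by itself is not taken as
  * EOF; only read() returning 0 is.
  */
 static long
 drain_until_eof(int fd)
 {
         struct pollfd pfd;
         char buf[256];
         long total = 0;
         ssize_t n;
 
         for (;;) {
                 pfd.fd = fd;
                 pfd.events = POLLIN;
                 pfd.revents = 0;
                 if (poll(&pfd, 1, -1) < 0)
                         return (-1);
                 if (pfd.revents & POLLNVAL)
                         return (-1);
                 /* POLLIN, POLLHUP or POLLERR: let read() decide. */
                 n = read(fd, buf, sizeof(buf));
                 if (n < 0)
                         return (-1);
                 if (n == 0)
                         return (total);
                 total += n;
         }
 }
 ```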
 
 A similar test program (not enclosed) shows many more bugs for poll():
 
 - no output under Linux-2.6.10
 
 - FreeBSD-4.10 on nameless pipes:
    % poll: state 6a: expected POLLHUP; got 0x11
    % poll: state 6a: expected POLLHUP; got 0x11
      0x11 is POLLIN | POLLHUP.  Nameless pipes are one of the few file
      types for which POLLHUP is actually implemented (POLLHUP is also
      implemented (not quite right) for ttys but isn't implemented for
      any other important file type).  Linux returns only POLLHUP here.
      This is best, since it allows distinguishing the case of pure EOF
      from the case of EOF with data readable.  However, buggy
      applications like gdb don't actually understand the difference
      between EOF-with-data and pure EOF (see the comment in the test
      program).  Also, select() depends on pipe_poll() returning POLLIN
      to work.
 
 - FreeBSD-oldcurrent on nameless pipes:
    % poll: state 6a: expected POLLHUP; got 0x11
    % poll: state 6a: expected POLLHUP; got 0x11
      No change.  The kernel code for select() and poll() hasn't been either
      fixed or broken for nameless pipes.
 
 - FreeBSD-4.10 on named pipes:
    % pollp: state 3: expected POLLHUP; got 0x1
    % pollp: state 6: expected POLLIN | POLLHUP; got 0x1
    % pollp: state 6a: expected POLLHUP; got 0x1
    % pollp: state 3: expected POLLHUP; got 0x1
    % pollp: state 6: expected POLLIN | POLLHUP; got 0x1
    % pollp: state 6a: expected POLLHUP; got 0x1
      0x1 is POLLIN.  FreeBSD-4.10 never returns POLLHUP for named pipes.
      This and/or returning POLLIN in too many cases causes the problem in
      PRs 34020, 53447 and 76144.
 
 - FreeBSD-oldcurrent on named pipes:
    % pollp: state 6: expected POLLIN | POLLHUP; got 0x1
    % pollp: state 6: expected POLLIN | POLLHUP; got 0x1
      FreeBSD-current still never returns POLLHUP for named pipes.
      However, my test program determines whether POLLHUP should have been
      returned in some cases and reduces the bugs to the above.  It uses
      FreeBSD(>4)'s (my) POLLINIGNEOF to do this.  POLLINIGNEOF was supposed
      to be usable for this to limit the damage caused by the fix for PR34020,
      since I knew that this fix would break EOF handling in some cases.
      However, POLLINIGNEOF doesn't really work.  To use it, the test
      program has to use nonblocking syscalls for everything including
      poll() (it gets nonblocking polls using a timeout of 0).  Even with
      this, EOF can't be detected in state 6 (EOF-with-data).  But the
      bug in state 6 is small (it even compensates for the bug in gdb).
 
      Here is the code to determine POLLHUP:
 
 %%%
 #ifdef POLLINIGNEOF
 /*
   * FreeBSD's POLLINIGNEOF (which causes half of the bugs when the kernel
   * uses it) can be used to fix up the broken cases 3 and 6a if the kernel
   * uses it, i.e., for named pipes but not for pipes.  Note that the sense
   * of POLLINIGNEOF is reversed when passed to the kernel -- it means
   * don't-ignore-EOF in .events and if it is set there then it means
   * not-POLLHUP in .revents.
   */
 int
 mypoll(struct pollfd *fds, nfds_t nfds, int timeout)
 {
  	struct pollfd mypfd;
  	int r;
 
  	r = poll(fds, nfds, timeout);
  	if (nfds != 1 || timeout != 0 || fds[0].revents & POLLIN)
  		return (r);
  	mypfd = fds[0];
  	mypfd.events |= POLLINIGNEOF;
  	r = poll(&mypfd, 1, 0);
  	if (r >= 0) {
  		if (mypfd.revents &= POLLIN) {
  			mypfd.revents &= ~POLLIN;
  			mypfd.revents |= POLLHUP;
  		}
  		fds[0].revents = mypfd.revents;
  	}
  	return (r);
 }
 #define	poll(fds, nfds, timeout)	mypoll((fds), (nfds), (timeout))
 #endif
 %%%
 
      With this userland fixup for the missing POLLHUP, the above shows that
      states 3 and 6a have been fixed for poll() on named pipes.  These are
      precisely the states that have been broken for select() on named
      pipes.  Toggling the setting of POLLIN for these states toggles the
      location of one of the bugs.
 
      Without this userland fixup, the output for FreeBSD-oldcurrent on
      named pipes is:
      % state 3: expected POLLHUP; got 0
      % state 6: expected POLLIN | POLLHUP; got 0x1
      % state 6a: expected POLLHUP; got 0
      % state 3: expected POLLHUP; got 0
      % state 6: expected POLLIN | POLLHUP; got 0x1
      % state 6a: expected POLLHUP; got 0
        The POLLHUP flag is now never set, so states 3 and 6a aren't actually
        fixed; in fact they are more broken than before, just like for select()
        -- now no poll flag is set for these cases, so poll() and select()
        don't even see normal hangups unless they are used with a timeout
        and/or with the negative-logic POLLINIGNEOF as in my test program.
        IIRC, PRs 53447 and 76144 are about this problem.
 
 Quick fix (?): #defining POLLINIGNEOF as 0 in <sys/poll.h> should give the
 FreeBSD-4.10 behaviour.
 
 Fix (?):
 - actually implement returning POLLHUP in sopoll() and other places.  Return
    POLLHUP but not POLLIN for the pure-EOF case.  Return POLLIN* | POLLHUP
    for EOF-with-data.
 - remove POLLINIGNEOF and associated complications in sopoll(), fifo_poll()
    and <sys/poll.h>
 - change selscan() to check for POLLHUP so that POLLIN, POLLIN | POLLHUP
    and POLLHUP all act the same for select()
 - remove POLLHUP from the comment in selscan().  Fix the rest of this
    comment or remove it (most backends are too broken to return poll flags
    if appropriate, and the comment only mentions one of the other poll flags
    that selscan() ignores)
 - remove the corresponding comment in pollscan() since it is wrong and says
    nothing relevant (pollscan() just accepts whatever flags the backends set).
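 
 The selscan() item can be illustrated from userland.  This is only a
 sketch of the proposed mapping, not kernel code: the hypothetical
 helper readable_via_poll() does a zero-timeout poll() and treats
 POLLIN, POLLIN | POLLHUP and POLLHUP alike, which is how select()'s
 read set would behave under the proposal.
 
 ```c
 #include <poll.h>
 
 /*
  * Hypothetical userland helper, not the kernel's selscan().
  * Returns 1 if fd would count as readable for select() under the
  * proposed mapping, 0 if not, -1 on error or invalid descriptor.
  */
 static int
 readable_via_poll(int fd)
 {
         struct pollfd pfd;
 
         pfd.fd = fd;
         pfd.events = POLLIN;
         pfd.revents = 0;
         if (poll(&pfd, 1, 0) < 0)
                 return (-1);
         if (pfd.revents & POLLNVAL)
                 return (-1);
         /* Data, hangup, or both: all mean "read() will not block". */
         return ((pfd.revents & (POLLIN | POLLHUP)) != 0);
 }
 ```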
 
 Bruce

From: Oliver Fromme <olli@lurza.secnetix.de>
To: bde@zeta.org.au (Bruce Evans)
Cc: bug-followup@freebsd.org
Subject: Re: kern/94772: FIFOs (named pipes) + select() == broken
Date: Wed, 22 Mar 2006 15:50:06 +0100 (CET)

 Thank you very much for your detailed explanations!
 
 Bruce Evans wrote:
  > On Tue, 21 Mar 2006, Oliver Fromme wrote:
  > > When a FIFO had been opened for reading and the writing
  > > process closes it, the reading process blocks in select(),
  > > even though the descriptor is ready for read().  If the
  > > select() call is omitted, read() returns 0 immediately
  > > indicating EOF.  But as soon as you use select(), it blocks
  > > and there is no chance to detect the EOF condition.
  > 
  > See also:
  > 
  > PR 76125 (about the same bug)
  > PR 76144 (about a related bug in poll())
  > PR 53447 (about a another or the same related bug in poll())
  > PR 34020 (about the inverse of the bug (return on non-EOF) for select()
  >            and poll())
 
 Thank you for pointing those out.
 
 Before sending my PR, I spent about 15 minutes using
 the PR search facility on www.freebsd.org, with various
 combinations of the words "FIFO", "pipe", "select" and
 "poll", but I didn't get any useful results.  It seems
 that the search CGI is broken.
 
  > Here is a program that tests more cases.  I made it give no output
  > (for no errors) under Linux-2.6.10.  It also gives no output for
  > the nameless pipe case under FreeBSD-4.10 and FreeBSD-oldcurrent
  > and for the named piped case under FreeBSD-4.10, but it fails with
  > (only) the error in this PR under FreeBSD-oldcurrent.  Please test
  > it on Solaris etc.  Compile it with -DNAMEDPIPE for the named pipe
  > case.
 
 It does not produce any output on Solaris 9, NetBSD 3.0,
 DEC UNIX 4.0D and Linux 2.4.32.  (I had to replace signal()
 with sigset() on Solaris, add a few missing #includes and
 write small replacements for err() and warnx().)
 
 (By the way, DEC UNIX 4.0D _does_ have a bug:  If the FIFO
 has O_NONBLOCK set and no writer has opened the FIFO, then
 select() doesn't block.  That's a violation of SUSv3/POSIX.
 However, that's not related to the bug described in this
 PR, and it doesn't seem to be checked by the test program.)
 
  > Quick fix (?): #defining POLLINIGNEOF as 0 in <sys/poll.h> should give the
  > FreeBSD-4.10 behaviour.
 
 In fact, I noticed the comment in fifo_vnops.c that mentions
 POLLINIGNEOF, but I wasn't sure if it's related to the bug.
 
 There's a small problem with that workaround:  When I'm
 distributing software (which is supposed to be portable
 across various UNIX and UNIX-like systems), it's somewhat
 ugly to tell the users that they have to modify a system
 header file before my software will work on FreeBSD.
 
 However, I get your point that a real fix is non-trivial.
 
  > Fix (?):
  > - actually implement returning POLLHUP in sopoll() and other places.  Return
  >    POLLHUP but not POLLIN for the pure-EOF case.  Return POLLIN* | POLLHUP
  >    for EOF-with-data.
  > - remove POLLINIGNEOF and associated complications in sopoll(), fifo_poll()
  >    and <sys/poll.h>
  > - change selscan() to check for POLLHUP so that POLLIN, POLLIN | POLLHUP
  >    and POLLHUP all act the same for select()
  > - remove POLLHUP from the comment in selscan().  Fix the rest of this
  >    comment or remove it (most backends are too broken to return poll flags
  >    if appropriate, and the comment only mentions one of the other poll flags
  >    that selscan() ignores)
  > - remove the corresponding comment in pollscan() since it is wrong and says
  >    nothing relevant (pollscan() just accepts whatever flags the backends set).
 
 I would be happy to help out, but I'm not familiar with
 that part of the kernel code.  I wouldn't even know how
 to start.  Also, I don't have a spare box running Current
 (I assume that such patches would have to go into Current
 first).  Or is the difference of that code between 6-Stable
 and Current very small?
 
 Best regards
    Oliver
 
 -- 
 Oliver Fromme,  secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing
 Dienstleistungen mit Schwerpunkt FreeBSD: http://www.secnetix.de/bsd
 Any opinions expressed in this message may be personal to the author
 and may not necessarily reflect the opinions of secnetix in any way.
 
 "anyone new to programming should be kept as far from C++ as
 possible;  actually showing the stuff should be considered a
 criminal offence" -- Jacek Generowicz

From: Oliver Fromme <olli@lurza.secnetix.de>
To: bde@zeta.org.au (Bruce Evans)
Cc: bug-followup@freebsd.org
Subject: Re: kern/94772: FIFOs (named pipes) + select() == broken
Date: Wed, 22 Mar 2006 22:12:32 +0100 (CET)

 Hi,
 
 Sorry for replying to myself, but there are a few new
 things ...
 
 Oliver Fromme wrote:
  > Bruce Evans wrote:
  > > On Tue, 21 Mar 2006, Oliver Fromme wrote:
  > > > When a FIFO had been opened for reading and the writing
  > > > process closes it, the reading process blocks in select(),
  > > > even though the descriptor is ready for read().  If the
  > > > select() call is omitted, read() returns 0 immediately
  > > > indicating EOF.  But as soon as you use select(), it blocks
  > > > and there is no chance to detect the EOF condition.
  > > 
  > > See also:
  > > 
  > > PR 76125 (about the same bug)
  > > PR 76144 (about a related bug in poll())
  > > PR 53447 (about a another or the same related bug in poll())
  > > PR 34020 (about the inverse of the bug (return on non-EOF) for select()
  > >            and poll())
 
 The first one (76125) seems completely unrelated.  Typo?
 
  > (By the way, DEC UNIX 4.0D _does_ have a bug:  If the FIFO
  > has O_NONBLOCK set and no writer has opened the FIFO, then
  > select() doesn't block.
 
 Actually, it's not a bug.  I've read SUSv3 wrong.  That
 behaviour is perfectly fine.  In fact, SUSv3 (a.k.a.
 POSIX-2001) requires that select() doesn't block in that
 case, and the behaviour of select() and poll() must be
 independent of whether O_NONBLOCK is set or not.
 
  > > Fix (?):
  > > - actually implement returning POLLHUP in sopoll() and other places.  Return
  > >    POLLHUP but not POLLIN for the pure-EOF case.  Return POLLIN* | POLLHUP
  > >    for EOF-with-data.
  > > - remove POLLINIGNEOF and associated complications in sopoll(), fifo_poll()
  > >    and <sys/poll.h>
  > > - change selscan() to check for POLLHUP so that POLLIN, POLLIN | POLLHUP
  > >    and POLLHUP all act the same for select()
  > > - remove POLLHUP from the comment in selscan().  Fix the rest of this
  > >    comment or remove it (most backends are too broken to return poll flags
  > >    if appropriate, and the comment only mentions one of the other poll flags
  > >    that selscan() ignores)
  > > - remove the corresponding comment in pollscan() since it is wrong and says
  > >    nothing relevant (pollscan() just accepts whatever flags the backends set).
  > 
  > I would be happy to help out, but I'm not familiar with
  > that part of the kernel code.
 
 Well, now (a few hours later) I'm a little bit more
 familiar with it.  Patch attached below, and also
 available from this URL:
 http://www.secnetix.de/~olli/tmp/fifodiff.txt
 
 With that patch, my own test program (the one from
 the top of this PR) runs successfully, i.e. EOF is
 recognized correctly in all cases that I have tested
 (with and without select()), and it also behaves in a
 standards-compliant way when O_NONBLOCK is used (see
 above).
 
 Also, with that patch applied, your test program
 runs successfully (without producing any output).
 I've tested everything under RELENG_6 (cvsupped
 today).  I've also had a look at the Current sources,
 and the relevant parts don't seem to be any different,
 so the diff should be applicable to HEAD as well.
 
 The patch addresses the following things:
  - implement POLLHUP in sopoll().
  - remove POLLINIGNEOF.
  - selscan() doesn't need to be patched: it already
    works as expected when fo_poll() returns POLLHUP.
  - I don't think the comment is entirely incorrect,
    but I'm not sure, so I left it alone.
 
 Please give this patch a try and let me know if there
 are any problems (functional, style, whatever).  As
 far as I can tell, the patch fixes the existing bugs
 with FIFOs + select()/poll(), so I would be happy to
 see it committed to Current and RELENG_6.  (Maybe
 even in time for the release of 6.1?)
 
 Best regards
    Oliver
 
 
 --- src/sys/fs/fifofs/fifo_vnops.c.orig	Tue Mar 21 09:42:32 2006
 +++ src/sys/fs/fifofs/fifo_vnops.c	Wed Mar 22 18:49:27 2006
 @@ -661,31 +661,11 @@
  	int levents, revents = 0;
  
  	fip = fp->f_data;
 -	levents = events &
 -	    (POLLIN | POLLINIGNEOF | POLLPRI | POLLRDNORM | POLLRDBAND);
 +	levents = events & (POLLIN | POLLPRI | POLLRDNORM | POLLRDBAND);
  	if ((fp->f_flag & FREAD) && levents) {
 -		/*
 -		 * If POLLIN or POLLRDNORM is requested and POLLINIGNEOF is
 -		 * not, then convert the first two to the last one.  This
 -		 * tells the socket poll function to ignore EOF so that we
 -		 * block if there is no writer (and no data).  Callers can
 -		 * set POLLINIGNEOF to get non-blocking behavior.
 -		 */
 -		if (levents & (POLLIN | POLLRDNORM) &&
 -		    !(levents & POLLINIGNEOF)) {
 -			levents &= ~(POLLIN | POLLRDNORM);
 -			levents |= POLLINIGNEOF;
 -		}
 -
  		filetmp.f_data = fip->fi_readsock;
  		filetmp.f_cred = cred;
  		revents |= soo_poll(&filetmp, levents, cred, td);
 -
 -		/* Reverse the above conversion. */
 -		if ((revents & POLLINIGNEOF) && !(events & POLLINIGNEOF)) {
 -			revents |= (events & (POLLIN | POLLRDNORM));
 -			revents &= ~POLLINIGNEOF;
 -		}
  	}
  	levents = events & (POLLOUT | POLLWRNORM | POLLWRBAND);
  	if ((fp->f_flag & FWRITE) && levents) {
 --- src/sys/kern/uipc_socket.c.orig	Wed Dec 28 19:05:13 2005
 +++ src/sys/kern/uipc_socket.c	Wed Mar 22 18:44:08 2006
 @@ -2036,23 +2036,26 @@
  		if (soreadable(so))
  			revents |= events & (POLLIN | POLLRDNORM);
  
 -	if (events & POLLINIGNEOF)
 -		if (so->so_rcv.sb_cc >= so->so_rcv.sb_lowat ||
 -		    !TAILQ_EMPTY(&so->so_comp) || so->so_error)
 -			revents |= POLLINIGNEOF;
 -
 -	if (events & (POLLOUT | POLLWRNORM))
 -		if (sowriteable(so))
 -			revents |= events & (POLLOUT | POLLWRNORM);
 +	if (events & (POLLOUT | POLLWRNORM) && sowriteable(so))
 +		revents |= events & (POLLOUT | POLLWRNORM);
 +	else {
 +		/*
 +		 * POLLOUT and POLLHUP shall not both be set.
 +		 * Therefore check only for POLLHUP if POLLOUT
 +		 * has not been set.  (Note that POLLHUP need
 +		 * not be in events; it's always checked.)
 +		 */
 +		if (so->so_rcv.sb_state & SBS_CANTRCVMORE &&
 +		    so->so_rcv.sb_cc == 0)
 +			revents |= POLLHUP;
 +	}
  
  	if (events & (POLLPRI | POLLRDBAND))
  		if (so->so_oobmark || (so->so_rcv.sb_state & SBS_RCVATMARK))
  			revents |= events & (POLLPRI | POLLRDBAND);
  
  	if (revents == 0) {
 -		if (events &
 -		    (POLLIN | POLLINIGNEOF | POLLPRI | POLLRDNORM |
 -		     POLLRDBAND)) {
 +		if (events & (POLLIN | POLLPRI | POLLRDNORM | POLLRDBAND)) {
  			selrecord(td, &so->so_rcv.sb_sel);
  			so->so_rcv.sb_flags |= SB_SEL;
  		}
 --- src/sys/sys/poll.h.orig	Wed Jul 10 06:47:25 2002
 +++ src/sys/sys/poll.h	Wed Mar 22 18:41:03 2006
 @@ -66,11 +66,6 @@
  #define	POLLRDBAND	0x0080		/* OOB/Urgent readable data */
  #define	POLLWRBAND	0x0100		/* OOB/Urgent data can be written */
  
 -#if __BSD_VISIBLE
 -/* General FreeBSD extension (currently only supported for sockets): */
 -#define	POLLINIGNEOF	0x2000		/* like POLLIN, except ignore EOF */
 -#endif
 -
  /*
   * These events are set if they occur regardless of whether they were
   * requested.
 
 
 
 -- 
 Oliver Fromme,  secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing
 Dienstleistungen mit Schwerpunkt FreeBSD: http://www.secnetix.de/bsd
 Any opinions expressed in this message may be personal to the author
 and may not necessarily reflect the opinions of secnetix in any way.
 
 "I learned Java 3 years before Python.  It was my language of
 choice.  It took me two weekends with Python before I was more
 productive with it than with Java." -- Anthony Roberts

From: Bruce Evans <bde@zeta.org.au>
To: Oliver Fromme <olli@lurza.secnetix.de>
Cc: bug-followup@freebsd.org
Subject: Re: kern/94772: FIFOs (named pipes) + select() == broken
Date: Thu, 23 Mar 2006 11:47:36 +1100 (EST)

 On Wed, 22 Mar 2006, Oliver Fromme wrote:
 
 > Bruce Evans wrote:
 > > Here is a program that tests more cases.  I made it give no output
 > > ...
 >
 > It does not produce any output on Solaris 9, NetBSD 3.0,
 > DEC UNIX 4.0D and Linux 2.4.32.  (I had to replace signal()
 > with sigset() on Solaris, add a few missing #includes and
 > write small replacements for err() and warnx().)
 
 I thought that the signal() was portable.  Under FreeBSD, <stdlib.h>
 of all things is the only missing include.  I stopped trying to avoid
 using the err() family in test programs when Linux got them 6-8 years
 ago.
 
 > (By the way, DEC UNIX 4.0D _does_ have a bug:  If the FIFO
 > has O_NONBLOCK set and no writer has opened the FIFO, then
 > select() doesn't block.  That's a violation of SUSv3/POSIX.
 > However, that's not related to the bug described in this
 > PR, and it doesn't seem to be checked by the test program.)
 
 Will reply to later mail about this.
 
 > I would be happy to help out, but I'm not familiar with
 > that part of the kernel code.  I wouldn't even know how
 > to start.  Also, I don't have a spare box running Current
 > (I assume that such patches would have to go into Current
 > first).  Or is the difference of that code between 6-Stable
 > and Current very small?
 
 I've been trying for about a month to get someone else to fix this for
 -current since I haven't run most of it for a long time.  The diffs
 relative to -current would be small but still need testing.
 
 Bruce

From: Bruce Evans <bde@zeta.org.au>
To: Oliver Fromme <olli@lurza.secnetix.de>
Cc: bug-followup@freebsd.org
Subject: Re: kern/94772: FIFOs (named pipes) + select() == broken
Date: Thu, 23 Mar 2006 13:13:15 +1100 (EST)

 On Wed, 22 Mar 2006, Oliver Fromme wrote:
 > Oliver Fromme wrote:
 > > Bruce Evans wrote:
 
 > > > See also:
 > > >
 > > > PR 76125 (about the same bug)
 
 > The first one (76125) seems completely unrelated.  Typo?
 
 Yes, it's actually 76525.
 
 > > (By the way, DEC UNIX 4.0D _does_ have a bug:  If the FIFO
 > > has O_NONBLOCK set and no writer has opened the FIFO, then
 > > select() doesn't block.
 >
 > Actually, it's not a bug.  I've read SUSv3 wrong.  That
 > behaviour is perfectly fine.  In fact, SUSv3 (a.k.a.
 > POSIX-2001) requires that select() doesn't block in that
 > case, and the behaviour of select() and poll() must be
 > independent of whether O_NONBLOCK is set or not.
 
 I have tried to find POSIX saying that many times since I think
 it is the correct behaviour, but I couldn't find it for either
 select() or poll() before today.  Now I can find it for [p]select()
 but not for poll().  From POSIX.1-2001-draft7.txt for pselect():
 
 %%%
 31193              A descriptor shall be considered ready for reading when a call to an input function with
 31194              O_NONBLOCK clear would not block, whether or not the function would transfer data
 31195              successfully. (The function might return data, an end-of-file indication, or an error other than
 31196              one indicating that it is blocked, and in each of these cases the descriptor shall be considered
 31197              ready for reading.)
 %%%
 
 Other parts of POSIX make it clear that O_NONBLOCK reads must never block,
 so if O_NONBLOCK is set then pselect() for read must never block either.
 This requires the behaviour of pselect() to be dependent on O_NONBLOCK but not on
 the current or previous presence of a writer.
 
 I still can't find similar words for poll().  The spec for poll() seems to
 be fuzzier in general, and the closest I could find is:
 
 %%%
 27931               POLLIN             Data other than high-priority data may be read without blocking.
 %%%
 
 "Data" doesn't seem to be defined anywhere.  Is null data (at EOF) data?...
 poll() presumably does depend on the previous presence of a writer, since
 POLLHUP only makes sense if there was a previous presence.  But POLLHUP
 seems to be specified even more fuzzily than POLLIN.  Clearly previous
 presences of writers shouldn't count if the previous set of readers, writers
 and data all went away, but this doesn't seem to be specified in detail, and
 what happens with multiple readers and/or writers living across sessions
 either intentionally or due to races or bugs?
 
 I intended to check the behaviour for this in my test programs but don't
 seem to have done it.  I intended to follow Linux's behaviour even if this
 is nonstandard.  Linux used to have some special cases including a gripe
 in a comment about having to have them to match Sun's behaviour, but I
 couldn't find these when I last checked.  Perhaps the difference is
 precisely between select() and poll(), to follow the standard for select()
 and exploit the fuzziness for poll().
 
 > > I would be happy to help out, but I'm not familiar with
 > > that part of the kernel code.
 >
 > Well, now (a few hours later) I'm a little bit more
 > familiar with it.  Patch attached below, and also
 > available from this URL:
 > http://www.secnetix.de/~olli/tmp/fifodiff.txt
 >
 > With that patch, my own test program (the one from
 > the top of this PR) runs successfully, i.e. EOF is
 > recognized correctly in all cases that I have tested
 > (with and without select()), and it also behaves
 > standard-compliant when O_NONBLOCK is used (see
 > above).
 
 I'll add tests for the O_NONBLOCK behaviour before mailing the
 test for poll().
 
 > The patch addresses the following things:
 > - implement POLLHUP in sopoll().
 > - remove POLLINIGNEOF.
 > - selscan() doesn't need to be patched: it already
 >   works as expected when fo_poll() returns POLLHUP.
 > - I don't think the comment is entirely incorrect,
 >   but I'm not sure, so I left it alone.
 
 Ah, selscan() just uses the result of fo_poll() as a boolean to decide
 whether a descriptor is ready (I thought that it checked only for the
 bits that it asked for).  fo_poll() returns revents.  Thus selscan()
 reports a descriptor as ready when one of the output-only bits like
 POLLHUP is set, even if none of the input-output bits are set.  I think
 the comment should say this more directly.
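
 The readiness decision described above can be modelled in user space
 roughly as follows.  This is only a simplified sketch, not the kernel's
 selscan() code; the function names fd_is_ready() and
 fd_is_ready_strict() are made up for illustration.

```c
#include <poll.h>

/*
 * Model of the decision described above: the poll result is used as
 * a boolean, so bits that were never requested (such as POLLHUP)
 * still make the descriptor count as ready.
 */
static int
fd_is_ready(short requested, short revents)
{
	(void)requested;	/* deliberately not consulted */
	return (revents != 0);
}

/* Stricter variant that honours only the bits the caller asked for. */
static int
fd_is_ready_strict(short requested, short revents)
{
	return ((revents & requested) != 0);
}
```

 With requested = POLLIN and revents = POLLHUP, the first variant
 reports the descriptor as ready and the second does not.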
 
 > --- src/sys/fs/fifofs/fifo_vnops.c.orig	Tue Mar 21 09:42:32 2006
 > +++ src/sys/fs/fifofs/fifo_vnops.c	Wed Mar 22 18:49:27 2006
 > @@ -661,31 +661,11 @@
 > ...
 
 Good.
 > --- src/sys/kern/uipc_socket.c.orig	Wed Dec 28 19:05:13 2005
 > +++ src/sys/kern/uipc_socket.c	Wed Mar 22 18:44:08 2006
 > @@ -2036,23 +2036,26 @@
 > 		if (soreadable(so))
 > 			revents |= events & (POLLIN | POLLRDNORM);
 >
 > -	if (events & POLLINIGNEOF)
 > -		if (so->so_rcv.sb_cc >= so->so_rcv.sb_lowat ||
 > -		    !TAILQ_EMPTY(&so->so_comp) || so->so_error)
 > -			revents |= POLLINIGNEOF;
 > -
 
 Good.
 
 > -	if (events & (POLLOUT | POLLWRNORM))
 > -		if (sowriteable(so))
 > -			revents |= events & (POLLOUT | POLLWRNORM);
 > +	if (events & (POLLOUT | POLLWRNORM) && sowriteable(so))
 > +		revents |= events & (POLLOUT | POLLWRNORM);
 > +	else {
 > +		/*
 > +		 * POLLOUT and POLLHUP shall not both be set.
 > +		 * Therefore check only for POLLHUP if POLLOUT
 > +		 * has not been set.  (Note that POLLHUP need
 > +		 * not be in events; it's always checked.)
 > +		 */
 > +		if (so->so_rcv.sb_state & SBS_CANTRCVMORE &&
 > +		    so->so_rcv.sb_cc == 0)
 > +			revents |= POLLHUP;
 > +	}
 
 I think SBS_CANTSENDMORE in so_snd should be checked here.  This flag has
 already been checked to be clear in sowriteable() in most cases.  I
 think the receiver count shouldn't be checked here.  I'm surprised that
 my test succeeds with this -- doesn't it prevent POLLHUP being set in the
 hangup+<old data to read> case?  Old versions of fifo_vnops.c had bugs
 from confusing these flags and/or the sender/receiver.  I hope these
 are all fixed now.
 
 This might be clearer with SBS_CANTSENDMORE checked first.
 SBS_CANTSENDMORE set implies !sowriteable() so the behaviour is the same,
 and I think it is clearer to not even look at the output bits in
 `events' in the hangup case.  I think that none of the other checks in
 sowriteable() is related to hangup, but I don't understand the
 PR_CONNREQUIRED one.
 
 >...
 
 The rest looks good.
 
 This also fixes poll() on sockets.  Sockets are more often used than named
 pipes so the change needs a few weeks of testing before MFC.  Applications
 might be confused by poll() actually setting POLLHUP.  It sets only POLLIN
 for hangup now (this is because SBS_CANTRCVMORE implies soreadable()).
 
 Bruce

From: Bruce Evans <bde@zeta.org.au>
To: Oliver Fromme <olli@lurza.secnetix.de>
Cc: bug-followup@freebsd.org
Subject: Re: kern/94772: FIFOs (named pipes) + select() == broken
Date: Thu, 23 Mar 2006 16:02:54 +1100 (EST)

 On Thu, 23 Mar 2006, Bruce Evans wrote:
 
 > On Wed, 22 Mar 2006, Oliver Fromme wrote:
 >> Oliver Fromme wrote:
 >> > Bruce Evans wrote:
 
 > I intended to check the behaviour for this in my test programs but don't
 > seem to have done it.  I intended to follow Linux's behaviour even if this
 > is nonstandard.  Linux used to have some special cases including a gripe
 > in a comment about having to have them to match Sun's behaviour, but I
 > couldn't find these when I last checked.  Perhaps the difference is
 > precisely between select() and poll(), to follow the standard for select()
 > and exploit the fuzziness for poll().
 
 I added the check.  Linux-2.6.10 in fact acts as guessed above.  So the
 check for select() is for the behaviour specified by POSIX (select() on
 a read descriptor that is in nonblocking mode and is for a fifo that has
 never had a writer returns success), while the check for poll() is
 for exactly the opposite behaviour (poll() blocks instead of returning
 with POLLIN set; the test actually uses a nonblocking poll() and only
 checks that POLLIN is not set, since a test that poll() blocks would
 be messier and I think I understand at least the FreeBSD implementation
 well enough to know that this test is equivalent).
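
 The probing technique can be sketched as a small helper.  The name
 poll_would_return() is made up for this illustration, and the sketch
 rests on the same assumption as the test, namely that a zero-timeout
 poll() with no bits set in revents corresponds to a poll() that would
 block.

```c
#include <poll.h>

/*
 * Return 1 if a blocking poll() for `events' on fd would return at
 * once, 0 if it would block, and -1 on error.
 */
static int
poll_would_return(int fd, short events)
{
	struct pollfd pfd;
	int r;

	pfd.fd = fd;
	pfd.events = events;
	pfd.revents = 0;
	r = poll(&pfd, 1, 0);	/* timeout 0: never sleeps */
	if (r < 0)
		return (-1);
	return (pfd.revents != 0);
}
```

 On an empty pipe whose write end is still open this returns 0 for
 POLLIN, and 1 once a byte has been written.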
 
 > I'll add tests for the O_NONBLOCK behaviour before mailing the
 > test for poll().
 
 First a small change to add it to the select() test:
 
 %%%
 --- select.c~	Sun Feb 12 23:42:30 2006
 +++ select.c	Thu Mar 23 13:47:23 2006
 @@ -30,7 +30,19 @@
   		err(1, "open for read");
   #endif
 -	kill(ppid, SIGUSR1);
 +	if (fd >= FD_SETSIZE)
 +		errx(1, "fd = %d too large for select()", fd);
 +
 +#ifdef NAMEDPIPE
 +	FD_ZERO(&rfds);
 +	FD_SET(fd, &rfds);
 +	tv.tv_sec = 0;
 +	tv.tv_usec = 0;
 +	if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0)
 +		err(1, "select");
 +	if (!FD_ISSET(fd, &rfds))
 +		warnx("state 0: expected set; got clear");
 +#endif
 
 -	/* XXX should check that fd fits in rfds. */
 +	kill(ppid, SIGUSR1);
 
   	usleep(1);
 %%%
 
 poll() test:
 
 %%%
 #include <sys/poll.h>
 #include <sys/stat.h>
 
 #include <err.h>
 #include <errno.h>
 #include <fcntl.h>
 #include <signal.h>
 #include <stdlib.h>
 #include <unistd.h>
 
 static pid_t cpid;
 static pid_t ppid;
 static volatile sig_atomic_t state;
 
 static void
 catch(int sig)
 {
  	state++;
 }
 
 #ifdef USE_POLLINIGNEOF
 /*
   * FreeBSD's POLLINIGNEOF (which causes half of the bugs when the kernel
   * uses it) can be used to fix up the broken cases 3 and 6a if the kernel
   * uses it, i.e., for named pipes but not for pipes.  Note that the sense
   * of POLLINIGNEOF is reversed when passed to the kernel -- it means
   * don't-ignore-EOF in .events and if it is set there then it means
   * not-POLLHUP in .revents.
   *
   * This leaves the following broken cases:
   * state 6 (hangup but data available) for poll on a named pipe:
   *         should have POLLIN | POLLHUP, but have POLLIN only.  In this
   *         case, we don't try POLLINIGNEOF since resulting pair of revents
   *         cannot be distinguished from the pair for a case in which POLLIN
   *         only is correct.
   * state 6a (hangup and no data available) for poll on a plain pipe:
   *         should have POLLHUP only, but have POLLIN | POLLHUP.  This is
   *         what I thought is correct, but it is not what Linux-2.6.10 does
   *         for named pipes.  FreeBSD's select() currently depends on POLLIN
   *         being set in this case, and Linux's select() acts the same as
   *         FreeBSD's select() in this case.
   * states 3 and 6a (hangup and no data available) for select on a named pipe:
   *         should have FD_SET() set as in old-FreeBSD and Linux-2.6.10, but
   *         have FD_SET() clear.  The POLLINIGNEOF changes just broke select()
   *         here.  So what was the PR (34020?) which inspired these changes
   *         about?  poll() only?  This regression test uses nonblocking mode
   *         for all polls and a timeout of 0 for all selects so that the
   *         kernel state can be seen without blocking for long.  I hope that
   *         the select() blocks iff the resulting .revents indicates that it
   *         should block (it shouldn't block if it would set POLLIN).
   */
 int
 mypoll(struct pollfd *fds, nfds_t nfds, int timeout)
 {
  	struct pollfd mypfd;
  	int r;
 
  	r = poll(fds, nfds, timeout);
  	if (nfds != 1 || timeout != 0 || fds[0].revents & POLLIN)
  		return (r);
  	mypfd = fds[0];
  	mypfd.events |= POLLINIGNEOF;
  	r = poll(&mypfd, 1, 0);
  	if (r >= 0) {
  		if (mypfd.revents &= POLLIN) {
  			mypfd.revents &= ~POLLIN;
  			mypfd.revents |= POLLHUP;
  		}
  		fds[0].revents = mypfd.revents;
  	}
  	return (r);
 }
 #define	poll(fds, nfds, timeout)	mypoll((fds), (nfds), (timeout))
 #endif
 
 static void
 child(int fd)
 {
  	struct pollfd pfd;
  	char buf[256];
 
 #ifdef NAMEDPIPE
  	pfd.fd = open("p", O_RDONLY | O_NONBLOCK);
  	if (pfd.fd < 0)
  		err(1, "open for read");
 #else
  	pfd.fd = fd;
 #endif
  	pfd.events = POLLIN;
 
 #ifdef NAMEDPIPE
  	if (poll(&pfd, 1, 0) < 0)
  		err(1, "poll");
  	if (pfd.revents != 0)
  		warnx("state 0: expected 0; got %#x", pfd.revents);
 #endif
 
  	kill(ppid, SIGUSR1);
 
  	usleep(1);
  	while (state != 1)
  		;
 #ifndef NAMEDPIPE
  	/*
  	 * The connection cannot be reestablished.  Use the code that delays
  	 * the read until after the writer disconnects since that case is
  	 * more interesting.
  	 */
  	state = 4;
  	goto state4;
 #endif
  	if (poll(&pfd, 1, 0) < 0)
  		err(1, "poll");
  	if (pfd.revents != 0)
  		warnx("state 1: expected 0; got %#x", pfd.revents);
  	kill(ppid, SIGUSR1);
 
  	usleep(1);
  	while (state != 2)
  		;
  	if (poll(&pfd, 1, 0) < 0)
  		err(1, "poll");
  	if (pfd.revents != POLLIN)
  		warnx("state 2: expected POLLIN; got %#x", pfd.revents);
  	if (read(pfd.fd, buf, sizeof buf) != 1)
  		err(1, "read");
  	if (poll(&pfd, 1, 0) < 0)
  		err(1, "poll");
  	if (pfd.revents != 0)
  		warnx("state 2a: expected 0; got %#x", pfd.revents);
  	kill(ppid, SIGUSR1);
 
  	usleep(1);
  	while (state != 3)
  		;
  	if (poll(&pfd, 1, 0) < 0)
  		err(1, "poll");
  	if (pfd.revents != POLLHUP)
  		warnx("state 3: expected POLLHUP; got %#x",
  		    pfd.revents);
  	kill(ppid, SIGUSR1);
 
  	/*
  	 * Now we expect a new writer, and a new connection too since
  	 * we read all the data.  The only new point is that we didn't
  	 * start quite from scratch since the read fd is not new.  Check
  	 * startup state as above, but don't do the read as above.
  	 */
  	usleep(1);
  	while (state != 4)
  		;
 state4:
  	if (poll(&pfd, 1, 0) < 0)
  		err(1, "poll");
  	if (pfd.revents != 0)
  		warnx("state 4: expected 0; got %#x", pfd.revents);
  	kill(ppid, SIGUSR1);
 
  	usleep(1);
  	while (state != 5)
  		;
  	if (poll(&pfd, 1, 0) < 0)
  		err(1, "poll");
  	if (pfd.revents != POLLIN)
  		warnx("state 5: expected POLLIN; got %#x", pfd.revents);
  	kill(ppid, SIGUSR1);
 
  	usleep(1);
  	while (state != 6)
  		;
  	/*
  	 * Now we have no writer, but should still have data from the old
  	 * writer. Check that we have both a data condition and a hangup
  	 * condition, and that the data can be read in the usual way.
  	 * Since Linux does this, programs must not quit reading when they
  	 * see POLLHUP; they must see POLLHUP without POLLIN (or another
  	 * input condition) before they decide that there is EOF.  gdb-6.1.1
  	 * is an example of a broken program that quits on POLLHUP only --
  	 * see its event-loop.c.
  	 */
  	if (poll(&pfd, 1, 0) < 0)
  		err(1, "poll");
  	if (pfd.revents != (POLLIN | POLLHUP))
  		warnx("state 6: expected POLLIN | POLLHUP; got %#x",
  		    pfd.revents);
  	if (read(pfd.fd, buf, sizeof buf) != 1)
  		err(1, "read");
  	if (poll(&pfd, 1, 0) < 0)
  		err(1, "poll");
  	if (pfd.revents != POLLHUP)
  		warnx("state 6a: expected POLLHUP; got %#x",
  		    pfd.revents);
  	close(pfd.fd);
  	kill(ppid, SIGUSR1);
  	exit(0);
 }
 
 static void
 parent(int fd)
 {
  	usleep(1);
  	while (state != 1)
  		;
 #ifdef NAMEDPIPE
  	fd = open("p", O_WRONLY | O_NONBLOCK);
  	if (fd < 0)
  		err(1, "open for write");
 #endif
  	kill(cpid, SIGUSR1);
 
  	usleep(1);
  	while (state != 2)
  		;
  	if (write(fd, "", 1) != 1)
  		err(1, "write");
  	kill(cpid, SIGUSR1);
 
  	usleep(1);
  	while (state != 3)
  		;
  	if (close(fd) != 0)
  		err(1, "close for write");
  	kill(cpid, SIGUSR1);
 
  	usleep(1);
  	while (state != 4)
  		;
 #ifndef NAMEDPIPE
  	return;
 #endif
  	fd = open("p", O_WRONLY | O_NONBLOCK);
  	if (fd < 0)
  		err(1, "open for write");
  	kill(cpid, SIGUSR1);
 
  	usleep(1);
  	while (state != 5)
  		;
  	if (write(fd, "", 1) != 1)
  		err(1, "write");
  	kill(cpid, SIGUSR1);
 
  	usleep(1);
  	while (state != 6)
  		;
  	if (close(fd) != 0)
  		err(1, "close for write");
  	kill(cpid, SIGUSR1);
 
  	usleep(1);
  	while (state != 7)
  		;
 }
 
 int
 main(void)
 {
  	int fd[2];
  	int i;
 
 #ifdef NAMEDPIPE
  	if (mkfifo("p", 0666) != 0 && errno != EEXIST)
  		err(1, "mkfifo");
 #endif
  	signal(SIGUSR1, catch);
  	ppid = getpid();
  	for (i = 0; i < 2; i++) {
 #ifndef NAMEDPIPE
  		if (pipe(fd) != 0)
  			err(1, "pipe");
 #else
  		fd[0] = -1;
  		fd[1] = -1;
 #endif
  		state = 0;
  		switch (cpid = fork()) {
  		case -1:
  			err(1, "fork");
  		case 0:
  			(void)close(fd[1]);
  			child(fd[0]);
  			break;
  		default:
  			(void)close(fd[0]);
  			parent(fd[1]);
  			break;
  		}
  	}
  	return (0);
 }
 %%%
 
 The error output of these is null under Linux-2.6.10, but under
 FreeBSD-5.oldcurrent it is:
 
 poll() on a nameless pipe:
 % poll: state 6a: expected POLLHUP; got 0x11
 % poll: state 6a: expected POLLHUP; got 0x11
 
 No change for this.  For poll(), Linux consistently doesn't set POLLIN when
 there is only null data, so we check for this.
 
 poll() on a named pipe:
 % pollp: state 3: expected POLLHUP; got 0
 % pollp: state 6: expected POLLIN | POLLHUP; got 0x1
 % pollp: state 6a: expected POLLHUP; got 0
 % pollp: state 3: expected POLLHUP; got 0
 % pollp: state 6: expected POLLIN | POLLHUP; got 0x1
 % pollp: state 6a: expected POLLHUP; got 0
 
 No change for this, except I didn't compile with POLLINIGNEOF used so
 states 3 and 6a don't get fixed up.
 
 select() on a nameless pipe:
 <no output>
 
 No change for this.  Here it doesn't matter if hangup is indicated by
 POLLHUP or POLLIN | POLLHUP -- selscan() converts both to data-ready
 although it's null data.
 
 select() on a named pipe:
 % selectp: state 0: expected set; got clear
 % selectp: state 3: expected set; got clear
 % selectp: state 6a: expected set; got clear
 % selectp: state 0: expected set; got clear
 % selectp: state 3: expected set; got clear
 % selectp: state 6a: expected set; got clear
 
 Now there is an extra failure for state 0.  Some complications will be
 required to fix this without breaking poll() on named pipe.  State 0 is
 when the read descriptor is open with O_NONBLOCK and there has "never"
 been a writer.  In this state, select() on the read descriptor must
 succeed to conform to POSIX, but poll() on the read descriptor must
 block to conform to Linux.  I think the Linux behaviour is what happens
 naturally -- the socket isn't hung up so sopoll() won't set POLLHUP,
 and there is no input so sopoll() won't set POLLIN, so sopoll() won't
 set any flags in revents and poll() will block.  An extra flag seems to
 be necessary to distinguish this state so that select() doesn't block.
 POLLINIGNEOF was supposed to be this flag.
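
 For applications, the comment at state 6 in the poll() test (do not
 quit reading on POLLHUP while an input condition is also reported)
 could translate into a read loop like the one below.  This is a
 minimal sketch; drain_until_eof() is a made-up name, and EINTR and
 nonblocking descriptors are not handled.

```c
#include <sys/types.h>
#include <poll.h>
#include <unistd.h>

/*
 * Read from fd until EOF.  POLLHUP is treated as EOF only when
 * POLLIN is not reported with it; a read() returning 0 is also EOF.
 * Returns the number of bytes read, or -1 on error.
 */
static long
drain_until_eof(int fd)
{
	struct pollfd pfd;
	char buf[256];
	ssize_t n;
	long total = 0;

	pfd.fd = fd;
	pfd.events = POLLIN;
	for (;;) {
		pfd.revents = 0;
		if (poll(&pfd, 1, -1) < 0)
			return (-1);
		if (pfd.revents & (POLLERR | POLLNVAL))
			return (-1);
		if ((pfd.revents & POLLHUP) && !(pfd.revents & POLLIN))
			return (total);	/* hangup and nothing left */
		n = read(fd, buf, sizeof(buf));
		if (n < 0)
			return (-1);
		if (n == 0)
			return (total);	/* EOF seen through POLLIN */
		total += n;
	}
}
```

 The loop copes with both ways of reporting the final hangup discussed
 above, POLLHUP alone or POLLIN | POLLHUP followed by a zero-length
 read.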
 
 Bruce

From: Oliver Fromme <olli@lurza.secnetix.de>
To: bde@zeta.org.au (Bruce Evans)
Cc: bug-followup@freebsd.org
Subject: Re: kern/94772: FIFOs (named pipes) + select() == broken
Date: Thu, 23 Mar 2006 11:08:50 +0100 (CET)

 Hi,
 
 I'm answering several emails at once,
 to make things simpler.
 
 Bruce Evans wrote:
  > Oliver Fromme wrote:
  > > Bruce Evans wrote:
  > > > Here is a program that tests more cases.  I made it give no output
  > > > ...
  > >
  > > It does not produce any output on Solaris 9, NetBSD 3.0,
  > > DEC UNIX 4.0D and Linux 2.4.32.  (I had to replace signal()
  > > with sigset() on Solaris, add a few missing #includes and
  > > write small replacements for err() and warnx().)
  > 
  > I thought that the signal() was portable.
 
 Unfortunately, it's not.  SysV (e.g. Solaris) has different
 semantics:  When the signal handler is executed, the signal's
 disposition is set to SIG_DFL.  That means that the handler
 is only executed once, unless you call signal() again.  The
 solution is to use sigset() which behaves more like BSD's
 signal().  On the other hand, FreeBSD doesn't know sigset()
 at all.
 
 On Linux, the situation is even more complex:  When using
 libc4 or libc5, signal() has SysV semantics, and when using
 glibc2, it has BSD semantics.  However, when using glibc2
 with -D_XOPEN_SOURCE=500, it's again SysV, and in this
 latter case sigset() is defined in the header file (not
 in the other cases).
 
 Bottom line:  For portable programs, neither signal() nor
 sigset() should be used.  Instead, sigaction() should be
 used, which behaves the same on BSD and SysV, and should
 be supported everywhere.
 
  > Under FreeBSD, <stdlib.h>
  > of all things is the only missing include.
 
 FreeBSD generally seems to require fewer includes than the
 standard says.  I had to add <sys/types.h>, <stdlib.h>,
 <string.h> and <stdio.h> (although the latter two probably
 only because of my err() and warnx() replacements).
 
  > I stopped trying to avoid
  > using the err() family in test programs when Linux got them 6-8 years
  > ago.
 
 Yes, but Solaris and DEC UNIX (and probably other commercial
 UNIX systems) don't have them.  Fortunately, it was easy
 to write replacements in this case, because they were only
 called with single constant strings.
 
  > > > (By the way, DEC UNIX 4.0D _does_ have a bug:  If the FIFO
  > > > has O_NONBLOCK set and no writer has opened the FIFO, then
  > > > select() doesn't block.
  > >
  > > Actually, it's not a bug.  I've read SUSv3 wrong.  That
  > > behaviour is perfectly fine.  In fact, SUSv3 (a.k.a.
  > > POSIX-2001) requires that select() doesn't block in that
  > > case, and the behaviour of select() and poll() must be
  > > independent of whether O_NONBLOCK is set or not.
  > 
  > I have tried to find POSIX saying that many times since I think
  > it is the correct behaviour, but I couldn't find it for either
  > select() or poll() before today.  Now I can find it for [p]select()
  > but not for poll().  From POSIX.1-2001-draft7.txt for pselect():
  > 
  > %%%
  > 31193              A descriptor shall be considered ready for reading when a call to an input function with
  > 31194              O_NONBLOCK clear would not block, whether or not the function would transfer data
  > 31195              successfully. (The function might return data, an end-of-file indication, or an error other than
  > 31196              one indicating that it is blocked, and in each of these cases the descriptor shall be considered
  > 31197              ready for reading.)
  > %%%
 
 I've got SUSv3 a.k.a. IEEE Std 1003.1-2001 ("POSIX").  You
 can download it from The Open Group's website (you have to
 register with them, but it's free).  However, I don't know
 how much it differs from the draft that you have.
 
 The above paragraph from the select() spec seems to be the
 same.
 
  > Other parts of POSIX make it clear that O_NONBLOCK reads must never block,
 
 That's right, but it does not matter for select()/poll().
 
  > so if O_NONBLOCK is set then pselect() for read must never block either.
 
 No, I think that's not right.  The standard clearly says
 that select() should always behave as if O_NONBLOCK was not
 set:  "A descriptor shall be considered ready for reading
 when a call to an input function with O_NONBLOCK clear
 would not block".
 
 For poll there is a similar statement which is even clearer:
 "The poll() function shall not be affected by the O_NONBLOCK
 flag."
 
 Therefore:  select() and poll() are not dependent on the
 O_NONBLOCK flag.  They should always behave as if it was
 not set.
 
 Furthermore, the standard says a few things about the
 read() function when used on (nameless) pipes or FIFOs:
 
 [quote begin]
    When attempting to read from an empty pipe or FIFO:                        
 
     * If no process has the pipe open for writing, read()
       shall return 0 to indicate end-of-file.                                               
 
     * If some process has the pipe open for writing and
       O_NONBLOCK is set, read() shall return -1 and set
       errno to [EAGAIN].                      
 
     * If some process has the pipe open for writing and
       O_NONBLOCK is clear, read() shall block the calling
       thread until some data is written or the pipe is
       closed by all processes that had the pipe open for
       writing.                                                      
 [quote end]
 
 That clearly means that select() should _not_ block when
 no process has the FIFO open for writing.  (Because the
 select() behaviour depends on the behaviour of read() as
 if the O_NONBLOCK flag is clear.)
 
 Furthermore, it also means that it does _not_ matter if
 there was a writer previously or not.
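
 The first two read() cases quoted above can be observed with a small
 helper like the following.  It is a sketch under the assumption that
 the FIFO at `path' is empty; read_empty_fifo() and its -2 return
 value for setup failures are conventions invented for this example.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Read from an empty FIFO through a nonblocking descriptor.  If
 * with_writer is nonzero, a write descriptor is held open during
 * the read.  Returns what read() returned; when that is -1, *errp
 * holds errno.  Returns -2 if the setup fails.
 */
static ssize_t
read_empty_fifo(const char *path, int with_writer, int *errp)
{
	char buf[16];
	ssize_t n;
	int rfd, wfd = -1;

	*errp = 0;
	if (mkfifo(path, 0600) != 0 && errno != EEXIST)
		return (-2);
	rfd = open(path, O_RDONLY | O_NONBLOCK);
	if (rfd < 0)
		return (-2);
	if (with_writer) {
		wfd = open(path, O_WRONLY | O_NONBLOCK);
		if (wfd < 0) {
			close(rfd);
			return (-2);
		}
	}
	n = read(rfd, buf, sizeof(buf));
	if (n < 0)
		*errp = errno;
	if (wfd >= 0)
		close(wfd);
	close(rfd);
	return (n);
}
```

 Without a writer the read is expected to return 0 (end-of-file); with
 a writer and O_NONBLOCK set it is expected to return -1 with errno set
 to EAGAIN.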
 
  > > -	if (events & (POLLOUT | POLLWRNORM))
  > > -		if (sowriteable(so))
  > > -			revents |= events & (POLLOUT | POLLWRNORM);
  > > +	if (events & (POLLOUT | POLLWRNORM) && sowriteable(so))
  > > +		revents |= events & (POLLOUT | POLLWRNORM);
  > > +	else {
  > > +		/*
  > > +		 * POLLOUT and POLLHUP shall not both be set.
  > > +		 * Therefore check only for POLLHUP if POLLOUT
  > > +		 * has not been set.  (Note that POLLHUP need
  > > +		 * not be in events; it's always checked.)
  > > +		 */
  > > +		if (so->so_rcv.sb_state & SBS_CANTRCVMORE &&
  > > +		    so->so_rcv.sb_cc == 0)
  > > +			revents |= POLLHUP;
  > > +	}
  > 
  > I think SBS_CANTSENDMORE in so_snd should be checked here.
 
 Agreed.
 
  > I think the receiver count shouldn't be checked here.
 
 Agreed.  That would handle the case correctly where both
 POLLIN and POLLHUP can be set at the same time.
 
  > I'm surprised that
  > my test succeeds with this -- doesn't it prevent POLLHUP being set in the
  > hangup+<old data to read> case?
 
 Yes, I think it prevents that (i.e. POLLHUP would act more
 like a "POLLEOF").  That's not correct behaviour, of course.
 I'll fix that.
 
  > This might be clearer with SBS_CANTSENDMORE checked first.
  > SBS_CANTSENDMORE set implies !sowriteable() so the behaviour is the same,
  > and I think it is clearer to not even look at the output bits in
  > `events' in the hangup case.
 
 So you mean in the SBS_CANTSENDMORE case, POLLHUP should be
 set without checking if the caller has requested POLLOUT in
 the events mask?  That sounds reasonable, because POLLOUT
 certainly can't be returned in that case.  It makes the
 code more complex, though.
 
 I'll have a look at that and try to implement it that way.
 
  > This also fixes poll() on sockets.  Sockets are more often used than named
  > pipes so the change needs a few weeks of testing before MFC.
 
 I see.
 
 Bruce Evans wrote:
  > Bruce Evans wrote:
  > > I intended to check the behaviour for this in my test programs but don't
  > > seem to have done it.  I intended to follow Linux's behaviour even if this
  > > is nonstandard.  Linux used to have some special cases including a gripe
  > > in a comment about having to have them to match Sun's behaviour, but I
  > > couldn't find these when I last checked.  Perhaps the difference is
  > > precisely between select() and poll(), to follow the standard for select()
  > > and exploit the fuzziness for poll().
  > 
  > I added the check.
 
 I'll try that later today.  (At least I hope to have enough
 time for it.)
 
  > select() on a named pipe:
  > % selectp: state 0: expected set; got clear
  > [...]
  > Now there is an extra failure for state 0.  Some complications will be
  > required to fix this without breaking poll() on named pipe.  State 0 is
  > when the read descriptor is open with O_NONBLOCK and there has "never"
  > been a writer.  In this state, select() on the read descriptor must
  > succeed to conform to POSIX, but poll() on the read descriptor must
  > block to conform to Linux.  I think the Linux behaviour is what happens
  > naturally -- the socket isn't hung up so sopoll() won't set POLLHUP,
 
 Now that might be debatable.  SUSv3 says that POLLHUP means
 that the device is disconnected.  That doesn't sound like
 it should make a difference if there was a previous writer
 or not.  In fact, when I open a FIFO which doesn't have a 
 writer currently, there's no way to know if there was a
 writer previously (before I opened the FIFO) who "hung it
 up".
 
 Personally I think that Linux is in error.  POLLHUP should
 be set when "the device is disconnected" (SUSv3), i.e. when
 there is no writer, period.
 
 However, I see your point that it might be more beneficial
 to be Linux-compliant instead of standard-compliant.
 
  > and there is no input so sopoll() won't set POLLIN, so sopoll() won't
  > set any flags in revents and poll() will block.  An extra flag seems to
  > be necessary to distinguish this state so that select() doesn't block.
 
 Yes, if we want to be Linux-compliant.  That'll make the
 code a lot more complicated.  *sigh*
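
The "state 0" case above can be probed from userland.  What follows
is a minimal sketch, not taken from the test programs discussed in
this PR: it creates a FIFO, opens it with O_RDONLY | O_NONBLOCK while
no writer exists, and asks poll() or select() with a zero timeout what
they report.  The function names and the /tmp path are assumptions
made for illustration, and the result depends on the kernel under
test, so no particular output is implied.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <sys/select.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Open a fresh FIFO read-only and non-blocking, with no writer ever
 * attached ("state 0").  Returns the descriptor, or -1 on error.
 * The path is an arbitrary choice for this sketch.
 */
static int
open_state0_fifo(void)
{
	char path[64];
	int fd;

	snprintf(path, sizeof(path), "/tmp/fifoprobe.%ld", (long)getpid());
	if (mkfifo(path, 0600) == -1)
		return (-1);
	fd = open(path, O_RDONLY | O_NONBLOCK);
	/* The name is no longer needed once the descriptor is open. */
	unlink(path);
	return (fd);
}

/* Returns the revents of a zero-timeout poll(), or -1 on error. */
int
probe_state0_poll(void)
{
	struct pollfd pfd;
	int fd, rc;

	if ((fd = open_state0_fifo()) == -1)
		return (-1);
	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;
	rc = poll(&pfd, 1, 0);
	close(fd);
	return (rc == -1 ? -1 : (int)pfd.revents);
}

/* Returns 1 if select() marks the fd readable, 0 if not, -1 on error. */
int
probe_state0_select(void)
{
	struct timeval tv = { 0, 0 };
	fd_set rfds;
	int fd, rc, ready;

	if ((fd = open_state0_fifo()) == -1)
		return (-1);
	FD_ZERO(&rfds);
	FD_SET(fd, &rfds);
	rc = select(fd + 1, &rfds, NULL, NULL, &tv);
	ready = (rc == -1) ? -1 : (FD_ISSET(fd, &rfds) ? 1 : 0);
	close(fd);
	return (ready);
}
```

Printing both return values on each system of interest gives a quick
comparison of how that system treats a reader that has never had a
writer.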
 
 Best regards
    Oliver
 
 -- 
 Oliver Fromme,  secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing
 Dienstleistungen mit Schwerpunkt FreeBSD: http://www.secnetix.de/bsd
 Any opinions expressed in this message may be personal to the author
 and may not necessarily reflect the opinions of secnetix in any way.
 
 "The ITU has offered the IETF formal alignment with its
 corresponding technology, Penguins, but that won't fly."
         -- RFC 2549

From: Oliver Fromme <olli@lurza.secnetix.de>
To: bde@zeta.org.au (Bruce Evans)
Cc: bug-followup@freebsd.org
Subject: Re: kern/94772: FIFOs (named pipes) + select() == broken
Date: Thu, 23 Mar 2006 17:25:45 +0100 (CET)

 Hi,
 
 I have to correct myself slightly, and I have a few more
 insights ...
 
 Oliver Fromme wrote:
  > Bruce Evans wrote:
  > > select() on a named pipe:
  > > % selectp: state 0: expected set; got clear
  > > [...]
  > > Now there is an extra failure for state 0.  Some complications will be
  > > required to fix this without breaking poll() on named pipe.  State 0 is
  > > when the read descriptor is open with O_NONBLOCK and there has "never"
  > > been a writer.  In this state, select() on the read descriptor must
  > > succeed to conform to POSIX, but poll() on the read descriptor must
  > > block to conform to Linux.  I think the Linux behaviour is what happens
  > > naturally -- the socket isn't hung up so sopoll() won't set POLLHUP,
  > 
  > Now that might be debatable.  SUSv3 says that POLLHUP means
  > that the device is disconnected.  That doesn't sound like
  > it should make a difference if there was a previous writer
  > or not.
 
 SUSv3 says about POLLHUP:  "The device has been disconnected".
 I suppose that "has been disconnected" is different from "is
 disconnected".  I'm sorry, English is not my native language,
 so I didn't notice that slight difference when I read that
 page first.
 
 Thinking about it again, the Linux implementation seems to be
 reasonable, and it's probably conformant with the standard
 (even though the standard is somewhat fuzzy).
 
 So I agree with you that FreeBSD should behave the same as
 Linux in that regard.
 
  > Yes, if we want to be Linux-compliant.  That'll make the
  > code a lot more complicated.  *sigh*
 
 Just to make it clear:  This case happens only if the FIFO
 is opened with O_RDONLY | O_NONBLOCK and there is currently
 no process who has opened the FIFO for writing.
 
 Because "hung up" and "EOF" are different things, a new
 flag is required for the case when no previous writer
 exists (which means "EOF", but not "hung up").  select()
 is only interested in the EOF case, while poll() is only
 interested in the "hung up" case.
 
 I propose a new SBS_* flag for the so_rcv.sb_state mask.
 Let's call it SBS_EOFNOHUP for now (I'm sure someone can
 come up with a better name).  It will be set in fifo_open()
 in the case O_RDONLY | O_NONBLOCK and no writers.  It will
 be cleared in fifo_open() when someone opens the FIFO for
 writing.  In fifo_poll_f(), POLLHUP will be replaced by
 POLLIGNEOF in the result of soo_poll() if SBS_EOFNOHUP is
 set.
 
 selscan() does not need to be changed.  It will handle
 POLLIGNEOF just like POLLHUP, so select() won't block.
 
 pollscan() needs a slight change in order to remove
 POLLIGNEOF from the result of the fo_poll() call.
 I think POLLIGNEOF should not be exposed to userland.
 Its sole purpose is to communicate the abovementioned
 case from fifo_poll_f() to selscan(), and only those
 two functions should use that flag.
 
 That should fix both select() and poll(), if I didn't
 miss anything.
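
The proposal can be sketched as a small userland model.  This is
only an illustration of the flag logic described above, not kernel
code: the M_* bit values, struct fifo_model and all function names
are made up for the sketch, and locking, reader counts and data in
the buffer are left out.

```c
/* Model-only bit values; they are not taken from the kernel headers. */
#define M_POLLHUP	0x0010
#define M_POLLIGNEOF	0x2000	/* internal marker, hidden from userland */
#define M_CANTRCVMORE	0x0020
#define M_EOFNOHUP	0x0080

struct fifo_model {
	int sb_state;	/* stands in for so_rcv.sb_state */
	int writers;
};

/* A new FIFO has no writer, so the read side starts out at EOF. */
static void
model_init(struct fifo_model *f)
{
	f->sb_state = M_CANTRCVMORE;
	f->writers = 0;
}

/* Reader open: note "EOF but never hung up" for O_NONBLOCK opens. */
static void
model_open_reader(struct fifo_model *f, int nonblock)
{
	if (f->writers == 0 && nonblock)
		f->sb_state |= M_EOFNOHUP;
}

/* The first writer clears both the EOF state and the marker. */
static void
model_open_writer(struct fifo_model *f)
{
	if (++f->writers == 1)
		f->sb_state &= ~(M_CANTRCVMORE | M_EOFNOHUP);
}

/* The last writer leaving is a real hangup. */
static void
model_close_writer(struct fifo_model *f)
{
	if (--f->writers == 0)
		f->sb_state |= M_CANTRCVMORE;
}

/* What the FIFO poll routine would hand back under the proposal. */
static int
model_fifo_poll(const struct fifo_model *f)
{
	int revents = 0;

	if (f->sb_state & M_CANTRCVMORE)
		revents |= M_POLLHUP;
	if ((revents & M_POLLHUP) && (f->sb_state & M_EOFNOHUP))
		revents = (revents & ~M_POLLHUP) | M_POLLIGNEOF;
	return (revents);
}

/* State 0 as seen by poll(): the internal marker is stripped. */
int
state0_poll_revents(void)
{
	struct fifo_model f;

	model_init(&f);
	model_open_reader(&f, 1);
	return (model_fifo_poll(&f) & ~M_POLLIGNEOF);
}

/* State 0 as seen by select(): any bit at all means "ready". */
int
state0_select_ready(void)
{
	struct fifo_model f;

	model_init(&f);
	model_open_reader(&f, 1);
	return (model_fifo_poll(&f) != 0);
}

/* A writer came and went: poll() sees a genuine POLLHUP. */
int
hangup_poll_revents(void)
{
	struct fifo_model f;

	model_init(&f);
	model_open_reader(&f, 1);
	model_open_writer(&f);
	model_close_writer(&f);
	return (model_fifo_poll(&f) & ~M_POLLIGNEOF);
}
```

In this model poll() reports nothing in state 0 while select() still
sees the descriptor as ready, and a real hangup shows up as POLLHUP
for both.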
 
 What do you think?
 
 Best regards
    Oliver
 
 -- 
 Oliver Fromme,  secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing
 Dienstleistungen mit Schwerpunkt FreeBSD: http://www.secnetix.de/bsd
 Any opinions expressed in this message may be personal to the author
 and may not necessarily reflect the opinions of secnetix in any way.
 
 $ dd if=/dev/urandom of=test.pl count=1
 $ file test.pl
 test.pl: perl script text executable

From: Oliver Fromme <olli@lurza.secnetix.de>
To: bde@zeta.org.au (Bruce Evans)
Cc: bug-followup@freebsd.org
Subject: Re: kern/94772: FIFOs (named pipes) + select() == broken
Date: Thu, 23 Mar 2006 23:05:17 +0100 (CET)

 OK, here are new patches.  I wrote and tested them on
 RELENG_6, but they should apply to HEAD as well, AFAICT.
 
 With these patches, all of the test programs pass with
 success (no output), i.e. the select test and the poll
 test.  My own test program from the beginning of this
 PR passes without problems, too.
 
 Best regards
    Oliver
 
 
 --- ./fs/fifofs/fifo_vnops.c.orig	Tue Mar 21 09:42:32 2006
 +++ ./fs/fifofs/fifo_vnops.c	Thu Mar 23 19:57:21 2006
 @@ -231,6 +231,12 @@
  				wakeup(&fip->fi_writers);
  				sowwakeup(fip->fi_writesock);
  			}
 +			else if (ap->a_mode & O_NONBLOCK) {
 +				SOCKBUF_LOCK(&fip->fi_readsock->so_rcv);
 +				fip->fi_readsock->so_rcv.sb_state |=
 +				    SBS_EOFNOHUP;
 +				SOCKBUF_UNLOCK(&fip->fi_readsock->so_rcv);
 +			}
  		}
  	}
  	if (ap->a_mode & FWRITE) {
 @@ -241,7 +247,8 @@
  		fip->fi_writers++;
  		if (fip->fi_writers == 1) {
  			SOCKBUF_LOCK(&fip->fi_readsock->so_rcv);
 -			fip->fi_readsock->so_rcv.sb_state &= ~SBS_CANTRCVMORE;
 +			fip->fi_readsock->so_rcv.sb_state &=
 +			    ~(SBS_CANTRCVMORE | SBS_EOFNOHUP);
  			SOCKBUF_UNLOCK(&fip->fi_readsock->so_rcv);
  			if (fip->fi_readers > 0) {
  				wakeup(&fip->fi_readers);
 @@ -661,37 +668,23 @@
  	int levents, revents = 0;
  
  	fip = fp->f_data;
 -	levents = events &
 -	    (POLLIN | POLLINIGNEOF | POLLPRI | POLLRDNORM | POLLRDBAND);
 +	levents = events & (POLLIN | POLLPRI | POLLRDNORM | POLLRDBAND);
  	if ((fp->f_flag & FREAD) && levents) {
 -		/*
 -		 * If POLLIN or POLLRDNORM is requested and POLLINIGNEOF is
 -		 * not, then convert the first two to the last one.  This
 -		 * tells the socket poll function to ignore EOF so that we
 -		 * block if there is no writer (and no data).  Callers can
 -		 * set POLLINIGNEOF to get non-blocking behavior.
 -		 */
 -		if (levents & (POLLIN | POLLRDNORM) &&
 -		    !(levents & POLLINIGNEOF)) {
 -			levents &= ~(POLLIN | POLLRDNORM);
 -			levents |= POLLINIGNEOF;
 -		}
 -
  		filetmp.f_data = fip->fi_readsock;
  		filetmp.f_cred = cred;
  		revents |= soo_poll(&filetmp, levents, cred, td);
 -
 -		/* Reverse the above conversion. */
 -		if ((revents & POLLINIGNEOF) && !(events & POLLINIGNEOF)) {
 -			revents |= (events & (POLLIN | POLLRDNORM));
 -			revents &= ~POLLINIGNEOF;
 -		}
  	}
  	levents = events & (POLLOUT | POLLWRNORM | POLLWRBAND);
  	if ((fp->f_flag & FWRITE) && levents) {
  		filetmp.f_data = fip->fi_writesock;
  		filetmp.f_cred = cred;
  		revents |= soo_poll(&filetmp, levents, cred, td);
 +	}
 +	if (revents & POLLHUP) {
 +		SOCKBUF_LOCK(&fip->fi_readsock->so_rcv);
 +		if (fip->fi_readsock->so_rcv.sb_state & SBS_EOFNOHUP)
 +			revents = (revents & ~POLLHUP) | POLLHUPIGNEOF;
 +		SOCKBUF_UNLOCK(&fip->fi_readsock->so_rcv);
  	}
  	return (revents);
  }
 --- ./kern/uipc_socket.c.orig	Wed Dec 28 19:05:13 2005
 +++ ./kern/uipc_socket.c	Thu Mar 23 22:50:33 2006
 @@ -2033,16 +2033,15 @@
  	SOCKBUF_LOCK(&so->so_snd);
  	SOCKBUF_LOCK(&so->so_rcv);
  	if (events & (POLLIN | POLLRDNORM))
 -		if (soreadable(so))
 -			revents |= events & (POLLIN | POLLRDNORM);
 -
 -	if (events & POLLINIGNEOF)
  		if (so->so_rcv.sb_cc >= so->so_rcv.sb_lowat ||
  		    !TAILQ_EMPTY(&so->so_comp) || so->so_error)
 -			revents |= POLLINIGNEOF;
 +			revents |= events & (POLLIN | POLLRDNORM);
  
 -	if (events & (POLLOUT | POLLWRNORM))
 -		if (sowriteable(so))
 +	if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) ||
 +	    (so->so_snd.sb_state & SBS_CANTSENDMORE))
 +		revents |= POLLHUP;
 +	else
 +		if (events & (POLLOUT | POLLWRNORM) && sowriteable(so))
  			revents |= events & (POLLOUT | POLLWRNORM);
  
  	if (events & (POLLPRI | POLLRDBAND))
 @@ -2050,9 +2049,7 @@
  			revents |= events & (POLLPRI | POLLRDBAND);
  
  	if (revents == 0) {
 -		if (events &
 -		    (POLLIN | POLLINIGNEOF | POLLPRI | POLLRDNORM |
 -		     POLLRDBAND)) {
 +		if (events & (POLLIN | POLLPRI | POLLRDNORM | POLLRDBAND)) {
  			selrecord(td, &so->so_rcv.sb_sel);
  			so->so_rcv.sb_flags |= SB_SEL;
  		}
 --- ./kern/sys_generic.c.orig	Thu Jul  7 20:17:55 2005
 +++ ./kern/sys_generic.c	Thu Mar 23 19:58:03 2006
 @@ -1027,7 +1027,7 @@
  				 * POLLERR if appropriate.
  				 */
  				fds->revents = fo_poll(fp, fds->events,
 -				    td->td_ucred, td);
 +				    td->td_ucred, td) & ~POLLHUPIGNEOF;
  				if (fds->revents != 0)
  					n++;
  			}
 --- ./sys/poll.h.orig	Wed Jul 10 06:47:25 2002
 +++ ./sys/poll.h	Thu Mar 23 19:56:58 2006
 @@ -66,9 +66,8 @@
  #define	POLLRDBAND	0x0080		/* OOB/Urgent readable data */
  #define	POLLWRBAND	0x0100		/* OOB/Urgent data can be written */
  
 -#if __BSD_VISIBLE
 -/* General FreeBSD extension (currently only supported for sockets): */
 -#define	POLLINIGNEOF	0x2000		/* like POLLIN, except ignore EOF */
 +#ifdef _KERNEL
 +#define	POLLHUPIGNEOF	0x2000
  #endif
  
  /*
 --- ./sys/socketvar.h.orig	Sat Jul  9 14:24:40 2005
 +++ ./sys/socketvar.h	Thu Mar 23 19:20:25 2006
 @@ -215,6 +215,7 @@
  #define	SBS_CANTSENDMORE	0x0010	/* can't send more data to peer */
  #define	SBS_CANTRCVMORE		0x0020	/* can't receive more data from peer */
  #define	SBS_RCVATMARK		0x0040	/* at mark on input */
 +#define	SBS_EOFNOHUP		0x0080	/* no initial writer */
  
  /*
   * Socket state bits stored in so_qstate.
 
 
 
 -- 
 Oliver Fromme,  secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing
 Dienstleistungen mit Schwerpunkt FreeBSD: http://www.secnetix.de/bsd
 Any opinions expressed in this message may be personal to the author
 and may not necessarily reflect the opinions of secnetix in any way.

From: Oliver Fromme <olli@lurza.secnetix.de>
To: bde@zeta.org.au (Bruce Evans)
Cc: bug-followup@freebsd.org
Subject: Re: kern/94772: FIFOs (named pipes) + select() == broken
Date: Fri, 24 Mar 2006 12:52:02 +0100 (CET)

 Hi Bruce,
 
 I took the liberty to modify your test programs so that
 their output is compliant with the regression framework
 in src/tools/regression.
 
 http://www.secnetix.de/~olli/tmp/pipepoll/
 
 I also modified them so that they perform all tests both
 with nameless pipes and with FIFOs, without having to
 recompile with different defines.
 
 Shall I open a separate PR to get them committed to
 src/tools/regression/pipepoll?
 
 Oh, by the way, the patch set that I mailed still has
 two failure cases with nameless pipes (I didn't notice
 at first because I only tested the NAMEDPIPE case):
 
 not ok 4  Pipe state 6a: expected POLLHUP; got POLLIN | POLLHUP
 not ok 8  Pipe state 6a: expected POLLHUP; got POLLIN | POLLHUP
 
 Those were broken before, too, so my patch doesn't make
 things worse, at least.  :-)   I'll try to fix those,
 too.  However, some feedback on my patches so far would
 be welcome.
 
 Best regards
    Oliver
 
 -- 
 Oliver Fromme,  secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing
 Dienstleistungen mit Schwerpunkt FreeBSD: http://www.secnetix.de/bsd
 Any opinions expressed in this message may be personal to the author
 and may not necessarily reflect the opinions of secnetix in any way.
 
  > Can the denizens of this group enlighten me about what the
  > advantages of Python are, versus Perl ?
 "python" is more likely to pass unharmed through your spelling
 checker than "perl".
         -- An unknown poster and Fredrik Lundh

From: Bruce Evans <bde@zeta.org.au>
To: Oliver Fromme <olli@lurza.secnetix.de>
Cc: bug-followup@freebsd.org
Subject: Re: kern/94772: FIFOs (named pipes) + select() == broken
Date: Sat, 25 Mar 2006 00:19:33 +1100 (EST)

 On Thu, 23 Mar 2006, Oliver Fromme wrote:
 
 > Bruce Evans wrote:
 
 > > I thought that the signal() was portable.
 >
 > Unfortunately, it's not.  SysV (e.g. Solaris) has different
 > semantics:  When the signal handler is executed, the signal's
 > disposition is set to SIG_DFL.  That means that the handler
 
 Oops, I forgot that SysV signal handling is broken.
 
 > On Linux, the situation is even more complex:  When using
 > libc4 or libc5, signal() has SysV semantics, and when using
 > glibc2, it has BSD semantics.  However, when using glibc2
 > with -D_XOPEN_SOURCE=500, it's again SysV, and in this
 > latter case sigset() is defined in the header file (not
 > in the other cases).
 
 I forgot that too.  I now remember being surprised when Linux
 defaulted to the SysV signal() brokenness and other SysV mistakes
 (like termio.h instead of POSIX termios.h).  I stopped keeping
 track of Linux userland at much the same time that it started
 switching to BSD signal().  Of course, switching made the mess
 larger.
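
One way for a test program to sidestep the signal() differences
described above is to install handlers with sigaction(), whose
behaviour is spelled out by POSIX.  The sketch below is an assumption
about how that could look, not what the test programs in this PR
actually do; install_handler(), count_hit() and
count_two_deliveries() are names invented here.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t hits;

static void
count_hit(int signo)
{
	(void)signo;
	hits++;
}

/*
 * Install a handler that stays installed after delivery.  Leaving
 * SA_RESETHAND out of sa_flags avoids the SysV "reset to SIG_DFL"
 * behaviour.  Returns 0 on success, -1 on error.
 */
int
install_handler(int signo, void (*handler)(int))
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = handler;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_RESTART;
	return (sigaction(signo, &sa, NULL));
}

/* Deliver SIGUSR1 twice and return how often the handler ran. */
int
count_two_deliveries(void)
{
	hits = 0;
	if (install_handler(SIGUSR1, count_hit) == -1)
		return (-1);
	raise(SIGUSR1);
	raise(SIGUSR1);
	return ((int)hits);
}
```

With SysV signal() semantics the second raise() would hit SIG_DFL and
terminate the process; with the handler installed this way both
deliveries reach count_hit().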
 
 > > Under FreeBSD, <stdlib.h>
 > > of all things is the only missing include.
 >
 > FreeBSD generally seems to require fewer includes than the
 > standard says.  I had to add <sys/types.h>, <stdlib.h>,
 > <string.h> and <stdio.h> (although the latter two probably
 > only because of my err() and warnx() replacements).
 
 <sys/types.h> used to be standard (required by POSIX.1-1990),
 but this was fixed in POSIX.1-2001 or earlier, and FreeBSD
 (mainly mike@, who seems to have departed :() cleaned up the
 most important headers so that they don't need <sys/types.h>.
 
 > > ..
 > > but not for poll().  From POSIX.1-2001-draft7.txt for pselect():
 
 > I've got SUSv3 a.k.a. IEEE Std 1003.1-2001 ("POSIX").  You
 > can download it from The Open Group's website (you have to
 > register with them, but it's free).  However, I don't know
 > how much it differs from the draft that you have.
 
 I'm already registered and sometimes look at the web site, but
 grepping draft7 mostly works better.
 
 > > Other parts of POSIX make it clear that O_NONBLOCK reads must never block,
 >
 > That's right, but it does not matter for select()/poll().
 >
 > > so if O_NONBLOCK is set then pselect() for read must never block either.
 >
 > No, I think that's not right.  The standard clearly says
 > that select() should always behave as if O_NONBLOCK was not
 > set:  "A descriptor shall be considered ready for reading
 > when a call to an input function with O_NONBLOCK clear
 > would not block".
 
 Oops.  So select() must not block when there are no writers because that
 state is special for read() -- read() doesn't block waiting for writers.
 BTW, in private (?) followup to one of the #76xxx PRs mentioned in this
 thread, it was pointed out that this is a problem -- at least if poll()
 does the same, there is no good way to wait for a writer.  I said that
 a blocking open could be used (while keeping the original fd open), but
 the original poster pointed out the problem with that -- it requires a
 separate thread for each fd being waited on.  This is the best argument
 that I know of for having Linux's poll() behaviour.  I think it would be
 better for a writer appearing to be an exceptional event for select()
 and a POLLWRITER event for poll().
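
The rule quoted from the standard ("ready for reading when a call to
an input function with O_NONBLOCK clear would not block") can be seen
in isolation with an anonymous pipe whose write end has been closed:
read() would return 0 at once, so select() has to report the read end
as ready.  This is a minimal sketch using a nameless pipe rather than
a FIFO, and select_sees_eof() is a name invented for it.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Returns 1 if select() reports the read end readable once no writer
 * remains, 0 if it does not, and -1 on error.
 */
int
select_sees_eof(void)
{
	struct timeval tv = { 0, 0 };
	fd_set rfds;
	int p[2], rc, ready;

	if (pipe(p) == -1)
		return (-1);
	close(p[1]);	/* no writer left: read() returns 0 immediately */
	FD_ZERO(&rfds);
	FD_SET(p[0], &rfds);
	rc = select(p[0] + 1, &rfds, NULL, NULL, &tv);
	ready = (rc == -1) ? -1 : (FD_ISSET(p[0], &rfds) ? 1 : 0);
	close(p[0]);
	return (ready);
}
```

The same readiness is what makes it hard to use select() to wait for
a writer to appear, which is the problem described above.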
 
 > > This might be clearer with SBS_CANTSENDMORE checked first.
 > > SBS_CANTSENDMORE set implies !sowriteable() so the behaviour is the same,
 > > and I think it is clearer to not even look at the output bits in
 > > `events' in the hangup case.
 >
 > So you mean in the SBS_CANTSENDMORE case, POLLHUP should be
 > set without checking if the caller has requested POLLOUT in
 > the events mask?  That sounds reasonable, because POLLOUT
 > certainly can't be returned in that case.  It makes the
 > code more complex, though.
 
 Yes.  POLLHUP is also needed for making poll() return for poll()
 waiting for input only.  I think it would make the code slightly
 less complex.
 
 > Bruce Evans wrote:
 > > Bruce Evans wrote:
 > > > I intended to check the behaviour for this in my test programs but don't
 > > > seem to have done it.  I intended to follow Linux's behaviour even if this
 > > > is nonstandard.  Linux used to have some special cases including a gripe
 > > > in a comment about having to have them to match Sun's behaviour, but I
 > > > couldn't find these when I last checked.  Perhaps the difference is
 > > > precisely between select() and poll(), to follow the standard for select()
 > > > and exploit the fuzziness for poll().
 > >
 > > I added the check.
 >
 > I'll try that later today.  (At least I hope to have enough
 > time for it.)
 
 I'm interested in what non-Linux non-FreeBSD systems do.  I won't have time
 to look at your patches today :-).
 
 > > select() on a named pipe:
 > > % selectp: state 0: expected set; got clear
 > > [...]
 > > Now there is an extra failure for state 0.  Some complications will be
 > > required to fix this without breaking poll() on named pipe.  State 0 is
 > > when the read descriptor is open with O_NONBLOCK and there has "never"
 > > been a writer.  In this state, select() on the read descriptor must
 > > succeed to conform to POSIX, but poll() on the read descriptor must
 > > block to conform to Linux.  I think the Linux behaviour is what happens
 > > naturally -- the socket isn't hung up so sopoll() won't set POLLHUP,
 >
 > Now that might be debatable.  SUSv3 says that POLLHUP means
 > that the device is disconnected.  That doesn't sound like
 > it should make a difference if there was a previous writer
 > or not.  In fact, when I open a FIFO which doesn't have a
 > writer currently, there's no way to know if there was a
 > writer previously (before I opened the FIFO) who "hung it
 > up".
 >
 > Personally I think that Linux is in error.  POLLHUP should
 > be set when "the device is disconnected" (SUSv3), i.e. when
 > there is no writer, period.
 
 I tend to agree, but trying to keep the semantics as simple as
 that is half of what caused this bug suite (the other half is
 not implementing POLLHUP).
 
 > However, I see your point that it might be more beneficial
 > to be Linux-compliant instead of standard-compliant.
 
 Hmm, the regression test needs to be even more delicate to test
 the timing of previous hangups.  FreeBSD clears the hangup flag
 on transition from 0 to 1 readers, so history of connections
 is handled reasonably well.
 
 Bruce

From: Oliver Fromme <olli@lurza.secnetix.de>
To: bde@zeta.org.au (Bruce Evans)
Cc: bug-followup@freebsd.org
Subject: Re: kern/94772: FIFOs (named pipes) + select() == broken
Date: Fri, 24 Mar 2006 21:49:29 +0100 (CET)

 Oliver Fromme wrote:
  > Oh, by the way, the patch set that I mailed still has
  > two failure cases with nameless pipes (I didn't notice
  > at first because I only tested the NAMEDPIPE case):
  > 
  > not ok 4  Pipe state 6a: expected POLLHUP; got POLLIN | POLLHUP
  > not ok 8  Pipe state 6a: expected POLLHUP; got POLLIN | POLLHUP
  > 
  > Those were broken before, too, so my patch doesn't make
  > things worse, at least.  :-)   I'll try to fix those,
  > too.
 
 Those were easy to fix.  It's basically a patch that just
 removes one line in kern/sys_pipe.c:pipe_poll().  This
 patch is independent from the others and can be applied
 separately.  I've added it to the patchset on my website:
 http://www.secnetix.de/~olli/tmp/fifodiff.txt
 
 Now _all_ current checks from the test programs are
 successful, both select() and poll(), and both for FIFOs
 and for nameless pipes.
 
 Best regards
    Oliver
 
 --- src/sys/kern/sys_pipe.c.orig	Tue Jan 31 16:44:51 2006
 +++ src/sys/kern/sys_pipe.c	Fri Mar 24 19:23:03 2006
 @@ -1350,8 +1350,7 @@
  #endif
  	if (events & (POLLIN | POLLRDNORM))
  		if ((rpipe->pipe_state & PIPE_DIRECTW) ||
 -		    (rpipe->pipe_buffer.cnt > 0) ||
 -		    (rpipe->pipe_state & PIPE_EOF))
 +		    (rpipe->pipe_buffer.cnt > 0))
  			revents |= events & (POLLIN | POLLRDNORM);
  
  	if (events & (POLLOUT | POLLWRNORM))
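
The effect of dropping the PIPE_EOF test from the POLLIN condition
can be sketched with a simplified userland model of the read-side
decision.  The M_* values and the two function names are invented for
the illustration, PIPE_DIRECTW is ignored, and the mapping of the
test program's state numbers to (buffered bytes, EOF) pairs is
inferred from the expected values shown in this thread.

```c
/* Model-only bit values; they are not taken from the kernel headers. */
#define M_POLLIN	0x0001
#define M_POLLHUP	0x0010

/* Before the change: EOF alone is enough to report POLLIN. */
int
pipe_poll_old(int cnt, int eof, int events)
{
	int revents = 0;

	if ((events & M_POLLIN) && (cnt > 0 || eof))
		revents |= M_POLLIN;
	if (eof)
		revents |= M_POLLHUP;
	return (revents);
}

/* After the change: POLLIN only when data is actually buffered. */
int
pipe_poll_new(int cnt, int eof, int events)
{
	int revents = 0;

	if ((events & M_POLLIN) && cnt > 0)
		revents |= M_POLLIN;
	if (eof)
		revents |= M_POLLHUP;
	return (revents);
}
```

For the "6a" case (EOF with an empty buffer) the old model yields
POLLIN | POLLHUP and the new one yields POLLHUP only; the other three
pipe states come out the same in both.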
 
 
 
 -- 
 Oliver Fromme,  secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing
 Dienstleistungen mit Schwerpunkt FreeBSD: http://www.secnetix.de/bsd
 Any opinions expressed in this message may be personal to the author
 and may not necessarily reflect the opinions of secnetix in any way.
 
 "In My Egoistical Opinion, most people's C programs should be indented
 six feet downward and covered with dirt."
         -- Blair P. Houghton

From: Oliver Fromme <olli@lurza.secnetix.de>
To: bde@zeta.org.au (Bruce Evans)
Cc: bug-followup@freebsd.org
Subject: Re: kern/94772: FIFOs (named pipes) + select() == broken
Date: Fri, 24 Mar 2006 22:53:45 +0100 (CET)

 Bruce Evans wrote:
  > Oliver Fromme wrote:
  > > So you mean in the SBS_CANTSENDMORE case, POLLHUP should be
  > > set without checking if the caller has requested POLLOUT in
  > > the events mask?  That sounds reasonable, because POLLOUT
  > > certainly can't be returned in that case.  It makes the
  > > code more complex, though.
  > 
  > Yes.  POLLHUP is also needed for making poll() return for poll()
  > waiting for input only.  I think it would make the code slightly
  > less complex.
 
 You're right.  My patch made that part of the code slightly
 less complex, indeed.
 
  > I'm interested in what non-Linux non-FreeBSD systems do.
 
 DEC UNIX 4.0D doesn't return POLLHUP at all, only POLLIN.
 I can give the detailed output, but I think it's not very
 interesting, given the fact that that system is about 7
 or 8 years old.  Unfortunately I don't know anybody with
 access to a Tru64 machine, which would be more interesting.
 
 Solaris 9 seems to behave exactly the same as Linux in the
 test program, i.e. it passes all checks successfully.
 Given the fact that Solaris went the Linux route (or vice
 versa), that's a strong point that FreeBSD should do the
 same.
 
 NetBSD 3.0 is very interesting, so I give the detailed
 output from the test program (which I modified to produce
 regression test compliant output, see my other mail):
 
 1..26
 ok 1      Pipe state 4: expected 0; got 0
 ok 2      Pipe state 5: expected POLLIN; got POLLIN
 ok 3      Pipe state 6: expected POLLIN | POLLHUP; got POLLIN | POLLHUP
 not ok 4  Pipe state 6a: expected POLLHUP; got POLLIN | POLLHUP
 ok 5      Pipe state 4: expected 0; got 0
 ok 6      Pipe state 5: expected POLLIN; got POLLIN
 ok 7      Pipe state 6: expected POLLIN | POLLHUP; got POLLIN | POLLHUP
 not ok 8  Pipe state 6a: expected POLLHUP; got POLLIN | POLLHUP
 ok 9      FIFO state 0: expected 0; got 0
 ok 10     FIFO state 1: expected 0; got 0
 ok 11     FIFO state 2: expected POLLIN; got POLLIN
 ok 12     FIFO state 2a: expected 0; got 0
 not ok 13 FIFO state 3: expected POLLHUP; got POLLIN
 ok 14     FIFO state 4: expected 0; got 0
 ok 15     FIFO state 5: expected POLLIN; got POLLIN
 not ok 16 FIFO state 6: expected POLLIN | POLLHUP; got POLLIN
 not ok 17 FIFO state 6a: expected POLLHUP; got POLLIN
 ok 18     FIFO state 0: expected 0; got 0
 ok 19     FIFO state 1: expected 0; got 0
 ok 20     FIFO state 2: expected POLLIN; got POLLIN
 ok 21     FIFO state 2a: expected 0; got 0
 not ok 22 FIFO state 3: expected POLLHUP; got POLLIN
 ok 23     FIFO state 4: expected 0; got 0
 ok 24     FIFO state 5: expected POLLIN; got POLLIN
 not ok 25 FIFO state 6: expected POLLIN | POLLHUP; got POLLIN
 not ok 26 FIFO state 6a: expected POLLHUP; got POLLIN
 
 That means two things:
 1.  When POLLHUP is returned, POLLIN is also always
     returned.
 2.  For FIFOs, POLLHUP is not used at all, but POLLIN
     is used instead.  This is the behaviour that Stevens
     describes in APUE, by the way.
 
 I guess portable programs cannot rely on the results from
 poll() too much ...  They probably just look if at least
 one of POLLHUP and POLLIN is set, and then call read().
 Otherwise they would break on one platform or another.
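
The defensive pattern described above can be sketched as follows:
treat POLLIN and POLLHUP alike as "go and call read()", and let
read() returning 0 be the only EOF indicator.  This is an
illustration under those assumptions, not code from any of the
programs mentioned in this PR; drain_fd() and drain_demo() are names
made up for it, and EINTR handling is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read fd until EOF.  Returns the number of bytes consumed, or -1 on
 * error.
 */
long
drain_fd(int fd)
{
	struct pollfd pfd;
	char buf[256];
	long total = 0;
	ssize_t n;

	for (;;) {
		pfd.fd = fd;
		pfd.events = POLLIN;
		pfd.revents = 0;
		if (poll(&pfd, 1, -1) == -1)
			return (-1);
		if ((pfd.revents & (POLLIN | POLLHUP)) == 0)
			return (-1);	/* only POLLERR or POLLNVAL */
		n = read(fd, buf, sizeof(buf));
		if (n == -1)
			return (-1);
		if (n == 0)
			return (total);	/* EOF, however it was announced */
		total += n;
	}
}

/* Self-check on a nameless pipe: 5 bytes, then the writer leaves. */
long
drain_demo(void)
{
	int p[2];
	long total;

	if (pipe(p) == -1)
		return (-1);
	if (write(p[1], "hello", 5) != 5) {
		close(p[0]);
		close(p[1]);
		return (-1);
	}
	close(p[1]);
	total = drain_fd(p[0]);
	close(p[0]);
	return (total);
}
```

Because the loop never looks at which of the two bits was set, it
behaves the same whether the system reports POLLIN, POLLHUP or both
at end of file.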
 
 Here's a web page from someone who did similar tests on
 a wide range of operating systems:
 
 http://www.greenend.org.uk/rjk/2001/06/poll.html
 
 His conclusions are a little bit different.  *SIGH*
 It's all the fault of fuzzy SUS/POSIX.  :-(
 
 Best regards
    Oliver
 
 -- 
 Oliver Fromme,  secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing
 Dienstleistungen mit Schwerpunkt FreeBSD: http://www.secnetix.de/bsd
 Any opinions expressed in this message may be personal to the author
 and may not necessarily reflect the opinions of secnetix in any way.
 
 "A language that doesn't have everything is actually easier
 to program in than some that do."
         -- Dennis M. Ritchie

From: Bruce Evans <bde@zeta.org.au>
To: Oliver Fromme <olli@lurza.secnetix.de>
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/94772: FIFOs (named pipes) + select() == broken
Date: Sat, 25 Mar 2006 16:08:58 +1100 (EST)

 On Fri, 24 Mar 2006, Oliver Fromme wrote:
 
 > I took the liberty to modify your test programs so that
 > their output is compliant with the regression framework
 > in src/tools/regression.
 >
 > http://www.secnetix.de/~olli/tmp/pipepoll/
 
 Thanks.
 
 I made some changes (mostly style fixes) and will send patches in
 private mail.
 
 > I also modified them so that they perform all tests both
 > with nameless pipes and with FIFOs, without having to
 > recompile with different defines.
 >
 > Shall I open a separate PR to get them committed to
 > src/tools/regression/pipepoll?
 
 OK with me.  I was going to ask whoever committed the fix for
 this PR (not me) to handle the regression tests too.  The followup
 to this PR is already too long so a separate PR seems best.
 
 > Oh, by the way, the patch set that I mailed still has
 > two failure cases with nameless pipes (I didn't notice
 > at first because I only tested the NAMEDPIPE case):
 >
 > not ok 4  Pipe state 6a: expected POLLHUP; got POLLIN | POLLHUP
 > not ok 8  Pipe state 6a: expected POLLHUP; got POLLIN | POLLHUP
 >
 > Those were broken before, too, so my patch doesn't make
 > things worse, at least.  :-)   I'll try to fix those,
 > too.  However, some feedback on my patches so far would
 > be welcome.
 
 This case is unimportant, and as you reported in later mail it is easy
 to fix, but it is another fuzzy area of POSIX/original-SysV-poll, so
 every OS does it differently.
 
 Bruce

From: Bruce Evans <bde@zeta.org.au>
To: Oliver Fromme <olli@lurza.secnetix.de>
Cc: bug-followup@freebsd.org
Subject: Re: kern/94772: FIFOs (named pipes) + select() == broken
Date: Sat, 25 Mar 2006 16:00:30 +1100 (EST)

 On Thu, 23 Mar 2006, Oliver Fromme wrote:
 
 > I have to correct myself slightly, and I have a few more
 > insights ...
 >
 > Oliver Fromme wrote:
 > > Bruce Evans wrote:
 > > > select() on a named pipe:
 > > > % selectp: state 0: expected set; got clear
 > > > [...]
 > > > Now there is an extra failure for state 0.  Some complications will be
 > > > required to fix this without breaking poll() on named pipe.  State 0 is
 > > > when the read descriptor is open with O_NONBLOCK and there has "never"
 > > > been a writer.  In this state, select() on the read descriptor must
 > > > succeed to conform to POSIX, but poll() on the read descriptor must
 > > > block to conform to Linux.  I think the Linux behaviour is what happens
 > > > naturally -- the socket isn't hung up so sopoll() won't set POLLHUP,
 > >
 > > Now that might be debatable.  SUSv3 says that POLLHUP means
 > > that the device is disconnected.  That doesn't sound like
 > > it should make a difference if there was a previous writer
 > > or not.
 >
 > SUSv3 says about POLLHUP:  "The device has been disconnected".
 > I suppose that "has been disconnected" is different from "is
 > disconnected".  I'm sorry, English is not my native language,
 > so I didn't notice that slight difference when I read that
 > page first.
 >
 > Thinking about it again, the Linux implementation seems to be
 > reasonable, and it's probably conformant with the standard
 > (even though the standard is somewhat fuzzy).
 
 It's fairly subtle even for a native speaker.  (I think it would look
 like a larger difference for a non-native speaker, but you know
 English too well :-).  It's not quite precise enough for a standard
 since the literal English meaning would cover all past disconnections
 (unless we consider a fifo to be a virtual device whose life began
 on the previous "first" open by a reader or a writer).
 
 > So I agree with you that FreeBSD should behave the same as
 > Linux in that regard.
 
 > I propose a new SBS_* flag for the so_rcv.sb_state mask.
 > Let's call it SBS_EOFNOHUP for now (I'm sure someone can
 > come up with a better name).  It will be set in fifo_open()
 > in the case O_RDONLY | O_NONBLOCK and no writers.  It will
 > be cleared in fifo_open() when someone opens the FIFO for
 > writing.  In fifo_poll_f(), POLLHUP will be replaced by
 > POLLIGNEOF in the result of soo_poll() if SBS_EOFNOHUP is
 > set.
 
 I'm not sure that SBS_EOFNOHUP is needed.  SBS_CANTRCVMORE might
 be sufficient.  It is cleared when someone opens the FIFO for
 writing.  Also, the O_NONBLOCK open for read() shouldn't be very
 different, since in addition to previously discussed reasons the
 SBS flags are per-socket so they can't be used to give different
 behaviour for a mix of nonblocking and blocking reads.
 
 > selscan() does not need to be changed.  It will handle
 > POLLIGNEOF just like POLLHUP, so select() won't block.
 >
 > pollscan() needs a slight change in order to remove
 > POLLIGNEOF from the result of the fo_poll() call.
 > I think POLLIGNEOF should not be exposed to userland.
 > Its sole purpose is to communicate the abovementioned
 > case from fifo_poll_f() to selscan(), and only those
 > two functions should use that flag.
 >
 > That should fix both select() and poll(), if I didn't
 > miss anything.
 >
 > What do you think?
 
 Good.  I'll reply to the mail that has the patch.
 
 Bruce

From: Bruce Evans <bde@zeta.org.au>
To: Oliver Fromme <olli@lurza.secnetix.de>
Cc: bug-followup@freebsd.org
Subject: Re: kern/94772: FIFOs (named pipes) + select() == broken
Date: Sun, 26 Mar 2006 22:50:17 +1100 (EST)

 On Fri, 24 Mar 2006, Oliver Fromme wrote:
 
 I'm still catching up with your mail on Thursday-Friday.  This
 and the one with the main patch.  I tested and debugged the
 patch and found a few problems and many more complications...
 
 > Bruce Evans wrote:
 > > Oliver Fromme wrote:
 > > > So you mean in the SBS_CANTSENDMORE case, POLLHUP should be
 > > > set without checking if the caller has requested POLLOUT in
 > > > the events mask?  That sounds reasonable, because POLLOUT
 > > > certainly can't be returned in that case.  It makes the
 > > > code more complex, though.
 > >
 > > Yes.  POLLHUP is also needed for making poll() return for poll()
 > > waiting for input only.  I think it would make the code slightly
 > > less complex.
 >
 > You're right.  My patch made that part of the code slightly
 > less complex, indeed.
 
 It tests both SBS_CANTSENDMORE and SBS_CANTRCVMORE.  Testing both
 seems to be needed, but after my changes things got more complicated
 again.  For fifos there are 2 sockets each with these 2 flags, so
 there are 2**4 combinations of flags to consider.  When we set
 POLLHUP we are supposed to not set POLLOUT, but even when we force
 this in sopoll() we have to worry about fifo_poll() ORing POLLHUP
 for the read socket together with POLLOUT for the write socket.
 Anyway, userland is not ready for POLLHUP, so I think we shouldn't
 add it to sopoll() yet.
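
The ORing problem mentioned above can be shown with a tiny model.
Each socket's result may be fine on its own, yet the union can pair
POLLHUP from the read socket with POLLOUT from the write socket.  The
M_* values and both function names are invented for this sketch, and
the masking variant is only one conceivable way of keeping the two
bits apart, not something this thread settled on.

```c
/* Model-only bit values; they are not taken from the kernel headers. */
#define M_POLLIN	0x0001
#define M_POLLOUT	0x0004
#define M_POLLHUP	0x0010

/* Plain union of the read-socket and write-socket results. */
int
combine_naive(int rd_revents, int wr_revents)
{
	return (rd_revents | wr_revents);
}

/* Union that drops POLLOUT whenever POLLHUP is present. */
int
combine_masked(int rd_revents, int wr_revents)
{
	int revents = rd_revents | wr_revents;

	if (revents & M_POLLHUP)
		revents &= ~M_POLLOUT;
	return (revents);
}
```

Here combine_naive(M_POLLHUP, M_POLLOUT) carries both bits, while
combine_masked() returns M_POLLHUP alone and leaves combinations
without POLLHUP untouched.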
 
 > > I'm interested in what non-Linux non-FreeBSD systems do.
 >
 > DEC UNIX 4.0D doesn't return POLLHUP at all, only POLLIN.
 > ...
 > Solaris 9 seems to behave exactly the same as Linux in the
 > ...
 >
 > NetBSD 3.0 is very interesting, so I give the detailed
 > output from the test program (which I modified to produce
 > regression test compliant output, see my other mail):
 
 I've only looked at NetBSD-2.0.1 sources.  These seem to still have
 some of the bugs in 4.4BSD that I fixed.  NetBSD-3.0 seems to be better.
 
 > 1..26
 > ok 1      Pipe state 4: expected 0; got 0
 > ok 2      Pipe state 5: expected POLLIN; got POLLIN
 > ok 3      Pipe state 6: expected POLLIN | POLLHUP; got POLLIN | POLLHUP
 > not ok 4  Pipe state 6a: expected POLLHUP; got POLLIN | POLLHUP
 
 I think we'll need to go back to this (always return POLLIN with POLLHUP).
 I found that lat_rpc in lmbench2 is broken without this.  At least in my
 old version of libc, libc/rpc uses poll() a lot, and it doesn't understand
 POLLHUP.  E.g., at EOF read_vc() spins forever waiting for POLLIN unless
 POLLIN is set together with POLLHUP.
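
 For comparison, a caller that does understand POLLHUP might look
 roughly like the sketch below.  read_ready() and eof_demo() are
 hypothetical names, not taken from libc/rpc; the only assumption is
 that a hangup should be treated like readability so that read() gets
 a chance to return 0.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: poll fd with a zero timeout and report whether
 * a read() is worth attempting.  POLLHUP is treated like POLLIN so
 * that EOF is noticed even if only POLLHUP is reported.
 * Returns 1 if read() should be tried, 0 if not, -1 if poll() failed.
 */
static int
read_ready(int fd)
{
	struct pollfd pfd;

	assert(fd >= 0);
	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;
	if (poll(&pfd, 1, 0) < 0)
		return (-1);
	return ((pfd.revents & (POLLIN | POLLHUP)) != 0);
}

/*
 * Demonstration on a pipe whose only writer has gone away with no data
 * queued.  Returns what read() returned, or -1 if the reader was never
 * reported as ready.
 */
static int
eof_demo(void)
{
	ssize_t n = -1;
	int fds[2];
	char c;

	if (pipe(fds) != 0)
		return (-1);
	close(fds[1]);			/* the only writer goes away */
	if (read_ready(fds[0]) == 1)
		n = read(fds[0], &c, 1);	/* 0 means EOF */
	close(fds[0]);
	return ((int)n);
}
```

 With this test, it does not matter whether the system reports POLLIN,
 POLLHUP or both at EOF.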
 
 > ok 5      Pipe state 4: expected 0; got 0
 > ok 6      Pipe state 5: expected POLLIN; got POLLIN
 > ok 7      Pipe state 6: expected POLLIN | POLLHUP; got POLLIN | POLLHUP
 > not ok 8  Pipe state 6a: expected POLLHUP; got POLLIN | POLLHUP
 
 Same.
 
 > ok 9      FIFO state 0: expected 0; got 0
 > ok 10     FIFO state 1: expected 0; got 0
 > ok 11     FIFO state 2: expected POLLIN; got POLLIN
 > ok 12     FIFO state 2a: expected 0; got 0
 > not ok 13 FIFO state 3: expected POLLHUP; got POLLIN
 
 Similarly.  I changed your patches to return both POLLHUP and POLLIN here.
 (This required complications to zap POLLIN as well as POLLHUP in state 0.)
 I thought that returning POLLHUP would be harmless, but it isn't for
 output since returning POLLHUP requires not returning POLLOUT so
 programs that don't understand POLLHUP might spin at EOF for write by
 waiting for POLLOUT.
 
 > ok 14     FIFO state 4: expected 0; got 0
 > ok 15     FIFO state 5: expected POLLIN; got POLLIN
 > not ok 16 FIFO state 6: expected POLLIN | POLLHUP; got POLLIN
 
 Similarly.  For this state, we could fix the bug in gdb (premature exit
 on POLLHUP when POLLIN is also set and actually indicates non-null data)
 by returning only POLLIN.  This would only work for polling for readability.
 For writability, POLLHUP needs to be returned synchronously if at all, to
 give the application a chance of avoiding a write that would fail.
 select()'s interface, and returning POLLOUT on EOF, presumably results in
 lots of processes killed by SIGPIPE when they try such a write.
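
 The failing write can be shown in isolation.  This is a minimal
 sketch with a hypothetical function name; it ignores SIGPIPE for the
 duration of the write so that the failure shows up as EPIPE instead of
 killing the process.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/*
 * Write one byte to a pipe that has no reader left.  SIGPIPE is
 * ignored around the write, so the error is reported through errno.
 * Returns the errno from write(), 0 if the write unexpectedly
 * succeeded, or -1 if the pipe could not be created.
 */
static int
write_after_hangup(void)
{
	void (*old)(int);
	int err = 0;
	int fds[2];

	if (pipe(fds) != 0)
		return (-1);
	old = signal(SIGPIPE, SIG_IGN);
	close(fds[0]);			/* the only reader goes away */
	if (write(fds[1], "x", 1) < 0)
		err = errno;
	signal(SIGPIPE, old);		/* restore the old disposition */
	close(fds[1]);
	return (err);
}
```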
 
 > not ok 17 FIFO state 6a: expected POLLHUP; got POLLIN
 
 Same as for pipes.
 
 [... same for second iteration]
 
 > That means two things:
 > 1.  When POLLHUP is returned, POLLIN is also always
 >    returned.
 > 2.  For FIFOs, POLLHUP is not used at all, but POLLIN
 >    is used instead.  This is the behaviour that Stevens
 >    describes in APUE, by the way.
 >
 > I guess portable programs cannot rely on the results from
 > poll() too much ...  They probably just look if at least
 > one of POLLHUP and POLLIN is set, and then call read().
 > Otherwise they would break on one platform or another.
 
 Not supporting POLLHUP for pipes and fifos seems best.  We have
 to set POLLIN on EOF since too many programs only look at POLLIN.
 Then setting POLLHUP doesn't gain much.  It's strange to support
 POLLHUP for pipes but not for fifos.  It is easier to support for
 pipes but more useful for fifos.
 
 > Here's a web page from someone who did similar tests on
 > a wide range of operating systems:
 >
 > http://www.greenend.org.uk/rjk/2001/06/poll.html
 >
 > His conclusions are a little bit different.  *SIGH*
 > It's all the fault of fuzzy SUS/POSIX.  :-(
 
 Urk.  It shows about 50 variations in 12 OS's without even checking
 fifos.
 
 We need more regression tests for sockets if we're going to change
 sopoll() significantly.  I hacked the tests to check socketpair()
 (just change pipe() to socketpair(...)).  Pipes were once just
 socketpairs but are now handled specially, and this gives more
 variations.  Fortunately not many.  Before your changes, there are
 no differences for select(), and for poll() there are these:
 
 before:
 < ok 3      Pipe state 6: expected POLLIN | POLLHUP; got POLLIN | POLLHUP
 < not ok 4  Pipe state 6a: expected POLLHUP; got POLLIN | POLLHUP
 after:
 > not ok 3  Socketpair state 6: expected POLLIN | POLLHUP; got POLLIN
 > not ok 4  Socketpair state 6a: expected POLLHUP; got POLLIN
 
 We just lose all setting of POLLHUP, and this only makes a difference
 here.  (State 6a is the only problem case for pipes and socketpair()
 has this and a problem with state 6 too.)
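
 The pipe()-to-socketpair() substitution can be sketched as follows.
 This is not the regression test from this PR; make_pair() and
 poll_state_6a() are hypothetical names, and AF_UNIX/SOCK_STREAM is an
 assumed choice of socketpair type.

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Which kind of descriptor pair to create. */
enum pair_kind { PAIR_PIPE, PAIR_SOCKETPAIR };

static int
make_pair(enum pair_kind kind, int fds[2])
{
	if (kind == PAIR_PIPE)
		return (pipe(fds));
	return (socketpair(AF_UNIX, SOCK_STREAM, 0, fds));
}

/*
 * Reader with no data and a disconnection: close the other end without
 * writing anything, then poll the remaining end for POLLIN with a zero
 * timeout.  Returns revents, or -1 on failure.
 */
static int
poll_state_6a(enum pair_kind kind)
{
	struct pollfd pfd;
	int fds[2], n;

	if (make_pair(kind, fds) != 0)
		return (-1);
	close(fds[1]);
	pfd.fd = fds[0];
	pfd.events = POLLIN;
	pfd.revents = 0;
	n = poll(&pfd, 1, 0);
	close(fds[0]);
	return (n < 0 ? -1 : pfd.revents);
}
```

 Comparing the two return values on a given system shows whether pipes
 and socketpairs disagree about POLLIN and POLLHUP in this state.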
 
 After your changes there are no differences for pipes and socketpairs.
 
 With my version of your changes there is a difference for state 6a again:
 
 before:
 < not ok 4  Pipe state 6a: expected POLLHUP; got POLLIN | POLLHUP
 after:
 > ok 4      Socketpair state 6a: expected POLLHUP; got POLLHUP
 
 My changes are supposed to always set POLLIN with POLLHUP (giving "not ok"
 in state 6a), and they somehow do that in sopoll() for fifos but not for
 socketpairs.
 
 Linux-2.6.10 has the following problem cases:
 
 select();
 % not ok 9  FIFO state 0: expected set; got clear
 
 Linux apparently doesn't have a special case for state 0 in fifos
 (reader with no data, no writer and no disconnection) -- it has the
 same behaviour in this state for select() as for poll() although this
 behaviour is clearly nonstandard for select().
 
 poll():
 not ok 4  Socketpair state 6a: expected POLLHUP; got POLLIN | POLLHUP
 
 In this state (reader with no data and a disconnection), Linux has
 simpler behaviour that is inconsistent with Linux's pipe().
 
 I don't know socket programming well enough to quickly write similar
 tests for general connections.
 
 Bruce

From: Bruce Evans <bde@zeta.org.au>
To: Oliver Fromme <olli@lurza.secnetix.de>
Cc: bug-followup@freebsd.org
Subject: Re: kern/94772: FIFOs (named pipes) + select() == broken
Date: Sun, 26 Mar 2006 23:57:15 +1100 (EST)

 On Thu, 23 Mar 2006, Oliver Fromme wrote:
 
 > OK, here are new patches.  I wrote and tested them on
 > RELENG_6, but they should apply to HEAD as well, AFAICT.
 >
 > With these patches, all of the test programs pass with
 > success (no output), i.e. the select test and the poll
 > test.  My own test program from the beginning of this
 > PR passes without problems, too.
 >
 > --- ./fs/fifofs/fifo_vnops.c.orig	Tue Mar 21 09:42:32 2006
 > +++ ./fs/fifofs/fifo_vnops.c	Thu Mar 23 19:57:21 2006
 > @@ -231,6 +231,12 @@
 > 				wakeup(&fip->fi_writers);
 > 				sowwakeup(fip->fi_writesock);
 > 			}
 > +			else if (ap->a_mode & O_NONBLOCK) {
 > +				SOCKBUF_LOCK(&fip->fi_readsock->so_rcv);
 > +				fip->fi_readsock->so_rcv.sb_state |=
 > +				    SBS_EOFNOHUP;
 > +				SOCKBUF_UNLOCK(&fip->fi_readsock->so_rcv);
 > +			}
 > 		}
 > 	}
 > 	if (ap->a_mode & FWRITE) {
 
 This flag seems to be necessary (I couldn't recover it from the SBS_CANT*
 flags like I hoped to).
 
 The flag needs to be set even in the !O_NONBLOCK case since otherwise it
 doesn't get set if the first reader is !O_NONBLOCK and a later reader
 (before any writers) is O_NONBLOCK.
 
 Clearing of this flag seems to be missing in cases where all readers
 go away before any writers appear.
 
 I changed the sense of the flag and renamed it to SBS_COULDRCV.
 
 > @@ -241,7 +247,8 @@
 > 		fip->fi_writers++;
 > 		if (fip->fi_writers == 1) {
 > 			SOCKBUF_LOCK(&fip->fi_readsock->so_rcv);
 > -			fip->fi_readsock->so_rcv.sb_state &= ~SBS_CANTRCVMORE;
 > +			fip->fi_readsock->so_rcv.sb_state &=
 > +			    ~(SBS_CANTRCVMORE | SBS_EOFNOHUP);
 > 			SOCKBUF_UNLOCK(&fip->fi_readsock->so_rcv);
 > 			if (fip->fi_readers > 0) {
 > 				wakeup(&fip->fi_readers);
 
 OK.
 
 > @@ -661,37 +668,23 @@
 > 	int levents, revents = 0;
 >
 > 	fip = fp->f_data;
 > -	levents = events &
 > -	    (POLLIN | POLLINIGNEOF | POLLPRI | POLLRDNORM | POLLRDBAND);
 > +	levents = events & (POLLIN | POLLPRI | POLLRDNORM | POLLRDBAND);
 > 	if ((fp->f_flag & FREAD) && levents) {
 > -		/*
 > -		 * If POLLIN or POLLRDNORM is requested and POLLINIGNEOF is
 > -		 * not, then convert the first two to the last one.  This
 > -		 * tells the socket poll function to ignore EOF so that we
 > -		 * block if there is no writer (and no data).  Callers can
 > -		 * set POLLINIGNEOF to get non-blocking behavior.
 > -		 */
 > -		if (levents & (POLLIN | POLLRDNORM) &&
 > -		    !(levents & POLLINIGNEOF)) {
 > -			levents &= ~(POLLIN | POLLRDNORM);
 > -			levents |= POLLINIGNEOF;
 > -		}
 > -
 > 		filetmp.f_data = fip->fi_readsock;
 > 		filetmp.f_cred = cred;
 > 		revents |= soo_poll(&filetmp, levents, cred, td);
 > -
 > -		/* Reverse the above conversion. */
 > -		if ((revents & POLLINIGNEOF) && !(events & POLLINIGNEOF)) {
 > -			revents |= (events & (POLLIN | POLLRDNORM));
 > -			revents &= ~POLLINIGNEOF;
 > -		}
 > 	}
 > 	levents = events & (POLLOUT | POLLWRNORM | POLLWRBAND);
 > 	if ((fp->f_flag & FWRITE) && levents) {
 > 		filetmp.f_data = fip->fi_writesock;
 > 		filetmp.f_cred = cred;
 > 		revents |= soo_poll(&filetmp, levents, cred, td);
 > +	}
 
 OK.
 
 > +	if (revents & POLLHUP) {
 > +		SOCKBUF_LOCK(&fip->fi_readsock->so_rcv);
 > +		if (fip->fi_readsock->so_rcv.sb_state & SBS_EOFNOHUP)
 > +			revents = (revents & ~POLLHUP) | POLLHUPIGNEOF;
 > +		SOCKBUF_UNLOCK(&fip->fi_readsock->so_rcv);
 > 	}
 > 	return (revents);
 > }
 
 I think the locking here isn't useful (locking for reading a flag generally
 isn't, since there is little difference if you lose a race before, during
 or after reading the flag).  Locking the whole function might be needed.
 There is now locking for the whole of sopoll().  In the old version of
 -current that I use there isn't any locking at all there.
 
 The above became more complicated when I set POLLIN together with POLLHUP
 in sopoll().  Then the above needed to clear POLLIN together with POLLHUP
 but only if there is no data.
 
 Clearing flags in callers is ugly and causes the following bug: sopoll()
 has decided not to call selrecord() if revents != 0; the above maintains
 revents != 0 but pollscan() may clear the final flag POLLHUPIGNEOF out
 of revents and then we sleep without having called selrecord().  I didn't
 notice anything breaking from this.  The zero timeout in the regression
 tests prevents problems there and probably nothing else in my system has
 used a fifo lately.
 
 I think I fixed these problems by moving all the decisions to sopoll().
 pollscan() sets a flag POLLPOLL and fifofs maintains SBS_COULDRCV so
 that sopoll() can decide.
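
 The decision that sopoll() makes with these two flags can be modelled
 in userland.  The sketch below is a simplified model and not kernel
 code: the M_* values and struct sock_model are hypothetical stand-ins
 for the poll flags and for the socket buffer state, and only the read
 side is modelled.

```c
#include <stdbool.h>

/* Model-only flag values; the real ones live in the kernel headers. */
enum {
	M_POLLIN	= 0x0001,
	M_POLLHUP	= 0x0010,
	M_POLLPOLL	= 0x8000	/* caller is poll(), not select() */
};

struct sock_model {
	bool	cantrcvmore;	/* models SBS_CANTRCVMORE */
	bool	couldrcv;	/* models SBS_COULDRCV */
	bool	has_data;	/* models the "data available" test */
};

static int
model_sopoll_read(const struct sock_model *so, int events)
{
	int revents = 0;

	/* Readable means data is available or EOF has been reached. */
	if ((events & M_POLLIN) && (so->has_data || so->cantrcvmore))
		revents |= M_POLLIN;

	if (so->cantrcvmore) {
		if (!(events & M_POLLPOLL) || so->couldrcv)
			revents |= M_POLLHUP;	/* ordinary EOF and hangup */
		else if (!so->has_data)
			revents &= ~M_POLLIN;	/* special state: unreadable */
	}
	return (revents);
}
```

 In this model, only poll() on a fifo that has never had a writer and
 holds no data returns 0; select() and the ordinary hangup case still
 see the descriptor as readable.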
 
 Now I wonder about sleeping with or without selrecord() for silly
 combinations of poll flags.  I think events == 0 causes a sleep that
 is not terminated by hangup since sopoll() is not called so it doesn't
 even get a chance to set POLLHUP.  I think this case should cause
 a sleep that is terminated by hangup (but nothing else except a timeout
 or signal).
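
 The events == 0 case is easy to probe from userland.  The sketch
 below uses a hypothetical function name and a pipe instead of a fifo,
 and it uses a zero timeout so that it cannot sleep; it only reports
 what poll() puts in revents when nothing was requested.  POSIX allows
 POLLHUP in revents even when it is not requested in events, but what
 a given kernel actually returns here is exactly what is in question.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Poll the read end of a hung-up pipe while requesting no events at
 * all.  Returns revents, or -1 on failure.
 */
static int
poll_no_events(void)
{
	struct pollfd pfd;
	int fds[2], n;

	if (pipe(fds) != 0)
		return (-1);
	close(fds[1]);			/* hang up: the only writer goes away */
	pfd.fd = fds[0];
	pfd.events = 0;			/* ask for nothing */
	pfd.revents = 0;
	n = poll(&pfd, 1, 0);
	close(fds[0]);
	return (n < 0 ? -1 : pfd.revents);
}
```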
 
 > --- ./kern/uipc_socket.c.orig	Wed Dec 28 19:05:13 2005
 > +++ ./kern/uipc_socket.c	Thu Mar 23 22:50:33 2006
 > @@ -2033,16 +2033,15 @@
 > 	SOCKBUF_LOCK(&so->so_snd);
 > 	SOCKBUF_LOCK(&so->so_rcv);
 
 The version that you use has the locking here too.
 
 > 	if (events & (POLLIN | POLLRDNORM))
 > -		if (soreadable(so))
 > -			revents |= events & (POLLIN | POLLRDNORM);
 > -
 > -	if (events & POLLINIGNEOF)
 > 		if (so->so_rcv.sb_cc >= so->so_rcv.sb_lowat ||
 > 		    !TAILQ_EMPTY(&so->so_comp) || so->so_error)
 > -			revents |= POLLINIGNEOF;
 > +			revents |= events & (POLLIN | POLLRDNORM);
 >
 > -	if (events & (POLLOUT | POLLWRNORM))
 > -		if (sowriteable(so))
 > +	if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) ||
 > +	    (so->so_snd.sb_state & SBS_CANTSENDMORE))
 > +		revents |= POLLHUP;
 > +	else
 > +		if (events & (POLLOUT | POLLWRNORM) && sowriteable(so))
 > 			revents |= events & (POLLOUT | POLLWRNORM);
 >
 > 	if (events & (POLLPRI | POLLRDBAND))
 > @@ -2050,9 +2049,7 @@
 > 			revents |= events & (POLLPRI | POLLRDBAND);
 >
 > 	if (revents == 0) {
 > -		if (events &
 > -		    (POLLIN | POLLINIGNEOF | POLLPRI | POLLRDNORM |
 > -		     POLLRDBAND)) {
 > +		if (events & (POLLIN | POLLPRI | POLLRDNORM | POLLRDBAND)) {
 > 			selrecord(td, &so->so_rcv.sb_sel);
 > 			so->so_rcv.sb_flags |= SB_SEL;
 > 		}
 
 This all worked as intended.  It broke lat_rpc because (as intended) the
 condition for setting POLLIN became a subset of soreadable(), so POLLIN
 no longer gets set at pure EOF, but lat_rpc depends on it being set.
 
 The rest of the changes are simple and worked.
 
 I forgot to mention another source of inconsistencies: kqueue.  kqueue
 mainly uses the SBS_CANT* flags for sockets and fifos to decide EOF, so
 it should work reasonably provided sopoll() does, but it will behave
 like select() in the special case unless it is changed to behave like
 poll().
 
 Here is my work-in-progress version.  The patch is relative to a very
 old version of FreeBSD.
 
 %%%
 Index: fifo_vnops.c
 ===================================================================
 RCS file: /home/ncvs/src/sys/fs/fifofs/fifo_vnops.c,v
 retrieving revision 1.100
 diff -u -2 -r1.100 fifo_vnops.c
 --- fifo_vnops.c	23 Jun 2004 00:35:50 -0000	1.100
 +++ fifo_vnops.c	26 Mar 2006 09:42:47 -0000
 @@ -232,4 +232,22 @@
   		fip->fi_readers++;
   		if (fip->fi_readers == 1) {
 +			SOCKBUF_LOCK(&fip->fi_readsock->so_rcv);
 +			if (fip->fi_writers > 0)
 +				fip->fi_readsock->so_rcv.sb_state |=
 +				    SBS_COULDRCV;
 +			else
 +				/*
 +				 * Sloppy?  Might be necessary to clear it
 +				 * in all the places where fi_readers is
 +				 * decremented to 0.  I think only writers
 +				 * polling for input could be confused by
 +				 * having it not set, and there is a problem
 +				 * with these anyway now that we have
 +				 * reversed the sense of the flag -- they
 +				 * now block (?), but shouldn't.
 +				 */
 +				fip->fi_readsock->so_rcv.sb_state &=
 +				    ~SBS_COULDRCV;
 +			SOCKBUF_UNLOCK(&fip->fi_readsock->so_rcv);
   			SOCKBUF_LOCK(&fip->fi_writesock->so_snd);
   			fip->fi_writesock->so_snd.sb_state &= ~SBS_CANTSENDMORE;
 @@ -248,7 +266,8 @@
   		fip->fi_writers++;
   		if (fip->fi_writers == 1) {
 -			SOCKBUF_LOCK(&fip->fi_writesock->so_rcv);
 +			SOCKBUF_LOCK(&fip->fi_readsock->so_rcv);
   			fip->fi_readsock->so_rcv.sb_state &= ~SBS_CANTRCVMORE;
 -			SOCKBUF_UNLOCK(&fip->fi_writesock->so_rcv);
 +			fip->fi_readsock->so_rcv.sb_state |= SBS_COULDRCV;
 +			SOCKBUF_UNLOCK(&fip->fi_readsock->so_rcv);
   			if (fip->fi_readers > 0) {
   				wakeup(&fip->fi_readers);
 @@ -521,32 +540,11 @@
   	int events, revents = 0;
 
 -	events = ap->a_events &
 -	    (POLLIN | POLLINIGNEOF | POLLPRI | POLLRDNORM | POLLRDBAND);
 +	events = ap->a_events & (POLLIN | POLLPRI | POLLRDNORM | POLLRDBAND);
   	if (events) {
 -		/*
 -		 * If POLLIN or POLLRDNORM is requested and POLLINIGNEOF is
 -		 * not, then convert the first two to the last one.  This
 -		 * tells the socket poll function to ignore EOF so that we
 -		 * block if there is no writer (and no data).  Callers can
 -		 * set POLLINIGNEOF to get non-blocking behavior.
 -		 */
 -		if (events & (POLLIN | POLLRDNORM) &&
 -		    !(events & POLLINIGNEOF)) {
 -			events &= ~(POLLIN | POLLRDNORM);
 -			events |= POLLINIGNEOF;
 -		}
 -
   		filetmp.f_data = ap->a_vp->v_fifoinfo->fi_readsock;
   		filetmp.f_cred = ap->a_cred;
 -		if (filetmp.f_data)
 -			revents |= soo_poll(&filetmp, events,
 -			    ap->a_td->td_ucred, ap->a_td);
 -
 -		/* Reverse the above conversion. */
 -		if ((revents & POLLINIGNEOF) &&
 -		    !(ap->a_events & POLLINIGNEOF)) {
 -			revents |= (ap->a_events & (POLLIN | POLLRDNORM));
 -			revents &= ~POLLINIGNEOF;
 -		}
 +		revents |= soo_poll(&filetmp,
 +		    events | (ap->a_events & POLLPOLL), ap->a_td->td_ucred,
 +		    ap->a_td);
   	}
   	events = ap->a_events & (POLLOUT | POLLWRNORM | POLLWRBAND);
 @@ -554,8 +552,6 @@
   		filetmp.f_data = ap->a_vp->v_fifoinfo->fi_writesock;
   		filetmp.f_cred = ap->a_cred;
 -		if (filetmp.f_data) {
 -			revents |= soo_poll(&filetmp, events,
 -			    ap->a_td->td_ucred, ap->a_td);
 -		}
 +		revents |= soo_poll(&filetmp, events, ap->a_td->td_ucred,
 +		    ap->a_td);
   	}
   	return (revents);
 Index: sys_generic.c
 ===================================================================
 RCS file: /home/ncvs/src/sys/kern/sys_generic.c,v
 retrieving revision 1.131
 diff -u -2 -r1.131 sys_generic.c
 --- sys_generic.c	5 Apr 2004 21:03:35 -0000	1.131
 +++ sys_generic.c	26 Mar 2006 05:41:48 -0000
 @@ -1093,6 +1080,6 @@
   				 * POLLERR if appropriate.
   				 */
 -				fds->revents = fo_poll(fp, fds->events,
 -				    td->td_ucred, td);
 +				fds->revents = fo_poll(fp,
 +				    fds->events | POLLPOLL, td->td_ucred, td);
   				if (fds->revents != 0)
   					n++;
 Index: uipc_socket.c
 ===================================================================
 RCS file: /home/ncvs/src/sys/kern/uipc_socket.c,v
 retrieving revision 1.189
 diff -u -2 -r1.189 uipc_socket.c
 --- uipc_socket.c	24 Jun 2004 04:28:30 -0000	1.189
 +++ uipc_socket.c	26 Mar 2006 09:27:47 -0000
 @@ -1872,13 +1870,43 @@
   			revents |= events & (POLLIN | POLLRDNORM);
 
 -	if (events & POLLINIGNEOF)
 -		if (so->so_rcv.sb_cc >= so->so_rcv.sb_lowat ||
 -		    !TAILQ_EMPTY(&so->so_comp) || so->so_error)
 -			revents |= POLLINIGNEOF;
 -
   	if (events & (POLLOUT | POLLWRNORM))
   		if (sowriteable(so))
   			revents |= events & (POLLOUT | POLLWRNORM);
 
 +	/*
 +	 * SBS_CANTRCVMORE (which is checked by soreadable()) normally
 +	 * implies EOF (and thus readable) and hung up, but for
 +	 * compatibility with other systems and to obtain behavior that
 +	 * is otherwise unavailable we make the case of poll() on a fifo
 +	 * that has never had any writers during the lifetime of any
 +	 * current reader special: then we pretend that the fifo is
 +	 * unreadable unless it contains non-null data, and that it is
 +	 * not hung up.  The POLLPOLL flag is set by poll() to identify
 +	 * poll() here, and the SBS_COULDRCV flag is set by the fifo
 +	 * layer to indicate a fifo that is not in the special state.
 +	 */
 +	if (so->so_rcv.sb_state & SBS_CANTRCVMORE) {
 +		if (!(events & POLLPOLL) || so->so_rcv.sb_state & SBS_COULDRCV)
 +			revents |= POLLHUP;	/* finish settings */
 +		else if (!(so->so_rcv.sb_cc >= so->so_rcv.sb_lowat ||
 +		    !TAILQ_EMPTY(&so->so_comp) || so->so_error))
 +			revents &= ~(POLLIN | POLLRDNORM); /* undo settings */
 +	}
 +
 +	/*
 +	 * Testing of hangup for writers could be optimized by combining
 +	 * it with testing for writeability, but we keep the test separate
 +	 * and with the same organization as the test for readers for
 +	 * clarity.  Note that writeable implies not hung up, so if POLLHUP
 +	 * is set here then (POLLOUT | POLLWRNORM) is not set above, as
 +	 * standards require.  Less obviously, if POLLHUP was set above for
 +	 * a reader, then the output flags cannot have been set above for
 +	 * a writer.  Even less obviously, we cannot end up with both
 +	 * POLLHUP and output flags set in revents after ORing the revents for
 +	 * the read and write socket in fifo_poll().
 +	 */
 +	if (so->so_snd.sb_state & SBS_CANTSENDMORE)
 +		revents |= POLLHUP;
 +
   	if (events & (POLLPRI | POLLRDBAND))
   		if (so->so_oobmark || (so->so_rcv.sb_state & SBS_RCVATMARK))
 @@ -1886,7 +1914,5 @@
 
   	if (revents == 0) {
 -		if (events &
 -		    (POLLIN | POLLINIGNEOF | POLLPRI | POLLRDNORM |
 -		     POLLRDBAND)) {
 +		if (events & (POLLIN | POLLPRI | POLLRDNORM | POLLRDBAND)) {
   			SOCKBUF_LOCK(&so->so_rcv);
   			selrecord(td, &so->so_rcv.sb_sel);
 Index: poll.h
 ===================================================================
 RCS file: /home/ncvs/src/sys/sys/poll.h,v
 retrieving revision 1.13
 diff -u -2 -r1.13 poll.h
 --- poll.h	10 Jul 2002 04:47:25 -0000	1.13
 +++ poll.h	26 Mar 2006 07:56:52 -0000
 @@ -67,7 +67,6 @@
   #define	POLLWRBAND	0x0100		/* OOB/Urgent data can be written */
 
 -#if __BSD_VISIBLE
 -/* General FreeBSD extension (currently only supported for sockets): */
 -#define	POLLINIGNEOF	0x2000		/* like POLLIN, except ignore EOF */
 +#ifdef _KERNEL
 +#define	POLLPOLL	0x8000		/* system call is actually poll() */
   #endif
 
 Index: socketvar.h
 ===================================================================
 RCS file: /home/ncvs/src/sys/sys/socketvar.h,v
 retrieving revision 1.130
 diff -u -2 -r1.130 socketvar.h
 --- socketvar.h	24 Jun 2004 04:27:10 -0000	1.130
 +++ socketvar.h	26 Mar 2006 08:35:56 -0000
 @@ -212,4 +212,5 @@
   #define	SBS_CANTRCVMORE		0x0020	/* can't receive more data from peer */
   #define	SBS_RCVATMARK		0x0040	/* at mark on input */
 +#define	SBS_COULDRCV		0x0080	/* could receive previously (or now) */
 
   /*
 Index: syscalls.c
 ===================================================================
 RCS file: /home/ncvs/src/usr.bin/truss/syscalls.c,v
 retrieving revision 1.39
 diff -u -2 -r1.39 syscalls.c
 --- syscalls.c	11 Jun 2004 11:58:07 -0000	1.39
 +++ syscalls.c	25 Mar 2006 13:25:13 -0000
 @@ -402,5 +402,5 @@
   #define POLLKNOWN_EVENTS \
   	(POLLIN | POLLPRI | POLLOUT | POLLERR | POLLHUP | POLLNVAL | \
 -	 POLLRDNORM |POLLRDBAND | POLLWRBAND | POLLINIGNEOF) 
 +	 POLLRDNORM |POLLRDBAND | POLLWRBAND)
 
   	  u += snprintf(tmp + used, per_fd,
 %%%
 
 Bruce

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/94772: commit references a PR
Date: Tue,  7 Jul 2009 09:44:00 +0000 (UTC)

 Author: kib
 Date: Tue Jul  7 09:43:44 2009
 New Revision: 195423
 URL: http://svn.freebsd.org/changeset/base/195423
 
 Log:
   Fix poll(2) and select(2) for named pipes to return "ready for read"
   when all writers, observed by reader, exited. Use writer generation
   counter for fifo, and store the snapshot of the fifo generation in the
   f_seqcount field of struct file, that is otherwise unused for fifos.
   Set FreeBSD-undocumented POLLINIGNEOF flag only when file f_seqcount is
   equal to fifo' fi_wgen, and revert r89376.
   
   Fix POLLINIGNEOF for sockets and pipes, and return POLLHUP for them.
   Note that the patch does not fix not returning POLLHUP for fifos.
   
   PR:	kern/94772
   Submitted by:	bde (original version)
   Reviewed by:	rwatson, jilles
   Approved by:	re (kensmith)
   MFC after:	6 weeks (might be)
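
 The writer generation scheme described in the log can be modelled in
 userland.  The sketch below is a simplified model and not the
 committed code: the struct and function names are hypothetical, only
 the fi_writers, fi_wgen and f_seqcount arithmetic is taken from the
 change, and locking is left out.

```c
/* Userland model of the fields used by the change; not kernel code. */
struct fifo_model {
	long	writers;	/* models fi_writers */
	int	wgen;		/* models fi_wgen */
};

struct reader_model {
	int	seqcount;	/* models the f_seqcount snapshot */
};

/* Snapshot taken when a reader opens the fifo. */
static void
model_open_reader(const struct fifo_model *f, struct reader_model *r)
{
	r->seqcount = f->wgen - (int)f->writers;
}

static void
model_open_writer(struct fifo_model *f)
{
	f->writers++;
}

/* The generation advances when the last writer goes away. */
static void
model_close_writer(struct fifo_model *f)
{
	if (--f->writers == 0)
		f->wgen++;
}

/*
 * Nonzero when POLLINIGNEOF would be added for this reader: it opened
 * the fifo with no writers, and no last writer has exited since.
 */
static int
model_ignores_eof(const struct fifo_model *f, const struct reader_model *r)
{
	return (r->seqcount == f->wgen);
}
```

 A reader that opened the fifo before any writer keeps ignoring EOF
 until a writer has come and gone; a reader that opened it while a
 writer was present never ignores EOF.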
 
 Modified:
   head/sys/fs/fifofs/fifo_vnops.c
   head/sys/kern/sys_pipe.c
   head/sys/kern/uipc_socket.c
   head/sys/sys/socketvar.h
 
 Modified: head/sys/fs/fifofs/fifo_vnops.c
 ==============================================================================
 --- head/sys/fs/fifofs/fifo_vnops.c	Tue Jul  7 00:02:26 2009	(r195422)
 +++ head/sys/fs/fifofs/fifo_vnops.c	Tue Jul  7 09:43:44 2009	(r195423)
 @@ -84,6 +84,7 @@ struct fifoinfo {
  	struct socket	*fi_writesock;
  	long		fi_readers;
  	long		fi_writers;
 +	int		fi_wgen;
  };
  
  static vop_print_t	fifo_print;
 @@ -232,6 +233,7 @@ fail1:
  				sowwakeup(fip->fi_writesock);
  			}
  		}
 +		fp->f_seqcount = fip->fi_wgen - fip->fi_writers;
  	}
  	if (ap->a_mode & FWRITE) {
  		if ((ap->a_mode & O_NONBLOCK) && fip->fi_readers == 0) {
 @@ -279,6 +281,9 @@ fail1:
  				fip->fi_writers--;
  				if (fip->fi_writers == 0) {
  					socantrcvmore(fip->fi_readsock);
 +					mtx_lock(&fifo_mtx);
 +					fip->fi_wgen++;
 +					mtx_unlock(&fifo_mtx);
  					fifo_cleanup(vp);
  				}
  				return (error);
 @@ -395,8 +400,12 @@ fifo_close(ap)
  	}
  	if (ap->a_fflag & FWRITE) {
  		fip->fi_writers--;
 -		if (fip->fi_writers == 0)
 +		if (fip->fi_writers == 0) {
  			socantrcvmore(fip->fi_readsock);
 +			mtx_lock(&fifo_mtx);
 +			fip->fi_wgen++;
 +			mtx_unlock(&fifo_mtx);
 +		}
  	}
  	fifo_cleanup(vp);
  	return (0);
 @@ -634,28 +643,13 @@ fifo_poll_f(struct file *fp, int events,
  	levents = events &
  	    (POLLIN | POLLINIGNEOF | POLLPRI | POLLRDNORM | POLLRDBAND);
  	if ((fp->f_flag & FREAD) && levents) {
 -		/*
 -		 * If POLLIN or POLLRDNORM is requested and POLLINIGNEOF is
 -		 * not, then convert the first two to the last one.  This
 -		 * tells the socket poll function to ignore EOF so that we
 -		 * block if there is no writer (and no data).  Callers can
 -		 * set POLLINIGNEOF to get non-blocking behavior.
 -		 */
 -		if (levents & (POLLIN | POLLRDNORM) &&
 -		    !(levents & POLLINIGNEOF)) {
 -			levents &= ~(POLLIN | POLLRDNORM);
 -			levents |= POLLINIGNEOF;
 -		}
 -
  		filetmp.f_data = fip->fi_readsock;
  		filetmp.f_cred = cred;
 +		mtx_lock(&fifo_mtx);
 +		if (fp->f_seqcount == fip->fi_wgen)
 +			levents |= POLLINIGNEOF;
 +		mtx_unlock(&fifo_mtx);
  		revents |= soo_poll(&filetmp, levents, cred, td);
 -
 -		/* Reverse the above conversion. */
 -		if ((revents & POLLINIGNEOF) && !(events & POLLINIGNEOF)) {
 -			revents |= (events & (POLLIN | POLLRDNORM));
 -			revents &= ~POLLINIGNEOF;
 -		}
  	}
  	levents = events & (POLLOUT | POLLWRNORM | POLLWRBAND);
  	if ((fp->f_flag & FWRITE) && levents) {
 
 Modified: head/sys/kern/sys_pipe.c
 ==============================================================================
 --- head/sys/kern/sys_pipe.c	Tue Jul  7 00:02:26 2009	(r195422)
 +++ head/sys/kern/sys_pipe.c	Tue Jul  7 09:43:44 2009	(r195423)
 @@ -1353,8 +1353,7 @@ pipe_poll(fp, events, active_cred, td)
  #endif
  	if (events & (POLLIN | POLLRDNORM))
  		if ((rpipe->pipe_state & PIPE_DIRECTW) ||
 -		    (rpipe->pipe_buffer.cnt > 0) ||
 -		    (rpipe->pipe_state & PIPE_EOF))
 +		    (rpipe->pipe_buffer.cnt > 0))
  			revents |= events & (POLLIN | POLLRDNORM);
  
  	if (events & (POLLOUT | POLLWRNORM))
 @@ -1364,10 +1363,14 @@ pipe_poll(fp, events, active_cred, td)
  		     (wpipe->pipe_buffer.size - wpipe->pipe_buffer.cnt) >= PIPE_BUF))
  			revents |= events & (POLLOUT | POLLWRNORM);
  
 -	if ((rpipe->pipe_state & PIPE_EOF) ||
 -	    wpipe->pipe_present != PIPE_ACTIVE ||
 -	    (wpipe->pipe_state & PIPE_EOF))
 -		revents |= POLLHUP;
 +	if ((events & POLLINIGNEOF) == 0) {
 +		if (rpipe->pipe_state & PIPE_EOF) {
 +			revents |= (events & (POLLIN | POLLRDNORM));
 +			if (wpipe->pipe_present != PIPE_ACTIVE ||
 +			    (wpipe->pipe_state & PIPE_EOF))
 +				revents |= POLLHUP;
 +		}
 +	}
  
  	if (revents == 0) {
  		if (events & (POLLIN | POLLRDNORM)) {
 
 Modified: head/sys/kern/uipc_socket.c
 ==============================================================================
 --- head/sys/kern/uipc_socket.c	Tue Jul  7 00:02:26 2009	(r195422)
 +++ head/sys/kern/uipc_socket.c	Tue Jul  7 09:43:44 2009	(r195423)
 @@ -2885,14 +2885,9 @@ sopoll_generic(struct socket *so, int ev
  	SOCKBUF_LOCK(&so->so_snd);
  	SOCKBUF_LOCK(&so->so_rcv);
  	if (events & (POLLIN | POLLRDNORM))
 -		if (soreadable(so))
 +		if (soreadabledata(so))
  			revents |= events & (POLLIN | POLLRDNORM);
  
 -	if (events & POLLINIGNEOF)
 -		if (so->so_rcv.sb_cc >= so->so_rcv.sb_lowat ||
 -		    !TAILQ_EMPTY(&so->so_comp) || so->so_error)
 -			revents |= POLLINIGNEOF;
 -
  	if (events & (POLLOUT | POLLWRNORM))
  		if (sowriteable(so))
  			revents |= events & (POLLOUT | POLLWRNORM);
 @@ -2901,10 +2896,16 @@ sopoll_generic(struct socket *so, int ev
  		if (so->so_oobmark || (so->so_rcv.sb_state & SBS_RCVATMARK))
  			revents |= events & (POLLPRI | POLLRDBAND);
  
 +	if ((events & POLLINIGNEOF) == 0) {
 +		if (so->so_rcv.sb_state & SBS_CANTRCVMORE) {
 +			revents |= events & (POLLIN | POLLRDNORM);
 +			if (so->so_snd.sb_state & SBS_CANTSENDMORE)
 +				revents |= POLLHUP;
 +		}
 +	}
 +
  	if (revents == 0) {
 -		if (events &
 -		    (POLLIN | POLLINIGNEOF | POLLPRI | POLLRDNORM |
 -		     POLLRDBAND)) {
 +		if (events & (POLLIN | POLLPRI | POLLRDNORM | POLLRDBAND)) {
  			selrecord(td, &so->so_rcv.sb_sel);
  			so->so_rcv.sb_flags |= SB_SEL;
  		}
 
 Modified: head/sys/sys/socketvar.h
 ==============================================================================
 --- head/sys/sys/socketvar.h	Tue Jul  7 00:02:26 2009	(r195422)
 +++ head/sys/sys/socketvar.h	Tue Jul  7 09:43:44 2009	(r195423)
 @@ -197,10 +197,11 @@ struct xsocket {
      ((so)->so_proto->pr_flags & PR_ATOMIC)
  
  /* can we read something from so? */
 -#define	soreadable(so) \
 +#define	soreadabledata(so) \
      ((so)->so_rcv.sb_cc >= (so)->so_rcv.sb_lowat || \
 -	((so)->so_rcv.sb_state & SBS_CANTRCVMORE) || \
  	!TAILQ_EMPTY(&(so)->so_comp) || (so)->so_error)
 +#define	soreadable(so) \
 +	(soreadabledata(so) || ((so)->so_rcv.sb_state & SBS_CANTRCVMORE))
  
  /* can we write something to so? */
  #define	sowriteable(so) \
 _______________________________________________
 svn-src-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/svn-src-all
 To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
 
State-Changed-From-To: open->closed 
State-Changed-By: olli 
State-Changed-When: Wed Aug 4 14:03:22 UTC 2010 
State-Changed-Why:  
Patch committed, problem fixed (last year already). 
Thanks to everyone involved! 

http://www.freebsd.org/cgi/query-pr.cgi?pr=94772 
>Unformatted:
