From ino-qc@spotteswoode.de.eu.org  Wed Jun 18 07:09:46 2003
Return-Path: <ino-qc@spotteswoode.de.eu.org>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 420B437B401
	for <FreeBSD-gnats-submit@freebsd.org>; Wed, 18 Jun 2003 07:09:46 -0700 (PDT)
Received: from mout0.freenet.de (mout0.freenet.de [194.97.50.131])
	by mx1.FreeBSD.org (Postfix) with ESMTP id CD0F643F75
	for <FreeBSD-gnats-submit@freebsd.org>; Wed, 18 Jun 2003 07:09:44 -0700 (PDT)
	(envelope-from ino-qc@spotteswoode.de.eu.org)
Received: from [194.97.50.136] (helo=mx3.freenet.de)
	by mout0.freenet.de with asmtp (Exim 4.20)
	id 19Sdd9-0000ua-EL
	for FreeBSD-gnats-submit@freebsd.org; Wed, 18 Jun 2003 16:09:43 +0200
Received: from p3e9baab6.dip.t-dialin.net ([62.155.170.182] helo=spotteswoode.dnsalias.org)
	by mx3.freenet.de with asmtp (ID inode@freenet.de) (Exim 4.20 #1)
	id 19Sdd8-00035O-RD
	for FreeBSD-gnats-submit@freebsd.org; Wed, 18 Jun 2003 16:09:42 +0200
Received: (qmail 1623 invoked by uid 0); 18 Jun 2003 14:09:42 -0000
Message-Id: <r85ra34p.fsf@ID-23066.news.dfncis.de>
Date: 18 Jun 2003 16:09:42 +0200
From: "clemens fischer" <ino-qc@spotteswoode.de.eu.org>
To: FreeBSD-gnats-submit@freebsd.org
Cc: "Matthias Teege" <info@mteege.de>
Subject: poll(2) semantics differ from susV3/POSIX
X-Send-Pr-Version: 3.113

>Number:         53447
>Category:       kern
>Synopsis:       [kernel] poll(2) semantics differ from susV3/POSIX
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    alfred
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Jun 18 07:10:13 PDT 2003
>Closed-Date:    
>Last-Modified:  Wed Mar 22 21:02:24 GMT 2006
>Originator:     Clemens Fischer ino-qc@spotteswoode.dnsalias.org
>Release:        FreeBSD 4.8-STABLE i386
>Organization:
>Environment:
System: FreeBSD private.spanker 4.8-STABLE FreeBSD 4.8-STABLE #3: Thu May 29 06:19:09 CEST 2003 root@private.spanker:/usr/src/sys/compile/n1 i386

the program is fnord (http://www.fefe.de/fnord/) running on
freebsd-4.8/i386 and serving a wiki-CGI.

>Description:

a colleague and i independently made the same observation: we are
running a wiki on a small HTTP server.  every page served by it had an
error message on the bottom: "Looks like the CGI crashed.".  we could
track this down to the code in the server where data is read from the
CGI through a pipe.  this is done using poll(2) and read(2).  the same
code runs without problems on linux, and we can patch fnord to work
around the problem, which is otherwise reproducible.

this is part of the discussion thread on the mailing list:

  > i had the same problem on my freebsd-4.8-stable.  every page had
  > "looks like your CGI crashed" at the bottom, but they actually
  > worked fine.  after applying the patch the problem has
  > disappeared.

  Mhh, then this is apparently a problem with BSD poll() semantics.

  poll is expected to set the POLLHUP bit on EOF, but FreeBSD
  apparently does not, but signals POLLIN and then returns 0 on
  read().  Is someone involved with the FreeBSD crowd and can post a
  bug report for this?

  ---
  See the single unix specification.

    http://www.opengroup.org/onlinepubs/007904975/functions/poll.html

  POLLHUP shall be set if the device has been disconnected, i.e. for
  sockets if the other side has called shutdown or close.  We are
  polling on a pipe from the CGI.  When the CGI is done, the pipe is
  closed, and we should receive POLLHUP.  That is exactly what this
  return bit is for.
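
the disagreement above is about which revents bits poll(2) reports for
the read end of a pipe once the writer is gone.  a minimal sketch of a
probe for this follows; it is not part of fnord, and the name
probe_revents is made up for illustration.  it only samples what the
running system reports and makes no claim about which answer is right.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical probe (not from fnord): create a pipe, optionally leave
 * one unread byte in it, close the write end, and return the revents
 * that poll(2) reports for the read end.  Returns -1 if pipe(2),
 * write(2) or poll(2) fails, and 0 if poll(2) reports nothing.
 */
int probe_revents(int leave_data)
{
  int fd[2];
  struct pollfd pfd;
  int rc;

  if (pipe(fd) == -1)
    return -1;
  if (leave_data && write(fd[1], "x", 1) != 1) {
    close(fd[0]);
    close(fd[1]);
    return -1;
  }
  close(fd[1]);               /* the writer goes away */

  pfd.fd = fd[0];
  pfd.events = POLLIN;
  pfd.revents = 0;
  rc = poll(&pfd, 1, 0);      /* timeout 0: only sample the current state */
  close(fd[0]);

  return rc == -1 ? -1 : pfd.revents;
}
```

comparing (probe_revents(0) & POLLIN) and (probe_revents(0) & POLLHUP)
on two systems shows whether they report EOF as POLLIN, as POLLHUP, or
as both.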

>How-To-Repeat:

this is an excerpt of fnord's code.  using poll(2) on the pipe from the
CGI in this way produces the expected output, but the last line of
every page always states: "Looks like the CGI crashed.".

static void start_cgi(int nph,const char* pathinfo,const char *const *envp) {
  size_t size=0;
  int n;
  int pid;
  char ibuf[8192],obuf[8192];
  int fd[2],df[2];

  if (pipe(fd)||pipe(df)) {
    badrequest(500,"Internal Server Error","Server Resource problem.");
  }

  if ((pid=fork())) {
    if (pid>0) {
      struct pollfd pfd[2];
      int nr=1;
      int startup=1;

      signal(SIGCHLD,cgi_child);
      signal(SIGPIPE,SIG_IGN);		/* NO! no signal! */

      close(df[0]);
      close(fd[1]);

      pfd[0].fd=fd[0];
      pfd[0].events=POLLIN;
      pfd[0].revents=0;

      pfd[1].fd=df[1];
      pfd[1].events=POLLOUT;
      pfd[1].revents=0;

      if (post_len) ++nr;	/* have post data */
      else close(df[1]);	/* no post data */

      while(poll(pfd,nr,-1)!=-1) {
	/* read from cgi */
	if (pfd[0].revents&POLLIN) {
	  n=read(fd[0],ibuf,sizeof(ibuf));
	  // if (n<=0) goto cgi_500;             this is the original code
          if (n<=0 && errno!=0) goto cgi_500; // this is the workaround
	  /* startup */
	  if (startup) {
	    startup=0;
	    ...
	  }
	  /* non startup */
	  else {
	    buffer_put(buffer_1,ibuf,n);
	  }
	  size+=n;
	  if (pfd[0].revents&POLLHUP) break;
	}
	/* write to cgi the post data */
	else if (nr>1 && pfd[1].revents&POLLOUT) {
	  if (post_miss) {
	    write(df[1],post_miss,post_mlen);
	    post_miss=0;
	  }
	  else if (post_mlen<post_len) {
	    n=read(0,obuf,sizeof(obuf));
	    if (n<1) goto cgi_500;
	    post_mlen+=n;
	    write(df[1],obuf,n);
	  }
	  else {
	    --nr;
	    close(df[1]);
	  }
	}
	else if (pfd[0].revents&POLLHUP) break;
	else {
cgi_500:  if (startup)
	    badrequest(500,"Internal Server Error","Looks like the CGI crashed.");
	  else {
	    buffer_puts(buffer_1,"\n\n");
	    buffer_puts(buffer_1,"Looks like the CGI crashed.");
	    buffer_puts(buffer_1,"\n\n");
	    break;
	  }
	}
      }

      buffer_flush(buffer_1);
      dolog(size);
      ...

>Fix:

i have classified the problem's Severity as "serious", although for the
case of poll(2) loops a workaround is easy to find.  on the other hand,
people porting susv3-compliant software to freebsd will have to do
this for every use of poll(2), so it can well become critical for
people who aren't aware of this difference.

here's a typical expression found in linux application code:

      while (poll(pfd,nr,-1) != -1) {
	/* read from cgi */
	if (pfd[0].revents & POLLIN) {
	  n = read(fd[0], ibuf, sizeof(ibuf));
	  if (n<=0) goto cgi_500;                // <-
	  ...
        }
      }

and here's what makes it run reliably on freebsd-4.8:

      while (poll(pfd,nr,-1) != -1) {
	/* read from cgi */
	if (pfd[0].revents & POLLIN) {
	  n = read(fd[0], ibuf, sizeof(ibuf));
          if (n<=0 && errno!=0) goto cgi_500;    // <-
	  ...
        }
      }
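
a variant that does not depend on errno may be more robust, since
errno is only meaningful after a call has failed and can hold a stale
value when read(2) returns 0.  the following is a minimal sketch, not
fnord's code: the names drain_until_eof and demo_drain are made up for
illustration.  it treats read() == 0 as a normal end of file and reads
even when only POLLHUP is reported, so it should not matter whether a
system signals EOF through POLLIN, POLLHUP or both.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: read from fd until EOF.
 * Returns the number of bytes read, or -1 on a real error.
 */
long drain_until_eof(int fd)
{
  struct pollfd pfd;
  char buf[8192];
  long total = 0;
  ssize_t n;

  pfd.fd = fd;
  pfd.events = POLLIN;

  for (;;) {
    pfd.revents = 0;
    if (poll(&pfd, 1, -1) == -1) {
      if (errno == EINTR)
        continue;             /* interrupted by a signal: retry */
      return -1;
    }
    if (pfd.revents & (POLLERR | POLLNVAL))
      return -1;
    if (pfd.revents & (POLLIN | POLLHUP)) {
      /* read even on POLLHUP: data may still be buffered */
      n = read(fd, buf, sizeof(buf));
      if (n > 0) {
        total += n;
        continue;
      }
      if (n == 0)
        return total;         /* EOF is not an error */
      if (errno == EINTR)
        continue;
      return -1;              /* n == -1: errno is valid here */
    }
  }
}

/* Hypothetical self-check: write 5 bytes, close the writer, drain. */
long demo_drain(void)
{
  int fd[2];
  long got;

  if (pipe(fd) == -1)
    return -1;
  if (write(fd[1], "hello", 5) != 5) {
    close(fd[0]);
    close(fd[1]);
    return -1;
  }
  close(fd[1]);
  got = drain_until_eof(fd[0]);
  close(fd[0]);
  return got;
}
```

whether the CGI really crashed would then have to be decided from its
exit status (e.g. in the SIGCHLD handler), not from read() returning 0.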

  clemens
>Release-Note:
>Audit-Trail:

From: "Artem 'Zazoobr' Ignatjev" <timon@memphis.mephi.ru>
To: freebsd-gnats-submit@freebsd.org
Cc:  
Subject: Re: kern/53447: poll(2) semantics differ from susV3/POSIX
Date: 18 Jun 2003 20:54:29 +0400

 clemens fischer wrote:
 
 > a colleague and i independently made the same observation: we are
 > running a wiki on a small HTTP server.  every page served by it had an
 > error message on the bottom: "Looks like the CGI crashed.".  we could
 > track this down to the code in the server where data is read from the
 > CGI through a pipe.  this is done using poll(2) and read(2).  the same
 > code runs without problems on linux, and we can patch fnord to work
 > around the problem, which is otherwise reproducible.
 > 
 > this is part of the discussion thread on the mailing list:
 > 
 >   > i had the same problem on my freebsd-4.8-stable.  every page had
 >   > "looks like your CGI crashed" at the bottom, but they actually
 >   > worked fine.  after applying the patch the problem has
 >   > disappeared.
 > 
 >   Mhh, then this is apparently a problem with BSD poll() semantics.
 > 
 >   poll is expected to set the POLLHUP bit on EOF, but FreeBSD
 >   apparently does not, but signals POLLIN and then returns 0 on
 >   read().  Is someone involved with the FreeBSD crowd and can post a
 >   bug report for this?
 > 
 FreeBSD DOES set the POLLHUP bit; but EOF on a pipe or disconnected
 socket can also be detected when read() returns 0 bytes from a
 ready-to-read descriptor.  See the code below (it's
 /sys/kern/sys_pipe.c 1.60.2.13, used in FreeBSD 4.8-RELEASE):
 int
 pipe_poll(fp, events, cred, p)
 	struct file *fp;
 	int events;
 	struct ucred *cred;
 	struct proc *p;
 {
 	struct pipe *rpipe = (struct pipe *)fp->f_data;
 	struct pipe *wpipe;
 	int revents = 0;
 
 	wpipe = rpipe->pipe_peer;
 	if (events & (POLLIN | POLLRDNORM))
 		if ((rpipe->pipe_state & PIPE_DIRECTW) ||
 		    (rpipe->pipe_buffer.cnt > 0) ||
 >		    (rpipe->pipe_state & PIPE_EOF))
 >			revents |= events & (POLLIN | POLLRDNORM);
 
 	if (events & (POLLOUT | POLLWRNORM))
 		if (wpipe == NULL || (wpipe->pipe_state & PIPE_EOF) ||
 		    (((wpipe->pipe_state & PIPE_DIRECTW) == 0) &&
 		     (wpipe->pipe_buffer.size - wpipe->pipe_buffer.cnt) >= PIPE_BUF))
 			revents |= events & (POLLOUT | POLLWRNORM);
 
 >	if ((rpipe->pipe_state & PIPE_EOF) ||
 >	    (wpipe == NULL) ||
 >	    (wpipe->pipe_state & PIPE_EOF))
 >		revents |= POLLHUP;
 
 	if (revents == 0) {
 		if (events & (POLLIN | POLLRDNORM)) {
 			selrecord(p, &rpipe->pipe_sel);
 			rpipe->pipe_state |= PIPE_SEL;
 		}
 
 		if (events & (POLLOUT | POLLWRNORM)) {
 			selrecord(p, &wpipe->pipe_sel);
 			wpipe->pipe_state |= PIPE_SEL;
 		}
 	}
 
 	return (revents);
 }
 
 

From: Bruce Evans <bde@zeta.org.au>
To: "Artem 'Zazoobr' Ignatjev" <timon@memphis.mephi.ru>
Cc: freebsd-gnats-submit@freebsd.org
Subject: Re: kern/53447: poll(2) semantics differ from susV3/POSIX
Date: Thu, 19 Jun 2003 14:18:12 +1000 (EST)

 On Wed, 18 Jun 2003, Artem 'Zazoobr' Ignatjev wrote:
 
 >  clemens fischer wrote:
 >  > ...
 >  >   Mhh, then this is apparently a problem with BSD poll() semantics.
 >  >
 >  >   poll is expected to set the POLLHUP bit on EOF, but FreeBSD
 >  >   apparently does not, but signals POLLIN and then returns 0 on
 >  >   read().  Is someone involved with the FreeBSD crowd and can post a
 >  >   bug report for this?
 >  >
 >  FreeBSD DOES set POLLHUP bit; but, also, EOF on pipe or disconnected
 >  socket can be caught by reading 0 bytes from ready-to-read descriptor.
 
 The latter is very standard (required by POSIX).  Whether POLLIN should
 be set together with POLLHUP for EOF is not so clear.  It is permitted
 by POSIX and seems least surprising, so FreeBSD does it.  POSIX mainly
 requires POLLOUT and POLLHUP to not both be set.  This all goes naturally
 with read(), write() and select() semantics: for most types of files
 including pipes, read() returns 0 with no error on EOF, and select()
 has no standard way to select on EOF, so reading works best if EOF
 satisfies POLLIN.  OTOH write() returns -1 and a nonzero errno (EPIPE
 for pipes) on EOF, and write-selects on pipes (if not the whole process)
 normally get terminated by SIGPIPE so select()'s lack of understanding
 of EOF is less of a problem for writes than for reads.
 
 POLLHUP is more broken for named pipes and sockets than for nameless
 pipes.  It seems to be unimplemented, and FreeBSD may have broken
 POLLHUP for all types of EOFs by making poll() and select() for reading
 always block waiting for a writer if there isn't one (and there is no
 data).  Other systems apparently handle initial EOFs (ones where the
 open() was nonblocking and there was no writer at open time and none
 since) specially, but POSIX doesn't seem to mention any special handling
 for initial EOFs and handling all EOFs like this makes it harder to
 detect them.
 
 >  See the code below (it's /sys/kern/sys_pipe.c 1.60.2.13, used in FreeBSD
 >  4.8-RELEASE):
 >  int
 >  pipe_poll(fp, events, cred, p)
 >  	struct file *fp;
 >  	int events;
 >  	struct ucred *cred;
 >  	struct proc *p;
 >  {
 >  	struct pipe *rpipe = (struct pipe *)fp->f_data;
 >  	struct pipe *wpipe;
 >  	int revents = 0;
 >
 >  	wpipe = rpipe->pipe_peer;
 >  	if (events & (POLLIN | POLLRDNORM))
 >  		if ((rpipe->pipe_state & PIPE_DIRECTW) ||
 >  		    (rpipe->pipe_buffer.cnt > 0) ||
 >  >		    (rpipe->pipe_state & PIPE_EOF))
 >  >			revents |= events & (POLLIN | POLLRDNORM);
 >
 >  	if (events & (POLLOUT | POLLWRNORM))
 >  		if (wpipe == NULL || (wpipe->pipe_state & PIPE_EOF) ||
 >  		    (((wpipe->pipe_state & PIPE_DIRECTW) == 0) &&
 >  		     (wpipe->pipe_buffer.size - wpipe->pipe_buffer.cnt) >= PIPE_BUF))
 >  			revents |= events & (POLLOUT | POLLWRNORM);
 >
 >  >	if ((rpipe->pipe_state & PIPE_EOF) ||
 >  >	    (wpipe == NULL) ||
 >  >	    (wpipe->pipe_state & PIPE_EOF))
 >  >		revents |= POLLHUP;
 
 The only known bug in polling on nameless pipes is near here.  POLLHUP is
 set for both sides if PIPE_EOF is set for either side.  This may be correct
 for writing but it is broken for reading.  The writer may have written
 something and then exited.  This gives POLLHUP for the reader (presumably
 because it gives PIPE_EOF for the writer).  But EOF, and thus POLLHUP, should
 not occur for the reader until the data already written has been read.  This
 bug breaks at least gdb's detection of EOF (try "echo 'p 0' | gdb /bin/cat").
 
 Bruce
Responsible-Changed-From-To: freebsd-bugs->freebsd-standards 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Fri Sep 3 04:01:36 GMT 2004 
Responsible-Changed-Why:  
Sounds like a standards problem. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=53447 
Responsible-Changed-From-To: freebsd-standards->bde 
Responsible-Changed-By: bde 
Responsible-Changed-When: Mon Apr 25 17:04:09 GMT 2005 
Responsible-Changed-Why:  
FreeBSD has changes that are supposed to fix problems in this area, 
but the changes seem to just make things worse -- there are now 
at least 2 more PRs about the new misbehaviour. 

The correct fix seems to be to simply implement POLLHUP, and not ignore 
EOF on FIFOs like FreeBSD does now.  On EOF, applications polling for 
POLLIN (including via select() on a read descriptor) should get POLLIN 
returned (or the read descriptor bit set for select()) like they used 
to.  For poll(), POLLHUP is set too, and applications should check 
this if they don't want to read EOF.  For select(), it is not easy 
to avoid endlessly reading EOF in some cases, but POSIX is very clear 
(much clearer than for poll()) that select() on a read descriptor must 
return immediately on EOF. 
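
The select() requirement can be checked with a small test along the
following lines.  This is only a sketch: the function name
select_reports_eof is made up for illustration, and the one-second
timeout is an arbitrary bound so that a system which blocks on EOF
does not hang the test.

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical check: returns 1 if select(2) marks the read end of a
 * pipe readable after the writer has been closed and read(2) then
 * returns 0, 0 if it does not, and -1 on a setup error.
 */
int select_reports_eof(void)
{
  int fd[2];
  fd_set rfds;
  struct timeval tv;
  char c;
  int ready, result;

  if (pipe(fd) == -1)
    return -1;
  close(fd[1]);               /* no writer left: the reader is at EOF */

  FD_ZERO(&rfds);
  FD_SET(fd[0], &rfds);
  tv.tv_sec = 1;              /* bounded wait instead of blocking forever */
  tv.tv_usec = 0;
  ready = select(fd[0] + 1, &rfds, NULL, NULL, &tv);

  if (ready == -1)
    result = -1;
  else
    result = (ready == 1 && FD_ISSET(fd[0], &rfds) &&
        read(fd[0], &c, 1) == 0);
  close(fd[0]);
  return result;
}
```

A return value of 0 would mean that select() timed out instead of
returning immediately on EOF.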


Responsible-Changed-From-To: bde->alfred 
Responsible-Changed-By: bde 
Responsible-Changed-When: Tue Jan 3 00:39:26 UTC 2006 
Responsible-Changed-Why:  
I won't be fixing this any time soon.  Alfred broke it last. 

>Unformatted:
