From nobody@FreeBSD.org  Fri Feb 10 07:04:41 2012
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E106A106566B
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 10 Feb 2012 07:04:41 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from red.freebsd.org (red.freebsd.org [IPv6:2001:4f8:fff6::22])
	by mx1.freebsd.org (Postfix) with ESMTP id B5A048FC1E
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 10 Feb 2012 07:04:41 +0000 (UTC)
Received: from red.freebsd.org (localhost [127.0.0.1])
	by red.freebsd.org (8.14.4/8.14.4) with ESMTP id q1A74f7U051745
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 10 Feb 2012 07:04:41 GMT
	(envelope-from nobody@red.freebsd.org)
Received: (from nobody@localhost)
	by red.freebsd.org (8.14.4/8.14.4/Submit) id q1A74f98051744;
	Fri, 10 Feb 2012 07:04:41 GMT
	(envelope-from nobody)
Message-Id: <201202100704.q1A74f98051744@red.freebsd.org>
Date: Fri, 10 Feb 2012 07:04:41 GMT
From: Diomidis Spinellis <dds@aueb.gr>
To: freebsd-gnats-submit@FreeBSD.org
Subject: tee loses data when writing to non-blocking file descriptors
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         164947
>Category:       bin
>Synopsis:       [patch] tee(1) loses data when writing to non-blocking file descriptors
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Feb 10 07:10:09 UTC 2012
>Closed-Date:    
>Last-Modified:  Sun Apr 20 00:09:59 UTC 2014
>Originator:     Diomidis Spinellis
>Release:        8.1
>Organization:
AUEB
>Environment:
FreeBSD istlab.dmst.aueb.gr 8.1-RELEASE-p6 FreeBSD 8.1-RELEASE-p6 #0: Tue Nov  1 15:16:34 EET 2011     dds@istlab.dmst.aueb.gr:/usr/obj/usr/src/sys/ISTLAB  i386

>Description:
When tee(1) writes to a file descriptor that has been set to non-blocking mode, the write(2) call may fail with EAGAIN.  Instead of retrying the operation, tee discards that chunk of data.
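
The failure mode described above can be reproduced in isolation.  The following is a minimal sketch, not part of the submitted patch: the helper name fill_nonblocking_pipe is hypothetical, and it assumes a pipe with no reader is a fair stand-in for a slow consumer such as ssh.  It fills a non-blocking pipe and returns the errno of the first write(2) that fails.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill a non-blocking pipe until write(2) fails.
 * Returns the errno of the failing write, or 0 if the setup failed.
 */
static int
fill_nonblocking_pipe(void)
{
	int fds[2], flags, saved = 0;
	char chunk[4096];

	if (pipe(fds) == -1)
		return (0);
	flags = fcntl(fds[1], F_GETFL);
	if (flags != -1 && fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) != -1) {
		memset(chunk, 0, sizeof(chunk));
		/* Nobody reads fds[0], so the pipe buffer eventually fills. */
		while (write(fds[1], chunk, sizeof(chunk)) != -1)
			continue;
		saved = errno;
	}
	close(fds[0]);
	close(fds[1]);
	return (saved);
}
```

Calling fill_nonblocking_pipe() should yield EAGAIN; a writer that treats this as a fatal error for the current chunk loses that chunk.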
>How-To-Repeat:
Run the following:
#!/usr/local/bin/bash
# bash needed for the >(...) functionality
# ssh apparently sets O_NONBLOCK
# Remove the 2>/dev/null to see tee complaining
dd count=100000 if=/dev/zero | 
tee >(ssh localhost dd of=/dev/null) 2>/dev/null | 
(ssh localhost dd of=/dev/null)

100000+0 records in
100000+0 records out
51200000 bytes transferred in 9.224390 secs (5550503 bytes/sec)
100000+0 records in
100000+0 records out
51200000 bytes transferred in 9.061471 secs (5650297 bytes/sec)
92080+0 records in
92080+0 records out
47144960 bytes transferred in 9.101738 secs (5179776 bytes/sec)

>Fix:
I attach a patch that fixes the problem.

Patch attached with submission follows:

--- tee.c	2012/02/08 14:50:10	1.1
+++ tee.c	2012/02/08 14:59:10
@@ -46,8 +46,10 @@
 #endif /* not lint */
 
 #include <sys/types.h>
+#include <sys/select.h>
 #include <sys/stat.h>
 #include <err.h>
+#include <errno.h>
 #include <fcntl.h>
 #include <signal.h>
 #include <stdio.h>
@@ -64,6 +66,7 @@
 
 void add(int, const char *);
 static void usage(void);
+static void waitfor(int fd);
 
 int
 main(int argc, char *argv[])
@@ -110,9 +113,14 @@
 			bp = buf;
 			do {
 				if ((wval = write(p->fd, bp, n)) == -1) {
-					warn("%s", p->name);
-					exitval = 1;
-					break;
+					if (errno == EAGAIN) {
+						waitfor(p->fd);
+						wval = 0;
+					} else {
+						warn("%s", p->name);
+						exitval = 1;
+						break;
+					}
 				}
 				bp += wval;
 			} while (n -= wval);
@@ -141,3 +149,15 @@
 	p->next = head;
 	head = p;
 }
+
+/* Wait for the specified fd to be ready for writing */
+static void
+waitfor(int fd)
+{
+	fd_set writefds;
+
+	FD_ZERO(&writefds);
+	FD_SET(fd, &writefds);
+	if (select(fd + 1, NULL, &writefds, NULL, NULL) == -1)
+		err(1, "select");
+}
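
For readers who prefer to see the retry logic outside the diff, here is a standalone sketch of the same idea.  It is an illustration under assumptions, not the committed code: the names wait_writable, write_all and demo_transfer are hypothetical, the EINTR handling is an addition that the patch does not contain, and demo_transfer exists only to exercise the loop against a forked reader.

```c
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Block until fd is writable; 0 on success, -1 if select(2) fails. */
static int
wait_writable(int fd)
{
	fd_set wfds;
	int r;

	do {
		FD_ZERO(&wfds);
		FD_SET(fd, &wfds);
		r = select(fd + 1, NULL, &wfds, NULL, NULL);
	} while (r == -1 && errno == EINTR);
	return (r == -1 ? -1 : 0);
}

/* Write all n bytes; on EAGAIN wait for the fd instead of dropping data. */
static int
write_all(int fd, const char *bp, size_t n)
{
	ssize_t wval;

	while (n > 0) {
		if ((wval = write(fd, bp, n)) == -1) {
			if (errno == EINTR)
				continue;
			if (errno != EAGAIN || wait_writable(fd) == -1)
				return (-1);
			continue;
		}
		bp += wval;
		n -= (size_t)wval;
	}
	return (0);
}

/*
 * Hypothetical self-check: send total bytes through a non-blocking pipe
 * to a forked reader and return the byte count the reader saw, or -1.
 */
static long
demo_transfer(size_t total)
{
	int data[2], report[2], flags;
	char buf[8192];
	long seen = 0;
	ssize_t r;
	pid_t pid;

	if (pipe(data) == -1 || pipe(report) == -1)
		return (-1);
	if ((pid = fork()) == -1)
		return (-1);
	if (pid == 0) {
		/* Child: count everything that arrives, then report it. */
		close(data[1]);
		close(report[0]);
		while ((r = read(data[0], buf, sizeof(buf))) > 0)
			seen += r;
		if (write(report[1], &seen, sizeof(seen)) != sizeof(seen))
			_exit(1);
		_exit(0);
	}
	close(data[0]);
	close(report[1]);
	flags = fcntl(data[1], F_GETFL);
	if (flags != -1)
		(void)fcntl(data[1], F_SETFL, flags | O_NONBLOCK);
	memset(buf, 0, sizeof(buf));
	while (total > 0) {
		size_t n = total < sizeof(buf) ? total : sizeof(buf);

		if (write_all(data[1], buf, n) == -1)
			break;
		total -= n;
	}
	close(data[1]);
	if (read(report[0], &seen, sizeof(seen)) != sizeof(seen))
		seen = -1;
	close(report[0]);
	waitpid(pid, NULL, 0);
	return (seen);
}
```

With this loop, demo_transfer(1 << 20) is expected to report the full 1048576 bytes on the reader side, whether or not any individual write hit EAGAIN.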


>Release-Note:
>Audit-Trail:

From: Martin Cracauer <cracauer@cons.org>
To: Diomidis Spinellis <dds@aueb.gr>
Cc: freebsd-gnats-submit@freebsd.org
Subject: Re: bin/164947: tee loses data when writing to non-blocking file descriptors
Date: Fri, 10 Feb 2012 14:17:36 -0500

 I don't think it is ssh that is causing this. If you use a named pipe
 explicitly and hook ssh up to that the error doesn't appear.  Seems to
 be something that bash is doing there.
 
 That doesn't mean I am opposed to handling EAGAIN.
 
 The way I normally do it is a simple retry loop, not using select.
 I'm aware of the tradeoffs; so far I have always been better off not
 spending a second system call on every retry.
 
 Martin
 
 -- 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 Martin Cracauer <cracauer@cons.org>   http://www.cons.org/cracauer/

From: Diomidis Spinellis <dds@aueb.gr>
To: Martin Cracauer <cracauer@cons.org>
Cc: freebsd-gnats-submit@freebsd.org
Subject: Re: bin/164947: tee loses data when writing to non-blocking file
 descriptors
Date: Fri, 10 Feb 2012 22:32:02 +0200

 > I don't think it is ssh that is causing this. If you use a named pipe
 > explicitly and hook ssh up to that the error doesn't appear.  Seems to
 > be something that bash is doing there.
 
 I think the named pipe isolates the write fd from the ssh end.  If you 
 use cat or dd instead of ssh the problem goes away.
 
 > That doesn't mean I am opposed to handling EAGAIN.
 >
 > The way I normally do it is a simple retry loop, not using select.
 > I'm aware of the tradeoffs, so far I was always better off not
 > investing a second system call into every retry.
 
 I agree this can be cheaper for many cases, but it can become very 
 expensive for long waits.
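
 The tradeoff discussed here can be made visible with a small counting
 experiment.  This is only a sketch under assumptions: write_all_spin
 and count_spins are hypothetical names, and the stalled reader stands
 in for a slow consumer.  The loop retries write(2) immediately on
 EAGAIN, without select(2), and counts the attempts that failed.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Write all n bytes, retrying at once on EAGAIN; *spins counts failures. */
static int
write_all_spin(int fd, const char *bp, size_t n, long *spins)
{
	ssize_t wval;

	while (n > 0) {
		if ((wval = write(fd, bp, n)) == -1) {
			if (errno != EAGAIN && errno != EINTR)
				return (-1);
			(*spins)++;	/* a system call that moved no data */
			continue;
		}
		bp += wval;
		n -= (size_t)wval;
	}
	return (0);
}

/*
 * Hypothetical measurement: the reader stalls for delay_ms before it
 * drains the pipe.  Returns the number of wasted write calls, or -1.
 */
static long
count_spins(size_t total, long delay_ms)
{
	struct timespec ts = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };
	int fds[2], flags;
	char buf[8192];
	long spins = 0;
	pid_t pid;

	if (pipe(fds) == -1 || (pid = fork()) == -1)
		return (-1);
	if (pid == 0) {
		/* Child: simulate a slow consumer, then read until EOF. */
		close(fds[1]);
		nanosleep(&ts, NULL);
		while (read(fds[0], buf, sizeof(buf)) > 0)
			continue;
		_exit(0);
	}
	close(fds[0]);
	flags = fcntl(fds[1], F_GETFL);
	if (flags != -1)
		(void)fcntl(fds[1], F_SETFL, flags | O_NONBLOCK);
	memset(buf, 0, sizeof(buf));
	while (total > 0) {
		size_t n = total < sizeof(buf) ? total : sizeof(buf);

		if (write_all_spin(fds[1], buf, n, &spins) == -1) {
			spins = -1;
			break;
		}
		total -= n;
	}
	close(fds[1]);
	waitpid(pid, NULL, 0);
	return (spins);
}
```

 For a short stall the spin count stays small and the extra select(2)
 call would be pure overhead; for a long stall the count grows with
 the wait, which is the case where blocking in select(2) is cheaper.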

From: Martin Cracauer <cracauer@cons.org>
To: Diomidis Spinellis <dds@aueb.gr>
Cc: Martin Cracauer <cracauer@cons.org>, freebsd-gnats-submit@freebsd.org
Subject: Re: bin/164947: tee loses data when writing to non-blocking file descriptors
Date: Fri, 10 Feb 2012 16:03:02 -0500

 > I think the named pipe isolates the write fd from the ssh end.  If you 
 > use cat or dd instead of ssh the problem goes away.
 
 Do you happen to know what bash does there, exactly? I was assuming it
 is creating a named pipe behind the user's back.
 
 I noticed that if you do ssh on the "tee part" and something else on
 the end of the regular pipe then things also fail.  On the other hand
 if you put the "tee part" on something else and the regular pipe on
 ssh things never seem to fail.
 
 tee treats both fds the same, and obviously ssh is always setting up
 its input the same way, so the difference must be in what bash is
 doing there with that "pipe emulation".
 
 > I agree this can be cheaper for many cases, but it can become very 
 > expensive for long waits.
 
 I'd like to understand what exactly is special about the way bash
 implements that feature so that we can make a more educated decision
 about the tradeoff of using select or not.
 
 Martin
 -- 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 Martin Cracauer <cracauer@cons.org>   http://www.cons.org/cracauer/

From: Diomidis Spinellis <dds@aueb.gr>
To: Martin Cracauer <cracauer@cons.org>
Cc: freebsd-gnats-submit@freebsd.org
Subject: Re: bin/164947: tee loses data when writing to non-blocking file
 descriptors
Date: Fri, 10 Feb 2012 23:17:03 +0200

 > Do you happen to know what bash does there, exactly? I was assuming it
 > is creating a named pipe behind the user's back.
 
 It is creating a normal pipe and providing it as an argument through 
 /dev/fd.  Try
 
 ls -l /dev/fd >(wc -l)
 
 > I noticed that if you do ssh on the "tee part" and something else on
 > the end of the regular pipe then things also fail.  On the other hand
 > if you put the "tee part" on something else and the regular pipe on
 > ssh things never seem to fail.
 
 On 8.1 release I needed both ends to run ssh to see the problem.
 
 
 BTW The problem also manifests itself on Mac OS X and Linux :-)

From: Martin Cracauer <cracauer@cons.org>
To: Diomidis Spinellis <dds@aueb.gr>
Cc: Martin Cracauer <cracauer@cons.org>, freebsd-gnats-submit@freebsd.org
Subject: Re: bin/164947: tee loses data when writing to non-blocking file descriptors
Date: Fri, 10 Feb 2012 17:16:32 -0500

 > It is creating a normal pipe and providing it as an argument through 
 > /dev/fd.  Try
 > 
 > ls -l /dev/fd >(wc -l)
 
 Hmmm, this is what I get in ps from this pipe:
 28571  1  T    0:01.56 emacs -nw tee.c.rej
 29598  1  T    0:00.00 cstream -n 10m -i- -v2
 29599  1  T    0:00.00 -bash (bash)
 29600  1  T    0:00.02 ssh localhost dd of=/dev/null
 29603  1  T    0:00.00 tee /tmp/cracauer/sh-np-1328937382
 29609  1  R+   0:00.00 ps
 usr.bin/tee(wings)152% ls -l  /tmp/cracauer/sh-np-1328937382
 prw-------  1 cracauer  wheel  0 Feb 10 16:38 /tmp/cracauer/sh-np-1328937382|
 
 Either way, I tested your patch, it fixes the problem and it's
 obviously correct (EAGAIN needs to be taken into account) so I'm gonna
 commit it.
 
 -- 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 Martin Cracauer <cracauer@cons.org>   http://www.cons.org/cracauer/

From: David Xu <listlog2011@gmail.com>
To: Diomidis Spinellis <dds@aueb.gr>
Cc: freebsd-gnats-submit@FreeBSD.org
Subject: Re: bin/164947: tee loses data when writing to non-blocking file
 descriptors
Date: Sat, 11 Feb 2012 11:21:38 +0800

 > When tee(1) tries to write to a file descriptor that has been set to
 > non-blocking mode the write(2) call may fail with EAGAIN.  Instead of
 > retrying the operation, tee will throw that chunk of data away.

 So tee should also work with non-blocking reads; your patch is incomplete.
 

From: Diomidis Spinellis <dds@aueb.gr>
To: davidxu@FreeBSD.org
Cc: David Xu <listlog2011@gmail.com>, freebsd-gnats-submit@FreeBSD.org
Subject: Re: bin/164947: tee loses data when writing to non-blocking file
 descriptors
Date: Sat, 11 Feb 2012 09:25:34 +0200

 >> When tee(1) tries to write to a file descriptor that has been set to
 >> non-blocking mode the write(2) call may fail with EAGAIN. Instead of
 >> retrying the operation, tee will throw that chunk of data away.
 > so tee should also work with non-blocking read, your patch is incomplete.
 
 You're right.  By the same argument all other utilities should also be 
 fixed.  However, this may create new bugs and instability. For the 
 specific case of tee writing I offered a test case, demonstrating the 
 problem.  This was distilled from an actual production use (scattering a 
 dump to tape and disk).  I think it's best to fix each utility as the 
 need arises.
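
 For reference, the read side raised above could be handled along the
 same lines.  The following is a sketch only and is not part of the
 submitted patch: read_wait and delayed_read_demo are hypothetical
 names, and the delayed writer exists just to force an EAGAIN on the
 first read(2).

```c
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Read up to n bytes; on EAGAIN wait until fd is readable and retry.
 * Returns the byte count, 0 on EOF, or -1 on any other error.
 */
static ssize_t
read_wait(int fd, char *buf, size_t n)
{
	fd_set rfds;
	ssize_t rval;

	for (;;) {
		if ((rval = read(fd, buf, n)) != -1)
			return (rval);
		if (errno == EINTR)
			continue;
		if (errno != EAGAIN)
			return (-1);
		FD_ZERO(&rfds);
		FD_SET(fd, &rfds);
		if (select(fd + 1, &rfds, NULL, NULL, NULL) == -1 &&
		    errno != EINTR)
			return (-1);
	}
}

/*
 * Hypothetical self-check: a forked writer sends 5 bytes after a short
 * delay; the parent reads them from a non-blocking descriptor.
 */
static ssize_t
delayed_read_demo(void)
{
	struct timespec ts = { 0, 20 * 1000000L };	/* 20 ms */
	int fds[2], flags;
	char buf[16];
	ssize_t got;
	pid_t pid;

	if (pipe(fds) == -1 || (pid = fork()) == -1)
		return (-1);
	if (pid == 0) {
		close(fds[0]);
		nanosleep(&ts, NULL);
		if (write(fds[1], "hello", 5) != 5)
			_exit(1);
		_exit(0);
	}
	close(fds[1]);
	flags = fcntl(fds[0], F_GETFL);
	if (flags != -1)
		(void)fcntl(fds[0], F_SETFL, flags | O_NONBLOCK);
	got = read_wait(fds[0], buf, sizeof(buf));
	close(fds[0]);
	waitpid(pid, NULL, 0);
	return (got);
}
```

 Without the wait, the first read(2) on the empty non-blocking pipe
 would return -1 with EAGAIN, which a plain read loop would treat as
 an input error.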
>Unformatted:
