From netch@lucky.net  Tue Oct  2 05:55:04 2001
Return-Path: <netch@lucky.net>
Received: from burka.carrier.kiev.ua (burka.carrier.kiev.ua [193.193.193.107])
	by hub.freebsd.org (Postfix) with ESMTP id BB61937B40B
	for <FreeBSD-gnats-submit@freebsd.org>; Tue,  2 Oct 2001 05:54:57 -0700 (PDT)
Received: from netch@localhost (netch@localhost)
	by burka.carrier.kiev.ua  id PVC23146;
	Tue, 2 Oct 2001 15:54:49 +0300 (EEST)
	(envelope-from netch)
Message-Id: <200110021254.PVC23146@burka.carrier.kiev.ua>
Date: Tue, 2 Oct 2001 15:54:49 +0300 (EEST)
From: Valentin Nechayev <netch@lucky.net>
Reply-To: Valentin Nechayev <netch@segfault.kiev.ua>
To: FreeBSD-gnats-submit@freebsd.org
Subject: incorrect signal handling in snpread()
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         30985
>Category:       kern
>Synopsis:       incorrect signal handling in snpread()
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Oct 02 06:00:01 PDT 2001
>Closed-Date:    Sat Nov 24 08:00:14 PST 2001
>Last-Modified:  Sat Nov 24 08:00:28 PST 2001
>Originator:     Valentin Nechayev <netch@lucky.net>
>Release:        FreeBSD 4.4-RELEASE i386
>Organization:
Lucky Net Ltd.
>Environment:

Found on: FreeBSD 4.2-STABLE
Confirmed on: FreeBSD 4.4-RELEASE

>Description:

Sleeping in snpread() while waiting for data is interruptible (PCATCH flag),
but snpread() does not check the tsleep() return status, which can indicate
that a signal arrived. As a result, the system hangs in an endless loop.

Discovered by: Vladimir Jakovenko <vovik@lucky.net>

>How-To-Repeat:

Write any userland program that does a read() on an snp(4) device in
blocking mode. Run it and interrupt it from its terminal with the intr
character (Ctrl-C by default). A local (virtual) console, e.g. ttyv1, may be
required.
The standard watch(8) does not show this behavior in 99.99...% of cases
because it waits in select(), not in read(). But it can also hit this case
in the rare event that a signal arrives during snpread() (am I wrong?)
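
A possible skeleton for such a test program is sketched below. It is an
illustration only: blocking_read_interruptible() and on_signal() are
hypothetical names, and opening the snp device and attaching it to a tty are
left out. The handler is installed without SA_RESTART, so a driver that
handles the signal correctly should make read() fail with EINTR.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Set by the handler so the caller can tell that the signal was delivered. */
static volatile sig_atomic_t got_signal = 0;

static void
on_signal(int signo)
{
	(void)signo;
	got_signal = 1;
}

/*
 * Install a handler for signo without SA_RESTART, then do one blocking
 * read() on fd. Returns whatever read() returns; errno is left as read()
 * set it, so an interrupted read shows up as -1 with errno == EINTR.
 */
static ssize_t
blocking_read_interruptible(int fd, int signo, char *buf, size_t len)
{
	struct sigaction sa;

	assert(buf != NULL && len > 0);
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_signal;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;	/* no SA_RESTART: read() must not be restarted */
	if (sigaction(signo, &sa, NULL) == -1)
		return (-1);
	return (read(fd, buf, len));
}
```

For the case in this report, fd would be the descriptor of an attached snp
device and signo would be SIGINT. Any descriptor that blocks in read(), such
as the read end of an empty pipe, can be used to check the program itself.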

>Fix:

The following simple fix checks the tsleep() return code and returns EINTR
when a signal is caught. It is applicable to all late 4.* systems.
It also raises the IPL around the data wait loop. As data can be put
into the snoop device during an interrupt (am I wrong?), this is desirable.
If hard-interrupt and soft-interrupt handling code cannot put data into
the snoop device, raising the IPL isn't needed.
I didn't raise the IPL around uiomove() due to some deja-vu.

But it should be noted that there are some fundamental architectural
issues with this approach. As the snp device is designed to be used with
select/poll for waiting and ioctl(FIONREAD) for getting the tty status,
it may be desirable to avoid blocking reads of the snp device
altogether. I can't say which approach is more correct.
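
For comparison, the select/FIONREAD style of waiting mentioned above might
look roughly like this in userland. This is a sketch under assumptions:
wait_for_data() is a hypothetical helper, and it treats the FIONREAD result
purely as a byte count, without interpreting any tty status that snp may
report through the same ioctl.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Wait up to timeout_sec seconds for fd to become readable, then ask how
 * many bytes are pending. Returns 1 when data is ready (*pending is set),
 * 0 on timeout, and -1 on error with errno set (EINTR if a signal arrived
 * while waiting).
 */
static int
wait_for_data(int fd, int timeout_sec, int *pending)
{
	fd_set rfds;
	struct timeval tv;
	int n;

	assert(pending != NULL);
	assert(fd >= 0 && fd < FD_SETSIZE);

	FD_ZERO(&rfds);
	FD_SET(fd, &rfds);
	tv.tv_sec = timeout_sec;
	tv.tv_usec = 0;

	n = select(fd + 1, &rfds, NULL, NULL, &tv);
	if (n <= 0)
		return (n);	/* 0: timeout, -1: error */
	if (ioctl(fd, FIONREAD, pending) == -1)
		return (-1);
	return (1);
}
```

A caller that waits this way never blocks inside read() itself, which is
why a program written in this style almost never reaches the sleep in
snpread().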

==={{{
--- tty_snoop.c.0	Thu Nov 18 08:39:47 1999
+++ tty_snoop.c	Tue Oct  2 15:23:54 2001
@@ -143,18 +143,26 @@
 	if (snp->snp_tty == NULL)
 		return (EIO);
 
+	s = spltty();
 	snp->snp_flags &= ~SNOOP_RWAIT;
 
 	do {
 		if (snp->snp_len == 0) {
-			if (flag & IO_NDELAY)
+			if (flag & IO_NDELAY) {
+				splx(s);
 				return (EWOULDBLOCK);
+			}
 			snp->snp_flags |= SNOOP_RWAIT;
-			tsleep((caddr_t) snp, (PZERO + 1) | PCATCH, "snoopread", 0);
+			error = tsleep((caddr_t) snp, (PZERO + 1) | PCATCH, "snoopread", 0);
+			if (error == EINTR || error == ERESTART) {
+				splx(s);
+				return EINTR;
+			}
 		}
 	} while (snp->snp_len == 0);
 
 	n = snp->snp_len;
+	splx(s);
 
 	while (snp->snp_len > 0 && uio->uio_resid > 0 && error == 0) {
 		len = MIN(uio->uio_resid, snp->snp_len);
===}}}


/netch
>Release-Note:
>Audit-Trail:

From: netch@segfault.kiev.ua (Valentin Nechayev)
To: freebsd-gnats-submit@freebsd.org
Cc:  
Subject: Re: kern/30985: incorrect signal handling in snpread()
Date: Thu, 4 Oct 2001 13:27:51 +0300 (EEST)

 >>Synopsis:       incorrect signal handling in snpread()
 
 VN> --- tty_snoop.c.0	Thu Nov 18 08:39:47 1999
 VN> +++ tty_snoop.c	Tue Oct  2 15:23:54 2001
 
 For 5-current, the same bug is in src/sys/dev/snp/snp.c.
 
 
 /netch

From: Dima Dorfman <dima@trit.org>
To: Valentin Nechayev <netch@segfault.kiev.ua>
Cc: FreeBSD-gnats-submit@freebsd.org
Subject: Re: kern/30985: incorrect signal handling in snpread() 
Date: Sun, 07 Oct 2001 08:50:41 -0700

 Valentin Nechayev <netch@lucky.net> wrote:
 > >Fix:
 > 
 > Following simple fix provides checking for tsleep() return code
 > and EINTR returning when signal is caught. It is applicable for
 > all late 4.* systems.
 > It also increases IPL around the data wait cycle. As data can be put
 > to snoop device during interrupt (am I wrong?), it is desirable.
 
 I don't think it's possible for data to enter snp during an interrupt.
 At least, there are no provisions for this anywhere else in the code.
 Thus, I think this is undesirable.
 
 > But it should be noted that there are some principal architectural
 > issues in this approach. As snp device is designed for use when
 > select/poll is used to wait and ioctl(FIONREAD) is desired to get
 > tty status, it may be desirable to avoid blocking reads of snp device
 > totally. I can't suppose what approach is more right.
 
 I think it's okay to sleep in snpread.  snp generally tries to act
 like any other driver, and most allow sleeping in their read routine.
 
 >  	do {
 >  		if (snp->snp_len == 0) {
 > -			if (flag & IO_NDELAY)
 > +			if (flag & IO_NDELAY) {
 > +				splx(s);
 >  				return (EWOULDBLOCK);
 > +			}
 >  			snp->snp_flags |= SNOOP_RWAIT;
 > -			tsleep((caddr_t) snp, (PZERO + 1) | PCATCH, "snoopread", 0);
 > +			error = tsleep((caddr_t) snp, (PZERO + 1) | PCATCH, "snoopread", 0);
 > +			if (error == EINTR || error == ERESTART) {
 > +				splx(s);
 > +				return EINTR;
 > +			}
 
 Why can't we just return whatever tsleep() returns, as most (all?)
 other drivers do?  Like so (untested):
 
 Index: snp.c
 ===================================================================
 RCS file: /ref/cvsf/src/sys/dev/snp/snp.c,v
 retrieving revision 1.63
 diff -u -r1.63 snp.c
 --- snp.c	2001/09/12 08:37:11	1.63
 +++ snp.c	2001/10/07 15:44:47
 @@ -255,7 +255,10 @@
  			if (flag & IO_NDELAY)
  				return (EWOULDBLOCK);
  			snp->snp_flags |= SNOOP_RWAIT;
 -			tsleep((caddr_t)snp, (PZERO + 1) | PCATCH, "snprd", 0);
 +			error = tsleep((caddr_t)snp, (PZERO + 1) | PCATCH,
 +			    "snprd", 0);
 +			if (error != 0)
 +				return (error);
  		}
  	} while (snp->snp_len == 0);
  

From: Valentin Nechayev <netch@netch.kiev.ua>
To: Dima Dorfman <dima@trit.org>
Cc: Valentin Nechayev <netch@segfault.kiev.ua>,
	FreeBSD-gnats-submit@freebsd.org
Subject: Re: kern/30985: incorrect signal handling in snpread()
Date: Sun, 7 Oct 2001 21:15:36 +0300

  Sun, Oct 07, 2001 at 08:50:41, dima wrote about "Re: kern/30985: incorrect signal handling in snpread()": 
 
 > > -			tsleep((caddr_t) snp, (PZERO + 1) | PCATCH, "snoopread", 0);
 > > +			error = tsleep((caddr_t) snp, (PZERO + 1) | PCATCH, "snoopread", 0);
 > > +			if (error == EINTR || error == ERESTART) {
 > > +				splx(s);
 > > +				return EINTR;
 > > +			}
 > 
 > Why can't we just return whatever tsleep() returns, as most (all?)
 > other drivers do?  Like so (untested):
 
 I am not an experienced kernel hacker ;) The examples I saw in kernel code
 mostly test for ERESTART and possibly EINTR and exit from the routine
 on those codes. The tsleep(9) man page (in RELENG_4_4) mentions only one
 other allowed return code, EWOULDBLOCK on timeout, but snpread()
 doesn't use a timeout. If you think that exiting on any nonzero value
 is correct, you are probably right.
 
 >  			snp->snp_flags |= SNOOP_RWAIT;
 > -			tsleep((caddr_t)snp, (PZERO + 1) | PCATCH, "snprd", 0);
 > +			error = tsleep((caddr_t)snp, (PZERO + 1) | PCATCH,
 > +			    "snprd", 0);
 > +			if (error != 0)
 > +				return (error);
 >  		}
 
 
 /netch
State-Changed-From-To: open->closed 
State-Changed-By: dd 
State-Changed-When: Sat Nov 24 08:00:14 PST 2001 
State-Changed-Why:  
Fixed in -current, thanks!  (And sorry for the delay.)

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=30985 
>Unformatted:
