From nobody@FreeBSD.org  Fri Mar 22 08:59:42 2002
Return-Path: <nobody@FreeBSD.org>
Received: from freefall.freebsd.org (freefall.FreeBSD.org [216.136.204.21])
	by hub.freebsd.org (Postfix) with ESMTP id 0227437B41C
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 22 Mar 2002 08:59:42 -0800 (PST)
Received: (from nobody@localhost)
	by freefall.freebsd.org (8.11.6/8.11.6) id g2MGxfu43089;
	Fri, 22 Mar 2002 08:59:41 -0800 (PST)
	(envelope-from nobody)
Message-Id: <200203221659.g2MGxfu43089@freefall.freebsd.org>
Date: Fri, 22 Mar 2002 08:59:41 -0800 (PST)
From: Todd Hayton <thayton@torrentnet.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: read() system call never returns in some cases
X-Send-Pr-Version: www-1.0

>Number:         36209
>Category:       kern
>Synopsis:       read() system call never returns in some cases
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Mar 22 09:00:06 PST 2002
>Closed-Date:    Sun Dec 08 10:11:47 PST 2002
>Last-Modified:  Sun Dec 08 10:11:47 PST 2002
>Originator:     Todd Hayton
>Release:        FreeBSD 2.2.2
>Organization:
Ericsson IPI
>Environment:
>Description:
Essentially the problem is this: in about 1 of every 100 copies made
with scp (scp somefile user@remotehost:somefile), the entire machine
hangs. Pings keep working for a short time, but you can't telnet into
the machine. Wait long enough and the pings eventually fail as
well.

I started running ktrace on each scp. The last system call recorded
is always read(), and it is always a read on a file descriptor
created by pipe():

ktrace 1
--------
cognac> kdump -f scp_ktrace.out | tail -20
  3725 scp      CALL  old.sigaction(0xd,0x7fbfdd00,0x7fbfdcf4)
  3725 scp      RET   old.sigaction 0
  3725 scp      CALL  pipe
  3725 scp      RET   pipe 9
  3725 scp      CALL  pipe
  3725 scp      RET   pipe 11/0xb
  3725 scp      CALL  pipe
  3725 scp      RET   pipe 13/0xd
  3725 scp      CALL  close(0x9)
  3725 scp      RET   close 0
  3725 scp      CALL  close(0xa)
  3725 scp      RET   close 0
  3725 scp      CALL  fork
  3725 scp      RET   fork 3726/0xe8e
  3725 scp      CALL  close(0xb)
  3725 scp      RET   close 0
  3725 scp      CALL  close(0xe)
  3725 scp      RET   close 0
  3725 scp      CALL  read(0xd,0x7fbfdcbf,0x1)
  3725 scp      PSIG  SIGINT SIG_DFL

ktrace 2
--------
cognac> kdump -f scp_ktrace2.out | tail -20
  2456 scp      RET   chdir 0
  2456 scp      CALL  old.sigaction(0xd,0x7fbfdd30,0x7fbfdd24)
  2456 scp      RET   old.sigaction 0
  2456 scp      CALL  pipe
  2456 scp      RET   pipe 9
  2456 scp      CALL  pipe
  2456 scp      RET   pipe 11/0xb
  2456 scp      CALL  pipe
  2456 scp      RET   pipe 13/0xd
  2456 scp      CALL  close(0x9)
  2456 scp      RET   close 0
  2456 scp      CALL  close(0xa)
  2456 scp      RET   close 0
  2456 scp      CALL  fork
  2456 scp      RET   fork 2457/0x999
  2456 scp      CALL  close(0xb)
  2456 scp      RET   close 0
  2456 scp      CALL  close(0xe)
  2456 scp      RET   close 0
  2456 scp      CALL  read(0xd,0x7fbfdcef,0x1)

ktrace 3
--------
cognac> kdump -f scp_ktrace3.out | tail -20
   727 scp      RET   chdir 0
   727 scp      CALL  old.sigaction(0xd,0x7fbfdd30,0x7fbfdd24)
   727 scp      RET   old.sigaction 0
   727 scp      CALL  pipe
   727 scp      RET   pipe 9
   727 scp      CALL  pipe
   727 scp      RET   pipe 11/0xb
   727 scp      CALL  pipe
   727 scp      RET   pipe 13/0xd
   727 scp      CALL  close(0x9)
   727 scp      RET   close 0
   727 scp      CALL  close(0xa)
   727 scp      RET   close 0
   727 scp      CALL  fork
   727 scp      RET   fork 728/0x2d8
   727 scp      CALL  close(0xb)
   727 scp      RET   close 0
   727 scp      CALL  close(0xe)
   727 scp      RET   close 0
   727 scp      CALL  read(0xd,0x7fbfdcef,0x1)

We're using F-Secure's version of SSH. I looked through the
code, and it looks as if SSH gathers some random noise from the
output of commands: it fork()s off some children to run the
commands, then uses pipes to read the results from each child back
into the parent. It's these reads that seem to never return (and
eventually hang the system).
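
For reference, the general pattern described above (pipe, fork, run a
command in the child, read its output in the parent) can be sketched
as follows. This is NOT taken from the F-Secure SSH source; the
function name read_command_output() and the use of /bin/sh are
assumptions made for illustration only. The read() in the parent
corresponds to the call that shows up last in the traces.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from any SSH source): run `cmd` through
 * /bin/sh in a child process and collect up to `cap` bytes of its
 * standard output into `buf`.  Returns the byte count, or -1 on error.
 */
ssize_t
read_command_output(const char *cmd, char *buf, size_t cap)
{
	int fds[2];
	pid_t pid;
	size_t total = 0;
	ssize_t n = 0;

	assert(cmd != NULL && buf != NULL && cap > 0);

	if (pipe(fds) == -1)
		return -1;

	pid = fork();
	if (pid == -1) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (pid == 0) {
		/* Child: send stdout into the write end of the pipe. */
		close(fds[0]);
		if (dup2(fds[1], STDOUT_FILENO) == -1)
			_exit(127);
		if (fds[1] != STDOUT_FILENO)
			close(fds[1]);
		execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
		_exit(127);	/* only reached if execl() failed */
	}

	/*
	 * Parent: close the write end first.  If it stayed open here,
	 * read() could never see EOF, even after the child exits.
	 */
	close(fds[1]);
	while (total < cap) {
		n = read(fds[0], buf + total, cap - total);
		if (n == -1 && errno == EINTR)
			continue;	/* interrupted, retry */
		if (n <= 0)
			break;		/* EOF (0) or error (-1) */
		total += (size_t)n;
	}
	close(fds[0]);
	waitpid(pid, NULL, 0);
	return n < 0 ? -1 : (ssize_t)total;
}
```

On a healthy kernel the loop ends when the child exits and the pipe
reports EOF. In the traces above, the parent's read() on the pipe
never gets that far.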

I glanced around on the web and have seen references to pipe_read()
and pipe_write() containing race conditions:

http://www.geocrawler.com/mail/msg.php3?msg_id=2172760&list=159

So I'm wondering if this is a manifestation of that. The fact that
it takes so many repeated attempts for the lockup to occur would
seem to be characteristic of a race condition.
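
Since the problem only shows up after many attempts, a small loop
that repeats the pipe/fork/read sequence may be easier to work with
than repeated scp runs. The sketch below is an assumption about how
such a harness could look. It has not been shown to reproduce the
hang, the names count_stalled_reads() and read_byte_with_timeout()
are made up for illustration, and the header locations are those of
a current POSIX system and may differ on 2.2.2. It uses select()
with a timeout so that a read which never becomes ready is counted
instead of blocking forever. If the kernel itself wedges, a
user-space timeout cannot fire, so this can only catch a stalled
pipe, not a hung machine.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Wait up to timeout_sec seconds for fd to become readable, then
 * read one byte.  Returns 1 if a byte was read, 0 on timeout or
 * EOF, -1 on error.
 */
static int
read_byte_with_timeout(int fd, char *out, int timeout_sec)
{
	fd_set rfds;
	struct timeval tv;
	int ready;

	assert(out != NULL && fd >= 0 && fd < FD_SETSIZE);

	FD_ZERO(&rfds);
	FD_SET(fd, &rfds);
	tv.tv_sec = timeout_sec;
	tv.tv_usec = 0;

	ready = select(fd + 1, &rfds, NULL, NULL, &tv);
	if (ready <= 0)
		return ready;		/* 0 = timeout, -1 = error */
	return (int)read(fd, out, 1);	/* 1, 0 (EOF) or -1 */
}

/*
 * Hypothetical harness: repeat pipe/fork/one-byte read `iterations`
 * times.  Returns how many reads did not deliver a byte within the
 * timeout, or -1 if pipe() or fork() failed.
 */
int
count_stalled_reads(int iterations, int timeout_sec)
{
	int stalled = 0;
	int i;

	for (i = 0; i < iterations; i++) {
		int fds[2];
		pid_t pid;
		char byte;

		if (pipe(fds) == -1)
			return -1;
		pid = fork();
		if (pid == -1) {
			close(fds[0]);
			close(fds[1]);
			return -1;
		}
		if (pid == 0) {
			/* Child: write a single byte and exit. */
			close(fds[0]);
			if (write(fds[1], "x", 1) != 1)
				_exit(1);
			_exit(0);
		}

		/* Parent: keep only the read end, as scp does. */
		close(fds[1]);
		if (read_byte_with_timeout(fds[0], &byte, timeout_sec) != 1) {
			stalled++;
			kill(pid, SIGKILL);	/* don't block in waitpid() */
		}
		close(fds[0]);
		waitpid(pid, NULL, 0);
	}
	return stalled;
}
```

A call such as count_stalled_reads(10000, 5) should return 0 on a
system without the problem. A non-zero result would point at the
pipe code rather than at scp or SSH.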
>How-To-Repeat:
      
>Fix:
      
>Release-Note:
>Audit-Trail:

From: David Malone <dwmalone@maths.tcd.ie>
To: Todd Hayton <thayton@torrentnet.com>
Cc: freebsd-gnats-submit@FreeBSD.org
Subject: Re: kern/36209: read() system call never returns in some cases
Date: Sun, 24 Mar 2002 00:21:35 +0000

 On Fri, Mar 22, 2002 at 08:59:41AM -0800, Todd Hayton wrote:
 > >Release:        FreeBSD 2.2.2
 
 Are you really using FreeBSD-2.2.2? Given that it was released
 almost 5 years ago, it isn't really supported any more. However,
 I'd suggest that you look through the commit logs for sys_pipe.c
 at:
 
 	http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/sys_pipe.c
 
 and see if you can find a patch which is appropriate.
 
 	David.
State-Changed-From-To: open->closed 
State-Changed-By: iedowse 
State-Changed-When: Sun Dec 8 10:09:52 PST 2002 
State-Changed-Why:  

FreeBSD 2.2.2 is no longer supported. Please try 4.7-RELEASE, and 
open a new PR if you can reproduce the problem there. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=36209 
>Unformatted:
