From toasty@home.dragondata.com  Fri Aug 14 22:28:37 1998
Received: from home.dragondata.com (home.dragondata.com [204.137.237.2])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id WAA18858
          for <FreeBSD-gnats-submit@freebsd.org>; Fri, 14 Aug 1998 22:28:36 -0700 (PDT)
          (envelope-from toasty@home.dragondata.com)
Received: (from toasty@localhost)
	by home.dragondata.com (8.8.8/8.8.5) id AAA02508;
	Sat, 15 Aug 1998 00:28:06 -0500 (CDT)
Message-Id: <199808150528.AAA02508@home.dragondata.com>
Date: Sat, 15 Aug 1998 00:28:06 -0500 (CDT)
From: toasty@dragondata.com
Reply-To: toasty@dragondata.com
To: FreeBSD-gnats-submit@freebsd.org
Subject: odd nfs server not responding messages appear
X-Send-Pr-Version: 3.2

>Number:         7619
>Category:       kern
>Synopsis:       odd nfs server not responding messages appear
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    dillon
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Aug 14 22:30:00 PDT 1998
>Closed-Date:    Thu Aug 10 08:50:34 PDT 2000
>Last-Modified:  Thu Aug 10 08:52:47 PDT 2000
>Originator:     Kevin Day
>Release:        FreeBSD 2.2.7-STABLE i386
>Organization:
DragonData Internet Services
>Environment:

I have a 2.2.5 NFS server, and a 2.2.7 NFS client.

>Description:

Occasionally, the 2.2.7 client spits out a message that the NFS server isn't
responding, then instantly says it's OK afterwards. Meanwhile, transfers are
still going on, and everything seems ok... 

It appears harmless, but it makes me wonder if I'm not slowly corrupting
things when this happens.

>How-To-Repeat:

No clue, it seems rather random.


Here's a dmesg from the client...

Copyright (c) 1992-1998 FreeBSD Inc.
Copyright (c) 1982, 1986, 1989, 1991, 1993
	The Regents of the University of California.  All rights reserved.

FreeBSD 2.2.7-RELEASE #0: Thu Jul 30 16:42:02 CDT 1998
    root@shell1.dragondata.com:/usr/src/sys/compile/SHELL1
CPU: Pentium II (quarter-micron) (398.27-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x651  Stepping=1
  Features=0x183f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,<b16>,<b17>,MMX,<b24>>
real memory  = 402653184 (393216K bytes)
avail memory = 391720960 (382540K bytes)
Probing for devices on PCI bus 0:
chip0 <generic PCI bridge (vendor=8086 device=7190 subclass=0)> rev 2 on pci0:0:0
chip1 <generic PCI bridge (vendor=8086 device=7191 subclass=4)> rev 2 on pci0:1:0
chip2 <Intel 82371AB PCI-ISA bridge> rev 2 on pci0:7:0
chip3 <Intel 82371AB IDE interface> rev 1 on pci0:7:1
chip4 <Intel 82371AB USB interface> rev 1 int d irq 9 on pci0:7:2
chip5 <Intel 82371AB Power management controller> rev 2 on pci0:7:3
de0 <Digital 21140A Fast Ethernet> rev 34 int a irq 11 on pci0:14:0
de0: 21140A [10-100Mb/s] pass 2.2
de0: address 00:40:05:43:a3:a3
de1 <Digital 21140A Fast Ethernet> rev 34 int a irq 10 on pci0:15:0
de1: 21140A [10-100Mb/s] pass 2.2
de1: address 00:40:05:42:dd:26
Probing for devices on PCI bus 1:
vga0 <VGA-compatible display device> rev 92 on pci1:0:0
Probing for devices on the ISA bus:
sc0 at 0x60-0x6f irq 1 on motherboard
sc0: VGA color <16 virtual consoles, flags=0x0>
sio0 at 0x3f8-0x3ff irq 4 on isa
sio0: type 16550A
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1 not found at 0x2f8
lpt0 at 0x378-0x37f irq 7 on isa
lpt0: Interrupt-driven port
lp0: TCP/IP capable interface
lpt1 not found at 0xffffffff
psm0 not found at 0x60
fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1.44MB 3.5in
wdc0 at 0x1f0-0x1f7 irq 14 on isa
wdc0: unit 0 (wd0): <Maxtor 91152D8>
wd0: 8063MB (16514064 sectors), 16383 cyls, 16 heads, 63 S/T, 512 B/S
wdc1 at 0x170-0x177 irq 15 on isa
wdc1: unit 0 (atapi): <NEC                 CD-ROM DRIVE:28C/3.02>, removable, dma, iordy
wcd0: 2412/5512Kb/sec, 128Kb cache, audio play, 256 volume levels, ejectable tray
wcd0: no disc inside, unlocked
npx0 flags 0x1 on motherboard
npx0: INT 16 interface
de0: enabling Full Duplex 100baseTX port
de1: enabling 100baseTX port
nfs server home.internal:/home: not responding
nfs server home.internal:/home: is alive again
nfs server home.internal:/home: not responding
nfs server home.internal:/home: not responding
nfs server home.internal:/home: is alive again
nfs server home.internal:/home: is alive again
nfs server home.internal:/home: not responding
nfs server home.internal:/home: is alive again
nfs server home.internal:/home: not responding
nfs server home.internal:/home: is alive again


>Fix:
	
Unknown.
>Release-Note:
>Audit-Trail:

From: "Slowbob" <beatrice@mfn.org>
To: <freebsd-gnats-submit@freebsd.org>
Cc: <toasty@dragondata.com>
Subject: Re: kern/7619: odd nfs server not responding messages appear
Date: Sun, 20 Dec 1998 03:30:38 -0600

 We also have this problem, with 2.2.5-R clients and servers, 
 and we have noticed that the message is emitted while the clients 
 are in the NFSAIO state.  As in the original report, there is no sign that 
 anything is amiss while the messages are being emitted.  However, 
 we also see long pauses (ranging anywhere from a few 
 seconds to over a minute) on NFS.  During these pauses, *no* 
 error messages are emitted at all, either during or after the pause.  
 The only thing we have tracked down for sure is that the NFS
 clients are stuck in the same NFSAIO state during a pause as 
 when the above messages are emitted.
 
 Just a clarification: the error messages (here at least) are *always*
 timestamped 3 seconds apart (i.e., 00:00:00 for the "not responding"
 message, 00:00:03 for the "is alive again" message) - clue??
 
 Yours,
 
 J.A. Terranson
 sysadmin@mfn.org

From: Kevin Day <toasty@dragondata.com>
To: FreeBSD-gnats-submit@FreeBSD.org
Cc:  
Subject: Re: kern/7619
Date: Tue, 28 Dec 1999 03:03:16 -0600 (CST)

 Just for the audit trail:
 
 
 This turned out to be the NFS client dynamic retransmitter getting overly
 aggressive on very fast links (like 100Mb/s Ethernet). The normal rtt is
 very short, except when there are a few collisions, which dramatically
 increase the rtt.
 
 Mounting with '-d' (the mount_nfs option that turns off the dynamic
 retransmit timeout estimator) will correct this problem, and I highly
 recommend it to anyone who's seeing the 'server not responding' messages in
 their syslog. The NFS client will tend to get rather confused when it sees
 the server go away, and will sometimes get rather out of sync.
 
 After looking at the rather scary code used for the dynamic retransmit, I've
 decided against trying to fix it myself. :)
 
 #define NFS_RTO(n, t) \
         ((t) == 0 ? (n)->nm_timeo : \
          ((t) < 3 ? \
           (((((n)->nm_srtt[t-1] + 3) >> 2) + (n)->nm_sdrtt[t-1] + 1) >> 1) : \
           ((((n)->nm_srtt[t-1] + 7) >> 3) + (n)->nm_sdrtt[t-1] + 1)))
 
 I do not wish to break this. :)
 
 (Search the archives around April/May in -current for a discussion about
 this... I think someone else looked at the retransmit stuff and came to the
 same conclusion)
 
 Kevin
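
For readers following the macro quoted above, here is a minimal sketch of
the same arithmetic written as a plain C function, so it can be evaluated
outside the kernel. The struct mock_nfsmount and the function
nfs_rto_sketch are hypothetical stand-ins introduced only for this
illustration; only the field names nm_timeo, nm_srtt and nm_sdrtt come from
the macro itself, and the array size of 4 and the valid range of t (0..4)
are assumptions.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the kernel's per-mount state.  Only the
 * three fields that NFS_RTO reads are modeled; the array size of 4
 * is an assumption made for this sketch.
 */
struct mock_nfsmount {
    int nm_timeo;     /* fixed timeout returned for timer class 0 */
    int nm_srtt[4];   /* smoothed rtt estimate, indexed by t - 1 */
    int nm_sdrtt[4];  /* smoothed rtt deviation, indexed by t - 1 */
};

/* Same arithmetic as the quoted NFS_RTO macro, written as a function. */
static int
nfs_rto_sketch(const struct mock_nfsmount *n, int t)
{
    assert(t >= 0 && t <= 4);   /* assumed range of timer classes */

    if (t == 0)
        return n->nm_timeo;
    if (t < 3)
        return (((n->nm_srtt[t - 1] + 3) >> 2) +
            n->nm_sdrtt[t - 1] + 1) >> 1;
    return ((n->nm_srtt[t - 1] + 7) >> 3) + n->nm_sdrtt[t - 1] + 1;
}
```

When the smoothed estimates are near zero, as they might be on a fast and
mostly idle link, the result for the dynamic classes is tiny compared with
nm_timeo, so a single delayed reply can outlast it. That is consistent with
the behaviour described above, though this sketch does not model how the
kernel updates the estimates or what units they are kept in.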
 
State-Changed-From-To: open->closed 
State-Changed-By: johan 
State-Changed-When: Thu Aug 10 08:50:34 PDT 2000 
State-Changed-Why:  
Fix noted in the audit trail 


Responsible-Changed-From-To: freebsd-bugs->dillon 
Responsible-Changed-By: johan 
Responsible-Changed-When: Thu Aug 10 08:50:34 PDT 2000 
Responsible-Changed-Why:  
NFS is Matt's area. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=7619 
>Unformatted:
Kevin Day
