From nobody@FreeBSD.ORG  Fri Jun 23 17:19:43 2000
Return-Path: <nobody@FreeBSD.ORG>
Received: by hub.freebsd.org (Postfix, from userid 32767)
	id 570B937BAB1; Fri, 23 Jun 2000 17:19:43 -0700 (PDT)
Message-Id: <20000624001943.570B937BAB1@hub.freebsd.org>
Date: Fri, 23 Jun 2000 17:19:43 -0700 (PDT)
From: wessels@ircache.net
Sender: nobody@FreeBSD.ORG
To: freebsd-gnats-submit@FreeBSD.org
Subject: processes stuck in 'ffsvgt' and 'FFS no' states
X-Send-Pr-Version: www-1.0

>Number:         19479
>Category:       kern
>Synopsis:       processes stuck in 'ffsvgt' and 'FFS no' states
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Jun 23 17:20:01 PDT 2000
>Closed-Date:    Sun Jun 02 04:11:56 PDT 2002
>Last-Modified:  Sun Jun 02 04:11:56 PDT 2002
>Originator:     Duane Wessels
>Release:        RELENG_3 from 20000620
>Organization:
IRCache
>Environment:
FreeBSD mr-garrison.measurement-factory.com 3.5-STABLE FreeBSD 3.5-STABLE #4: Tue Jun 20 14:15:04 MDT 2000
root@mr-garrison.measurement-factory.com:/usr/src/sys/compile/SQUID  i386

>Description:
I'm benchmarking the Squid proxy cache, which generates a lot
of disk I/O.  After an hour or two, all of the Squid processes
get stuck in the 'ffsvgt' state, and one of them is stuck
in 'FFS no':

  PID USERNAME PRI NICE  SIZE    RES STATE    TIME   WCPU    CPU COMMAND
  368 wessels   98   0   102M 99036K RUN    106:01 98.14% 98.14% squid
  375 wessels  -18   0  1272K   812K ffsvgt   2:23  0.00%  0.00% diskd
  373 wessels  -18   0  1272K   812K ffsvgt   2:22  0.00%  0.00% diskd
  374 wessels  -20   0  1272K   812K FFS no   2:22  0.00%  0.00% diskd
  372 wessels  -18   0  1272K   812K ffsvgt   2:20  0.00%  0.00% diskd
  370 wessels  -18   0  1272K   812K ffsvgt   2:19  0.00%  0.00% diskd
  371 wessels  -18   0  1272K   812K ffsvgt   2:19  0.00%  0.00% diskd

Although the system continues to run, it seems like any process that
tries to access a filesystem also becomes stuck in the ffsvgt
state.  I can't start new commands, log in, etc.  

Filesystems are mounted normally.  No special options:

/dev/da0c               /cache1         ufs     rw              0       3
/dev/da1c               /cache2         ufs     rw              0       3
/dev/da2c               /cache3         ufs     rw              0       3
/dev/da3c               /cache4         ufs     rw              0       3
/dev/da4c               /cache5         ufs     rw              0       3
/dev/da5c               /cache6         ufs     rw              0       3

I gather that 'FFS no' is related to filesystem vnodes.  I wonder
if I'm running out of vnodes, or if they're not getting freed?
Here's systat after the benchmark has been running for 20 minutes:

    2 users    Load  1.23  1.16  0.90                  Fri Jun 23 18:10

Mem:KB    REAL            VIRTUAL                     VN PAGER  SWAP PAGER
        Tot   Share      Tot    Share    Free         in  out     in  out
Act   68748    1064    74260     3860   21700 count
All  513584    1440  3026816     4408         pages    
                                                       53 cow    Interrupts
Proc:r  p  d  s  w    Csw  Trp  Sys  Int  Sof  Flt     75 zfod   2377 total
     1     7  4       527  297 6247 2377  268  152  50408 wire    100 clk0 irq0
                                                    71716 act     128 rtc0 irq8
45.3%Sys   6.3%Intr 47.1%User  0.0%Nice  1.2%Idl   370040 inact  1864 pci irq10
|    |    |    |    |    |    |    |    |    |      20564 cache   266 pci irq11
=======================+++>>>>>>>>>>>>>>>>>>>>>>     1136 free        fdc0 irq6
                                                          daefr    19 wdc0 irq14
Namei         Name-cache    Dir-cache                 106 prcfr
    Calls     hits    %     hits    %                     react
     1920     1529   80        6    0                     pdwake
                                                      525 pdpgs
Discs   da0   da1   da2   da3   da4   da5   wd0           intrn
KB/t   7.18  7.38  8.13  8.26 10.58  8.03  8.00      8342 buf
tps      55    57    50    40    18    47     1       202 dirtybuf
MB/s   0.39  0.41  0.39  0.32  0.18  0.37  0.01     32702 desiredvnodes
% busy   25    24    25    16    12    21     0     55047 numvnodes
                                                       24 freevnodes

>How-To-Repeat:
Install the latest squid-2.4 code and pummel it with a
Web Polygraph workload.
>Fix:


>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->feedback 
State-Changed-By: mike 
State-Changed-When: Sat Jul 21 21:24:15 PDT 2001 
State-Changed-Why:  

Does this problem still occur in newer versions of FreeBSD, 
such as 4.3-RELEASE? 

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=19479 

From: Mike Barcroft <mike@FreeBSD.org>
To: freebsd-gnats-submit@FreeBSD.org
Cc:  
Subject: Re: kern/19479: processes stuck in 'ffsvgt' and 'FFS no' states
Date: Sun, 22 Jul 2001 13:32:45 -0400

 Adding to Audit-Trail.
 
 On Sun, Jul 22, 2001 at 05:56:16AM -0600, Duane Wessels wrote:
 > 
 > 
 > On Sat, 21 Jul 2001 mike@FreeBSD.org wrote:
 > 
 > > Synopsis: processes stuck in 'ffsvgt' and 'FFS no' states
 > > 
 > > State-Changed-From-To: open->feedback
 > > State-Changed-By: mike
 > > State-Changed-When: Sat Jul 21 21:24:15 PDT 2001
 > > State-Changed-Why: 
 > > 
 > > Does this problem still occur in newer versions of FreeBSD,
 > > such as 4.3-RELEASE?
 > > 
 > > http://www.FreeBSD.org/cgi/query-pr.cgi?pr=19479
 > 
 > Nope.  I've only seen this in 3.x versions.  I haven't checked any
 > recent 3.x versions.  All my servers run 4.x now.  Back then, it
 > was really easy to reproduce the problem, however.
 > 
 > 
State-Changed-From-To: feedback->closed 
State-Changed-By: mike 
State-Changed-When: Sun Jul 22 10:38:06 PDT 2001 
State-Changed-Why:  

This is no longer a problem in newer versions of FreeBSD. 

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=19479 

From: Mike Barcroft <mike@FreeBSD.org>
To: freebsd-gnats-submit@FreeBSD.org
Cc:  
Subject: Re: kern/19479: processes stuck in 'ffsvgt' and 'FFS no' states
Date: Fri, 21 Sep 2001 23:07:13 -0400

 Adding to Audit-Trail.
 
 ----- Forwarded message from Chuck Cranor <chuck@research.att.com> -----
 
 Delivered-To: mike@freebsd.org
 Date: Fri, 21 Sep 2001 13:41:34 -0400
 From: Chuck Cranor <chuck@research.att.com>
 To: freebsd-stable@FreeBSD.org
 Cc: mike@FreeBSD.org
 Subject: more on 4.4-RELEASE kernel deadlock ("FFS node")
 User-Agent: Mutt/1.2.5i
 Organization: AT&T Labs-Research
 
 this is a followup to my previous message:
 
 first, the problem i am seeing is almost identical to the one reported
 in kern/19479 (see http://www.FreeBSD.org/cgi/query-pr.cgi?pr=19479) 
 which was closed by mike@freebsd back in july.   my data would suggest
 that the PR should be re-opened, because it doesn't seem fixed?
 
 
 i reran a "make release" while having a "vmstat -m | grep 'FFS node'"
 running every three seconds in another window.  the memory allocation
 for "FFS node" is exploding during the "make release"... here is the
 last bit of vmstat -m output before the kernel deadlocked:
 
 Memory statistics by type                                 Type  Kern
         Type   InUse   MemUse  HighUse    Limit Requests Limit Limit Size(s)
 ....
     FFS node  199352   99676K   99677K  102400K   648929     0     0 512
     FFS node  199871   99936K   99936K  102400K   649930     0     0 512
     FFS node  200457  100229K  100229K  102400K   651733     0     0 512
     FFS node  200794  100397K  100398K  102400K   653082     0     0 512
     FFS node  200875  100438K  100440K  102400K   653902     0     0 512
     FFS node  201466  100733K  100733K  102400K   655886     0     0 512
     FFS node  202732  101366K  101367K  102400K   658611     0     0 512
     FFS node  203300  101650K  101651K  102400K   660008     0     0 512
     FFS node  203851  101926K  101926K  102400K   661718     0     0 512
     FFS node  203923  101962K  101962K  102400K   661927     0     0 512
     FFS node  204489  102245K  102245K  102400K   663650     0     0 512
 <deadlock here!>
 
 102245K seems like an awful lot of memory for FFS nodes?
 what would cause this?   does anyone else see very large FFS node
 memory allocation during "make release"?
 
 i got a stack trace of the process stuck in "FFS node"... it is
 basically:
 	open
 	vn_open
 	namei
 	lookup
 	ufs_vnoperate
 	vfs_cache_lookup
 	ufs_vnoperate
 	ufs_lookup
 	ffs_vget +0x8b
 	malloc + 0x1a3
 	tsleep
 
 clearly we are reaching the malloc limit for "FFS node" and hanging.
 
 
 chuck
 
 ----- End forwarded message -----
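
As a rough cross-check of the figures in the forwarded message, the
sketch below recomputes MemUse from the InUse and Size(s) columns of the
last vmstat -m sample and estimates how many more "FFS node" allocations
fit under the limit.  The constants are copied from that sample; the
helper name and the round-up-to-KB convention are assumptions made for
illustration, not taken from vmstat itself.

```python
# Figures copied from the last "FFS node" line before the deadlock.
IN_USE = 204489        # InUse column
BUCKET_BYTES = 512     # Size(s) column: bytes occupied by each allocation
LIMIT_KB = 102400      # Limit column for the "FFS node" malloc type


def mem_use_kb(in_use, bucket_bytes):
    """Memory held by in_use allocations, in KB, rounded up (assumed rounding)."""
    return -(-in_use * bucket_bytes // 1024)


used_kb = mem_use_kb(IN_USE, BUCKET_BYTES)
headroom_kb = LIMIT_KB - used_kb
remaining_allocs = headroom_kb * 1024 // BUCKET_BYTES

print(used_kb, headroom_kb, remaining_allocs)
```

Under these assumptions the computed usage matches the reported 102245K,
leaving room for only a few hundred more allocations.  InUse grew by 566
between the last two samples, so the limit would plausibly be reached
within one more three-second interval, which is consistent with the
malloc/tsleep stack trace.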
State-Changed-From-To: closed->open 
State-Changed-By: mike 
State-Changed-When: Fri Sep 21 20:12:45 PDT 2001 
State-Changed-Why:  

The originator's problem disappeared, but unfortunately another user 
is seeing a similar problem. 

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=19479 
State-Changed-From-To: open->feedback 
State-Changed-By: iedowse 
State-Changed-When: Sun Dec 2 13:26:17 PST 2001 
State-Changed-Why:  

This problem should be fixed in recent -STABLE versions. One problem 
was that I had accidentally pushed the size of the inode struct 
over 256 bytes, but a bigger issue relating to limiting the number 
of vnodes was fixed by Matt Dillon about a month ago. Can you confirm 
that this is fixed now? 

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=19479 
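
To illustrate why an inode struct just over 256 bytes matters, here is a
minimal sketch that assumes the kernel malloc rounds each request up to a
power-of-two bucket, which is consistent with the 512 shown in the Size(s)
column earlier in this PR.  The function name, the 16-byte minimum bucket
and the 260-byte struct size are hypothetical values chosen for
illustration only.

```python
def bucket_size(nbytes, min_bucket=16):
    """Smallest power-of-two bucket that holds nbytes (assumed allocator model)."""
    size = min_bucket
    while size < nbytes:
        size *= 2
    return size


NODE_COUNT = 204489  # InUse from the last vmstat -m sample quoted in this PR

for struct_bytes in (256, 260):  # 260 is a hypothetical "just over 256" size
    per_alloc = bucket_size(struct_bytes)
    total_kb = NODE_COUNT * per_alloc // 1024
    print(struct_bytes, per_alloc, total_kb)
```

In this model a struct of exactly 256 bytes fits a 256-byte bucket, while
anything slightly larger takes a 512-byte bucket, roughly doubling the
memory charged to "FFS node" for the same number of inodes.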
State-Changed-From-To: feedback->closed 
State-Changed-By: iedowse 
State-Changed-When: Sun Jun 2 04:11:39 PDT 2002 
State-Changed-Why:  

Feedback timeout. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=19479 
>Unformatted:
