From vova@express.ru  Mon Dec 10 06:07:48 2001
Return-Path: <vova@express.ru>
Received: from vbook.express.ru (asplinux.ru [195.133.213.194])
	by hub.freebsd.org (Postfix) with ESMTP id 33A4537B41C
	for <FreeBSD-gnats-submit@freebsd.org>; Mon, 10 Dec 2001 06:07:47 -0800 (PST)
Received: from vova by vbook.express.ru with local (Exim 3.31 #2)
	id 16DR68-0000tT-00
	for FreeBSD-gnats-submit@freebsd.org; Mon, 10 Dec 2001 17:08:00 +0300
Message-Id: <E16DR68-0000tT-00@vbook.express.ru>
Date: Mon, 10 Dec 2001 17:08:00 +0300
From: Vladimir B.Grebenschikov <vova@express.ru>
Reply-To: Vladimir B.Grebenschikov <vova@express.ru>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: Invalid FFS node allocation algorithm on systems with a lot of memory and lots of small files accessed
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         32672
>Category:       kern
>Synopsis:       Invalid FFS node allocation algorithm on systems with a lot of memory and lots of small files accessed
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    dillon
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Dec 10 06:10:00 PST 2001
>Closed-Date:    Sun Dec 30 11:35:10 PST 2001
>Last-Modified:  Sun Dec 30 11:39:31 PST 2001
>Originator:     Vladimir B. Grebenschikov
>Release:        FreeBSD 4.4-RELEASE i386
>Organization:
SW Soft
>Environment:
FreeBSD vrebuild 4.4-RELEASE FreeBSD 4.4-RELEASE #4: Mon Dec 10 15:23:49 GMT 2001 root@vrebuild:/usr/src/sys/compile/VREBUILD  i386
maxusers        512
(tried both with UFS_DIRHASH and without UFS_DIRHASH, with SOFTUPDATES and without SOFTUPDATES)

System: 2 GB RAM, 2 x 800 MHz CPUs:

CPU: Pentium III/Pentium III Xeon/Celeron (803.41-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x68a  Stepping = 10

Features=0x387fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,PN,MMX,FXSR,SSE>
real memory  = 2147483648 (2097152K bytes)

avail memory = 2087796736 (2038864K bytes)
Programming 16 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
Programming 16 pins in IOAPIC #1
FreeBSD/SMP: Multiprocessor motherboard
 cpu0 (BSP): apic id:  0, version: 0x00040011, at 0xfee00000
 cpu1 (AP):  apic id:  1, version: 0x00040011, at 0xfee00000
 io0 (APIC): apic id:  4, version: 0x000f0011, at 0xfec00000
 io1 (APIC): apic id:  5, version: 0x000f0011, at 0xfec01000

>Description:

On a system with a lot of memory, a workload that touches many small files
('make release' in my case) can hit the limit on M_FFSNODE (inode) objects,
after which the system deadlocks in ufs/ffs/ffs_vfsops.c:ffs_vget():

==============================================================================
        /*
         * Lock out the creation of new entries in the FFS hash table in
         * case getnewvnode() or MALLOC() blocks, otherwise a duplicate 
         * may occur!
         */
        if (ffs_inode_hash_lock) {
                while (ffs_inode_hash_lock) {
                        ffs_inode_hash_lock = -1;
                        tsleep(&ffs_inode_hash_lock, PVM, "ffsvgt", 0);
                }
                goto restart;
        }
        ffs_inode_hash_lock = 1;

        /*
         * If this MALLOC() is performed after the getnewvnode()
         * it might block, leaving a vnode with a NULL v_data to be
         * found by ffs_sync() if a sync happens to fire right then,
         * which will cause a panic because ffs_sync() blindly
         * dereferences vp->v_data (as well it should).
         */
        MALLOC(ip, struct inode *, sizeof(struct inode),
            ump->um_malloctype, M_WAITOK);
=========================================================================


One process sleeps on "FFS Node" (inside MALLOC in the code above) because the
maximum of M_FFSNODE objects has been reached (0x6400000 on my system). In my
case it was a 'cvs checkout' run by the make release scripts.

Every other process that tries to access the disk then blocks on "ffsvgt",
because ffs_inode_hash_lock is held by cvs.

So some comments:

1st: I think the order of the lock and the MALLOC in ffs_vget() needs to be
changed to avoid the deadlock (do the MALLOC first, then take
ffs_inode_hash_lock).

2nd: Something needs to happen when the limit on allocated ffsnode objects is
reached (it defaults to vm_kmem_size/2), such as freeing some cached objects.
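
The reordering suggested in the first comment can be sketched in userland C.
This is only a model under stated assumptions, not kernel code and not a
tested patch: struct node, hash_lookup() and node_get() are hypothetical names,
a pthread mutex stands in for ffs_inode_hash_lock, and the vnode side
(getnewvnode()) is left out entirely. The point it illustrates is that the
allocation happens with no lock held, and the hash is checked again after the
lock is taken so that a duplicate created in the meantime is detected.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NBUCKETS 64

/* Hypothetical stand-in for struct inode; only what the sketch needs. */
struct node {
	unsigned long ino;
	struct node *next;
};

static struct node *buckets[NBUCKETS];
/* Plays the role of ffs_inode_hash_lock in this model. */
static pthread_mutex_t hash_lock = PTHREAD_MUTEX_INITIALIZER;

/* Look up ino in the hash. The caller must hold hash_lock. */
static struct node *
hash_lookup(unsigned long ino)
{
	struct node *np;

	for (np = buckets[ino % NBUCKETS]; np != NULL; np = np->next)
		if (np->ino == ino)
			return (np);
	return (NULL);
}

/*
 * Return the node for ino, creating it if needed.
 * Returns NULL only if the allocation fails.
 */
struct node *
node_get(unsigned long ino)
{
	struct node *np, *fresh;

	/* Fast path: the node may already be in the hash. */
	pthread_mutex_lock(&hash_lock);
	np = hash_lookup(ino);
	pthread_mutex_unlock(&hash_lock);
	if (np != NULL)
		return (np);

	/*
	 * Allocate with no lock held. If this stalls, other callers
	 * are still free to take hash_lock and make progress.
	 */
	fresh = malloc(sizeof(*fresh));
	if (fresh == NULL)
		return (NULL);
	fresh->ino = ino;

	/* Re-check under the lock: someone may have inserted ino meanwhile. */
	pthread_mutex_lock(&hash_lock);
	np = hash_lookup(ino);
	if (np == NULL) {
		fresh->next = buckets[ino % NBUCKETS];
		buckets[ino % NBUCKETS] = fresh;
		np = fresh;
		fresh = NULL;
	}
	pthread_mutex_unlock(&hash_lock);

	free(fresh);	/* Lost the race: drop our copy (free(NULL) is a no-op). */
	return (np);
}
```

The cost of this ordering is an occasional wasted allocation when two callers
race for the same inode number; in exchange, a blocked allocation no longer
stalls every other lookup.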

>How-To-Repeat:

Take a system with 2 GB of RAM and run 'make release' (with ports and docs).

>Fix:

See above
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->dillon 
Responsible-Changed-By: sheldonh 
Responsible-Changed-When: Sun Dec 30 04:24:19 PST 2001 
Responsible-Changed-Why:  
FFS and lots of memory -- looks like Matt's field. :-) 

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=32672 
State-Changed-From-To: open->closed 
State-Changed-By: dillon 
State-Changed-When: Sun Dec 30 11:35:10 PST 2001 
State-Changed-Why:  
This is believed to be fixed in -stable (and thus for the upcoming 4.5 
release).  The problem was that the vnode/inode reclamation system depends 
on the VM system running out of memory and having to free vnodes/inodes up. 
Machines with large amounts of RAM, however, will often run the malloc 
bucket for vnodes or inodes out before they run out of memory. 

Our solution is to enforce the kern.maxvnodes limit by proactively reclaiming 
vnodes/inodes when the limit is reached, even if there is still lots of free 
memory. 
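
The idea of enforcing a limit by reclaiming proactively can be illustrated with
a small userland C model. It is a sketch under assumptions and does not show
how the actual -stable change is implemented: struct cnode, struct ccache and
the cache_*() functions are hypothetical, the max field merely plays the role
of kern.maxvnodes, and returning NULL when nothing can be reclaimed is a
simplification chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cached object standing in for a vnode/inode pair. */
struct cnode {
	unsigned long id;
	int refs;			/* > 0 while in use, 0 = reclaimable */
	struct cnode *prev, *next;	/* list of all nodes, oldest first */
};

struct ccache {
	struct cnode *head, *tail;	/* head is the oldest node */
	size_t count;			/* nodes currently allocated */
	size_t max;			/* plays the role of kern.maxvnodes */
};

static void
unlink_node(struct ccache *c, struct cnode *n)
{
	if (n->prev != NULL)
		n->prev->next = n->next;
	else
		c->head = n->next;
	if (n->next != NULL)
		n->next->prev = n->prev;
	else
		c->tail = n->prev;
}

static void
append_node(struct ccache *c, struct cnode *n)
{
	n->next = NULL;
	n->prev = c->tail;
	if (c->tail != NULL)
		c->tail->next = n;
	else
		c->head = n;
	c->tail = n;
}

/* Free the oldest unreferenced node. Returns 1 if one was freed, else 0. */
static int
cache_reclaim_one(struct ccache *c)
{
	struct cnode *n;

	for (n = c->head; n != NULL; n = n->next) {
		if (n->refs == 0) {
			unlink_node(c, n);
			free(n);
			c->count--;
			return (1);
		}
	}
	return (0);
}

/*
 * Allocate a new referenced node. At the limit, reclaim an idle node
 * first, regardless of how much memory is still free. Returns NULL if
 * every node is in use or if malloc() fails.
 */
struct cnode *
cache_alloc(struct ccache *c, unsigned long id)
{
	struct cnode *n;

	if (c->count >= c->max && !cache_reclaim_one(c))
		return (NULL);
	n = malloc(sizeof(*n));
	if (n == NULL)
		return (NULL);
	n->id = id;
	n->refs = 1;
	append_node(c, n);
	c->count++;
	return (n);
}

/* Drop a reference; the node stays cached until it is reclaimed. */
void
cache_release(struct cnode *n)
{
	assert(n->refs > 0);
	n->refs--;
}
```

With max set to 2, a third cache_alloc() fails while both nodes are referenced
and succeeds once one of them has been released, because the idle node is
reclaimed to make room.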


>Unformatted:
