From martin@email.aon.at  Sat Nov 10 08:06:36 2007
Return-Path: <martin@email.aon.at>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DBAA716A417
	for <FreeBSD-gnats-submit@freebsd.org>; Sat, 10 Nov 2007 08:06:36 +0000 (UTC)
	(envelope-from martin@email.aon.at)
Received: from email.aon.at (nat-warsl417-01.aon.at [195.3.96.119])
	by mx1.freebsd.org (Postfix) with ESMTP id 3F75613C4A3
	for <FreeBSD-gnats-submit@freebsd.org>; Sat, 10 Nov 2007 08:06:36 +0000 (UTC)
	(envelope-from martin@email.aon.at)
Received: (qmail 29779 invoked from network); 10 Nov 2007 07:39:43 -0000
Received: from unknown (HELO email.aon.at) ([172.18.5.236])
          (envelope-sender <martin@email.aon.at>)
          by fallback01.highway.telekom.at (qmail-ldap-1.03) with SMTP
          for <FreeBSD-gnats-submit@freebsd.org>; 10 Nov 2007 07:39:43 -0000
Received: (qmail 32604 invoked from network); 10 Nov 2007 07:39:26 -0000
Received: from m1153p024.adsl.highway.telekom.at (HELO gandalf.xyzzy) ([80.121.16.24])
          (envelope-sender <martin@email.aon.at>)
          by smarthub94.highway.telekom.at (qmail-ldap-1.03) with SMTP
          for <FreeBSD-gnats-submit@freebsd.org>; 10 Nov 2007 07:39:26 -0000
Received: from gandalf.xyzzy (localhost.xyzzy [127.0.0.1])
	by gandalf.xyzzy (8.13.8/8.13.8) with ESMTP id lAA7dPaD003951
	for <FreeBSD-gnats-submit@freebsd.org>; Sat, 10 Nov 2007 08:39:25 +0100 (CET)
	(envelope-from martin@gandalf.xyzzy)
Received: (from martin@localhost)
	by gandalf.xyzzy (8.13.8/8.13.8/Submit) id lAA7dOAa003950;
	Sat, 10 Nov 2007 08:39:24 +0100 (CET)
	(envelope-from martin)
Message-Id: <200711100739.lAA7dOAa003950@gandalf.xyzzy>
Date: Sat, 10 Nov 2007 08:39:24 +0100 (CET)
From: Martin Birgmeier <martin@nowhere.com>
Reply-To: Martin Birgmeier <martin@nowhere.com>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: dirhash on very large directories blocks the machine for tens of seconds
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         117954
>Category:       kern
>Synopsis:       [ufs] dirhash on very large directories blocks the machine for tens of seconds
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-fs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Nov 10 08:10:00 UTC 2007
>Closed-Date:    
>Last-Modified:  Tue Sep 07 00:22:40 UTC 2010
>Originator:     Martin Birgmeier
>Release:        FreeBSD 6.2-RELEASE i386
>Organization:
MBi at home
>Environment:
System: FreeBSD gandalf.xyzzy 6.2-RELEASE FreeBSD 6.2-RELEASE #0: Sat Jan 13 20:23:55 CET 2007 root@gandalf.xyzzy:/d/14.1/OBJ/FreeBSD/RELENG_6_2_0_RELEASE/src/sys/XYZZY i386

This machine is about 7 years old, with an Athlon 800 MHz processor. The disks can sustain about 10 Mbyte/sec both read and write.

>Description:
	I am mirroring the KDE subversion repository via rsync. KDE is currently at rev. 734839, meaning that there are two subdirectories (revs and revprops) holding 734840 files each. For this to work at all, I have enabled dirhash and set the hashing area to 32 MB via vfs.ufs.dirhash_maxmem=33554432 in sysctl.conf.

	The problem is that whenever the hash is built (i.e., when these directories are accessed again after not having been held in the kernel for some time), the dirhash code reads them in and, in doing so, consumes a lot of processor time (my xload jumps to 8+ all at once) and, as far as I can make out in such a situation, all (or at least most) of the available disk bandwidth.
	On my machine the behavior is so bad that the X Window System freezes completely (including the cursor) for about a minute. (In fact it is more like 2 x 30 secs, evidently once for each of the two directories involved.) The xload spike only becomes visible after this. Also, as I am using pppoa (basically ADSL over USB), the buffers allotted to it are exhausted, as shown by log messages on the console. To me this looks as if even interrupts are no longer being serviced.

>How-To-Repeat:
	Enter a directory with more than 250,000 entries after it has not been accessed for a long time.
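
	The step above could be scripted roughly as follows. This is only a sketch under assumptions: the function names and the command-line interface are made up for illustration, the target directory is assumed to live on a UFS file system, and the lookup has to be run only once the directory is no longer held in the kernel (for example after a reboot), not right after populating it.

```python
import os
import sys
import time


def populate(directory, count):
    """Create `count` empty files named 0..count-1 in `directory`."""
    os.makedirs(directory, exist_ok=True)
    for i in range(count):
        # Creating and immediately closing the file leaves an empty entry.
        path = os.path.join(directory, str(i))
        fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
        os.close(fd)


def time_lookup(directory, name):
    """Return the wall-clock seconds taken by one stat() of `name`."""
    start = time.monotonic()
    os.stat(os.path.join(directory, name))
    return time.monotonic() - start


if __name__ == "__main__":
    # Hypothetical usage:
    #   script.py populate DIR COUNT
    #   script.py lookup DIR NAME
    mode, directory, arg = sys.argv[1:4]
    if mode == "populate":
        populate(directory, int(arg))
    else:
        print("lookup took %.3f s" % time_lookup(directory, arg))
```

	Populating with a count above 250000 and timing a single lookup later, while watching xload, should show whether the stall described above occurs.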

>Fix:
	I assume that the fix involves modifying the dirhash algorithm such that it obeys standard process scheduling behavior, esp. with regard to relinquishing the CPU according to the process' scheduling parameters.
	This probably means that the syscall in question can no longer be implemented as a single atomic operation (which it currently seems to be).
	Since I am no expert in this area, please take those ideas with a grain of salt!
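
	To illustrate the idea only: the following user-space model indexes a directory in bounded batches and pauses between them. It is not the kernel's dirhash code; the function name, the batch size and the use of a zero-length sleep as a stand-in for a yield point are all assumptions made for this sketch.

```python
import os
import time


def build_index(directory, batch_size=1024, pause=lambda: time.sleep(0)):
    """Map each entry name to its inode number, pausing every `batch_size` entries.

    `pause` is a hypothetical hook standing in for "give up the CPU here".
    """
    index = {}
    with os.scandir(directory) as entries:
        for count, entry in enumerate(entries, start=1):
            index[entry.name] = entry.inode()
            if count % batch_size == 0:
                # Yield point: other work gets a chance to run before
                # the next batch of entries is processed.
                pause()
    return index


if __name__ == "__main__":
    print(len(build_index(".")), "entries indexed")
```

	The point of the model is merely that the work is split into pieces with a scheduling opportunity between them; whether that is feasible inside the kernel is a separate question.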

Please note that the e-mail address given above is not valid, as I am paranoid about spam. Simply reply by adding to the PR; I'll monitor it regularly.
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->iedowse 
Responsible-Changed-By: rwatson 
Responsible-Changed-When: Fri Mar 7 18:29:08 UTC 2008 
Responsible-Changed-Why:  
Over to Ian, who wrote UFS dirhash. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=117954 
Responsible-Changed-From-To: iedowse->freebsd-bugs 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Thu May 28 22:08:53 UTC 2009 
Responsible-Changed-Why:  
iedowse is not actively working on this problem ATM. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=117954 

From: John Baldwin <jhb@freebsd.org>
To: bug-followup@freebsd.org,
 martin@nowhere.com
Cc:  
Subject: Re: kern/117954: [ufs] dirhash on very large directories blocks the machine for tens of seconds
Date: Thu, 1 Apr 2010 09:01:57 -0400

 While the kernel scheduler will not preempt a thread in the kernel (e.g. 
 during a system call) if a timeslice expires, it will preempt that thread for 
 interrupts (assuming you have 'options PREEMPTION' enabled which has been on 
 by default in GENERIC for some time now on i386), thus the dirhash 
 calculations should not starve interrupts.  However, X is not an interrupt, so 
 while things like ping should still work, X will not get to run.
 
 While it would be tempting to defer the hashing of the directory contents to 
 an asynchronous task for large directories running in a thread with a low 
 priority, this might have bad side effects due to priority inversions related 
 to a very low priority thread holding various vnode locks.
 
 -- 
 John Baldwin
Responsible-Changed-From-To: freebsd-bugs->freebsd-fs 
Responsible-Changed-By: arundel 
Responsible-Changed-When: Tue Sep 7 00:21:40 UTC 2010 
Responsible-Changed-Why:  
Over to maintainer(s). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=117954 
>Unformatted:
