From peterjeremy@acm.org  Wed Mar  3 11:20:02 2010
Return-Path: <peterjeremy@acm.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5A03C1065670
	for <FreeBSD-gnats-submit@freebsd.org>; Wed,  3 Mar 2010 11:20:02 +0000 (UTC)
	(envelope-from peterjeremy@acm.org)
Received: from mail15.syd.optusnet.com.au (mail15.syd.optusnet.com.au [211.29.132.196])
	by mx1.freebsd.org (Postfix) with ESMTP id B0BDE8FC37
	for <FreeBSD-gnats-submit@freebsd.org>; Wed,  3 Mar 2010 11:20:01 +0000 (UTC)
Received: from server.vk2pj.dyndns.org (c122-106-253-149.belrs3.nsw.optusnet.com.au [122.106.253.149])
	by mail15.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id o23BJoms000494
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
	for <FreeBSD-gnats-submit@freebsd.org>; Wed, 3 Mar 2010 22:19:58 +1100
Received: from server.vk2pj.dyndns.org (localhost.vk2pj.dyndns.org [127.0.0.1])
	by server.vk2pj.dyndns.org (8.14.3/8.14.3) with ESMTP id o23BJjEN083245;
	Wed, 3 Mar 2010 22:19:45 +1100 (EST)
	(envelope-from peter@server.vk2pj.dyndns.org)
Received: (from peter@localhost)
	by server.vk2pj.dyndns.org (8.14.3/8.14.3/Submit) id o23BJjqr083244;
	Wed, 3 Mar 2010 22:19:45 +1100 (EST)
	(envelope-from peter)
Message-Id: <201003031119.o23BJjqr083244@server.vk2pj.dyndns.org>
Date: Wed, 3 Mar 2010 22:19:45 +1100 (EST)
From: Peter Jeremy <peterjeremy@acm.org>
Reply-To: Peter Jeremy <peterjeremy@acm.org>
To: FreeBSD-gnats-submit@freebsd.org
Subject: [patch] db(3) fails with large block sizes
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         144446
>Category:       bin
>Synopsis:       [patch] db(3) fails with large block sizes
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    avg
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Mar 03 11:30:01 UTC 2010
>Closed-Date:    Sat Aug 21 16:09:46 UTC 2010
>Last-Modified:  Sat Aug 21 16:09:46 UTC 2010
>Originator:     Peter Jeremy
>Release:        FreeBSD 8.0-STABLE amd64
>Organization:
Alcatel-Lucent Australia
>Environment:
System: FreeBSD server.vk2pj.dyndns.org 8.0-STABLE FreeBSD 8.0-STABLE #1: Wed Jan 27 06:55:10 EST 2010 root@server.vk2pj.dyndns.org:/var/obj/usr/src/sys/server amd64

>Description:
	Whilst trying to port db(3) to a Solaris system, I have identified
	two issues with the existing hash(3) code, plus an error in the
	hash(3) man page.

	Firstly, when creating a new hash database, the bucket size
	defaults to the st_blksize of the file (hash/hash.c::init_hash()).
	There is no sanity check to ensure that st_blksize is within
	valid limits (hash/hash.h defines MAX_BSIZE as 65536).

	In FreeBSD, st_blksize is currently hardwired to PAGE_SIZE in
	kern/vfs_vnops.c::vn_stat(), so this is purely a theoretical
	issue on FreeBSD.  Solaris exposes the block size of the
	underlying filesystem; in the case of ZFS this is 128KB,
	which exceeds MAX_BSIZE.  In my case, the symptom was that
	when sequentially reading the database (via DB->seq()), the
	returned keys were padded with 64KB of NULs.
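	To see which default init_hash() would pick on a given
	filesystem, one can stat(2) a file there and inspect st_blksize.
	The following is a minimal sketch, not code from db(3):
	probe_blksize() and exceeds_max_bsize() are hypothetical helpers
	written for this illustration, and SKETCH_OLD_MAX_BSIZE simply
	mirrors the 65536 limit in hash/hash.h.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Mirrors MAX_BSIZE in hash/hash.h before r206178 (2^16). */
#define	SKETCH_OLD_MAX_BSIZE	65536L

/*
 * Hypothetical helper: return the st_blksize that init_hash() would
 * adopt as the default bucket size for PATH, or -1 if stat(2) fails.
 */
long
probe_blksize(const char *path)
{
	struct stat sb;

	if (stat(path, &sb) != 0)
		return (-1);
	return ((long)sb.st_blksize);
}

/* Hypothetical helper: nonzero if BLKSIZE exceeds the old MAX_BSIZE. */
int
exceeds_max_bsize(long blksize)
{
	return (blksize > SKETCH_OLD_MAX_BSIZE);
}
```

	On a ZFS dataset reporting 128KB blocks, as described above,
	exceeds_max_bsize(probe_blksize(path)) would be expected to be
	nonzero; with st_blksize fixed at PAGE_SIZE it should not be.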

	Secondly, when the bucket size is set to 64KB (MAX_BSIZE),
	non-trivial databases crash, reporting:
	    "HASH: Out of overflow pages.  Increase page size"
	It's not clear what triggers this.
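	One hypothesis, which this report does not confirm, is that the
	hash page layout stores sizes and offsets in 16-bit fields.  If
	so, 65536 is the first power-of-two bucket size whose byte count
	no longer fits, and a field initialised to BSIZE would silently
	wrap to 0.  The wrap-around itself is plain C arithmetic, shown
	here with hypothetical helpers store_as_u16() and fits_in_u16():

```c
#include <stdint.h>

/*
 * Hypothetical helper: store a bucket size in a 16-bit field, as a
 * page header using 16-bit offsets would (an assumed layout).
 */
uint16_t
store_as_u16(uint32_t bsize)
{
	/* Unsigned conversion to uint16_t is reduced modulo 2^16. */
	return ((uint16_t)bsize);
}

/* Hypothetical helper: nonzero if BSIZE survives a 16-bit round trip. */
int
fits_in_u16(uint32_t bsize)
{
	return ((uint32_t)store_as_u16(bsize) == bsize);
}
```

	This would be consistent with every bucket size up to 32768
	working in test20, but hash/hash_page.c would need to be checked
	before treating it as the actual cause.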

	Thirdly, whilst writing this PR, I've noticed that hash(3)
	states that the default hash table bucket size is 256 bytes.
	The actual default (as per DEF_BUCKET_SIZE in hash/hash.h)
	is 4096 bytes.

>How-To-Repeat:
	The first problem can't be reproduced on FreeBSD, but it occurs
	on at least Solaris and OpenSolaris with ZFS.

	The second problem can be reproduced by extending the db
	test tool (lib/libc/db/test/run.test, test20) to include
	a bucket size of 65536.  Based on the progression of
	fill factors in test20, the logical fill factors are
	2735, 3647 and 5471; however, the test fails with each of
	these values as well as with 8, 341 and 10001.  All other
	bucket sizes appear to work, which suggests there is a
	problem with this particular bucket size.

>Fix:
	The first patch adds a check for an excessive st_blksize.  Note
	that this assumes that st_blksize is a power of 2.  It might be
	reasonable to also verify that st_blksize isn't too small, but
	I'm not sure what the lower limit is.

Index: hash/hash.c
===================================================================
RCS file: /usr/ncvs/src/lib/libc/db/hash/hash.c,v
retrieving revision 1.21.2.2
diff -u -r1.21.2.2 hash.c
--- hash/hash.c	28 Aug 2009 19:48:06 -0000	1.21.2.2
+++ hash/hash.c	3 Mar 2010 09:39:45 -0000
@@ -293,6 +293,8 @@
 		if (stat(file, &statbuf))
 			return (NULL);
 		hashp->BSIZE = statbuf.st_blksize;
+		if (hashp->BSIZE > MAX_BSIZE)
+			hashp->BSIZE = MAX_BSIZE;
 		hashp->BSHIFT = __log2(hashp->BSIZE);
 	}
 

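	The first patch only caps st_blksize from above and, as noted,
	assumes it is a power of two: BSHIFT is derived with __log2(),
	which (assuming it rounds up, as the libc helper appears to)
	would leave 1 << BSHIFT != BSIZE for any other value.  A stricter
	check could also enforce a lower bound and round to a power of
	two.  This sketch is not part of the committed fix; sane_bsize(),
	ceil_log2() and the 256-byte SKETCH_MIN_BSIZE are hypothetical,
	and SKETCH_MAX_BSIZE mirrors the patched MAX_BSIZE of 32768.

```c
#include <stdint.h>

/* Mirrors MAX_BSIZE from hash/hash.h as patched. */
#define	SKETCH_MAX_BSIZE	32768u
/* Hypothetical lower bound; the real minimum is not established here. */
#define	SKETCH_MIN_BSIZE	256u

/* Ceiling log2, assumed to behave like libc's __log2(). */
uint32_t
ceil_log2(uint32_t num)
{
	uint32_t i = 0, limit = 1;

	while (limit < num) {
		limit <<= 1;
		i++;
	}
	return (i);
}

/* Turn a raw st_blksize into a bounded power-of-two bucket size. */
uint32_t
sane_bsize(uint32_t blksize)
{
	uint32_t bsize;

	if (blksize < SKETCH_MIN_BSIZE)
		return (SKETCH_MIN_BSIZE);
	if (blksize > SKETCH_MAX_BSIZE)
		return (SKETCH_MAX_BSIZE);
	/* Round up so that (1 << BSHIFT) == BSIZE holds. */
	bsize = (uint32_t)1 << ceil_log2(blksize);
	return (bsize > SKETCH_MAX_BSIZE ? SKETCH_MAX_BSIZE : bsize);
}
```

	Rounding up rather than down is a design choice: it never yields
	a bucket smaller than the filesystem block, at the cost of
	occasionally doubling it.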
	The second patch is simply a workaround for the second problem.
	I'm not certain what the underlying cause is, but this avoids it.

Index: hash/hash.h
===================================================================
RCS file: /usr/ncvs/src/lib/libc/db/hash/hash.h,v
retrieving revision 1.9.2.1
diff -u -r1.9.2.1 hash.h
--- hash/hash.h	3 Aug 2009 08:13:06 -0000	1.9.2.1
+++ hash/hash.h	3 Mar 2010 09:39:45 -0000
@@ -118,7 +118,7 @@
 /*
  * Constants
  */
-#define	MAX_BSIZE		65536		/* 2^16 */
+#define	MAX_BSIZE		32768		/* 2^15 but should be 65536 */
 #define MIN_BUFFERS		6
 #define MINHDRSIZE		512
 #define DEF_BUFSIZE		65536		/* 64 K */


	The following updates the man page to state the actual
	default bucket size, as per DEF_BUCKET_SIZE in hash/hash.h.

Index: man/hash.3
===================================================================
RCS file: /usr/ncvs/src/lib/libc/db/man/hash.3,v
retrieving revision 1.9.10.1
diff -u -r1.9.10.1 hash.3
--- man/hash.3	3 Aug 2009 08:13:06 -0000	1.9.10.1
+++ man/hash.3	3 Mar 2010 09:50:52 -0000
@@ -78,7 +78,7 @@
 element
 defines the
 .Nm
-table bucket size, and is, by default, 256 bytes.
+table bucket size, and is, by default, 4096 bytes.
 It may be preferable to increase the page size for disk-resident tables
 and tables with large data items.
 .It Va ffactor
>Release-Note:
>Audit-Trail:

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: bin/144446: commit references a PR
Date: Mon,  5 Apr 2010 10:02:06 +0000 (UTC)

 Author: avg
 Date: Mon Apr  5 10:01:53 2010
 New Revision: 206177
 URL: http://svn.freebsd.org/changeset/base/206177
 
 Log:
   hash.3: fix a factual mistake in the man page
   
   PR:		bin/144446
   Submitted by:	Peter Jeremy <peterjeremy@acm.org>
   MFC after:	3 days
 
 Modified:
   head/lib/libc/db/man/hash.3
 
 Modified: head/lib/libc/db/man/hash.3
 ==============================================================================
 --- head/lib/libc/db/man/hash.3	Mon Apr  5 09:26:03 2010	(r206176)
 +++ head/lib/libc/db/man/hash.3	Mon Apr  5 10:01:53 2010	(r206177)
 @@ -78,7 +78,7 @@ The
  element
  defines the
  .Nm
 -table bucket size, and is, by default, 256 bytes.
 +table bucket size, and is, by default, 4096 bytes.
  It may be preferable to increase the page size for disk-resident tables
  and tables with large data items.
  .It Va ffactor
 _______________________________________________
 svn-src-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/svn-src-all
 To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: bin/144446: commit references a PR
Date: Mon,  5 Apr 2010 10:12:35 +0000 (UTC)

 Author: avg
 Date: Mon Apr  5 10:12:21 2010
 New Revision: 206178
 URL: http://svn.freebsd.org/changeset/base/206178
 
 Log:
   libc/db/hash: cap auto-tuned block size with a value that actually works
   
   This fix mostly matters after r206129 that made it possible for
   st_blksize to be greater than 4K.  For this reason, this change should
   be MFC-ed before r206129.
   Also, it seems that all FreeBSD utilities that use db(3) hash databases
   and create new databases in files, specify their own block size value
   and thus do not depend on block size autotuning.
   
   PR:		bin/144446
   Submitted by:	Peter Jeremy <peterjeremy@acm.org>
   MFC after:	5 days
 
 Modified:
   head/lib/libc/db/hash/hash.c
   head/lib/libc/db/hash/hash.h
 
 Modified: head/lib/libc/db/hash/hash.c
 ==============================================================================
 --- head/lib/libc/db/hash/hash.c	Mon Apr  5 10:01:53 2010	(r206177)
 +++ head/lib/libc/db/hash/hash.c	Mon Apr  5 10:12:21 2010	(r206178)
 @@ -293,6 +293,8 @@ init_hash(HTAB *hashp, const char *file,
  		if (stat(file, &statbuf))
  			return (NULL);
  		hashp->BSIZE = statbuf.st_blksize;
 +		if (hashp->BSIZE > MAX_BSIZE)
 +			hashp->BSIZE = MAX_BSIZE;
  		hashp->BSHIFT = __log2(hashp->BSIZE);
  	}
  
 
 Modified: head/lib/libc/db/hash/hash.h
 ==============================================================================
 --- head/lib/libc/db/hash/hash.h	Mon Apr  5 10:01:53 2010	(r206177)
 +++ head/lib/libc/db/hash/hash.h	Mon Apr  5 10:12:21 2010	(r206178)
 @@ -118,7 +118,7 @@ typedef struct htab	 {		/* Memory reside
  /*
   * Constants
   */
 -#define	MAX_BSIZE		65536		/* 2^16 */
 +#define	MAX_BSIZE		32768		/* 2^15 but should be 65536 */
  #define MIN_BUFFERS		6
  #define MINHDRSIZE		512
  #define DEF_BUFSIZE		65536		/* 64 K */
 

From: Peter Jeremy <peterjeremy@acm.org>
To: bug-followup@FreeBSD.org, Andriy Gapon <avg@FreeBSD.org>
Cc:  
Subject: Re: bin/144446: [patch] db(3) fails with large block sizes
Date: Tue, 6 Apr 2010 10:09:04 +1000

 Following r206129, FreeBSD no longer has a hardwired st_blksize,
 so the first problem should now be reproducible on FreeBSD.
 
 The scenario where I found the bug was the perl 5.8 DB_File test
 db-hash.t on a ZFS filesystem.  The problem should therefore be
 reproducible on 9-current with r206129 as follows:
 
 1) Set WRKDIRPREFIX to a ZFS filesystem.
 2) cd /usr/ports/lang/perl5.8
 3) make
 4) cd $WRKDIRPREFIX/usr/ports/lang/perl5.8/work/perl-5.8.9/t
 5) ../perl ../ext/DB_File/t/db-hash.t
    (it's possible perl will need to be installed for this to run)
 This should result in a core dump following test 21.
 
 -- 
 Peter Jeremy
 
State-Changed-From-To: open->patched 
State-Changed-By: vwe 
State-Changed-When: Thu Aug 12 20:46:11 UTC 2010 
State-Changed-Why:  
already committed and MFC'ed 


Responsible-Changed-From-To: freebsd-bugs->avg 
Responsible-Changed-By: vwe 
Responsible-Changed-When: Thu Aug 12 20:46:11 UTC 2010 
Responsible-Changed-Why:  
Andriy, it seems to have already been MFC'ed. Can this be closed now? 

http://www.freebsd.org/cgi/query-pr.cgi?pr=144446 
State-Changed-From-To: patched->closed 
State-Changed-By: avg 
State-Changed-When: Sat Aug 21 16:09:23 UTC 2010 
State-Changed-Why:  
Fixed. 

>Unformatted:
