From packet@adrenochrome.nl  Mon Mar  5 22:30:11 2007
Return-Path: <packet@adrenochrome.nl>
Received:
Message-Id: <20070305221240.ECA7C2F65E@roterstern.frankenstein13.de>
Date: Mon,  5 Mar 2007 23:12:40 +0100 (CET)
From: Sebastian Klemke <packet@adrenochrome.nl>
To: FreeBSD-gnats-submit@freebsd.org
Cc: packet@adrenochrome.nl
Subject: unionfs breaks openldap-server23 with bdb back-end
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         109950
>Category:       kern
>Synopsis:       [unionfs] unionfs breaks openldap-server23 with bdb back-end
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    daichi
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Mar 05 22:40:03 GMT 2007
>Closed-Date:    Wed May 07 05:14:54 UTC 2008
>Last-Modified:  Wed May  7 05:20:00 UTC 2008
>Originator:     Sebastian Klemke
>Release:        FreeBSD 6.2-STABLE i386
>Organization:
>Environment:
System: FreeBSD roterstern.frankenstein13.de 6.2-STABLE FreeBSD 6.2-STABLE #0: Sun Mar 4 16:37:31 CET 2007 root@roterstern.frankenstein13.de:/usr/obj/usr/src/sys/ROTERSTERN i386


	src checkout from 2007-03-03, relevant ports:
	db44-4.4.20.4
	libltdl-1.5.22_2
	openldap-client-2.3.34
	openldap-server-2.3.34

>Description:
	I use unionfs to mount a FreeBSD world "below" (in the sense
	of -o below) my jail root directories. This has the negative
	side effect that the Berkeley DB (db44-4.4.20.4) back-end of
	openldap's slapd (openldap-server-2.3.34) breaks. When I try
	to populate an empty LDAP tree with some initial object (via
	ldapadd), slapd produces the following errors in syslog:

Mar  5 13:02:43 ldap slapd[7443]: bdb(dc=nerdheim,dc=de): fsync Bad file descriptor
Mar  5 13:02:43 ldap slapd[7443]: bdb(dc=nerdheim,dc=de): PANIC: Bad file descriptor
Mar  5 13:02:43 ldap slapd[7443]: bdb(dc=nerdheim,dc=de): PANIC: fatal region error detected; run recovery
Mar  5 13:02:43 ldap last message repeated 2 times
Mar  5 13:02:43 ldap slapd[7443]: bdb(dc=nerdheim,dc=de): PANIC: DB_RUNRECOVERY: Fatal error, run database recovery
Mar  5 13:02:43 ldap slapd[7443]: bdb_db_cache: db_open(objectClass) failed: DB_RUNRECOVERY: Fatal error, run database recovery (-30974)

	After this happens, slapd reports internal errors for all LDAP
	operations. Restarting slapd rebuilds the database, and slapd
	then works again until the next change is made. Copying
	the jail root directory to a normal ufs filesystem (with
	soft-updates and noatime) and repeating the same steps yields no
	errors. That is why I think unionfs is the culprit.

	The unionfs is mounted with copymode=transparent.

>How-To-Repeat:
	run slapd with bdb back-end on a unionfs filesystem
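
	A smaller probe that does not need slapd can be sketched as
	follows. It assumes the cause is the cross-thread fsync()
	described in the audit trail of this PR: a file descriptor is
	opened by one thread and then written and synced by another.
	The names fsync_job, fsync_worker and fsync_from_other_thread
	are hypothetical helpers made up for this sketch.

```c
#include <errno.h>
#include <pthread.h>
#include <unistd.h>

/* Argument and result slot shared with the worker thread. */
struct fsync_job {
	int fd;		/* descriptor opened by the calling thread */
	int err;	/* 0 on success, else errno from write()/fsync() */
};

/* Runs in the second thread: write one byte, then fsync the descriptor. */
static void *
fsync_worker(void *arg)
{
	struct fsync_job *job = arg;
	static const char buf[] = "x";

	job->err = 0;
	if (write(job->fd, buf, sizeof(buf) - 1) < 0 || fsync(job->fd) < 0)
		job->err = errno;
	return (NULL);
}

/*
 * Write to and fsync fd from a thread other than the caller.
 * Returns 0 on success, otherwise an errno-style error number.
 */
static int
fsync_from_other_thread(int fd)
{
	struct fsync_job job = { fd, 0 };
	pthread_t tid;
	int rc;

	rc = pthread_create(&tid, NULL, fsync_worker, &job);
	if (rc != 0)
		return (rc);
	rc = pthread_join(tid, NULL);
	if (rc != 0)
		return (rc);
	return (job.err);
}
```

	To use it, open a file below the unionfs mount with open(2) in
	the main thread and pass the descriptor to
	fsync_from_other_thread(). On an ordinary filesystem the call
	should return 0. If the audit-trail analysis is right, an
	affected kernel would be expected to return EBADF here, but
	that has not been verified with this probe.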

>Fix:

	


>Release-Note:
>Audit-Trail:

From: Sebastian Klemke <packet@adrenochrome.nl>
To: bug-followup@FreeBSD.org, packet@adrenochrome.nl
Cc:  
Subject: Re: kern/109950: unionfs breaks openldap-server23 with bdb back-end
Date: Tue, 6 Mar 2007 17:50:10 +0100

 Further debugging has shown that it is always the second call to
 fsync() that fails with EBADF. Adding one object to the LDAP directory
 is OK, but adding another object then results in fsync() returning
 EBADF, and adding yet another (a third) object causes slapd to
 report an internal error (because the database is
 corrupted). Restarting slapd causes it to recover the db, so after
 a restart another two objects can be added.
Responsible-Changed-From-To: freebsd-bugs->daichi 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Mon Jan 14 05:31:19 UTC 2008 
Responsible-Changed-Why:  
Over to maintainer. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=109950 

From: "Alexander V. Chernikov" <admin@su29.net>
To: bug-followup@FreeBSD.org, packet@adrenochrome.nl
Cc:  
Subject: Re: kern/109950: [unionfs] unionfs breaks openldap-server23 with
 bdb back-end
Date: Fri, 07 Mar 2008 04:56:11 +0300

 This is a multi-part message in MIME format.
 --------------020408020303010307040201
 Content-Type: text/plain; charset=KOI8-R; format=flowed
 Content-Transfer-Encoding: 7bit
 
 The problem lies in multi-threaded disk access.
 I ran into a similar issue trying to run mysql 5.0:
 InnoDB startup complains about fsync() returning EBADF.
 
 After looking into the unionfs code I discovered the following:
 unionfs_get_node_status(), used in functions like unionfs_fsync(),
 checks the list in unionfs_node for items belonging to the same LWP.
 E.g. if we open a file from one kernel thread and then try to do
 write() and fsync() on that fd in another [kernel] thread,
 unionfs_get_node_status() will return a new blank structure with the
 uns_upper_opencnt/uns_lower_opencnt fields set to 0, so
 unionfs_fsync() will see that the file has been opened 0 times and return EBADF.
 The proposed fix is to change the unionfs_get_node_status() check to be
 process-based instead of thread-based.
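 
 The lookup behaviour described above can be illustrated with a small
 user-space model. This is only a sketch and not the kernel code: the
 types node, node_status and caller, the key_policy enum and the
 model_* functions are invented for illustration, and the real
 unionfs_get_node_status() works on vnodes under a lock.

```c
#include <errno.h>
#include <stdlib.h>

/* Simplified stand-in for struct unionfs_node_status. */
struct node_status {
	struct node_status *next;
	long owner;	/* thread id or process id, depending on policy */
	int opencnt;	/* how often this owner has opened the node */
};

/* Simplified stand-in for the per-node status list head. */
struct node {
	struct node_status *head;
};

/* Identity of the caller: one process may contain several threads. */
struct caller {
	long pid;
	long tid;
};

enum key_policy { KEY_BY_THREAD, KEY_BY_PROCESS };

static long
key_of(struct caller c, enum key_policy p)
{
	return (p == KEY_BY_THREAD ? c.tid : c.pid);
}

/*
 * Find the status entry for owner, or create a zeroed one.
 * Entries are never freed in this sketch.
 */
static struct node_status *
get_status(struct node *n, long owner)
{
	struct node_status *s;

	for (s = n->head; s != NULL; s = s->next)
		if (s->owner == owner)
			return (s);
	s = calloc(1, sizeof(*s));
	if (s == NULL)
		abort();
	s->owner = owner;
	s->next = n->head;
	n->head = s;
	return (s);
}

static void
model_open(struct node *n, struct caller c, enum key_policy p)
{
	get_status(n, key_of(c, p))->opencnt++;
}

/* Returns 0 if the caller's entry shows an open, otherwise EBADF. */
static int
model_fsync(struct node *n, struct caller c, enum key_policy p)
{
	return (get_status(n, key_of(c, p))->opencnt > 0 ? 0 : EBADF);
}
```

 With KEY_BY_THREAD, an open by one thread followed by a model_fsync()
 from a second thread of the same process finds a fresh zeroed entry
 and yields EBADF. With KEY_BY_PROCESS both threads share one entry
 and the call succeeds.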
 
 
 
 
 --------------020408020303010307040201
 Content-Type: text/plain;
  name="unionfs_threads.diff"
 Content-Transfer-Encoding: 7bit
 Content-Disposition: inline;
  filename="unionfs_threads.diff"
 
 --- sys/fs/unionfs/union_subr.c.orig	2008-03-05 04:07:31.000000000 +0300
 +++ sys/fs/unionfs/union_subr.c	2008-03-05 04:08:23.000000000 +0300
 @@ -242,12 +242,13 @@
  			struct unionfs_node_status **unspp)
  {
  	struct unionfs_node_status *unsp;
 +	pid_t pid = td->td_proc->p_pid;
  
  	KASSERT(NULL != unspp, ("null pointer"));
  	ASSERT_VOP_ELOCKED(UNIONFSTOV(unp), "unionfs_get_node_status");
  
  	LIST_FOREACH(unsp, &(unp->un_unshead), uns_list) {
 -		if (unsp->uns_tid == td->td_tid) {
 +		if (unsp->uns_pid == pid) {
  			*unspp = unsp;
  			return;
  		}
 @@ -257,7 +258,7 @@
  	MALLOC(unsp, struct unionfs_node_status *,
  	    sizeof(struct unionfs_node_status), M_TEMP, M_WAITOK | M_ZERO);
  
 -	unsp->uns_tid = td->td_tid;
 +	unsp->uns_pid = pid;
  	LIST_INSERT_HEAD(&(unp->un_unshead), unsp, uns_list);
  
  	*unspp = unsp;
 --- sys/fs/unionfs/union.h.orig	2007-11-03 13:32:26.000000000 +0300
 +++ sys/fs/unionfs/union.h	2008-03-05 04:06:59.000000000 +0300
 @@ -66,7 +66,7 @@
  /* unionfs status list */
  struct unionfs_node_status {
  	LIST_ENTRY(unionfs_node_status) uns_list;	/* Status list */
 -	lwpid_t		uns_tid;		/* current thread id */
 +	pid_t		uns_pid;		/* current process id */
  	int		uns_node_flag;		/* uns flag */
  	int		uns_lower_opencnt;	/* open count of lower */
  	int		uns_upper_opencnt;	/* open count of upper */
 
 --------------020408020303010307040201--
 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/109950: commit references a PR
Date: Fri, 25 Apr 2008 11:37:30 +0000 (UTC)

 daichi      2008-04-25 11:37:20 UTC
 
   FreeBSD src repository
 
   Modified files:
     sys/fs/unionfs       union.h union_subr.c union_vnops.c 
   Log:
   o Fixed multi thread access issue reported by Alexander V. Chernikov
       (admin@su29.net)
     fixed: kern/109950
   
   PR:             kern/109950
   Submitted by:   Alexander V. Chernikov (admin@su29.net)
   Reviewed by:    Masanori OZAWA (ozawa@ongs.co.jp)
   MFC after:      1 week
   
   Revision  Changes    Path
   1.38      +2 -2      src/sys/fs/unionfs/union.h
   1.104     +4 -3      src/sys/fs/unionfs/union_subr.c
   1.155     +7 -7      src/sys/fs/unionfs/union_vnops.c
 _______________________________________________
 cvs-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/cvs-all
 To unsubscribe, send any mail to "cvs-all-unsubscribe@freebsd.org"
 
State-Changed-From-To: open->closed 
State-Changed-By: daichi 
State-Changed-When: Wed May 7 05:13:49 UTC 2008 
State-Changed-Why:  
kern/109950 is fixed. Thanks! 

http://www.freebsd.org/cgi/query-pr.cgi?pr=109950 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/109950: commit references a PR
Date: Wed,  7 May 2008 05:11:59 +0000 (UTC)

 daichi      2008-05-07 05:11:52 UTC
 
   FreeBSD src repository
 
   Modified files:        (Branch: RELENG_7)
     sys/fs/unionfs       union.h union_subr.c union_vnops.c 
   Log:
   MFC:
   - Fixed multi thread access issue reported by Alexander V. Chernikov
     (admin@su29.net)
   - fixed: kern/109950
   
   PR:             kern/109950
   Submitted by:   Alexander V. Chernikov (admin@su29.net)
   Reviewed by:    Masanori OZAWA (ozawa@ongs.co.jp)
   
   Revision   Changes    Path
   1.34.2.3   +2 -2      src/sys/fs/unionfs/union.h
   1.92.2.5   +4 -3      src/sys/fs/unionfs/union_subr.c
   1.142.2.9  +7 -7      src/sys/fs/unionfs/union_vnops.c
 
>Unformatted:
