From yar@bsd.chem.msu.ru  Mon Oct 16 10:31:45 2006
Return-Path: <yar@bsd.chem.msu.ru>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 6876516A412
	for <FreeBSD-gnats-submit@freebsd.org>; Mon, 16 Oct 2006 10:31:45 +0000 (UTC)
	(envelope-from yar@bsd.chem.msu.ru)
Received: from bsd.chem.msu.ru (bsd.chem.msu.ru [195.208.208.23])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 8CA3243D6D
	for <FreeBSD-gnats-submit@freebsd.org>; Mon, 16 Oct 2006 10:31:44 +0000 (GMT)
	(envelope-from yar@bsd.chem.msu.ru)
Received: from bsd.chem.msu.ru (localhost [127.0.0.1])
	by bsd.chem.msu.ru (8.13.4/8.13.3) with ESMTP id k9GAVhK6009796
	for <FreeBSD-gnats-submit@freebsd.org>; Mon, 16 Oct 2006 14:31:43 +0400 (MSD)
	(envelope-from yar@bsd.chem.msu.ru)
Received: (from yar@localhost)
	by bsd.chem.msu.ru (8.13.4/8.13.3/Submit) id k9GAVgNs009795;
	Mon, 16 Oct 2006 14:31:42 +0400 (MSD)
	(envelope-from yar)
Message-Id: <200610161031.k9GAVgNs009795@bsd.chem.msu.ru>
Date: Mon, 16 Oct 2006 14:31:42 +0400 (MSD)
From: Yar Tikhiy <yar@comp.chem.msu.su>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: fts(3) can't handle very deep trees
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         104458
>Category:       bin
>Synopsis:       [libc] fts(3) can't handle very deep trees
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    yar
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Oct 16 10:40:15 GMT 2006
>Closed-Date:    Mon Aug 18 11:58:08 UTC 2008
>Last-Modified:  Mon Aug 18 11:58:08 UTC 2008
>Originator:     Yar Tikhiy
>Release:        FreeBSD 7.0-CURRENT i386
>Organization:
None
>Environment:

	FreeBSD 7.0-CURRENT i386

>Description:
	
	Utilities using fts(3), find(1) and rm(1) being among them,
	fail to handle a directory tree so deep that a path in it is
	longer than ~49-50K.

>How-To-Repeat:
	
	1. Create a long chain of subdirectories using the script
	   attached.  Each subdir name will be "000...".  The overall
	   structure will be 000*/000*/000*/000*/...
	   This is better done on a scratch mfs because the
	   resulting chain will be hard to delete using stock tools.
	
	2. Invoke rm or "find -delete" on it.

	E.g.:

%csh xdir.csh 100 500
%rm -rf 0*
rm: fts_read: File name too long
%find 0* -delete
find: fts_read: File name too long

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
#!/bin/csh
#
# Yeah, I know that csh programming is harmful,
# but /bin/sh cannot handle trees so deep.
# See PR bin/104456.
#

set len=$1	# length of each name
set dep=$2	# depth

set name=`printf %0${len}d 0`

while ($dep)
	mkdir $name && cd $name
	@ dep--
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

>Fix:
>Release-Note:
>Audit-Trail:

From: Bruce Evans <bde@zeta.org.au>
To: Yar Tikhiy <yar@comp.chem.msu.su>
Cc:  
Subject: Re: bin/104458: fts(3) can't handle very deep trees
Date: Mon, 16 Oct 2006 22:53:40 +1000 (EST)

 >> Description:
 >
 > 	Utilities using fts(3), find(1) and rm(1) being among them,
 > 	fail to handle a directory tree so deep that a path in it is
 > 	longer than ~49-50K.
 
 ISTR that POSIX requires at least rm and cp to work with any depth that
 can occur in theory, and QOI of course requires any utility that
 traverses trees to work with any possible depth that occurs in practice.
 cp ensures brokenness using FTS_NOCHDIR, but this bug is missing in cp.
 
 This was easy to debug.  fts has a bogus USHRT_MAX limit on path lengths
 in FTSENT.  It normally fails at about 50K because it keeps doubling
 the length and 2*50K is the first doubling to above 64K.  FTS only
 has a limit of INT_MAX.  These limits are well documented in the source.
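
The arithmetic behind the ~50K failure can be modelled in a few lines.
This is only a sketch, not the fts.c code: grow_path_buffer() and its
parameters are hypothetical names, and the doubling policy is taken from
the description above rather than from the source.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical model of a path buffer that grows by doubling while its
 * length must still fit in an unsigned short, as in the old FTSENT.
 * Returns the new capacity, or 0 if `need' cannot be satisfied without
 * a doubling that would exceed USHRT_MAX.
 */
static size_t
grow_path_buffer(size_t cap, size_t need)
{
	if (cap == 0)
		cap = 1;
	while (cap < need) {
		if (cap > USHRT_MAX / 2)
			return (0);	/* the next doubling would pass 64K */
		cap *= 2;
	}
	return (cap);
}
```

Starting from a capacity of 25000, a request for 50000 bytes succeeds,
but a request for 50001 fails because the next step would be 100000.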
 
 >> How-To-Repeat:
 >
 > 	1. Create a long chain of subdirectories using the script
 > 	   attached.  Each subdir name will be "000...".  The overall
 > 	   structure will be 000*/000*/000*/000*/...
 > 	   This is better done on a scratch mfs because the
 > 	   resulting chain will be hard to delete using stock tools.
 
 I used /compat/linux/bin/cp.  It just worked :-(. /compat/linux/usr/bin/du
 also just worked, while du just failed.  I debugged using du.
 
 FTSENT has quite a few shorts.  ISTR a discussion about one of them
 not being large enough for use as a cookie, but don't remember the
 path length one being noticed as a limit before.  It also has a
 limit of 64K on the number of levels.  This can easily be reached.
 It also has some INT_MAX limits.  These are not easy to reach yet.
 
 Implementing the POSIX requirement to work for any tree is an
 interesting problem.  The program cannot use any stacks or counters,
 since no stack or counter can be large enough.  However, a counter
 with 640K of bits should be large enough for anyone in practice.
 A stack of parent directories can easily overflow in practice.  In
 particular, ".." must be used to traverse back since a stack of
 fd's to fchdir() back to would hit the {OPEN_MAX} limit very easily.
 fts(3) seems to handle this right (it doesn't keep directories open).
 fts's most fundamental limit seems to be the INT_MAX one on path
 lengths.  For some reason, it wants to construct a full path.  That
 feature should probably just be turned off for deep paths, or always
 as an option, since it is impossible to use paths longer than
 {PATH_MAX} and often only the path relative to a subdir is needed.
 
 Bruce

From: Yar Tikhiy <yar@comp.chem.msu.su>
To: Bruce Evans <bde@zeta.org.au>
Cc: Yar Tikhiy <yar@comp.chem.msu.su>
Subject: Re: bin/104458: fts(3) can't handle very deep trees
Date: Mon, 16 Oct 2006 19:38:08 +0400

 Realized that there is a bug in the algo :-)
 
 On Mon, Oct 16, 2006 at 07:11:02PM +0400, Yar Tikhiy wrote:
 > 
 > At least in the rm case, we are limited only by the number of files
 > we cannot delete IMHO.  Assume we can delete everything in a tree:
 > no restrictive permissions, no immutable flags, etc.  Then we can
 > traverse and remove it with the following algo, which utilizes FS
 > as the storage of all its state:
 > 
 > 	/*
 > 	 * removes everything in and under current directory
 > 	 */
 > 	stat(".", &st0);
 > again:
 > 	dp = opendir(".");
 > 	while (ep = readdir(dp)) {
 > 		/* skip . and .. here */
 > 		stat(ep->d_name, &st);
 > 		if (S_ISDIR(st.st_mode)) {
 > 			if (rmdir(ep->d_name) == -1) { /* ENOTEMPTY */
 > 				chdir(ep->d_name);
 > 				closedir(dp);
 > 				goto again;
 > 			}
 > 		} else
 > 			unlink(ep->d_name);
 > 	}
 > 	/* we arrive here only after we deleted all in . */
 
 	closedir(dp);
 
 > 	stat(".", &st);
 > 	if (st.st_dev == st0.st_dev && st.st_ino == st0.st_ino)
 > 		return;
 > 	chdir("..");
 > 	goto again;
 > 
 > Real life dictates that we should handle failures to delete something.
 > This can be done by keeping a list (or hash) of unremovable files
 > identified by st_dev and st_ino of the parent directory and the name
 > of the file itself.  Then we can skip over them in the readdir loop
 > so that we don't loop forever.  But all this means farewell to fts(3).
 
 -- 
 Yar

From: Yar Tikhiy <yar@comp.chem.msu.su>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: bin/104458: fts(3) can't handle very deep trees
Date: Sun, 29 Oct 2006 14:13:48 +0300

 JFTR:
 
 NetBSD has the fts_*pathlen fields in FTSENT extended to u_int.
 But why not size_t?
 
 NetBSD's fts_level is still short though.  If we extend ours, it
 should be int64_t as we don't want to hit INT_MAX right after SHRT_MAX.
 
 BTW, FTW.level is int now, as is FTW.base.  The former should be
 the same as FTS.fts_level while the latter begs to be size_t as
 it's an array index.  SUSv3 defines them as int though.
 
 In addition, NetBSD folks've made fts_number 64-bit.  Perhaps we
 could just make our fts_bignum and fts_number the same field if we can
 take the ABI breakage.
 
 Other candidates for extension to size_t are fts_pathlen and
 fts_nitems in FTS as they both are essentially array indexes.
 
 Lastly, the flag fields in FTS could be extended, too, because one
 of them, fts_info, has 15 bits used now.
 
 After changing fts.h, fts.c should be made 64-bit clean.
 
 -- 
 Yar
State-Changed-From-To: open->analyzed 
State-Changed-By: yar 
State-Changed-When: Thu Aug 23 17:46:21 UTC 2007 
State-Changed-Why:  
A fix approved by re@ is about to hit HEAD in a day or two. 


Responsible-Changed-From-To: freebsd-bugs->yar 
Responsible-Changed-By: yar 
Responsible-Changed-When: Thu Aug 23 17:46:21 UTC 2007 
Responsible-Changed-Why:  
Taking care of my own PR as usual. :-) 

http://www.freebsd.org/cgi/query-pr.cgi?pr=104458 
State-Changed-From-To: analyzed->open 
State-Changed-By: yar 
State-Changed-When: Tue Sep 18 05:04:03 UTC 2007 
State-Changed-Why:  
This bug will be fixed later, perhaps after 7.0-RELEASE. 


From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: bin/104458: commit references a PR
Date: Sat, 26 Jan 2008 17:09:47 +0000 (UTC)

 yar         2008-01-26 17:09:41 UTC
 
   FreeBSD src repository
 
   Modified files:
     .                    UPDATING 
     include              fts.h 
     lib/libc/gen         Makefile.inc Symbol.map fts-compat.c 
                          fts-compat.h fts.3 fts.c 
     sys/sys              param.h 
   Log:
   Our fts(3) API, as inherited from 4.4BSD, suffers from integer
   fields in FTS and FTSENT structs being too narrow.  In addition,
   the narrow types creep from there into fts.c.  As a result, fts(3)
   consumers, e.g., find(1) or rm(1), can't handle file trees an ordinary
   user can create, which can have security implications.
   
   To fix the historic implementation of fts(3), OpenBSD and NetBSD
   have already changed <fts.h> in somewhat incompatible ways, so we
   are free to do so, too.  This change is a superset of changes from
   the other BSDs with a few more improvements.  It doesn't touch
   fts(3) functionality; it just extends integer types used by it to
   match modern reality and the C standard.
   
   Here are its points:
   
   o For C object sizes, use size_t unless it's 100% certain that
     the object will be really small.  (Note that fts(3) can construct
     pathnames _much_ longer than PATH_MAX for its consumers.)
   
   o Avoid the short types because on modern platforms using them
     results in larger and slower code.  Change shorts to ints as
     follows:
   
           - For variables that count simple, limited things like states,
             use plain vanilla `int' as it's the type of choice in C.
   
           - For a limited number of bit flags use `unsigned' because signed
             bit-wise operations are implementation-defined, i.e., unportable,
             in C.
   
   o For things that should be at least 64 bits wide, use long long
     and not int64_t, as the latter is an optional type.  See
     FTSENT.fts_number aka FTS.fts_bignum.  Extending fts_number `to
     satisfy future needs' is pointless because there is fts_pointer,
     which can be used to link to arbitrary data from an FTSENT.
     However, there already are fts(3) consumers that require fts_number,
     or fts_bignum, to have at least 64 bits in it, so we must allow for them.
   
   o For the tree depth, use `long'.  This is a trade-off between making
     this field too wide and allowing for 64-bit inode numbers and/or
     chain-mounted filesystems.  On the one hand, `long' is almost
     enough for 32-bit filesystems on a 32-bit platform (our ino_t is
     uint32_t now).  On the other hand, platforms with a 64-bit (or
     wider) `long' will be ready for 64-bit inode numbers, as well as
     for several 32-bit filesystems mounted one under another.  Note
     that fts_level has to be signed because -1 is a magic value for it,
     FTS_ROOTPARENTLEVEL.
   
   o For the `nlinks' local var in fts_build(), use `long'.  The logic
     in fts_build() requires that `nlinks' be signed, but our nlink_t
     currently is uint16_t.  Therefore let's make the signed var wide
     enough to be able to represent 2^16-1 in pure C99, and even 2^32-1
     on a 64-bit platform.  Perhaps the logic should be changed just
     to use nlink_t, but it can be done later w/o breaking fts(3) ABI
     any more because `nlinks' is just a local var.
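
A consumer that wants to stay independent of the exact field widths can
copy fts_level and fts_pathlen into wide variables of its own.  The
sketch below is an illustration, not part of the commit: tree_extent()
is a hypothetical helper, and the casts are there because other libcs
may still declare these fields with narrower types than the `long' and
size_t described above.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>
#include <fts.h>
#include <stddef.h>

/*
 * Walk `root' with fts(3) and report the deepest level and the longest
 * pathname length seen.  Returns 0 on success, -1 on error.
 */
static int
tree_extent(char *root, long *max_level, size_t *max_pathlen)
{
	char *roots[2];
	FTS *ftsp;
	FTSENT *p;
	int rv = 0;

	roots[0] = root;
	roots[1] = NULL;
	*max_level = 0;
	*max_pathlen = 0;
	if ((ftsp = fts_open(roots, FTS_PHYSICAL, NULL)) == NULL)
		return (-1);
	errno = 0;
	while ((p = fts_read(ftsp)) != NULL) {
		switch (p->fts_info) {
		case FTS_DNR:
		case FTS_ERR:
		case FTS_NS:
			rv = -1;	/* remember the error, keep walking */
			continue;
		case FTS_DP:
			continue;	/* directory already seen in pre-order */
		default:
			break;
		}
		if ((long)p->fts_level > *max_level)
			*max_level = (long)p->fts_level;
		if ((size_t)p->fts_pathlen > *max_pathlen)
			*max_pathlen = (size_t)p->fts_pathlen;
	}
	if (errno != 0)		/* fts_read() sets errno to 0 at the end */
		rv = -1;
	if (fts_close(ftsp) == -1)
		rv = -1;
	return (rv);
}
```

The root passed to fts_open() is at level 0, so a chain root/a/b/c
yields a maximum level of 3.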
   
   This commit also includes supporting stuff for the fts change:
   
   o Preserve the old versions of fts(3) functions through libc symbol
   versioning because the old versions appeared in all our former releases.
   
   o Bump __FreeBSD_version just in case.  There is a small chance that
   some ill-written 3rd-party apps may fail to build or work correctly
   if compiled after this change.
   
   o Update the fts(3) manpage accordingly.  In particular, remove
   references to fts_bignum, which was a FreeBSD-specific hack to work
   around the too narrow types of FTSENT members.  Now fts_number is
   at least 64 bits wide (long long) and fts_bignum is an undocumented
   alias for fts_number kept around for compatibility reasons.  According
   to Google Code Search, the only big consumers of fts_bignum are in
   our own source tree, so they can be fixed easily to use fts_number.
   
   o Mention the change in src/UPDATING.
   
   PR:             bin/104458
   Approved by:    re (quite a while ago)
   Discussed with: deischen (the symbol versioning part)
   Reviewed by:    -arch (mostly silence); das (generally OK, but we didn't
                   agree on some types used; assuming that no objections on
                   -arch let me stick to my opinion)
   
   Revision  Changes    Path
   1.517     +14 -0     src/UPDATING
   1.12      +11 -18    src/include/fts.h
   1.131     +1 -1      src/lib/libc/gen/Makefile.inc
   1.8       +11 -8     src/lib/libc/gen/Symbol.map
   1.30      +28 -9     src/lib/libc/gen/fts-compat.c
   1.13      +0 -13     src/lib/libc/gen/fts-compat.h
   1.24      +6 -21     src/lib/libc/gen/fts.3
   1.29      +13 -41    src/lib/libc/gen/fts.c
   1.329     +1 -1      src/sys/sys/param.h
 
State-Changed-From-To: open->patched 
State-Changed-By: yar 
State-Changed-When: Sat Feb 9 14:35:48 UTC 2008 
State-Changed-Why:  
Fixed in CURRENT after all. 

State-Changed-From-To: patched->closed 
State-Changed-By: yar 
State-Changed-When: Mon Aug 18 11:51:57 UTC 2008 
State-Changed-Why:  
The fix breaks the ABI and there was a lot of fuss against its 
implementation which I don't want to go through again, so the fix 
will not be merged to stable branches. 

>Unformatted:
