From peterjeremy@acm.org  Fri Jun 25 22:56:44 2010
Return-Path: <peterjeremy@acm.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 16F77106566B
	for <FreeBSD-gnats-submit@freebsd.org>; Fri, 25 Jun 2010 22:56:44 +0000 (UTC)
	(envelope-from peterjeremy@acm.org)
Received: from mail35.syd.optusnet.com.au (mail35.syd.optusnet.com.au [211.29.133.51])
	by mx1.freebsd.org (Postfix) with ESMTP id 9B4E38FC14
	for <FreeBSD-gnats-submit@freebsd.org>; Fri, 25 Jun 2010 22:56:43 +0000 (UTC)
Received: from server.vk2pj.dyndns.org (c211-30-160-13.belrs4.nsw.optusnet.com.au [211.30.160.13])
	by mail35.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id o5PMue4B024972
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
	for <FreeBSD-gnats-submit@freebsd.org>; Sat, 26 Jun 2010 08:56:41 +1000
Received: from server.vk2pj.dyndns.org (localhost.vk2pj.dyndns.org [127.0.0.1])
	by server.vk2pj.dyndns.org (8.14.4/8.14.4) with ESMTP id o5PMucCi037028;
	Sat, 26 Jun 2010 08:56:38 +1000 (EST)
	(envelope-from peter@server.vk2pj.dyndns.org)
Received: (from peter@localhost)
	by server.vk2pj.dyndns.org (8.14.4/8.14.4/Submit) id o5PMucNC037027;
	Sat, 26 Jun 2010 08:56:38 +1000 (EST)
	(envelope-from peter)
Message-Id: <201006252256.o5PMucNC037027@server.vk2pj.dyndns.org>
Date: Sat, 26 Jun 2010 08:56:38 +1000 (EST)
From: Peter Jeremy <peterjeremy@acm.org>
Reply-To: Peter Jeremy <peterjeremy@acm.org>
To: FreeBSD-gnats-submit@freebsd.org
Subject: Poor file(1) performance
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         148150
>Category:       bin
>Synopsis:       Poor file(1) performance
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    edwin
>State:          analyzed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Jun 25 23:00:09 UTC 2010
>Closed-Date:    
>Last-Modified:  Sun Mar 25 03:42:53 UTC 2012
>Originator:     Peter Jeremy
>Release:        FreeBSD 8.1-PRERELEASE amd64
>Organization:
n/a
>Environment:
System: FreeBSD server.vk2pj.dyndns.org 8.1-PRERELEASE FreeBSD 8.1-PRERELEASE #4: Sun Jun 13 09:18:30 EST 2010 root@server.vk2pj.dyndns.org:/var/obj/usr/src/sys/server amd64

FreeBSD aspire.vk2pj.dyndns.org 8.1-PRERELEASE FreeBSD 8.1-PRERELEASE #12: Mon Jun 14 11:34:12 EST 2010     root@builder.vk2pj.dyndns.org:/obj/usr/src/sys/aspire  i386

>Description:

	I recently had reason to run file(1) on a large number of
	files and felt the performance wasn't up to par.  When I
	investigated further, I found that about 95% of the runtime
	was spent in the two regexes that recognize REXX files:

# OS/2 batch files are REXX. the second regex is a bit generic, oh well
# the matched commands seem to be common in REXX and uncommon elsewhere
100	regex/c =^[\ \t]{0,10}call[\ \t]{1,10}rxfunc OS/2 REXX batch file text
100	regex/c =^[\ \t]{0,10}say\ ['"]	     OS/2 REXX batch file text

	Since REXX files are not present in my environment, I can
	avoid the issue by just commenting out the offending lines.
	Someone with more expertise in magic(5) might be able to
	suggest a better fix.

	I have tried reporting this to the upstream maintainers and
	received a "not interested" response.

>How-To-Repeat:
	Copy /usr/share/misc/magic to magic.old
	Apply the equivalent of the below patch to create magic.new
	Run file(1) under time(1) on the same set of files, once with
	magic.old and once with magic.new

	Using my home directory on my i386 netbook, I get:
file -m magic.new * > /dev/null  1.42s user 0.13s system 98% cpu 1.576 total
file -m magic.new * > /dev/null  1.35s user 0.10s system 98% cpu 1.469 total
file -m magic.new * > /dev/null  1.35s user 0.10s system 98% cpu 1.470 total
file -m magic.old * > /dev/null  33.35s user 0.11s system 98% cpu 34.055 total
file -m magic.old * > /dev/null  33.12s user 0.14s system 98% cpu 33.714 total
file -m magic.old * > /dev/null  33.08s user 0.11s system 98% cpu 33.606 total

	Using my home directory on my amd64 desktop, I get:
file -m magic.new * > /dev/null  2.18s user 0.41s system 28% cpu 9.111 total
file -m magic.new * > /dev/null  2.11s user 0.49s system 24% cpu 10.707 total
file -m magic.new * > /dev/null  2.05s user 0.56s system 23% cpu 10.989 total
file -m magic.old * > /dev/null  28.54s user 0.51s system 78% cpu 37.088 total
file -m magic.old * > /dev/null  28.54s user 0.52s system 89% cpu 32.575 total
file -m magic.old * > /dev/null  28.71s user 0.47s system 99% cpu 29.371 total

	The poorer wallclock performance on my amd64 is because it's
	running ZFS without adequate RAM, whereas my netbook uses UFS on
	SSD.  The directory contents on the two machines are also
	completely different, so the two sets of timings are not directly
	comparable with each other.
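
	The comparison above can also be scripted.  The following is a
	minimal sketch, not part of the original measurements: it assumes
	file(1) is on the PATH and that magic.old and magic.new are in the
	current directory.  The helper names time_command and compare_magic
	are hypothetical.

```python
import os
import subprocess
import sys
import time


def time_command(argv, runs=3):
    """Run argv `runs` times; return a list of (user_seconds, wall_seconds)."""
    results = []
    for _ in range(runs):
        before = os.times()
        start = time.perf_counter()
        # Discard output, as the `> /dev/null` redirection does above.
        subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
        wall = time.perf_counter() - start
        after = os.times()
        # children_user accumulates the user CPU time of waited-for children.
        results.append((after.children_user - before.children_user, wall))
    return results


def compare_magic(files, magic_files=("magic.new", "magic.old"), runs=3):
    """Time `file -m <magic> <files...>` for each magic file in turn."""
    return {
        magic: time_command(["file", "-m", magic, *files], runs)
        for magic in magic_files
    }
```

	Calling compare_magic() with the list of files in a directory
	returns the per-run user and wallclock times for each magic file.
	The user time is the figure that isolates the regex cost from
	filesystem effects.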
>Fix:

	The following just comments out the REXX test.

Index: Magdir/msdos
===================================================================
RCS file: /usr/ncvs/src/contrib/file/Magdir/msdos,v
retrieving revision 1.3
diff -u -r1.3 msdos
--- Magdir/msdos	4 May 2009 00:37:44 -0000	1.3
+++ Magdir/msdos	19 Jun 2010 03:23:23 -0000
@@ -18,8 +18,8 @@
 
 # OS/2 batch files are REXX. the second regex is a bit generic, oh well
 # the matched commands seem to be common in REXX and uncommon elsewhere
-100	regex/c =^[\ \t]{0,10}call[\ \t]{1,10}rxfunc OS/2 REXX batch file text
-100	regex/c =^[\ \t]{0,10}say\ ['"]	     OS/2 REXX batch file text
+#100	regex/c =^[\ \t]{0,10}call[\ \t]{1,10}rxfunc OS/2 REXX batch file text
+#100	regex/c =^[\ \t]{0,10}say\ ['"]	     OS/2 REXX batch file text
 
 0	leshort		0x14c	MS Windows COFF Intel 80386 object file
 #>4	ledate		x	stamp %s
>Release-Note:
>Audit-Trail:

From: Garrett Cooper <yanefbsd@gmail.com>
To: Peter Jeremy <peterjeremy@acm.org>
Cc: FreeBSD-gnats-submit@freebsd.org
Subject: Re: bin/148150: Poor file(1) performance
Date: Fri, 25 Jun 2010 17:54:37 -0700

 FWIW I think that this is more indicative of poor regexp(3)
 performance or possibly tighter constraints placed on the regexp
 compiler / parser to do the act of parsing the string.
 
 Not saying that what you proposed isn't valid, but it's definitely an
 interesting note that should be brought up to the upstream folks.
 
 Thanks!
 -Garrett

From: Peter Jeremy <peterjeremy@acm.org>
To: Garrett Cooper <yanefbsd@gmail.com>
Cc: FreeBSD-gnats-submit@freebsd.org
Subject: Re: bin/148150: Poor file(1) performance
Date: Sat, 26 Jun 2010 12:43:58 +1000

 On 2010-Jun-25 17:54:37 -0700, Garrett Cooper <yanefbsd@gmail.com> wrote:
 >FWIW I think that this is more indicative of poor regexp(3)
 >performance or possibly tighter constraints placed on the regexp
 >compiler / parser to do the act of parsing the string.
 
 There are several other regexp's that don't have the same impact.  My
 suspicion is that at least some of the problem is the 100 line offset.
 My feeling is that moving the existing regexp down a level and
 wrapping it with a more efficient test that selects potential REXX
 files might be the optimal solution but I don't know either REXX or
 magic(5) well enough to suggest one.
 
 -- 
 Peter Jeremy
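
 The idea of wrapping the regexes in a cheaper test can be illustrated
 outside magic(5).  The following is a minimal sketch in Python, not a
 magic(5) entry: it assumes regex/c means case-insensitive matching,
 does not model the 100 offset, and uses Python's re module rather than
 regex(3).  The names GUARD_LITERALS and looks_like_rexx are
 hypothetical.

```python
import re

# The two REXX patterns from Magdir/msdos, transcribed for Python's re
# module.  Bytes patterns keep case folding restricted to ASCII.
REXX_PATTERNS = [
    re.compile(rb"^[ \t]{0,10}call[ \t]{1,10}rxfunc", re.IGNORECASE | re.MULTILINE),
    re.compile(rb"^[ \t]{0,10}say ['\"]", re.IGNORECASE | re.MULTILINE),
]

# Hypothetical guard: any match of the patterns above must contain one of
# these literals, so their absence rules the file out cheaply.
GUARD_LITERALS = (b"rxfunc", b"say '", b'say "')


def looks_like_rexx(data):
    """Return True if either REXX pattern matches the bytes in `data`."""
    lowered = data.lower()
    if not any(literal in lowered for literal in GUARD_LITERALS):
        # Cheap substring scan failed: skip the regexes entirely.
        return False
    return any(pattern.search(data) for pattern in REXX_PATTERNS)
```

 The guard never changes the result, because each literal is a
 required part of one of the patterns.  It only avoids running the
 regexes on files that cannot match.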
 
State-Changed-From-To: open->analyzed 
State-Changed-By: edwin 
State-Changed-When: Thu Oct 21 20:53:01 UTC 2010 
State-Changed-Why:  
The best patch so far without losing features would be: 

-100    regex/c =^[\ \t]{0,10}call[\ \t]{1,10}rxfunc OS/2 REXX batch file text 
-100    regex/c =^[\ \t]{0,10}say\ ['"]      OS/2 REXX batch file text 
+100     regex/c =^[\ \t]+call[\ \t]+rxfunc OS/2 REXX batch file text 
+100     regex/c =^[\ \t]+say\ ['"]           OS/2 REXX batch file text 

This reduces the worst-case time from 1900 ms to 450 ms; full 
removal would reduce it to 170 ms. The worst case is the 
file aclocal.m4 in the contrib/file directory. 
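
A rewritten pattern can be checked against the original on a few sample 
lines before it is committed. The following is a minimal sketch using 
Python's re module rather than regex(3). It assumes regex/c means 
case-insensitive matching and that the character class in the patch is 
a space or a tab. The sample lines and the helper name matches are 
invented for illustration. 

```python
import re

FLAGS = re.IGNORECASE | re.MULTILINE

# The "say" pattern in its bounded and unbounded forms.
ORIGINAL = re.compile(rb"^[ \t]{0,10}say ['\"]", FLAGS)
PROPOSED = re.compile(rb"^[ \t]+say ['\"]", FLAGS)

# Invented sample lines: no indent, space indent, tab indent.
SAMPLES = [b"say 'hi'", b"  say 'hi'", b"\tSAY \"hi\""]


def matches(pattern, line):
    """Return True if `pattern` is found anywhere in `line`."""
    return pattern.search(line) is not None


for line in SAMPLES:
    print(line, matches(ORIGINAL, line), matches(PROPOSED, line))
```

In this sketch the two forms agree on the indented samples and differ 
on the unindented one, which only the {0,10} form accepts, so lines 
that start in the first column are worth including in any test set. 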



Responsible-Changed-From-To: freebsd-bugs->obrien 
Responsible-Changed-By: edwin 
Responsible-Changed-When: Thu Oct 21 20:53:01 UTC 2010 
Responsible-Changed-Why:  
obrien@ wants to pre-approve changes or do them himself. 

obrien@: Let me know if I can commit this one! 

http://www.freebsd.org/cgi/query-pr.cgi?pr=148150 
Responsible-Changed-From-To: obrien->freebsd-bugs 
Responsible-Changed-By: eadler 
Responsible-Changed-When: Sun Mar 25 03:38:14 UTC 2012 
Responsible-Changed-Why:  
timeout 

http://www.freebsd.org/cgi/query-pr.cgi?pr=148150 
Responsible-Changed-From-To: freebsd-bugs->edwin 
Responsible-Changed-By: eadler 
Responsible-Changed-When: Sun Mar 25 03:42:53 UTC 2012 
Responsible-Changed-Why:  
timeout 

http://www.freebsd.org/cgi/query-pr.cgi?pr=148150 
>Unformatted:
