Newsgroups: comp.mail.misc
Path: utzoo!utgpu!trigraph!john
From: john@trigraph.uucp (John Chew)
Subject: Re: Perl version of from (Was: Re: from.sed (v1.2))
Message-ID: <1989Dec29.170942.16243@trigraph.uucp>
Sender: "John J. Chew" <john@trigraph.UUCP>
Reply-To: "John J. Chew" <poslfit@gpu.UTCS.UToronto.CA>
Organization: Trigraph Inc., Toronto, Canada
References: <1989Dec20.222732.5633@trigraph.uucp> <JV.89Dec21221143@mhres.mh.nl>
Date: Fri, 29 Dec 89 17:09:42 GMT

In article <1989Dec20.222732.5633@trigraph.uucp> 
  I posted a sed script that does the job of from(1).

In <JV.89Dec21221143@mhres.mh.nl> Johan Vromans <jv@mh.nl> 
  posted a perl script that does the same thing.

I've tried both out on various mailboxes and have come to the
following conclusions:

1. On small mailboxes, perl's start-up cost of compiling the script makes it a pig.

2. On large mailboxes, especially those containing long messages, perl
   can catch up to sed.

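3. The following patch to Johan Vromans' perl script speeds it up by
   as much as 30% on large files. It tightens the search-for-From_ loop:
   each line is first checked against a plain /^From /, and only lines
   that pass are chopped and matched against the full pattern.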

*** old/from.jv.pl	Fri Dec 29 12:05:15 1989
--- from.jv.pl	Fri Dec 29 11:42:13 1989
***************
*** 30,40
    
  
  # read through input file(s)
! while ( $line = <> ) {
!   chop ($line);
! 
!   # scan until "From_" header found
!   next unless $line =~ /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/;
    $from = $1;  
    $date = $2;
    if ( $date eq "" || $from eq "" ) {

--- 30,39 -----
    
  
  # read through input file(s)
! while (<>) {
!   next unless /^From /;
!   chop;
!   next unless /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/;
    $from = $1;  
    $date = $2;
    if ( $date eq "" || $from eq "" ) {
***************
*** 38,44
    $from = $1;  
    $date = $2;
    if ( $date eq "" || $from eq "" ) {
!     print STDERR "Possible garbage: $line\n";
      next;
    }
  

--- 37,43 -----
    $from = $1;  
    $date = $2;
    if ( $date eq "" || $from eq "" ) {
!     print STDERR "Possible garbage: $_\n";
      next;
    }
  
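The patch's trick is ordering: a simple anchored /^From / test runs before chop and the capturing pattern, so most body lines are discarded by the cheapest test available. Below is a minimal self-contained sketch of that loop packaged as a Perl sub. The sub name scan_from, the "sender  date" output format, and the file-only driver are hypothetical choices for illustration. They are not part of Johan's script or of the patch.

```perl
#!/usr/bin/perl
# Sketch of a from(1)-style lister built around the patched loop.
# Hypothetical: the sub name scan_from and the "sender  date" output
# format are illustrative, not taken from Johan Vromans' script.
use strict;
use warnings;

# Return one "sender  date" string per From_ line read from $fh.
sub scan_from {
    my ($fh) = @_;
    my @found;
    local $_;    # don't clobber the caller's $_
    while (<$fh>) {
        # Cheap anchored test first: body lines fail here immediately,
        # so they never reach chomp or the capturing regex below.
        next unless /^From /;
        chomp;
        # Full pattern (same as the patch) only for candidate From_ lines.
        next unless /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/;
        push @found, "$1  $2";
    }
    return @found;
}

# Driver: list senders for each mailbox file named on the command line.
# (Assumption: unlike <>, this sketch does not fall back to stdin.)
for my $file (@ARGV) {
    open my $fh, '<', $file or die "$file: $!\n";
    print "$_\n" for scan_from($fh);
    close $fh;
}
```

Because the loop lives in a sub that takes a filehandle, it can be fed an in-memory string (open my $fh, '<', \$text) for testing, or a real mailbox file.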
I'll keep both scripts around for now.  I actually prefer the notion
of writing such things in perl, but when your mail machine is a heavily
used VAX-11/750, you can't afford luxuries....

John
-- 
john j. chew, iii   		  phone: +1 416 425 3818     AppleLink: CDA0329
trigraph, inc., toronto, canada   {uunet!utai!utcsri,utgpu,utzoo}!trigraph!john
dept. of math., u. of toronto     poslfit@{utorgpu.bitnet,gpu.utcs.utoronto.ca}
