[HN Gopher] Regular expression matching can be simple and fast (...
       ___________________________________________________________________
        
       Regular expression matching can be simple and fast (2007)
        
       Author : momonga
       Score  : 65 points
       Date   : 2024-05-21 01:26 UTC (10 hours ago)
        
 (HTM) web link (swtch.com)
 (TXT) w3m dump (swtch.com)
        
       | jjice wrote:
       | This and the associated series of regex posts from Russ Cox are
       | probably the best content on practical regular expression engine
       | internals you can get. If you have a cursory understanding of
       | NFAs and DFAs, it's some of the best technical writing I've ever
       | had the pleasure to read.
       | 
       | These posts are responsible for my love of regular expressions :)
        
         | kqr wrote:
         | And if you don't have a cursory understanding of NFAs and DFAs,
         | they're still great for getting that cursory understanding.
         | 
         | These articles were my introduction to finite automata (and
         | almost the only place I've learned about them aside from
         | practical use) and my intuition for them seem to often be
         | better than my peers'.
        
           | banish-m4 wrote:
           | NFAs are apparently undecidable since the same input can go
           | to multiple states. DFA states can only go to one state when
           | given a particular input sequence.
           | 
           | The Dragon book contains an NFA to DFA algorithm (powerset)
           | which explodes states with multiple exiting token
           | transitions.
           | 
           | PS: All your DAGs are belong to us.
        
             | gliptic wrote:
             | NFAs are decidable. As you said, a NFA can be converted to
             | a DFA which is decidable. Maybe you mean they are non-
             | deterministic (that's in the name)?
        
       | ChrisArchitect wrote:
       | Some discussion from 5 years ago:
       | 
       | https://news.ycombinator.com/item?id=20310669
        
         | dang wrote:
         | Thanks! Macroexpanded:
         | 
         |  _Regular Expression Matching Can Be Simple and Fast (2007)_ -
         | https://news.ycombinator.com/item?id=20310669 - June 2019 (33
         | comments)
         | 
         |  _Regular Expression Matching Can Be Simple and Fast (2007)_ -
         | https://news.ycombinator.com/item?id=16341519 - Feb 2018 (20
         | comments)
         | 
         |  _Regular Expression Matching Can Be Simple and Fast (2007)_ -
         | https://news.ycombinator.com/item?id=9374858 - April 2015 (16
         | comments)
         | 
         |  _Regular Expression Matching Can Be Simple And Fast [2007]_ -
         | https://news.ycombinator.com/item?id=6593331 - Oct 2013 (1
         | comment)
         | 
         |  _Regular Expression Matching Can Be Simple And Fast_ -
         | https://news.ycombinator.com/item?id=820201 - Sept 2009 (15
         | comments)
         | 
         |  _Regular Expression Matching Can Be Simple And Fast_ -
         | https://news.ycombinator.com/item?id=466845 - Feb 2009 (17
         | comments)
        
       | rurban wrote:
       | Perl needed a couple of years to catchup. At first they had no
       | idea what Russ was talking about
        
       | adrianN wrote:
       | Regexes can be fast and simple, but sometimes it's faster and
       | simpler to use normal string search. I recently replaced a regex
       | with two finds for 10x speedup.
        
         | secondcoming wrote:
         | Well if people spend time learning arcane regex syntax it'd be
         | a shame to not use them!
         | 
         | But yes, you're right. Sometimes simple finds are sufficient.
        
         | cies wrote:
         | Or write a parser!
         | 
         | In most cases the parser will not be shorter (code) than the
         | regex, but it will most certainly be simpler (cognitively; it's
         | hard to understand all that regexes do, they're close to magic)
         | and more testable (you can test the parts of the parser, like
         | the number parser or the string parser, individually), and
         | often faster too.
        
           | cies wrote:
           | https://stackoverflow.com/questions/11905506/regular-
           | express...
        
       | chx wrote:
       | > As mentioned earlier, no one knows how to implement regular
       | expressions with backreferences efficiently, though no one can
       | prove that it's impossible either. (Specifically, the problem is
       | NP-complete, meaning that if someone did find an efficient
       | implementation, that would be major news to computer scientists
       | and would win a million dollar prize.)
       | 
       | It really needs to be noted how NP-complete and an algorithm
       | efficient _in practice_ have nothing to do with each other. If an
       | algorithm takes C * 1.000...eighty zeroes...1^n time it is
       | exponential but for any n you will encounter in real life the
       | time it takes will be indistinguishable from C.
        
         | bubblyworld wrote:
         | There's a fun list of problems like this here:
         | https://cstheory.stackexchange.com/questions/6660/polynomial...
         | 
         | On the other hand, it's remarkable that so many algorithms _do_
         | have reasonable constants/exponents. So the concept works
         | pretty well.
        
       ___________________________________________________________________
       (page generated 2024-05-21 12:01 UTC)