hngopher.com

       [HN Gopher] QRS: Epsilon Wrangling
       ___________________________________________________________________
        
       QRS: Epsilon Wrangling
        
       Author : zdw
       Score  : 17 points
       Date   : 2025-07-09 20:45 UTC (3 days ago)
        
 (HTM) web link (www.tbray.org)
 (TXT) w3m dump (www.tbray.org)
        
       | o11c wrote:
       | I also have been playing with regexes recently. One observation
       | that I've made: if you're willing to setting for something
       | slightly _weaker_ than regexes, you can make your code _trivial_
       | to understand (and go straight to a DFA instead of going through
       | NFA). AFAICT the only  "hard" case (which I'm erroring out on)
       | involves things like /(AB)+|(AC)+/ (where A, B, and C are
       | arbitrarily complex patterns), since everything else can be
       | easily simplified. And at least for the contexts I care about,
       | that kind of regex is exceptionally rare.
       | 
       | ... I probably should actually read the papers on how to do it
       | properly though. Last time I tried, I got stuck in a tangle of
       | "why does C make efficient hashmaps so hard!" - this time, I'm
       | avoiding C until I have a fully-tested prototype (current status:
       | 1.0KLoC logic, 0.5KLoC comments, 4.4KLoC test suite, 40% tests
       | failing after a recent refactor [edit: I forgot Python enums
       | don't compare equal to integers], 100% of the time being annoyed
       | at how stupidly slow Python is if you use obscure programming
       | features like "loops", "strings", "branching", or "functions").
        
         | kragen wrote:
         | Plausibly this approach is trivial to understand and implements
         | full regexes, but it is slower than the NFA or DFA approach:
         | http://canonical.org/~kragen/sw/dev3/redl.py
         | 
         | PEGs are in some ways more powerful than regexes, and in other
         | ways less powerful, but usually the former matter more. This is
         | not _trivial_ to understand but I think it 's not that hard
         | either; it's a page of code: https://github.com/kragen/peg-
         | bootstrap/blob/master/peg.md
         | 
         | That version doesn't memoize and so doesn't enjoy Packrat
         | parsing's linear-time guarantee, but you can easily modify it
         | to do so.
         | 
         | Another subset of regexes that's easy to understand is this
         | single-page pattern matcher by Rob Pike from TPOP:
         | https://www.cs.princeton.edu/courses/archive/spr09/cos333/be...
         | which is _enormously_ less code than my single-page PEG parser
         | generator above. But then, it doesn 't have to compile itself.
        
           | o11c wrote:
           | Unfortunately, neither "waste time" nor "waste space" are
           | approaches worth pursuing.
           | 
           | We already have too many programs being written that are too
           | simple and thus slow and/or wrong. We need to write code that
           | is as simple as possible, but _no simpler_.
        
             | kragen wrote:
             | In practice you can _always_ make a program take less space
             | if it can use more time, or take less time if it can use
             | more space; the guiltless perfection you seek does not
             | exist.
             | 
             | I feel like PEG parsing can be fast and space-efficient,
             | but I haven't seen an existence proof yet.
        
       ___________________________________________________________________
       (page generated 2025-07-12 23:01 UTC)