[HN Gopher] QRS: Epsilon Wrangling
___________________________________________________________________
QRS: Epsilon Wrangling
Author : zdw
Score : 17 points
Date : 2025-07-09 20:45 UTC (3 days ago)
(HTM) web link (www.tbray.org)
(TXT) w3m dump (www.tbray.org)
| o11c wrote:
| I also have been playing with regexes recently. One observation
| that I've made: if you're willing to setting for something
| slightly _weaker_ than regexes, you can make your code _trivial_
| to understand (and go straight to a DFA instead of going through
| NFA). AFAICT the only "hard" case (which I'm erroring out on)
| involves things like /(AB)+|(AC)+/ (where A, B, and C are
| arbitrarily complex patterns), since everything else can be
| easily simplified. And at least for the contexts I care about,
| that kind of regex is exceptionally rare.
|
| ... I probably should actually read the papers on how to do it
| properly though. Last time I tried, I got stuck in a tangle of
| "why does C make efficient hashmaps so hard!" - this time, I'm
| avoiding C until I have a fully-tested prototype (current status:
| 1.0KLoC logic, 0.5KLoC comments, 4.4KLoC test suite, 40% tests
| failing after a recent refactor [edit: I forgot Python enums
| don't compare equal to integers], 100% of the time being annoyed
| at how stupidly slow Python is if you use obscure programming
| features like "loops", "strings", "branching", or "functions").
| kragen wrote:
| Plausibly this approach is trivial to understand and implements
| full regexes, but it is slower than the NFA or DFA approach:
| http://canonical.org/~kragen/sw/dev3/redl.py
|
| PEGs are in some ways more powerful than regexes, and in other
| ways less powerful, but usually the former matter more. This is
| not _trivial_ to understand but I think it 's not that hard
| either; it's a page of code: https://github.com/kragen/peg-
| bootstrap/blob/master/peg.md
|
| That version doesn't memoize and so doesn't enjoy Packrat
| parsing's linear-time guarantee, but you can easily modify it
| to do so.
|
| Another subset of regexes that's easy to understand is this
| single-page pattern matcher by Rob Pike from TPOP:
| https://www.cs.princeton.edu/courses/archive/spr09/cos333/be...
| which is _enormously_ less code than my single-page PEG parser
| generator above. But then, it doesn 't have to compile itself.
| o11c wrote:
| Unfortunately, neither "waste time" nor "waste space" are
| approaches worth pursuing.
|
| We already have too many programs being written that are too
| simple and thus slow and/or wrong. We need to write code that
| is as simple as possible, but _no simpler_.
| kragen wrote:
| In practice you can _always_ make a program take less space
| if it can use more time, or take less time if it can use
| more space; the guiltless perfection you seek does not
| exist.
|
| I feel like PEG parsing can be fast and space-efficient,
| but I haven't seen an existence proof yet.
___________________________________________________________________
(page generated 2025-07-12 23:01 UTC)