[HN Gopher] Rob Pike's simple C regex matcher in Go
       ___________________________________________________________________
        
       Rob Pike's simple C regex matcher in Go
        
       Author : todsacerdoti
       Score  : 277 points
       Date   : 2022-08-12 03:23 UTC (19 hours ago)
        
 (HTM) web link (benhoyt.com)
 (TXT) w3m dump (benhoyt.com)
        
       | antirez wrote:
       | Apparently I reinvented almost the same code several years after
       | Pike:
       | 
       | https://github.com/redis/redis/blob/99ebbee2b260b3466672b4fa...
        
         | hddqsb wrote:
         | Just a heads up that this implementation has exponential time
         | complexity on pathological patterns.
         | 
         | Demo:
         | https://tio.run/##tVddc6IwFH3nV0Q7WyXFIlrdUWr7uC/9B213BkIQKo...
         | pattern = "a*b*c*d*e*f*g*h*i*j*k*l*m*n*o*p*q*r*s*t*u*v*w*x*y*z"
         | text    =
         | "aabbccddeeffgghhiijjkkllmmnnooppqqrrssttuuvvwwxxyyzza"
         | running time = 1.6s
         | 
         | It should be relatively easy to fix this. Insight: If the
         | pattern contains two or more wildcards (i.e. it has the form
         | "<A>*<B>*<C>"), and you've found a match for "<A>*<B>", but you
         | can't find a match for "*<C>", you can stop immediately without
          | considering longer matches for the first wildcard. The reason
          | is that a longer match for the first wildcard means you start
          | looking for "*<C>" later in the string, and that's strictly
          | harder.
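          | The same insight is embodied in the classic iterative two-
          | pointer glob matcher (a hypothetical Go sketch, not the Redis
          | code): it remembers only the most recent wildcard and
          | backtracks to it alone, never revisiting earlier ones, giving
          | O(len(pattern) * len(text)) time and O(1) extra space.

```go
package main

import "fmt"

// globMatch reports whether pattern matches all of text, where '*'
// matches any string, '?' matches any single byte, and every other
// byte matches itself. Only the most recent '*' is ever backtracked
// to, which is exactly the "never revisit earlier wildcards" insight.
func globMatch(pattern, text string) bool {
	p, t := 0, 0
	star, mark := -1, 0 // last '*' seen, and the text index it matched up to
	for t < len(text) {
		switch {
		case p < len(pattern) && (pattern[p] == '?' || pattern[p] == text[t]):
			p++
			t++
		case p < len(pattern) && pattern[p] == '*':
			star, mark = p, t
			p++
		case star >= 0:
			// Backtrack: let the most recent '*' absorb one more byte.
			p = star + 1
			mark++
			t = mark
		default:
			return false
		}
	}
	// Trailing stars can match the empty string.
	for p < len(pattern) && pattern[p] == '*' {
		p++
	}
	return p == len(pattern)
}

func main() {
	fmt.Println(globMatch("a*b*c", "aXXbYYc")) // true
	// The pathological case from above now finishes instantly:
	fmt.Println(globMatch(
		"a*b*c*d*e*f*g*h*i*j*k*l*m*n*o*p*q*r*s*t*u*v*w*x*y*z",
		"aabbccddeeffgghhiijjkkllmmnnooppqqrrssttuuvvwwxxyyzza")) // false
}
```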
        
           | ball_of_lint wrote:
           | That might help, but I think it goes in the wrong direction -
           | The proper way to implement a regex matcher if you are
           | worried about performance against arbitrary patterns is to
           | build a Finite State Machine. It's not as simple of code and
           | involves knowing some computer science theory, but results in
           | a matcher with better time complexity guarantees.
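            | For these glob-style patterns the state machine is
            | especially simple: the states are just positions in the
            | pattern. A hypothetical Go sketch of simulating that NFA
            | with a set of live positions, which runs in
            | O(len(pattern) * len(text)) with no exponential blowup:

```go
package main

import "fmt"

// matchGlob reports whether pattern matches all of text, where '*'
// matches any string and '?' any single byte. It simulates the
// pattern as an NFA: states[i] == true means the first i bytes of
// the pattern can match some prefix of the text consumed so far.
func matchGlob(pattern, text string) bool {
	states := make([]bool, len(pattern)+1)
	states[0] = true
	closure := func(s []bool) {
		// '*' also matches the empty string, so a live position
		// just before a '*' is also live just after it.
		for i := 0; i < len(pattern); i++ {
			if s[i] && pattern[i] == '*' {
				s[i+1] = true
			}
		}
	}
	closure(states)
	for k := 0; k < len(text); k++ {
		next := make([]bool, len(pattern)+1)
		for i := 0; i < len(pattern); i++ {
			if !states[i] {
				continue
			}
			switch {
			case pattern[i] == '*':
				next[i] = true // the star absorbs this byte
			case pattern[i] == '?' || pattern[i] == text[k]:
				next[i+1] = true
			}
		}
		closure(next)
		states = next
	}
	return states[len(pattern)]
}

func main() {
	fmt.Println(matchGlob("a*z", "abcz")) // true
	fmt.Println(matchGlob("a*z", "abc"))  // false
}
```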
        
             | ridiculous_fish wrote:
             | Finite state machines also have their cliffs, in particular
             | bounded repetition. Rust's FA-based regex will explode with
             | /.{12345}/, while v8's backtracking engine handles it no
             | problem.
        
               | burntsushi wrote:
               | Yes, but please be careful here. The cliffs are
               | categorically different. The cliffs of a finite state
               | machine are still bounded to O(m * n), where m ~
               | len(regex) and n ~ len(haystack). The cliffs of a
               | backtracker are exponential _and_ difficult to predict.
               | 
               | The repetition operator provokes the cliff because it
               | represents short-hand syntax for easily increasing `m` to
               | very large sizes. Indeed, by default, '.{12345}' will
               | fail to compile because it blows the default limit. That
               | only works because, as I said, the cliffs for finite
               | automata based regex engines are far more predictable.
        
             | hddqsb wrote:
             | [Note: This comment tree is about the Redis code, which
             | uses glob-style patterns rather than regex. So in this
             | comment and in my grandparent comment, "*" means "any
             | string" rather than "zero or more of the previous
             | pattern".]
             | 
             | An NFA / DFA would be a perfectly valid way to implement an
             | efficient matcher, but it's not the only way.
             | 
             | For example, memoization
             | (https://news.ycombinator.com/item?id=32434412#32436442)
             | can be easily shown to have time complexity O(P * T) where
             | P = len(pattern) and T = len(text), which is the same as a
             | basic NFA implementation.
             | 
             | In the case of the Redis code, my proposed fix requires
             | much less code and memory than memoization or an NFA, and
             | has time complexity O(P * T). It's slightly harder to prove
             | this bound; here is a sketch of the proof:
             | 
             | Matching against a pattern fragment that does not contain
             | "*" takes O(len(pattern_fragment)).
             | 
             | When a call to stringmatchlen() encounters a wildcard, it
             | may recurse O(T) times (and then it returns). Each
             | recursive call can be classified either "leaf" (if there is
             | a match failure before the next wildcard) or "full" (if the
             | match reaches the next wildcard).
             | 
             | The insight I described leads to at most one "full" call at
             | each level of the recursion (i.e. for each wildcard in the
             | pattern). So for a particular pattern fragment, the number
             | of "leaf" calls that process it is O(T). Hence the overall
             | time complexity is O(P * T).
        
               | burntsushi wrote:
               | See also:
               | https://github.com/google/re2/blob/main/re2/bitstate.cc
               | 
               | Although its space complexity is P*T, which limits its
               | use to small regexes/haystacks. It sounds like your claim
               | is that your technique achieves a smaller space bound?
        
               | hddqsb wrote:
               | Yes. My proposed fix takes O(P * T) time and O(1) memory
               | (but it only supports glob-style wildcards; it doesn't
               | support repetition of specific characters / character
               | classes / sub-patterns).
        
               | burntsushi wrote:
                | Ah, gotcha. An important clarification indeed. Thanks
                | for that!
        
               | [deleted]
        
       | jll29 wrote:
       | I like the article that this post refers to about efficient
       | implementation (https://swtch.com/~rsc/regexp/regexp1.html). It
       | mentions a debate between J. Friedl and comp. sci. purists about
       | whether NFAs and regular expressions should really be called what
       | they are in the face of backreferences and other notation. Friedl
       | suggests another name should be used, because de facto "NFAs" are
       | no longer equivalent to DFAs.
       | 
       | Since Xerox Research Europe's invention of xfst, the term
       | "extended regular expression" is already taken for a specific
       | superclass of expressions that describe regular relations.
       | 
       | So I propose two alternatives, _semi-regular expressions_ and
       | _super-regular expressions_ , for strings that look like regular
       | expressions but that use non-regular extensions like
       | backreferences that push the complexity beyond what can be done
        | with NFAs/DFAs.
        
         | flashgordon wrote:
          | Hah, this is a classic. I actually used this as a reference to
          | build my own about a year ago in TypeScript (documentation has
          | a long way to go and is still WIP), which I am using in a
          | couple of production-ready DSLs:
         | 
         | https://github.com/panyam/tlex
        
         | 082349872349872 wrote:
         | Of those, I'd prefer _semi-regular expressions_ (but personally
         | currently lean towards _irregular expressions_ ). As they lose
         | the nice properties, there's certainly nothing " _super_ "-
         | _regular_ about them.
        
           | catlifeonmars wrote:
           | What about pseudo-regular expressions?
        
           | LudwigNagasena wrote:
           | Super-x may also mean that it has some extra properties wrt
           | x. In fact, it was the original meaning that has drifted to
           | its more common use today. Cf. "hyper" as in "hypersonic".
           | Both "super" and "hyper" even mean the same thing
           | ("over"/"above") in their original languages (Latin and Greek
           | respectively).
        
         | [deleted]
        
         | eru wrote:
          | What really irks me is that there are several useful operations
         | that can be supported by proper regular expressions in linear
         | time, but those franken-expressions never feature them.
         | 
         | I am mostly talking about set intersection and complement. But
         | https://en.wikipedia.org/wiki/Regular_language#Closure_prope...
         | has some more.
        
           | mananaysiempre wrote:
           | Burntsushi (of regex crate / ripgrep fame) is on record
           | saying[1] he thinks those are intractable in a standard DFA
           | implementation on a large alphabet (Unicode), and if he can't
           | do it, I don't think there's much hope for the rest of us :)
           | They are relatively straightforward in derivative-based
           | implementations[2,3], but those are kind of meh performance-
           | wise AFAIU, and just eagerly building the DFA using
           | derivatives[4] yields huge automata.
           | 
           | (To be fair, the regex crate can't match canonically
           | equivalent Unicode strings, which I also think is pretty meh,
           | but I've played with Unicode normalization recently and I'm
           | convinced it would be impossible to support canonical
           | equivalence with the same order of magnitude in performance
           | without going through _horrible_ contortions.)
           | 
           | [1] https://news.ycombinator.com/item?id=31693008
           | 
           | [2] https://news.ycombinator.com/item?id=17652542
           | 
           | [3] https://news.ycombinator.com/item?id=8199902
           | 
           | [4] https://news.ycombinator.com/item?id=25604829
        
             | burntsushi wrote:
             | > To be fair, the regex crate can't match canonically
             | equivalent Unicode strings, which I also think is pretty
             | meh
             | 
             | Yeah not "meh" at all. :-) I don't think any regex engine
             | can do it? Does icu's regex engine even do it? Certainly no
             | fsm regex engine does.
             | 
             | UTS#18 doesn't even ask regex engines to do it. If you need
             | it, it just suggests canonicalizing both the regex and the
             | haystack first.
             | 
             | And yes, derivatives build full DFAs. Those are a no-go in
             | a general purpose regex engine.
             | 
             | There's also the issue of complement and intersection being
             | somewhat difficult to reason about.
             | 
             | I'll also note that I have never given any serious effort
             | to actually trying to implement complement or intersection.
                | I just don't see any viable implementation path to it in
                | the first place. I don't know where I would start.
                | Everything I can think of involves extreme compile-time
             | costs that would just be totally unacceptable in a general
             | purpose regex engine.
             | 
             | But, I am not really an "ideas" person. I see myself more
             | as an engineer. So my inability should not really be taken
             | for gospel. Be skeptical of expertise. That's the
             | foundation of science. :)
        
               | mananaysiempre wrote:
               | No, I don't think anybody can do it :)
               | 
               | Unless I can't read, ICU4C does not even do streaming
                | normalization, and the buffer-at-a-time one is around
                | 100-200 MB/s or so[1], which looks ... mildly amusing next
               | to your engine. My own attempt at streaming normalization
               | is currently about two times slower, because it has
               | deeper (and smaller) tables and is just generally dumb,
               | but then ICU4C is not particularly smart either. I expect
               | that normalization at 2x ICU speed or more is possible,
               | but that still won't make normalize-then-match
               | competitive with no normalization, while encoding all
               | canonical equivalents into the DFA sounds downright
               | hellish (also strictly speaking impossible, given
               | normalization requires unbounded buffer space, but that
               | part is probably solvable).
               | 
               | UTS #18 level 2 (which I'm aware you explicitly did not
               | attempt) kind of says some things about canonical
               | equivalence and then goes "nah, normalization is
               | difficult, let's go match grapheme clusters, also maybe
               | just prenormalize to NFD"[2]. Which just looks weird, but
               | apparently it means they did try to impose a requirement
               | then gave up more than a decade ago[3]. Does anything
               | even fully implement level 2, I wonder?
               | 
               | So, meh, but not really directed at you--regex is
               | supernaturally fast as far as I'm concerned. Perhaps I'm
               | annoyed at Unicode normalization for being (not really
               | complicated but) so punishingly difficult to do quickly
               | it isn't really clear how you could wire it into a regex
               | matcher.
               | 
               | There is also the question of how normalization-invariant
               | regexes should even work: should "<C-caron>." match
               | "<C-cedilla><caron>"? UTS #18 (with invariance
               | requirement restored) seems to imply yes, but it's hard
               | to imagine when anyone would actually want that. (Note
               | that both pattern and haystack are normalized--the
               | corresponding regex for NFD input would be something like
               | "(C<caron>[[:ccc=0:][:ccc>=230:]]|C[[:ccc>0:]&[:ccc<230:]
               | ]<caron>)", but nothing implements that syntax.) So while
               | UTS #18 gives a possible interpretation of how Unicode
               | regexes should work, I'm not entirely convinced it's the
               | right one, or even right enough to go on the crusade of
               | implementing a normalization-invariant regex engine.
               | 
               | [1] Or so I thought, but my tests used UTF-32 while
               | apparently UTF-16 is much faster: https://tzlaine.github.
               | io/text/doc/html/boost_text__proposed.... Ugh.
               | 
               | [2] https://www.unicode.org/reports/tr18/#Canonical_Equiv
               | alents
               | 
               | [3] https://www.unicode.org/reports/tr18/tr18-13.html#Can
               | onical_...
        
         | kazinator wrote:
         | I cannot agree here. The term "regular expression" relates to
         | "regular language".
         | 
         | So instead of inventing new words haphazardly, we should look
         | at what kind of language we are able to express with these non-
         | regular expressions, what the name of that language is and then
         | use that name for the expressions.
         | 
         | Here is a starting point: where do regular languages sit in the
         | Chomsky hierarchy:
         | 
         | https://en.wikipedia.org/wiki/Regular_language#Location_in_t...
         | 
         | Here, we see that there are also less inclusive languages
         | included in regular: finite languages and star-free languages,
         | for which the corresponding "finite expression" and "star-free
         | expression" make sense.
         | 
         | In the other direction, maybe some useful language kinds can be
         | identified between "regular" and "context-free" in relation to
         | the capabilities of certain kinds of expressions. However,
         | "context-free expression" isn't a bad start. ("Cfex" [si:fex]?)
        
         | Groxx wrote:
         | Alternatively: declare that "regex" has graduated into its own
         | proper term (not a contraction) for whatever this evolving
         | somewhat-too-capable-for-easy-classification tool is.
         | 
         | It's certainly how it's used colloquially.
        
           | amscanne wrote:
           | The term "regex" is already used interchangeably for both
           | classes. The Go regular expression library (which is widely
           | used) does _not_ support backreferences, and is guaranteed to
           | run in linear time.
           | 
           | It seems sillier to say "that's not a regex" (since it
           | definitely is a regular expression engine), than to say that
           | the others are _extended_ regular expressions or some other,
           | more flexible thing (for better and worse).
        
         | anitil wrote:
         | I've read that article so many times I knew what it was by
         | seeing the link. The first few times it didn't make sense, and
         | now it seems almost obvious - I can't remember what I was not
         | understanding.
        
       | tgv wrote:
       | It's a left-shortest matching back-tracker. Not to be used in
        | practice. I'm sure this implementation features in various
        | textbooks.
        
         | maccard wrote:
         | So it's doable if you know the solution and have seen it
         | before, but otherwise it's pretty damn difficult
        
           | samatman wrote:
           | The issue here is the pathological performance under crafted
           | inputs, I don't believe GP was saying it's easy to write or
           | uninteresting history.
        
       | nsonha wrote:
        | I don't see anything "beautiful" here; it works and is performant
        | but seems pretty average in terms of readability.
        
         | abudabi123 wrote:
         | Emacs has a DSL with the regex power and potential to make the
         | reading experience better than poking toothpicks in your
         | eyeball; experience and familiarity with the syntax sees beauty
         | where others fail to?
        
         | dev_snd wrote:
         | I never understood why incrementing a pointer while doing
         | another operation, such as a comparison, is considered good
         | style in C, TBQH.
         | 
         | I think the go example is much more readable.
        
           | rswail wrote:
           | Because C was originally for PDP/DEC equipment and the
           | instruction set had standard register access modes including
           | both pre and post increment.
           | 
              | So the (for example) strcmp "while (*s++ == *d++);" made
              | sense as efficient code, because the pointer access and the
              | post-increment effectively compiled down to almost a single
              | instruction.
           | 
           | https://en.wikipedia.org/wiki/PDP-11_architecture#General_re.
           | ..
        
             | im3w1l wrote:
             | x86 has cmps for doing comparison with postincrement /
             | postdecrement.
        
               | azinman2 wrote:
               | Do compilers use that if you break it apart to something
                | more legible? If you don't, does the Intel CPU end up
               | doing the right thing anyway?
        
               | umanwizard wrote:
               | Yes, modern compilers should treat them the same.
        
               | rswail wrote:
               | Sure, but the point was that C was designed to implement
               | Unix and Unix was implemented on DEC machines and DEC
               | machines had a particular architecture around registers
                | that included the pre/post increment, which led to the C
                | *p++ style of iteration.
        
         | christophilus wrote:
         | Of the C code I've seen, this is quite nice. It's decently
         | commented, and very simple. I'd say it's above average, but
         | that may say more about the codebases I've seen than it does
         | about this example.
        
       | adrianmsmith wrote:
       | Good to see the term "regexp" (with a "p") in use. I always used
       | to call it that, then at some point realized everyone was writing
       | "regex", and wondered if I'd spelled the abbreviation wrong the
       | whole time. I guess it's just a function of time, it used to be
       | spelled with a "p" and now mostly isn't.
        
         | rob74 wrote:
         | Guess that time favors shorter abbreviations. I prefer "rex"
         | myself (at least as a variable name for a regex object).
        
         | lupire wrote:
         | Regexp has been dying for decades
         | 
         | https://trends.google.com/trends/explore?date=all&geo=US&q=R...
        
         | IshKebab wrote:
         | Probably because "regex" is much easier to say and looks much
         | more like a real word than "regexp". Like "latex" or "codex".
         | Nothing ends in "exp".
        
         | smcl wrote:
         | I ended up learning it the same, possibly from the Camel Book?
         | One thing I noticed when I wasn't coding alone in my bedroom
         | and had to actually talk to other coders was that "regex" is
         | more natural to say than "regexp", so that might explain why it
         | became the preferred spelling.
        
       | metadat wrote:
       | I wonder what the non-recursive redesign would look like, without
       | using the simulate-recursion-via-callstack translation technique.
       | 
       | I slept on it and the solution still isn't clear to me.
       | 
       | As an aside: It'd be nice if the go unit tests logged an
       | indication of whether or not the ./matchC exec tests ran.
        
       | kubb wrote:
        | It's a nicely written function... but it has the difficulty of
        | a warmup interview question or a CS101 homework problem.
        
       | mkovach wrote:
       | I love this bit of code. It saved my butt in '99 when I had to
        | implement searches for potential "Y2K" coding issues. Somewhere,
        | collecting dust in a CVS repository, probably saved on tape in
        | some storage bunker, is a modified version of this that searched
       | millions of lines of code for various potential "double digit
       | year" issues.
       | 
       | And this also taught me an elegant way to implement recursion in
       | C, that I shamelessly used in interviews to look way smarter than
       | I actually am.
        
       | commandersaki wrote:
       | Excellent post. I remember reading the exegesis years ago but
       | completely forgot about it. Really impressed how well the Go
       | version turned out.
       | 
       | Btw always love your writing Ben, it's very engaging.
        
       | returningfory2 wrote:
       | > I ran benchmarks [...] matching the regex Ben.*H over 100
       | concatenated repeats of the King James Bible.
       | 
       | Is there a concern with these kinds of micro-benchmarks, where
       | you repeatedly do the same small operation (matching the same
       | regex), that your results will be skewed by the branch predictor,
       | CPU caches, etc.?
        
       | rurban wrote:
       | I also revived my old dynamic string matchers recently, which
       | don't need any costly compilation step:
       | https://github.com/rurban/tiny-matcher (in-work)
       | 
       | And with compilation https://github.com/rurban/tiny-regex-c
       | (done)
        
         | benhoyt wrote:
         | That's really nice clear code -- thanks for sharing. This would
         | be great for small embedded systems.
        
           | rurban wrote:
           | Not yet for embedded, as my variant still allocs. When I have
           | time I'll add the static variants. I'll need it for MQTT on
           | my baremetal firmwares sooner or later.
        
       | hddqsb wrote:
       | _> horrible run times on craftily-constructed regexes_
       | 
       | Here is a simple patch to make the runtime O(len(pattern) *
       | len(text)) instead of exponential. It adds 5 lines and a new
       | parameter. The idea is to memoize (cache) the results,
       | specifically the failures (there is no need to cache the
       | successes, because all the functions return immediately on
        | success).
        | 
        |     // Match reports whether regexp matches anywhere in text.
        |     func Match(regexp, text string) bool {
        |     +    seen := make(map[[2]int]bool)
        |          if regexp != "" && regexp[0] == '^' {
        |     -        return matchHere(regexp[1:], text)
        |     +        return matchHere(regexp[1:], text, seen)
        |          }
        |          for {
        |     -        if matchHere(regexp, text) {
        |     +        if matchHere(regexp, text, seen) {
        |                  return true
        |              }
        |              if text == "" {
        |                  return false
        |              }
        |              text = text[1:]
        |          }
        |      }
        | 
        |     // matchHere reports whether regexp matches at beginning of
        |     // text.
        |     -func matchHere(regexp, text string) bool {
        |     +func matchHere(regexp, text string, seen map[[2]int]bool) bool {
        |          switch {
        |          case regexp == "":
        |              return true
        |          case regexp == "$":
        |              return text == ""
        |          case len(regexp) >= 2 && regexp[1] == '*':
        |     -        return matchStar(regexp[0], regexp[2:], text)
        |     +        return matchStar(regexp[0], regexp[2:], text, seen)
        |          case text != "" && (regexp[0] == '.' || regexp[0] == text[0]):
        |     -        return matchHere(regexp[1:], text[1:])
        |     +        return matchHere(regexp[1:], text[1:], seen)
        |          }
        |          return false
        |      }
        | 
        |     // matchStar reports whether c*regexp matches at beginning of
        |     // text.
        |     -func matchStar(c byte, regexp, text string) bool {
        |     +func matchStar(c byte, regexp, text string, seen map[[2]int]bool) bool {
        |          for {
        |     -        if matchHere(regexp, text) {
        |     +        if seen[[2]int{len(regexp), len(text)}] {
        |     +            return false
        |     +        }
        |     +        if matchHere(regexp, text, seen) {
        |                  return true
        |              }
        |              if text == "" || (text[0] != c && c != '.') {
        |                  return false
        |              }
        |     +        seen[[2]int{len(regexp), len(text)}] = true
        |              text = text[1:]
        |          }
        |      }
       | 
       | Demo: https://go.dev/play/p/aD9vzXwTHGE
        
       | compsciphd wrote:
       | So fun story (to me in retrospect).
       | 
        | I had an interview at Google years ago where I was basically
        | asked to come up with this on my own. In my opinion it's a
        | terrible interview question (per Kernighan's blog, linked in the
        | original post, it took Pike an hour or two alone in his office),
        | and I bombed it. It was the interview either just before or just
        | after lunch, and it stuck with me for the rest of the day; I
        | couldn't give my entire focus to the rest of the sessions.
       | 
        | Anyway, after it was done, I found Kernighan's blog post and
        | decided I should try to implement this in Java, as it would let
        | me get some practice with things like JUnit and more experience
        | with object-oriented programming (I had been living in the C
        | world at that time). So I did, but I then found this regular
        | expression page (https://www.regular-expressions.info/) and
        | learned many more things about "regular expressions" that I
        | hadn't learned in school (because they aren't regular), and
        | wondered if I could extend Pike's simplicity to them in a
        | relatively simple manner. So I did.
       | 
        | Which resulted in this: https://github.com/sjpotter/regex.
       | 
       | Not meant to be "performant" but meant to be educational to me
       | and perhaps others.
       | 
        | Then when I was learning Go, I rewrote it in Go. I did things
        | that are not "normal" Go patterns (i.e. using panic and recover
        | to make error handling cleaner; in my opinion, if an error path
        | is simply if (err) { return err } all the way up, panic/recover
        | at the top can be cleaner, but it seems most disagree). But it
        | was educational for learning how to rewrite my Java code in an
        | object-oriented style in Go, and for exploring the limitations
        | of interfaces and how one might get around them (also perhaps in
        | ways that go against what might be considered proper Go
        | programming).
       | 
       | https://github.com/sjpotter/regex-go
       | 
        | Though now that I look at that repo, I'm wondering if all the
        | work was done elsewhere and I just pushed a single commit there;
        | I might need to go back and look at it.
        
         | renewiltord wrote:
          | Absolutely insane as an interview question. I know lots of top
          | engineers who've built the fundamentals and driven companies to
          | billions of dollars multiple times who would have failed to
          | complete this objective within an hour.
         | 
         | I refuse to believe anyone could imagine it's possible. Either
         | that or there's a whole bunch of hidden talent wasting away at
         | Google - which sounds unlikely.
        
           | cmrdporcupine wrote:
           | The Google interview is more interested in how you approach
           | the problem, explain it, and analyze it etc than in a working
           | solution. You can pass the interview without having fully
           | working code.
           | 
           | (I did interview training at Google twice)
           | 
           | I still think it's a terrible interview format, though.
        
           | iainmerrick wrote:
           | Or, most developers at Google do interviews but many of them
           | aren't very good at calibrating their questions.
        
         | benhoyt wrote:
         | I agree it's way too hard as a one-hour interview question!
          | When I interviewed at Google a few years ago I was asked to
         | write a simple glob matcher that handled ? and * (like the one
         | I've shown here: https://benhoyt.com/writings/rob-pike-
         | regex/#bonus-glob-matc...). I think that question was border-
         | line okay for an interview, though when I conduct interviews I
         | prefer to give less algorithm-y problems. It was a tough
         | question for me to do in 30-40 minutes, though I think with a
         | little bit of help I got a working solution in Python (I
         | probably didn't handle edge cases properly).
        
           | layer8 wrote:
           | I've implemented glob matchers by regex-replacing the glob
           | pattern, turning it into a regex, and then using that regex.
           | I wonder how that would have gone in an interview. :)
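            | A minimal sketch of that translation in Go (hypothetical
            | code, not ripgrep's actual implementation): quote everything
            | except the glob metacharacters and anchor the result.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// globToRegexp translates a simple glob ('*' = any string, '?' = any
// single character) into an anchored regular expression, quoting all
// other characters so they match literally.
func globToRegexp(glob string) (*regexp.Regexp, error) {
	var b strings.Builder
	b.WriteByte('^')
	for i := 0; i < len(glob); i++ {
		switch glob[i] {
		case '*':
			b.WriteString(`.*`)
		case '?':
			b.WriteByte('.')
		default:
			// Escape regex metacharacters like '.', '+', '('.
			b.WriteString(regexp.QuoteMeta(string(glob[i])))
		}
	}
	b.WriteByte('$')
	return regexp.Compile(b.String())
}

func main() {
	re, err := globToRegexp("src/*.g?")
	if err != nil {
		panic(err)
	}
	fmt.Println(re.MatchString("src/main.go")) // true
	fmt.Println(re.MatchString("src/main.rs")) // false
}
```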
        
             | burntsushi wrote:
             | This is actually how glob matching is currently implemented
             | in ripgrep: https://github.com/BurntSushi/ripgrep/blob/a9d9
             | 7a1dda0a07a8e...
             | 
             | Works quite well because it can take advantage of the host
             | of optimizations inside the regex engine.
        
           | IshKebab wrote:
           | Honestly even that is way too hard. The people who originally
           | came up with DFA and NFA engines and all that didn't do it
           | under pressure in 40 minutes. You're basically selecting for
           | people who have done it before.
        
             | naniwaduni wrote:
             | But Dijkstra's shortest-paths algorithm is, of course, a
             | perfect interview question, having infamously been derived
             | in half an hour over a coffee date with his fiancee.
        
               | google234123 wrote:
               | This is completely unfair comparison. Deriving is not
               | comparable to reciting/applying something... Dijkstra's
               | is pretty intuitive once you know it and about 10 or 12
               | line of python code? Quantum mechanics took years and
               | multiple geniuses to figure out yet somehow undergrads
               | and graduate students are able to learn it in one or two
               | semesters.
        
         | kramerger wrote:
         | > I personally think panics/recover at the top can be cleaner,
         | but it seems most disagree
         | 
         | Count me among those. At best, panic/recover would be an uglier
         | version of throw/catch with even worse performance.
         | 
          | That said, I hope someday Go adds the "?" return operator that
          | Kotlin and Rust use. It won't really change anything besides
         | adding syntactic sugar for an already recommended pattern. But
         | development ergonomics would improve a lot.
        
           | jjnoakes wrote:
           | > That said, I hope someday Go adds the "?" return-operator
           | 
           | Same here. I think this is my biggest code-reading pain point
           | as a go developer. I'm toying with the idea of playing more
           | with Go+
           | 
           | https://github.com/goplus/gop/blob/main/doc/docs.md#error-
           | ha...
        
           | mdcfrancis wrote:
           | The authors of the encoding/json library thought the same. As
           | a way to exit from deep down in a parse stack it works well
           | especially if you only have limited entry points. https://cs.
           | opensource.google/go/go/+/refs/tags/go1.19:src/en... uses
           | well typed panics.
        
             | nemo1618 wrote:
             | The stdlib is not always the best example of idiomatic Go;
             | encoding/json is particularly infamous for having poor
              | performance and awkward APIs. I'm fairly confident that if
              | it were rewritten today, it wouldn't use panic/recover.
        
             | paulsmith wrote:
             | It's a viable alternative to returning error values in a
             | recursive descent parser.
        
       | djbusby wrote:
       | Ok. This was educational and fun. Thanks!
        
       | unixbane wrote:
       | okay now post the standard academic version of elegant simple
       | regex in algol, lisp, ml whatever and see how many upvotes it
       | gets
        
       ___________________________________________________________________
       (page generated 2022-08-12 23:02 UTC)