[HN Gopher] State machines are wonderful tools
       ___________________________________________________________________
        
       State machines are wonderful tools
        
       Author : chmaynard
       Score  : 266 points
       Date   : 2021-01-01 08:50 UTC (14 hours ago)
        
 (HTM) web link (nullprogram.com)
 (TXT) w3m dump (nullprogram.com)
        
       | vesinisa wrote:
       | Cool trickery, but should I ever encounter one of those hex
       | tables in a program I am maintaining, I would certainly hope to
       | see at least the state graph of the encoded machine as
       | documentation.
        
       | pupdogg wrote:
       | I don't mind being reminded every 2-3 years how great State
       | Machines are. I was first introduced to the concept back in 2002
       | and my mind was blown. At that time, I was in my 20s...now I'm
       | pushing past 40 and I'm starting to think my life is a state
       | stuck in an event loop.
        
       | allenu wrote:
       | State machines are incredibly useful and I use them all the time
       | when I write UI code.
       | 
        | There are a couple of benefits that come to mind with respect
        | to writing UI. First, it forces you to think through all the
        | potential "inputs" to your system, be they button presses from
        | users or network request responses. You have to be explicit
        | about what could have an effect on the system. Secondly, it
        | makes your business logic easier to reason about because you
        | essentially turn your UI code into a bunch of functional
        | statements. All your transitions are just pure functions whose
        | inputs are the previous state and an event, and whose output
        | is the new state.
       | 
        | A lot of UI frameworks do it this way today, of course, but I
        | picked it up from reading about how Elm works. Whenever I
        | write new UI from scratch, I always end up falling back on the
        | state machine for guidance.
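        | 
        | To make the shape concrete, here is a minimal sketch of such a
        | pure transition function, in C for brevity (the states and
        | events are hypothetical):
        | 
        |   typedef enum { IDLE, LOADING, LOADED, FAILED } State;
        |   typedef enum { FETCH, SUCCESS, FAILURE, RETRY } Event;
        | 
        |   /* Pure transition: the new state depends only on the
        |      previous state and the incoming event. */
        |   State update(State s, Event e)
        |   {
        |       switch (s) {
        |       case IDLE:    return e == FETCH   ? LOADING : s;
        |       case LOADING: return e == SUCCESS ? LOADED
        |                          : e == FAILURE ? FAILED  : s;
        |       case FAILED:  return e == RETRY   ? LOADING : s;
        |       default:      return s;
        |       }
        |   }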
        
       | grandinj wrote:
        | FSMs are fun to write but horrible to modify, especially if
        | you're not the one who wrote them.
        
         | rualca wrote:
          | > FSMs are fun to write but horrible to modify, especially
          | > if you're not the one who wrote them.
         | 
         | It really depends on the tooling you're using. For example,
         | Qt's support for SCXML[1] eliminates much of the pain of using
         | state machines.
         | 
         | [1] https://doc.qt.io/qt-5/qtscxml-overview.html
        
         | xondono wrote:
          | The general issue is that FSMs require a lot of naming,
          | since you want to give every state a meaningful name, and
          | naming is hard.
         | 
         | When the states are clear enough, modifying FSMs is very nice.
        
           | teacpde wrote:
            | That isn't necessarily a state machine problem; a state is
            | an abstraction that maps to part of your business
            | semantics. Naming becomes hard when your state abstraction
            | isn't clear, which in turn is probably due to how the
            | business semantics are modeled. One of the most important
            | things in implementing state machines is to draw them up
            | and make sure every state and transition has a clear
            | meaning before actually implementing them. There will
            | usually be multiple iterations of modeling before arriving
            | at a good design.
        
         | threatofrain wrote:
          | But for many this is just React with Redux, and the
          | community buzz around that has been that it makes teamwork
          | easier. You'd have a state machine if the problem called for
          | it anyway, but with Redux everything is now explicit and
          | teammates come prepared.
        
         | amw-zero wrote:
         | This is only true when considering a specific software design
         | that implements the state machine. I'd say the most popular way
         | to define a state machine in code is how Xstate does it:
         | https://github.com/davidkpiano/xstate. You define your states,
         | and then you define your inputs, and then you map out every
         | single transition between states.
         | 
         | This is probably the most inefficient way to design a state
         | machine. There's a simple mathematical explanation for that. If
         | we agree that a state diagram is a directed graph consisting of
         | N nodes, then there are N^2 possible transitions (that's the
         | definition of a completely connected graph). This means there's
         | a large space of possible transitions relative to the number of
         | states, which means there's a high probability that you will
         | have to modify lots of transitions when modifying the state
         | machine itself.
         | 
          | In this article, the author shares several alternative
         | designs for state machines, which don't look like they suffer
         | from exactly the same problem. I'm most familiar with TLA+,
         | where you describe a state machine simply with one logical
         | predicate: the "Next" state relation. This is the most powerful
         | way I've found to define state machines, and modifying one
         | function to entirely change the behavior of the state machine
         | is very easy in practice.
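          | 
          | As a loose analogy in ordinary code (a sketch, not actual
          | TLA+; the state record is hypothetical), the whole machine
          | collapses into one next-state predicate:
          | 
          |   #include <stdbool.h>
          | 
          |   typedef struct { int count; } State;
          | 
          |   /* One relation describes every legal step: a move is
          |      valid iff it increments the counter or resets it.
          |      Changing behavior means editing this one predicate,
          |      not N^2 individual edges. */
          |   bool next(State s, State t)
          |   {
          |       return t.count == s.count + 1   /* Increment */
          |           || t.count == 0;            /* Reset */
          |   }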
        
           | teacpde wrote:
            | Mind elaborating on the next-state predicate approach? By
            | "next," I imagine the focus is on transitions rather than
            | states; however, if you have multiple transitions arriving
            | at the same state, what makes the next-state approach
            | better? At the end of the day, the state machine is the
            | same regardless of how it is implemented.
        
             | nitrogen wrote:
              | _Edit:_ looks like they're talking about this:
             | https://en.wikipedia.org/wiki/TLA%2B
             | 
             | What I describe below is not related.
             | 
             | ----
             | 
             | Not the OP, but here's a stab at it...
             | 
             | In state machines commonly used in embedded C code, each
             | state is represented either by a function or by a struct
             | that contains a function pointer. The state machine calls
             | the current state function in a loop or whenever relevant
             | events happen, and that function returns a pointer to the
             | next state, calculated based on inputs. If there is no
             | state transition, then the function returns NULL or returns
             | its own state.
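              | 
              | A minimal sketch of that pattern (the states here are
              | made up):
              | 
              |   /* Each state is a struct holding a function
              |      pointer; the function returns the next state. */
              |   typedef struct state {
              |       const struct state *(*step)(int input);
              |   } state_t;
              | 
              |   static const state_t *idle_step(int input);
              |   static const state_t *run_step(int input);
              | 
              |   static const state_t st_idle = { idle_step };
              |   static const state_t st_run  = { run_step };
              | 
              |   static const state_t *idle_step(int input)
              |   {
              |       return input ? &st_run : NULL; /* NULL: stay */
              |   }
              | 
              |   static const state_t *run_step(int input)
              |   {
              |       return input ? NULL : &st_idle;
              |   }
              | 
              |   /* Driver: run the current state on each event. */
              |   void machine_step(const state_t **cur, int input)
              |   {
              |       const state_t *next = (*cur)->step(input);
              |       if (next) *cur = next;
              |   }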
        
           | jayd16 wrote:
            | Seems like you could just add some sugar to the definition
            | process, e.g. an "any state" wildcard or more than one
            | source state per transition.
        
           | JadeNB wrote:
           | > There's a simple mathematical explanation for that. If we
           | agree that a state diagram is a directed graph consisting of
           | N nodes, then there are N^2 possible transitions (that's the
           | definition of a completely connected graph).
           | 
           | A state diagram is usually richer than a directed graph--
           | usually the edges are _labelled_. This multiplies the number
           | of possible transitions by the number of possible labels.
           | 
           | (Also, for anyone interested in comparing with the graph-
           | theoretic terminology, the term "complete graph"
           | (https://en.wikipedia.org/wiki/Complete_graph) is often used
           | for undirected graphs with no self-loops, in which case there
           | are only \binom n 2 possible edges.)
        
         | lmilcin wrote:
          | The problem with modifying an FSM arises when you start by
          | modifying the code. That is the wrong approach.
          | 
          | The FSM is not code but a model, and the code is just an
          | implementation of the model.
          | 
          | What you want to do is recreate the process of modelling (or
          | modify the existing model) and then regenerate the code from
          | that.
          | 
          | What I do is keep an implementation-agnostic model of the
          | FSM that gets translated to code. That code is never
          | modified; it is only called from external modules or calls
          | external modules.
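          | 
          | For instance, the generated code can be nothing more than a
          | transition table plus a fixed driver, with all behavior
          | living in external modules (a sketch; the states are
          | hypothetical):
          | 
          |   /* Generated from the model -- never edited by hand. */
          |   enum { S_IDLE, S_OPEN, S_DONE, N_STATES };
          |   enum { E_OPEN, E_CLOSE, N_EVENTS };
          | 
          |   static const int transition[N_STATES][N_EVENTS] = {
          |       [S_IDLE] = { [E_OPEN] = S_OPEN, [E_CLOSE] = S_IDLE },
          |       [S_OPEN] = { [E_OPEN] = S_OPEN, [E_CLOSE] = S_DONE },
          |       [S_DONE] = { [E_OPEN] = S_DONE, [E_CLOSE] = S_DONE },
          |   };
          | 
          |   int fsm_step(int state, int event)
          |   {
          |       return transition[state][event];
          |   }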
         | 
         | There are some cases where I would just hardcode the FSM
         | without the model. Those cases are when I do not expect the
         | state machine to change because it follows from the fundamental
         | description of the problem. For example, I might have some
         | communication protocol and the state machine describes the
         | state of that communication channel. If I do my job well the
         | state machine maps 1:1 to the problem and will never need to
         | change even as the rest of the application changes.
        
           | teacpde wrote:
           | In your example, what happens if changes need to be made to
           | the protocol? It is less frequent than business/application
           | changes, but definitely can happen.
        
             | lmilcin wrote:
              | Good protocols are usually made in layers. You typically
              | want a state machine describing the lower part of the
              | application protocol, on which you build the higher
              | level of your application protocol, composed of specific
              | functions.
              | 
              | This way you can modify functionality without
              | necessarily touching the low-level implementation.
             | 
             | I can give an example.
             | 
              | I have worked with credit card terminals and implemented
              | the entire application from scratch.
             | 
             | There is a protocol to communicate between the terminal and
             | the acquirer's system (ISO 8583).
             | 
              | This protocol specifies the flow and format of messages.
              | For example, if you send a transaction request, you will
              | get a response. If you don't get a response you may or
              | may not retry it, etc. If you send a transaction advice
              | (i.e. the transaction has already happened) you must
              | keep retrying it until you get a response, and every
              | retransmission after the first must include a bit that
              | marks it as a retransmission. If the request or advice
              | has been sent and the transaction was cancelled, you
              | must keep sending cancellations until you get an
              | acknowledgement.
             | 
              | Now, all those messages have a huge number of fields
              | that change regularly as requirements change, but the
              | basic flow of the protocol is fairly constant and has
              | been for many years.
             | 
              | I implemented the protocol as a tree of state machines
              | telling the application what to do based on events, and
              | I believe the code has not been touched in over 10 years
              | even though the application was in constant development.
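              | 
              | Condensed from the rules above, the advice flow alone
              | sketches out as a tiny machine (the names here are
              | hypothetical):
              | 
              |   typedef enum { ADV_SEND, ADV_RETRY, ADV_DONE } adv_t;
              | 
              |   /* The first send goes out plain; every later send
              |      must set the retransmission bit; keep sending
              |      until a response finally arrives. */
              |   adv_t adv_step(adv_t s, int got_response)
              |   {
              |       if (s == ADV_DONE || got_response)
              |           return ADV_DONE;
              |       return ADV_RETRY; /* set retransmission bit */
              |   }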
        
         | jarirajari wrote:
         | FSMs should be "written" visually, and they should have either
         | hierarchical sub-states or some sort of a grouping scheme to
         | keep complexity at a reasonable level.
         | 
          | FSM examples are usually too simplified, and they don't show
          | how to manage the complexity of a real, larger state
          | machine.
        
         | blunte wrote:
          | It also depends on how you implement the state machine. If
          | you're constrained by memory or performance (or playing code
          | golf), then you may end up with code like TFA's.
         | 
          | But if you're implementing it in a typical app, you can
          | obviously organize and write it in a way that is very
          | understandable.
         | 
         | I find the title of TFA misleading in its generality in
         | comparison to the content of the article.
        
         | masterofmisc wrote:
          | Yeah, I find that using a library (like Stateless for C#)
          | that makes you "spell out, up front" each of the states and
          | the transitions permitted from each state makes it easier to
          | reason about the code.
        
       | ci5er wrote:
        | A long time ago I used to auto-generate http and smtp servers
        | and message queues (similar to RabbitMQ) from very compact
        | descriptions. My in-house infrastructure build-out
        | productivity went up ~30x, but all the downstream teams hated
        | it.
        
       | blargmaster42_8 wrote:
       | And this is news? Duuu
        
       | [deleted]
        
       | strenholme wrote:
        | I ended up using one in MaraDNS 2.0 to parse configuration
        | files, since the C code for parsing MaraDNS 1.0 configuration
        | files was a lot more error-prone and hard to maintain.
       | 
       | The state machine's grammar is embedded in the Deadwood (MaraDNS
       | 2 recursor) binary.
       | 
       | Here is a full explanation of the finite state machine, along
       | with the actual embedded code:
       | 
       | https://github.com/samboy/MaraDNS/blob/master/deadwood-githu...
        
       | carapace wrote:
        | FWIW, here's an example I wrote in Python "that imitates what
        | "compiled" FSM code might look like in an "unrolled" form.
        | Most FSM code uses a little driver loop and a table data
        | structure; the code below instead acts like JMP instructions
        | ("jump", or GOTO in higher-level-but-still-low-level
        | languages) to hard-code the information in the table into a
        | little patch of branches."
       | 
       | https://joypy.osdn.io/notebooks/Derivatives_of_Regular_Expre...
       | 
        | It's part of an (incomplete, slightly broken) notebook on
        | Brzozowski's derivatives of regular expressions.
        
       | sovietmudkipz wrote:
        | It's humbling to see a state machine implemented using
        | coroutines. I've been programming professionally for about 6
        | years, but have only recently come to appreciate generators
        | and coroutines.
       | 
        | I'm working through learn.unity.com for fun and because, in
        | my estimation, there is an upcoming boom in real-time
        | interfaces for real work. All the people playing video games
        | have mastered skills that aren't being taken full advantage
        | of. Anyway, coroutines in Unity can be used to do interesting
        | things over time while attached to the game loop.
       | 
        | My point, despite my digression, is that there are still
        | basic/intermediate topics I have only recently been able to
        | appreciate. Humbling.
        
         | jayd16 wrote:
          | You should also try to grok Unity's animation controller.
          | It's a state machine with non-discrete transitions
          | (primarily for animation blending). Neat stuff.
        
         | sali0 wrote:
         | 100% agreed on the increasing importance of video game
         | interfaces. Nice to see I am not alone in that thought.
        
           | sovietmudkipz wrote:
            | Every now and again as I play RTS (and other) games, I
            | imagine how the interface could be used for business,
            | engineering or marketing. There's potential, IMO.
           | 
            | The test (as I imagine it now) is making work fun enough
            | that my 13-year-old self would want to do it.
           | 
           | At any rate, ideas are cheap. Prototypes and implementation
           | are what matters.
           | 
            | We're speaking publicly, but I'd like to know: what do you
            | imagine when you think about this topic?
        
       | sixdimensional wrote:
        | No post on state machines would be complete without mention of
        | the superset concept of Petri nets [1]. I have previously used
        | Petri nets to build my own workflow engine. Building a
        | workflow engine is one of those things... nearly every
        | enterprise app has one, and normally you shouldn't need to
        | build one if you can source one... but it is an exercise that
        | makes you a stronger engineer/developer, deepening your
        | understanding of the patterns that lie beneath most systems.
       | 
        | Back when I was building it, openly available workflow
        | engines, BPML, BPEL, etc. were not really a thing yet, so it
        | made a little more sense to be building one.
       | 
       | [1] https://en.wikipedia.org/wiki/Petri_net
        
         | mikewarot wrote:
          | This is my first encounter with Petri nets, and at first I
          | thought they were just like state machines, in that you're
          | only concerned with flow of control... boy was I wrong. It's
          | going to take a while to fully grok, but it seems
          | impressively powerful in what it can do. Thanks!
        
         | nudpiedo wrote:
          | Do you have more information on how to build workflow
          | systems, and on why it is better to do so with Petri nets
          | rather than abstracting the processes with classes or
          | whatever else?
         | 
         | Any information is welcome
        
       | dqpb wrote:
       | Every program is fundamentally states and state transitions. You
       | can be explicit about it or not, but the states and transitions
       | are there regardless.
        
         | amw-zero wrote:
         | Absolutely. This is the definition of what computation is. Even
         | in the lambda calculus, where everything is defined with pure
         | functions, we build up state sequentially over time, based on
         | previous values of the state. The fact that there's no mutation
         | under the hood is inconsequential.
        
       | ColinWright wrote:
       | I'm reminded of a thing I wrote about seeing a state machine
       | implemented in real life:
       | 
       | https://www.solipsys.co.uk/new/StateMachineInRealLife.html?U...
        
         | skrebbel wrote:
         | That was a delightful little read, thanks!
        
         | pardner wrote:
         | Great little story!
        
         | [deleted]
        
       | zevv wrote:
        | Encoding parser FSMs by hand can be tedious and error-prone;
        | there are some nice tools to help you with that. I'm a huge
        | fan of Ragel: http://www.colm.net/open-source/ragel/
        
         | 082349872349872 wrote:
         | I sometimes wonder if replacing the hand-coded FSMs of UI
         | ravioli callbacks with a more parser-generator-like declaration
         | would be a cleaner approach.
         | 
         | (or maybe it's already been done? I haven't done UI code since
         | last century...)
        
       | hyperpallium2 wrote:
       | Now you equivalently have two problems.
        
       | amw-zero wrote:
       | These are some pretty cool examples, though a little too nitty
       | gritty to communicate the practical importance of state machines.
       | The truth is, state machines are so fundamental that programmers
       | are not even aware that they are using them. Many, if not most,
       | computer programs can be seen as state machines. In fact,
       | programming can quite literally be seen as state machine creation
       | and management. Programming languages are tools for creating
       | complex state machines without having to manually wire them
       | together.
       | 
       | For example, take the following factorial function in Python:
        | 
        |   def factorial(n):
        |       result = 1
        |       while n > 1:
        |           result *= n
        |           n -= 1
        |       return result
       | 
       | Here is a TLA+ spec for the same algorithm, along with the
       | corresponding state diagram for calculating factorial(5):
       | https://gist.github.com/amw-zero/c604f4dfd2d98e0d77fffe602c6....
       | This is a simple example, but the state diagram illustrates an
       | important point that the while loop doesn't: the algorithm is
       | completely linear, meaning it simply builds upon previous states.
       | Even though there is branching in the code, there is no branching
       | in the state diagram.
       | 
        | Of course, we're not always so lucky, and most state spaces
        | are not linear. Here's the spec and diagram for a user
        | interface that I was building a while back:
        | https://gist.github.com/amw-zero/4dc16cbf8e578d9be9e68b0a85c....
        | There was a subtle state-based bug that a user could only
        | trigger by performing a special sequence of events. The code
        | seemed simple, but the state diagram showed immediately what
        | the bug was.
       | 
       | This was in a React app, and doing this exercise made me realize
       | that React apps trivially translate to state machines as well.
       | Whenever you call `setState` or the corresponding `setHookValue`
       | function when using hooks, you are defining a state transition.
       | The set of all state variables determines the full state space of
       | the application, and bugs arise when we forget to consider that
       | fact within some of our transition functions.
       | 
       | Long story short, you don't need to use a state machine design
       | pattern such as Xstate to actually have a state machine. In fact,
       | unless you are doing some very weird things, your application
       | most likely already is a state machine under it all, but that's
       | only because a state machine is a very simple, general concept
       | that has tons of applications.
        
         | arcbyte wrote:
          | Yup. Basically all enterprise software is just state machine
          | design. Any simple RESTful app with a database is just a
          | state machine. Call one endpoint to create a resource, fetch
          | it later from a second. Calling them out of order yields an
          | error state. Super simple state machine. Of course we've
          | reinvented this wheel so many times.
          | 
          | Something like LISP is fun because you can get both the
          | executable and (human AND machine) readable representations
          | of a state machine out of the exact same source code. That's
          | still the future of programming.
        
         | the_cat_kittles wrote:
          | I think this has always subconsciously confused me whenever
          | people talk about state machines, and your comment clarifies
          | what I think I've felt. What would qualify as _not_ a state
          | machine in the broad sense?
        
           | amw-zero wrote:
            | That's a great question. Combinational logic is an example
            | of something that computes but isn't a state machine. You
            | can compute Boolean algebra operations with no notion of
            | time; they're essentially pure functions.
            | 
            | That's where it gets slightly confusing, though. Just
            | because a function is pure does not mean that its
            | implementation is not a state machine. For example, an
            | algorithm to compute the greatest common divisor of two
            | numbers is stateful, even if it has a pure functional
            | interface.
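            | 
            | For example, a sketch of Euclid's algorithm: the interface
            | is pure, but the loop is a little machine whose state is
            | the pair (a, b).
            | 
            |   unsigned gcd(unsigned a, unsigned b)
            |   {
            |       /* transition: (a, b) -> (b, a % b) */
            |       while (b != 0) {
            |           unsigned t = a % b;
            |           a = b;
            |           b = t;
            |       }
            |       return a; /* terminal state: (gcd, 0) */
            |   }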
        
       | loopz wrote:
        | Isn't an FSM more like writing a program consisting only of
        | GOTOs? I call this the "loopz effect", where any single state
        | may have an effect on any other state of the program.
        
         | ajuc wrote:
         | It's the opposite. In a FSM you have clearly defined states and
         | all possible transitions between them.
         | 
         | If you write the same code as normal imperative code that
         | modifies some variables - to understand what it does you have
         | to "run it" in your brain and reverse engineer the states and
         | transitions yourself.
        
         | garethrowlands wrote:
         | Yes. And not only that, but GOTOs are the normal recommended
         | way of implementing an FSM.
        
           | xondono wrote:
            | What? Where? Most sane people write FSMs as a switch
            | inside an infinite loop.
        
             | recursive wrote:
             | Right here, by me. Or perhaps I'm insane.
             | 
             | Using goto ensures that all of a particular state's logic
             | is nearby in code. It also eliminates the need to check an
             | exhaustive list of irrelevant switch cases at runtime.
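              | 
              | For example, a tiny recognizer for strings with an even
              | number of 'a's, written in that style (a sketch):
              | 
              |   /* Each label is a state; each goto is a hard-coded
              |      transition, so there is no dispatch loop. */
              |   int even_as(const char *s)
              |   {
              |   even:
              |       if (*s == '\0') return 1;
              |       if (*s++ == 'a') goto odd;
              |       goto even;
              |   odd:
              |       if (*s == '\0') return 0;
              |       if (*s++ == 'a') goto even;
              |       goto odd;
              |   }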
        
         | amw-zero wrote:
          | An FSM is simply a structure that progresses over time,
          | where the next state depends on the previous state. By your
          | usage of "goto," I feel like you're implying that this is a
          | negative thing. But state is an inherent part of
          | computation. A Turing machine can be seen as a slightly more
          | powerful state machine, and most if not all of the code we
          | write can be easily modeled with the less powerful state
          | machine.
          | 
          | And no, functional programming does not actually help with
          | this.
        
         | davemp wrote:
         | It's more like having a switch statement inside of a loop.
         | 
         | The variable you switch on is the next state and the case
         | you're in is the current state.
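          | 
          | Schematically (a sketch; the word-counting states are just
          | for illustration):
          | 
          |   enum state { BETWEEN, IN_WORD, DONE };
          | 
          |   int count_words(const char *s)
          |   {
          |       enum state st = BETWEEN; /* current state */
          |       int words = 0;
          |       for (;;) {
          |           switch (st) {
          |           case BETWEEN:
          |               if (*s == '\0') { st = DONE; break; }
          |               if (*s != ' ') { words++; st = IN_WORD; }
          |               s++;
          |               break;
          |           case IN_WORD:
          |               if (*s == '\0') { st = DONE; break; }
          |               st = (*s == ' ') ? BETWEEN : IN_WORD;
          |               s++;
          |               break;
          |           case DONE:
          |               return words;
          |           }
          |       }
          |   }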
        
         | retrac wrote:
          | Strictly speaking, yes. But you should think of it as a
          | formalized abstraction over that: a way to reason about
          | something you can easily translate into a bunch of goto
          | statements.
          | 
          | It's just like how all if/then and function call constructs
          | compile down to gotos as well. Some languages offer more
          | abstraction over this, but you can still write such
          | constructs in assembly language if you want, by hand-holding
          | the machine through every step.
         | 
         | If you do find yourself writing assembly language (or C, for
         | that matter!) then you should do just that; implement high-
         | level constructs at the low level with a bunch of jump
         | statements. The resulting code is verbose and superficially
         | seems gnarly, but will also be far more scalable and robust.
         | 
         | But there are compilers for a reason! There are some examples
         | of automatic FSM to C compilers elsewhere in this thread. Some
         | very high-level languages have embedded domain languages for
         | FSM generation as well.
        
       | psykotic wrote:
       | Some random fun things you can do with DFAs/NFAs:
       | 
       | 1. From two machines you can build a single machine that
       | corresponds to running them in lockstep/parallel. Most language
       | classes are closed under unions but regular languages stand out
       | by being closed under intersections, and that relies on this
       | construction. You can't replicate this construction with pushdown
       | automata to show that context-free languages are closed under
       | intersection because the product automata would require two
       | stacks. But in fact the intersection of a context-free language
       | with a regular language is context-free via this construction.
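        | 
        | The construction itself is tiny (a sketch, assuming a byte
        | alphabet and table-driven DFAs):
        | 
        |   /* Run two DFAs in lockstep: the pair (s1, s2) is a
        |      single state of the product machine. */
        |   typedef struct { int s1, s2; } pair;
        | 
        |   pair product_step(const int (*d1)[256],
        |                     const int (*d2)[256],
        |                     pair p, unsigned char c)
        |   {
        |       return (pair){ d1[p.s1][c], d2[p.s2][c] };
        |   }
        | 
        |   /* For intersection, accept iff both components accept. */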
       | 
       | 2. You can run them backwards. Usually works best with NFAs since
       | reversing a DFA can seriously inflate the state space. E.g. if
       | you're trying to match a regular expression of the form '<re1>
       | string <re2>' you can search for 'string' and then match re1
       | backward and re2 forward. Running machines backwards can also be
       | used for computing capture groups after a match has been found
       | since the fastest methods for matching don't let you compute
       | capture groups on the fly. Decoupling the re1 and re2 machines
       | also means you end up with two independent machines with smaller
       | state spaces, which means smaller, cache-friendlier lookup
       | tables. See the next item for a less conventional example of what
       | you can do with small state spaces.
       | 
       | 3. While DFA execution seems inherently serial, it can actually
       | be parallelized very effectively if the state space is small
       | enough, through speculative execution. Suppose you want to lex a
       | string in parallel by chopping it in half. You don't know what
       | state the DFA will be in when it reaches the halfway point, but
       | you can process the second half for all possible states at that
       | point. Then when both halves are finished you throw away all but
       | one of the speculative answers for the second half based on the
       | final state for the first half. This is extremely fast and
       | practical with PSHUFB/TBL-like instructions if your state space
       | is small enough, at most 16 states for PSHUFB. You can mux
       | together multiple PSHUFBs to go beyond that limit but the cost
       | increases substantially. (This is a parallel prefix algorithm
       | when applied recursively. As far as I know the application to
       | parallel lexing goes back to the Hillis-Steele paper on data-
       | parallel algorithms from 1986.)
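        | 
        | Without the SIMD tricks, the same idea in scalar form (a
        | sketch):
        | 
        |   /* Second half, speculative: advance from every possible
        |      start state, then keep one result at the join. */
        |   void run_all(const int (*dfa)[256], int nstates,
        |                const unsigned char *chunk, int len,
        |                int state_of[])
        |   {
        |       for (int s = 0; s < nstates; s++) {
        |           int cur = s;
        |           for (int i = 0; i < len; i++)
        |               cur = dfa[cur][chunk[i]];
        |           state_of[s] = cur; /* end state if started in s */
        |       }
        |   }
        | 
        |   /* Join: final = state_of[state after the first half]. */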
       | 
       | 4. There's a critical optimization for the last algorithm which
       | is sometimes applicable. A log file processor might have numerous
       | internal states while processing a line, so you'd rather not
       | speculate on all of them. Instead you can chop the file in half
       | and from the second half seek forward to the next newline and
       | start there. This avoids speculation entirely if there's a unique
       | state following a newline character regardless of what state you
       | were in before the newline.
        
         | sleavey wrote:
         | What are DFAs/NFAs? I didn't see them defined in the article.
        
           | psykotic wrote:
           | Deterministic/non-deterministic finite automaton. It's the
           | standard term for a finite state machine which processes a
           | stream of characters. All the examples in his article are
           | DFAs.
        
             | sleavey wrote:
              | Thanks! I'm interested in the idea of a reversible state
              | machine, for parsing an application-specific language
              | syntax into some machine representation, then allowing
              | the user to modify it and unparse back to syntax again.
              | So far I've had to implement procedural code for both
              | directions, but I figure there must be some way for some
              | smart code to both parse and unparse using some
              | declarative parser rules^. I've had no luck finding any
              | prior art so far.
             | 
              | ^I figure this won't be super simple, since e.g.
              | whitespace is often ignored at the tokenizer step and
              | would need to be reinjected by the unparser, but that
              | sort of stuff is probably not too difficult to handle.
        
               | mikewarot wrote:
                | I'm interested in doing that as well; I've often
                | thought of being able to go from Pascal -> AST +
                | symbol table -> executable and back.
                | 
                | You would want to preserve all the whitespace to make
                | it completely reversible. The ability to modify the
                | symbol table to _directly_ rename a variable or
                | procedure would be quite powerful, and never wrong. 8)
        
               | cxr wrote:
               | Notwithstanding caveats about what "reversible" means
               | here, I'm interested in the application. I've been
               | interested in "non-destructive compilation" for a while,
               | including what you might call "language skins"--
               | derivations built from reversible, deterministic
               | transformations on the syntax graph created from an
               | arbitrary parse. With the current popularity of (one-way)
               | transpilers, I'm surprised there's not more interest and
               | work on making them two-way, for example, but that's just
               | one possible application.
        
               | sleavey wrote:
               | > I'm interested in the application
               | 
               | I'm part of a team developing a scientific modelling
               | tool. The user defines a model via syntax similar to
               | SPICE's netlists, specifying model objects and their
               | parameters and connections. The nature of the type of
               | simulation requires that the user runs the script over
               | and over while tweaking various parameters, some by hand
               | and some via optimization routines. We use the syntax as
               | a way to serialize the complete state of the model, to be
               | embedded in the generated results (the script size is
               | negligible compared to the results). It is therefore a
               | requirement that we can generate syntax from the model
               | that, once re-parsed, results in the same model again -
               | thus the need for forward and reverse parsing.
               | 
               | My current approach is to store the parsed AST as part of
               | the model, then start from this (updating anything that
               | was changed/added/removed in the meantime) when
               | generating syntax.
               | 
               | > I'm surprised there's not more interest and work on
               | making them two-way, for example, but that's just one
               | possible application.
               | 
               | I suppose it's a combination of not many users typically
               | wanting such a feature, the ease of storing a program's
               | state as a proprietary/binary blob (Python's pickle
               | module for example), and the tight coupling such a system
               | creates between syntax and objects: almost everything
               | created by the parser should map 1:1 to syntax, even if
               | modified by the user after parsing. One example of that
                | last point: you might define some sort of axis via
                | syntax, like `range(100, 200, increment=1)`. The
                | parser then creates this axis as an array of evenly
                | spaced data points. But the user may also modify this
                | in memory, e.g. adding `50` to the array, so it's no
                | longer a monotonic range. When you come to unparse,
                | you need some code to identify what this array
                | represents (is it still a range? or a sequence with no
                | order?) and generate corresponding syntax. That's
                | tricky and requires lots of code for this usually
                | niche feature.
        
               | psykotic wrote:
               | "Reversible" meant consuming the character stream in
               | reverse order while matching the same language, so it's
               | different from what you have in mind--automatically
               | generating a printer from a parser. On the surface it
               | sounds very doable if each pattern of AST node is
               | generated by a unique grammar production. Then pretty
               | printing is the task of matching against an unambiguous
               | tree grammar. It's not really different from what the
               | compiler for a language like ML or Haskell does for
               | pattern matching.
        
               | sleavey wrote:
               | Ah, now I see what you mean.
               | 
               | > On the surface it sounds very doable if each pattern of
               | AST node is generated by a unique grammar production.
               | 
               | This seems to be the key, from my work implementing this
               | in my application. Luckily for me the grammar itself can
               | be somewhat tweaked to help ensure a 1:1 mapping between
               | productions and objects, at least until we release v1,
               | but if someone is having to work with an existing grammar
               | I can see the need for all sorts of hacks.
        
               | psykotic wrote:
               | Yeah, the good news is that your tool can statically
               | detect the ambiguities and you can let users specify
               | priorities for the conflicting productions to define a
               | unique mapping. That's a standard technique in parser
               | generators. There the ambiguities are in the opposite
               | direction, but the same idea applies.
        
           | progre wrote:
           | https://en.wikipedia.org/wiki/Deterministic_finite_automaton
        
         | ufo wrote:
         | By any chance, do you know an example of a regex library or
         | tool that uses that backwards-matching trick?
        
           | psykotic wrote:
           | I learned the specific 're1 string re2' decomposition with
           | backward matching from Hyperscan but AFAIK several fast grep-
           | oriented regex engines factor out both prefixes and suffixes
           | and use whichever is longer/most rare as a substring anchor.
           | For the suffix case you need backward matching to confirm the
           | regex match after the substring match. Hyperscan actually
           | decomposes the entire state machine graph into 're1 string
           | re2' style fragments, which lets them select the fastest
           | implementation technique for each sub-machine.
           | 
           | I think Russ Cox talks about backward matching in his article
           | series on regular expressions, so it's probably used
           | somewhere in re2.
        
             | jwilk wrote:
             | Russ Cox's article in question:
             | 
             | https://swtch.com/~rsc/regexp/regexp3.html
        
             | burntsushi wrote:
             | The grep-oriented tools have it a bit easier. They can (and
             | do) extract inner literals as well. The trick is that the
             | search is line oriented and lines tend to be small. So when
             | you find a match for an inner literal, all you need to do
             | is find the bounds of the line that contains the literal
             | and run the full regex engine on just that line. It works
             | extremely well when matches are somewhat rare.
             | 
             | As for RE2, backward matching is used to find the starting
             | location of a match.
        
         | feanaro wrote:
         | Thinking about point 1 makes me wonder why no popular regex
         | engines have an intersection (logical AND) operator. It seems
         | like it could allow writing complex regexes in a much clearer
         | way.
        
           | indentit wrote:
           | Could 2 lookaheads starting at the same position not be used
           | for this purpose?
        
             | enedil wrote:
             | Lookahead is what causes a language not to be regular. In
             | fact, common regex libraries support broader classes of
             | languages than the regular ones. Perhaps that's why there's
             | no `and` operator.
        
           | burntsushi wrote:
           | Most popular regex engines use backtracking and support non-
           | regular features. They aren't based on automata, so
           | implementing AND is perhaps not as theoretically straight-
           | forward in that context.
           | 
           | There are some popular finite automata based engines though.
           | RE2 comes to mind. In theory, things like AND and complement
           | could be implemented, but in practice, they blow up the size
           | of the state machine.[1, 2] The current maintainer of RE2 did
           | build a tool that permits intersection and complement
           | though.[3]
           | 
           | [1] - https://dl.acm.org/doi/10.1145/2071368.2071372
           | 
           | [2] - https://github.com/google/re2/issues/156#issuecomment-3
           | 54068...
           | 
           | [3] - https://github.com/google/redgrep
        
             | feanaro wrote:
             | This is interesting, thank you!
        
       ___________________________________________________________________
       (page generated 2021-01-01 23:01 UTC)