[HN Gopher] Storing Data in Control Flow
___________________________________________________________________
Storing Data in Control Flow
Author : chmaynard
Score : 56 points
Date : 2023-07-11 18:56 UTC (4 hours ago)
(HTM) web link (research.swtch.com)
(TXT) w3m dump (research.swtch.com)
| kccqzy wrote:
| I have second thoughts on the author's assertion that "When it's
| possible to store state in code instead, that often leads to a
| clearer program."
|
| I have found that an explicit state machine can be more readable
| when there are many states. When there are too many states, a
| straightforward transformation to control flow often cannot
| eliminate all the "goto"s like the author successfully did. Even
| if they were eliminated, there are usually too many nested "if"
| conditions and the like to be easily readable. That's the kind of
| spaghetti code that first inspired a ban on goto. (And that's not
| even including cases when you are writing code to a spec that's a
| state machine, like the TCP state machine.)
|
| What I personally like, sometimes, is to make the state variable
| an enum and give names to be states. State 0 becomes
| kBeforeStart, state 1 becomes kInsideQuote, state 2 becomes
| kAfterBackslash. It suddenly now becomes intuitive. If you've
| written any kind of emulator or interpreter it's even more
| intuitive, because the state machine is just the DFA expanded
| from the regular expression. If I'm using a language with
| guaranteed tail calls, I can also replace each case in that
| switch by a separate named function.
|
| It's possible to write clear and readable code in either style.
| Doing so requires skill and expertise, of which I don't think the
| author lacks.
| binary132 wrote:
| This makes me think of tries. You don't actually need to store
| the value at the leaf if you simply accumulate the path traversed
| in the traversal function. Of course sometimes it's more
| efficient to store values.
| samsquire wrote:
| Thanks for this.
|
| I think high level control over control flow manipulation is
| really underrated and underutilised in programming languages.
|
| For example, the decision where to jump is extremely powerful but
| the primitives we have for it are weak. (Exceptions, algebraic
| effects, if statements)
|
| Inheritance controls control flow but doesn't actually solve the
| pattern of desired behaviour and code organisation most people
| want.
|
| One idea I am exploring is the idea that efficient control flow
| can be compiled from the inferred relationships of a rule engine.
|
| Information systems should be used to create all the
| relationships between data and associations of desired behaviour
| to data values or partial data values.
|
| Then all the if statements, switch statements and vtables can be
| inferred.
|
| In other words the types the code operates on is inferred from
| desired behaviour based on state.
| knome wrote:
| There is a game called TIS-100 where you solve problems using a
| grid of networked processors that can each store one current
| register and a second value you can swap to. It's an extremely
| constrained environment as such.
|
| There is one challenge where you receive a series of numbers,
| have to sort and then output them. The game provides a couple of
| FIFO units you can read and write from in order to facilitate
| this.
|
| I, instead, decided to do it without using them. I ended up
| storing all of the state in branching. By passing values down and
| then jumping and returning literals from branches, it gave me
| effectively a comparatively massive amount of space based on
| where the instruction pointer was sitting in any given node,
| letting me manipulate the state, storing it as branches taken,
| and then dumping out literals from the lowest levels of those
| before resetting the nodes for the next series of values.
|
| It made me cognizant of how state, assumptions and literal values
| are "stored" by which branch you choose, with no actual
| variables.
|
| This is also what happens when you run a "check" on a value
| rather than parsing it into a more specific type related to the
| check [1]. After a `if( !isa_username( somestring) ){ return ; }`
| you will often assume that `somename` is a username without
| actually encoding that into its type.
|
| Because these assumptions are not checked by the compiler, then
| the compiler can't stop you from inappropriately merging branches
| or accidentally cutting out checks later.
|
| It's powerful, common, and generally dangerous.
|
| [1] https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-
| va...
| riwsky wrote:
| Not checked by the compiler, except for cases where they're
| checked by the compiler:
| https://www.typescriptlang.org/docs/handbook/2/narrowing.htm...
|
| TS is probably the most famous gradual-type system to do this,
| but IIRC pycharm is smart enough to figure this out for its
| python linting, too
| knome wrote:
| Narrowing involves actual types. I was referring to the
| compiler not checking mere value validation.
|
| I was referencing the kind of validation where someone checks
| a bunch of attributes of a string, usually, or perhaps the
| range of an integer, but never actually does anything
| excepting in fail cases and then passes it on to other
| functions that then just expect those invariants to hold. I
| doubt typescript is remembering that my string doesn't have
| dashes in it or that it adheres to some regex, for example.
|
| Type narrowing is fantastic for making things like nil
| punning work inside a complex type system.
|
| I really like typescript's types. From where they started,
| they did an absolutely incredible job creating a robust and
| powerful type system that could manage almost any existing
| javascript code pattern.
|
| Just really amazing work by the authors.
| ramesh31 wrote:
| Think of all the confusion and hot air that could be saved if we
| all just switched to Lisp.
| 10000truths wrote:
| What's really unfortunate is that compilers _already_ perform
| this "control flow -> state machine" transformation, but almost
| none of them expose it to the user for direct manipulation -
| instead, they tightly couple it to a bunch of unrelated
| abstractions like event loop runtimes, async executors, managed
| stacks, etc. I'd kill for a compiler that can properly:
|
| * Parse a coroutine that produces and consumes values
|
| * Perform all the normal function-level optimizations to minimize
| frame space (stack slot reuse via liveness analysis of local
| variables, re-computing temporaries across yield points, etc.)
|
| * Expose that coroutine frame as a fixed-size struct that I can
| explicitly resume and query
|
| Zig is _almost_ there, but suspend /resume cannot return
| values/take arguments, which requires some unergonomic
| workarounds. Rust is making some promising progress on unified
| coroutines (https://github.com/rust-lang/rfcs/pull/2781), but the
| generator types are opaque so you can't encapsulate them in a
| Sized struct or allocate an array of them. Not to mention that
| it's still extra-unstable, and last I checked, there were issues
| with generator size optimizations (https://github.com/rust-
| lang/rust/issues/59087). C++20 coroutines are similarly opaque
| and cannot be allocated as a contiguous array.
| AlphaSite wrote:
| Are you asking for what is roughly Python generator expressions
| but in a compiled language?
| 10000truths wrote:
| Yes, something like Python generators which have yield and
| send, but typed, and which compiles down to an efficiently
| sized struct with a resume function. Here's a crude
| illustration of what I mean, using pseudo-code describing a
| TCP session flow: coroutine tcp_session() {
| var syn_packet = @consume();
| if(!check_syn_flag(syn_packet)) {
| @produce(ERROR_BAD_INPUT); return; }
| var syn_ack_packet = build_tcp_segment(...);
| @produce(syn_ack_packet); var ack_packet =
| @consume(); if(!check_ack_flag(ack_packet)) {
| @produce(ERROR_BAD_INPUT); } // ... proceed
| with tcp session }
|
| Such a coroutine would then be usable elsewhere in code like
| so: var tcp_connections = map<(ip_address,
| u16), tcp_session>::new(); // ... var
| incoming_ip_packet = read_ip_packet_from_raw_socket();
| var result = tcp_connections[get_dst(incoming_ip_packet)].res
| ume(incoming_ip_packet); if(result == ERROR_BAD_INPUT)
| { // ... }
|
| And the compiler would be able to determine upfront what the
| size of the tcp_session struct is, so there'd be no need for
| implicit boxing/heap allocations.
| [deleted]
___________________________________________________________________
(page generated 2023-07-11 23:00 UTC)