[HN Gopher] Out of the Tar Pit (2006) [pdf]
___________________________________________________________________
Out of the Tar Pit (2006) [pdf]
Author : n0w
Score : 83 points
Date : 2023-02-27 08:01 UTC (1 day ago)
(HTM) web link (curtclifton.net)
(TXT) w3m dump (curtclifton.net)
| cloogshicer wrote:
| I think the crux of this paper is section 7.2.2:
|
| > There is one final practical problem that we want to consider
| -- even though we believe it is fairly rare in most application
| domains. In section 7.1.1 we argued that immutable, derived data
| would correspond to accidental state and could be omitted
| (because the logic of the system could always be used to derive
| the data on-demand). Whilst this is true, there are occasionally
| situations where the ideal world approach (of having no
| accidental state, and using on-demand derivation) does not give
| rise to the most natural modelling of the problem. One possible
| situation of this kind is for _derived data which is dependent
| upon both a whole series of user inputs over time, and its own
| previous values_. In such cases it can be advantageous to
| maintain the accidental state even in the ideal world. An example
| of this would be the derived data representing the position state
| of a computer-controlled opponent in an interactive game -- it is
| at all times derivable by a function of both all prior user
| movements and the initial starting positions, but this is not the
| way it is most naturally expressed.
|
| Emphasis is mine.
|
| I think that this type of derived data, which I put in italics
| above, is quite common - contrary to what the authors of the
| paper argue. Any UI code or game-like system, like simulations,
| will have this kind of data. And the paper does not have a good
| answer for it. I honestly think that nobody has an answer for it,
| and it's why most of our UIs suck.
|
| I would love to see something that makes handling this type of
| derived data easy.
| feoren wrote:
| My impression from reading this has always been "Almost there!
| Keep going." If you keep pulling on these threads, I claim that
| you reach some inevitable conclusions. These ideas are a superset
| of all the other advice they give (and most other advice you
| hear, too):
|
| 1. Separate identity from data. These are _completely different
| things_. The number 7 is not mutable, even though my age is. Keep
| going! The phrase "It was a dark and stormy night" is not
| mutable, even though the opening line of your book is. Keep
| going! The statement "there are seventeen cars parked on level 3
| and four spots available" is _not mutable_, even though the
| status of the parking garage is. Identity is immutable, and
| statements (data) are also immutable. Only the assignment of a
| statement to an identity is mutable.
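|
| A minimal Python sketch of that separation (the names, like
| `GarageStatus`, are my own illustration, not from the thread):

```python
# Values (statements) are immutable; only the binding from an
# identity to its current statement ever changes.
from dataclasses import dataclass

@dataclass(frozen=True)          # the statement itself cannot mutate
class GarageStatus:
    parked: int
    available: int

bindings = {}                    # identity -> current immutable statement

def assign(identity, statement):
    """The only mutation allowed: re-point an identity at a new statement."""
    bindings[identity] = statement

assign("level-3", GarageStatus(parked=17, available=4))
old = bindings["level-3"]
assign("level-3", GarageStatus(parked=18, available=3))
# `old` is untouched; only the assignment changed.
```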
|
| 2. Think about the operations that make sense on your data. Sure,
| you have "map", "filter", "reduce" that are mathematically pure.
| What about "offset" or "rotate"? Those have identity, they are
| associative, individually commutative (but not with each other!).
| Is there a distributive property that applies there? Okay, what
| about "buy" and "sell" operating on commodities? Are those
| mathematical operations? Do they have an identity element? Are
| they associative and commutative? "Buy 2000 lbs of chicken" -- is
| that equivalent to "Buy 1 ton of chicken"? What are its domain
| and range? Not "chicken", nor even "warehouses" -- you're not
| teleporting the chicken as soon as you commit that record. More
| like "contract", or even "contract request". What can you do with
| contracts? Is there an "identity contract"? "Buy 0 chicken"? Zero
| is a useful number! Is "Buy 0 chicken" the same as "Buy 0 beef"?
| Explore these questions. Find the math around your domain.
| Functional purity is all good, but it's wasted if your domain is
| just "Chicken : Meat : Commodity", "Beef : Meat : Commodity",
| "Pine : Lumber : Commodity". Wrong. Don't think about nouns.
| Think about sensible pure operations in your domain. Successive
| abstraction layers should be _restricting_ what's possible to do
| with your data, by defining higher and higher-level operations.
| If your abstraction layers aren't restricting what's possible to
| do, they're an inner platform and you don't need them.
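|
| The "Buy 0 chicken" identity idea can be sketched as a tiny
| monoid over contract requests; `Buy` and `combine` are
| hypothetical names of mine, not from any real trading system:

```python
# A "buy" request in a normalised unit, so "1 ton" == "2000 lbs".
from dataclasses import dataclass

@dataclass(frozen=True)
class Buy:
    commodity: str
    pounds: int

def combine(a, b):
    """Associative, commutative combination of requests for the same
    commodity; 'Buy 0' acts as the identity element."""
    assert a.commodity == b.commodity
    return Buy(a.commodity, a.pounds + b.pounds)

zero = Buy("chicken", 0)                     # identity: changes nothing
order = combine(Buy("chicken", 2000), zero)  # still just 2000 lbs
```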
|
| 3. Don't make big decisions upfront. That's how you add
| accidental complexity. Don't make any decisions you don't need
| to, and prefer solutions that allow you to defer or lighten
| decisions. If you follow this thread, that means you're using a
| relational model. They absolutely got that right (section 8).
| Otherwise you're making decisions about ownership that you don't
| need to be making. Domain Driven Design has it _wrong_ here with
| aggregate roots. What's an aggregate? Which concept goes inside
| of which aggregate? You do not need to make that decision. The
| authors of TFA get it right, and then wrong again, when they try
| to apply a relational model to shitty noun-based domain ideas
| like "Give Employee objects a reference to their Department, vs.
| Give Department objects a set (or array) of references to their
| Employees vs. Both of the above". No. They're not following their
| own advice. Can an employee exist without the concept of
| "department"? Yes. Can a department exist without the concept of
| employees? Probably. Therefore you _must_ encode the relationship
| separately from both. Your hands are tied. If concept A can exist
| independently of concept B, you _must not_ draw an arrow from A
| to B. The answer is not "both", it's "neither". An employee's
| assignment within a department is its own independent concept,
| that knows about "employee" and "department" both. And now it's
| obvious you can give this concept its own tenure and add a new
| record whenever they change departments. You've found append-only
| event sourcing without needing to start there -- it came from
| first principles (in this case).
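|
| A hypothetical minimal encoding of that point: the assignment is
| its own record that knows about both employee and department,
| with its own tenure, appended rather than overwritten (all names
| here are illustrative):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Assignment:
    employee_id: str
    department_id: str
    start: int                     # day number; units are illustrative
    end: Optional[int] = None      # None = still current

assignments = []                   # the independent relationship concept

def assign(emp, dept, day):
    """Close any open assignment for emp, then append the new one."""
    for i, a in enumerate(assignments):
        if a.employee_id == emp and a.end is None:
            assignments[i] = Assignment(a.employee_id, a.department_id,
                                        a.start, day)
    assignments.append(Assignment(emp, dept, day))

assign("e1", "sales", 0)
assign("e1", "support", 100)       # department change: history preserved
```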
|
| 4. Operate on the broadest scope you can. Manipulate sets of
| things, not individual things. This is half of functional
| programming's usefulness right here. This is supported by using a
| relational model. Operate on sequences of things, not individual
| things: there's reactive programming. How else can you broaden
| the scope of what you're operating on?
|
| 5. Don't just do something, stand there! That is: don't _do_,
| _plan_. Instead of immediately looping, just use "map". Instead
| of immediately hitting the database, write a query plan. This
| doesn't mean offload your thinking to GraphQL -- that's just
| kicking the can down the road. See point #2. This is the other
| half of functional programming's usefulness. Do only as much as
| you need to make (or append to) a plan, and then stop. Someone
| else will execute it later -- maybe! You don't care.
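|
| A toy sketch of "plan, don't do" (my own construction): build a
| description of the work, and let a separate, optional executor
| run it later:

```python
plan = []                          # a plan is just data

def map_step(fn):
    plan.append(("map", fn))       # record the intent; run nothing

def filter_step(pred):
    plan.append(("filter", pred))

map_step(lambda x: x * 2)
filter_step(lambda x: x > 4)

def execute(plan, data):
    """Someone else may run the plan -- maybe never."""
    for kind, f in plan:
        if kind == "map":
            data = [f(x) for x in data]
        else:
            data = [x for x in data if f(x)]
    return data

result = execute(plan, [1, 2, 3])  # nothing ran until this point; -> [6]
```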
|
| There's probably more, but there are more fundamental ideas than
| "use functional programming" or "use event sourcing" or "use
| SOLID" -- first principles that actually lead to almost all the
| other good advice you get. This paper kinda almost gets there a
| couple times, then walks back again. Keep going. Keep pulling on
| those threads. I suspect that the more you do this, the more your
| code will change for the better. But you have to be willing to
| keep going -- don't half-ass it. If you only go part way and then
| stop, the benefits won't be apparent yet.
| dagss wrote:
| I feel event sourcing is a real world pragmatic approach to
| declarative programming that this paper advocates.
|
| For state changes you add events to the database to describe
| something that happened. Any question you may need an answer for
| / business decision you want to make can be answered by querying
| the events.
|
| The problem at the moment is that while event sourcing is
| excellent at reducing accidental complexity surrounding
| implementing business rules, there is little standard / commonly
| used tooling around it and you end up with lots of accidental
| complexity on that end.
|
| An example would be a database not designed to be a CRUD store
| but to store events and manage read models, and manage
| computation of projections etc -- while being suitable for OLTP
| workloads. At a minimum, very strong support for using any kind
| of construct in materialized views (since in a sense the entire
| business logic is written as a "materialized view" when doing
| event sourcing).
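|
| In miniature, the idea looks like this (hypothetical schema,
| Python standing in for the database):

```python
# State changes are stored as events; a read model is derived by
# querying them -- effectively a hand-rolled materialized view.
events = [
    {"type": "deposited", "account": "a1", "amount": 100},
    {"type": "withdrew",  "account": "a1", "amount": 30},
]

def balance(account):
    """Answer a business question by folding over the events,
    not by reading a mutable CRUD row."""
    total = 0
    for e in events:
        if e["account"] == account:
            total += e["amount"] if e["type"] == "deposited" else -e["amount"]
    return total
```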
| feoren wrote:
| Event Sourcing is a nice card to have in your hand, but it
| should not be a goal in itself. A yard is 3 feet. The atomic
| mass of Hydrogen is 1.008 and its symbol is "H". These will
| _never_ change. If someone came to me and said "your
| 'Chemical' table is not event-sourced. We're doing event-
| sourcing here. You need to change it." I would tell them to get
| lost. Why the hell would you have an "An Element's
| Mass was Modified" event stored in your database? Unless you're
| developing for CERN or NREL or something, just don't.
|
| On the other hand, having a bank account table with a single
| field for someone's money is clearly not enough. You absolutely
| should be tracking every transaction that has changed that
| account's value. Do you need to track every other possible
| change to that account? Like, whether they want paper or
| electronic mail? No, probably not.
|
| "Event sourcing" is a way to refactor a domain model -- take a
| statement and break it into a sequence of statements that "add
| up" to the original statement.
|
| "Add up" is key here. When you break "AccountBalance" into
| "Transaction", it's clear how to "add up" transactions to
| recreate the original account balance. But that's not your
| goal, necessarily! The reason why this tends to make better
| domain models is exactly because you have to think about
| "adding up" your domain models, and what that means. "Adding
| up" is an associative, probably-commutative, binary operation
| with identity. Note that that means your domain MUST have a
| "zero transaction". ALL of your events that you event source
| need a "zero event". If you cannot come up with the "zero" of
| an event, then you should not be breaking into events! The
| whole point is to be able to define the monoid over it, which
| requires identity.
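|
| The monoid being described can be made explicit; a minimal
| sketch (names mine):

```python
# Transactions "add up": an associative combine with a zero element.
from dataclasses import dataclass
from functools import reduce

@dataclass(frozen=True)
class Transaction:
    delta: int

ZERO = Transaction(0)              # the "zero event" the comment insists on

def combine(a, b):
    return Transaction(a.delta + b.delta)

def replay(txns):
    """Folding with the monoid recreates the account balance."""
    return reduce(combine, txns, ZERO).delta
```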
|
| So instead of taking event sourcing as an end goal, make your
| goal this: think about the operations that make sense on your
| domain, like accumulating accounts. What else can you
| accumulate? Can you add Employees together? Not really -- you
| can _group_ them into departments and events and meetings. Is
| that grouping associative and commutative? Sure -- it's just
| Set Union. Is there anything about Employees you can add
| together? Well, their salaries. In fact for data analysis,
| employee salaries are an important cost that you probably want
| to throw in a data cube. Define a Monoid over Employee
| salaries.
|
| What other operations make sense on your data? Close, open,
| start, end, group, buy, sell, move, rotate, add, multiply,
| concatenate, join, reverse, inverse, combine, merge, undo,
| redo, fill, empty, saturate, fix, break, import, export,
| report, validate. Are they associative, commutative,
| distributive, invertible? Do they have identity? Event sourcing
| is such a tiny part of exploring this world. And it's worth
| exploring.
| dagss wrote:
| I'm not arguing that event sourcing should be done for its
| own sake, so I don't really want to disagree with you; but
| that said your post doesn't perfectly resonate with me
| either.
|
| When you write a typical backend system, the desired function
| of the system is to interact with the external world. Without
| I/O the system may as well not exist.
|
| Input is a desire from someone that something be done or
| recording that something happened. Such input changes the
| data recorded, or appends what the system should know / has
| seen. This is an "event".
|
| All input can be framed as being an event. But "an element's
| mass was modified" is not an event ... it doesn't describe
| someone or something giving input to our system.
|
| The algebraic view on things you take seems to be treating
| the system at a different level than what I think about as
| event sourcing.
|
| Neither "an element's mass was modified" nor "sell" nor
| "transaction" that you mention is a realistic event. An event
| is "User U clicked a button in the web console to Sell share
| S at time T". Implementing the effects of that event --
| computing a specific read model resulting from appending the
| event to the stored set of events -- may well be best done by
| some algebra like you suggest, but that seems like another
| topic.
|
| You seem to talk about models for computing and transforming
| state. I talk about I/O vs data storage.
| feoren wrote:
| > All input can be framed as being an event.
|
| Sure, it _could_ be, but is it _useful_ to do that? If I
| stand up and shout "The price of a Banana is $4 per
| bushel!", you could record my voice and upload it as a raw
| wave file. That's the rawest "input event" you can come up
| with. Or you could write down "some random dude said that
| bananas cost $4 around 4:30 pm and I'm not sure whether I
| believe him or not". That's not the "raw input", it's been
| transcribed and modified and annotated. Yet it's almost
| certainly more useful to your system, and it's kinda like
| event sourcing. Kinda.
|
| The problem with worrying about whether something is
| "input" or "output" or "internal" is that you can just move
| the dotted line anywhere around your system to change
| those. If you break a monolith into independent reusable
| building blocks, those building blocks are going to have a
| completely different idea of what counts as input and
| output. But who cares? You're not changing any fundamental
| truth about how the domain works. Your domain model should
| really be independent of worrying about what's "input" and
| "output". Those lines move all the time. Instead think
| about what operations make sense to do with your data, and
| then think about the mathematical properties of those
| operations.
|
| > But "an element's mass was modified" is not an event ...
| it doesn't describe someone or something giving input to
| our system.
|
| Sure it does. Someone gave you the input that a particular
| element has a particular mass. How is that not input? How
| else did you get that data?
|
| > The algebraic view on things you take seems to be
| treating the system at a different level than what I think
| about as event sourcing.
|
| This is my exact point. You _should_ think about event
| sourcing this way, because that's the only reason it's
| useful: it's accidentally a source of important "domain
| algebra" that you otherwise might miss. But there's _lots_
| of other important "domain algebra" that you are still
| missing, and they don't necessarily look like event
| sourcing.
|
| > An event is "User U clicked a button in the web console
| to Sell share S at time T".
|
| But surely that's not what you're _storing_ in your system!
| That would be an extreme coupling between the concept of
| "selling shares" and "clicking a button". Those are
| _completely unrelated_ ideas! Why would you want to tightly
| couple them!? If _that's_ what you think event sourcing
| is, sorry to be blunt, but you have very badly
| misunderstood it.
| dagss wrote:
| Fair points. But how do I get from a "domain algebra" to
| practical implementation with a popular database?
|
| Event sourcing can be translated to adding rows to
| tables, or adding documents to collections. Focusing so
| much on "append" isn't only because of what kind of
| events you would model, but because you store data in
| databases by, well, storing it... appending it...
|
| If event sourcing is only useful as a source of an
| important "domain algebra": How does a domain algebra
| translate to practical use of a database system for your
| application? How can I focus on the "operations I want to
| do on my data", when the tools I am given are pretty much
| INSERT/UPDATE/SELECT, or GET/PUT, or some variation along
| these lines?
| feoren wrote:
| Bear with me, we're going on a bit of a walk.
|
| An abstraction layer says: here are some operations you
| can do, operating on some models. Presumably those models
| are a useful or convenient perspective on some underlying
| data, and those operations are important or useful as
| well. At this level, we can define something like "order
| a pizza", "renew my driver's license", or "show me a book
| I might like to read". Note the abstraction is driven by
| the needs of the users/consumers/callers of the
| abstraction, _not_ its implementations. This sounds
| obvious, but it's extremely often done the wrong way
| around -- "Foo : IFoo" shouldn't be muscle memory, and it
| should mean "I can't currently think of any other
| implementation of IFoo right now, although that might
| change", and NOT mean "this interface is the header of
| this class, which we have to do or we get yelled at".
|
| The implementation of an abstraction layer breaks down an
| operation in one domain into operations in another
| (presumably "lower level") domain. It's possible that
| "renew my driver's license" can be reasonably written
| directly in a few SQL statements. Okay, that's no big
| deal. Most likely you have a few intermediate layers, or
| an ORM, or whatever. So the implementation of the "DMV
| API" (or whatever is defining that operation) is where
| you break it down into INSERT/UPDATE/SELECT or whatever
| other lower-level tools you have.
|
| None of this changes when you start thinking about your
| higher-level operations algebraically. All that really
| means is that you consider the operations and their
| inputs/outputs, and you start asking questions like: are
| these idempotent? Associative? Commutative? Do they have
| identity? You can ask those questions at the abstraction
| level. They are part of the definition of the operation.
| It's up to the implementation to make sure those
| properties are respected.
|
| Look for ways to change the operations you offer to be as
| "mathematical" as you can. The more mathematical they
| are, the more you'll serendipitously come up with new
| and interesting and useful ways to use them; the more
| reusable they'll be.
|
| "Renew my driver's license" is idempotent; "order a
| pizza" is not. Prefer idempotency. Can we make "order a
| pizza" idempotent? Sure. The client generates (or
| requests) a unique ID for the order. We have "update an
| order", which may actually be a family of operations. We
| have "finalize an order" which places it. Finalizing the
| same order twice does nothing -- it's idempotent.
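|
| A toy version of that decomposition (hypothetical API of mine,
| not a real pizza service):

```python
# "Order a pizza" made idempotent: client-supplied IDs, repeatable
# updates, and a finalize step that is safe to retry.
orders = {}     # order_id -> {"items": [...], "final": bool}

def update_order(order_id, items):
    o = orders.setdefault(order_id, {"items": [], "final": False})
    if not o["final"]:
        o["items"] = items

def finalize(order_id):
    """Finalizing twice does nothing extra -- idempotent."""
    orders[order_id]["final"] = True

update_order("o1", ["margherita"])
finalize("o1")
finalize("o1")                     # a retry changes nothing
```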
|
| User story: a bunch of guys are sitting around and want
| to order from your pizza place, but it's always annoying
| to pass the phone around; everyone wants to look at the
| menu at once. We want "distributed ordering", so a party
| can all contribute what they want to the same order. They
| want to see what everyone else is doing in real-time. I
| want breadsticks, but there's only 5 of us. If anyone has
| ordered breadsticks, we're all good. If we have an "add
| breadsticks" operation and poor concurrency, we end up
| with 5 breadsticks in our cart; someone has to notice
| that and fix it. Not great.
|
| So what's a better operation than "add breadsticks"? How
| about "make sure there are enough breadsticks"? If three
| people all say "make sure there are enough breadsticks!"
| that's basically the same as one person saying it. That's
| an important domain operation. If all you're doing is
| thinking "event sourcing", then "add breadsticks" looks
| _more_ like a domain event than "make sure there are
| enough breadsticks", but the latter is actually easier to
| work with and leads to a better experience. You can't
| make the jump from "add breadsticks" to "make sure there
| are enough breadsticks" just by thinking about event
| sourcing -- you get there by thinking about math.
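|
| The breadsticks contrast in a few lines (my own sketch):

```python
# "ensure" is idempotent where "add" is not: three friends issuing
# it concurrently is the same as one person issuing it once.
cart = {}   # item -> quantity

def add(item, n):
    cart[item] = cart.get(item, 0) + n          # NOT idempotent

def ensure_at_least(item, n):
    cart[item] = max(cart.get(item, 0), n)      # idempotent

for _ in range(3):                              # three friends, same wish
    ensure_at_least("breadsticks", 1)
```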
|
| What happens when someone removes a pizza from the cart
| at the same time as someone else is adding pepperoni to
| it? This is the "Google Docs" problem. We need to think
| about associativity and commutativity to really solve
| these problems, not to mention random vs. sequential
| identity. Can you undo these operations? If I say "extra
| sauce", and someone else says "no, light sauce", we might
| have a conflict to resolve. But if I then undo my "set
| extra sauce" action -- what happens? It should clearly
| result in "light sauce". That's the obviously correct
| answer, so we can work back from that to figure out how
| the "set sauce amount" operation should work. "User A
| requests heavy sauce on pizza AF73" is idempotent and
| reversible. "User B requests light sauce on pizza AF73".
| "User A revokes their request for heavy sauce on pizza
| AF73". Okay, great, we're golden. "User C removes pizza
| AF73 from their order." "User D requests pepperoni on
| pizza AF73". "User C undoes their operation to remove
| pizza AF73 from their order." Great: pizza AF73 has
| pepperoni on it, despite the fact that it had been
| removed when pepperoni was added. No problem here.
|
| How do you handle statements like "User A revokes their
| request for heavy sauce on pizza AF73?" Event sourcing
| would say that this is a distinct event that you have to
| INSERT in your append-only log. But why not just DELETE
| the initial request? That works just fine. In any case,
| the low-level steps are the easy part.
|
| We're not too far from event sourcing here, but that
| emerged from analyzing the problem and trying to make it
| as "mathematically pure" as we could. We didn't start
| with event sourcing, and we saw another example (add
| breadsticks) where the "event sourced" version was worse.
| Event sourcing is a trick, but not the goal. The goal is
| "domain algebra".
|
| On the back end, you'll be doing reporting on these
| numbers. You want a really flexible reporting system?
| There was a buzzword a few years ago, "data cube". It was
| a buzzword because nobody could define what it meant, but
| after thinking about it for a while, I decided it could
| have a useful definition: A data cube is a pairing of a
| set of categorical data (pizza sizes, customer types,
| order times, whatever) and value data (costs, amounts,
| prep times, ratings). Each "value data" must have a
| monoid defined over it, which means some binary "add" (or
| "combine") method with identity. Any time you have a
| monoid defined, you get a (distributed!) "aggregate"
| method for free. That's what "roll up" and "drill down"
| are, in report-speak: projecting your categorical data
| and aggregating all your value data using the defined
| monoid. The same kind of analysis applies here, even
| though event sourcing has nothing to do with any of this.
|
| By the way: the identities of "combine" over rating and
| prep time are _not_ simply 0! Think about it harder than
| that. How do you combine user ratings? Most likely your
| value type will need to be able to sensibly represent
| "0/0" -- that's a valid and useful value sometimes!
|
| We didn't get particularly "mathy" operations with our
| pizza example, partly because it's a very "end user"
| domain and not a reusable mid-level domain like unit
| conversions or a physics engine. It's even more important
| for those domains. What's a Point minus another Point?
| Not a Point! Are "offset" and "rotate" associative?
| Commutative? Distributive? Think about these things if
| you want a reusable physics engine.
|
| This is not scripting. This is not even programming. It's
| software engineering.
| Verdex wrote:
| In the past, I have been unimpressed by this paper. Perhaps
| someone can shed some historical context...
|
| But from my perspective what happens is that the paper defines
| complexity in exactly the way that allows them to deride OO, FP,
| etc programming whilst simultaneously showing how awesome
| functional relational programming is. It ignores complexity
| that's orthogonal to what FRP addresses and ignores areas in
| which FRP itself contributes to unnecessary complexity.
|
| It feels like a scenario where the authors had something that
| they thought was neat and went out to create metrics that would
| in fact show that it was neat. Maybe FRP is really neat, but I
| feel that the paper itself doesn't contribute to anything because
| its logic is so custom and purpose built.
| mhsdef wrote:
| I think the first half of the paper ("the problem") is much
| stronger than the latter half ("the solution").
| bob1029 wrote:
| This paper was career changing for me.
|
| Chapter 9 is effectively the original concept for our business
| rules engine today. We use SQLite and C# UDFs as the actual
| foundation.
|
| Using a proper relational model and SQL queries for all the
| things means that domain experts can directly contribute. It also
| makes it feasible to turn your domain experts into internal
| customers of your developers. For B2B products, this can be make
| or break.
|
| Building an actual business around this kind of thing is very
| hazardous in my experience. Unless you are _completely_ certain
| that you have the schema figured out, this promising foundation
| converts to quicksand.
|
| Using higher normal forms is one way to reduce the consequence of
| screwing up domain modeling, but you still really have to know
| the relational guts of the business one way or another.
|
| One design-time trick is to get all stakeholders to compose their
| ideal schema in something like excel and have them fill in some
| sample data. Showing them an example of one of these for a
| different domain is a powerful lesson in my experience.
| ZitchDog wrote:
| A classic paper with a dream that has yet to be realized. We
| continue to bolt state on top of state. Redux bolted on top of
| GraphQL on top of Redis on top of Postgres. We can do better.
| lucisferre wrote:
| It continues to be such a sad state of affairs. We are
| constantly reinventing and repackaging well known bad
| practices.
|
| I recall getting publicly lambasted on here for daring to
| question the wisdom of an ActiveRecord-like data layer for a
| front-end SPA framework.
| layer8 wrote:
| If only we could change that state of affairs!
|
| ;)
| ZitchDog wrote:
| Oh right, I forgot the ORM with "distributed caching" needed
| between the database and GraphQL layers. More hidden state.
| dcre wrote:
| This paper was very influential on me when I first started
| programming professionally around 2012. I don't plan on reading
| it again, but my vague memory of what I got out of it is pretty
| simple and I think has become pretty standard practice at this
| point: avoid mutable state and use pure functions where possible.
|
| The framing of accidental and essential complexity is of course
| very useful and not really unique to this paper. The difficulty
| is there is nothing but experience-informed judgment that can
| tell you which complexity is accidental and which is essential.
| There is always a framing of the problem that can justify some
| bit of complexity as essential. You have to judge.
| feoren wrote:
| Plenty of state -- even mutable state -- is essential
| complexity. Think of an IDE where a user is typing out code:
| the state is changing all the time. Pure functional programming
| has a hard time walking the line between "avoiding" mutable
| state, and "ignoring" mutable state. If you insist on
| functional purity, you're already admitting defeat in the face
| of essential mutable state. The walls of functional (and
| especially reactive) programming are too high -- they currently
| don't play very well with their Turing and von Neumann
| neighbors.
|
| Functional programming is only "mathematically pure" because
| _we already have math for it_. You can make mutating state
| "mathematically pure" too, if you just invent new math for it.
| For example, you can pair state with a "generation" variable
| and define mutating functions as returning the new state with
| generation + 1; now all your functions are pure again. That's
| not all it takes, and it's not easy! But it shows how narrow-
| minded the current idea of "purity" is.
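|
| The generation idea in miniature (my own sketch of the comment's
| suggestion):

```python
# State transitions become pure functions that return a new
# (generation, value) pair instead of mutating in place.
def set_value(state, new_value):
    generation, _ = state
    return (generation + 1, new_value)   # pure: old state is untouched

s0 = (0, "draft one")
s1 = set_value(s0, "draft two")
s2 = set_value(s1, "draft three")
# s0 and s1 still exist unchanged; s2 is generation 2.
```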
| cubefox wrote:
| I always thought the problem with a "purely" functional view
| is much more practical: From my limited Haskell experience, I
| gather that you have to use recursion instead of loops, since
| loops (and GOTOs) work by iteratively modifying some state,
| unlike recursion. But humans think in loops. If you look in a
| cookbook for a recipe, it will almost certainly contain loops
| and rarely any recursions.
|
| Recursive programs are provably Turing equivalent to WHILE or
| GOTO programs, but that doesn't mean they are equally natural
| for our brain to think about. (Not to mention things like
| tail recursion, which is even less intuitive.)
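|
| The contrast being drawn, side by side (trivial example of mine):

```python
# The same "repeat until done" computation, once as the loop most
# people reach for, once as the recursion a pure style demands.
def total_loop(xs):
    acc = 0
    for x in xs:          # iteratively modifying state
        acc += x
    return acc

def total_rec(xs):
    if not xs:            # base case replaces the loop condition
        return 0
    return xs[0] + total_rec(xs[1:])
```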
| dannyobrien wrote:
| I definitely don't think in terms of recursion (and I don't
| think I've seen it explicitly in a recipe!), but I'm not
| sure loops and interation are necessarily that intuitive,
| either. I learned programming at an early age and I still
| distinctly remember the period when loop constructs "made
| sense".
|
| I can believe that recursion is even less easy to teach
| than iteration, but it may be that this is a pedagogical
| issue, or that we have yet to work out the best way to
| convey or document computation.
| cubefox wrote:
| Many recipes contain WHILE loops. E.g. while the pan is
| not full, layer meat sauce, lasagna noodles, and bechamel
| sauce. FOR loops (do the x process 10 times, or once for
| everything in a list) are also perfectly natural.
| Examples of recursion are not as easy to find. This
| points to recursion itself being harder for us than
| loops. Understanding loops in recipes doesn't require
| pedagogical attention.
| mpweiher wrote:
| > Plenty of state -- even mutable state -- is essential
| complexity.
|
| Arguably most of it!
|
| After all, computers don't compute.
|
| https://www.youtube.com/watch?v=EKWGGDXe5MA&t=297s
|
| _One of the miseries of life is that everyone names
| everything a little bit wrong, and so it makes everything a
| little harder to understand in the world than it would be if
| it were named differently. A computer does not primarily
| compute in the sense of doing arithmetic. Strange. Although
| they call them computers, that's not what they primarily do.
| They primarily are filing systems. People in the computer
| business say they're not really computers, they are "data
| handlers". All right. That's nice. Data handlers would have
| been a better name because it gives a better idea of the idea
| of a filing system._
| dagss wrote:
| Related: The Norwegian word for "computer" is "datamaskin".
| Or, "data machine".
| mrkeen wrote:
| > Think of an IDE where a user is typing out code: the state
| is changing all the time.
|
| That's easy mode. Mutability is often justifiable when one
| person is doing one thing to a computer at a time. Now
| extrapolate to multiple people editing the same document at
| once. Suddenly you're discussing CRDTs and other approaches.
| And now implement undo on top of that!
|
| Likewise with git, or blockchain, or kafka. They're
| persistent logs so you can keep your head straight while
| figuring out what the current state(s) of the system can be.
| Even with git, when you do an in-place mutation (force-push)
| there's _still_ an extra log (reflog) behind the scenes
| trying to keep the changes sane.
| feoren wrote:
| I think this is a good example of why "simple CRUD systems"
| are actually much more complex than people usually give
| them credit for. Anything seems easy if you do it badly.
| But multiple people editing a document at once with CRDTs
| and undo is still even easier than a basic CRUD application
| done properly: now you have multiple people editing
| multiple different data types at once! In such cases we
| should be thinking about building CRDTs over all the
| various operations users may be doing with the data, which
| means thinking about associativity and commutativity of
| generic data operations. Git only has to deal with merging
| text files line by line; CRUD applications have to deal
| with merging lots of different data types!
| kccqzy wrote:
| There is no essential mutable state in computer science. All
| essential mutable state can be modeled away to use no mutable
| state, as you have shown with the generation-number idea,
| which is one valid way. (I'm strictly talking about computer
| science problems, not computer engineering problems such as
| drivers.)
|
| The generation number idea you have shown is an excellent
| idea. It immediately enables several new capabilities
| compared with just mutation:
|
| * You are now able to determine which data is older and which
| is newer without resorting to an external clock.
|
| * If your mutation takes a long time, there is no longer a
| need to use a long-held mutex to ensure thread safety which
| would prevent concurrent reads; you can create a new
| generation separately and then atomically CAS it. Writers
| don't block readers.
|
| * If you need undo/redo functionality, it's suddenly
| trivial. Or diff'ing between versions. Or understanding the
| reason for changes (basics of auditing).
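The "writers don't block readers" point can be sketched in a few lines. A hypothetical Python version (names invented; in CPython a plain attribute read is atomic, so only the publish step needs coordination, standing in for a hardware CAS):

```python
import threading

class Versioned:
    """Immutable snapshots tagged with a generation number.

    Readers grab the current (generation, snapshot) pair without
    blocking; writers build a new snapshot off to the side and
    publish it only if the generation they started from is still
    current (a CAS-style check).
    """
    def __init__(self, data):
        self._current = (0, data)       # (generation, snapshot)
        self._lock = threading.Lock()   # guards the publish step only

    def read(self):
        return self._current            # atomic reference read

    def try_update(self, expected_gen, new_data):
        with self._lock:
            gen, _ = self._current
            if gen != expected_gen:
                return False            # lost the race; caller retries
            self._current = (gen + 1, new_data)
            return True

store = Versioned({"x": 1})
gen, snap = store.read()
store.try_update(gen, {**snap, "x": 2})   # publishes generation 1
```

A slow writer can take as long as it likes building `new_data`; readers never wait, and a stale writer simply fails the generation check and retries.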
| kilgnad wrote:
| Lambda calculus machines can't exist in reality though.
| Only the von Neumann machine can be actualized.
|
| Thus from a practical perspective the foundations of actual
| computing must be built on mutable state.
|
| You are wrong about the generations idea. It's a side
| effect. Functions themselves cannot actually do anything
| with a generation number. It's completely useless other
| than for letting you know how many times a value has
| changed.
|
| A generation number also doesn't negate the need for
| mutexes. The system is still operating on shared state, a
| generation number doesn't change this.
| kccqzy wrote:
| You are not getting the point of either my comment or the
| original article. You are thinking from an engineering
| point of view, where if you look through all the layers
| of abstraction you see mutexes and shared mutable memory.
| That's not my point; my point is about building up those
| abstractions so they are sufficiently hidden. Functional
| programmers routinely operate on a higher level of
| abstraction, and they have shorter, more maintainable
| code. In such code, it would be a code smell if you
| continue to use mutexes, because that should've been an
| implementation detail. It should've been clear from the
| outset that the kind of complexity the article is talking
| about is cognitive complexity.
| layer8 wrote:
| > All essential mutable state can be modeled away to use no
| mutable state
|
| However, that modeling introduces complexity of its own
| that not everyone agrees is entirely essential.
| enlyth wrote:
| > There is no essential mutable state in computer science.
|
| Yes, theoretically. Now imagine your mutable state is 2 GB
| in size; have fun creating a copy of it on every change.
| akimball wrote:
| The optimizer removes that.
| kccqzy wrote:
| You are not getting the point of my comment or the
| original article. It's clear to me that the kind of
| complexity being talked about in the article is cognitive
| complexity not computational complexity. It's not about
| choosing between an O(n) copying of data and O(1) in-
| place mutation; it's about choosing code that's easy to
| comprehend and maintain.
| deathanatos wrote:
| Their point is that you can't choose the cognitively
| simpler choice because of real world design constraints,
| such as not consuming all available RAM -- i.e., because
| of essential complexity.
| pletnes wrote:
| Btrfs does stuff like instantaneous snapshots for a multi
| terabyte size filesystem. I have no idea how they do it,
| but apparently it's 100% possible.
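The trick behind cheap snapshots is structural sharing: a "copy" rebuilds only the path to the changed node and shares everything else. A minimal path-copying sketch (illustrative only, not how Btrfs is implemented):

```python
def set_in(t, path, value):
    """Return a new nested tuple with the element at `path` replaced.

    Only the nodes along the path are rebuilt; all untouched
    siblings are shared with the original, so the cost is
    O(depth), not O(size of the whole structure).
    """
    if not path:
        return value
    i = path[0]
    return t[:i] + (set_in(t[i], path[1:], value),) + t[i + 1:]

snapshot = (("a", "b"), ("c", "d"))
updated = set_in(snapshot, (1, 0), "C")
```

After the update, `updated[0]` is the very same object as `snapshot[0]`: the left branch was never copied, which is why persistent data structures make "copying 2 GB" a non-problem in practice.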
| feoren wrote:
| Engineering is about tradeoffs. There is no One Model To
| Rule Them All. Your post is a great thing to keep in mind,
| but it's not a prescription. Engineers need to be able to
| look at a problem from many points of views at once, and
| try to find the right path for their current problem. This
| is why it's so important that models play nicely with one
| another, something that functional programming is getting
| better at, but reactive programming still really struggles
| with.
|
| All of your asterisked points are well-taken, but: do you
| _need_ that capability? Sometimes; sometimes not.
| ergeysay wrote:
| Not to mention that pure FP completely handwaves away
| implicit global state such as heap.
| kilgnad wrote:
| That's the point. This hand waving allows the user to build
| more complex systems.
|
| The cost is efficiency, but make no mistake: not thinking
| about the heap allows people to build much more complex
| programs. It's a trade-off between complexity and
| efficiency.
| akimball wrote:
| Au contraire. It is a tradeoff between application
| programmer effort and compiler writer effort.
| Amortization implies it is better to have an extremely
| advanced IR optimizer.
| kilgnad wrote:
| Then why do zero-cost abstractions exist? Languages like
| Rust and C++ make the trade-off on the application side
| rather than the compiler side.
| cubefox wrote:
| Then could it reversely be argued that not worrying about
| the stack (tail recursion) allows people to build more
| complex programs? Probably depends on which is more
| worrisome on average, heap or stack.
| kilgnad wrote:
| No. Pure functions are more modular. They allow you to treat
| your logic like bricks and building blocks. Pure functions
| are composable purely off of types. Combinators are actually
| the proper term here.
|
| Mutable state does not do this. Your idea of generations
| doesn't make sense, because that means every function must
| take a generation as the input parameter too. Given a
| different generation number the function must
| deterministically create the same output.
|
| It is not a complete system. Your equations produce this
| useless generation number that isn't reused as part of the
| system; it's just there for you and meta-analysis.
|
| That is not to say your first paragraph is completely wrong
| though. There is a trade off between modularity and
| efficiency. Mutating a variable is less resource-intensive
| than generating a new variable.
| feoren wrote:
| > Pure functions are more modular. It allows you to treat
| your logic like bricks and building blocks.
|
| Mathematical purity is indeed what allows you to treat your
| logic like bricks and building blocks. My point is that "a
| monad is a monoid in the category of endofunctors" is not
| the only "pure math" out there, and also that your domain
| model is likely the most impure part of your whole program.
| Functional programming is awesome! But mostly ignores the
| existence of essential mutable state, and embracing
| functional programming is only a small part of the
| "mathematical purity" struggle that is indeed the only way
| to build truly reusable building blocks. If you're spending
| lots of time building clean, pure, mathematical data
| manipulation logic, but the data you're manipulating is
| "Dog : Animal" and "Cat : Animal", you're in a garbage
| in/garbage out situation. Worry about the mathematical
| purity of your data model itself. It will marry and dance
| and harmonize with the purity of your functional logic!
|
| > Mutating a variable is less resource-intensive than
| generating a new variable.
|
| Not always.
| n0w wrote:
| I came across this after seeing relic[0] submitted the other day
| and thought it was pretty interesting.
|
| I've been into CRDTs for a while and have started wondering about
| generic mechanisms for distributed data. This led me to read
| lot more about the Relational Model of data and eventually to the
| Event Calculus.
|
| What's interesting to me is that these things end up feeling a
| lot like CRDTs[1] or Event Sourcing. I haven't quite finished
| pulling on these threads but the relic link was a timely read
| considering!
|
| I really liked the first half of this paper and the authors'
| categorization of complexity. However, the second half fell a bit
| short for me. It seems they made the same mistake as many other
| people (SQL != Relational) and their idea of Feeders and
| Observers seems a bit more like an escape hatch than an elegant
| method for interfacing with the outside world.
|
| [0] https://github.com/wotbrew/relic [1]
| http://archagon.net/blog/2018/03/24/data-laced-with-history/
| hummus_bae wrote:
| [dead]
| carapace wrote:
| I'm designing such a system now, based on the pure functional
| Joy language with two additional data stores: a relational db
| system (Prolog (or maybe Datalog), not SQL) and what is
| effectively a git repo although I think of it as a "data oracle".
|
| The role of the relational db system is explained in TFA. (Prolog
| makes a fine Relational Model DB (the "relations" in RMDBs are
| the same _logical relations_ that Prolog "relations", um, are.)
| The language is cleaner and simpler and more powerful than stock
| SQL, there's an ISO standard and several solid implementations,
| and you can always back it up with SQLite or Postgres or whatever
| if you need to.) The trick to integrating it with a purely
| functional system is to only use "pure and monotonic Prolog code"
| which you want to do anyway (
| https://www.metalevel.at/prolog/debugging ) or as I like to say,
| "Don't put lies in your database."
|
| The "data oracle" (which again is more-or-less just a git repo)
| provides bytes given a three-tuple of _(hash, offset, length)_.
| These are immutable, so you can cache the results of (pure)
| computations over them (e.g. a predicate like "is valid UTF-8"
| is true/false for all time, yeah?) This replaces the filesystem.
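A content-addressed store like that takes only a few lines; here is a hypothetical Python sketch (all names invented). The memoization is safe precisely because the bytes behind a hash never change:

```python
import hashlib

class DataOracle:
    """Immutable, content-addressed byte store: bytes are retrieved
    by (hash, offset, length). Because the content under a hash can
    never change, results of pure functions over it can be cached
    forever."""
    def __init__(self):
        self._blobs = {}   # hash -> bytes
        self._cache = {}   # (predicate name, hash) -> result

    def put(self, data: bytes) -> str:
        h = hashlib.sha256(data).hexdigest()
        self._blobs[h] = data
        return h

    def get(self, h: str, offset: int, length: int) -> bytes:
        return self._blobs[h][offset:offset + length]

    def query(self, predicate, h: str):
        # Memoize pure predicates: "is valid UTF-8" is true/false
        # for all time for a given hash.
        key = (predicate.__name__, h)
        if key not in self._cache:
            self._cache[key] = predicate(self._blobs[h])
        return self._cache[key]

def is_valid_utf8(b: bytes) -> bool:
    try:
        b.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

oracle = DataOracle()
h = oracle.put(b"hello, world")
```

`oracle.get(h, 0, 5)` then slices bytes out of the blob, and repeated `oracle.query(is_valid_utf8, h)` calls hit the cache.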
|
| I was working with Prof. Wirth's Oberon RISC CPU as a basis, but
| a couple of days ago a fantastic new 64-bit vm went by here on HN
| and I'm going to use that going forward.
| https://github.com/maximecb/uvm
| https://news.ycombinator.com/item?id=34936729
___________________________________________________________________
(page generated 2023-02-28 23:01 UTC)