[HN Gopher] Out of the Tar Pit (2006) [pdf]
___________________________________________________________________
Out of the Tar Pit (2006) [pdf]
Author : n0w
Score : 83 points
Date : 2023-02-27 08:01 UTC (1 day ago)
(HTM) web link (curtclifton.net)
(TXT) w3m dump (curtclifton.net)
| cloogshicer wrote:
| I think the crux of this paper is section 7.2.2:
|
| > There is one final practical problem that we want to consider
| -- even though we believe it is fairly rare in most application
| domains. In section 7.1.1 we argued that immutable, derived data
| would correspond to accidental state and could be omitted
| (because the logic of the system could always be used to derive
| the data on-demand). Whilst this is true, there are occasionally
| situations where the ideal world approach (of having no
| accidental state, and using on-demand derivation) does not give
| rise to the most natural modelling of the problem. One possible
| situation of this kind is for _derived data which is dependent
| upon both a whole series of user inputs over time, and its own
| previous values_. In such cases it can be advantageous to
| maintain the accidental state even in the ideal world. An example
| of this would be the derived data representing the position state
| of a computer-controlled opponent in an interactive game -- it is
| at all times derivable by a function of both all prior user
| movements and the initial starting positions, but this is not the
| way it is most naturally expressed.
|
| Emphasis is mine.
|
| I think that this type of derived data, which I put in italics
| above, is quite common - contrary to what the authors of the
| paper argue. Any UI code or game-like system, like simulations,
| will have this kind of data. And the paper does not have a good
| answer for it. I honestly think that nobody has an answer for it,
| and it's why most of our UIs suck.
|
| I would love to see something that makes handling this type of
| derived data easy.
| feoren wrote:
| My impression from reading this has always been "Almost there!
| Keep going." If you keep pulling on these threads, I claim that
| you reach some inevitable conclusions. These ideas are a superset
| of all the other advice they give (and most other advice you
| hear, too):
|
| 1. Separate identity from data. These are _completely different
| things_. The number 7 is not mutable, even though my age is. Keep
| going! The phrase "It was a dark and stormy night" is not
| mutable, even though the opening line of your book is. Keep
| going! The statement "there are seventeen cars parked on level 3
| and four spots available" is _not mutable_, even though the
| status of the parking garage is. Identity is immutable, and
| statements (data) are also immutable. Only the assignment of a
| statement to an identity is mutable.
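|
| A minimal Python sketch of that separation (the names, like
| `GarageStatus`, are my own illustration, not from the thread):

```python
# Values (statements) are immutable; only the binding from an
# identity to its current statement ever changes.
from dataclasses import dataclass

@dataclass(frozen=True)          # the statement itself cannot mutate
class GarageStatus:
    parked: int
    available: int

bindings = {}                    # identity -> current immutable statement

def assign(identity, statement):
    """The only mutation allowed: re-point an identity at a new statement."""
    bindings[identity] = statement

assign("level-3", GarageStatus(parked=17, available=4))
old = bindings["level-3"]
assign("level-3", GarageStatus(parked=18, available=3))
# `old` is untouched; only the assignment changed.
```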
|
| 2. Think about the operations that make sense on your data. Sure,
| you have "map", "filter", "reduce" that are mathematically pure.
| What about "offset" or "rotate"? Those have identity, they are
| associative, individually commutative (but not with each other!).
| Is there a distributive property that applies there? Okay, what
| about "buy" and "sell" operating on commodities? Are those
| mathematical operations? Do they have an identity element? Are
| they associative and commutative? "Buy 2000 lbs of chicken" -- is
| that equivalent to "Buy 1 ton of chicken"? What are its domain
| and range? Not "chicken", nor even "warehouses" -- you're not
| teleporting the chicken as soon as you commit that record. More
| like "contract", or even "contract request". What can you do with
| contracts? Is there an "identity contract"? "Buy 0 chicken"? Zero
| is a useful number! Is "Buy 0 chicken" the same as "Buy 0 beef"?
| Explore these questions. Find the math around your domain.
| Functional purity is all good, but it's wasted if your domain is
| just "Chicken : Meat : Commodity", "Beef : Meat : Commodity",
| "Pine : Lumber : Commodity". Wrong. Don't think about nouns.
| Think about sensible pure operations in your domain. Successive
| abstraction layers should be _restricting_ what's possible to do
| with your data, by defining higher and higher-level operations.
| If your abstraction layers aren't restricting what's possible to
| do, they're an inner platform and you don't need them.
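|
| The "Buy 0 chicken" identity idea can be sketched as a tiny
| monoid over contract requests; `Buy` and `combine` are
| hypothetical names of mine, not from any real trading system:

```python
# A "buy" request in a normalised unit, so "1 ton" == "2000 lbs".
from dataclasses import dataclass

@dataclass(frozen=True)
class Buy:
    commodity: str
    pounds: int

def combine(a, b):
    """Associative, commutative combination of requests for the same
    commodity; 'Buy 0' acts as the identity element."""
    assert a.commodity == b.commodity
    return Buy(a.commodity, a.pounds + b.pounds)

zero = Buy("chicken", 0)                     # identity: changes nothing
order = combine(Buy("chicken", 2000), zero)  # still just 2000 lbs
```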
|
| 3. Don't make big decisions upfront. That's how you add
| accidental complexity. Don't make any decisions you don't need
| to, and prefer solutions that allow you to defer or lighten
| decisions. If you follow this thread, that means you're using a
| relational model. They absolutely got that right (section 8).
| Otherwise you're making decisions about ownership that you don't
| need to be making. Domain Driven Design has it _wrong_ here with
| aggregate roots. What's an aggregate? Which concept goes inside
| of which aggregate? You do not need to make that decision. The
| authors of TFA get it right, and then wrong again, when they try
| to apply a relational model to shitty noun-based domain ideas
| like "Give Employee objects a reference to their Department, vs.
| Give Department objects a set (or array) of references to their
| Employees vs. Both of the above". No. They're not following their
| own advice. Can an employee exist without the concept of
| "department"? Yes. Can a department exist without the concept of
| employees? Probably. Therefore you _must_ encode the relationship
| separately from both. Your hands are tied. If concept A can exist
| independently of concept B, you _must not_ draw an arrow from A
| to B. The answer is not "both", it's "neither". An employee's
| assignment within a department is its own independent concept,
| that knows about "employee" and "department" both. And now it's
| obvious you can give this concept its own tenure and add a new
| record whenever they change departments. You've found append-only
| event sourcing without needing to start there -- it came from
| first principles (in this case).
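|
| A hypothetical minimal encoding of that point: the assignment is
| its own record that knows about both employee and department,
| with its own tenure, appended rather than overwritten (all names
| here are illustrative):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Assignment:
    employee_id: str
    department_id: str
    start: int                     # day number; units are illustrative
    end: Optional[int] = None      # None = still current

assignments = []                   # the independent relationship concept

def assign(emp, dept, day):
    """Close any open assignment for emp, then append the new one."""
    for i, a in enumerate(assignments):
        if a.employee_id == emp and a.end is None:
            assignments[i] = Assignment(a.employee_id, a.department_id,
                                        a.start, day)
    assignments.append(Assignment(emp, dept, day))

assign("e1", "sales", 0)
assign("e1", "support", 100)       # department change: history preserved
```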
|
| 4. Operate on the broadest scope you can. Manipulate sets of
| things, not individual things. This is half of functional
| programming's usefulness right here. This is supported by using a
| relational model. Operate on sequences of things, not individual
| things: there's reactive programming. How else can you broaden
| the scope of what you're operating on?
|
| 5. Don't just do something, stand there! That is: don't _do_,
| _plan_. Instead of immediately looping, just use "map". Instead
| of immediately hitting the database, write a query plan. This
| doesn't mean offload your thinking to GraphQL -- that's just
| kicking the can down the road. See point #2. This is the other
| half of functional programming's usefulness. Do only as much as
| you need to make (or append to) a plan, and then stop. Someone
| else will execute it later -- maybe! You don't care.
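|
| A toy sketch of "plan, don't do" (my own construction): build a
| description of the work, and let a separate, optional executor
| run it later:

```python
plan = []                          # a plan is just data

def map_step(fn):
    plan.append(("map", fn))       # record the intent; run nothing

def filter_step(pred):
    plan.append(("filter", pred))

map_step(lambda x: x * 2)
filter_step(lambda x: x > 4)

def execute(plan, data):
    """Someone else may run the plan -- maybe never."""
    for kind, f in plan:
        if kind == "map":
            data = [f(x) for x in data]
        else:
            data = [x for x in data if f(x)]
    return data

result = execute(plan, [1, 2, 3])  # nothing ran until this point; -> [6]
```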
|
| There's probably more, but there are more fundamental ideas than
| "use functional programming" or "use event sourcing" or "use
| SOLID" -- first principles that actually lead to almost all the
| other good advice you get. This paper kinda almost gets there a
| couple times, then walks back again. Keep going. Keep pulling on
| those threads. I suspect that the more you do this, the more your
| code will change for the better. But you have to be willing to
| keep going -- don't half-ass it. If you only go part way and then
| stop, the benefits won't be apparent yet.
| dagss wrote:
| I feel event sourcing is a real world pragmatic approach to
| declarative programming that this paper advocates.
|
| For state changes you add events to the database to describe
| something that happened. Any question you may need an answer for
| / business decision you want to make can be answered by querying
| the events.
|
| The problem at the moment is that while event sourcing is
| excellent at reducing accidental complexity surrounding
| implementing business rules, there is little standard / commonly
| used tooling around it and you end up with lots of accidental
| complexity on that end.
|
| An example would be a database not designed to be a CRUD store
| but to store events and manage read models, and manage
| computation of projections etc -- while being suitable for OLTP
| workloads. At a minimum, very strong support for using any kind
| of construct in materialized views (since in a sense the entire
| business logic is written as a "materialized view" when doing
| event sourcing).
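|
| In miniature, the idea looks like this (hypothetical schema,
| Python standing in for the database):

```python
# State changes are stored as events; a read model is derived by
# querying them -- effectively a hand-rolled materialized view.
events = [
    {"type": "deposited", "account": "a1", "amount": 100},
    {"type": "withdrew",  "account": "a1", "amount": 30},
]

def balance(account):
    """Answer a business question by folding over the events,
    not by reading a mutable CRUD row."""
    total = 0
    for e in events:
        if e["account"] == account:
            total += e["amount"] if e["type"] == "deposited" else -e["amount"]
    return total
```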
| feoren wrote:
| Event Sourcing is a nice card to have in your hand, but it
| should not be a goal in itself. A yard is 3 feet. The atomic
| mass of Hydrogen is 1.008 and its symbol is "H". These will
| _never_ change. If someone came to me and said "your
| 'Chemical' table is not event-sourced. We're doing event-
| sourcing here. You need to change it." I would tell them to get
| lost. Why the hell would you have an "An Element's
| Mass was Modified" event stored in your database? Unless you're
| developing for CERN or NREL or something, just don't.
|
| On the other hand, having a bank account table with a single
| field for someone's money is clearly not enough. You absolutely
| should be tracking every transaction that has changed that
| account's value. Do you need to track every other possible
| change to that account? Like, whether they want paper or
| electronic mail? No, probably not.
|
| "Event sourcing" is a way to refactor a domain model -- take a
| statement and break it into a sequence of statements that "add
| up" to the original statement.
|
| "Add up" is key here. When you break "AccountBalance" into
| "Transaction", it's clear how to "add up" transactions to
| recreate the original account balance. But that's not your
| goal, necessarily! The reason why this tends to make better
| domain models is exactly because you have to think about
| "adding up" your domain models, and what that means. "Adding
| up" is an associative, probably-commutative, binary operation
| with identity. Note that that means your domain MUST have a
| "zero transaction". ALL of your events that you event source
| need a "zero event". If you cannot come up with the "zero" of
| an event, then you should not be breaking into events! The
| whole point is to be able to define the monoid over it, which
| requires identity.
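|
| The monoid being described can be made explicit; a minimal
| sketch (names mine):

```python
# Transactions "add up": an associative combine with a zero element.
from dataclasses import dataclass
from functools import reduce

@dataclass(frozen=True)
class Transaction:
    delta: int

ZERO = Transaction(0)              # the "zero event" the comment insists on

def combine(a, b):
    return Transaction(a.delta + b.delta)

def replay(txns):
    """Folding with the monoid recreates the account balance."""
    return reduce(combine, txns, ZERO).delta
```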
|
| So instead of taking event sourcing as an end goal, make your
| goal this: think about the operations that make sense on your
| domain, like accumulating accounts. What else can you
| accumulate? Can you add Employees together? Not really -- you
| can _group_ them into departments and events and meetings. Is
| that grouping associative and commutative? Sure -- it's just
| Set Union. Is there anything about Employees you can add
| together? Well, their salaries. In fact for data analysis,
| employee salaries are an important cost that you probably want
| to throw in a data cube. Define a Monoid over Employee
| salaries.
|
| What other operations make sense on your data? Close, open,
| start, end, group, buy, sell, move, rotate, add, multiply,
| concatenate, join, reverse, inverse, combine, merge, undo,
| redo, fill, empty, saturate, fix, break, import, export,
| report, validate. Are they associative, commutative,
| distributive, invertible? Do they have identity? Event sourcing
| is such a tiny part of exploring this world. And it's worth
| exploring.
| dagss wrote:
| I'm not arguing that event sourcing should be done for its
| own sake, so I don't really want to disagree with you; but
| that said your post doesn't perfectly resonate with me
| either.
|
| When you write a typical backend system, the desired function
| of the system is to interact with the external world. Without
| I/O the system may as well not exist.
|
| Input is a desire from someone that something be done or
| recording that something happened. Such input changes the
| data recorded, or appends what the system should know / has
| seen. This is an "event".
|
| All input can be framed as being an event. But "an element's
| mass was modified" is not an event ... it doesn't describe
| someone or something giving input to our system.
|
| The algebraic view on things you take seems to be treating
| the system at a different level than what I think about as
| event sourcing.
|
| Neither "an element's mass was modified" nor "sell" nor
| "transaction" that you mention is a realistic event. An event
| is "User U clicked a button in the web console to Sell share
| S at time T". Implementing the effects of that event --
| computing a specific read model resulting from appending the
| event to the stored set of events -- may well be best done by
| some algebra like you suggest, but that seems like another
| topic.
|
| You seem to talk about models for computing and transforming
| state. I talk about I/O vs data storage.
| feoren wrote:
| > All input can be framed as being an event.
|
| Sure, it _could_ be, but is it _useful_ to do that? If I
| stand up and shout "The price of a Banana is $4 per
| bushel!", you could record my voice and upload it as a raw
| wave file. That's the rawest "input event" you can come up
| with. Or you could write down "some random dude said that
| bananas cost $4 around 4:30 pm and I'm not sure whether I
| believe him or not". That's not the "raw input", it's been
| transcribed and modified and annotated. Yet it's almost
| certainly more useful to your system, and it's kinda like
| event sourcing. Kinda.
|
| The problem with worrying about whether something is
| "input" or "output" or "internal" is that you can just move
| the dotted line anywhere around your system to change
| those. If you break a monolith into independent reusable
| building blocks, those building blocks are going to have a
| completely different idea of what counts as input and
| output. But who cares? You're not changing any fundamental
| truth about how the domain works. Your domain model should
| really be independent of worrying about what's "input" and
| "output". Those lines move all the time. Instead think
| about what operations make sense to do with your data, and
| then think about the mathematical properties of those
| operations.
|
| > But "an element's mass was modified" is not an event ...
| it doesn't describe someone or something giving input to
| our system.
|
| Sure it does. Someone gave you the input that a particular
| element has a particular mass. How is that not input? How
| else did you get that data?
|
| > The algebraic view on things you take seems to be
| treating the system at a different level than what I think
| about as event sourcing.
|
| This is my exact point. You _should_ think about event
| sourcing this way, because that's the only reason it's
| useful: it's accidentally a source of important "domain
| algebra" that you otherwise might miss. But there's _lots_
| of other important "domain algebra" that you are still
| missing, and they don't necessarily look like event
| sourcing.
|
| > An event is "User U clicked a button in the web console
| to Sell share S at time T".
|
| But surely that's not what you're _storing_ in your system!
| That would be an extreme coupling between the concept of
| "selling shares" and "clicking a button". Those are
| _completely unrelated_ ideas! Why would you want to tightly
| couple them!? If _that's_ what you think event sourcing
| is, sorry to be blunt, but you have very badly
| misunderstood it.
| dagss wrote:
| Fair points. But how do I get from a "domain algebra" to
| practical implementation with a popular database?
|
| Event sourcing can be translated to adding rows to
| tables, or adding documents to collections. Focusing so
| much on "append" isn't only because of what kind of
| events you would model, but because you store data in
| databases by, well, storing it... appending it...
|
| If event sourcing is only useful as a source of an
| important "domain algebra": How does a domain algebra
| translate to practical use of a database system for your
| application? How can I focus on the "operations I want to
| do on my data", when the tools I am given are pretty much
| INSERT/UPDATE/SELECT, or GET/PUT, or some variation along
| these lines?
| feoren wrote:
| Bear with me, we're going on a bit of a walk.
|
| An abstraction layer says: here are some operations you
| can do, operating on some models. Presumably those models
| are a useful or convenient perspective on some underlying
| data, and those operations are important or useful as
| well. At this level, we can define something like "order
| a pizza", "renew my driver's license", or "show me a book
| I might like to read". Note the abstraction is driven by
| the needs of the users/consumers/callers of the
| abstraction, _not_ its implementations. This sounds
| obvious, but it's extremely often done the wrong way
| around -- "Foo : IFoo" shouldn't be muscle memory, and it
| should mean "I can't currently think of any other
| implementation of IFoo right now, although that might
| change", and NOT mean "this interface is the header of
| this class, which we have to do or we get yelled at".
|
| The implementation of an abstraction layer breaks down an
| operation in one domain into operations in another
| (presumably "lower level") domain. It's possible that
| "renew my driver's license" can be reasonably written
| directly in a few SQL statements. Okay, that's no big
| deal. Most likely you have a few intermediate layers, or
| an ORM, or whatever. So the implementation of the "DMV
| API" (or whatever is defining that operation) is where
| you break it down into INSERT/UPDATE/SELECT or whatever
| other lower-level tools you have.
|
| None of this changes when you start thinking about your
| higher-level operations algebraically. All that really
| means is that you consider the operations and their
| inputs/outputs, and you start asking questions like: are
| these idempotent? Associative? Commutative? Do they have
| identity? You can ask those questions at the abstraction
| level. They are part of the definition of the operation.
| It's up to the implementation to make sure those
| properties are respected.
|
| Look for ways to change the operations you offer to be as
| "mathematical" as you can. The more mathematical they
| are, the more you'll serendipitously come up with new
| and interesting and useful ways to use them; the more
| reusable they'll be.
|
| "Renew my driver's license" is idempotent; "order a
| pizza" is not. Prefer idempotency. Can we make "order a
| pizza" idempotent? Sure. The client generates (or
| requests) a unique ID for the order. We have "update an
| order", which may actually be a family of operations. We
| have "finalize an order" which places it. Finalizing the
| same order twice does nothing -- it's idempotent.
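|
| A toy version of that decomposition (hypothetical API of mine,
| not a real pizza service):

```python
# "Order a pizza" made idempotent: client-supplied IDs, repeatable
# updates, and a finalize step that is safe to retry.
orders = {}     # order_id -> {"items": [...], "final": bool}

def update_order(order_id, items):
    o = orders.setdefault(order_id, {"items": [], "final": False})
    if not o["final"]:
        o["items"] = items

def finalize(order_id):
    """Finalizing twice does nothing extra -- idempotent."""
    orders[order_id]["final"] = True

update_order("o1", ["margherita"])
finalize("o1")
finalize("o1")                     # a retry changes nothing
```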
|
| User story: a bunch of guys are sitting around and want
| to order from your pizza place, but it's always annoying
| to pass the phone around; everyone wants to look at the
| menu at once. We want "distributed ordering", so a party
| can all contribute what they want to the same order. They
| want to see what everyone else is doing in real-time. I
| want breadsticks, but there's only 5 of us. If anyone has
| ordered breadsticks, we're all good. If we have an "add
| breadsticks" operation and poor concurrency, we end up
| with 5 breadsticks in our cart; someone has to notice
| that and fix it. Not great.
|
| So what's a better operation than "add breadsticks"? How
| about "make sure there are enough breadsticks"? If three
| people all say "make sure there are enough breadsticks!"
| that's basically the same as one person saying it. That's
| an important domain operation. If all you're doing is
| thinking "event sourcing", then "add breadsticks" looks
| _more_ like a domain event than "make sure there are
| enough breadsticks", but the latter is actually easier to
| work with and leads to a better experience. You can't
| make the jump from "add breadsticks" to "make sure there
| are enough breadsticks" just by thinking about event
| sourcing -- you get there by thinking about math.
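|
| The breadsticks contrast in a few lines (my own sketch):

```python
# "ensure" is idempotent where "add" is not: three friends issuing
# it concurrently is the same as one person issuing it once.
cart = {}   # item -> quantity

def add(item, n):
    cart[item] = cart.get(item, 0) + n          # NOT idempotent

def ensure_at_least(item, n):
    cart[item] = max(cart.get(item, 0), n)      # idempotent

for _ in range(3):                              # three friends, same wish
    ensure_at_least("breadsticks", 1)
```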
|
| What happens when someone removes a pizza from the cart
| at the same time as someone else is adding pepperoni to
| it? This is the "Google Docs" problem. We need to think
| about associativity and commutativity to really solve
| these problems, not to mention random vs. sequential
| identity. Can you undo these operations? If I say "extra
| sauce", and someone else says "no, light sauce", we might
| have a conflict to resolve. But if I then undo my "set
| extra sauce" action -- what happens? It should clearly
| result in "light sauce". That's the obviously correct
| answer, so we can work back from that to figure out how
| the "set sauce amount" operation should work. "User A
| requests heavy sauce on pizza AF73" is idempotent and
| reversible. "User B requests light sauce on pizza AF73".
| "User A revokes their request for heavy sauce on pizza
| AF73". Okay, great, we're golden. "User C removes pizza
| AF73 from their order." "User D requests pepperoni on
| pizza AF73". "User C undoes their operation to remove
| pizza AF73 from their order." Great: pizza AF73 has
| pepperoni on it, despite the fact that it had been
| removed when pepperoni was added. No problem here.
|
| How do you handle statements like "User A revokes their
| request for heavy sauce on pizza AF73?" Event sourcing
| would say that this is a distinct event that you have to
| INSERT in your append-only log. But why not just DELETE
| the initial request? That works just fine. In any case,
| the low-level steps are the easy part.
|
| We're not too far from event sourcing here, but that
| emerged from analyzing the problem and trying to make it
| as "mathematically pure" as we could. We didn't start
| with event sourcing, and we saw another example (add
| breadsticks) where the "event sourced" version was worse.
| Event sourcing is a trick, but not the goal. The goal is
| "domain algebra".
|
| On the back end, you'll be doing reporting on these
| numbers. You want a really flexible reporting system?
| There was a buzzword a few years ago, "data cube". It was
| a buzzword because nobody could define what it meant, but
| after thinking about it for a while, I decided it could
| have a useful definition: A data cube is a pairing of a
| set of categorical data (pizza sizes, customer types,
| order times, whatever) and value data (costs, amounts,
| prep times, ratings). Each "value data" must have a
| monoid defined over it, which means some binary "add" (or
| "combine") method with identity. Any time you have a
| monoid defined, you get a (distributed!) "aggregate"
| method for free. That's what "roll up" and "drill down"
| are, in report-speak: projecting your categorical data
| and aggregating all your value data using the defined
| monoid. The same kind of analysis applies here, even
| though event sourcing has nothing to do with any of this.
|
| By the way: the identities of "combine" over rating and
| prep time are _not_ simply 0! Think about it harder than
| that. How do you combine user ratings? Most likely your
| value type will need to be able to sensibly represent
| "0/0" -- that's a valid and useful value sometimes!
|
| We didn't get particularly "mathy" operations with our
| pizza example, partly because it's a very "end user"
| domain and not a reusable mid-level domain like unit
| conversions or a physics engine. It's even more important
| for those domains. What's a Point minus another Point?
| Not a Point! Are "offset" and "rotate" associative?
| Commutative? Distributive? Think about these things if
| you want a reusable physics engine.
|
| This is not scripting. This is not even programming. It's
| software engineering.
| Verdex wrote:
| In the past, I have been unimpressed by this paper. Perhaps
| someone can shed some historical context...
|
| But from my perspective what happens is that the paper defines
| complexity in exactly the way that allows them to deride OO, FP,
| etc programming whilst simultaneously showing how awesome
| functional relational programming is. It ignores complexity
| that's orthogonal to what FRP addresses and ignores areas in
| which FRP itself contributes to unnecessary complexity.
|
| It feels like a scenario where the authors had something that
| they thought was neat and went out to create metrics that would
| in fact show that it was neat. Maybe FRP is really neat, but I
| feel that the paper itself doesn't contribute to anything because
| its logic is so custom and purpose built.
| mhsdef wrote:
| I think the first half of the paper ("the problem") is much
| stronger than the latter half ("the solution").
| bob1029 wrote:
| This paper was career changing for me.
|
| Chapter 9 is effectively the original concept for our business
| rules engine today. We use SQLite and C# UDFs as the actual
| foundation.
|
| Using a proper relational model and SQL queries for all the
| things means that domain experts can directly contribute. It also
| makes it feasible to turn your domain experts into internal
| customers of your developers. For B2B products, this can be make
| or break.
|
| Building an actual business around this kind of thing is very
| hazardous in my experience. Unless you are _completely_ certain
| that you have the schema figured out, this promising foundation
| converts to quicksand.
|
| Using higher normal forms is one way to reduce the consequence of
| screwing up domain modeling, but you still really have to know
| the relational guts of the business one way or another.
|
| One design-time trick is to get all stakeholders to compose their
| ideal schema in something like excel and have them fill in some
| sample data. Showing them an example of one of these for a
| different domain is a powerful lesson in my experience.
| ZitchDog wrote:
| A classic paper with a dream that has yet to be realized. We
| continue to bolt state on top of state. Redux bolted on top of
| GraphQL on top of Redis on top of Postgres. We can do better.
| lucisferre wrote:
| It continues to be such a sad state of affairs. We are
| constantly reinventing and repackaging well known bad
| practices.
|
| I recall getting publicly lambasted on here for daring to
| question the wisdom of an ActiveRecord-like data layer for a
| front-end SPA framework.
| layer8 wrote:
| If only we could change that state of affairs!
|
| ;)
| ZitchDog wrote:
| Oh right, I forgot the ORM with "distributed caching" needed
| between the database and GraphQL layers. More hidden state.
| dcre wrote:
| This paper was very influential on me when I first started
| programming professionally around 2012. I don't plan on reading
| it again, but my vague memory of what I got out of it is pretty
| simple and I think has become pretty standard practice at this
| point: avoid mutable state and use pure functions where possible.
|
| The framing of accidental and essential complexity is of course
| very useful and not really unique to this paper. The difficulty
| is there is nothing but experience-informed judgment that can
| tell you which complexity is accidental and which is essential.
| There is always a framing of the problem that can justify some
| bit of complexity as essential. You have to judge.
| feoren wrote:
| Plenty of state -- even mutable state -- is essential
| complexity. Think of an IDE where a user is typing out code:
| the state is changing all the time. Pure functional programming
| has a hard time walking the line between "avoiding" mutable
| state, and "ignoring" mutable state. If you insist on
| functional purity, you're already admitting defeat in the face
| of essential mutable state. The walls of functional (and
| especially reactive) programming are too high -- they currently
| don't play very well with their Turing and von Neumann
| neighbors.
|
| Functional programming is only "mathematically pure" because
| _we already have math for it_. You can make mutating state
| "mathematically pure" too, if you just invent new math for it.
| For example, you can pair state with a "generation" variable
| and define mutating functions as returning the new state with
| generation + 1; now all your functions are pure again. That's
| not all it takes, and it's not easy! But it shows how narrow-
| minded the current idea of "purity" is.
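|
| The generation idea in miniature (my own sketch of the comment's
| suggestion):

```python
# State transitions become pure functions that return a new
# (generation, value) pair instead of mutating in place.
def set_value(state, new_value):
    generation, _ = state
    return (generation + 1, new_value)   # pure: old state is untouched

s0 = (0, "draft one")
s1 = set_value(s0, "draft two")
s2 = set_value(s1, "draft three")
# s0 and s1 still exist unchanged; s2 is generation 2.
```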
| cubefox wrote:
| I always thought the problem with a "purely" functional view
| is much more practical: From my limited Haskell experience, I
| gather that you have to use recursion instead of loops, since
| loops (and GOTOs) work by iteratively modifying some state,
| unlike recursion. But humans think in loops. If you look in a
| cookbook for a recipe, it will almost certainly contain loops
| and rarely any recursions.
|
| Recursive programs are provably Turing equivalent to WHILE or
| GOTO programs, but that doesn't mean they are equally natural
| for our brain to think about. (Not to mention things like
| tail recursion, which is even less intuitive.)
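|
| The contrast being drawn, side by side (trivial example of mine):

```python
# The same "repeat until done" computation, once as the loop most
# people reach for, once as the recursion a pure style demands.
def total_loop(xs):
    acc = 0
    for x in xs:          # iteratively modifying state
        acc += x
    return acc

def total_rec(xs):
    if not xs:            # base case replaces the loop condition
        return 0
    return xs[0] + total_rec(xs[1:])
```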
| dannyobrien wrote:
| I definitely don't think in terms of recursion (and I don't
| think I've seen it explicitly in a recipe!), but I'm not
| sure loops and interation are necessarily that intuitive,
| either. I learned programming at an early age and I still
| distinctly remember the period when loop constructs "made
| sense".
|
| I can believe that recursion is even less easy to teach
| than iteration, but it may be that this is a pedagogical
| issue, or that we have yet to work out the best way to
| convey or document computation.
| cubefox wrote:
| Many recipes contain WHILE loops. E.g. while the pan is
| not full, layer meat sauce, lasagna noodles, and bechamel
| sauce. FOR loops (do the x process 10 times, or once for
| everything in a list) are also perfectly natural.
| Examples of recursion are not as easy to find. This
| points to recursion itself being harder for us than
| loops. Understanding loops in recipes doesn't require
| pedagogical attention.
| mpweiher wrote:
| > Plenty of state -- even mutable state -- is essential
| complexity.
|
| Arguably most of it!
|
| After all, computers don't compute.
|
| https://www.youtube.com/watch?v=EKWGGDXe5MA&t=297s
|
| _One of the miseries of life is that everyone names
| everything a little bit wrong, and so it makes everything a
| little harder to understand in the world than it would be if
| it were named differently. A computer does not primarily
| compute in the sense of doing arithmetic. Strange. Although
| they call them computers, that's not what they primarily do.
| They primarily are filing systems. People in the computer
| business say they're not really computers, they are "data
| handlers". All right. That's nice. Data handlers would have
| been a better name because it gives a better idea of the idea
| of a filing system._
| dagss wrote:
| Related: The Norwegian word for "computer" is "datamaskin".
| Or, "data machine".
| mrkeen wrote:
| > Think of an IDE where a user is typing out code: the state
| is changing all the time.
|
| That's easy mode. Mutability is often justifiable when one
| person is doing one thing to a computer at a time. Now
| extrapolate to multiple people editing the same document at
| once. Suddenly you're discussing CRDTs and other approaches.
| And now implement undo on top of that!
|
| Likewise with git, or blockchain, or kafka. They're
| persistent logs so you can keep your head straight while
| figuring out what the current state(s) of the system can be.
| Even with git, when you do an in-place mutation (force-push)
| there's _still_ an extra log (reflog) behind the scenes
| trying to keep the changes sane.
| feoren wrote:
| I think this is a good example of why "simple CRUD systems"
| are actually much more complex than people usually give
| them credit for. Anything seems easy if you do it badly.
| But multiple people editing a document at once with CRDTs
| and undo is still even easier than a basic CRUD application
| done properly: now you have multiple people editing
| multiple different data types at once! In such cases we
| should be thinking about building CRDTs over all the
| various operations users may be doing with the data, which
| means thinking about associativity and commutativity of
| generic data operations. Git only has to deal with merging
| text files line by line; CRUD applications have to deal
| with merging lots of different data types!
| kccqzy wrote:
| There is no essential mutable state in computer science. All
| essential mutable state can be modeled away to use no mutable
| state, as you have shown with the generation-number idea,
| which is one valid way. (I'm strictly talking about computer
| science problems, not computer engineering problems such as
| drivers.)
|
| The generation number idea you have shown is an excellent
| idea. It immediately enables several new capabilities
| compared with just mutation:
|
| * You are now able to determine which data is older and which
| is newer without resorting to an external clock.
|
| * If your mutation takes a long time, there is no longer a
| need to use a long-held mutex to ensure thread safety which
| would prevent concurrent reads; you can create a new
| generation separately and then atomically CAS it. Writers
| don't block readers.
|
| * If you need undo/redo functionality, it's suddenly
| trivial. Or diff'ing between versions. Or understanding the
| reason for changes (basics of auditing).
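The "writers don't block readers" point can be sketched in a few lines. A hypothetical Python version (names invented; in CPython a plain attribute read is atomic, so only the publish step needs coordination, standing in for a hardware CAS):

```python
import threading

class Versioned:
    """Immutable snapshots tagged with a generation number.

    Readers grab the current (generation, snapshot) pair without
    blocking; writers build a new snapshot off to the side and
    publish it only if the generation they started from is still
    current (a CAS-style check).
    """
    def __init__(self, data):
        self._current = (0, data)       # (generation, snapshot)
        self._lock = threading.Lock()   # guards the publish step only

    def read(self):
        return self._current            # atomic reference read

    def try_update(self, expected_gen, new_data):
        with self._lock:
            gen, _ = self._current
            if gen != expected_gen:
                return False            # lost the race; caller retries
            self._current = (gen + 1, new_data)
            return True

store = Versioned({"x": 1})
gen, snap = store.read()
store.try_update(gen, {**snap, "x": 2})   # publishes generation 1
```

A slow writer can take as long as it likes building `new_data`; readers never wait, and a stale writer simply fails the generation check and retries.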
| kilgnad wrote:
| Lambda calculus machines can't exist in reality though.
| Only the von Neumann machine can be actualized.
|
| Thus from a practical perspective the foundations of actual
| computing must be built on mutable state.
|
| You are wrong about the generations idea. It's a side
| effect. Functions themselves cannot actually do anything
| with a generation number. It's completely useless other
| than for letting you know how many times a value has
| changed.
|
| A generation number also doesn't negate the need for
| mutexes. The system is still operating on shared state, a
| generation number doesn't change this.
| kccqzy wrote:
| You are not getting the point of either my comment or the
| original article. You are thinking from an engineering
| point of view, where if you look through all the layers
| of abstraction you see mutexes and shared mutable memory.
| That's not my point; my point is about building up those
| abstractions so they are sufficiently hidden. Functional
| programmers routinely operate on a higher level of
| abstraction, and they have shorter, more maintainable
| code. In such code, it would be a code smell if you
| continue to use mutexes, because that should've been an
| implementation detail. It should've been clear from the
| outset that the kind of complexity the article is talking
| about is cognitive complexity.
| layer8 wrote:
| > All essential mutable state can be modeled away to use no
| mutable state
|
| However, that modeling introduces complexity of its own
| that not everyone agrees is entirely essential.
| enlyth wrote:
| > There is no essential mutable state in computer science.
|
| Yes, theoretically. Now imagine your mutable state is 2 GB
| in size; have fun creating a copy of it on every change.
| akimball wrote:
| The optimizer removes that.
| kccqzy wrote:
| You are not getting the point of my comment or the
| original article. It's clear to me that the kind of
| complexity being talked about in the article is cognitive
| complexity not computational complexity. It's not about
| choosing between an O(n) copying of data and O(1) in-
| place mutation; it's about choosing code that's easy to
| comprehend and maintain.
| deathanatos wrote:
| Their point is that you can't choose the cognitively
| simpler choice because of real world design constraints,
| such as not consuming all available RAM -- i.e., because
| of essential complexity.
| pletnes wrote:
| Btrfs does stuff like instantaneous snapshots for a multi
| terabyte size filesystem. I have no idea how they do it,
| but apparently it's 100% possible.
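The trick behind cheap snapshots is structural sharing: a "copy" rebuilds only the path to the changed node and shares everything else. A minimal path-copying sketch (illustrative only, not how Btrfs is implemented):

```python
def set_in(t, path, value):
    """Return a new nested tuple with the element at `path` replaced.

    Only the nodes along the path are rebuilt; all untouched
    siblings are shared with the original, so the cost is
    O(depth), not O(size of the whole structure).
    """
    if not path:
        return value
    i = path[0]
    return t[:i] + (set_in(t[i], path[1:], value),) + t[i + 1:]

snapshot = (("a", "b"), ("c", "d"))
updated = set_in(snapshot, (1, 0), "C")
```

After the update, `updated[0]` is the very same object as `snapshot[0]`: the left branch was never copied, which is why persistent data structures make "copying 2 GB" a non-problem in practice.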
| feoren wrote:
| Engineering is about tradeoffs. There is no One Model To
| Rule Them All. Your post is a great thing to keep in mind,
| but it's not a prescription. Engineers need to be able to
| look at a problem from many points of views at once, and
| try to find the right path for their current problem. This
| is why it's so important that models play nicely with one
| another, something that functional programming is getting
| better at, but reactive programming still really struggles
| with.
|
| All of your asterisked points are well-taken, but: do you
| _need_ that capability? Sometimes; sometimes not.
| ergeysay wrote:
| Not to mention that pure FP completely handwaves away
| implicit global state such as heap.
| kilgnad wrote:
| That's the point. This hand waving allows the user to build
| more complex systems.
|
| The cost is efficiency, but make no mistake: not thinking
| about the heap allows people to build much more complex
| programs. It's a trade-off between complexity and
| efficiency.
| akimball wrote:
| Au contraire. It is a tradeoff between application
| programmer effort and compiler writer effort.
| Amortization implies it is better to have an extremely
| advanced IR optimizer.
| kilgnad wrote:
| Then why do zero-cost abstractions exist? Languages like
| Rust and C++ make the trade-off on the application side
| rather than the compiler side.
| cubefox wrote:
| Then could it reversely be argued that not worrying about
| the stack (tail recursion) allows people to build more
| complex programs? Probably depends on which is more
| worrisome on average, heap or stack.
| kilgnad wrote:
| No. Pure functions are more modular. They allow you to treat
| your logic like bricks and building blocks. Pure functions
| are composable purely off of types. Combinators are actually
| the proper term here.
|
| Mutable state does not do this. Your idea of generations
| doesn't make sense, because that means every function must
| take a generation as the input parameter too. Given a
| different generation number the function must
| deterministically create the same output.
|
| It is not a complete system. Your equations produce this
| useless generation number that isn't reused as part of the
| system; it's just there for you and meta-analysis.
|
| That is not to say your first paragraph is completely wrong
| though. There is a trade off between modularity and
| efficiency. Mutating a variable is less resource-intensive
| than generating a new variable.
| feoren wrote:
| > Pure functions are more modular. It allows you to treat
| your logic like bricks and building blocks.
|
| Mathematical purity is indeed what allows you to treat your
| logic like bricks and building blocks. My point is that "a
| monad is a monoid in the category of endofunctors" is not
| the only "pure math" out there, and also that your domain
| model is likely the most impure part of your whole program.
| Functional programming is awesome! But mostly ignores the
| existence of essential mutable state, and embracing
| functional programming is only a small part of the
| "mathematical purity" struggle that is indeed the only way
| to build truly reusable building blocks. If you're spending
| lots of time building clean, pure, mathematical data
| manipulation logic, but the data you're manipulating is
| "Dog : Animal" and "Cat : Animal", you're in a garbage
| in/garbage out situation. Worry about the mathematical
| purity of your data model itself. It will marry and dance
| and harmonize with the purity of your functional logic!
|
| > Mutating a variable is less resource-intensive than
| generating a new variable.
|
| Not always.
| n0w wrote:
| I came across this after seeing relic[0] submitted the other day
| and thought it was pretty interesting.
|
| I've been into CRDTs for a while and have started wondering about
| generic mechanisms for distributed data. This led me to read
| lot more about the Relational Model of data and eventually to the
| Event Calculus.
|
| What's interesting to me is that these things end up feeling a
| lot like CRDTs[1] or Event Sourcing. I haven't quite finished
| pulling on these threads but the relic link was a timely read
| considering!
|
| I really liked the first half of this paper and the authors'
| categorization of complexity. However, the second half fell a bit
| short for me. It seems they made the same mistake as many other
| people (SQL != Relational) and their idea of Feeders and
| Observers seems a bit more like an escape hatch than an elegant
| method for interfacing with the outside world.
|
| [0] https://github.com/wotbrew/relic [1]
| http://archagon.net/blog/2018/03/24/data-laced-with-history/
| hummus_bae wrote:
| [dead]
| carapace wrote:
| I'm designing such a system now, based on the pure functional
| Joy language with two additional data stores: a relational db
| system (Prolog (or maybe Datalog), not SQL) and what is
| effectively a git repo although I think of it as a "data oracle".
|
| The role of the relational db system is explained in TFA. (Prolog
| makes a fine Relational Model DB (the "relations" in RMDBs are
| the same _logical relations_ that Prolog "relations", um, are.)
| The language is cleaner and simpler and more powerful than stock
| SQL, there's an ISO standard and several solid implementations,
| and you can always back it up with SQLite or Postgres or whatever
| if you need to.) The trick to integrating it with a purely
| functional system is to only use "pure and monotonic Prolog code"
| which you want to do anyway (
| https://www.metalevel.at/prolog/debugging ) or as I like to say,
| "Don't put lies in your database."
|
| The "data oracle" (which again is more-or-less just a git repo)
| provides bytes given a three-tuple of _(hash, offset, length)_.
| These are immutable, so you can cache the results of (pure)
| computations over them (e.g. a predicate like "is valid UTF-8"
| is true/false for all time, yeah?) This replaces the filesystem.
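A content-addressed store like that takes only a few lines; here is a hypothetical Python sketch (all names invented). The memoization is safe precisely because the bytes behind a hash never change:

```python
import hashlib

class DataOracle:
    """Immutable, content-addressed byte store: bytes are retrieved
    by (hash, offset, length). Because the content under a hash can
    never change, results of pure functions over it can be cached
    forever."""
    def __init__(self):
        self._blobs = {}   # hash -> bytes
        self._cache = {}   # (predicate name, hash) -> result

    def put(self, data: bytes) -> str:
        h = hashlib.sha256(data).hexdigest()
        self._blobs[h] = data
        return h

    def get(self, h: str, offset: int, length: int) -> bytes:
        return self._blobs[h][offset:offset + length]

    def query(self, predicate, h: str):
        # Memoize pure predicates: "is valid UTF-8" is true/false
        # for all time for a given hash.
        key = (predicate.__name__, h)
        if key not in self._cache:
            self._cache[key] = predicate(self._blobs[h])
        return self._cache[key]

def is_valid_utf8(b: bytes) -> bool:
    try:
        b.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

oracle = DataOracle()
h = oracle.put(b"hello, world")
```

`oracle.get(h, 0, 5)` then slices bytes out of the blob, and repeated `oracle.query(is_valid_utf8, h)` calls hit the cache.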
|
| I was working with Prof. Wirth's Oberon RISC CPU as a basis, but
| a couple of days ago a fantastic new 64-bit vm went by here on HN
| and I'm going to use that going forward.
| https://github.com/maximecb/uvm
| https://news.ycombinator.com/item?id=34936729
___________________________________________________________________
(page generated 2023-02-28 23:01 UTC)