[HN Gopher] Data-Oriented Design (2018)
       ___________________________________________________________________
        
       Data-Oriented Design (2018)
        
       Author : DeathArrow
       Score  : 263 points
       Date   : 2023-07-03 10:41 UTC (12 hours ago)
        
 (HTM) web link (www.dataorienteddesign.com)
 (TXT) w3m dump (www.dataorienteddesign.com)
        
       | Ciantic wrote:
       | Every time I hear about Data-Driven/Oriented Design I remember a
       | paper from an OOP course I had to read at university; it used
       | Data-Driven Design as an example of how not to do things.
       | 
       | The paper in question is this one from 1989:
       | https://dl.acm.org/doi/pdf/10.1145/74877.74885
       | 
       | It highlights that:
       | 
       | "Even though the goal of data-driven design is to encapsulate
       | data and algorithms, it inherently violates that encapsulation by
       | making the structure of an object part of the definition of the
       | object. This in turn leads to the definition of operations that
       | reflect that structure (because they were designed with the
       | structure in mind). Attempts to change the structure of an object
       | transparently are destined to fail because other classes rely on
       | that structure. This is the antithesis of encapsulation."
       | 
       | It then goes on to show that Responsibility-Driven Design has a
       | better approach.
       | 
       | What we mean by Data-Driven Design has come a long way though,
       | and isn't comparable to what it meant in those days.
       | 
       | I find it a bit amusing that Data-Driven Design used to be an
       | insult of sorts, like you didn't know how to do things OOP the
       | right way.
        
         | dosshell wrote:
         | You do realize that DoD has nothing to do with data-driven,
         | right?
         | 
         | Data-oriented design is about structuring the data with regard
         | to how it is processed and the hardware it runs on. This is the
         | opposite of object-oriented design, which models the data
         | around your mental model.
         | 
         | For example, in DoD you could design a map with the keys
         | together in one array and the values in another. This would
         | make it much faster to iterate over the keys while searching,
         | because more keys fit in each cache line.
         | 
         | In object-oriented design you would instead store an array of
         | pairs.
         | 
         | Both approaches can be data-driven.
         | 
         | Edit: would -> could
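The two map layouts described above can be sketched roughly as follows. This is only an illustration: the class names `PairMap` and `SplitMap` are made up here, the key and value types (64-bit integers and doubles) are assumptions, and duplicate keys are not handled. In an interpreted language like Python the cache effect itself will not be measurable; the point is just the shape of the data.

```python
from array import array


class PairMap:
    """Array-of-pairs layout: every key is stored next to its value."""

    def __init__(self):
        self.pairs = []  # list of (key, value) tuples

    def put(self, key, value):
        self.pairs.append((key, value))

    def get(self, key):
        # The search walks over keys and values alike.
        for k, v in self.pairs:
            if k == key:
                return v
        return None


class SplitMap:
    """Split layout: keys in one packed array, values in another."""

    def __init__(self):
        self.keys = array("q")    # packed signed 64-bit integers
        self.values = array("d")  # packed doubles, same index as the key

    def put(self, key, value):
        self.keys.append(key)
        self.values.append(value)

    def get(self, key):
        # The search only touches the key array; the value array is
        # read once, after the matching index is known.
        for i, k in enumerate(self.keys):
            if k == key:
                return self.values[i]
        return None
```

Both classes expose the same `put`/`get` interface, so which layout to use can be decided by the access pattern rather than by the calling code.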
        
           | morelisp wrote:
           | There are also multiple definitions of "data-driven" floating
           | around depending on qualifier/context. An OLAP RDBMS, for
           | example, will certainly be written in a data-oriented way,
           | and also have almost fully data-driven behavior, but will
           | certainly not have data-driven (in the sense of that paper)
           | design.
        
           | drainyard wrote:
           | While I agree with your point, the map example is not
           | entirely well represented in your comment. The idea is not to
           | just store keys and values in separate arrays, the idea is to
           | look at your use case and model your data after the
           | transformation you need. So if you have a case where storing
           | keys and values as pairs suits your access pattern, then do
           | that; if you have a case where you do a lot of searches
           | through keys, then store them in separate arrays.
           | 
           | The point of DoD is to look at the data you have and the data
           | you need it transformed into, and then structure your data
           | accordingly.
        
         | dgb23 wrote:
         | The example given as "data driven design" in this paper neither
         | resembles modern data driven programming nor data oriented
         | design.
         | 
         | I don't quite understand why the authors thought of this as a
         | good example of "data driven design".
         | 
         | I know that functions in a data driven program tend to be very
         | generic, so the conclusion you cite is very much at odds with
         | my experience.
        
       | gjadi wrote:
       | Found an online review of the book:
       | https://gist.github.com/seece/25ed1b2108cf5782718b026382f2c5...
        
         | victor106 wrote:
         | Thanks for the link.
         | 
         | Found this interesting and against common advice.
         | 
         | "The bane of many projects, and the cause of their lateness,
         | has been the insistence on not doing optimisation prematurely.
         | The reason optimisation at late stages is so difficult is that
         | many pieces of software are built up with instances of objects
         | everywhere, even when not needed."
         | 
         | There are definitely applications where performance is of
         | primary concern (maybe only a few) and others where it is
         | not. In apps where it is, this makes me think that maybe
         | premature optimization is okay? Am I reading that right?
         | 
         | There's also something called Data-Oriented Programming:
         | https://www.manning.com/books/data-oriented-programming
         | 
         | Are these two concepts the same?
        
           | crabmusket wrote:
           | No, DOP is basically just functional programming. There are
           | some overlaps (e.g. separate code from data) but they're not
           | related.
        
       | synergy20 wrote:
       | How useful and practical is DoD in the ML and AI field for
       | parallel (matrix) computing at scale? Performance is crucial
       | there as well, though not as crucial as in gaming (where
       | milliseconds matter). DoD is a new paradigm for me.
        
       | bob1029 wrote:
       | > Is your data layout defined by a single interpretation from a
       | single point of view?
       | 
       | I think this might be the most important question at technology
       | selection and architecture time. Answering it usually requires
       | talking to the business and customers.
       | 
       | If you are certain there is exactly 1 valid "view" of the data
       | that will be used throughout, then perhaps enshrining it in code
       | makes sense. If you are even a tiny bit uncertain of this, a
       | relational-style model probably works better. SQL is the end game
       | for most businesses once they realize the game theory around this
       | one...
       | 
       | I am curious what HN thinks as major reasons for why everyone
       | seems to have moved away from 1 big SQL database. From my
       | perspective, yeah we have "web scale" edge cases that threaten
       | vertical scalability on writes, but most businesses will never
       | touch this, including members of the F100.
        
         | DeathArrow wrote:
         | >I am curious what HN thinks as major reasons for why everyone
         | seems to have moved away from 1 big SQL database
         | 
         | For the places I worked:
         | 
         | 1. We transitioned to microservices
         | 
         | 2. Performance: 1 BIG database slows things down
         | 
         | 3. Ops/maintenance is very hard in a huge DB
         | 
         | 4. In a huge DB there can be a lot of junk no one uses, no one
         | remembers why it is there, but no one is certain whether that
         | junk is still needed
         | 
         | 5. We had different optimization strategies for reads and
         | writes
         | 
         | 6. Teams need to have ownership of databases/data stores so we
         | can move fast instead of waiting for DBAs to reply to tickets.
        
           | lisasays wrote:
           | _4. In a huge DB there can be a lot of junk no one uses, no
           | one remembers why it is there, but no one is certain whether
           | that junk is still needed_
           | 
           | Of course no one knows how to even begin to come up with a
           | way of addressing that problem.
           | 
           | So the only viable option is to keep on masking it. And keep
           | propagating the junk data and zombie schemas ever forward.
        
         | hobs wrote:
         | For the same reason they went with microservices - it's easier
         | to service the technical boundaries you control, and it is a
         | political solution rather than a technical one.
         | 
         | Getting something past a DBA in change control was hard, but
         | shipping some IaC templates can be done in a sprint!
        
         | jwestbury wrote:
         | At a previous F100 company -- a tech company whose products are
         | widely used, we'll say -- we received guidance that RDBMS was
         | verboten except with explicit approval. This had nothing to do
         | with the best ways to model a given dataset, or achieving the
         | best performance, and everything to do with schema flexibility
         | and a history of outages caused by fucking up schema
         | migrations. These problems weren't occurring in our NoSQL
         | designs, and whatever benefits SQL databases offered didn't
         | counter the huge benefits we gained from NoSQL's lack of rigid
         | schema.
         | 
         | Of course, bad uses of key-value stores can have _massive_
         | performance impacts, and huge monetary costs when leveraging
         | cloud platforms like DynamoDB -- I've seen a lot of cases
         | where people didn't properly structure their data for DDB, and
         | ended up performing loads of scans and sending costs through
         | the roof.
        
           | layer8 wrote:
           | Sounds like a company whose end users are mostly not their
           | customers.
        
           | littlestymaar wrote:
           | > and everything to do with schema flexibility and a history
           | of outages caused by fucking up schema migrations. These
           | problems weren't occurring in our NoSQL designs, and whatever
           | benefits SQL databases offered didn't counter the huge
           | benefits we gained from NoSQL's lack of rigid schema
           | 
           | Yikes
        
           | geophile wrote:
           | And then there are silent query failures. Want to change
           | "name": "John Smith" to "name": {"first": "John", "last":
           | "Smith"}? Easy! No schema migration!
           | 
           | But you have to modify all your queries to support both old
           | and new formats, or stop the world and change all the data
           | (after modifying all your code, including dynamically
           | generated queries).
           | 
           | And if you don't, your queries fail silently.
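A small sketch of the silent failure described above, using plain dictionaries in place of a real document store. The sample documents and the function names (`find_naive`, `display_name`, `find_tolerant`) are hypothetical; no particular NoSQL client API is implied.

```python
# Two documents for the same kind of record: one written before the
# change to the "name" field and one written after it.
docs = [
    {"id": 1, "name": "John Smith"},
    {"id": 2, "name": {"first": "John", "last": "Smith"}},
]


def find_naive(docs, full_name):
    # Written against the old format only. New-format documents simply
    # never match; nothing raises, so the miss goes unnoticed.
    return [d["id"] for d in docs if d.get("name") == full_name]


def display_name(doc):
    """Return the full name for either document shape."""
    name = doc.get("name")
    if isinstance(name, dict):   # new format
        return f"{name['first']} {name['last']}"
    if isinstance(name, str):    # old format
        return name
    # Fail loudly on anything unexpected instead of skipping it.
    raise ValueError(f"unrecognised name field: {name!r}")


def find_tolerant(docs, full_name):
    # Every query has to go through the shape-aware accessor.
    return [d["id"] for d in docs if display_name(d) == full_name]
```

With the sample data, `find_naive(docs, "John Smith")` returns only the old-format record, while `find_tolerant` returns both. The cost is that every query path has to know about both shapes until the old data is migrated.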
        
           | isoprophlex wrote:
           | I read that as "We don't want things to crash immediately
           | when the data model changes. We want things to keep chugging
           | along until the last possible moment, when we will realize
           | we've been silently corrupting everything"
        
           | bob1029 wrote:
           | > schema flexibility
           | 
           | If the business is of the notion that the schema is
           | "flexible", then it is probably time to bring all of the MBAs
           | into a conference room and have a come-to-jesus conversation
           | about the limitations of information theory and human
           | suffering.
           | 
           | At a certain point, when someone says "Widget", everyone in
           | the organization needs to be on the same page. This goes well
           | beyond any specific technology.
        
           | barrkel wrote:
           | Hybrid solutions are possible; e.g. JSONB in Postgres, where
           | you can still index and join with decent performance.
        
         | viraptor wrote:
         | > reasons for why everyone seems to have moved away from 1 big
         | SQL database
         | 
         | I'm sure there are going to be other answers for the code side
         | of things, but for ops:
         | 
         | Depends a lot on the size of the service, but in some cases: We
         | got enough data that 1 big SQL store makes ops _hard_. (Took me
         | 3 days to drop a table recently in a way that wouldn't affect
         | the users) And splitting data became easier than before with
         | specialised backends. (A sharded 2nd layer cache of live data
         | seems way simpler to achieve than say 2 decades ago)
        
         | 0wis wrote:
         | I'll second @viraptor's and @hobs's answers: it is the same
         | cause as the rise of microservices and DevOps adoption, namely
         | easier politics. I worked for a big old company, and most of
         | the problems were political and administrative. One big SQL
         | database is quite efficient until the entity that owns it does
         | not agree with the new CTO's strategy or with another critical
         | part of the business. Add an incident that shows the low
         | resilience of the model and it quickly becomes a political
         | headache, while the technical solution still seems evident to
         | everyone.
        
         | robertlagrant wrote:
         | For me it's parallelisable delivery (or fragility, depending on
         | how you look at it). If a team owns its own data store, it can
         | make whatever changes it needs to and not have to worry about
         | any other part of the software being broken by those changes.
        
           | hcarvalhoalves wrote:
           | The trade-off being having a separate team trying to
           | integrate all the data back together with ETL.
        
             | robertlagrant wrote:
             | Well, if you need to integrate it. If you have some hideous
             | dashboards you might need to, it's true, but at that point
             | it's worth investing in a data person whose job it is to
             | keep up with all the breaking data warehouse integrations.
             | They'd have to anyway with any data approach.
        
         | [deleted]
        
         | ChicagoDave wrote:
         | Because over time the one big relational database turns into a
         | big ball of mud. Change becomes expensive and has a large blast
         | radius.
         | 
         | Contextual business domains should be the foundation of any
         | complex architecture. You reduce complexity and change blast
         | radius and speed up agility and feature adoption.
        
         | lovasoa wrote:
         | > why everyone seems to have moved away from 1 big SQL database
         | 
         | Hacker News is probably not representative of the whole tech
         | ecosystem. I think a majority of applications still use one
         | big SQL database.
         | 
         | I recently released an open-source framework [1] that is
         | entirely based on Data-Oriented Design. I have received a lot
         | of comments from people for whom it was the right design.
         | Having all your data in the same place makes so many things
         | easier!
         | 
         | [1] https://sql.ophir.dev
        
           | thebruce87m wrote:
           | Web technologies are also not representative of the whole
           | tech ecosystem. I can't fit an SQL database on my
           | microcontroller.
        
             | bob1029 wrote:
             | > I can't fit an SQL database on my microcontroller.
             | 
             | Perhaps not SQL Server or Oracle, but unless we are talking
             | about a very limited device there are likely options.
        
         | delusional wrote:
         | >I am curious what HN thinks as major reasons for why everyone
         | seems to have moved away from 1 big SQL database.
         | 
         | We haven't moved away from it, but we have run into a certain
         | class of problems that seem related to the 1 big SQL database
         | architecture. We're a really old enterprise, with a lot of non-
         | technical people creating a lot of technical solutions back in
         | the day that have become calcified and therefore have to keep
         | existing. One of the things we have is 5 levels of SQL data
         | transformations from the operational database (the one that
         | actually has applications) into different generations of
         | datamodel as the "type" of business we did changed.
         | 
         | The problem is that as we accumulate ever more layers, we keep
         | building on the layers before. The application that was built
         | 10 years ago on abstraction layer 2 now needs some data from
         | layer 4, let's create a new script that loops that data back
         | into the previous layer and keep going. Eventually we've ended
         | up with a huge amount of interdependent tables that all load
         | data from other tables/views in weird and unintuitive ways, and
         | the project to sort out the mess was deemed too expensive and
         | postponed until the 2030's.
         | 
         | I think it's understandable that people see those problems and
         | consider how we could have avoided them. Unfortunately, for
         | reasons I don't fully grasp, it seems impossible to get
         | software engineers to apply anything that requires discipline,
         | so we have to somehow make it impossible to create the
         | spaghetti. That's where separation comes in. If you can't read
         | the data from some other service, then it's impossible to
         | create a spaghetti mess that kills velocity for both parties.
         | 
         | The vertical separation of applications becomes a software
         | solution to the people problem of poor engineering discipline
         | in enterprises.
        
       | debanjan16 wrote:
       | Even beginners can learn to program in a data oriented way from
       | the beginning.
       | 
       | Two books that teach this style of programming to beginners are:
       | 
       | 1. _How to Design Programs_ - https://htdp.org/
       | 
       | 2. _A Data-Centric Introduction to Computing_ -
       | https://dcic-world.org/
        
         | morelisp wrote:
         | Those are not this.
        
           | debanjan16 wrote:
           | Can you elaborate a bit more on the reason?
        
             | morelisp wrote:
             | The books you presented are, roughly speaking,
             | introductions to programming with a focus on data science,
             | functional programming, and common structures/ideas used in
             | those which are, in other texts, not usually considered
             | introductory material. "Data" in this sense means, like,
             | collected facts about the world and how to model them.
             | 
             | Data-oriented design is a particular way of designing your
             | programs where you focus on efficiently laying out your
             | "data" - in a different sense, meaning "whatever it is I've
             | got in storage" - within that storage - to compute with it
             | as fast as possible.
             | 
             | The industry-standard tools used for the first thing often
             | use techniques developed in the service of the second
             | thing, but that's not relevant for the pedagogical
             | framing. The tools they are teaching (Scheme and Pyret)
             | actually make it very hard to play with low-level data
             | layout details. And the emphasis in these texts on "real
             | [as in, world] data" is in direct contradiction to the DOD
             | axiom that "data is not the problem domain... The data-
             | oriented design approach doesn't build the real-world
             | problem into the code."
             | 
             | A rule of thumb: Is anyone talking about GPUs, SIMD, or CPU
             | cache sizes? If not, you're looking at something about data
             | modeling or data science, not data orientation.
             | 
             | And this, sorry, is all super fucking obvious if you
             | actually read the intro to all three things.
        
       | kaycebasques wrote:
       | Pretty great intro paragraph. The eloquent writing and
       | interesting ideas motivate me to keep reading:
       | 
       | > Data is all we have. Data is what we need to transform in order
       | to create a user experience. Data is what we load when we open a
       | document. Data is the graphics on the screen, the pulses from the
       | buttons on your gamepad, the cause of your speakers producing
       | waves in the air, the method by which you level up and how the
       | bad guy knew where you were so as to shoot at you. Data is how
       | long the dynamite took to explode and how many rings you dropped
       | when you fell on the spikes. It is the current position and
       | velocity of every particle in the beautiful scene that ended the
       | game which was loaded off the disc and into your life via
       | transformations by machinery driven by decoded instructions
       | themselves ordered by assemblers instructed by compilers fed with
       | source-code.
        
       | cempaka wrote:
       | Andrew Kelley gave an informative and entertaining talk on how
       | DOD had inspired a lot of his work on the Zig compiler:
       | https://vimeo.com/649009599
        
       | camjw wrote:
       | Best SWE book I've ever read and I don't even work on something
       | video game adjacent any more
        
       | htk wrote:
       | Did anyone here by any chance create an epub out of this?
       | 
       | If not, any recommendations on utilities to convert several
       | linked html files into a single epub?
        
         | asimpletune wrote:
         | I think Standard Ebooks has a command line utility that they
         | use for producing their ebooks, which you might be able to use
         | to produce an ebook from a bunch of HTML files. Ultimately an
         | epub is a zip of HTML anyway, I think.
        
           | htk wrote:
           | I'll give this a try, thank you.
        
       | deneas wrote:
       | I like Data-Oriented Design, but beware of one thing: You
       | organise your data like a database? You'll eventually be writing
       | a database management system, unless you can use a framework like
       | one of the many Entity-Component-System ones.
        
       | BonitaPersona wrote:
       | I love this book and have been very influenced by it.
       | 
       | However, it should definitely be called: Data-Oriented Design FOR
       | GAME DEVELOPMENT.
        
       | marcosdumay wrote:
       | The entire advice is context-dependent.
       | 
       | Games just happen to have a lot of operations that need
       | column-based access; but that's not true for all domains. When
       | you go and blindly push game best practices into other domains,
       | you are just making everybody's life hard and most systems worse.
        
         | _dain_ wrote:
         | _> Games just happen to have a lot of operations that need
         | column-based access; but that's not true for all domains._
         | 
         | This was _not at all_ obvious when ECS first came around. It
         | took a lot of time to convince people away from the OOP way.
        
         | [deleted]
        
         | 10000truths wrote:
         | It's not just column-based access. Formatting your data into a
         | struct of arrays exposes opportunities to pack your data more
         | efficiently and greatly reduce your application's memory usage.
         | Boolean struct fields can become bitsets. Nullable struct
         | fields can become sparse (or dense) maps. Pointer/reference
         | struct fields can become arrays of smaller-width integers that
         | index into a pool. And so on. When everything runs on CPUs that
         | frequently stall on memory accesses, the impact of these sorts
         | of changes cannot be overstated - the latency difference
         | between L3 cache and RAM can be on the order of ~10x.
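The packing tricks listed above can be sketched as follows. Everything here is illustrative: the `Particles` class and its field names are invented for the example, a Python integer stands in for a real bitset, and the 16-bit `material` index assumes the pool never holds more than 65,536 entries.

```python
from array import array


class Particles:
    """Struct-of-arrays store: one packed array per field."""

    def __init__(self):
        self.x = array("f")         # 32-bit floats, one per particle
        self.y = array("f")
        self.material = array("H")  # small integer index into the pool,
                                    # instead of a reference per particle
        self.materials = []         # the shared pool itself
        self.alive_bits = 0         # boolean field packed as a bitset:
                                    # bit i is set if particle i is alive

    def add(self, x, y, material_index, alive=True):
        i = len(self.x)
        self.x.append(x)
        self.y.append(y)
        self.material.append(material_index)
        if alive:
            self.alive_bits |= 1 << i
        return i

    def is_alive(self, i):
        return bool((self.alive_bits >> i) & 1)

    def kill(self, i):
        self.alive_bits &= ~(1 << i)  # clear bit i

    def count_alive(self):
        return bin(self.alive_bits).count("1")
```

A possible use: `p = Particles()`, fill `p.materials`, call `p.add(...)` for each particle, then look up a particle's material with `p.materials[p.material[i]]`. The per-particle cost of the boolean drops to one bit, and the material reference to two bytes on common platforms.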
        
         | jackmott42 wrote:
         | The advice of keeping the data you access frequently contiguous
         | in memory applies to everything on modern hardware. If there is
         | a program where performance is an issue at all, probably this
         | will be one way to make sure performance is good.
        
           | marcosdumay wrote:
           | Well, if you want to generalize, it's about keeping data with
           | correlated accesses close to each other and aligned inside
           | memory pages, and failing that, yes to keep it at least
           | contiguous. It's not exactly about access frequency, except
           | that you want to optimize the things you access more.
           | 
           | Yes, that's generic advice for high-performance applications,
           | at least generic enough to apply to anything that is close to
           | a normal computer. You will still need further details if you
           | are talking about things like HPC (ironically) or mainframes,
           | but it's general enough to say people should do it without
           | qualifications.
        
         | feoren wrote:
         | But it _is_ true, much much more often than you probably
         | realize. Just look at your tables and think about how
         | repetitive they are. The reason you can't come up with a lot
         | of "column-based" (as you put it, which is still narrow-minded
         | IMO) operations is because you've never looked for them before.
         | Of course you haven't: you've been stuck in the traditional
         | mode where such things are basically impossible.
         | 
         | Do most of your tables have Name / Description type fields?
         | Here's some "column-based operations": Allow translation of
         | everything in your database. Generate natural-sounding text in
         | a report, inserting these names and descriptions (from multiple
         | different tables, of course). Free-text search of all your
         | important database concepts. Detect similar names to the one
         | the user is wanting to add, to prevent duplication. Clean
         | whitespace. That's five off the top of my head.
         | 
         | Do most of your tables have Archived / Status / Soft-delete
         | type fields? Allow a user to archive a record. Choose whether
         | to include archived records in a query or not. Delete archived
         | records after X days.
         | 
         | Do most of your tables have Comments fields? Allow multiple
         | comments. Track who made a comment and when. Track responses to
         | comments.
         | 
         | Do most of your tables track who last modified the record?
         | Track _all_ modifications. Show a list of recent modifications
         | to any records.
         | 
         | The list goes on and on and on. You call these "column-based
         | operations", which again, is short-sighted. They're more like
         | "concern-based operations". And it turns out _everything_ is a
         | cross-cutting concern. You're shooting this idea down without
         | nearly understanding it.
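One possible reading of a "concern-based operation", using the Archived / soft-delete example above: the concern lives in a single store shared by every record type, so the operations are written once. This is only a sketch of that interpretation; the `archived` store, its `(table, record_id)` keys and the function names are all hypothetical.

```python
from datetime import date, timedelta

# One store for the "archived" concern, shared by all record types.
# Keys are (table_name, record_id); values are the archive date.
archived = {}


def archive(table, record_id, today):
    archived[(table, record_id)] = today


def is_archived(table, record_id):
    # Lets any query decide whether to include archived records.
    return (table, record_id) in archived


def purge_older_than(days, today):
    """Drop entries archived more than `days` days ago and return their keys."""
    cutoff = today - timedelta(days=days)
    expired = [key for key, when in archived.items() if when < cutoff]
    for key in expired:
        del archived[key]
    return expired
```

Here `archive("customers", 1, some_date)` and `archive("orders", 7, some_date)` go through the same code path, and a single `purge_older_than` call covers every table that uses the concern.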
        
           | feoren wrote:
           | For those who disagree, please give an example of a domain
           | that is devoid of important "concern-based" operations.
        
           | feoren wrote:
           | [flagged]
        
             | AdieuToLogic wrote:
             | > Imagine being in such a state of mind that you read the
             | above comment and downvote it!
             | 
             | Please don't comment about the voting on comments. It never
             | does any good, and it makes boring reading.
             | 
             | source: https://news.ycombinator.com/newsguidelines.html
        
               | feoren wrote:
               | I don't see that as disallowing "for those who disagree
               | with me: what do you disagree with, exactly?" Isn't that
               | a reasonable question?
               | 
               | Besides, isn't downvoting someone for nonsensical, petty
               | reasons (and leaving no response as to why, of course) at
               | least as harmful to the overall discussion as referring
               | to those downvotes?
        
               | loup-vaillant wrote:
               | > _"for those who disagree with me: what do you disagree
               | with, exactly?" Isn't that a reasonable question?_
               | 
               | It's a hopeless one. People who downvote generally think
               | you're not worth actually answering. Also note that
               | heated language often gets automatically downvoted. And
               | with few exceptions ("eating babies is bad"), one sided
               | opinions tend to be less popular than anything that
               | appears "balanced".
               | 
               | It's especially tough if your one sided opinion is
               | attacking a popular practice.
        
         | flohofwoe wrote:
         | > Games just happen to have a lot of operations that need
         | column-based access.
         | 
         | And that's not even true for many code areas in typical games,
         | only where there are at least a few thousand 'things' to
         | process (e.g. particle systems or navigation/collision systems).
         | 
         | DOD makes a lot of sense within specific subsystems, but not
         | necessarily in high level gameplay code (outside specific
         | genres at least).
        
         | paulddraper wrote:
         | Databases are another common cause.
        
         | forrestthewoods wrote:
         | Hrmmm. Not sure this line of thinking makes sense.
         | 
         | Game data access patterns are quite brutal. OOP for games
         | results in extremely inefficient cache use, lots of random
         | access, and lots of pointer chasing.
         | 
         | ECS isn't a "natural" fit for games. It's quite difficult and
         | ECS systems are still far from a solved problem.
         | 
         | The two most popular game engines, Unreal and Unity, are
         | decisively non-ECS for almost everything they do.
         | 
         | In any case, I think the underlying principles of DOD apply to
         | all programs. Specific solutions vary, as always.
        
           | marcosdumay wrote:
           | > In any case, I think the underlying principles of DOD apply
           | to all programs.
           | 
           | Yeah, I do agree with that.
        
         | GartzenDeHaes wrote:
         | It's for high-performance computing with current CPU designs
         | that are dependent on data locality for performance.
         | 
         | I agree that it's a harmful design for business data.
         | Programmers want to push their runtime data model into the
         | database and they have no interest in the operational,
         | maintenance, and performance problems this causes. When someone
         | suggests this kind of thing, I'll ask them "how do we diagnose
         | performance problems with this technology when there are
         | 100,000 concurrent users and millions of data elements?" The
         | rows-and-columns people can answer this question.
        
           | feoren wrote:
           | > When someone suggests this kind of thing, I'll ask them
           | "how do we diagnose performance problems with this technology
           | when there are 100,000 concurrent users and millions of data
           | elements?"
           | 
           | I don't understand; the exact same performance diagnostics
           | work in both cases. Why is this different? There's nothing
           | intrinsically less performant about this approach. You really
           | think your checkerboard tables and long lists of columns with
           | names like "VALUE12" and "VALUE13" and multiple different
           | kinds of key/value pairs you jammed in there for different
           | clients -- you think _those_ are better performance!?
           | 
           | > 100,000 concurrent users
           | 
           | Do you _actually_ have 100,000 concurrent users? _Really_?
           | You don't, do you? You just kinda hope you will eventually.
           | And again: this approach is not worse for that.
           | 
           | > millions of data elements
           | 
           | This is absolute peanuts for any modern database system. It's
           | weird that this is your extreme example.
        
         | torginus wrote:
         | I feel like this is increasingly the only way to write high-
         | performance code.
         | 
         | With newer hardware, the only thing that's expected to scale is
         | logic density - SRAM (and cache sizes) have stopped scaling
         | with the latest lithographies - and RAM bandwidth hasn't really
         | been scaling for quite a while (I'd think it's even possible
         | that per-core bandwidth has been _decreasing_) - memory access
         | has been the bottleneck for a while.
        
       | eska wrote:
       | Looked at it for 10 seconds and immediately found random 0s at
       | the end of the document or unreplaced text like "Noel Llopis in
       | his September 2009 article[#!NoelDOD!#]"
        
         | jerrygenser wrote:
         | The intro describes that it's a free version of a book that
         | contains most of the content except for a few sections.
         | 
         | The book was converted by an automated process from its
         | native design into HTML so that a large portion could be made
         | available for free, and this automated conversion might have
         | some minor issues -- like the one you described.
        
       | dkarl wrote:
       | > [Abstraction heavy paradigms] structure the code around the
       | description of the problem domain
       | 
       | My experience of Domain-Driven Design has been that it is
       | extremely effective for driving conversations about the domain
       | throughout the product life-cycle, but it produces frustrating
       | codebases that are poorly attuned to the world outside the
       | running process. Domain-Driven Design codebases want to be self-
       | contained universes and treat external systems as details. OO
       | design paradigms in general seem to have little respect for
       | messages exchanged with external systems.
       | 
       | This wasn't true back when JavaBeans, object databases, and other
       | distributed object systems were expected to take over the world,
       | but the failure of those technologies shrank the OO world from
       | distributed systems to isolated programs. These days, the
       | messages exchanged with external systems are just data, without
       | behavior. The marriage of data and behavior can only exist inside
       | a single process. So object-oriented design turned inward and
       | concentrated on the creation of rich inner worlds.
       | 
       | I think this is backwards, or at least incomplete. As Rich Hickey
       | says, effective programs need to be _situated in the real world._
       | They are not ethereal abstract models. They have concrete
       | functions, inputs and outputs. Having rich internal abstractions
       | that mimic some aspects of reality is a means to an end, entirely
       | subordinate to the purpose of executing interactions with other
       | computing systems. Data-driven design embraces this reality. By
       | treating data as essential, and behavior as something to be added
       | when it is needed, it allows inputs and outputs to be first-class
       | citizens.
       | 
       | > The data-oriented design approach doesn't build the real-world
       | problem into the code. This could be seen as a failing of the
       | data-oriented approach by veteran object-oriented developers, as
       | examples of the success of object-oriented design come from being
       | able to bring the human concepts to the machine
       | 
       | I think this perfectly sums up the confusion that OO modeling
       | creates. You use code to write programs, or services, or cloud
       | functions, things like that. The real-world problem that a
       | program, service, or cloud function solves is interacting in a
       | certain way with other programs, services, cloud functions.
       | 
       | The real-world problems that OO modeling paradigms want you to
       | focus on are the domain problems. This is vitally important for
       | designing products, systems of programs that solve real human
       | problems. If you are designing a system for managing medical
       | records in a hospital, you need to model doctors, nurses,
       | patients, labs, radiology images, patient stays, all those real-
       | world things. However, when you are designing a piece of software
       | to do one thing within that medical records system, the "real-
       | world problem" your code is solving is limited to its role within
       | the larger system.
       | 
       | Data-oriented design is a natural mental fit for writing programs
       | or services that play a limited part in larger systems, which is
       | always what you are doing when you are writing code. Object-
       | oriented design wants to take on the complexity of the whole real
       | world, which is the right perspective for product design and
       | architecture, not for writing code.
        
         | rapnie wrote:
         | > My experience of Domain-Driven Design has been that it is
         | extremely effective for driving conversations about the domain
         | throughout the product life-cycle, but it produces frustrating
         | codebases that are poorly attuned to the world outside the
         | running process.
         | 
         | The DDD blue book by Eric Evans consists of two parts:
         | Strategic Design and Tactical Patterns. Is it accurate to
         | summarize your comment as that the strategic design works very
         | well, but that most of the information out there leads one to
         | then learn about the tactical patterns in a particular OOP
         | context, which isn't a universal approach, and in many cases
         | shouldn't be the way to elaborate findings from Strategic
         | Design?
         | 
         | Note that searching the web for "DDD" yields mostly OOP-related
         | tactical patterns guidance, and this is why I think many people
         | are so sceptical about DDD. It is the strategic parts where
         | most of the (low-hanging fruit) value is.
         | 
         | Other non-OOP 'tactical' guidance, such as functional /
         | functional-reactive / actor-driven, etc. DDD is harder to come
         | by.
        
           | dkarl wrote:
           | In my copy of the blue book, "Strategic Design" is Part IV,
           | and there is no "Tactical Patterns" part or chapter. To be
           | honest, I've only quickly read through Part IV, cherry-
           | picking a couple of concepts, because it concentrates on
           | techniques for dealing with very large domain models and/or
           | very large organizations.
           | 
           | I think the good and bad are interleaved together throughout
           | the book. Part I Chapter 2, "Communication and the Use of
           | Language," is the one chapter I wish everybody I work with
           | would read and digest. I think establishing a consistent
           | language about the domain that is shared across functional
           | groups is critical, so the domain terminology used in source
           | code and comments is consistent with the terminology
           | engineers use when talking with product and customer support.
           | Part I Chapter 3 contains the clearest statement of (what I
           | think is) the core error: "Tightly relating the code to an
           | underlying model [which context makes clear is the domain
           | model] gives the code meaning and makes the model relevant."
           | The rest of the book is like that for me, alternating between
           | vigorously nodding my head and yelling "WHY WOULD YOU TELL
           | PEOPLE THAT," sometimes within the same chapter.
           | 
           | I suspect overall there's a communication issue with the
           | book, where he focuses entirely on the themes of DDD and only
           | occasionally gives lip service to other aspects of design.
           | For example, when he says that module names and structures
           | should reflect domain concepts, I think, "Yeah, that's really
           | nice to the extent you can accomplish that, but you also want
           | your module names and structures to reflect the logical
           | structure of your program, so somebody can look at it and see
           | how it works." You have these two aspects that should ideally
           | both be legible in the code, but the DDD book doesn't show
           | much respect for competing design priorities. In fact, it
           | often warns against the dangers of being led astray by other
           | design perspectives, such as architectural ones. Like so many
           | other OO methodologies, the overwhelming thrust of the book
           | is that if you attend to what it's teaching you, everything
           | else will take care of itself.
           | 
           | A priest once told me, a good priest will tell you when you
           | need a lawyer, a good lawyer will tell you when you need a
           | therapist, a good therapist will tell you when you need a
           | doctor, and a good doctor will tell you when you need a
           | priest. Be careful with an expert who always tells you that
           | their expert perspective is what you need. The DDD book is
           | definitely that kind of expert you have to be careful with.
        
       | Acumen321 wrote:
       | Mike Acton's talk "Data-Oriented Design and C++" from CppCon 2014
       | is the best programming talk ever given in my opinion. A must
       | watch:
       | 
       | https://youtu.be/rX0ItVEVjHc
        
         | wrapperup wrote:
         | It's fantastic, and also my favorite. And for those who might
         | not know, he was the one who really mainstreamed Data-oriented
         | design and ECS architecture in my eyes.
         | 
         | He previously was also leading the charge on Unity DOTS, though
         | unfortunately it seems Unity is having a tailspin at the
         | moment. The work on DOTS is solid, if incomplete.
        
       | dosshell wrote:
       | One key concept when I use DoD is to not abstract away the data.
       | Less is more.
       | 
       | But when quickly reading the intro text, I found it doing the
       | opposite: it talks too much and abstracts away the key concepts.
       | Am I the only one who finds it a bit ironic that it doesn't drink
       | its own wine?
        
       | inopinatus wrote:
       | Some of the best advice I ever got for writing composable, high-
       | performance code was "work on structs of arrays, not arrays of
       | structs". I hear many echoes of that advice in this text. Turns
       | out that entity-component architectures work well in line-of-
       | business applications too, not just games.
       | 
       | Alas, many developers in enterprise are rusted onto a record-
       | keeping CRUD model and struggle to think in columns rather than
       | rows. The idea of inserting an entity id into a "published"
       | table, instead of setting a boolean "published" field to true,
       | doesn't always come naturally. Yet once you realise how readily
       | polymorphic this is, you may start wanting to use such approaches
       | to data for everything. Rich new opportunities then arise from
       | cross-pollinating component data. Some may question why it is
       | structurally permissible that, say, a network interface can have
       | a birthday, or why an invoice has an IPv6 address, why my cat is
       | in the DHCP pool, whilst limegreen is deleted and $5 on Tuesdays.
       | This of course is half the fun.
       | 
       | I don't accept that it's wholly incompatible with OO, though, a
       | thesis you'll see dotted around the place. I've even taken this
       | approach with Ruby using Active Record for persistence; not
       | normally a domain where the words "high performance" are bandied
       | about. That worked particularly because Ruby's object system,
       | being more Smalltalk-ish than C++/Java-ish, strongly favours
       | composition over inheritance.
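       | 
       | A minimal sketch of the two layouts, not taken from the book: the
       | `Particles` class and its fields are hypothetical, and Python's
       | stdlib `array` stands in for contiguous typed storage.

```python
from array import array

N = 6  # hypothetical number of records

# Array of structs (AoS): one record object per particle.
aos = [{"x": float(i), "y": 2.0 * i, "alive": i % 2 == 0} for i in range(N)]


class Particles:
    """Struct of arrays (SoA): one contiguous typed array per field."""

    def __init__(self, n):
        self.x = array("d", (float(i) for i in range(n)))
        self.y = array("d", (2.0 * i for i in range(n)))
        self.alive = array("b", (1 if i % 2 == 0 else 0 for i in range(n)))


soa = Particles(N)

# A pass that only needs `x` reads one dense array in the SoA layout...
sum_x_soa = sum(soa.x)
# ...but has to visit every whole record in the AoS layout.
sum_x_aos = sum(p["x"] for p in aos)
```

       | Both passes compute the same sum; the difference is only in which
       | memory each one has to walk, which is where any gain would come
       | from in a language with unboxed arrays.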
        
         | incrudible wrote:
         | > you may start wanting to use such approaches to data for
         | everything
         | 
         | Please, don't. Arrays of structs is more natural for a reason,
         | SOA is not universally faster and it has some of its own
         | performance problems. If you _really_ care about performance,
         | AOSOA may well be your best bet. Again, please don't use it for
         | everything.
         | 
         | > a dhcp pool full of cats is half the fun
         | 
         | Replacing the weird code that the last guy wrote for his own
         | amusement is not fun.
        
         | Capricorn2481 wrote:
         | > Alas, many developers in enterprise are rusted onto a record-
         | keeping CRUD model and struggle to think in columns rather than
         | rows. The idea of inserting an entity id into a "published"
         | table, instead of setting a boolean "published" field to true,
         | doesn't always come naturally. Yet once you realise how readily
         | polymorphic this is, you may start wanting to use such
         | approaches to data for everything
         | 
         | I don't really understand what's polymorphic about this, or
         | even beneficial. It seems like every time I've had a boolean
         | column in a long-standing application, it eventually needed to
         | turn into something else.
        
         | ryukoposting wrote:
         | Very insightful. I never thought about it like this, but it's
         | common practice in embedded systems to approach problems in
         | this way. One module may provide a statically-sized pool of
         | like objects, and other modules extend behavior in relation to
         | those objects by carrying buffers of pointers and/or indexes
         | into that pool. It might feel inefficient due to the storage of
         | so many pointers/indexes, but you're optimizing the size of the
         | largest thing: the object pool itself.
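         | 
         | A rough sketch of that arrangement, under stated assumptions:
         | `SensorPool`, `AlarmModule` and `POOL_SIZE` are hypothetical
         | names, and Python stands in for what would normally be C.

```python
from array import array

POOL_SIZE = 8  # hypothetical fixed capacity, chosen up front


class SensorPool:
    """Owns the statically sized storage for every sensor reading."""

    def __init__(self):
        self.value = array("d", [0.0] * POOL_SIZE)
        self.used = 0

    def alloc(self, value):
        """Claim the next free slot and return its index."""
        if self.used == POOL_SIZE:
            raise MemoryError("pool exhausted")
        idx = self.used
        self.value[idx] = value
        self.used += 1
        return idx


class AlarmModule:
    """Adds behaviour for some pool entries by storing only their indexes."""

    def __init__(self, pool, threshold):
        self.pool = pool
        self.threshold = threshold
        self.watched = array("H")  # small unsigned indexes into the pool

    def watch(self, idx):
        self.watched.append(idx)

    def tripped(self):
        """Indexes of watched entries whose value exceeds the threshold."""
        return [i for i in self.watched if self.pool.value[i] > self.threshold]


pool = SensorPool()
a = pool.alloc(20.5)
b = pool.alloc(99.0)
c = pool.alloc(42.0)  # stored in the pool but not watched

alarms = AlarmModule(pool, threshold=40.0)
alarms.watch(a)
alarms.watch(b)
tripped = alarms.tripped()
```

         | The pool never grows and the alarm module never copies a
         | reading; it only carries two-byte indexes next to the data it
         | does not own.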
        
         | hu3 wrote:
         | > The idea of inserting an entity id into a "published" table,
         | instead of setting a boolean "published" field to true, doesn't
         | always come naturally.
         | 
         | It doesn't come naturally because now you need a JOIN in your
         | SQL just to fetch what was before a column. Or two queries
         | instead of one.
         | 
         | Not to mention having closely related data spread in different
         | tables increases cognitive load.
         | 
         | You just added a layer of indirection for what gain precisely?
        
           | xupybd wrote:
           | I think they're arguing that you get a performance gain.
           | Personally, the systems I deal with wouldn't gain from such
           | an optimisation as there is too little data. So I'll stick
           | to the simplest approach.
        
         | gnuvince wrote:
         | > I don't accept that it's wholly incompatible with OO
         | 
         | It's not incompatible with the mechanics of OO, but it does
         | require that programmers change how they approach problems. For
         | instance, a common way to write code in an OO language is to
         | focus solely on the thing you want to think about (a user, a
         | blog post, a money transaction, what have you) and to implement
         | it in isolation of everything else, to hide all of its data,
         | and then to think about what methods need to be exposed to be
         | useful to other parts of the system. The idea of encapsulation
         | is quite strong.
         | 
         | In DOD, it is more common for data related to different domains
         | to be accessible and let the subsystems pick and choose what
         | they need to do their work. Nothing about Java or Ruby would
         | prevent this, but programmers definitely have mental barriers.
        
         | whstl wrote:
         | _"I don't accept that it's wholly incompatible with OO,
         | though, a thesis you'll see dotted around the place."_
         | 
         | Agreed. Arrays of Structs is not the part that is really
         | incompatible.
         | 
         | The part that clashes a bit with _traditional_ OOP is ECS,
         | where data and code are meant to be kept separate. But of
         | course I'm talking about _very traditional OOP_. It is
         | entirely possible to use an OOP language + classes to
         | implement ECS. It's just not gonna be "traditional".
         | 
         | EDIT: To quote your other reply: "records don't have to
         | identify strongly with a class hierarchy".
        
           | loup-vaillant wrote:
           | Not sure what you all mean by OOP to be honest. Is this a
           | case of OOP actually meaning "good programming", then
           | morphing into the most fashionable (and hopefully best)
           | practices of the day?
           | 
           | It wouldn't be the first time...
           | https://loup-vaillant.fr/articles/deaths-of-oop
        
         | m_mueller wrote:
         | > I don't accept that it's wholly incompatible with OO
         | 
         | I don't think it is, not even in Java. If you use its primitive
         | array and value types, you can easily wrap this in a data
         | wrapper class with functional interfaces, which then can be
         | used quite elegantly in the rest of your code in an OO way -
         | you just don't box all the records into objects.
        
           | inopinatus wrote:
           | Agreed. Honestly I think the hardest part here is programmer
           | mindset - when you suggest that identity is fluid and records
           | don't have to correspond to a class hierarchy some folks get
           | a panicked look going on, like you just claimed to eat their
           | pets
        
             | m_mueller wrote:
             | Yep. Funny story, this was actually a coding interview
             | homework assignment on the order of millions of data rows
             | read from a file. The exercise had a 10s time limit, so I
             | decided to do it data-centric from the start. Interviewers
             | found my style 'non-idiomatic', but probably never thought
             | about why my version was done in fractions of a second.
        
         | andsoitis wrote:
         | > high-performance code was "work on structs of arrays, not
         | arrays of structs"
         | 
         | Wikipedia's article "Array of Structure (AoS) and Structure of
         | Arrays (SoA)" explains the trade-off between performance (SoA)
         | and intuitiveness/lang support (AoS):
         | https://en.wikipedia.org/wiki/AoS_and_SoA
         | 
         | They also get into software support for SoA, including data
         | frames as implemented by R, Python's pandas package, and
         | Julia's DataFrames.jl package which allows you to access SoA
         | like AoS.
        
         | DeathArrow wrote:
         | As I see it, there are two types of DOD, one that you've
         | mentioned and where you "work on structs of arrays, not arrays
         | of structs".
         | 
         | The other one means just giving up encapsulation: separate the
         | data from the methods that work with it, think about the whole
         | app in terms of how the data flows through it, and model it so
         | everything is easy to understand and change. For added
         | correctness, you can use immutable data structures and pure
         | functions.
        
         | AndrewKemendo wrote:
         | >The idea of inserting an entity id into a "published" table,
         | instead of setting a boolean "published" field to true, doesn't
         | always come naturally. Yet once you realise how readily
         | polymorphic this is, you may start wanting to use such
         | approaches to data for everything.
         | 
         | Wow thank you for this, it's a heuristic I didn't think about
         | but is really powerful.
         | 
         | I need to think through some of the implications, cause I think
         | there are some risks to this approach - namely co-mingling
         | production and non-production data in the same infra. That
         | means that there are data at rest and in transport that are
         | following the same data pathways but have different production
         | criticality. It puts a lot of risk on the filter working
         | perfectly, rather than not even being available on the same
         | infra.
         | 
         | Just spitballing:
         | 
         | Is the ostensible "prod_live_bool" flag manually set? If not
         | then a bug in any automation would certainly cause a nasty data
         | exposure issue
         | 
         | Are you doing column level security tokens? Wouldn't that need
         | some kind of intra-table RBAC? In other words, it seems like if
         | you want any RBAC within your table, then you have to bring it
         | with you every time from the beginning because you have no idea
         | what level of data sensitivity you'll have eventually, and
         | refactoring an increasingly expanding table to inherit RBAC
         | later ---- omg I can't even think of the amount of work that
         | would be. O(n^2) level of manual work??
         | 
         | Does it lead to a canonical "live or not" lookup table/dict at
         | scale?
         | 
         | I think the data security risks would prevent me from using
         | this design pattern for critical applications in MOST cases,
         | but I do love some of the patterns here and will be exploring
         | this in the future for sure!
        
           | foobarbaz33 wrote:
           | I don't think you're talking about the same thing as the
           | parent comment.
        
           | Twisol wrote:
           | I interpreted `published` along the lines of blog posts, in
           | which an article can be either a draft (visible only to the
           | author) or published (visible to the world). This seems
           | different from dev/prod venueing, where having separate
           | databases altogether makes sense.
           | 
           | I understood the column-first approach more as an alternative
           | to putting all columns for an entity in one table, especially
           | when rows often don't populate every column. From that
           | perspective, what's being described is a strong separation of
           | concerns; applying this to dev/prod would be a weakening of
           | this separation, and so probably not what is desired.
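           | 
           | A small sketch of that reading, assuming the blog-post
           | interpretation above. All names and values here are made up
           | for illustration; plain dicts and sets stand in for tables.

```python
# Row-oriented: every post carries every column, populated or not.
posts_rows = {
    1: {"title": "Hello", "published": True},
    2: {"title": "Draft ideas", "published": False},
    3: {"title": "Release notes", "published": True},
}

# Column-first: one small "table" per concern, keyed by entity id.
titles = {1: "Hello", 2: "Draft ideas", 3: "Release notes"}
published = {1, 3}                # membership replaces the boolean column
published_at = {1: "2023-07-01"}  # hypothetical sparse column: only some rows


def published_titles():
    # The lookup into `titles` plays the role of the JOIN raised upthread.
    return [titles[i] for i in sorted(published)]


def publish(entity_id):
    published.add(entity_id)


def unpublish(entity_id):
    published.discard(entity_id)


# Both layouts answer the same question.
from_rows = [row["title"] for _, row in sorted(posts_rows.items()) if row["published"]]
from_columns = published_titles()
```

           | The extra lookup is the cost hu3 points out; what you get
           | back is that an entity with no value for a concern takes no
           | space in that concern's table.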
        
         | rawgabbit wrote:
         | In the data side of the world, "structs of arrays" translate to
         | column based indexes i.e. Snowflake and OLAP. "Arrays of
         | structs" translate to relational databases with their page/row
         | based indexes.
         | 
         | FWIW, I am a big fan of Snowflake and think it will eat
         | everyone else's lunch. I also find it amusing that Snowflake
         | "supports" foreign keys but doesn't enforce them. In other words,
         | Snowflake is as "nosql" as I care to go.
        
           | nerdponx wrote:
           | Or an in-memory "data frame" like in R, Python's Pandas, and
           | Polars.
        
         | Ovid wrote:
         | I've tried to push entity-component-systems (ECS) for non-game
         | applications. A financial company in London took that advice to
         | manage the complexity of their system since it was such a good
         | fit.
         | 
         | For those who are curious, here's a very brief introduction to
         | ECS: https://dev.to/ovid/the-unknown-design-pattern-1l64
        
       | oaiey wrote:
       | Data Oriented Design is more beginner friendly, because it does
       | not deal with people and business but only with the purity of
       | data modelling.
       | 
       | When I was young, my first step in a new project was painting the
       | Entity Relationship Model. That gave me the foundation for
       | everything else.
       | 
       | Nowadays, I try to understand the problems and the domain, work
       | on capabilities and how to group/box them before I start doing
       | data models.
        
         | chrisgd wrote:
         | I like the content of your comment. Everyone who has experience
         | recognizes that our love of data and programming often gets
         | sideswiped by business needs. I think though this article tries
         | to say that if we focus on gathering data needs from the
         | beginning, it might make the business needs conversation moot
        
       ___________________________________________________________________
       (page generated 2023-07-03 23:00 UTC)