[HN Gopher] Data-Oriented Design (2018)
___________________________________________________________________
Data-Oriented Design (2018)
Author : DeathArrow
Score : 263 points
Date : 2023-07-03 10:41 UTC (12 hours ago)
(HTM) web link (www.dataorienteddesign.com)
(TXT) w3m dump (www.dataorienteddesign.com)
| Ciantic wrote:
| Every time I hear about Data-Driven/Oriented Design I remember a
| paper from an OOP course I had to read at university; it used
| Data-Driven Design as an example of how not to do things.
|
| The paper in question is this from 1989:
| https://dl.acm.org/doi/pdf/10.1145/74877.74885
|
| It highlights that:
|
| "Even though the goal of data-driven design is to encapsulate
| data and algorithms, it inherently violates that encapsulation by
| making the structure of an object part of the definition of the
| object. This in turn leads to the definition of operations that
| reflect that structure (because they were designed with the
| structure in mind). Attempts to change the structure of an object
| transparently are destined to fail because other classes rely on
| that structure. This is the antithesis of encapsulation."
|
| It then goes on to show that Responsibility-Driven Design has a
| better approach.
|
| What we mean by Data-Driven Design has come a long way though,
| and isn't comparable to those days.
|
| I find it a bit amusing that Data-Driven Design used to be an
| insult of sorts, like you didn't know how to do things OOP the
| right way.
| dosshell wrote:
| You do realize that DoD has nothing to do with data-driven,
| right?
|
| Data-oriented design is about structuring the data with regard
| to how it is processed and to the hardware. This is the opposite
| of object-oriented design, which models the data around your
| mental model.
|
| For example in DoD you could design a map with the keys together
| in one array and the values in another array. This would make it
| much faster to iterate over keys while searching because of
| cache memory.
|
| While in object oriented you would store an array of pairs.
|
| Both approaches can use data-driven.
|
| Edit: would -> could
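A minimal sketch of the two map layouts described above, written in Python purely to show the shape of the data. The class names `PairMap` and `SplitMap` are hypothetical, and integer keys with float values are an assumption. Interpreter overhead hides most cache effects in Python; the layout difference matters most in languages such as C, C++, or Rust, where the arrays are laid out directly in memory.

```python
from array import array


class PairMap:
    """Array-of-pairs layout: each key is stored next to its value."""

    def __init__(self):
        self.pairs = []  # list of (key, value) tuples

    def put(self, key, value):
        for i, (k, _) in enumerate(self.pairs):
            if k == key:
                self.pairs[i] = (key, value)
                return
        self.pairs.append((key, value))

    def get(self, key):
        for k, v in self.pairs:
            if k == key:
                return v
        return None


class SplitMap:
    """Keys and values in two parallel arrays; a search reads only the keys."""

    def __init__(self):
        self.keys = array("q")    # contiguous signed 64-bit integers
        self.values = array("d")  # contiguous doubles, same index as keys

    def put(self, key, value):
        for i, k in enumerate(self.keys):
            if k == key:
                self.values[i] = value
                return
        self.keys.append(key)
        self.values.append(value)

    def get(self, key):
        # The scan walks the key array only; values are touched on a hit.
        for i, k in enumerate(self.keys):
            if k == key:
                return self.values[i]
        return None
```

Which layout is preferable depends on the access pattern: if most operations read a key together with its value, the pair layout may be the better fit.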
| morelisp wrote:
| There are also multiple definitions of "data-driven" floating
| around depending on qualifier/context. An OLAP RDBMS, for
| example, will certainly be written in a data-oriented way,
| and also have almost fully data-driven behavior, but will
| certainly not have data-driven (in the sense of that paper)
| design.
| drainyard wrote:
| While I agree with your point, the map example is not
| entirely well represented in your comment. The idea is not to
| just store keys and values in separate arrays, the idea is to
| look at your use case and model your data after the
| transformation you need. So if you have a case where storing
| keys and values as pairs makes sense because of your access
| pattern, then do that; if you have a case where you do a lot of
| searches through keys, then store them in separate arrays.
|
| The point of DoD is to look at the data you have and the data
| you need it transformed into and then structure your data
| accordingly.
| dgb23 wrote:
| The example given as "data driven design" in this paper neither
| resembles modern data driven programming nor data oriented
| design.
|
| I don't quite understand why the authors thought of this as a
| good example of "data driven design".
|
| I know that functions in a data driven program tend to be very
| generic so the conclusion you cite is very much a mismatch from
| my experience.
| gjadi wrote:
| Found an online review of the book:
| https://gist.github.com/seece/25ed1b2108cf5782718b026382f2c5...
| victor106 wrote:
| Thanks for the link.
|
| Found this interesting and against common advice.
|
| "The bane of many projects, and the cause of their lateness,
| has been the insistence on not doing optimisation prematurely.
| The reason optimisation at late stages is so difficult is that
| many pieces of software are built up with instances of objects
| everywhere, even when not needed."
|
| There are definitely applications where performance is of
| primary concern (maybe only a few) and others where it is
| not. In apps where it is, this makes me think that maybe
| premature optimization is okay? Am I reading that right?
|
| There's also this thing called Data-Oriented Programming
| https://www.manning.com/books/data-oriented-programming.
|
| Are both these concepts the same?
| crabmusket wrote:
| No, DOP is basically just functional programming. There are
| some overlaps (e.g. separate code from data) but they're not
| related.
| synergy20 wrote:
| How useful and practical is DoD in the ML and AI field for
| parallel (matrix) computing at scale, where performance is
| crucial as well, but not as crucial as in gaming (where
| milliseconds matter)? DoD is a new paradigm for me.
| bob1029 wrote:
| > Is your data layout defined by a single interpretation from a
| single point of view?
|
| I think this might be the most important question at technology
| selection and architecture time. Answering it usually requires
| talking to the business and customers.
|
| If you are certain there is exactly 1 valid "view" of the data
| that will be used throughout, then perhaps enshrining it in code
| makes sense. If you are even a tiny bit uncertain of this, a
| relational-style model probably works better. SQL is the end game
| for most businesses once they realize the game theory around this
| one...
|
| I am curious what HN thinks as major reasons for why everyone
| seems to have moved away from 1 big SQL database. From my
| perspective, yeah we have "web scale" edge cases that threaten
| vertical scalability on writes, but most businesses will never
| touch this, including members of the F100.
| DeathArrow wrote:
| >I am curious what HN thinks as major reasons for why everyone
| seems to have moved away from 1 big SQL database
|
| For the places I worked:
|
| 1. We transitioned to microservices
|
| 2. Performance: 1 BIG database slows things down
|
| 3. Ops/maintenance is very hard in a huge DB
|
| 4. In a huge DB there can be a lot of junk no one uses, no one
| remembers why it is there, but no one is certain whether that
| junk is still needed
|
| 5. We had different optimization strategies for reads and
| writes
|
| 6. Teams need to have ownership of databases/data stores so we
| can move fast instead of waiting for DBAs to reply to tickets.
| lisasays wrote:
| _4. In a huge DB there can be a lot of junk no one uses, no
| one remembers why it is there, but no one is certain whether
| that junk is still needed_
|
| Of course no one knows how to even begin to come up with a
| way of addressing that problem.
|
| So the only viable option is to keep on masking it. And keep
| propagating the junk data and zombie schemas ever forward.
| hobs wrote:
| For the same reason they went with microservices - it's easier
| to service the technical boundaries you control, and is a
| political solution rather than a technical one.
|
| Getting something past a DBA in change control was hard, but
| shipping some IaC templates can be done in a sprint!
| jwestbury wrote:
| At a previous F100 company -- a tech company whose products are
| widely used, we'll say -- we received guidance that RDBMS was
| verboten except with explicit approval. This had nothing to do
| with the best ways to model a given dataset, or achieving the
| best performance, and everything to do with schema flexibility
| and a history of outages caused by fucking up schema
| migrations. These problems weren't occurring in our NoSQL
| designs, and whatever benefits SQL databases offered didn't
| counter the huge benefits we gained from NoSQL's lack of rigid
| schema.
|
| Of course, bad uses of key-value stores can have _massive_
| performance impacts, and huge monetary costs when leveraging
| cloud platforms like DynamoDB -- I've seen a lot of cases
| where people didn't properly structure their data for DDB, and
| ended up performing loads of scans and sending costs through
| the roof.
| layer8 wrote:
| Sounds like a company whose end users are mostly not their
| customers.
| littlestymaar wrote:
| > and everything to do with schema flexibility and a history
| of outages caused by fucking up schema migrations. These
| problems weren't occurring in our NoSQL designs, and whatever
| benefits SQL databases offered didn't counter the huge
| benefits we gained from NoSQL's lack of rigid schema
|
| Yikes
| geophile wrote:
| And then there are silent query failures. Want to change
| "name": "John Smith" to "name": {"first": "John", "last":
| "Smith"}? Easy! No schema migration!
|
| But you have to modify all your queries to support both old
| and new formats, or stop the world and change all the data
| (after modifying all your code, including dynamically
| generated queries).
|
| And if you don't, your queries fail silently.
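The silent failure described above can be sketched with plain Python dictionaries standing in for schemaless documents. The function names and the sample records are hypothetical; a real document store would use its own query API, but the failure mode has the same shape.

```python
def find_by_name(docs, full_name):
    """Query written against the old schema, where "name" is a string."""
    return [d for d in docs if d.get("name") == full_name]


def find_by_name_both(docs, full_name):
    """Query that tolerates both the old string and the new nested form."""

    def render(name):
        if isinstance(name, dict):
            return f'{name.get("first", "")} {name.get("last", "")}'.strip()
        return name

    return [d for d in docs if render(d.get("name")) == full_name]


docs = [
    {"id": 1, "name": "John Smith"},                        # old format
    {"id": 2, "name": {"first": "John", "last": "Smith"}},  # new format
]

old_hits = find_by_name(docs, "John Smith")       # raises nothing, but misses id 2
new_hits = find_by_name_both(docs, "John Smith")  # matches both records
```

The old query never raises an error; it simply returns fewer rows than expected, which is why the problem can go unnoticed.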
| isoprophlex wrote:
| I read that as "We don't want things to crash immediately
| when the data model changes. We want things to keep chugging
| along until the last possible moment, when we will realize
| we've been silently corrupting everything"
| bob1029 wrote:
| > schema flexibility
|
| If the business is of the notion that the schema is
| "flexible", then it is probably time to bring all of the MBAs
| into a conference room and have a come-to-jesus conversation
| about the limitations of information theory and human
| suffering.
|
| At a certain point, when someone says "Widget", everyone in
| the organization needs to be on the same page. This goes well
| beyond any specific technology.
| barrkel wrote:
| Hybrid solutions are possible; e.g. JSONB in Postgres, where
| you can still index and join with decent performance.
| viraptor wrote:
| > reasons for why everyone seems to have moved away from 1 big
| SQL database
|
| I'm sure there are going to be other answers for the code side
| of things, but for ops:
|
| Depends a lot on the size of the service, but in some cases: We
| got enough data that 1 big SQL store makes ops _hard_. (Took me
| 3 days to drop a table recently in a way that wouldn't affect
| the users) And splitting data became easier than before with
| specialised backends. (A sharded 2nd layer cache of live data
| seems way simpler to achieve than say 2 decades ago)
| 0wis wrote:
| I'll second @viraptor's and @hobs's answers: it is the same cause
| as the rise of microservices and DevOps adoption, easier
| politics. I worked for a big old company, and most of the
| problems were political and administrative. One big SQL
| database is quite efficient until the entity that owns it does
| not agree with the new CTO's strategy or with another critical
| part of the business. Add an incident that shows the low
| resilience of the model and it quickly becomes a political
| headache, while the technical solution still seems evident to
| everyone.
| robertlagrant wrote:
| For me it's parallelisable delivery (or fragility, depending on
| how you look at it). If a team owns its own data store, it can
| make whatever changes it needs to and not have to worry about
| any other part of the software being broken by those changes.
| hcarvalhoalves wrote:
| The trade-off being having a separate team trying to
| integrate all the data back together with ETL.
| robertlagrant wrote:
| Well, if you need to integrate it. If you have some hideous
| dashboards you might need to, it's true, but at that point
| it's worth investing in a data person whose job it is to
| keep up with all the breaking data warehouse integrations.
| They'd have to anyway with any data approach.
| [deleted]
| ChicagoDave wrote:
| Because over time the one big relational database turns into a
| big ball of mud. Change becomes expensive and has a large blast
| radius.
|
| Contextual business domains should be the foundation of any
| complex architecture. You reduce complexity and change blast
| radius and speed up agility and feature adoption.
| lovasoa wrote:
| > why everyone seems to have moved away from 1 big SQL database
|
| Hacker News is probably not representative of the whole tech
| ecosystem. I think a majority of applications still uses one
| big SQL database.
|
| I recently released an open-source framework [1] that is
| entirely based on Data-Oriented Design. I have received a lot
| of comments from people for whom it was the right design.
| Having all your data in the same place makes so many things
| easier!
|
| [1] https://sql.ophir.dev
| thebruce87m wrote:
| Web technologies are also not representative of the whole
| tech ecosystem. I can't fit an SQL database on my
| microcontroller.
| bob1029 wrote:
| > I can't fit an SQL database on my microcontroller.
|
| Perhaps not SQL Server or Oracle, but unless we are talking
| about a very limited device there are likely options.
| delusional wrote:
| >I am curious what HN thinks as major reasons for why everyone
| seems to have moved away from 1 big SQL database.
|
| We haven't moved away from it, but we have run into a certain
| class of problems that seem related to the 1 big SQL database
| architecture. We're a really old enterprise, with a lot of non-
| technical people creating a lot of technical solutions back in
| the day that have become calcified and therefore have to keep
| existing. One of the things we have is 5 levels of SQL data
| transformations from the operational database (the one that
| actually has applications) into different generations of
| datamodel as the "type" of business we did changed.
|
| The problem is that as we accumulate ever more layers, we keep
| building on the layers before. The application that was built
| 10 years ago on abstraction layer 2 now needs some data from
| layer 4, let's create a new script that loops that data back
| into the previous layer and keep going. Eventually we've ended
| up with a huge amount of interdependent tables that all load
| data from other tables/views in weird and unintuitive ways, and
| the project to sort out the mess was deemed too expensive and
| postponed until the 2030's.
|
| I think it's understandable that people see those problems and
| consider how we could have avoided them. Unfortunately, for
| reasons I don't fully grasp, it seems impossible to apply
| anything that requires discipline to software engineers, and we
| have to somehow make it impossible to create the spaghetti.
| That's where separation comes in. If you can't read the data
| from some other service, then it's impossible to create a
| spaghetti mess that kills velocity for both parties.
|
| The vertical separation of application becomes a software
| solution to the people problem of poor engineering discipline
| in enterprises.
| debanjan16 wrote:
| Even beginners can learn to program in a data oriented way from
| the beginning.
|
| Two books that teach this style of programming to beginners are:
|
| 1. _How to Design Programs_ - https://htdp.org/
|
| 2. _A Data-Centric Introduction to Computing_ -
| https://dcic-world.org/
| morelisp wrote:
| Those are not this.
| debanjan16 wrote:
| Can you elaborate a bit more on the reason?
| morelisp wrote:
| The books you presented are, roughly speaking,
| introductions to programming with a focus on data science,
| functional programming, and common structures/ideas used in
| those which are, in other texts, not usually considered
| introductory material. "Data" in this sense means, like,
| collected facts about the world and how to model them.
|
| Data-oriented design is a particular way of designing your
| programs where you focus on efficiently laying out your
| "data" - in a different sense, meaning "whatever it is I've
| got in storage" - within that storage - to compute with it
| as fast as possible.
|
| The industry-standard tools used for the first thing are
| often using techniques developed for the second thing, but
| that's not relevant for the pedagogical framing. The tools
| they are teaching (Scheme and Pyret)
| actually make it very hard to play with low-level data
| layout details. And the emphasis in these texts on "real
| [as in, world] data" is in direct contradiction to the DOD
| axiom that "data is not the problem domain... The data-
| oriented design approach doesn't build the real-world
| problem into the code."
|
| A rule of thumb: Is anyone talking about GPUs, SIMD, or CPU
| cache sizes? If not, you're looking at something about data
| modeling or data science, not data orientation.
|
| And this, sorry, is all super fucking obvious if you
| actually read the intro to all three things.
| kaycebasques wrote:
| Pretty great intro paragraph. The eloquent writing and
| interesting ideas motivate me to keep reading:
|
| > Data is all we have. Data is what we need to transform in order
| to create a user experience. Data is what we load when we open a
| document. Data is the graphics on the screen, the pulses from the
| buttons on your gamepad, the cause of your speakers producing
| waves in the air, the method by which you level up and how the
| bad guy knew where you were so as to shoot at you. Data is how
| long the dynamite took to explode and how many rings you dropped
| when you fell on the spikes. It is the current position and
| velocity of every particle in the beautiful scene that ended the
| game which was loaded off the disc and into your life via
| transformations by machinery driven by decoded instructions
| themselves ordered by assemblers instructed by compilers fed with
| source-code.
| cempaka wrote:
| Andrew Kelley gave an informative and entertaining talk on how
| DOD had inspired a lot of his work on the Zig compiler:
| https://vimeo.com/649009599
| camjw wrote:
| Best SWE book I've ever read and I don't even work on something
| video game adjacent any more
| htk wrote:
| Did anyone here by any chance create an epub out of this?
|
| If not, any recommendations on utilities to convert several
| linked html files into a single epub?
| asimpletune wrote:
| I think Standard Ebooks has a command line utility that they
| use for producing their ebooks, which you might be able to use
| to produce an ebook from a bunch of html files. Ultimately an
| epub is a zip of html anyways, I think.
| htk wrote:
| I'll give this a try, thank you.
| deneas wrote:
| I like Data-Oriented Design, but beware of one thing: You
| organise your data like a database? You'll eventually be writing
| a database management system, unless you can use a framework like
| one of the many Entity-Component-System ones.
| BonitaPersona wrote:
| I love this book and have been very influenced by it.
|
| However, it should definitely be called: Data-Oriented Design FOR
| GAME DEVELOPMENT.
| marcosdumay wrote:
| The entire advice is context-dependent.
|
| Games just happen to have a lot of operations that need column-
| based access; but that's not true for all domains. When you go
| and blindly push game best practices into other domains, you
| are just making everybody's life hard and most systems worse.
| _dain_ wrote:
| _> Games just happen to have a lot of operations that need
| column-based access; but that's not true for all domains._
|
| This was _not at all_ obvious when ECS first came around. It
| took a lot of time to convince people away from the OOP way.
| [deleted]
| 10000truths wrote:
| It's not just column-based access. Formatting your data into a
| struct of arrays exposes opportunities to pack your data more
| efficiently and greatly reduce your application's memory usage.
| Boolean struct fields can become bitsets. Nullable struct
| fields can become sparse (or dense) maps. Pointer/reference
| struct fields can become arrays of smaller-width integers that
| index into a pool. And so on. When everything runs on CPUs that
| frequently stall on memory accesses, the impact of these sorts
| of changes cannot be overstated - the latency difference
| between L3 cache and RAM can be on the order of ~10x.
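The packing ideas listed above can be sketched as follows. This is an illustration under assumptions, not a tuned implementation: the `Particles` class and its field names are hypothetical, and Python only approximates the memory savings that the same layout would give in a systems language.

```python
from array import array


class Particles:
    """Struct-of-arrays store; the fields are hypothetical examples."""

    def __init__(self):
        self.x = array("f")         # one 32-bit float per particle
        self.alive_bits = 0         # boolean field packed as a bitset
        self.label = {}             # nullable field as a sparse map: index -> label
        self.material = array("H")  # reference field as a small integer index
        self.materials = []         # shared pool that the indices point into

    def add(self, x, alive, material, label=None):
        i = len(self.x)
        self.x.append(x)
        if alive:
            self.alive_bits |= 1 << i  # set bit i instead of storing a bool
        if label is not None:
            self.label[i] = label      # only particles with a label pay for it
        if material not in self.materials:
            self.materials.append(material)
        self.material.append(self.materials.index(material))
        return i

    def is_alive(self, i):
        return bool((self.alive_bits >> i) & 1)

    def material_of(self, i):
        return self.materials[self.material[i]]
```

Each particle costs one float, one bit, and one small integer; the optional label costs nothing unless it is present.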
| jackmott42 wrote:
| The advice of keeping the data you access frequently contiguous
| in memory applies to everything on modern hardware. If there is
| a program where performance is an issue at all, probably this
| will be one way to make sure performance is good.
| marcosdumay wrote:
| Well, if you want to generalize, it's about keeping data with
| correlated accesses close to each other and aligned inside
| memory pages, and failing that, yes, keeping it at least
| contiguous. It's not exactly about access frequency, except
| that you want to optimize the things you access more.
|
| Yes, that's generic advice for high performance
| applications that is at least generic enough to apply on
| anything that is close to a normal computer. You will still
| need further details if you are talking about things like HPC
| (ironically) or mainframes, but it's general enough to say
| people should do it without qualifications.
| feoren wrote:
| But it _is_ true, much much more often than you probably
| realize. Just look at your tables and think about how
| repetitive they are. The reason you can't come up with a lot
| of "column-based" (as you put it, which is still narrow-minded
| IMO) operations is because you've never looked for them before.
| Of course you haven't: you've been stuck in the traditional
| mode where such things are basically impossible.
|
| Do most of your tables have Name / Description type fields?
| Here's some "column-based operations": Allow translation of
| everything in your database. Generate natural-sounding text in
| a report, inserting these names and descriptions (from multiple
| different tables, of course). Free-text search of all your
| important database concepts. Detect similar names to the one
| the user is wanting to add, to prevent duplication. Clean
| whitespace. That's five off the top of my head.
|
| Do most of your tables have Archived / Status / Soft-delete
| type fields? Allow a user to archive a record. Choose whether
| to include archived records in a query or not. Delete archived
| records after X days.
|
| Do most of your tables have Comments fields? Allow multiple
| comments. Track who made a comment and when. Track responses to
| comments.
|
| Do most of your tables track who last modified the record?
| Track _all_ modifications. Show a list of recent modifications
| to any records.
|
| The list goes on and on and on. You call these "column-based
| operations", which again, is short-sighted. They're more like
| "concern-based operations". And it turns out _everything_ is a
| cross-cutting concern. You're shooting this idea down without
| nearly understanding it.
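One possible reading of the "concern-based operations" idea above, sketched in Python: the name concern lives in a single store shared by every record kind, so operations such as whitespace cleaning or free-text search are written once. The `NameStore` class and the record kinds are hypothetical and are not taken from the comment.

```python
class NameStore:
    """One store for the "name" concern, shared by every kind of record."""

    def __init__(self):
        self.names = {}  # (kind, record_id) -> display name

    def set(self, kind, record_id, name):
        self.names[(kind, record_id)] = name

    def clean_whitespace(self):
        """Collapse runs of whitespace in every name, whatever the record kind."""
        self.names = {key: " ".join(name.split()) for key, name in self.names.items()}

    def search(self, text):
        """Case-insensitive free-text search across all record kinds."""
        needle = text.lower()
        return sorted(key for key, name in self.names.items() if needle in name.lower())
```

With a Name column repeated on every table, each of these operations would have to be repeated per table instead.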
| feoren wrote:
| For those who disagree, please give an example of a domain
| that is devoid of important "concern-based" operations.
| feoren wrote:
| [flagged]
| AdieuToLogic wrote:
| > Imagine being in such a state of mind that you read the
| above comment and downvote it!
|
| Please don't comment about the voting on comments. It never
| does any good, and it makes boring reading.
|
| source: https://news.ycombinator.com/newsguidelines.html
| feoren wrote:
| I don't see that as disallowing "for those who disagree
| with me: what do you disagree with, exactly?" Isn't that
| a reasonable question?
|
| Besides, isn't downvoting someone for nonsensical, petty
| reasons (and leaving no response as to why, of course) at
| least as harmful to the overall discussion as referring
| to those downvotes?
| loup-vaillant wrote:
| > _"for those who disagree with me: what do you disagree
| with, exactly?" Isn't that a reasonable question?_
|
| It's a hopeless one. People who downvote generally think
| you're not worth actually answering. Also note that
| heated language often gets automatically downvoted. And
| with few exceptions ("eating babies is bad"), one sided
| opinions tend to be less popular than anything that
| appears "balanced".
|
| It's especially tough if your one sided opinion is
| attacking a popular practice.
| flohofwoe wrote:
| > Games just happen to have a lot of operations that need
| column-based access.
|
| And that's not even true for many code areas in typical games,
| only where there are at least a few thousand 'things' to process
| (e.g. particle systems or navigation/collision systems).
|
| DOD makes a lot of sense within specific subsystems, but not
| necessarily in high level gameplay code (outside specific
| genres at least).
| paulddraper wrote:
| Databases are another common cause.
| forrestthewoods wrote:
| Hrmmm. Not sure this line of thinking makes sense.
|
| Game data access patterns are quite brutal. OOP for games
| results in extremely inefficient cache use, lots of random
| access, and lots of pointer chasing.
|
| ECS isn't a "natural" fit for games. It's quite difficult and
| ECS systems are still far from a solved problem.
|
| The two most popular game engines, Unreal and Unity, are
| decisively non-ECS for almost everything they do.
|
| In any case, I think the underlying principles of DOD apply to
| all programs. Specific solutions vary, as always.
| marcosdumay wrote:
| > In any case, I think the underlying principles of DOD apply
| to all programs.
|
| Yeah, I do agree with that.
| GartzenDeHaes wrote:
| It's for high-performance computing with current CPU designs
| that are dependent on data locality for performance.
|
| I agree that it's a harmful design for business data.
| Programmers want to push their runtime data model into the
| database and they have no interest in the operational,
| maintenance, and performance problems this causes. When someone
| suggests this kind of thing, I'll ask them "how do we diagnose
| performance problems with this technology when there are
| 100,000 concurrent users and millions of data elements?" The
| rows-and-columns people can answer this question.
| feoren wrote:
| > When someone suggests this kind of thing, I'll ask them
| "how do we diagnose performance problems with this technology
| when there are 100,000 concurrent users and millions of data
| elements?"
|
| I don't understand; the exact same performance diagnostics
| work in both cases. Why is this different? There's nothing
| intrinsically less performant about this approach. You really
| think your checkerboard tables and long lists of columns with
| names like "VALUE12" and "VALUE13" and multiple different
| kinds of key/value pairs you jammed in there for different
| clients -- you think _those_ are better performance!?
|
| > 100,000 concurrent users
|
| Do you _actually_ have 100,000 concurrent users? _Really_?
| You don't, do you? You just kinda hope you will eventually.
| And again: this approach is not worse for that.
|
| > millions of data elements
|
| This is absolute peanuts for any modern database system. It's
| weird that this is your extreme example.
| torginus wrote:
| I feel like this is increasingly the only way to write high-
| performance code.
|
| With newer hardware, the only thing that's expected to scale is
| logic density - SRAM (and cache sizes) have stopped scaling
| with the latest lithographies - and RAM bandwidth hasn't really
| been scaling for quite a while (I'd think it's even possible
| that per-core bandwidth has been _decreasing_) - memory access
| has been the bottleneck for a while.
| eska wrote:
| Looked at it for 10 seconds and immediately found random 0s at
| the end of the document or unreplaced text like "Noel Llopis in
| his September 2009 article[#!NoelDOD!#]"
| jerrygenser wrote:
| The intro describes that it's a free version of a book that
| contains most of the content except for a few sections.
|
| The book was converted using an automated process from its
| native format into HTML so a large portion could be made
| available for free, and this automated conversion process might
| have some minor issues -- like the one you described.
| dkarl wrote:
| > [Abstraction heavy paradigms] structure the code around the
| description of the problem domain
|
| My experience of Domain-Driven Design has been that it is
| extremely effective for driving conversations about the domain
| throughout the product life-cycle, but it produces frustrating
| codebases that are poorly attuned to the world outside the
| running process. Domain-Driven Design codebases want to be self-
| contained universes and treat external systems as details. OO
| design paradigms in general seem to have little respect for
| messages exchanged with external systems.
|
| This wasn't true back when JavaBeans, object databases, and other
| distributed object systems were expected to take over the world,
| but the failure of those technologies shrank the OO world from
| distributed systems to isolated programs. These days, the
| messages exchanged with external systems are just data, without
| behavior. The marriage of data and behavior can only exist inside
| a single process. So object-oriented design turned inward and
| concentrated on the creation of rich inner worlds.
|
| I think this is backwards, or at least incomplete. As Rich Hickey
| says, effective programs need to be _situated in the real world._
| They are not ethereal abstract models. They have concrete
| functions, inputs and outputs. Having rich internal abstractions
| that mimic some aspects of reality is a means to an end, entirely
| subordinate to the purpose of executing interactions with other
| computing systems. Data-driven design embraces this reality. By
| treating data as essential, and behavior as something to be added
| when it is needed, it allows inputs and outputs to be first-class
| citizens.
|
| > The data-oriented design approach doesn't build the real-world
| problem into the code. This could be seen as a failing of the
| data-oriented approach by veteran object-oriented developers, as
| examples of the success of object-oriented design come from being
| able to bring the human concepts to the machine
|
| I think this perfectly sums up the confusion that OO modeling
| creates. You use code to write programs, or services, or cloud
| functions, things like that. The real-world problem that a
| program, service, or cloud function solves is interacting in a
| certain way with other programs, services, cloud functions.
|
| The real-world problems that OO modeling paradigms want you to
| focus on are the domain problems. This is vitally important for
| designing products, systems of programs that solve real human
| problems. If you are designing a system for managing medical
| records in a hospital, you need to model doctors, nurses,
| patients, labs, radiology images, patient stays, all those real-
| world things. However, when you are designing a piece of software
| to do one thing within that medical records system, the "real-
| world problem" your code is solving is limited to its role within
| the larger system.
|
| Data-oriented design is a natural mental fit for writing programs
| or services that play a limited part in larger systems, which is
| always what you are doing when you are writing code. Object-
| oriented design wants to take on the complexity of the whole real
| world, which is the right perspective for product design and
| architecture, not for writing code.
| rapnie wrote:
| > My experience of Domain-Driven Design has been that it is
| extremely effective for driving conversations about the domain
| throughout the product life-cycle, but it produces frustrating
| codebases that are poorly attuned to the world outside the
| running process.
|
| The DDD blue book by Eric Evans consists of two parts:
| Strategic Design and Tactical Patterns. Is it accurate to
| summarize your comment as that the strategic design works very
| well, but that most of the information out there leads one to
| then learn about the tactical patterns in a particular OOP
| context, which isn't a universal approach, and in many cases
| shouldn't be the way to elaborate findings from Strategic
| Design?
|
| Note that searching the web for "DDD" yields mostly OOP-related
| tactical patterns guidance, and this is why I think many people
| are so sceptical about DDD. It is the strategic parts where
| most of the (low-hanging fruit) value is.
|
| Other non-OOP 'tactical' guidance, such as functional /
| functional-reactive / actor-driven, etc. DDD is harder to come
| by.
| dkarl wrote:
| In my copy of the blue book, "Strategic Design" is Part IV,
| and there is no "Tactical Patterns" part or chapter. To be
| honest, I've only quickly read through Part IV, cherry-
| picking a couple of concepts, because it concentrates on
| techniques for dealing with very large domain models and/or
| very large organizations.
|
| I think the good and bad are interleaved together throughout
| the book. Part I Chapter 2, "Communication and the Use of
| Language," is the one chapter I wish everybody I work with
| would read and digest. I think establishing a consistent
| language about the domain that is shared across functional
| groups is critical, so the domain terminology used in source
| code and comments is consistent with the terminology
| engineers use when talking with product and customer support.
| Part I Chapter 3 contains the clearest statement of (what I
| think is) the core error: "Tightly relating the code to an
| underlying model [which context makes clear is the domain
| model] gives the code meaning and makes the model relevant."
| The rest of the book is like that for me, alternating between
| vigorously nodding my head and yelling "WHY WOULD YOU TELL
| PEOPLE THAT," sometimes within the same chapter.
|
| I suspect overall there's a communication issue with the
| book, where he focuses entirely on the themes of DDD and only
| occasionally gives lip service to other aspects of design.
| For example, when he says that module names and structures
| should reflect domain concepts, I think, "Yeah, that's really
| nice to the extent you can accomplish that, but you also want
| your module names and structures to reflect the logical
| structure of your program, so somebody can look at it and see
| how it works." You have these two aspects that should ideally
| both be legible in the code, but the DDD book doesn't show
| much respect for competing design priorities. In fact, it
| often warns against the dangers of being led astray by other
| design perspectives, such as architectural ones. Like so many
| other OO methodologies, the overwhelming thrust of the book
| is that if you attend to what it's teaching you, everything
| else will take care of itself.
|
| A priest once told me, a good priest will tell you when you
| need a lawyer, a good lawyer will tell you when you need a
| therapist, a good therapist will tell you when you need a
| doctor, and a good doctor will tell you when you need a
| priest. Be careful with an expert who always tells you that
| their expert perspective is what you need. The DDD book is
| definitely that kind of expert you have to be careful with.
| Acumen321 wrote:
| Mike Acton's talk "Data-Oriented Design and C++" from CppCon 2014
| is the best programming talk ever given in my opinion. A must
| watch:
|
| https://youtu.be/rX0ItVEVjHc
| wrapperup wrote:
| It's fantastic, and also my favorite. And for those who might
| not know, he was the one who really mainstreamed Data-oriented
| design and ECS architecture in my eyes.
|
| He previously was also leading the charge on Unity DOTS, though
| unfortunately it seems Unity is having a tailspin at the
| moment. The work on DOTS is solid, if incomplete.
| dosshell wrote:
| One key concept when I use DoD is to not abstract away the data.
| Less is more.
|
| But when quickly reading the intro text, I found it doing the
| opposite: it talks too much and abstracts away the key concepts.
| Am I the only one who finds it a bit ironic that it isn't
| drinking its own wine?
| inopinatus wrote:
| Some of the best advice I ever got for writing composable, high-
| performance code was "work on structs of arrays, not arrays of
| structs". I hear many echoes of that advice in this text. Turns
| out that entity-component architectures work well in line-of-
| business applications too, not just games.
|
| Alas, many developers in enterprise are rusted onto a record-
| keeping CRUD model and struggle to think in columns rather than
| rows. The idea of inserting an entity id into a "published"
| table, instead of setting a boolean "published" field to true,
| doesn't always come naturally. Yet once you realise how readily
| polymorphic this is, you may start wanting to use such approaches
| to data for everything. Rich new opportunities then arise from
| cross-pollinating component data. Some may question why it is
| structurally permissible that, say, a network interface can have
| a birthday, or why an invoice has an IPv6 address, why my cat is
| in the DHCP pool, whilst limegreen is deleted and $5 on Tuesdays.
| This of course is half the fun.
|
| I don't accept that it's wholly incompatible with OO, though, a
| thesis you'll see dotted around the place. I've even taken this
| approach with Ruby using Active Record for persistence; not
| normally a domain where the words "high performance" are bandied
| about. That worked particularly because Ruby's object system,
| being more Smalltalk-ish than C++/Java-ish, strongly favours
| composition over inheritance.
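
A minimal Python sketch of the "published table" idea described above, assuming an in-memory store where each component is its own table keyed by entity id. Every name here (`published`, `birthdays`, `ipv6_addresses`, `publish`, `is_published`) is hypothetical and not taken from any framework or from the commenter's code.

```python
from datetime import date

# Each component is its own "table", keyed by entity id.
# An entity is just an integer; it has no class and no fixed set of fields.
published = set()        # membership replaces a boolean "published" column
birthdays = {}           # hypothetical component table: entity id -> date
ipv6_addresses = {}      # hypothetical component table: entity id -> address


def publish(entity_id):
    """Mark an entity as published by inserting its id."""
    published.add(entity_id)


def is_published(entity_id):
    return entity_id in published


# Entity 1 could be a blog post, 2 an invoice, 3 a network interface.
publish(1)
publish(2)
ipv6_addresses[2] = "2001:db8::1"    # an invoice with an IPv6 address
birthdays[3] = date(2020, 2, 29)     # a network interface with a birthday

# "Which published entities have an IPv6 address?" is a set intersection.
published_with_ipv6 = ipv6_addresses.keys() & published
```

Nothing in the store restricts which entity may carry which component, which is one way to read the "readily polymorphic" remark; it is also why a network interface can end up with a birthday.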
| incrudible wrote:
| > you may start wanting to use such approaches to data for
| everything
|
| Please, don't. Arrays of structs is more natural for a reason;
| SOA is not universally faster and it has some of its own
| performance problems. If you _really_ care about performance,
| AOSOA may well be your best bet. Again, please don't use it for
| everything.
|
| > a dhcp pool full of cats is half the fun
|
| Replacing the weird code that the last guy wrote for his own
| amusement is not fun.
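
For readers unfamiliar with the three layouts named above, here is a rough Python sketch of how the same records are indexed under AoS, SoA, and AoSoA. It only illustrates the indexing; Python gives no real control over memory layout, so it says nothing about actual cache behaviour. `WIDTH`, `get_aosoa`, and the field names are assumptions made for the example.

```python
from array import array

N = 8        # number of records
WIDTH = 4    # lanes per AoSoA block (assumed; often chosen to match a SIMD width)

xs_src = [float(i) for i in range(N)]
ys_src = [float(i * 10) for i in range(N)]

# AoS: one record per element; x and y of the same record sit together.
aos = [(x, y) for x, y in zip(xs_src, ys_src)]

# SoA: one contiguous array per field.
soa_x = array("d", xs_src)
soa_y = array("d", ys_src)

# AoSoA: an array of small fixed-width blocks, each block a tiny SoA.
aosoa = [
    (array("d", xs_src[b:b + WIDTH]), array("d", ys_src[b:b + WIDTH]))
    for b in range(0, N, WIDTH)
]


def get_aosoa(i):
    """Return record i from the AoSoA layout as an (x, y) pair."""
    block, lane = divmod(i, WIDTH)
    bx, by = aosoa[block]
    return bx[lane], by[lane]


# Scanning a single field: AoS walks whole records, SoA and AoSoA
# walk only the arrays holding that field.
sum_x_aos = sum(rec[0] for rec in aos)
sum_x_soa = sum(soa_x)
sum_x_aosoa = sum(sum(bx) for bx, _ in aosoa)
```

AoSoA keeps the fields of nearby records close together while still storing each field in short contiguous runs, which is the usual argument for it as a middle ground.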
| Capricorn2481 wrote:
| > Alas, many developers in enterprise are rusted onto a record-
| keeping CRUD model and struggle to think in columns rather than
| rows. The idea of inserting an entity id into a "published"
| table, instead of setting a boolean "published" field to true,
| doesn't always come naturally. Yet once you realise how readily
| polymorphic this is, you may start wanting to use such
| approaches to data for everything
|
| I don't really understand what's Polymorphic about this, or
| even beneficial. It seems like every time I've had a boolean
| column in a long-standing application, it eventually needed to
| turn into something else
| ryukoposting wrote:
| Very insightful. I never thought about it like this, but it's
| common practice in embedded systems to approach problems in
| this way. One module may provide a statically-sized pool of
| like objects, and other modules extend behavior in relation to
| those objects by carrying buffers of pointers and/or indexes
| into that pool. It might feel inefficient due to the storage of
| so many pointers/indexes, but you're optimizing the size of the
| largest thing: the object pool itself.
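
The pool-plus-indexes arrangement described above might look roughly like this in Python. This is a sketch under assumptions: the pool capacity, the temperature-sensor payload, and the names `pool_alloc`, `watch`, and `sensors_over` are all invented for illustration, and real embedded code would typically be written in C.

```python
from array import array

POOL_SIZE = 4  # assumed capacity, fixed up front

# The pool module owns the storage: one preallocated slot per object.
temperatures = array("h", [0] * POOL_SIZE)   # 16-bit reading per sensor
in_use = [False] * POOL_SIZE


def pool_alloc():
    """Return the index of a free slot, or -1 if the pool is exhausted."""
    for i, used in enumerate(in_use):
        if not used:
            in_use[i] = True
            return i
    return -1


# A second module adds behaviour without changing the pool's layout:
# it only keeps a small buffer of indexes into the pool.
alarm_watchlist = []


def watch(index):
    alarm_watchlist.append(index)


def sensors_over(limit):
    """Indexes of watched sensors whose reading exceeds the limit."""
    return [i for i in alarm_watchlist if temperatures[i] > limit]


a = pool_alloc()
b = pool_alloc()
temperatures[a] = 20
temperatures[b] = 95
watch(b)
hot = sensors_over(90)
```

The watchlist costs one small index per watched object, while the pool's per-object size stays unchanged no matter how many modules hang behaviour off it.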
| hu3 wrote:
| > The idea of inserting an entity id into a "published" table,
| instead of setting a boolean "published" field to true, doesn't
| always come naturally.
|
| It doesn't come naturally because now you need a JOIN in your
| SQL just to fetch what was before a column. Or two queries
| instead of one.
|
| Not to mention having closely related data spread in different
| tables increases cognitive load.
|
| You just added a layer of indirection for what gain precisely?
| xupybd wrote:
| I think they're arguing that you get a performance gain.
| Personally, the systems I deal with wouldn't gain from such
| an optimisation as there is too little data. So I'll stick
| to the simplest approach.
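
To make the trade-off in this subthread concrete, here is a small sketch using Python's standard `sqlite3` module that sets the two schemas side by side. The table and column names (`posts_a`, `posts_b`, `published`, `entity_id`) are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Variant A: boolean column on the row itself.
    CREATE TABLE posts_a (
        id INTEGER PRIMARY KEY,
        title TEXT,
        published INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO posts_a VALUES (1, 'draft', 0), (2, 'live', 1);

    -- Variant B: a separate table holding the ids of published entities.
    CREATE TABLE posts_b (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE published (entity_id INTEGER PRIMARY KEY);
    INSERT INTO posts_b VALUES (1, 'draft'), (2, 'live');
    INSERT INTO published VALUES (2);
""")

# Variant A: a plain filter on one table.
titles_a = [row[0] for row in conn.execute(
    "SELECT title FROM posts_a WHERE published = 1")]

# Variant B: the same question now needs a JOIN.
titles_b = [row[0] for row in conn.execute(
    "SELECT p.title FROM posts_b AS p "
    "JOIN published AS pub ON pub.entity_id = p.id")]
```

Both queries return the same rows. Variant B pays for the extra JOIN but lets the `published` table reference ids from any entity table, not only posts.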
| gnuvince wrote:
| > I don't accept that it's wholly incompatible with OO
|
| It's not incompatible with the mechanics of OO, but it does
| require that programmers change how they approach problems. For
| instance, a common way to write code in an OO language is to
| focus solely on the thing you want to think about (a user, a
| blog post, a money transaction, what have you) and to implement
| it in isolation of everything else, to hide all of its data,
| and then to think about what methods need to be exposed to be
| useful to other parts of the system. The idea of encapsulation
| is quite strong.
|
| In DOD, it is more common for data related to different domains
| to be accessible and let the subsystems pick and choose what
| they need to do their work. Nothing about Java or Ruby would
| prevent this, but programmers definitely have mental barriers.
| whstl wrote:
| _" I don't accept that it's wholly incompatible with OO,
| though, a thesis you'll see dotted around the place."_
|
| Agreed. Arrays of Structs is not the part that is really
| incompatible.
|
| The part that clashes a bit with _traditional_ OOP is ECS,
| where data and code are meant to be kept separate. But of
| course I'm talking about _very traditional OOP_. It is
| entirely possible to use an OOP language + classes to
| implement ECS. It's just not gonna be "traditional".
|
| EDIT: To quote your other reply: "records don't have to
| identify strongly with a class hierarchy".
| loup-vaillant wrote:
| Not sure what you all mean by OOP to be honest. Is this a
| case of OOP actually meaning "good programming", then
| morphing into the most fashionable (and hopefully best)
| practices of the day?
|
| It wouldn't be the first time...
| https://loup-vaillant.fr/articles/deaths-of-oop
| m_mueller wrote:
| > I don't accept that it's wholly incompatible with OO
|
| I don't think it is, not even in Java. If you use its primitive
| array and value types, you can easily wrap this in a data
| wrapper class with functional interfaces, which then can be
| used quite elegantly in the rest of your code in an OO way -
| you just don't box all the records into objects.
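
The comment above is about Java, but the same shape can be sketched in Python with the standard `array` module: primitive columns behind a small class whose methods accept functions. The `Trades` class, its fields, and `total_where` are hypothetical names chosen for the example.

```python
from array import array


class Trades:
    """Columnar store: one primitive array per field, no per-record objects."""

    def __init__(self):
        self._price = array("d")   # 64-bit floats
        self._qty = array("q")     # 64-bit signed integers

    def append(self, price, qty):
        self._price.append(price)
        self._qty.append(qty)

    def __len__(self):
        return len(self._price)

    def total_where(self, predicate):
        """Sum price * qty over rows for which predicate(price, qty) is true."""
        return sum(
            p * q
            for p, q in zip(self._price, self._qty)
            if predicate(p, q)
        )


trades = Trades()
trades.append(10.0, 3)
trades.append(2.5, 4)
trades.append(100.0, 1)

large_total = trades.total_where(lambda price, qty: price >= 10.0)
```

Callers get an ordinary object with methods, while the records themselves never exist as individual objects.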
| inopinatus wrote:
| Agreed. Honestly I think the hardest part here is programmer
| mindset - when you suggest that identity is fluid and records
| don't have to correspond to a class hierarchy some folks get
| a panicked look going on, like you just claimed to eat their
| pets
| m_mueller wrote:
| Yep. Funny story: this was actually coding interview
| homework, a question involving on the order of millions of
| data rows from a file. The exercise had a 10s timing
| requirement, so I decided to do it data-centric from the
| start. Interviewers found my style 'non-idiomatic', but
| probably never thought about why my version finished in
| fractions of a second.
| andsoitis wrote:
| > high-performance code was "work on structs of arrays, not
| arrays of structs"
|
| Wikipedia's article "Array of Structure (AoS) and Structure of
| Arrays (SoA)" explains the trade off between performance (SoA)
| and intuitiveness/lang support (AoS):
| https://en.wikipedia.org/wiki/AoS_and_SoA
|
| They also get into software support for SoA, including data
| frames as implemented by R, Python's pandas package, and
| Julia's DataFrames.jl package, which allows you to access SoA
| like AoS.
| DeathArrow wrote:
| As I see it, there are two types of DOD, one that you've
| mentioned and where you "work on structs of arrays, not arrays
| of structs".
|
| The other one means just giving up on encapsulation: separate
| the data from the methods that work with the data, think of the
| whole app in terms of how the data flows through it, and model
| it so everything is easy to understand and change. For added
| correctness, you can use immutable data structures and pure
| functions.
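
A small Python sketch of the second flavour described above: plain immutable data, with behaviour kept in separate pure functions. The `Order` record, the discount rule, and the function names are invented for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Order:
    """Plain immutable data; no behaviour attached."""
    item: str
    unit_price: float
    quantity: int
    discount: float = 0.0


def apply_bulk_discount(order):
    """Pure function: returns a new Order and never mutates its input."""
    if order.quantity >= 10:
        return replace(order, discount=0.1)
    return order


def total(order):
    return order.unit_price * order.quantity * (1.0 - order.discount)


orders = (Order("bolt", 2.0, 10), Order("nut", 1.0, 3))

# The program reads as a pipeline of transformations over the data.
discounted = tuple(apply_bulk_discount(o) for o in orders)
totals = [total(o) for o in discounted]
```

Because `Order` is frozen, the original `orders` tuple is left untouched after the pipeline runs, which makes each step easy to test in isolation.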
| AndrewKemendo wrote:
| >The idea of inserting an entity id into a "published" table,
| instead of setting a boolean "published" field to true, doesn't
| always come naturally. Yet once you realise how readily
| polymorphic this is, you may start wanting to use such
| approaches to data for everything.
|
| Wow thank you for this, it's a heuristic I didn't think about
| but is really powerful.
|
| I need to think through some of the implications, cause I think
| there are some risks to this approach - namely co-mingling
| production and non-production data in the same infra. That
| means that there are data at rest and in transport that are
| following the same data pathways but have different production
| criticality. It puts a lot of risk on the filter working
| perfectly, rather than not even being available on the same
| infra.
|
| Just spitballing:
|
| Is the ostensible "prod_live_bool" flag manually set? If not
| then a bug in any automation would certainly cause a nasty data
| exposure issue
|
| Are you doing column level security tokens? Wouldn't that need
| some kind of intra-table RBAC? In other words, it seems like if
| you want any RBAC within your table, then you have to bring it
| with you every time from the beginning because you have no idea
| what level of data sensitivity you'll have eventually, and
| refactoring an increasingly expanding table to inherit RBAC
| later ---- omg I can't even think of the amount of work that
| would be. O(n^2) level of manual work??
|
| Does it lead to a canonical "live or not" lookup table/dict at
| scale?
|
| I think the data security risks would prevent me from using
| this design pattern for critical applications in MOST cases,
| but I do love some of the patterns here and will be exploring
| this in the future for sure!
| foobarbaz33 wrote:
| I don't think you're talking about the same thing as the
| parent comment.
| Twisol wrote:
| I interpreted `published` along the lines of blog posts, in
| which an article can be either a draft (visible only to the
| author) or published (visible to the world). This seems
| different from dev/prod venueing, where having separate
| databases altogether makes sense.
|
| I understood the column-first approach more as an alternative
| to putting all columns for an entity in one table, especially
| when rows often don't populate every column. From that
| perspective, what's being described is a strong separation of
| concerns; applying this to dev/prod would be a weakening of
| this separation, and so probably not what is desired.
| rawgabbit wrote:
| In the data side of the world, "structs of arrays" translate to
| column based indexes i.e. Snowflake and OLAP. "Arrays of
| structs" translate to relational databases with its page/row
| based indexes.
|
| FWIW, I am a big fan of Snowflake and think it will eat
| everyone else's lunch. I also find it amusing that Snowflake
| "supports" foreign keys but doesn't enforce them. In other words,
| Snowflake is as "nosql" as I care to go.
| nerdponx wrote:
| Or an in-memory "data frame" like in R, Python's Pandas, and
| Polars.
| Ovid wrote:
| I've tried to push entity-component-systems (ECS) for non-game
| applications. A financial company in London took that advice to
| manage the complexity of their system since it was such a good
| fit.
|
| For those who are curious, here's a very brief introduction to
| ECS: https://dev.to/ovid/the-unknown-design-pattern-1l64
| oaiey wrote:
| Data Oriented Design is more beginner friendly. Because it does
| not deal with people and business but only with purity of data
| modelling.
|
| When I was young, my first step in a new project was painting the
| Entity Relationship Model. That gave me the foundation for
| everything else.
|
| Nowadays, I try to understand the problems and the domain, work
| on capabilities and how to group/box them before I start doing
| data models.
| chrisgd wrote:
| I like the content of your comment. Everyone who has experience
| recognizes that our love of data and programming often gets
| sideswiped by business needs. I think though this article tries
| to say that if we focus on gathering data needs from the
| beginning, it might make the business needs conversation moot
| lincpa wrote:
| [dead]
___________________________________________________________________
(page generated 2023-07-03 23:00 UTC)