[HN Gopher] What even is data mesh
___________________________________________________________________
What even is data mesh
Author : riccomini
Score : 72 points
Date : 2021-07-29 17:54 UTC (5 hours ago)
(HTM) web link (cnr.sh)
(TXT) w3m dump (cnr.sh)
| ryanmaclean wrote:
| Why the title change?
| zwkrt wrote:
| In my experience at three large companies, any project where one
| part of the organization wants "the data" from another is
| actually just a power grab at the mid-manager level. When I
| hear "accounting wants direct access to the inventory data" I
| interpret that as "the accounting manager thinks the inventory
| team is slow or incompetent and thinks that if her own team just
| had the underlying inventory data directly accessible, she could
| cut out the middleman!"
|
| The problem of course is that data has to be interpreted, and
| often that interpretation is complicated. After all, that is why
| we write programs and don't just query/insert into databases
| directly from the terminal. Most "data" is inextricably tied to
| the programs that interact with it, and freeing the data
| without making the complexities of those programs known leaves
| both organizations open to horrible bugs.
| swordsmith8 wrote:
| Data mesh case studies:
| https://medium.com/intuit-engineering/intuits-data-mesh-stra... &
| https://towardsdatascience.com/from-0-to-data-mesh-kolibri-g...
| [deleted]
| jgraettinger1 wrote:
| Estuary Flow [1] may be interesting to those in this space.
|
| We're still building, but it's a GitOps workflow tool that
| tightly integrates schema definition (JSON Schema), captures and
| materializations from/to your systems & SaaS, rich
| transformations, catalog and provenance metadata tracking, built-
| in testing, and a managed runtime. All with sub-second latency.
|
| Flow's runtime uses nascent but really promising open protocols
| for building connectors to the myriad systems and APIs out there.
| We see Airbyte's work (itself built off of Singer) as the best
| step in this direction and are leaning into that effort
| ourselves.
|
| [1] github.com/estuary/flow
| brunoqc wrote:
| I have a dumb question. Could I use Flow to import a text file
| into a PostgreSQL database? The text file is not append-only.
|
| There are a lot of tools to import logs into stuff like Kafka but
| not to import whole files (that can change) into a database.
| jgraettinger1 wrote:
| Yep. You can, for example, have it watch file(s) in S3, and
| every time a file changes it will flow its records through
| into a table it creates in your DB, keyed on your (arbitrary)
| primary key.
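
The keyed-table behaviour described above can be illustrated with a minimal sketch. This is not Flow's implementation: SQLite stands in for PostgreSQL, and the `items` table, its columns, and the `sync_records` helper are hypothetical names chosen for the example. Each time the file changes, the whole snapshot is re-read and every record is upserted on its primary key.

```python
import csv
import io
import sqlite3


def sync_records(conn, csv_text):
    """Upsert every row of a CSV snapshot into `items`, keyed on `id`.

    Hypothetical helper: assumes the snapshot has id, name and qty columns.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["id"], r["name"], int(r["qty"])) for r in reader]
    # INSERT OR REPLACE overwrites any existing row with the same primary key.
    conn.executemany(
        "INSERT OR REPLACE INTO items (id, name, qty) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, name TEXT, qty INTEGER)")

# First version of the file.
sync_records(conn, "id,name,qty\na1,widget,3\nb2,gadget,5\n")
# The file is edited in place: one row changes, one row is added.
sync_records(conn, "id,name,qty\na1,widget,7\nb2,gadget,5\nc3,gizmo,1\n")

result = conn.execute("SELECT id, name, qty FROM items ORDER BY id").fetchall()
```

Because rows are matched on the key, re-importing a changed file updates existing records instead of duplicating them. This sketch does not remove rows that disappear from the file; that would need a separate reconciliation step.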
| brunoqc wrote:
| Any way to watch local files too? S3 might be fine, but
| just asking.
| siganakis wrote:
| From my experience, the core driver behind the data mesh
| architecture is organisational, not technological. Organisations
| are demanding more of their data, be it for rapid product
| development or self-service analytics. Often this involves a
| larger number of sources (e.g. external sources), rather than
| just larger volumes of the same thing.
|
| If marketing, finance and sales are dependent on a centralised
| data team for every new thing, the data team quickly becomes the
| bottleneck, stifling innovation and frustrating teams.
| Incorporating the principles of a Data Mesh enables those teams
| to manage their own data, according to well defined governance
| standards that enable interoperability.
|
| The reality is that different teams are already managing their
| own data (via Excel spreadsheets, web-apps, etc). If we can apply
| a bit more rigor to how these datasets are managed (e.g. so they
| can be shared, integrated, secured, etc), then the whole
| organisation benefits.
| teekert wrote:
| I think I'm experiencing this where I work. The Data Lake is
| quickly gaining traction and feature requests pour in: please
| incorporate FHIR genomics resources, please make a UI for this
| image type, please make import filters to extract metadata from
| these files... The team seems swamped now. The solution would
| be to give more power to the requesters? Allow them to access
| underlying technologies, implement their own data models? Seems
| logical. Am I understanding this correctly?
| siganakis wrote:
| Yes, you are understanding it correctly. The idea is that you
| give the "requesters" access to the data, then enable them to
| do their thing with it (with training / support / shadowing)
| and publish their results as "data-products" so that others
| can leverage it too in their own "data products".
|
| The "data mesh" is essentially the collection of these
| independent "data-products".
|
| We already see management problems with self-service
| analytics like PowerBI, Tableau & Looker. It's too easy for
| people to create dashboards / reports that are subtly wrong
| and which cause confusion. There is a balance between
| empowering people to build data products and centralised
| control. Too much empowerment of people who don't understand
| the right way to do something leads to a horrible mess of
| contradictory data. Not enough, and people can't effectively
| do their job. Governance and process are the key to finding
| the balance and enforcing it.
|
| The issue with the data-mesh is that there isn't really any
| great tooling to support the management or development of
| data products, or a data-mesh generally. I am sure this will
| change over time as vendors start building hype around it.
| barumrho wrote:
| Having just read this and Zhamak's article, it seems that there
| may be an incentive-alignment issue with this.
|
| I assume a lot of valuable data originates from customer-facing
| applications, so the team that already has a customer-facing
| product now has to manage a new internal-facing data product.
|
| My worry is that the data product won't get the love it deserves.
| imwillofficial wrote:
| I feel like I don't have the prerequisite knowledge to understand
| the article. Does anyone have any tips on where I can gain the
| foundational knowledge necessary?
| [deleted]
| swordsmith8 wrote:
| https://towardsdatascience.com/what-is-a-data-mesh-and-how-n...
| riccomini wrote:
| Zhamak's article is the canonical reference. It does a decent
| job of outlining the problem space:
|
| https://martinfowler.com/articles/data-monolith-to-mesh.html
___________________________________________________________________
(page generated 2021-07-29 23:00 UTC)