hngopher.com

       [HN Gopher] What even is data mesh
       ___________________________________________________________________
        
       What even is data mesh
        
       Author : riccomini
       Score  : 72 points
       Date   : 2021-07-29 17:54 UTC (5 hours ago)
        
 (HTM) web link (cnr.sh)
 (TXT) w3m dump (cnr.sh)
        
       | ryanmaclean wrote:
       | Why the title change?
        
       | zwkrt wrote:
       | In my experience at three large companies, any project where one
       | part of the organization wants "the data" from another is
       | actually just a power grab at the mid-manager level. When I hear
       | "accounting wants direct access to the inventory data", I
       | interpret it as: "the accounting manager thinks the inventory
       | team is slow or incompetent, and thinks that if her own team had
       | the underlying inventory data directly accessible, she could cut
       | out the middle man!"
       | 
       | The problem of course is that data has to be interpreted, and
       | often that interpretation is complicated. After all, that is why
       | we write programs and don't just query/insert into databases
       | directly from the terminal. Most "data" is inextricably tied to
       | the programs that interact with it, and freeing the data
       | without making the complexities of those programs known leaves
       | both organizations open to horrible bugs.
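
To make the point about interpretation concrete, here is a hypothetical illustration (the row layout, column names and rules are invented, not taken from any real system): the stored quantity is counted in packs rather than single items, and rows the application has soft-deleted still sit in the table. A consumer summing the raw column gets a different answer from the program that owns the data.

```python
from dataclasses import dataclass

# Hypothetical raw rows as they might sit in an inventory database.
# What each column means is defined by the application, not by the table.
@dataclass
class InventoryRow:
    sku: str
    qty: int        # counted in packs of `pack_size`, not in single items
    pack_size: int
    deleted: bool   # soft-delete flag: the application ignores these rows

ROWS = [
    InventoryRow("A-1", 3, 12, False),
    InventoryRow("A-1", 5, 12, True),   # superseded record, soft-deleted
    InventoryRow("B-7", 10, 1, False),
]

def naive_total(rows):
    """What a team querying the table directly might compute."""
    return sum(r.qty for r in rows)

def app_total(rows):
    """What the owning application means by 'items on hand'."""
    return sum(r.qty * r.pack_size for r in rows if not r.deleted)

print(naive_total(ROWS))  # 18
print(app_total(ROWS))    # 46
```

Neither rule is visible in the schema itself; it lives in the program, which is exactly the knowledge that gets lost when the data is handed over on its own.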
        
       | swordsmith8 wrote:
       | Data mesh case studies: https://medium.com/intuit-
       | engineering/intuits-data-mesh-stra... &
       | https://towardsdatascience.com/from-0-to-data-mesh-kolibri-g...
        
       | jgraettinger1 wrote:
       | Estuary Flow [1] may be interesting to those in this space.
       | 
       | We're still building, but it's a GitOps workflow tool that
       | tightly integrates schema definition (JSON Schema), captures from
       | and materializations to your systems and SaaS tools, rich
       | transformations, catalog and provenance metadata tracking,
       | built-in testing, and a managed runtime. All with sub-second
       | latency.
       | 
       | Flow's runtime uses nascent but really promising open protocols
       | for building connectors to the myriad systems and APIs out there.
       | We see Airbyte's work (itself built on Singer) as the most
       | promising step in this direction and are leaning into that
       | effort ourselves.
       | 
       | [1] github.com/estuary/flow
        
         | brunoqc wrote:
         | I have a dumb question. Could I use Flow to import a text file
         | into a PostgreSQL database? The text file is not append-only.
         |
         | There are a lot of tools to import logs into things like Kafka,
         | but not to import whole files (that can change) into a database.
        
           | jgraettinger1 wrote:
           | Yep. You can, for example, have it watch file(s) in S3; every
           | time a file changes, it will flow the file's records into a
           | table it creates in your DB, keyed on your (arbitrary)
           | primary key.
        
             | brunoqc wrote:
             | Any way to watch local files too? S3 might be fine, but
             | just asking.
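
The reply above describes re-flowing a changed file's records into a table keyed on a primary key. Flow's own implementation isn't shown in the thread; the following is only a generic, minimal sketch of that idea for a local file that gets rewritten rather than appended to. It assumes a CSV file with hypothetical `id` and `value` columns, and uses Python's built-in `sqlite3` as a stand-in for PostgreSQL: the `INSERT ... ON CONFLICT (id) DO UPDATE SET value = excluded.value` upsert has the same shape in both (SQLite needs version 3.24 or newer; a PostgreSQL driver such as psycopg would use `%s` placeholders instead of `?`). The function name `sync_file` and table name `records` are made up for illustration.

```python
import csv
import io
import sqlite3

def sync_file(conn: sqlite3.Connection, text: str) -> None:
    """Mirror the full contents of a CSV file (columns: id,value) into `records`.

    The file may be edited anywhere, not only appended to, so each sync
    upserts every row on its primary key and deletes keys that disappeared.
    """
    rows = [(r["id"], r["value"]) for r in csv.DictReader(io.StringIO(text))]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records (id TEXT PRIMARY KEY, value TEXT)"
    )
    with conn:  # apply the whole sync as one transaction
        # Upsert: insert new keys, overwrite the value of existing keys.
        conn.executemany(
            "INSERT INTO records (id, value) VALUES (?, ?) "
            "ON CONFLICT(id) DO UPDATE SET value = excluded.value",
            rows,
        )
        # Delete rows whose key no longer appears anywhere in the file.
        existing = {key for (key,) in conn.execute("SELECT id FROM records")}
        stale = existing - {key for key, _ in rows}
        conn.executemany("DELETE FROM records WHERE id = ?", [(k,) for k in stale])

# Usage: the second version of the file drops row 1, edits row 2, adds row 3.
conn = sqlite3.connect(":memory:")
sync_file(conn, "id,value\n1,a\n2,b\n")
sync_file(conn, "id,value\n2,B\n3,c\n")
print(sorted(conn.execute("SELECT id, value FROM records")))
# [('2', 'B'), ('3', 'c')]
```

Re-reading the whole file on every change is the simplest way to handle non-append-only input; the cost is a full scan per change, which is the trade-off against log-tailing tools that only read newly appended lines.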
        
       | siganakis wrote:
       | In my experience, the core driver behind the data mesh
       | architecture is organisational, not technological. Organisations
       | are demanding more from their data, be it for rapid product
       | development or self-service analytics. Often this involves a
       | large number of sources (e.g. external sources), rather than
       | just larger volumes of the same thing.
       | 
       | If marketing, finance and sales are dependent on a centralised
       | data team for every new thing, the data team quickly becomes the
       | bottleneck, stifling innovation and frustrating teams.
       | Incorporating the principles of a data mesh enables those teams
       | to manage their own data according to well-defined governance
       | standards that enable interoperability.
       | 
       | The reality is that different teams are already managing their
       | own data (via Excel spreadsheets, web apps, etc). If we can apply
       | a bit more rigor to how these datasets are managed (e.g. so they
       | can be shared, integrated, secured, etc), then the whole
       | organisation benefits.
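
One way to picture "well-defined governance standards" is as a check that every team-owned dataset must pass before it is shared. The sketch below is hypothetical: the descriptor fields, the allowed classifications and the allowed column types are invented examples of rules an organisation might adopt, not part of any data mesh specification or tool.

```python
from dataclasses import dataclass, field

# Hypothetical organisation-wide rules every published dataset must satisfy.
ALLOWED_CLASSIFICATIONS = {"public", "internal", "confidential"}
ALLOWED_TYPES = {"string", "integer", "decimal", "date", "boolean"}

@dataclass
class DatasetDescriptor:
    name: str
    owner_team: str
    classification: str
    schema: dict = field(default_factory=dict)  # column name -> type name

def governance_violations(ds: DatasetDescriptor) -> list:
    """Return the rules a team-owned dataset breaks (empty list = shareable)."""
    problems = []
    if not ds.owner_team:
        problems.append("no owning team")
    if ds.classification not in ALLOWED_CLASSIFICATIONS:
        problems.append(f"unknown classification {ds.classification!r}")
    if not ds.schema:
        problems.append("no declared schema")
    for col, typ in ds.schema.items():
        if typ not in ALLOWED_TYPES:
            problems.append(f"column {col!r} has non-standard type {typ!r}")
    return problems

sales = DatasetDescriptor("sales.monthly_revenue", "sales", "internal",
                          {"month": "date", "revenue": "decimal"})
budget = DatasetDescriptor("finance.budget_xlsx", "", "secret",
                           {"amount": "currency"})
print(governance_violations(sales))   # []
print(governance_violations(budget))
```

Teams stay free to build whatever they like internally; only what they publish has to meet the shared rules, which is what lets other teams integrate it.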
        
         | teekert wrote:
         | I think I'm experiencing this where I work. The Data Lake is
         | quickly gaining traction and feature requests pour in: please
         | incorporate FHIR genomics resources, please make a UI for this
         | image type, please make import filters to extract metadata from
         | these files... This team now seems swamped. Would the solution
         | be to give more power to the requesters? Allow them to access
         | the underlying technologies and implement their own data models?
         | That seems logical. Am I understanding this correctly?
        
           | siganakis wrote:
           | Yes, you are understanding it correctly. The idea is that you
           | give the "requesters" access to the data, then enable them to
           | do their thing with it (with training / support / shadowing)
           | and publish their results as "data products" so that others
           | can leverage them in their own "data products".
           |
           | The "data mesh" is essentially the collection of these
           | independent "data products".
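
Since each data product may be built from other teams' products, the mesh can be pictured as a registry of products plus the dependencies between them. The following is a minimal sketch under that assumption; the `DataProduct` type, the product names and the `lineage` helper are all hypothetical. Walking the dependency graph answers a common governance question: which upstream products does this one rely on?

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataProduct:
    name: str
    owner: str
    upstream: tuple = ()  # names of other data products this one is built from

def lineage(mesh: dict, name: str) -> set:
    """Return every data product that `name` transitively depends on."""
    seen = set()
    stack = list(mesh[name].upstream)
    while stack:
        dep = stack.pop()
        if dep not in seen:          # skip products already visited
            seen.add(dep)
            stack.extend(mesh[dep].upstream)
    return seen

# Hypothetical mesh: each team publishes its own product, some reuse others'.
products = [
    DataProduct("crm.customers", "sales"),
    DataProduct("erp.invoices", "finance"),
    DataProduct("finance.revenue_by_customer", "finance",
                ("crm.customers", "erp.invoices")),
    DataProduct("marketing.churn_risk", "marketing",
                ("finance.revenue_by_customer",)),
]
mesh = {p.name: p for p in products}

print(sorted(lineage(mesh, "marketing.churn_risk")))
# ['crm.customers', 'erp.invoices', 'finance.revenue_by_customer']
```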
           | 
           | We already see management problems with self-service
           | analytics tools like Power BI, Tableau and Looker. It's too
           | easy for people to create dashboards / reports that are subtly
           | wrong and that cause confusion. There is a balance between
           | empowering people to build data products and centralised
           | control. Too much empowerment of people who don't understand
           | the right way to do something leads to a horrible mess of
           | contradictory data. Not enough, and people can't effectively
           | do their job. Governance and process are the key to finding
           | that balance and enforcing it.
           | 
           | The issue with the data mesh is that there isn't really any
           | great tooling to support the management or development of
           | data products, or of a data mesh generally. I am sure this will
           | change over time as vendors start building hype around it.
        
       | barumrho wrote:
       | Having just read this and Zhamak's article, I suspect there may
       | be an incentive-alignment issue here.
       |
       | I assume a lot of valuable data originates from customer-facing
       | applications, so the team that already owns a customer-facing
       | product now also has to manage a new internal-facing data product.
       | 
       | My worry is that the data product won't get the love it deserves.
        
       | imwillofficial wrote:
       | I feel like I don't have the prerequisite knowledge to understand
       | the article. Does anyone have tips on where I can gain the
       | necessary foundational knowledge?
        
         | swordsmith8 wrote:
         | https://towardsdatascience.com/what-is-a-data-mesh-and-how-n...
        
         | riccomini wrote:
         | Zhamak's article is the canonical reference. It does a decent
         | job of outlining the problem space:
         | 
         | https://martinfowler.com/articles/data-monolith-to-mesh.html
        
       ___________________________________________________________________
       (page generated 2021-07-29 23:00 UTC)