Shopify's Data Science and Engineering Foundations (2020)
Author : mooreds
Score : 102 points
Date : 2022-03-11 18:09 UTC (4 hours ago)
(HTM) web link (shopify.engineering)
| kevinsundar wrote:
| Having recently worked on a data team at FAANG, all this is an
| ops nightmare for the team running the platform itself if you
| want to ensure data quality for everyone querying the data. I'm
| talking about when you have hundreds of data sources and hundreds
| of query use cases.
|
| Anyone have any solutions you've tried?
| atwebb wrote:
| FAANG seems to be an outlier, but it sounds a lot like the
| enterprise data mart strategy covered under a mix of stuff from
| principle #1.
|
| If you want quality, you need structure and review. Accessible
| data is helpful and needed to develop some of the mature
| processes, but for most day to day analysis/reporting, no one
| wants to create their own data model from scratch.
|
| Lots of FAANG doesn't apply to any other companies so it may
| just be a case of having a wholly unique use case. Though I'm
| surprised there isn't something already in place at this point
| (of course having very little knowledge of the case). For the
| dims/facts/marts, they tend to be focused on business use cases
| rather than on sources/data, which can reduce the number of
| targets significantly, since business use cases tend to repeat
| (or rhyme).
| bushbaba wrote:
| Check out Apache Iceberg. It does a great job of handling many
| readers and few writers, with both data consistency and query
| consistency.
|
| It's a great approach for your data lake and data warehousing
| needs.
| faizshah wrote:
| This timetravel/rollback feature is really interesting:
| https://iceberg.apache.org/docs/latest/spark-queries/#time-t...
| xhevahir wrote:
| I've read stuff before about Shopify's use of Nix. Since this
| post doesn't mention Nix, I take it they don't use it in this
| department of the company?
| csears wrote:
| It sounds like they have data science and data engineering in one
| organization. Is that team structure something that others have
| seen work well?
| erulabs wrote:
| One of the most interesting bits of devops work I've done was
| when I was embedded with a data science team. Infrastructure
| for data science is just so different than traditional ops -
| but I feel like I was able to both help the team move more
| quickly and also prevent them from spending all of the
| company's money - so at least in that case, it worked quite
| well.
|
| I've never understood why data science teams are typically so
| far removed from "normal" engineering teams. Maybe it's the
| DevOps Kool-Aid speaking, but in my opinion, teams should be
| more horizontal than vertical!
| cromd wrote:
| I've been in orgs where they were on the same team, and on
| different teams, both as a modeler and a data engineer. So far,
| I personally prefer when they're on the same team.
|
| Pros of same-team: fewer ideas "lost in translation" between
| data scientists and data engineers, better understanding of
| which datasets/flows are top priority, can sometimes share some
| stack components and help data scientists improve their code,
| better chances of getting data scientists to contribute their
| own batch jobs (there's just more trust as opposed to dealing
| with some "engineering" team that is less connected to you)
|
| Cons of same-team: data engineers may not be as in-the-loop on
| what's happening with production datasets, may not be as
| tightly integrated with a devops team, may get overly caught up
| in "business logic" as opposed to "plumbing".
| quadrature wrote:
| Data scientists are embedded in product teams and data platform
| engineers are in a platform engineering org.
| thenipper wrote:
| I work with operations research teams in a blended model, with
| engineering embedded with the OR scientists. I really
| prefer it. Code can get to prod a lot quicker and we don't have
| the "throw it over the fence to engineering" issues that can
| arise.
| mooreds wrote:
| I liked how they took some of the essentials of software
| development (one set of tooling, DRY, re-use) and applied them
| to the data science arena.
| JHonaker wrote:
| I started this expecting to be disappointed, but I really like
| all of the principles they're describing. I've been pushing for
| more of this attitude at my own company.
___________________________________________________________________
(page generated 2022-03-11 23:00 UTC)