[HN Gopher] Launch HN: Mozart Data (YC S20) - One-stop shop for ...
       ___________________________________________________________________
        
       Launch HN: Mozart Data (YC S20) - One-stop shop for a modern data
       pipeline
        
       Hi HN, we're Pete and Dan, and together with our team we've built
       Mozart Data (https://www.mozartdata.com/), a tool to get companies
       started on collecting and organizing data to help drive better
       decisions. Mozart is a "modern data stack" -- we set up and manage
       a (Snowflake) data warehouse, automate ETL pipelines, provide an
       interface to schedule and visualize data transformations, and
       connect whatever data-visualization tool you want to use. For most
       teams, you can be querying data from your SaaS tools and databases
       in under an hour.
        
       Ten years ago, we started a hot sauce company, Bacon Hot Sauce,
       together. But more relevantly, we have spent the last two decades
       building data pipelines at startups like Clover Health, Eaze,
       Opendoor, Playdom, and Zenefits. For example, at Yammer, we built
       a tool called "Avocado," which was our end-to-end analysis tool
       chain -- we loaded data from our production database and relevant
       SaaS tools like Salesforce, we scheduled data transformations
       (similar to Airflow), and we had a front-end BI tool where we
       wrote and shared queries and dashboards. Today, Avocado is
       effectively two tools: Mozart Data and Mode Analytics (a
       collaborative analytics tool). We have basically been building
       similar data tools for years (though the names and underlying
       technologies have changed).
        
       Dan & I decided to build a product to bring the same tools and
       technology to earlier-stage companies (so that you don't need to
       make an early hire in data engineering). We've built a platform
       where business users can load data and create & schedule
       transformations with just SQL, wrapped in an interface anyone can
       use -- no Python, no Jinja, no custom language. We connect to over
       150 SaaS tools and databases; most just need credentials to send
       data to Mozart. There is no need to define DAGs (we parse your SQL
       transforms to automatically infer the way data flows through the
       pipeline). Mozart does the rote and cumbersome data engineering
       that typically takes a while to set up and maintain, so that you
       can tackle the problems your company is uniquely suited to solve.
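        
       To make this concrete, here is a rough sketch of what a pair of
       transforms might look like (the table and schema names here are
       made up purely for illustration). Because the second transform
       selects from the first, the dependency between them -- and the
       order they should run in -- can be inferred from the SQL itself:
        
           -- transform: orders_clean
           select id, customer_id, amount, created_at::date as order_date
           from raw.shopify_orders
           where status <> 'cancelled';
        
           -- transform: daily_revenue
           select order_date, sum(amount) as revenue
           from transforms.orders_clean
           group by order_date;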
        
       Most data companies have focused on a single slice of the data
       pipeline (ETL, warehousing, BI). The maturation of data tools over
       the last decade means now is the time to combine them into a
       single, easy-to-use solution accessible to data scientists and
       business operations alike. We believe there is immense value in
       centralizing and cleaning your data, as well as setting up the
       core tables for downstream analysis in your BI tool. Customers
       like Rippling, Tempo, & Zeplin use Mozart to automate key metrics
       dashboards, calculate customer acquisition cost (CAC) and lifetime
       value (LTV), or identify customers at risk of churn. We want to
       empower the teams -- like revenue and sales ops -- that have a lot
       of data and know what they want to do with it, but don't have the
       engineering bandwidth to execute on it.
        
       Try us out and see for yourself -- you can sign up
       (https://app.mozartdata.com/signup) and immediately start loading,
       querying, cleaning, and analyzing your data in Mozart. We offer a
       free 14-day trial (no credit card required). After the free trial,
       we charge metered pricing based on compute time used and data
       ingested. We'd love to hear about your experiences with data
       pipelines and any ideas/feedback/questions you might have about
       what we're building.
        
       Author : pfduke02
       Score  : 71 points
       Date   : 2021-05-17 12:35 UTC (10 hours ago)
        
       | dvt wrote:
       | Congrats on the launch! I don't mean to hijack this thread, but
       | as a day-to-day data engineer, I can't help but think that even
       | though this explosion of ETL solutions is undeniably helpful, it
       | doesn't really get to the root of the problem. These
       | days, you've got every company -- from small startups to large
       | corps -- warehousing data. But the real value proposition isn't
       | just having access to that raw data, but rather drawing insights
       | out of it.
       | 
       | I'm not sure this is even doable without a dedicated data
       | scientist, but a potential solution is a two-way marketplace that
       | connects companies with data scientists to help make heads or
       | tails of the data they're storing. Otherwise, it's just sitting
       | in a data lake somewhere. (Not sure if something like this exists
       | already, I'm just thinking out loud.)
        
         | pfduke02 wrote:
         | I'm in strong agreement with part of this -- for a company to
         | get value out of its data, you want someone skilled at
         | cleaning the data, cutting it properly, and teasing out the
         | insights. Where I disagree: that person can be a data
         | scientist, but doesn't need to be. I believe there is a
         | growing population of data-savvy employees without that title
         | -- many might not even have "data" in their title at all (they
         | are in business operations, marketing, finance, and sales) --
         | who write SQL and are very comfortable manipulating data in BI
         | tools, R, Python, Excel, or GSheets.
         | 
         | I also believe that company context matters a lot. I think so
         | much of getting started with extracting value from data is
         | getting up the learning curve of understanding what it means
         | (which columns have the truth). One of the reasons that we
         | don't have a lot of canned reports is that understanding these
         | edge cases within a company often matters a lot (and not
         | accounting for that nuance can often lead to wrong inferences).
         | With this in mind, the explosion of ETL solutions and products
         | like Mozart Data means that others at the company can
         | specialize in their business context, as opposed to needing
         | someone who can do all aspects of data including engineering,
         | data science, analysis, and communicating/presenting it.
        
         | shoo wrote:
         | > connects companies with data scientists to help make heads or
         | tails of the data they're storing
         | 
         | the consulting "data scientist" is likely able to do a better
         | job if they have experience with the idiosyncrasies of the
         | individual company's operations. If you get a fresh data
         | scientist every time, they need to repeat the ramp-up period
         | before they are in a position to maybe add value.
         | 
         | This suggests a model where the company keeps the same
         | consultant on retainer and brings them on board each time a
         | situation pops up where the consultant may be able to assist.
         | 
         | (This isn't a particularly novel suggestion; the same
         | suggestion is made in a 60s/70s-era thesis investigating how
         | applicable operations research is to small businesses.)
        
       | theboat wrote:
       | do you run dbt under the hood, or did you create your own
       | transformation layer solution?
        
         | pfduke02 wrote:
         | We have created our own transformation layer solution, which
         | includes scheduling, run & version history, and lineage; we do
         | not use dbt under the hood. We share a philosophy of being able
         | to write transforms in SQL one layer above the BI tool -- this
         | leads to greater consistency of downstream answers and allows
         | business users and analysts to write the business logic into
         | the core tables.
        
       | sbr464 wrote:
       | Could you clarify how you are working with Fivetran for (some?)
       | of the integrations?
       | 
       | Are you partnered with them or would there be additional Fivetran
       | fees if an integration went through them? I noticed when clicking
       | on the Xero integration.
        
         | pfduke02 wrote:
         | We partner with and use PBF (Powered by Fivetran) for some
         | connectors, which we believe are best in class. In addition, we
         | are using Singer Taps and have also custom built some
         | connectors. There are no additional fees for extract-transform-
         | load, whether Fivetran or any other ETL service (we cover
         | those). The main additional cost is a BI tool, though there
         | are a number of free options to connect to.
        
           | sbr464 wrote:
           | Thank you.
        
       | carlineng wrote:
       | Reminds me of Panoply, which had native SaaS integrations, a
       | managed Redshift instance on the back-end, and a BI layer on top.
       | Basically a fully turn-key "modern data stack" [1]. The stack is
       | way easier to operate than it has ever been before, but still
       | requires folks with expertise to manage each of the components.
       | 
       | [1] https://blog.getdbt.com/future-of-the-modern-data-stack/
        
       | BugsJustFindMe wrote:
       | Can I self host? So many of these services assume that I'm
       | interested in sending my data to a third party, but I'm not. I
       | want a tool, not a service.
        
         | dsil wrote:
         | Sorry, we don't support self-hosting yet.
        
       | zomglings wrote:
       | These are not idle questions:
       | 
       | 1. What do multi-source joins look like?
       | 
       | 2. How expensive are they as a function of the sizes of the
       | "tables" being joined?
        
         | dsil wrote:
         | I should clarify, step 1 in most pipelines is pulling data out
         | of the sources and replicating it in Snowflake. Then a multi-
         | source join is a normal ANSI SQL join on literal tables in
         | different schemas of the same database, not "tables".
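         | 
         | For example (schema and table names here are made up for
         | illustration), joining Salesforce and Stripe data that has
         | landed in two schemas of the same Snowflake database is just:
         | 
         |     select a.id as account_id, sum(c.amount) as total_charges
         |     from salesforce.accounts a
         |     join stripe.charges c on c.customer_id = a.stripe_customer_id
         |     group by a.id;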
         | 
         | (Some call this model "ETLT", where the first ETL part is just
         | moving data from APIs or other databases into a shared db, and
         | the extra "T" is joining that data across sources or otherwise
         | organizing it in useful ways.)
        
           | zomglings wrote:
           | Thank you for your clarification.
        
       | mpeg wrote:
       | As a data professional, I've got to admit I'm having difficulty
       | differentiating your service from any of a number of other
       | similar offerings like Stitch or Supermetrics.
       | 
       | You say you charge metered pricing, but this information seems
       | to be missing from your site. I understand it's hard to price a
       | new product, but I personally need to know pricing before I can
       | recommend a product to a client - the more available this
       | information is, the easier it is to compare you to others.
       | 
       | I do like the SQL transforms; they don't replace DAG
       | orchestration tools like Airflow, but it's a very nice feature
       | that covers a lot of what companies with basic data needs will
       | want.
        
         | pfduke02 wrote:
         | ETL tools like Stitch provide similar & critical
         | functionality. We do that too, and we also host/store the data
         | and offer SQL transforms. This enables teams to put together a
         | data pipeline with just one tool.
         | 
         | In terms of pricing, we charge by monthly active rows (MAR)
         | and compute time. An introductory package with 500k MAR and
         | 500k compute seconds costs $1,000/month, but we try to tailor
         | to individual company needs.
        
       | jerrytsai wrote:
       | Riffing off of BugsJustFindMe's comment, what kind of
       | privacy/security can you offer at this time? I would love to use
       | a tool like Mozart, but I work with data that contains protected
       | health information (PHI). PHI requires a greater degree of
       | privacy. People working with proprietary financial information
       | have similar concerns.
        
         | dsil wrote:
         | I have a lot of experience with this coming from a background
         | in healthcare. We are not HIPAA compliant yet, so that might be
         | a dealbreaker for some.
         | 
         | There are workarounds: e.g. for database connectors (and some
         | other connectors) we let you specify which
         | schemas/tables/columns to sync, so you can choose not to sync
         | PII columns (or hash them) and still get a ton of value from
         | the other data and/or aggregates.
         | 
         | And not for PHI, but some of our customers pull all their data
         | into Mozart, write data transformations within Mozart to
         | redact sensitive data, then use role-based access control to
         | give the rest of the company full access to the redacted
         | tables, while only certain people have access to the full
         | data.
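         | 
         | As a rough sketch in plain Snowflake SQL (the table, column,
         | and role names here are hypothetical), that pattern looks
         | something like a redaction transform followed by a grant:
         | 
         |     -- transform: users_redacted
         |     select id, sha2(email) as email_hash, signup_date, plan
         |     from raw.users;
         | 
         |     grant select on table transforms.users_redacted
         |       to role analyst;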
         | 
         | That said, the security of our customers' data is our top
         | priority regardless of what type of data it is. We're currently
         | in the process of being audited for SOC 2 Type 2.
        
       | mattmarcus wrote:
       | We just started using Mozart at Modern Treasury (S18), and have
       | been really happy so far. We didn't want to spend a ton of time
       | setting up data tooling, so we liked that we could use Mozart to
       | get up and running really fast. All we've had to do is write our
       | transforms, and things like snapshotting, scheduling, etc. are
       | taken care of for us. Pete, Dan, and team have been really
       | responsive to our questions and good partners. At first I was a
       | bit skeptical and we were just going to do Snowflake+Fivetran
       | ourselves. But after talking to some of their larger customers, I
       | was convinced that (a) it would save us time and (b) it could
       | scale with us.
        
         | dmull wrote:
         | Thanks Matt, Modern Treasury is an ideal customer. They have
         | all the chops to build the right data stack, but they're laser
         | focused on their core business. Great to be working together.
        
       | shrig94 wrote:
       | Shri here - former YC founder and previously at Eaze. Peter
       | Fishman (founder of Mozart) is the most intelligent and pragmatic
       | data leader I've met and was a joy to work with. If you're
       | looking to set up your data stack, I can say wholeheartedly that
       | you're in good hands with Pete. :)
        
       | mrwnmonm wrote:
       | From the description, I thought you host/store the data, and
       | provide analytics visualization too. But it looks like you move,
       | transform, and sync data. Doesn't this exist already?
        
         | dsil wrote:
         | Thanks for the feedback -- maybe we need to make that clearer.
         | We do host/store the data - under the hood we're using
         | Snowflake for warehousing - but we don't currently provide
         | visualizations. Once your data is organized, most people hook
         | up a BI tool and/or export to Excel/GSheets.
         | 
         | Components of this certainly already exist; we're trying to
         | put it all together in a single platform and make this
         | functionality easier to use.
        
           | mrwnmonm wrote:
           | Got it. Love the name btw. Good luck!
        
             | dsil wrote:
             | Thanks!
        
             | pfduke02 wrote:
             | Thanks! We couldn't resist a good pun on "data
             | orchestration."
        
       | satyrnein wrote:
       | Could you contrast your offering with a Stitch/Snowflake/dbt
       | setup?
        
         | pfduke02 wrote:
         | Functionality-wise that stack would be very similar! A core
         | design principle of ours is that you should be able to have
         | the power of a modern data platform even if all you know is a
         | bit of SQL. So our product is functionally similar to a stack
         | like Stitch+Snowflake+dbt (and we use some of those under the
         | hood), but we try to wrap it all in an easier-to-use interface
         | (e.g. snapshotting a table typically requires a few lines of
         | config code, whereas in Mozart you just flip a toggle), and be
         | more cost-competitive for smaller orgs.
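         | 
         | For reference, the "few lines of config code" in dbt is a
         | Jinja-wrapped SQL snapshot block, roughly like this
         | (illustrative table and column names):
         | 
         |     {% snapshot orders_snapshot %}
         |     {{ config(target_schema='snapshots', unique_key='id',
         |               strategy='timestamp', updated_at='updated_at') }}
         |     select * from raw.orders
         |     {% endsnapshot %}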
        
       | chrisfrantz wrote:
       | I'm really impressed with the list of data sources (120+) you
       | have at launch. How long did you spend integrating each of these
       | tools?
       | 
       | Having some experience here, I can say that this is typically not
       | a quick process since it depends so much on third parties, so
       | it's really cool you have such a large library of connectors.
        
         | pfduke02 wrote:
         | Thanks! As mentioned in other comments, we partner with and use
         | PBF (Powered by Fivetran) for connectors we believe are best in
         | class. We are committed to ETL reliability, and we believe
         | that ease of use/setup and automatically managing changes are
         | critical for success. In addition to PBF, we leverage Singer
         | Taps, and our team is adding to the long tail of connectors.
        
       ___________________________________________________________________
       (page generated 2021-05-17 23:01 UTC)