[HN Gopher] Launch HN: Metaplane (YC W20) - Datadog for Data
       ___________________________________________________________________
        
       Launch HN: Metaplane (YC W20) - Datadog for Data
        
       Hey HN! We're Kevin, Guru, and Peter from Metaplane
       (https://metaplane.dev). Metaplane is a data observability tool
       that continuously monitors your data stack, alerts you when
       something goes wrong, and provides relevant metadata to help you
       debug.

       Data teams are often the last to know about data-related issues.
       They commonly find out only when an executive messages them about
       a broken dashboard. This is comparable to finding out that your
       servers are down only when your end users report it! In software
       engineering, this problem is solved with observability tools like
       Datadog and SignalFx. These monitor your system over time by
       tracking metrics (like CPU, memory usage, or any arbitrary value)
       and sending alerts when those metrics hit thresholds or become
       anomalous.

       Metaplane solves this problem for data teams. We continuously
       monitor our users' data warehouse tables and columns, testing for
       things like row counts, freshness, cardinality, uniqueness,
       nullness, and statistical properties like mean/median/min/max, as
       well as schema changes. After we build up a baseline of data points
       for each of these tests, we send alerts on anomalies to the user's
       Slack channel. Each alert includes metadata like the upstream and
       downstream tables and BI dashboards affected by the issue, so that
       the user can assess how important the issue is and how quickly it
       should be addressed.

       We're particularly careful about alert fatigue and false
       positives. Since we can't ask users to set manual thresholds (they
       would be changing all the time), we have to make a reasonable
       prediction based on past data, which can result in both false
       positives and false negatives. If we under-alert, we miss
       important issues, but if we over-alert, users become desensitized
       and start ignoring alerts. Our solution is to include
       "Mark as anomaly" and "Mark as normal" buttons with each alert, so
       that users can provide feedback to the model.

       To give a common example, Metaplane can tell you that a revenue
       metric in a Snowflake column has spiked from $100 to $10,000 in an
       unexpected way. The alert includes the upstream dependencies in dbt
       and the downstream Looker dashboards that are impacted. Another
       example is a Redshift table that is usually updated every day but
       hasn't been updated in over 48 hours. A third example is a BigQuery
       table that typically grows by 10M rows a day but suddenly adds only
       1M because of an upstream vendor bug. These are all what we think
       of as "silent data bugs" -- all systems are green, but your data is
       just wrong!

       Over the last eight months, we've caught problems like these for
       data teams at dozens of companies, including Imperfect Foods,
       Drift, Vendr, Reforge, Air Up, Teachable, and Appcues.

       Today, we're excited to launch our self-serve product and free plan
       with the HN community. Setting up monitoring for your data stack
       takes less than 10 minutes. Here's a 4-minute demo video that shows
       how it works:
       https://www.loom.com/share/1aa54eb8b45548e180f6ab3a4a580cc5. We
       make money by charging for more tests and team/enterprise features.
       You can use our new free plan or try out all of our features in a
       30-day trial, no credit card required.

       Our goal is to help data teams of any size be the first to know
       about data issues. We think observability will become as much of a
       no-brainer for data teams as it is for software engineers today.
       Starting on AWS? Get Datadog. Bringing on Snowflake? Get a data
       observability tool (hopefully ours!). Eventually we want to support
       more use cases that you'd expect from a Datadog for data, like log
       centralization and diagnostics, spend monitoring, performance
       insights, and deep integration with upstream applications. For now,
       we're starting where the pain is highest.

       We'd love to hear your ideas, experiences, and feedback, and will
       be answering any questions in the comments!
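The post doesn't describe Metaplane's actual model, but the general idea of building a baseline from past values and alerting on outliers can be illustrated with a plain z-score. This is a minimal sketch under stated assumptions: the function name `is_anomalous`, the 3-sigma threshold, and the 7-point minimum are illustrative choices, and treating "Mark as anomaly" feedback as "drop that point from the baseline" is one plausible way to use such feedback, not necessarily how Metaplane does it.

```python
from statistics import mean, stdev


def is_anomalous(history, new_value, marked_anomalies=frozenset(),
                 z_threshold=3.0, min_points=7):
    """Return True if new_value is an outlier relative to past values.

    history: past values of one metric (e.g. daily row increments), oldest first.
    marked_anomalies: indices into history that a user flagged via an assumed
        "Mark as anomaly" action; they are left out of the baseline so a known
        incident does not widen the normal range.
    """
    baseline = [v for i, v in enumerate(history) if i not in marked_anomalies]
    if len(baseline) < min_points:
        return False  # not enough data points to trust a baseline yet
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return new_value != mu  # perfectly flat history: any change stands out
    return abs(new_value - mu) / sigma > z_threshold


# Daily row increments for a table that normally grows by about 10M rows.
history = [10_000_000, 10_200_000, 9_900_000, 10_100_000,
           9_800_000, 10_050_000, 9_950_000]
print(is_anomalous(history, 1_000_000))   # True: a sudden drop to 1M rows
print(is_anomalous(history, 10_100_000))  # False: within the normal range
```

With the BigQuery example from the post, a day that adds 1M rows instead of roughly 10M is flagged, while 10.1M is not. Real warehouse metrics often have weekly or seasonal cycles, which a single z-score over all past points does not account for.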
        
       Author : kzh_
       Score  : 117 points
       Date   : 2021-11-15 13:05 UTC (9 hours ago)
        
       | technobabbler wrote:
       | For those of us in smaller businesses, any chance of supporting
       | Google Analytics as an integration? The ability to detect
       | statistical anomalies across random pages/events would be a
       | godsend, especially for organizations that don't have a proper
       | data warehouse and dedicated BI staff.
       | 
       | I guess that'd be kinda a crossover between data integrity and
       | marketing... in this case, an anomaly IS the data, where an
       | unexpected increase/decrease in pageviews somewhere is something
       | we'd love an alert on, but only after some threshold. As you
       | pointed out, doing that manually just results in a bunch of false
       | positives for those of us who aren't professional BI or
       | statisticians.
        
         | kzh_ wrote:
         | Not yet, but eventually! We're focused on data in warehouses
         | and transactional DBs right now, just to limit the amount of
         | integrations we need to build to start. We definitely plan to
         | integrate with application sources like Google Analytics down
         | the line though. Upstream applications are ultimately the
         | sources of data truth, after all.
         | 
         | I wanted to +1 what you said about "organizations that don't
         | have a proper data warehouse and dedicated BI staff." At the
         | end of the day, a huge number of companies (maybe even most?)
         | don't have dedicated data teams but still want to be alerted
         | about data anomalies. Heck, we at Metaplane even fall
         | into that camp.
        
           | technobabbler wrote:
           | Sounds good, thank you!
        
         | jainaayush05 wrote:
         | Check out www.cliff.ai
        
           | technobabbler wrote:
           | Do they support Google Analytics? Nothing on their website or
           | release notes indicates this, as far as I could find.
        
       | langitbiru wrote:
       | I looked at the pricing page. It's a very big jump from $0 to
       | $400 per month. There is no middle ground. Something like $30 per
       | month. Any reasons?
        
         | kzh_ wrote:
         | We wanted to make Metaplane free for individuals and
         | approachable for teams, and found that $400/mo is justified by
         | the cost of engineering time saved and is comparable to other
         | paid tools that a smaller data team might use (like dbt,
         | Fivetran, Hightouch).
         | 
         | That said, we're still experimenting with pricing, and I can see
         | the argument for a tier that's better suited to individuals who
         | want a paid plan. We also want the free plan to be pretty
         | generous -- is there a specific constraint that feels too
         | limiting? Thanks for your feedback!
        
           | tomhallett wrote:
           | One note about the pricing page itself - I found the "black
           | vs grey" links for the two tiers of "Growth" pretty
           | confusing.
           | 
           | I incorrectly assumed that once I'm on the $400/mo Growth plan,
           | I get access to dbt/lineage. But those are only "checked" when
           | you pay for the $800/mo version of the Growth plan.
           | 
           | So it really feels like you have 4 plans: Free, Growth,
           | Business?, and Enterprise.
        
             | gurubavan wrote:
             | Thanks for the feedback on pricing -- it's definitely a
             | work in progress (and we'll make the visual distinction
             | clearer on our website). Our goal is to be as accessible as
             | possible, so we're flexible with the plans right now.
             | Pricing shouldn't be the reason that a team misses out on
             | data issues.
             | 
             | When you sign up, you'll have access to everything
             | immediately, so you can connect and try it out. When we
             | start enforcing the limits, we'll give you ample notice.
             | 
             | I'm curious if you have opinions on the plans and pricing.
             | Do the plans make sense as 1. individual 2. team for
             | warehouse 3. team for whole stack 4. enterprise?
        
       | cpach wrote:
       | Hi! Out of curiosity: Why post so early? It's only 5:48 am on the
       | US west coast.
        
         | kzh_ wrote:
         | We're based on the US east coast in Boston and just posted when
         | we woke up :)
        
         | 0des wrote:
         | We live in a global society.
        
         | HatchedLake721 wrote:
         | While it's 5:48am on the US west coast, it's 13:48 in the UK
        
           | cpach wrote:
           | UK? Bah! I'm waaay ahead of you guys ;-) (CEST)
        
           | samstave wrote:
           | And it's always 5pm in Ireland!
        
       | skadamat wrote:
       | Congrats on the launch! Any plans to integrate with Apache
       | Superset? https://superset.apache.org/
        
         | kzh_ wrote:
         | Thanks, and definitely! We're making a big push in the coming
         | months to keep building out our downstream BI integrations. The
         | Superset API is quite nice so we're looking forward to working
         | with it.
        
       | tomhallett wrote:
       | I'm building our company's first data platform right now
       | (Fivetran, dbt, Snowflake), so I'll definitely check this out!!
       | 
       | 1) Do you have Metabase on your roadmap? Lightdash?
       | 
       | 2) I see that you alert on schema changes, which is great. Can
       | you monitor for schema changes of a Postgres database? Reason I
       | ask: Fivetran (and others) will try to buffer some schema changes
       | from you to prevent data loss (drop columns, rename columns,
       | etc). There is some more complex nuance I have in mind here, but
       | it's a bit too long to type out on my phone :)
        
         | gurubavan wrote:
         | 1) An integration with Metabase Cloud is on our roadmap for Q1!
         | We'd love to integrate with Lightdash, but they don't have a
         | public API just yet[1].
         | 
         | 2) Several of our customers use us to alert on schema changes
         | in Postgres, specifically so they can get ahead of application
         | database changes that will end up in the warehouse, so you're
         | definitely not alone! Here's a link on how to connect Postgres:
         | https://docs.metaplane.dev/docs/postgres
         | 
         | That's an excellent stack and one we kept front and center when
         | building out Metaplane, so definitely let us know if you have
         | any feedback or suggestions here!
         | 
         | [1]: https://github.com/lightdash/lightdash/issues/632
        
           | tomhallett wrote:
           | All sounds great! I'll share it with my team.
           | 
           | My plan was to monitor the postgres database in the staging
           | environment, so we can be alerted to schema changes before
           | they are released into production (and hopefully stop the
           | production deploy).
           | 
           | I have a goal of moving this even further upstream into the
           | CI build for the source application itself (Ruby on Rails in
           | this case), so that the application's test suite will fail if
           | a developer introduces a breaking schema change. Note: this is
           | a pretty tricky problem to solve without a) the tests being
           | way too brittle OR b) super slow end-to-end tests. I'd like to
           | introduce something that is a mashup of: Spectacles [1], Pact
           | [2], and dbt models [3].
           | 
           | [1] https://www.spectacles.dev
           | [2] https://pact.io
           | [3] https://docs.getdbt.com/docs/building-a-dbt-project/using-so...
        
             | gurubavan wrote:
             | That sounds like a great plan. We're planning to build our
             | public API and CI/CD integrations early next year, so that
             | developers can know what the downstream impact of their
             | changes might be, and whether it could introduce unexpected
             | results. We may be able to slot right in there with Pact.
             | 
             | Mitigating the impact with monitoring is where we're at
             | right now, but we're with you that preventing errors can be
             | even more important.
             | 
             | If it's interesting to you, we're happy to open up a shared
             | slack channel to dig into the nuance as well! Just email me
             | (guru@metaplane.dev) with the email you'd like to be added.
        
               | tomhallett wrote:
               | Very cool. I'll reach out.
               | 
               | When Nick Schrock created dagster, he argued that many
               | "data cleaning" tasks which people attribute to "data
               | engineering" aren't actually "cleaning", but are
               | architecture problems. I believe schema changes also fall
               | into this category. I'm extremely new to data
               | engineering, but when I think about "What are the things
               | which will break this system?" an application engineer
               | thinking "I'm going to rename this column and my tests
               | pass, so this should be fine" will break things all the
               | time. (The same goes for dropping a column or changing a
               | one-to-many into a many-to-many.)
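The breaking changes described in this sub-thread (a renamed or dropped column, or a changed type) can be caught by diffing two schema snapshots, for example staging versus production, in a CI step. A minimal sketch, assuming each snapshot is a dict of column name to data type for one table (in Postgres you could build one from `information_schema.columns`); `diff_schemas` and `is_breaking` are hypothetical helpers, not part of Metaplane, Pact, or Spectacles. A plain diff cannot tell a rename from a drop plus an add, so a rename shows up as both.

```python
def diff_schemas(old, new):
    """Compare two snapshots of one table's schema.

    Each snapshot maps column name -> data type, e.g. built from
    SELECT column_name, data_type FROM information_schema.columns
    filtered to one table_schema and table_name.
    """
    dropped = sorted(set(old) - set(new))
    added = sorted(set(new) - set(old))
    retyped = sorted(c for c in set(old) & set(new) if old[c] != new[c])
    return {"dropped": dropped, "added": added, "retyped": retyped}


def is_breaking(changes):
    """Treat drops and type changes as breaking; new columns are usually safe.

    A rename appears as one dropped column plus one added column, so it is
    caught here as a drop.
    """
    return bool(changes["dropped"] or changes["retyped"])


production = {"id": "bigint", "email": "text",
              "created_at": "timestamp without time zone"}
staging = {"id": "bigint", "email_address": "text",
           "created_at": "timestamp with time zone"}

changes = diff_schemas(production, staging)
print(changes)
# {'dropped': ['email'], 'added': ['email_address'], 'retyped': ['created_at']}
print("breaking" if is_breaking(changes) else "ok")
# In a CI job you would exit non-zero here (e.g. sys.exit(1)) to block the deploy.
```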
        
       | jonathanbyrne wrote:
       | This sounds like a very cool product! Any plans on supporting
       | integration with Microsoft SQL Server?
        
         | kzh_ wrote:
         | Integrating with Microsoft SQL Server is definitely on our
         | roadmap in the coming months. If you're up to discuss your use
         | case, please reach out to team@metaplane.dev because we'd love
         | to explore building this integration for you!
        
       | bimil wrote:
       | Congrats on the launch! Data Quality is an important area that
       | our customers always ask about on Select Star
       | (https://selectstar.com). Looking forward to integrating with
       | you guys one day.
        
       | lrobinovitch wrote:
       | This might solve a problem I have right now - I'm persisting a
       | kafka topic to clickhouse using the clickhouse kafka engine and
       | realized that to get reliable monitoring on this pipeline I'll
       | have to roll my own service that polls clickhouse and sends
       | metrics to datadog, then write datadog monitors. Looking forward
       | to exploring Metaplane.
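The freshness part of a polling service like this can stay small if the expected update cadence is learned from past loads instead of hard-coded, which echoes the launch post's point that manual thresholds keep changing. A minimal sketch, assuming you can query recent load timestamps from the target table (for example from a hypothetical `ingested_at` column). The `is_stale` name and the 2x tolerance are illustrative, and sending the result to Datadog is left out.

```python
from datetime import datetime, timedelta, timezone
from statistics import median


def is_stale(update_times, now, tolerance=2.0):
    """Return True if the table has gone too long without an update.

    update_times: sorted, timezone-aware datetimes of past loads (at least two),
        e.g. pulled from a hypothetical ingested_at column.
    The expected interval is the median gap between past loads, so a table
    loaded daily with tolerance=2.0 is flagged after 48 hours.
    """
    gaps = [later - earlier for earlier, later in zip(update_times, update_times[1:])]
    expected = median(gaps)  # statistics.median works on timedelta values
    return now - update_times[-1] > tolerance * expected


# A table loaded once a day at 09:00 UTC.
loads = [datetime(2021, 11, d, 9, 0, tzinfo=timezone.utc) for d in range(8, 14)]
last = loads[-1]
print(is_stale(loads, now=last + timedelta(hours=25)))  # False
print(is_stale(loads, now=last + timedelta(hours=49)))  # True
```

Using the median gap rather than the mean keeps one unusually late load from stretching the expected interval.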
        
         | kzh_ wrote:
         | Thanks for sharing your use case :) We don't support Clickhouse
         | yet, but this is exactly the kind of problem that we want to
         | solve for data teams. You shouldn't need to write and deploy
         | code to monitor every metric or dataset that needs to be accurate
         | and timely.
         | 
         | Looking forward to hearing what you think, and please reach out
         | to team@metaplane.dev because we'd love to explore building
         | this integration for you!
        
       | i_like_waiting wrote:
       | Looks amazing, and timing on bringing the solution is great. Any
       | chance there will be on-prem version?
        
         | kzh_ wrote:
         | Great question! While right now we are focused on our cloud
         | native application, we do plan on supporting an on-prem version
         | for companies that require hosting themselves.
         | 
         | Our customers haven't required this so far because we've
         | tried to take security seriously from day one and Metaplane
         | doesn't store any customer data (just the metadata). We
         | received our SOC2 Type II report, support IP whitelisting,
         | SSH/Reverse SSH tunnels and are always exploring other
         | integration options like AWS PrivateLink.
         | 
         | That said, we definitely understand the need to keep even
         | metadata on-prem, so we plan on tackling that later next year.
        
           | gz5 wrote:
           | Congrats on the launch. Well done.
           | 
           | another security approach is to enable your customers to
           | close their inbound firewall ports and link listeners. this
           | helps cloud and on prem models have far stronger security.
           | 
           | example here (disclosure: i am a founder of the company
           | behind this solution) with both open source and OEM/SaaS
           | models:
           | 
           | https://github.com/openziti-incubator/zdbc (code for one
           | implementation - a wrapper around the JDBC drivers)
           | 
           | https://netfoundry.io/zero-trust-database-security/ (blog
           | post with links to developer example video, whitepaper, etc)
        
       | Etai wrote:
       | Congrats on the launch guys! Love the self service approach
       | you've taken to this problem
        
       | julee04 wrote:
       | Looks great! Congrats on the launch.
       | 
       | How does this differ from a data reliability platform like
       | Datafold? https://www.datafold.com/
       | 
       | And can this replace what https://atlan.com/ does as well?
        
         | kzh_ wrote:
         | Good question! Both Datafold and Atlan support data monitoring
         | as a secondary feature, but have different main focuses:
         | 
         | Datafold is primarily known for their Data Diff regression
         | testing that simulates the result of a PR on your data within a
         | CI/CD workflow. There's definitely a need for proactively
         | preventing data issues from occurring in the first place, but
         | issues introduced via code are only one subset of potential
         | data quality issues.
         | 
         | Metaplane is focused on catching the symptoms first via
         | continuous monitoring. Regression tests don't replace the need
         | for observability, and vice-versa.
         | 
         | Atlan is primarily known for their data workspace features that
         | make collaboration easier, like a data dictionary, SQL editor,
         | and governance.
         | 
         | Data collaboration is a huge unsolved problem and data
         | monitoring does play a role there. But Metaplane is focused
         | squarely on the problem of detecting data issues and giving you
         | relevant metadata to prioritize and debug.
        
           | julee04 wrote:
           | thanks for the reply! great breakdown of the space
        
       | atak1 wrote:
       | This is awesome! We did this in-house back in 2014, and it
       | quickly became an unmaintainable mess.
       | 
       | With dbt and Snowflake poised to take over this space, I can see
       | this fitting right in on top of these tools. One idea would be to
       | build a dbt integration into Metaplane.
       | 
       | I'm curious - how did you settle on the pricing? I can see it
       | being a differentiator from Datafold, Supergrain depending on
       | your feature set
        
         | kzh_ wrote:
         | Amazing how many companies use dbt + Snowflake right? Such a
         | different world from 2014...
         | 
         | Good idea, we actually do have a dbt integration that pulls in
         | lineage and job metadata from your dbt manifests:
         | https://docs.metaplane.dev/docs/dbt. Eventually we want to let
         | you configure Metaplane tests from your dbt YAML.
         | 
         | Pricing is still in flux to be honest. We wanted to start with
         | a price that was approachable for small teams, comparable to
         | other tools in your stack, and could be paid for without going
         | through a whole procurement process. But we're trying to stay
         | as flexible on pricing as possible!
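The dbt integration mentioned above pulls lineage from dbt manifests. dbt writes a `manifest.json` (under `target/` by default) whose `child_map` maps each node's unique ID to the IDs that depend on it, and that alone is enough to answer "what is downstream of this table?". A minimal sketch of that traversal, assuming the `child_map` layout of dbt's manifest; the node IDs in the example are made up, and this is not a description of how Metaplane's integration actually works. Check the key names against the manifest version your dbt release produces.

```python
import json
from collections import deque


def load_child_map(path="target/manifest.json"):
    """Read the node -> children mapping from a dbt manifest file."""
    with open(path) as f:
        return json.load(f)["child_map"]


def downstream(child_map, node_id):
    """Return every node that depends, directly or indirectly, on node_id."""
    seen = set()
    queue = deque([node_id])
    while queue:  # breadth-first walk over the dependency graph
        for child in child_map.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical project: raw orders -> staging model -> revenue model -> dashboard.
child_map = {
    "source.shop.raw.orders": ["model.shop.stg_orders"],
    "model.shop.stg_orders": ["model.shop.fct_revenue"],
    "model.shop.fct_revenue": ["exposure.shop.revenue_dashboard"],
    "exposure.shop.revenue_dashboard": [],
}
print(sorted(downstream(child_map, "source.shop.raw.orders")))
# ['exposure.shop.revenue_dashboard', 'model.shop.fct_revenue', 'model.shop.stg_orders']
```

The `seen` set also keeps the walk from looping forever if a malformed map ever contains a cycle.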
        
       | lettergram wrote:
       | My team has worked on a library for a similar purpose:
       | 
       | https://github.com/capitalone/DataProfiler
       | 
       | Load any document, profile and monitor the profiles for changes
       | that would impact downstream applications.
       | 
       | Very common problem, you all are in a great space! Very
       | interested and will check out!
        
         | kzh_ wrote:
         | Nice work! Love the support for merging profiles together and
         | also profiling unstructured data. I could've used this all the
         | time back in my research days, instead of having to do the
         | profiling by hand and from scratch every time.
         | 
         | We definitely want to explore suggesting data tests based on
         | profiling. Don't be surprised if you see a fork!
        
       | joelschw wrote:
       | How do you compare yourselves to Monte Carlo Data?
        
         | kzh_ wrote:
         | The main difference is how we get into the hands of data teams.
         | To use Monte Carlo you'll need to book a demo with their sales
         | team to see the product, go through an implementation process,
         | and potentially pay as much for a data monitoring tool as you
         | do for your database.
         | 
         | We want every data team to have observability as soon as
         | they have data in a warehouse, and that means: 1) letting you
         | implement the tool within 10 minutes, without talking to us, 2)
         | providing a free plan and paid plans that make sense for modern
         | data teams, and 3) focusing on being as helpful as we can with
         | as little configuration as possible.
         | 
         | Monte Carlo has built a strong team and has done a great job
         | telling the story of data observability for larger companies.
         | Overall we want to support even the smallest data teams, who we
         | feel are being underserved by other companies in this space.
        
       | Grimm1 wrote:
       | Congrats Kevin and Guru on the launch. Happy to see this on here,
       | I was wondering when you'd do your Launch HN. These are real
       | problems that a lot of data teams have and I'm glad to see this
       | making it into the wild!
        
         | kzh_ wrote:
         | Thanks Ian -- our earlier conversations definitely helped shape
         | our thinking about this space!
        
       | dwolchon wrote:
       | I am a huge fan of this team and their tool. We already use it
       | and it has caught a bunch of issues before they became bigger
       | problems.
       | 
       | We've already had those "wow I'm glad we have this tool" moments,
       | just a couple of months in.
        
       | bdcravens wrote:
       | To be the Datadog for anything, you need a very aggressive sales
       | team who are willing to hold customers financially hostage in
       | exchange for C-grade customer service. (Though their technology
       | is certainly acceptable) Be better than Datadog.
        
         | bdcravens wrote:
         | Also you need to ensure the price of the monitoring solution is
         | more expensive than the resource being monitored.
        
       | mohzilla wrote:
       | Congrats on the launch! How does this differ from Bigeye or
       | Anomalo?
        
         | kzh_ wrote:
         | Our customers think of Bigeye, Anomalo, and Monte Carlo very
         | similarly (needing to go through a sales process, spending
         | quite a bit of money), so this answer to a previous question
         | about Monte Carlo might be useful:
         | https://news.ycombinator.com/item?id=29228070 (linking to avoid
         | redundancy)
        
       | theboat wrote:
       | Congrats on the launch! The Metaplane team is great.
        
         | gurubavan wrote:
         | Thank you! appreciate your support :)
        
       | samstave wrote:
       | I love this, and the comment you made about getting up in 10 mins
       | with any size team.
       | 
       | e.g. we are just 2 of us - yet need this.
       | 
       | Also, apologies - but I literally hear "Meatplane" in my head
       | every time I read 'Metaplane' (but I do like the name - Suckit
       | Zuck, Metaplane is just way more meta-ier than 'Meta').
        
         | kzh_ wrote:
         | Haha you're definitely not alone -- this happened so much we
         | ended up getting http://meatplane.com. Maybe we'll consider a
         | rebrand in the near future :)
        
           | samstave wrote:
           | So click me and pay for me
           | 
           | Tell me that you'll data me
           | 
           | Watch me like you'll never let me down
           | 
           | 'Cause I'm Aler-tin' on a meat-plane
           | 
           | Don't know when I'll be backed-up again
           | 
           | Oh Ops, I hate to go
           | 
           | https://youtu.be/SneCkM0bJq0?t=34
        
       ___________________________________________________________________
       (page generated 2021-11-15 23:01 UTC)