[HN Gopher] Datadog's $65M/year customer mystery solved
___________________________________________________________________
Datadog's $65M/year customer mystery solved
Author : thunderbong
Score : 92 points
Date : 2025-06-30 18:31 UTC (4 hours ago)
(HTM) web link (blog.pragmaticengineer.com)
(TXT) w3m dump (blog.pragmaticengineer.com)
| mrkramer wrote:
| And who says that SaaS doesn't pay off?! It pays off like hell!
| delichon wrote:
| > For observability, Coinbase spun up a dedicated team with the
| goal of moving off of Datadog, and onto a
| Grafana/Prometheus/Clickhouse stack.
|
| We recently did the same, and our Datadog bill was only five
| figures. We're finding the new stack to not be a poor man's
| anything, but more flexible, complete and manageable than yet
| another SaaS. With just a little extra learning curve,
| observability is a domain where open source trounces proprietary,
| and not just if you don't have money to set on fire.
| oulipo wrote:
| Have you tried the ClickStack?
| https://news.ycombinator.com/item?id=44194082
| asnyder wrote:
| There's also https://openobserve.ai, which, while not as stable
| as Grafana/Prometheus/Clickhouse, feels a bit easier to set up
| and manage. Though it has a ways to go, it does the basics and
| more without issue.
|
| Crazy that they spent so much on observability. Even with
| Datadog they could've optimized that spend. Datadog's billing
| defaults work against you: especially with on-demand instances,
| you get charged significantly more than you should, because
| they have (had?) pretty deficient counting of instance hours
| and instances.
|
| For example, rather than run the agent (which counts as an
| instance even if it's only on for a minute), you can send the
| logs, metrics, etc. directly to their ingestion endpoints, so
| those instances aren't counted towards your usage beyond the
| log and metric volume itself.
|
| Maybe at that level they don't even get into actual usage-based
| billing anymore, and they just negotiate arbitrary amounts for
| some absurd quota of use.
| ljm wrote:
| I wonder how much that no-expense-spared, money-is-no-object
| attitude to buying SaaS impacts an engineer's ability to make
| sensible decisions around infra and architecture. Coinbase might
| have been fine blowing 65 mil but take that approach to a new
| startup and you could trivially eat up a significant amount of
| runway with it.
|
| I won't single out Datadog on this because the exact same thing
| happens with cloud spend, and it's very literally burning money.
| swyx wrote:
| the visible cost of burning runway on a bill is very often far
| less than the invisible cost of burning engineer time
| rebuilding undifferentiated heavy lifting rather than working
| on product/customer needs
| 9283409232 wrote:
| People say this but I wonder about this from time to time. I
| don't think anyone is asking to rebuild datadog from scratch
| for your company but surely it's worth it to migrate to
| something not as expensive even if it takes a bit of elbow
| grease.
| closeparen wrote:
| Assuming there's nothing else you could do with that elbow
| grease that would create more value than the SaaS bill
| costs.
| pphysch wrote:
| Most of the complexity in observability is client-side.
|
| It is not hard to spin up Grafana and VictoriaMetrics (and
| now VictoriaLogs) and keep them running. It is not hard to
| build a Grafana dashboard that correlates data across both
| metrics and logs sources, and alerting functionality is
| pretty good now.
|
| The "heavy lift" is instrumenting your applications and
| infrastructure to provide valuable metrics and logs without
| exceeding a performance budget. I'm skeptical that Datadog
| actually does much of that heavy lifting and that they are
| actually worth the money. You can probably save 10x with
| same/better outcomes by paying for managed Grafana + managed
| DBs and a couple FTEs as observability experts.
| lerchmo wrote:
| You could hire 100 people to manage your timeseries data
| and save 70%
| viccis wrote:
| >I wonder how much that no-expense-spared, money-is-no-object
| attitude to buying SaaS impacts an engineer's ability to make
| sensible decisions around infra and architecture
|
| I saw this a lot at a previous company. Being able to just
| "have more Lambdas scale up to handle it" got some very
| mediocre engineers past challenges they encountered. But it did
| so at the cost of wasting VAST amounts of money and saddling
| themselves with tech debt that completely hobbled the company's
| ability to scale.
|
| It was very frustrating to be too junior to be able to change
| minds. Even basic things like "I know it worked for you with
| old on-prem NFS designs but we shouldn't be storing our data in
| 100 KB files in S3 and firing off thousands of Lambda
| invocations to process workloads, we should be storing it in
| 100 MB files and using industry-leading ETL frameworks on it".
| They were old school guys who hadn't adjusted to best practices
| for object storage and modern large scale data loads (this was
| a 1M event per second system) and so the company never really
| succeeded despite thousands of customers and loads of revenue.
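|
| A minimal sketch of the batching idea described above: pack many
| small records into a few large newline-delimited blobs before
| writing them to object storage. The function name, the
| newline-delimited layout and the flush rule are illustrative
| assumptions, not what the commenter's company ran, and the upload
| step itself is left out.

```python
from typing import Iterable, Iterator

# ~100 MB per object, the target size mentioned in the comment above.
TARGET_OBJECT_BYTES = 100 * 1024 * 1024


def batch_records(records: Iterable[bytes],
                  target_bytes: int = TARGET_OBJECT_BYTES) -> Iterator[bytes]:
    """Yield newline-delimited blobs of at most roughly target_bytes each.

    Hypothetical helper: each yielded blob would become one object,
    instead of one object (and one function invocation) per record.
    """
    buffer = []
    buffered = 0
    for record in records:
        line = record.rstrip(b"\n") + b"\n"
        # Flush before the buffer would grow past the target size.
        if buffer and buffered + len(line) > target_bytes:
            yield b"".join(buffer)
            buffer, buffered = [], 0
        buffer.append(line)
        buffered += len(line)
    if buffer:
        # Emit whatever is left as a final, smaller blob.
        yield b"".join(buffer)


if __name__ == "__main__":
    small_events = (b'{"event": %d}' % i for i in range(1000))
    blobs = list(batch_records(small_events, target_bytes=4096))
    print(len(blobs), "objects instead of 1000")
```

| A single record larger than the target still gets its own blob,
| so nothing is dropped.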
|
| I consider cost consideration and profiling to be an essential
| skill that any engineer working in cloud style environments
| should have, but it's especially important that a staff
| engineer or person in a similar position have this skill set
| and be ready to grill people who come up with wasteful
| solutions.
| JohnMakin wrote:
| > Coinbase might have been fine blowing 65 mil but take that
| approach to a new startup and you could trivially eat up a
| significant amount of runway with it.
|
| Most startups are not going to have anywhere near the scale to
| generate anything approaching this bill.
|
| > I won't single out Datadog on this because the exact same
| thing happens with cloud spend, and it's very literally burning
| money.
|
| Unless you're in the business of deploying and maintaining
| production-ready datacenters at scale, it very literally isn't.
| closeparen wrote:
| That's the point of usage-based pricing: it's cheap to adopt
| when you're small.
| abxyz wrote:
| (May 2023)
| everfrustrated wrote:
| >Originally published on 11 May 2023
| cybice wrote:
| An article that's basically an ad for Datadog: Pay us a ton of
| money - it's still cheaper in the long run.
| decimalenough wrote:
| > _Assume that Datadog cuts the number of outages by half, by
| preventing them with early monitoring. That would mean that
| without Datadog, we'd look at 24 hours' worth of downtime, not
| 12. Let's also assume that using Datadog results in mitigating
| outages 50% faster than without - thanks to being able to connect
| health metrics with logs, debug faster, pinpoint the root cause
| and mitigate faster. In that case, without Datadog, we could be
| looking at 36 hours' worth of total downtime, versus the 12 hours
| with Datadog. To put it in numbers: the company would make around
| $9M in revenue it would otherwise lose. Now that $10M/year fee
| practically pays for itself!_
|
| Those are some pretty heroic assumptions. In particular, they
| assume the only options are Datadog or nothing, when there are
| far cheaper alternatives like the Prometheus/Grafana/Clickhouse
| stack mentioned in the article itself.
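|
| The quoted arithmetic can be reproduced in a few lines. This is
| only a sketch of the article's own assumptions: the 12 hours, the
| two 50% factors, the $9M and the $10M/year are taken from the
| quote, while the per-hour revenue is merely implied by those
| numbers and `lost_fraction` is a hypothetical knob added here.

```python
# Figures taken from the quoted passage.
HOURS_WITH_TOOL = 12.0         # yearly downtime with Datadog
OUTAGE_REDUCTION = 0.5         # "cuts the number of outages by half"
MITIGATION_SLOWDOWN = 0.5      # quote's reading of "50% faster": 24h becomes 36h
REVENUE_PROTECTED = 9_000_000  # "$9M in revenue it would otherwise lose"
ANNUAL_FEE = 10_000_000        # "$10M/year fee"


def downtime_without_tool(hours_with_tool=HOURS_WITH_TOOL,
                          outage_reduction=OUTAGE_REDUCTION,
                          mitigation_slowdown=MITIGATION_SLOWDOWN):
    """Follow the quote's two steps: 12h -> 24h -> 36h."""
    hours_at_same_speed = hours_with_tool / (1.0 - outage_reduction)
    return hours_at_same_speed * (1.0 + mitigation_slowdown)


def implied_revenue_per_hour():
    """Derived, not stated in the quote: $9M spread over the avoided hours."""
    avoided_hours = downtime_without_tool() - HOURS_WITH_TOOL
    return REVENUE_PROTECTED / avoided_hours


def net_benefit(lost_fraction=1.0):
    """Protected revenue minus the fee.

    lost_fraction is an assumption: the share of revenue during an
    outage that is truly lost rather than deferred until recovery.
    """
    return REVENUE_PROTECTED * lost_fraction - ANNUAL_FEE


if __name__ == "__main__":
    print("downtime without tool (h):", downtime_without_tool())
    print("implied revenue per hour ($):", implied_revenue_per_hour())
    print("net benefit ($):", net_benefit())
```

| Even taking every assumption in the quote at face value, the
| result is negative, since the fee exceeds the protected revenue.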
| passivepinetree wrote:
| Another assumption that bothers me here is that the $9M in
| revenue would be completely lost during an outage. I imagine
| many customers would simply wait until the outage was resolved
| before performing their intended transactions, meaning far less
| than $9M would be lost.
| calt wrote:
| On the other hand, customers can become frustrated at being
| unable to trade when they need to during an outage and go to
| a competitor.
| secondcoming wrote:
| We are moving from Datadog to Prometheus/Grafana and it's
| really not all a bed of roses. You'll need monitoring on your
| monitoring.
| cloudking wrote:
| What problems does Datadog solve that you can't solve with
| cheaper solutions?
| therein wrote:
| I should have known it was Coinbase. I know that Coinbase used to
| spend $35,000 a month to back up the data directory of ETH nodes.
| aeyes wrote:
| > we really work with customers to restructure their contracts
|
| Does anyone have such an experience with Datadog? A few million
| wasn't enough to get them to talk about anything; we always paid
| list price and there was no negotiating either when they
| restructured their pricing.
| GuinansEyebrows wrote:
| > To put it in numbers: the company would make around $9M in
| revenue it would otherwise lose. Now that $10M/year fee
| practically pays for itself!
|
| am i misunderstanding, or is the author saying it's better to
| spend $10m than $9m?
| gneray wrote:
| This person is like the Gossip Guy of tech. Who cares?
| generalpf wrote:
| When did this guy stop writing about engineering and start
| running a tech gossip rag?
| willejs wrote:
| I have run ELK, Grafana + Prom, Grafana + Thanos/Cortex, New
| Relic and all of the more traditional products for
| monitoring/observability. More recently, in the last few years, I
| have been running full observability stacks via either the
| Grafana LGTM stack or Datadog at reasonable scale and
| complexity. Ultimately you want one tool that can alert you off
| a metric, present you some traces, and drill down into logs, all
| the way down the stack.
|
| I have found Datadog to offer, hands down, the best developer
| experience from the get-go; the way it glues its mostly decent
| products together is unparalleled compared to other products
| (Grafana Cloud/LGTM). I usually say that if you're at a small to
| medium scale business, it just makes sense, IF you understand the
| product and configure it correctly, which is reasonably easy. The
| seamless integration between tracing, logging and metrics in the
| platform, which you can then easily combine with alerts, is
| great. However, it's easy to misconfigure it and spend a lot of
| money on seemingly nothing. If you do not implement tracing and
| structured logs (at the right volume and level) with trace/span
| IDs etc. all the way through your services, it's hard to see the
| value, and it seems expensive. It requires some good knowledge
| and configuration of the product to make it pay off. The rest of
| the product features are generally good; for example, their
| security suite is a good entry-level option for cloud security
| monitoring and SIEM too.
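|
| As a rough illustration of "structured logs with trace/span IDs",
| here is a sketch using only Python's standard logging module. The
| field names, the logger name and the randomly generated IDs are
| assumptions for the example; in a real service the IDs would come
| from whatever tracing library propagates the request context.

```python
import json
import logging
import uuid


class JsonTraceFormatter(logging.Formatter):
    """Render each log record as one JSON object with trace/span IDs."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Set via `extra=` on the logging call; None when absent.
            "trace_id": getattr(record, "trace_id", None),
            "span_id": getattr(record, "span_id", None),
        }
        return json.dumps(payload)


def new_ids():
    """Stand-in IDs (assumption): 32 hex chars for a trace, 16 for a span."""
    return uuid.uuid4().hex, uuid.uuid4().hex[:16]


def build_logger(name="checkout"):
    """Hypothetical service logger that emits JSON lines to stderr."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonTraceFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger()
    trace_id, span_id = new_ids()
    # Passing the same IDs on every log line in a request is what lets
    # a backend join logs to the matching trace.
    log.info("payment authorized",
             extra={"trace_id": trace_id, "span_id": span_id})
```

| The point of the design is that every line carries the same IDs
| as the trace, so logs and traces can be correlated by a simple
| field match rather than by timestamps.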
|
| However, when you get to a certain scale, the cost of APM and
| Infrastructure hosts in Datadog can become somewhat
| prohibitive. Also, Datadog's custom metrics pricing is somewhat
| expensive, and its query language capabilities do not quite
| match the power of PromQL, and you start to find yourself
| needing that power to debug issues. At that point, the
| self-hosted LGTM stack starts to make sense; however, it involves
| a lot more education for end users in both integration (a little
| less now that OTel is popular) and querying/building dashboards
| etc., as well as running it yourself. The Grafana Cloud platform
| is more attractive though.
___________________________________________________________________
(page generated 2025-06-30 23:00 UTC)