[HN Gopher] Show HN: OneUptime - open-source Datadog Alternative
___________________________________________________________________
Show HN: OneUptime - open-source Datadog Alternative
Author : devneelpatel
Score : 176 points
Date : 2024-04-02 08:22 UTC (14 hours ago)
| jascha_eng wrote:
| I always liked Datadog as a product but it's also true that it is
| simply way too expensive if you don't spend significant time cost
| optimizing. But hosting it myself doesn't really seem like a
| great solution, I'd rather invest time in making my app robust than
| making my monitoring stable.
| bibliotekka wrote:
| expensive and totally weird pricing structures make it hard to
| predict costs
| bdcravens wrote:
| Haven't used their product in years due to terrible unethical
| sales practices, but when we did, they billed hourly while
| AWS billed by the second. As such, it was easy to have a
| monitoring bill that was much higher than the actual
| resources being monitored (for example, if you had a lot of
| instances that terminated and relaunched with short
| durations)
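
To make the billing mismatch described above concrete, here is a minimal arithmetic sketch. It assumes a simplified model in which hourly billing rounds every instance launch up to a whole hour; the workload (100 instances alive for 5 minutes each) and the function names are hypothetical and not taken from either vendor's actual pricing rules.

```python
import math

def billed_seconds_per_second(durations_s):
    """Per-second billing: pay for exactly the seconds each instance ran."""
    return sum(durations_s)

def billed_seconds_per_hour(durations_s):
    """Hourly billing (assumed model): round each launch up to a full hour."""
    return sum(math.ceil(d / 3600) * 3600 for d in durations_s)

# Hypothetical workload: 100 short-lived instances, 5 minutes each.
durations = [5 * 60] * 100

per_second = billed_seconds_per_second(durations)  # seconds actually used
per_hour = billed_seconds_per_hour(durations)      # seconds billed hourly
ratio = per_hour / per_second

print(f"used: {per_second} s, billed hourly: {per_hour} s, ratio: {ratio:.1f}x")
```

Under these assumptions the monitored time is billed at 12 times what was actually used, which is how a monitoring bill can outgrow the bill for the instances themselves.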
| devneelpatel wrote:
| You don't have to host it if you don't want to, we have a SaaS
| service as well at https://oneuptime.com
| AndyKluger wrote:
| I understand this is probably not a priority or concern at
| all, but FYI this is what the page looks like with uBlock
| Origin configured as what they call "hard mode:"
| https://cdn.imgchest.com/files/my2pc6adm87.mp4
| pranay01 wrote:
| You should also check out SigNoz [1], we are an open-core
| alternative to DataDog - based natively on OpenTelemetry. We
| also have a cloud product if you don't want to host yourself
|
| [1] https://signoz.io
| remram wrote:
| How does this compare to the usual Grafana stack?
| devneelpatel wrote:
| Some of the differences are:
|
| Grafana started as a visualization tool and has now decoupled
| multiple products for observability - LGTM stack (Loki for
| logs, Grafana for visualization, Tempo for traces, and Mimir
| for metrics). You need to configure and maintain multiple sub-
| products for a full-stack observability setup.
|
| While the Grafana stack is great, OneUptime has all of these in one
| platform and makes it really simple to use.
|
| We also are built natively on OpenTelemetry, use ClickHouse for
| logs / metrics / traces storage - so queries are really fast.
| Rapzid wrote:
| Lot of interesting OSS observability products coming out in
| recent years. One of the more impressive (and curious for many
| reasons) IMHO is OpenObserve:
| https://github.com/openobserve/openobserve .
|
| As opposed to just a stack, they are implementing just about the
| whole backend shebang from scratch.
| bbkane wrote:
| And (in a simple, non-TLS, local storage configuration), as a
| static binary OpenObserve is incredibly easy to install locally
| or with Ansible (
| https://github.com/bbkane/shovel_ansible/blob/master/openobs...
| )
|
| As a guy running this for personal infrastructure, I super
| appreciate the easy install and low system requirements
| agilob wrote:
| Apache Skywalking is doing the same
| https://skywalking.apache.org/
| esafak wrote:
| How does it compare to competitors, and what are the
| differences between the cloud offering and the open source
| version? Their web site barely mentions the open source part.
| intelVISA wrote:
| eBPF unleashed a wave of these sorta weekend projects and I
| kinda love it even if the value prop is fairly minimal.
| remram wrote:
| I don't understand what this uses for a storage backend. Object
| storage? DBMS? Custom? I see references to Clickhouse in the
| repo...
| OJFord wrote:
| The docker compose file has clickhouse and postgres.
| devneelpatel wrote:
| Postgres for tx data. Clickhouse for storing logs / traces /
| metrics.
| blitzar wrote:
| Is that all datadog is?
|
| I read the horror stories, the monthly bills of 10's of thousands
| for one server and just assumed there was something more
| substantial to the product; like they did something
| groundbreaking or novel. I never cared enough to actually look
| and see what they did.
|
| I use uptime-kuma - https://github.com/louislam/uptime-kuma - it
| obviously does a fraction of what these other things do but it
| does everything I need.
| hhhhhhhmmmmmmm wrote:
| I'm a former datadog user from a series D tech scale-up, and
| yeah the horror stories of billing are true.
|
| Grafana, Victoria Metrics or Honeycomb
| jcims wrote:
| Datadog is a pretty amazing product and the folks that build it
| should be proud of what they have done. BUT it's extremely
| expensive and most people don't use all the features. It's like
| Splunk, 99% of people haven't invested the time or energy to get
| the full value of the product they are paying for.
| phillipcarter wrote:
| Just about every advertised Datadog alternative does maybe 10%
| of what Datadog can do, and likely has hundreds fewer pluggable
| integrations than Datadog. While it may be the case that it's
| overkill for a simple application, one of the biggest benefits
| to Datadog is that there's an integration for just about
| anything, and the product can go _deep_ if you need it to.
|
| The "omg my bill is out of control" issue usually stems
| from a few sources; one of the biggest is relying heavily on
| custom metrics added over time, and so you think you're paying
| X but really you end up paying 2-3X or more by the end of the
| year. But the tricky thing is, most of those things that cost a
| lot of money either have a lot of value, or held a lot of value
| at the time.
|
| (I work for a Datadog competitor)
| bdcravens wrote:
| For us it was the mismatch between AWS and Datadog billing
| (AWS bills by the second, Datadog bills by the hour, so you
| should only ever use Datadog for persistent instances, not
| high churn instances like dynamic background jobs, or else
| completely rearchitect your application for the benefit of a
| vendor)
| mikeshi42 wrote:
| This is incredibly common - I've heard of a company that ended up
| rearchitecting their instance type choices due to Datadog
| billing on a per-node basis (with some peak usage billing
| shenanigans). Their business model unfortunately encourages
| some very specific architectures which don't work for
| everyone.
| icelancer wrote:
| Correct. Same as New Relic.
|
| I'd love an open source alternative. But there just isn't one
| for APM (which is our main use case). Nothing comes close.
| Every time I see "OpenTelemetry integration" I just close the
| page. Hours and hours of manual setup, code pushes, etc while
| New Relic installs once and works.
|
| I assume it's the same for people who use DataDog
| begrudgingly.
| sofixa wrote:
| > I'd love an open source alternative. But there just isn't
| one for APM (which is our main use case). Nothing comes
| close. Every time I see "OpenTelemetry integration" I just
| close the page. Hours and hours of manual setup, code
| pushes, etc while New Relic installs once and works.
|
| Depending on the language/environment/framework,
| OpenTelemetry Autoinstrumentation just works. It's the new
| standard, and lots of work is ongoing to make it work
| for everything, everywhere, and even the big observability
| vendors are adopting it.
| mikeshi42 wrote:
| I'm wondering when's the last time you tried OpenTelemetry
| - and which language it was in? I'm not going to say it's
| super mature (it's not) - but I think it's come a long way
| from being a ton of manual setup and it's more akin to SDKs
| available commercially. Admittedly we (HyperDX) do offer
| some wrapped OpenTelemetry SDKs ourselves to users to make
| it even easier - but I think the base Otel instrumentation
| is easy enough as it is.
| devneelpatel wrote:
| Oddly enough, this is why we started OneUptime in the first
| place. We were burned by the DataDog bill and wanted an open
| source observability platform ourselves.
| flyingpenguin wrote:
| I imagine datadog's AWS bill is also out of control,
| considering all the absurd levels of queries/groupings you
| can do.
|
| I used to work on a growing AWS product with tons of features
| that no one used.
|
| Often when we were creating a feature, our managers would
| have us include tags and support for making parts of the
| feature optional, but make sure no parts of the feature (or
| the feature itself) were optional to start with. We would
| enable the ability to toggle the feature if "A significant
| enough amount of customers weighted by revenue requested it".
|
| Also got the "Build filtering, but don't expose it unless we
| have to".
| anonzzzies wrote:
| What does it use for integration/workflow; frontend seems theirs
| but backend seems not in the repos. I saw more solutions like
| this boasting '5000+' integrations but I cannot find the code for
| that (I might have missed it)?
| devneelpatel wrote:
| I believe this is what you're looking for:
| https://oneuptime.com/product/workflows
|
| OneUptime is 100% open source, and always will be. It's a mono-
| repo and the backend is divided into a few services.
| anonzzzies wrote:
| I guess I should get into the chat with you, but I meant; you
| integrate with, let's say, Jira; there is no jira calling
| code in the repos, so how does that happen. Bit too in depth
| for here maybe.
| devneelpatel wrote:
| Oh yes! there's no native Jira integration so far - but we
| have some customers who already integrate oneuptime with
| Jira. They integrate Jira through workflow webhooks in
| OneUptime - so when an issue is created in Jira, an incident
| is created in OneUptime and all Status Page subscribers are
| notified.
| anonzzzies wrote:
| Then I think the wording is a bit confusing for new
| clients. I expect, when you say that you can integrate,
| not that I have to build that myself.
| Xcelerate wrote:
| Question for those in the observability space: do moment-in-time
| observations preserve all of the dimensions of the event, and if
| so, how do most observability platforms compress the high volume
| of (ostensibly) low-rank data?
| withinboredom wrote:
| > low-rank data?
|
| It's "low-rank" until that one day systems start shitting the
| bed, and you're trying to understand what is going on.
| Xcelerate wrote:
| Sure, but that's fine. You're only collecting high-rank data
| for a short period of time, and the massive trove of
| historical data lets you identify what's causing those
| anomalies quickly.
| rbetts wrote:
| There are three-ish strategies (usually employed in combination
| at scale).
|
| Columnar databases are very good at compressing time series
| data which often has runs of repeating values that can be run
| length encoded, repeating deltas (store the delta not the full
| value), or common strings that can be dictionary encoded. So
| you can persist a lot of raw data with quite good compression
| and fast-scannability. Most commercial TSDBs are now backed by
| column stores. And several now tier with local SSDs for hot
| data and S3 for colder data.
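
The three encodings named above can be sketched in a few lines. This is a toy illustration of the ideas only, not how any particular column store implements them; the function names and sample data are hypothetical.

```python
def run_length_encode(values):
    """Collapse runs of repeated values into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]

def delta_encode(values):
    """Keep the first value, then store only the difference to the previous one."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def dictionary_encode(strings):
    """Replace each string with a small integer index into a lookup table."""
    table = {}
    codes = [table.setdefault(s, len(table)) for s in strings]
    return list(table), codes

# Hypothetical columns from a metrics table.
timestamps = [1000, 1010, 1020, 1030, 1040]
statuses = [200, 200, 200, 500, 200, 200]
hosts = ["web-1", "web-2", "web-1", "web-1"]

deltas = delta_encode(timestamps)          # regular intervals become repeats
delta_runs = run_length_encode(deltas[1:])  # which then run-length encode well
status_runs = run_length_encode(statuses)
host_table, host_codes = dictionary_encode(hosts)
```

Note how the encodings compose: a regularly spaced timestamp column turns into a single repeated delta, which run-length encoding then stores as one pair.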
|
| If that's still too much data to store, you have to start
| throwing some away. Both sampling and materializing aggregates
| (and discarding the raw data) are popular techniques and can
| both be very reasonable trade offs.
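
As a rough sketch of those two lossy options, the snippet below keeps every n-th event (a very simple form of sampling) and rolls raw points up into per-minute aggregates so the raw points could be discarded. The bucket size, field names, and sample data are assumptions made for illustration.

```python
def sample_every_nth(events, n):
    """Deterministic 1-in-n sampling: keep every n-th event, drop the rest."""
    return events[::n]

def rollup_per_minute(points):
    """Aggregate (unix_ts, value) points into per-minute count/sum/min/max."""
    buckets = {}
    for ts, value in points:
        minute = ts - ts % 60  # start of the minute this point falls into
        agg = buckets.setdefault(
            minute, {"count": 0, "sum": 0, "min": value, "max": value}
        )
        agg["count"] += 1
        agg["sum"] += value
        agg["min"] = min(agg["min"], value)
        agg["max"] = max(agg["max"], value)
    return buckets

# Hypothetical raw latency points: (unix timestamp, milliseconds).
points = [(0, 10), (20, 30), (59, 20), (60, 5), (90, 15)]

sampled = sample_every_nth(points, 2)
rollup = rollup_per_minute(points)
```

The trade-off is visible in the output: the rollup still answers "what was the max latency in minute 0?", but the individual points, and any dimension not aggregated, are gone.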
| pdimitar wrote:
| I can see some commits mentioning telemetry but it's not at all
| mentioned on the GitHub README. Strange.
|
| It looks solid and I'd try it if the need arises.
| devneelpatel wrote:
| https://oneuptime.com/product/apm
| sidcool wrote:
| I may be cynical here, but I find that all open source datadog
| alternatives are mostly frontend focussed with an out of the box
| database. And it does not scale well. It's not easy to maintain,
| scale, shard etc. Am I wrong?
|
| P.S. I am all for OSS.
| swyx wrote:
| presumably that is the right place to start for a datadog
| competitor because ddog is not going to care about smol
| instances that aren't at scale that they can't charge a
| bajillion for
| esafak wrote:
| But you have to be ready for day two. Nobody is going to pick
| an observability solution that does not scale. Some
| scalability is table stakes.
| mikeshi42 wrote:
| We're one of those OSS alternatives (HyperDX) - built on
| Clickhouse. While I can't say it's stupid simple to scale
| Clickhouse (because anything stateful is inherently hard to
| scale), it's orders of magnitude easier than other platforms
| like Elastic and gives you a lot more flexible tuning options
| (last co I was at, we ran that at massive scale, it was an
| absolute handful).
|
| In theory, you can get away with just running a Clickhouse
| instance purely backed by S3 to get durability + scalability
| (at the cost of performance of course). It all depends on what
| scale you're running at and the HA/performance requirements you
| need.
| francoismassot wrote:
| Quickwit is an alternative with a strong focus on scalability
| (max we have seen is 40PB) with a decoupled compute and storage
| architecture. But we do only logs and traces for now.
|
| Repository: https://github.com/quickwit-oss/quickwit Latest
| release: https://quickwit.io/blog/quickwit-0.8
| htrp wrote:
| https://oneuptime.com/ also makes it a managed service to compete
| with datadog
| nurettin wrote:
| I make apps call home and aggregate incoming pings in grafana,
| because some of them are behind a vpn.
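
A minimal sketch of the "call home" pattern: since the monitor cannot reach apps behind a VPN, each app reports in, and the receiving side only tracks when it last heard from each one. The app names, timestamps, and timeout below are hypothetical, and the actual setup described above (including how pings reach Grafana) may look quite different.

```python
def record_ping(last_seen, app, ts):
    """Remember the most recent time (unix seconds) an app called home."""
    last_seen[app] = max(ts, last_seen.get(app, ts))

def stale_apps(last_seen, now, timeout_s):
    """Return apps that have not pinged within the last timeout_s seconds."""
    return sorted(app for app, ts in last_seen.items() if now - ts > timeout_s)

# Hypothetical pings received by the aggregation endpoint.
last_seen = {}
record_ping(last_seen, "billing-worker", 1000)
record_ping(last_seen, "edge-proxy", 1000)
record_ping(last_seen, "billing-worker", 1290)

# At t=1300 with a 120 s timeout, only edge-proxy has gone quiet.
stale = stale_apps(last_seen, now=1300, timeout_s=120)
```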
| nodesocket wrote:
| If using the Helm chart to install, does it also automatically
| monitor the cluster that oneuptime is installed on? Didn't see
| the Kubernetes integration docs
| mosselman wrote:
| If you self-host, do you still need a paid plan? Are there any
| limitations?
| JohnMakin wrote:
| Multiple times in my career at a new job I've had to build a kind
| of bootstrapped OSS-based observability platform (mostly a
| mixture of prometheus and grafana typically) which can come with
| more overhead than you'd think and doesn't usually provide a lot
| of the analytical stuff without stitching together a bunch of
| things. Datadog out of the box gives you everything you can
| possibly want - as others have stated bills can balloon, for
| instance if you run spot instances or have a lot of hosts coming
| up/down I believe it bills you for each one even if it's short
| lived.
|
| I've been working with an enterprise license for a year and while
| I don't really hear too much about cost, some simple
| considerations in the design of the infrastructure it was
| supporting seem to have prevented a ballooning bill (so far).
|
| So for me, not having the engineering time or buy-in to build a
| whole home grown observability platform by using OSS tools like
| this (and all the quirks that can come with them) ends up being a
| lot more expensive than just sucking it up and buying an
| enterprise plan. At least so far.
|
| If I had the option to do it from scratch how I wanted, with no
| time or budget constraints, I'd prefer of course not to be
| beholden to a major SaaS company that charges for ambiguous
| things that are hard to predict like "per host", because it's
| quite easy for these services to bury themselves so deep into
| your infrastructure that you just bite the bullet on whatever
| inevitable rug pull or price increase comes next. It has happened
| to me before managing enterprise Hashicorp Vault.
| markhalonen wrote:
| we've been pretty happy with just a Clickhouse DB and sending
| metrics directly from api servers to Clickhouse HTTP
| https://clickhouse.com/docs/en/interfaces/http . Hook up Grafana
| and you have a nice raw SQL (our team loves SQL) Grafana
| dashboard.
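
For readers who want to try the approach above, here is a hedged sketch of writing metrics to the ClickHouse HTTP interface using only the Python standard library. It assumes a server on the default HTTP port 8123 and a hypothetical table `api_metrics` with matching columns; the table name, column names, and helper function are illustrative assumptions, not the commenter's actual setup. The request is only built here, not sent.

```python
import json
import urllib.parse
import urllib.request

def build_insert_request(base_url, table, rows):
    """Build a POST that inserts rows (dicts) using the JSONEachRow format."""
    query = f"INSERT INTO {table} FORMAT JSONEachRow"
    params = urllib.parse.urlencode({"query": query}, quote_via=urllib.parse.quote)
    # JSONEachRow expects one JSON object per line in the request body.
    body = "\n".join(json.dumps(row) for row in rows).encode("utf-8")
    return urllib.request.Request(f"{base_url}/?{params}", data=body, method="POST")

# Hypothetical metric rows emitted by an API server.
rows = [
    {"ts": "2024-04-02 08:22:00", "route": "/login", "latency_ms": 42},
    {"ts": "2024-04-02 08:22:01", "route": "/search", "latency_ms": 117},
]
req = build_insert_request("http://localhost:8123", "api_metrics", rows)

# To actually send it to a running server:
# urllib.request.urlopen(req, timeout=5)
```

In practice you would likely batch rows before sending, since many small inserts tend to be inefficient for a columnar store.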
___________________________________________________________________
(page generated 2024-04-02 23:01 UTC)