[HN Gopher] Show HN: OneUptime - open-source Datadog Alternative
       ___________________________________________________________________
        
       Show HN: OneUptime - open-source Datadog Alternative
        
       Author : devneelpatel
       Score  : 176 points
       Date   : 2024-04-02 08:22 UTC (14 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | jascha_eng wrote:
       | I always liked Datadog as a product but it's also true that it is
       | simply way too expensive if you don't spend significant time cost
       | optimizing. But hosting it myself doesn't really seem like a
       | great solution, I'd rather invest time in making my app robust than
       | making my monitoring stable.
        
         | bibliotekka wrote:
         | expensive and totally weird pricing structures make it hard to
         | predict costs
        
           | bdcravens wrote:
           | Haven't used their product in years due to terrible unethical
           | sales practices, but when we did, they billed hourly while
           | AWS billed by the second. As such, it was easy to have a
           | monitoring bill that was much higher than the actual
           | resources being monitored (for example, if you had a lot of
           | instances that terminated and relaunched with short
           | durations)
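
A rough worked example of the billing mismatch described above. The workload (100 instances running 5 minutes each) is hypothetical, and rounding each run up to a whole billing unit is an assumption for illustration, not either vendor's documented rule.

```python
def billed_seconds(runtime_s: int, granularity_s: int) -> int:
    """Round a runtime up to the next whole billing unit (assumed rule)."""
    units = -(-runtime_s // granularity_s)  # integer ceiling division
    return units * granularity_s


# Hypothetical workload: 100 short-lived instances, 5 minutes each.
runtimes_s = [300] * 100

per_second_total = sum(billed_seconds(r, 1) for r in runtimes_s)
per_hour_total = sum(billed_seconds(r, 3600) for r in runtimes_s)

print("billed per second:", per_second_total / 3600, "instance-hours")
print("billed per hour:  ", per_hour_total / 3600, "instance-hours")
```

Under these assumptions the hourly meter charges for 100 instance-hours while the per-second meter charges for roughly 8.3, which is the kind of gap that makes monitoring cost more than the monitored resources.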
        
         | devneelpatel wrote:
         | You don't have to host it if you don't want to, we have a SaaS
         | service as well at https://oneuptime.com
        
           | AndyKluger wrote:
           | I understand this is probably not a priority or concern at
           | all, but FYI this is what the page looks like with uBlock
           | Origin configured as what they call "hard mode:"
           | https://cdn.imgchest.com/files/my2pc6adm87.mp4
        
         | pranay01 wrote:
         | You should also check out SigNoz [1], we are an open-core
         | alternative to DataDog - based natively on OpenTelemetry. We
         | also have a cloud product if you don't want to host yourself
         | 
         | [1] https://signoz.io
        
       | remram wrote:
       | How does this compare to the usual Grafana stack?
        
         | devneelpatel wrote:
         | Some of the differences are:
         | 
         | Grafana started as a visualization tool and has now decoupled
         | multiple products for observability - LGTM stack (Loki for
         | logs, Grafana for visualization, Tempo for traces, and Mimir
         | for metrics). You need to configure and maintain multiple sub-
         | products for a full-stack observability setup.
         | 
         | While the Grafana stack is great, OneUptime has all of these in one
         | platform and makes it really simple to use.
         | 
         | We also are built natively on OpenTelemetry, use clickhouse for
         | logs / metrics / traces storage - so queries are really fast.
        
       | Rapzid wrote:
       | Lot of interesting OSS observability products coming out in
       | recent years. One of the more impressive (and curious for many
       | reasons) IMHO is OpenObserve:
       | https://github.com/openobserve/openobserve .
       | 
       | As opposed to just a stack, they are implementing just about the
       | whole backend shebang from scratch.
        
         | bbkane wrote:
         | And (in a simple, non-TLS, local storage configuration), as a
         | static binary OpenObserve is incredibly easy to install locally
         | or with Ansible (
         | https://github.com/bbkane/shovel_ansible/blob/master/openobs...
         | )
         | 
         | As a guy running this for personal infrastructure, I super
         | appreciate the easy install and low system requirements
        
         | agilob wrote:
         | Apache Skywalking is doing the same
         | https://skywalking.apache.org/
        
         | esafak wrote:
         | How does it compare to competitors, and what are the
         | differences between the cloud offering and the open source
         | version? Their web site barely mentions the open source part.
        
         | intelVISA wrote:
         | eBPF unleashed a wave of these sorta weekend projects and I
         | kinda love it even if the value prop is fairly minimal.
        
       | remram wrote:
       | I don't understand what this uses for a storage backend. Object
       | storage? DBMS? Custom? I see references to Clickhouse in the
       | repo...
        
         | OJFord wrote:
         | The docker compose file has clickhouse and postgres.
        
         | devneelpatel wrote:
         | Postgres for tx data. Clickhouse for storing logs / traces /
         | metrics.
        
       | blitzar wrote:
       | Is that all datadog is?
       | 
       | I read the horror stories, the monthly bills of tens of thousands
       | for one server and just assumed there was something more
       | substantial to the product; like they did something
       | groundbreaking or novel. I never cared enough to actually look
       | and see what they did.
       | 
       | I use uptime-kuma - https://github.com/louislam/uptime-kuma - it
       | obviously does a fraction of what these other things do but it
       | does everything I need.
        
         | hhhhhhhmmmmmmm wrote:
         | I'm a former datadog user from a series D tech scale-up, and
         | yeah the horror stories of billing are true.
         | 
         | Grafana, Victoria Metrics or Honeycomb
        
         | jcims wrote:
         | Datadog is a pretty amazing product and the folks that build it
         | should be proud of what they have done. BUT it's extremely
         | expensive and most people don't use all the features. It's like
         | Splunk, 99% of people haven't invested the time or energy to get
         | the full value of the product they are paying for.
        
         | phillipcarter wrote:
         | Just about every advertised Datadog alternative does maybe 10%
         | of what Datadog can do, and likely has hundreds fewer pluggable
         | integrations than Datadog. While it may be the case that it's
         | overkill for a simple application, one of the biggest benefits
         | to Datadog is that there's an integration for just about
         | anything, and the product can go _deep_ if you need it to.
         | 
         | The "omg my bill is out of control" issue usually stems from a
         | few sources; one of the biggest is relying heavily on
         | custom metrics added over time, and so you think you're paying
         | X but really you end up paying 2-3X or more by the end of the
         | year. But the tricky thing is, most of those things that cost a
         | lot of money either have a lot of value, or held a lot of value
         | at the time.
         | 
         | (I work for a Datadog competitor)
        
           | bdcravens wrote:
           | For us it was the mismatch between AWS and Datadog billing
           | (AWS bills by the second, Datadog bills by the hour, so you
           | should only ever use Datadog for persistent instances, not
           | high churn instances like dynamic background jobs, or else
           | completely rearchitect your application for the benefit of a
           | vendor)
        
             | mikeshi42 wrote:
             | This is incredibly common - I've heard of a company that ended up
             | rearchitecting their instance type choices due to Datadog
             | billing on a per-node basis (with some peak usage billing
             | shenanigans). Their business model unfortunately encourages
             | some very specific architectures which don't work for
             | everyone.
        
           | icelancer wrote:
           | Correct. Same as New Relic.
           | 
           | I'd love an open source alternative. But there just isn't one
           | for APM (which is our main use case). Nothing comes close.
           | Every time I see "OpenTelemetry integration" I just close the
           | page. Hours and hours of manual setup, code pushes, etc while
           | New Relic installs once and works.
           | 
           | I assume it's the same for people who use DataDog
           | begrudgingly.
        
             | sofixa wrote:
             | > I'd love an open source alternative. But there just isn't
             | one for APM (which is our main use case). Nothing comes
             | close. Every time I see "OpenTelemetry integration" I just
             | close the page. Hours and hours of manual setup, code
             | pushes, etc while New Relic installs once and works.
             | 
             | Depending on the language/environment/framework,
             | OpenTelemetry Autoinstrumentation just works. It's the new
             | standard, and lots of work is ongoing to make it work
             | for everything, everywhere, and even the big observability
             | vendors are adopting it.
        
             | mikeshi42 wrote:
             | I'm wondering when's the last time you tried OpenTelemetry
             | - and which language it was in? I'm not going to say it's
             | super mature (it's not) - but I think it's come a long way
             | from being a ton of manual setup and it's more akin to SDKs
             | available commercially. Admittedly we (HyperDX) do offer
             | some wrapped OpenTelemetry SDKs ourselves to users to make
             | it even easier - but I think the base Otel instrumentation
             | is easy enough as it is.
        
         | devneelpatel wrote:
         | Oddly enough, this is why we started OneUptime in the first
         | place. We were burned by the DataDog bill and wanted an open
         | source observability platform ourselves.
        
           | flyingpenguin wrote:
           | I imagine datadog's AWS bill is also out of control,
           | considering all the absurd levels of queries/groupings you
           | can do.
           | 
           | I used to work on a growing AWS product with tons of features
           | that no one used.
           | 
           | Often when we were creating a feature, our managers would
           | have us include tags and support for making parts of the
           | feature optional, but make sure no parts of the feature (or
           | the feature itself) were optional to start with. We would
           | enable the ability to toggle the feature if "A significant
           | enough amount of customers weighted by revenue requested it".
           | 
           | Also got the "Build filtering, but don't expose it unless we
           | have to".
        
       | anonzzzies wrote:
       | What does it use for integration/workflow; frontend seems theirs
       | but backend seems not in the repos. I saw more solutions like
       | this boasting '5000+' integrations but I cannot find the code for
       | that (I might have missed it)?
        
         | devneelpatel wrote:
         | I believe this is what you're looking for:
         | https://oneuptime.com/product/workflows
         | 
         | OneUptime is 100% open source, and always will be. It's a mono-
         | repo and backend is divided into a few services.
        
           | anonzzzies wrote:
           | I guess I should get into the chat with you, but I meant; you
           | integrate with, let's say, Jira; there is no jira calling
           | code in the repos, so how does that happen. Bit too in depth
           | for here maybe.
        
             | devneelpatel wrote:
             | Oh yes! There's no native Jira integration so far - but we
             | have some customers who already integrate OneUptime with
             | Jira. They integrate Jira through workflow webhooks in
             | OneUptime - so when an issue is created in Jira, an incident
             | is created in OneUptime and all Status Page subscribers are
             | notified.
        
               | anonzzzies wrote:
               | Then I think the wording is a bit confusing for new
               | clients. I expect, when you say that you can integrate,
               | not that I have to build that myself.
        
       | Xcelerate wrote:
       | Question for those in the observability space: do moment-in-time
       | observations preserve all of the dimensions of the event, and if
       | so, how do most observability platforms compress the high volume
       | of (ostensibly) low-rank data?
        
         | withinboredom wrote:
         | > low-rank data?
         | 
         | It's "low-rank" until that one day systems start shitting the
         | bed, and you're trying to understand what is going on.
        
           | Xcelerate wrote:
           | Sure, but that's fine. You're only collecting high-rank data
           | for a short period of time, and the massive trove of
           | historical data lets you identify what's causing those
           | anomalies quickly.
        
         | rbetts wrote:
         | There are three-ish strategies (usually employed in combination
         | at scale).
         | 
         | Columnar databases are very good at compressing time series
         | data which often has runs of repeating values that can be run
         | length encoded, repeating deltas (store the delta not the full
         | value), or common strings that can be dictionary encoded. So
         | you can persist a lot of raw data with quite good compression
         | and fast-scannability. Most commercial TSDBs are now backed by
         | column stores. And several now tier with local SSDs for hot
         | data and S3 for colder data.
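
A minimal sketch of the three column encodings named above (run-length, delta, and dictionary encoding). The function names and sample columns are made up for illustration; real column stores layer bit-packing and general-purpose compression on top of ideas like these.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse runs of equal values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def delta_encode(values):
    """Keep the first value, then store each difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def dictionary_encode(values):
    """Replace each distinct value with a small integer code."""
    codes = {}
    encoded = [codes.setdefault(value, len(codes)) for value in values]
    return list(codes), encoded  # (dictionary in code order, encoded column)


# Hypothetical columns from a time series table.
timestamps = [1000, 1010, 1020, 1030]
statuses = ["ok", "ok", "ok", "error", "ok"]

deltas = delta_encode(timestamps)
print(deltas)
print(run_length_encode(deltas))
print(run_length_encode(statuses))
print(dictionary_encode(statuses))
```

Regularly spaced timestamps turn into a run of identical deltas, which then run-length encodes to almost nothing; that stacking is why the encodings are usually combined.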
         | 
         | If that's still too much data to store, you have to start
         | throwing some away. Both sampling and materializing aggregates
         | (and discarding the raw data) are popular techniques and can
         | both be very reasonable trade offs.
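
A small sketch of the two lossy techniques mentioned above: materializing per-window aggregates so the raw points can be discarded, and keeping only every n-th event. The window size, field names, and the fixed 1-in-n sampling rule are assumptions chosen for illustration.

```python
def rollup(points, window_s=60):
    """Aggregate (timestamp, value) points into fixed windows.

    Returns {window_start: {"count", "sum", "min", "max"}}; once these
    are stored, the raw points are no longer needed.
    """
    buckets = {}
    for ts, value in points:
        start = ts - ts % window_s
        agg = buckets.get(start)
        if agg is None:
            buckets[start] = {"count": 1, "sum": value, "min": value, "max": value}
        else:
            agg["count"] += 1
            agg["sum"] += value
            agg["min"] = min(agg["min"], value)
            agg["max"] = max(agg["max"], value)
    return buckets


def sample_every_nth(events, n):
    """Keep one event out of every n (simple deterministic sampling)."""
    return events[::n]


# Hypothetical raw latency points: (unix seconds, milliseconds).
points = [(0, 1.0), (30, 3.0), (61, 5.0)]
print(rollup(points))
print(sample_every_nth(list(range(10)), 5))
```

Storing count and sum rather than a precomputed mean keeps the aggregates mergeable into coarser windows later, but percentiles cannot be recovered from them, which is part of the trade-off.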
        
       | pdimitar wrote:
       | I can see some commits mentioning telemetry but it's not at all
       | mentioned on the GitHub README. Strange.
       | 
       | It looks solid and I'd try it if the need arises.
        
         | devneelpatel wrote:
         | https://oneuptime.com/product/apm
        
       | sidcool wrote:
       | I may be cynical here, but I find that all open source datadog
       | alternatives are mostly frontend focussed with an out of the box
       | database. And it does not scale well. It's not easy to maintain,
       | scale, shard etc. Am I wrong?
       | 
       | P.S. I am all for OSS.
        
         | swyx wrote:
         | presumably that is the right place to start for a datadog
         | competitor because ddog is not going to care about smol
         | instances that aren't at scale that they can't charge a
         | bajillion for
        
           | esafak wrote:
           | But you have to be ready for day two. Nobody is going to pick
           | an observability solution that does not scale. Some
           | scalability is table stakes.
        
         | mikeshi42 wrote:
         | We're one of those OSS alternatives (HyperDX) - built on
         | Clickhouse. While I can't say it's stupid simple to scale
         | Clickhouse (because anything stateful is inherently hard to
         | scale), it's orders of magnitude easier than other platforms
         | like Elastic and gives you a lot more flexible tuning options
         | (last co I was at, we ran that at massive scale, it was an
         | absolute handful).
         | 
         | In theory, you can get away with just running a Clickhouse
         | instance purely backed by S3 to get durability + scalability
         | (at the cost of performance of course). It all depends on what
         | scale you're running at and the HA/performance requirements you
         | need.
        
         | francoismassot wrote:
         | Quickwit is an alternative with a strong focus on scalability
         | (max we have seen is 40PB) with a decoupled compute and storage
         | architecture. But we do only logs and traces for now.
         | 
         | Repository: https://github.com/quickwit-oss/quickwit Latest
         | release: https://quickwit.io/blog/quickwit-0.8
        
       | htrp wrote:
       | https://oneuptime.com/ also makes it a managed service to compete
       | with datadog
        
       | nurettin wrote:
       | I make apps call home and aggregate incoming pings in grafana,
       | because some of them are behind a vpn.
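
One way the "call home" pattern above could look on the receiving side, as a sketch only: apps push pings outward (so nothing has to reach in through the VPN), and the collector flags any app that has gone quiet. The app names, timestamps, and the 120-second threshold are hypothetical.

```python
last_seen = {}  # app name -> timestamp of its most recent ping


def record_ping(app, ts):
    """Remember the latest ping time reported by an app."""
    last_seen[app] = max(ts, last_seen.get(app, ts))


def stale_apps(now, max_silence_s=120):
    """Return apps whose latest ping is older than max_silence_s."""
    return sorted(app for app, ts in last_seen.items() if now - ts > max_silence_s)


record_ping("billing", 1000)
record_ping("search", 1100)
print(stale_apps(now=1150))
```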
        
       | nodesocket wrote:
       | If using the Helm chart to install, does it also automatically
       | monitor the cluster that oneuptime is installed on? Didn't see
       | the Kubernetes integration docs
        
       | mosselman wrote:
       | If you self-host, do you still need a paid plan? Are there any
       | limitations?
        
       | JohnMakin wrote:
       | Multiple times in my career at a new job I've had to build a kind
       | of bootstrapped OSS-based observability platform (mostly a
       | mixture of prometheus and grafana typically) which can come with
       | more overhead than you'd think and doesn't usually provide a lot
       | of the analytical stuff without stitching together a bunch of
       | things. Datadog out of the box gives you everything you can
       | possibly want - as others have stated bills can balloon, for
       | instance if you run spot instances or have a lot of hosts coming
       | up/down I believe it bills you for each one even if it's short
       | lived.
       | 
       | I've been working with an enterprise license for a year and while
       | I don't really hear too much about cost, some simple
       | considerations to the design of the infrastructure it was
       | supporting seem to have prevented a ballooning bill (so far).
       | 
       | So for me, not having the engineering time or buy-in to build a
       | whole home grown observability platform by using OSS tools like
       | this (and all the quirks that can come with them) ends up being a
       | lot more expensive than just sucking it up and buying an
       | enterprise plan. At least so far.
       | 
       | If I had the option to do it from scratch how I wanted, with no
       | time or budget constraints, I'd prefer of course not to be
       | beholden to a major SaaS company that charges for ambiguous
       | things that are hard to predict like "per host", because it's
       | quite easy for these services to bury themselves so deep into
       | your infrastructure that you just bite the bullet on whatever
       | inevitable rug pull or price increase comes next. It has happened
       | to me before managing enterprise Hashicorp Vault.
        
       | markhalonen wrote:
       | we've been pretty happy with just a Clickhouse DB and sending
       | metrics directly from api servers to Clickhouse HTTP
       | https://clickhouse.com/docs/en/interfaces/http . Hook up Grafana
       | and you have a nice raw SQL (our team loves SQL) Grafana
       | dashboard.
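
A sketch of what sending metrics straight to the ClickHouse HTTP interface can look like: an INSERT statement in the `query` URL parameter and newline-delimited JSON rows in the POST body. The table name `api_metrics`, its columns, and the localhost URL are assumptions; only the request is built here, nothing is sent.

```python
import json
import urllib.parse
import urllib.request


def build_insert_request(base_url, table, rows):
    """Build a POST request that inserts rows using the JSONEachRow format."""
    query = f"INSERT INTO {table} FORMAT JSONEachRow"
    url = f"{base_url}/?query={urllib.parse.quote(query)}"
    body = "\n".join(json.dumps(row) for row in rows).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


# Hypothetical metric row; the table and columns are made up.
rows = [{"ts": "2024-04-02 08:22:00", "route": "/health", "latency_ms": 12}]
req = build_insert_request("http://localhost:8123", "api_metrics", rows)

print(req.full_url)
print(req.data.decode("utf-8"))
# To actually send it (needs a running server and an existing table):
# urllib.request.urlopen(req)
```

In practice rows would be batched before sending, since ClickHouse handles a few large inserts far better than many tiny ones.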
        
       ___________________________________________________________________
       (page generated 2024-04-02 23:01 UTC)