[HN Gopher] Show HN: OpenStatus - Open-source monitoring with in...
___________________________________________________________________
Show HN: OpenStatus - Open-source monitoring with incident
management
Hey HN! We're Max and Thibault, building OpenStatus.dev, an
open-source synthetic monitoring platform with incident management.
1-min demo: https://twitter.com/mxkaske/status/1685666982786404352
We have just reached 2,000 stars on GitHub:
https://github.com/openstatusHQ/openstatus
We are really excited to hear your feedback and questions and to
connect further: our emails are max@openstatus.dev and
thibault@openstatus.dev. Thank you!
Author : tibozaurus
Score : 79 points
Date : 2023-10-02 16:52 UTC (6 hours ago)
(HTM) web link (www.openstatus.dev)
| ushakov wrote:
| Why should I use this instead of the hundreds of other paid and
| open-source alternatives?
| tibozaurus wrote:
| Because with our hosted solution, you don't have to care about
| the infra :)
| ushakov wrote:
| That's a naive assumption. I can go with any of your
| competitors (Datadog, Checkly, BetterStack) and not care
| about the infra.
| tibozaurus wrote:
| And mostly closed source :)
| lmeyerov wrote:
| It's worth coming up with a stronger public-facing answer
|
| We went through this last year. I think we have a public
| one, a private one, and our actual, more 'serious' telemetry
| (OpenTelemetry, ...). For the status pages, I think we don't pay
| for one, and the other is about $20/yr.
|
| It's a crowded space, both open + closed, so clearer
| differentiation seems useful for your users and for your
| own journey: https://github.com/ivbeg/awesome-status-pages
| typosaur wrote:
| Is being open-source the only differentiator you have?
|
| Only software devs care about this. Your competitors make
| millions of dollars annually without being open source.
| robertlagrant wrote:
| This seems a bit inappropriately toned. Plenty of
| businesses care about this, as it de-risks things if you
| know you can self-host if necessary.
| chrisandchris wrote:
| Yes, sure. But the difficulty with monitoring is not the
| hosting per se, but hosting in different datacenters
| around the world and keeping all these services up.
|
| My monitoring should not show "just down" if users from
| location A can't reach it but everyone else can.
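One way to meet this requirement is to run the same check from several regions and only report a full outage when every region fails. The sketch below is hypothetical: the region names, the `aggregate_status` function, and the up/degraded/down policy are assumptions for illustration, not OpenStatus behavior. It shows how a single-location failure can surface as "degraded" rather than "down".

```python
from typing import Mapping


def aggregate_status(results: Mapping[str, bool]) -> str:
    """Combine per-region check results (True = check passed) into one status.

    Hypothetical policy: "down" only if every region fails,
    "degraded" if some (but not all) regions fail, "up" otherwise.
    """
    if not results:
        raise ValueError("need at least one region result")
    failures = sum(1 for passed in results.values() if not passed)
    if failures == 0:
        return "up"
    if failures == len(results):
        return "down"
    return "degraded"


# Users in one location cannot reach the service, everyone else can.
print(aggregate_status({"region-a": False, "region-b": True, "region-c": True}))  # degraded
```

Other policies (for example, "down" once a majority of regions fail) fit the same shape; only the thresholds change.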
| ushakov wrote:
| How many web services out there are actually
| geographically distributed? Most companies just host
| everything in us-east1.
| lucgagan wrote:
| I never understood the "de-risk things" angle. Is the
| idea that you'd self-host if the service went under?
| tiberriver256 wrote:
| The site looks visually very good. Lots of typos in your English
| translations though.
| oooyay wrote:
| First, congrats on the launch!
|
| Why did you end up going with a SaaS model? 30 Euros or $31.50
| USD is pretty expensive for something like a status site. You'd
| have a lot less to manage day to day and be able to focus more on
| innovating the product if you _just_ sold the software, imo.
|
| Why the focus on synthetic monitoring? As an SRE, I actively
| eschew synthetic monitoring. It's highly error-prone and doesn't
| actually indicate regional availability. I'd like a status site
| to which I could push an internally derived SLA for a given
| service, and which would reflect the average of that windowed
| SLA over time.
|
| SLAs, if they're meaningful, are meant to trigger customer
| refunds when they're violated. If your synthetic monitoring
| shows an SLA of four nines but it was actually closer to 4.8 or
| 4.9, you could be on the hook for causing your customers a good
| bit of legal pain. Just something to think about in this space.
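To make "four nines" concrete, an availability target can be converted into a downtime budget. This helper is an illustrative assumption: it defaults to a 30-day period, whereas real SLAs define their own measurement windows and exclusions.

```python
def downtime_budget_minutes(availability_pct: float, period_days: float = 30.0) -> float:
    """Minutes of downtime allowed over `period_days` for an availability target."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - availability_pct / 100)


for target in (99.9, 99.99):
    print(f"{target}% over 30 days: {downtime_budget_minutes(target):.2f} min")
# 99.9% over 30 days: 43.20 min
# 99.99% over 30 days: 4.32 min
```

At four nines, a monitor that misses even a few minutes of real downtime in a month can be the difference between meeting and breaching the target.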
|
| Other status sites don't build external SLAs from internal
| metrics because deriving internal metrics that align with
| external outcomes is so difficult. Instead, they calculate an
| SLA based on posted statuses over a period of time (e.g.,
| Degraded, Down, Up). Supporting both modes could be a boon to
| potential customers.
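Computing an SLA from posted statuses amounts to a time-weighted average over a window. The sketch below assumes a hypothetical input format (a list of `(timestamp, status)` changes) and hypothetical weights; in particular, whether "Degraded" counts as available is a policy choice each provider makes for itself.

```python
from datetime import datetime, timedelta
from typing import Sequence

# Hypothetical weights: how much of each posted status counts as "available".
# Counting "Degraded" as available is an assumption; use 0.5 or 0.0 to be stricter.
DEFAULT_WEIGHTS = {"Up": 1.0, "Degraded": 1.0, "Down": 0.0}


def availability_from_statuses(
    changes: Sequence[tuple[datetime, str]],
    window_start: datetime,
    window_end: datetime,
    weights: dict[str, float] = DEFAULT_WEIGHTS,
) -> float:
    """Time-weighted availability over [window_start, window_end).

    Each (time, status) entry holds until the next one; the service is
    assumed "Up" before the first posted status.
    """
    total = (window_end - window_start).total_seconds()
    if total <= 0:
        raise ValueError("window must have positive length")
    score = 0.0
    status, cursor = "Up", window_start
    for when, new_status in sorted(changes, key=lambda change: change[0]):
        if when >= window_end:
            break
        if when > cursor:
            # Credit the time spent in the previous status, clipped to the window.
            score += (when - cursor).total_seconds() * weights[status]
            cursor = when
        status = new_status
    score += (window_end - cursor).total_seconds() * weights[status]
    return score / total


start = datetime(2023, 9, 1)
end = start + timedelta(days=30)
incident = [(start + timedelta(days=10), "Down"), (start + timedelta(days=10, hours=1), "Up")]
print(f"{availability_from_statuses(incident, start, end):.4%}")  # 99.8611%
```

A one-hour "Down" status in a 30-day window therefore costs about 0.14 percentage points of availability.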
|
| Overall looks like a great start; good luck on your venture!
| 101008 wrote:
| I don't know much about status pages; I just check them to
| see if the services I use are having an issue. This is the first
| time I've read about "synthetic monitoring", and from a quick
| Google search, it seems to refer to "automatic monitoring".
| A basic version of this would be to ping the server to see if
| it's responding, or to send an HTTP request to see if it's
| returning a 200 status code.
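A synthetic HTTP check of this kind fits in a few lines of standard-library Python. This is an illustrative sketch, not how OpenStatus implements its checks: the names `http_check` and `check_passed`, the 10-second default timeout, and treating any 2xx response as a pass are all assumptions.

```python
import time
import urllib.error
import urllib.request
from typing import Optional


def check_passed(status: Optional[int]) -> bool:
    """A check passes only on a 2xx response; None means no response at all."""
    return status is not None and 200 <= status < 300


def http_check(url: str, timeout: float = 10.0) -> dict:
    """One synthetic check: GET `url`, record the status code and latency."""
    started = time.monotonic()
    status: Optional[int] = None
    try:
        # urllib follows redirects on its own, so 3xx usually ends in a final 2xx/4xx/5xx.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses arrive as exceptions
        status = err.code
        err.close()
    except OSError:  # DNS failure, refused connection, timeout, ...
        status = None
    latency_ms = (time.monotonic() - started) * 1000
    return {
        "url": url,
        "status": status,
        "ok": check_passed(status),
        "latency_ms": round(latency_ms, 1),
    }
```

Calling `http_check("https://example.com/")` returns the status code, whether the check passed, and the latency in milliseconds; a monitoring service would run something like this on a schedule, from one or more regions, and store the results.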
|
| However, if I read your comment carefully, you are suggesting
| an alternative where the company (owner) could decide
| manually when a system is down or up. If that's the case,
| wouldn't the status page just be a page template where
| someone logs into a panel, toggles a button to say "down" or
| "up", and posts updates? If there is no automatic monitoring,
| the service would look more like a blog/Tumblr/Twitter than
| anything else.
|
| Or perhaps I'm missing something because of my lack of
| experience. I'm curious and would like to know!
| oooyay wrote:
| Good question. Status sites usually advertise the
| availability of features. When your service-to-feature
| mapping is 1:1, with just a load balancer or a cache in
| between, availability is relatively simple to calculate: the
| number of 500s on the load balancer, cache, or both indicates
| errors sent to users. As a company grows, several services
| usually combine to form a single feature; think about a
| company's "sign in" feature. There's likely a service that
| handles typical username/password auth, then one for SSO, one
| for passkeys, etc. At this point you have several inputs, but
| the outputs remain somewhat consistent: 500s seen on your
| most externally facing endpoints are errors to users.
|
| Now combine all of the above with a client that has retry
| capabilities. That client could be a modern web app or a
| desktop app. Eventually consistent systems often rely on
| retry behavior and rate limiting to achieve smooth user
| transitions. Now I can't simply rely on 500s being sent,
| because they may indicate a timeout or a caching problem.
| Instead, I need to rely on statistics on specific endpoints
| that will _definitely_ result in a user-facing error.
| Collecting that in real time (real-time enough for alerting,
| anyway) is challenging, as a company at that scale could be
| dealing with a huge number of requests per second.
|
| When SREs get into an incident, they'll often try to determine
| customer impact in order to know what hemorrhaging to stop
| first. Looking at a list of 500s in a system like that is
| often unhelpful, so we'll build dashboards of specific
| endpoints that show a level of degradation, e.g.: "Show me all
| requests that did not return a 2xx where the number of retries
| is 3". In my contrived example, the client shows an error after
| the third exponential retry. If you calculate availability
| purely from the number of 500s, you're not actually
| calculating customer impact; you're calculating the number of
| errors. That said, it's much easier said than done to build a
| data system that can answer a query like the one I described,
| let alone export it. So, in order to provide accurate
| information, the status site is updated manually.
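The query described above can be sketched as a filter over per-attempt request logs. Everything here is hypothetical: the `Attempt` record, its fields, and the assumption that each retry is logged as a separate attempt carrying a retry counter. A real system would run this as a query over telemetry, not over a Python list. The point is the contrast between the raw error count and the attempts a user actually saw fail.

```python
from dataclasses import dataclass

# In the contrived example, the client gives up and shows an error after its third retry.
MAX_RETRIES = 3


@dataclass
class Attempt:
    endpoint: str
    status: int   # HTTP status returned for this attempt
    retries: int  # how many retries preceded this attempt (0 = first try)


def raw_error_count(attempts: list[Attempt]) -> int:
    """What a naive "count the 500s" availability metric sees."""
    return sum(1 for a in attempts if a.status >= 500)


def customer_impact_count(attempts: list[Attempt]) -> int:
    """Non-2xx attempts on the final retry: the ones a user actually saw as an error."""
    return sum(1 for a in attempts if not 200 <= a.status < 300 and a.retries == MAX_RETRIES)


log = [
    # Request 1: two failures, then success on the second retry (no user-facing error).
    Attempt("/login", 500, 0), Attempt("/login", 500, 1), Attempt("/login", 200, 2),
    # Request 2: fails on every attempt, so the client finally shows an error.
    Attempt("/login", 503, 0), Attempt("/login", 503, 1),
    Attempt("/login", 503, 2), Attempt("/login", 503, 3),
]
print(raw_error_count(log), customer_impact_count(log))  # 6 1
```

Six server errors were logged, but only one user-facing failure occurred; an availability figure built on the first number would overstate the impact.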
|
| On the flip side of what you described, some errors don't
| have a statistic. For instance, if I force rotate everyone's
| password and kill logins then I might post that on the status
| site as well. If it's the result of a security action or
| vulnerability I might declare the service degraded for a
| period of time.
| 101008 wrote:
| Thank you very very much for taking the time to write this
| explanation. I learnt a lot today :)
| lucgagan wrote:
| > Why the focus on synthetic monitoring? As an SRE, I actively
| eschew synthetic monitoring. It's highly error-prone and
| doesn't actually indicate regional availability. I'd like a
| status site to which I could push an internally derived SLA
| for a given service, and which would reflect the average of
| that windowed SLA over time.
|
| As an end user, hard disagree.
|
| GitHub is a great example of this. Their status almost always
| shows 100% uptime while the service is entirely unstable.
|
| It is clear that their uptime SLAs do not align with end user
| experience.
|
| As an end user, I care whether I can access and use the
| service. I don't care what broke in between.
| oooyay wrote:
| In GitHub's case, I suspect this has to do with _how_ they
| populate their status site. They may update it manually once
| they identify customer impact. If they're using internal
| metrics to drive the status site, then they're likely not
| using all of the metrics needed to reflect customer impact.
| There's also a third possibility: between you and GitHub
| there's something that causes a partition or failure that is
| outside of both GitHub's and your domain of control.
|
| I agree with you that the ultimate value is in customer
| impact. I was saying "that's hard", but synthetic monitoring
| is not the solution because it doesn't achieve what it sounds
| like it achieves.
| tibozaurus wrote:
| Thanks again!
|
| Tbh, we hadn't thought about SLA violations.
|
| For regional availability, we are planning to add multi-region
| checks per monitor.
|
| At the moment, you can only set one region per monitor.
| jjtang1 wrote:
| Congrats on the launch, the early traction is great to see.
|
| Would be happy to jam more on my experiences building Rootly.com,
| an incident management platform on Slack used by Canva, Cockroach
| Labs, and others! :)
|
| -JJ
| mrfynd wrote:
| Congrats on the launch! The product looks cool. Just one thing,
| though: the dots make it difficult to focus on the design or read
| text. Or is it just me?
| typosaur wrote:
| Do you really need 4 SaaS providers to run _this_?
|
| From your docs: tinybird, turso, clerk, Resend
| drorn wrote:
| What are you specifically worried about?
| tibozaurus wrote:
| We use Tinybird to store the request payload data, Turso for
| hosted SQLite, Clerk for auth, and Resend to send email.
|
| We could have built everything ourselves or just used some
| providers to build faster. When we launched, we chose the
| latter.
| nodesocket wrote:
| I run Uptime Kuma[1] in my home to monitor all my homelab and
| Kubernetes services. It's really awesome. How does OpenStatus
| compare to it?
|
| [1] https://github.com/louislam/uptime-kuma
| octagons wrote:
| I'd recommend working on editing any public-facing copy and
| unifying your message. I'm assuming I know what this tool does,
| but these examples do not validate that assumption. Is it a
| platform or a service? In 3 different locations, OpenStatus is
| listed as the "open source monitoring XYZ with...": "on-call
| managements", "Incident Management", and "beautiful status page".
|
| Examples copied from the landing page and front page of the GH
| repo.
|
| > Open-source monitoring service
|
| > OpenStatus is an open source monitoring services with on-call
| managements.
|
| > The Open-Source Synthetic Monitoring Platform with Incident
| Management
|
| > OpenStatus is open-source synthetic monitoring platform with
| beautiful status page.
|
| > The open-source monitoring platform
| tibozaurus wrote:
| Thanks for the feedback.
|
| Tbh, we have struggled to find the perfect messaging over the
| last month.
|
| But we will take it into account. Thanks again!
| wicktron wrote:
| Speaking of status pages, are there any that can aggregate
| the status pages of various SaaS apps?
|
| Meaning: let's say I'm a company that subscribes to many SaaS
| apps (e.g., Google Workspace, Slack, Zoom), but I want to
| create an internal dashboard to monitor those SaaS apps and alert
| my internal users. What options are available?
| ebcase wrote:
| Have a look at https://metrist.io/
|
| (I've been meaning to try it out but haven't yet.)
| messutied wrote:
| Founder of StatusPal here. Our status pages can do this :)
|
| We support configuring SaaS dependencies as part of your status
| page and alerting your team members:
| https://www.statuspal.io/features/status-page
|
| Alternatively, you can try our free Slack app, which lets you do
| something similar directly in Slack:
| https://statuspal.io/status-center/slack/
| dandrew5 wrote:
| Check out https://statusvista.com/ - it has lots of SaaS apps
| vladvasiliu wrote:
| I've built my own that can read statuspage.io-based status pages
| and export them as Prometheus metrics. Grafana then shows me
| the general status.
|
| https://github.com/vladvasiliu/statuspage-exporter
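The core of such an exporter is a mapping from a page's overall status to a numeric gauge. A minimal sketch, assuming the public Statuspage v2 `status.json` payload whose `status.indicator` field takes the values `none`, `minor`, `major`, or `critical` (worth verifying against the Statuspage API docs). The metric name `statuspage_indicator` and the helper names are hypothetical and not necessarily what the linked exporter uses. Output follows the Prometheus text exposition format.

```python
import json

# Assumed severity order of the Statuspage v2 "indicator" field.
INDICATOR_LEVELS = {"none": 0, "minor": 1, "major": 2, "critical": 3}


def _escape_label(value: str) -> str:
    """Escape a Prometheus label value (backslash, double quote, newline)."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_prometheus(page_name: str, status_json: str) -> str:
    """Convert a status.json payload into Prometheus text exposition lines."""
    indicator = json.loads(status_json)["status"]["indicator"]
    level = INDICATOR_LEVELS.get(indicator, -1)  # -1 flags an indicator we don't know
    label = _escape_label(page_name)
    return (
        "# HELP statuspage_indicator Overall page indicator (0=none .. 3=critical).\n"
        "# TYPE statuspage_indicator gauge\n"
        f'statuspage_indicator{{page="{label}"}} {level}\n'
    )


sample = '{"status": {"indicator": "minor", "description": "Partially Degraded Service"}}'
print(to_prometheus("example", sample), end="")
```

Fetching each page's JSON (for example with `urllib.request`) and serving the combined output on a `/metrics` endpoint are left out; Grafana can then alert or color panels on the gauge value.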
| snowstormsun wrote:
| Ten-minute intervals and only 5 monitors are very limiting for
| the hobby plan. Why shouldn't I use UptimeRobot (or any other
| alternative) instead, which has 5-minute intervals and also
| only one status page for free?
| kc10 wrote:
| Congrats on the launch!!
|
| I was previously the founder of a synthetic monitoring startup,
| devraven.io.
|
| Just sharing my experience: monitoring is brutally competitive.
| From my conversations, most large enterprises have very little
| synthetic monitoring; they use DDOG or other APM tools and do not
| want to try any new tools for a few thousand dollars in savings.
| And in a lot of cases they are comfortable with their custom test
| frameworks that use Selenium. Some are even worried that setting
| up synthetic monitoring will bring down their environment or
| trash their database with junk data ::sigh::
|
| Most smaller companies we spoke to were not mature enough to have
| monitoring and did not have people who could set up monitoring.
| They used to ask us for help building tests for them. Requests
| for discounts on the $29.99/mo price point were not uncommon.
|
| After a few months of operating the product, we did find a few
| angels who were interested in investing in us (not the product).
| But in the end, we did not feel that we could make good use of
| investor money and provide a decent return to them, so we backed
| out of the investment and chose to shut down the product.
| jacooper wrote:
| Looks good; however, I find the pricing a bit on the high side,
| especially compared to others like UptimeRobot and HetrixTools.
| tibozaurus wrote:
| Yep, but we are in the same pricing range as BetterStack or
| Checkly, based on the number of requests we make per month.
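Since pricing is tied to request volume, plan limits like the hobby plan's 5 monitors at 10-minute intervals (mentioned earlier in the thread) translate directly into a monthly request count. A back-of-the-envelope sketch, assuming a 30-day month; `checks_per_month` and its `regions` parameter are hypothetical and not OpenStatus's billing formula.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # assumes a 30-day month: 43,200 minutes


def checks_per_month(monitors: int, interval_minutes: int, regions: int = 1) -> int:
    """Synthetic requests issued per 30-day month by a set of monitors."""
    if interval_minutes <= 0:
        raise ValueError("interval must be positive")
    return monitors * regions * (MINUTES_PER_MONTH // interval_minutes)


print(checks_per_month(5, 10))  # 21600: 5 monitors every 10 minutes
print(checks_per_month(5, 5))   # 43200: same monitors every 5 minutes
```

Halving the interval doubles the request count, and each additional region per monitor multiplies it again.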
| jaxn wrote:
| I had just been looking at open source status pages this morning,
| and this was not in the list I was looking at.
|
| OP, you might want to open a PR here:
| https://github.com/ivbeg/awesome-status-pages
|
| Everyone else might be interested in that list of similar
| projects.
| tibozaurus wrote:
| We are already in it :)
| jaxn wrote:
| Ah, under Services, not Open Source.
___________________________________________________________________
(page generated 2023-10-02 23:01 UTC)