[HN Gopher] Show HN: OpenStatus - Open-source monitoring with in...
       ___________________________________________________________________
        
       Show HN: OpenStatus - Open-source monitoring with incident
       managements
        
       Hey HN! We're Max and Thibault, building OpenStatus.dev, an
       open-source synthetic monitoring platform with incident
       management. 1 min demo:
       https://twitter.com/mxkaske/status/1685666982786404352
       We have just reached 2000 stars on GitHub:
       https://github.com/openstatusHQ/openstatus  We are really excited
       to hear your feedback/questions and connect further: our emails are
       max@openstatus.dev and thibault@openstatus.dev.  Thank you!
        
       Author : tibozaurus
       Score  : 79 points
       Date   : 2023-10-02 16:52 UTC (6 hours ago)
        
 (HTM) web link (www.openstatus.dev)
 (TXT) w3m dump (www.openstatus.dev)
        
       | ushakov wrote:
       | Why should I use this instead of the 100s of other paid and
       | open-source alternatives?
        
         | [deleted]
        
         | tibozaurus wrote:
         | Because using our hosted solution you don't have to care about
         | the infra :)
        
           | ushakov wrote:
           | That's a naive assumption. I can go with any of your
           | competitors (Datadog, Checkly, BetterStack) and not care
           | about the infra
        
             | tibozaurus wrote:
             | And they are mostly closed source :)
        
               | lmeyerov wrote:
               | It's worth coming up with a stronger public-facing answer
               | 
               | We went through this last year, I think we have a public
               | one, a private one, + our actual more 'serious' telemetry
               | (opentelemetry, ...). For the status pages, I think one
               | we don't pay for, and the other is like $20/yr.
               | 
               | It's a crowded space, both open + closed, so clearer
               | differentiation seems useful for your users and for your
               | own journey:
               | https://github.com/ivbeg/awesome-status-pages
        
               | typosaur wrote:
               | Is being open-source the only differentiator you have?
               | 
               | Only software devs care about this. Your competitors make
               | millions of $ annually, without being open-source.
        
               | robertlagrant wrote:
               | This seems a bit inappropriately toned. Plenty of
               | businesses care about this, as it de-risks things if you
               | know you can self-host if necessary.
        
               | chrisandchris wrote:
               | Yes, sure. But the difficulty with monitoring is not the
               | hosting per se, but hosting in different datacenters
               | throughout the world and keeping all these services up.
               | 
               | My monitoring should not show "just down" if users from
               | location A can't reach it but everyone else can.
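
The point above, that one unreachable location should not be reported as a plain "down", can be sketched as a small aggregation over per-region check results. This is only an illustration: the region names, the three-state model, and the `aggregate` helper are assumptions, not anything OpenStatus is known to implement.

```python
from enum import Enum


class Status(Enum):
    UP = "up"
    DEGRADED = "degraded"  # some regions fail, others succeed
    DOWN = "down"


def aggregate(region_results: dict[str, bool]) -> Status:
    """Combine per-region results (True = reachable) into one status."""
    if not region_results:
        raise ValueError("need at least one region result")
    failures = [region for region, ok in region_results.items() if not ok]
    if not failures:
        return Status.UP
    if len(failures) == len(region_results):
        return Status.DOWN
    # Partial failure: report it as degraded rather than "just down".
    return Status.DEGRADED


def failing_regions(region_results: dict[str, bool]) -> list[str]:
    """List the regions that could not reach the service, sorted by name."""
    return sorted(region for region, ok in region_results.items() if not ok)
```

With results from three hypothetical probe locations where only one fails, `aggregate` reports a degraded state and `failing_regions` names the location to investigate.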
        
               | ushakov wrote:
               | How many web-services out there are actually
               | geographically distributed? Most companies just host
               | everything in us-east1
        
               | lucgagan wrote:
               | I never understood the "de-risk" things angle. Is the
               | idea that you'd self host if the service went under?
        
       | tiberriver256 wrote:
       | The site looks visually very good. Lots of typos in your English
       | translations though.
        
       | oooyay wrote:
       | First, congrats on the launch!
       | 
       | Why did you end up going with a SaaS model? 30 Euros or $31.50
       | USD is pretty expensive for something like a status site. You'd
       | have a lot less to manage day to day and be able to focus more on
       | innovating the product if you _just_ sold the software, imo.
       | 
       | Why the focus on synthetic monitoring? As a SRE, I actively
       | eschew synthetic monitoring. It's highly error prone and doesn't
       | actually indicate regional availability. I'd like a status site
       | that I could push a certain internally derived SLA for a given
       | service to and the status site reflects the average over time of
       | that windowed SLA.
       | 
       | SLAs are intended to incur customer refunds when they're
       | violated if they're meaningful. If your synthetic monitoring
       | shows an SLA of 4 nines but it was actually closer to 4.8 or 4.9
       | then you could be on the hook for causing your customers a good
       | bit of legal pain. Just something to think about in this space.
       | 
       | Other status sites don't build external SLAs off of internal
       | metrics because the process of deriving internal metrics that
       | align with external outcomes is sufficiently difficult. Instead,
       | they calculate an SLA based off of posted statuses over a period
       | of time eg: Degraded, Down, Up. Supporting both modes could be a
       | boon to potential customers.
       | 
       | Overall looks like a great start; good luck on your venture!
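
As a rough illustration of the "posted statuses over a period of time" mode described above, availability can be computed as a time-weighted average of the statuses posted within a window. The weights (in particular counting "Degraded" as half available) and the `availability` function are assumptions made for this sketch, not how any particular status site does it.

```python
from datetime import datetime, timedelta

# Assumed weights: how much each posted status counts as "available".
WEIGHTS = {"Up": 1.0, "Degraded": 0.5, "Down": 0.0}


def availability(events, window_start, window_end):
    """Time-weighted availability in [0, 1] over a window.

    `events` is a list of (timestamp, status) pairs sorted by timestamp.
    Each status holds until the next event. The first event is assumed
    to be at or before `window_start`; uncovered time counts as downtime.
    """
    total = (window_end - window_start).total_seconds()
    if total <= 0:
        raise ValueError("window must have positive length")
    available = 0.0
    for i, (start, status) in enumerate(events):
        end = events[i + 1][0] if i + 1 < len(events) else window_end
        # Clip each status interval to the window before weighting it.
        start = max(start, window_start)
        end = min(end, window_end)
        if end > start:
            available += WEIGHTS[status] * (end - start).total_seconds()
    return available / total
```

For example, one hour posted as "Down" inside a 100-hour window gives 0.99 under these weights.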
        
         | 101008 wrote:
         | I don't know much about status pages, I just check them to
         | see if the services I use are having an issue. It's the first
         | time I read about "synthetic monitoring", and from a quick
         | Google search, it seems to refer to "automatic monitoring".
         | A basic version of this would be to do a ping to see if the
         | server is responding, or an HTTP request to see if it's
         | returning a 200 status code.
         | 
         | However, if I read your comment carefully, you are suggesting
         | to provide an alternative where the company (owner) could
         | decide manually when a system is down or up. If that's the
         | case, wouldn't the status page be just a page template where
         | someone logs into a panel and toggles a button to say "down" or
         | "up" and posts updates? If there is no automatic monitoring, the
         | service would look more like a blog/tumblr/twitter than
         | anything else.
         | 
         | Or probably I am missing something because of my lack of
         | experience and I am curious, I'd like to know!
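
The basic check described above (an HTTP request that expects a 200) might look like the following minimal sketch using only the Python standard library. The function names and the injectable `fetch` parameter are choices made for this example; a real synthetic monitor would also record latency and run on a schedule from several locations.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code for `url`, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return exc.code
    except OSError:
        # DNS failure, refused connection, timeout, and similar errors.
        return None


def is_up(url, fetch=http_status):
    """Treat the target as up only when the check returns exactly 200."""
    return fetch(url) == 200
```

Passing a different `fetch` callable keeps the decision logic testable without touching the network.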
        
           | oooyay wrote:
           | Good question. Status sites usually advertise the
           | availability of features. When your service to feature
           | mapping is 1:1 with just a load balancer or a cache in
           | between then it's relatively simple to calculate. The number
           | of 500s on the load balancer, cache, or both indicates errors
           | sent to users. As a company grows several services usually
           | combine to form a single feature; think about how a company
           | has a "sign in" feature. There's likely a service that
           | handles typical username password auth, then one for SSO, one
           | for passkey, etc... at this rate, you have several inputs but
           | the outputs remain somewhat consistent. 500s seen on your
           | most externally facing endpoints are errors to users.
           | 
           | Now combine all of the above with a client that has retry
           | capabilities. That client could be a modern web app or a
           | desktop app. Eventually consistent systems often rely on
           | retry behavior and rate limiting to achieve smooth user
           | transitions. Now I can't simply rely on 500s being sent
           | because they may indicate a timeout or a caching problem. Now
           | I need to rely on statistics on specific endpoints that will
           | _definitely_ result in a user facing error. Collecting that
           | in real-time (real-time enough for alerting, anyway) is
           | challenging as a company at that scale could be dealing with
           | an abundance of requests per second.
           | 
           | When SREs get into an incident they'll often try to determine
           | customer impact in order to know what hemorrhaging to stop
           | first. Looking at a list of 500s in a system like that is
           | often unhelpful, so we'll build dashboards of specific
           | endpoints that show a level of degradation eg: "Show me all
           | requests that did not have 2xx where the number of retries is
           | 3". In my contrived example the client shows an error after
           | the third exponential retry. If you were calculating
           | availability purely off of the number of 500s you're not
           | actually calculating customer impact, you're calculating the
           | number of errors. That said it's a lot easier said than done
           | to build a data system to make a query like what I described,
           | much less to export it. So in order to provide accurate
           | information the status site is updated manually.
           | 
           | On the flip side of what you described, some errors don't
           | have a statistic. For instance, if I force rotate everyone's
           | password and kill logins then I might post that on the status
           | site as well. If it's the result of a security action or
           | vulnerability I might declare the service degraded for a
           | period of time.
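
To make the contrived example above concrete, here is a small sketch of the difference between counting raw 500s and counting requests that actually surfaced an error to the user. The record shape, the `MAX_RETRIES` value of 3, and the function names are assumptions taken from that example rather than from any real system.

```python
from dataclasses import dataclass

# Assumed, as in the example above: the client gives up and shows an
# error once a request has been retried three times.
MAX_RETRIES = 3


@dataclass(frozen=True)
class RequestLog:
    endpoint: str
    status: int
    retries: int  # how many retries preceded this attempt


def is_user_facing_error(record: RequestLog) -> bool:
    """Non-2xx response on the final retry, so the user saw a failure."""
    return not (200 <= record.status < 300) and record.retries >= MAX_RETRIES


def impact(logs):
    """Return (raw 5xx count, user-facing error count) for the logs."""
    raw_errors = sum(1 for r in logs if r.status >= 500)
    user_errors = sum(1 for r in logs if is_user_facing_error(r))
    return raw_errors, user_errors
```

A request that fails three times and then succeeds on its last retry adds to the raw error count but not to the user-facing one, which is the gap between "number of errors" and "customer impact".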
        
             | 101008 wrote:
             | Thank you very very much for taking the time to write this
             | explanation. I learnt a lot today :)
        
         | lucgagan wrote:
         | > Why the focus on synthetic monitoring? As a SRE, I actively
         | eschew synthetic monitoring. It's highly error prone and
         | doesn't actually indicate regional availability. I'd like a
         | status site that I could push a certain internally derived SLA
         | for a given service to and the status site reflects the average
         | over time of that windowed SLA.
         | 
         | As an end user, hard disagree.
         | 
         | GitHub is a great example of this. Their status almost always
         | shows 100% uptime while the service is entirely unstable.
         | 
         | It is clear that their uptime SLAs do not align with end user
         | experience.
         | 
         | As an end user, I care whether I can access and use the
         | service. I don't care what broke in between.
        
           | oooyay wrote:
           | I suspect on GitHub's front this has to do with _how_ they
           | populate their status site. They may update it manually once
           | they identify customer impact. If they're using internal
           | metrics to qualify the status site then they're likely not
           | using all of the needed metrics to reflect customer impact.
           | There's also a third possibility which is that between you
           | and GitHub there's something that causes a partition or
           | failure that is outside of GitHub and your domain of control.
           | 
           | I agree with you that the ultimate value is in customer
           | impact. I was saying "that's hard" but synthetic monitoring
           | is not the solution because it doesn't achieve what it sounds
           | like it achieves.
        
         | tibozaurus wrote:
         | Thanks again!
         | 
         | Tbh we haven't thought about SLA violations.
         | 
         | For regional availability we are planning to add multi-region
         | checks per monitor.
         | 
         | At the moment you can only set one region per monitor.
        
       | jjtang1 wrote:
       | Congrats on the launch, the early traction is great to see.
       | 
       | Would be happy to jam more on my experiences building Rootly.com,
       | an incident management platform on Slack used by Canva, Cockroach
       | Labs, and others! :)
       | 
       | -JJ
        
       | mrfynd wrote:
       | congrats on the launch! product looks cool. just one thing
       | though: the dots make it difficult to focus on the design or read
       | text. or is it just me?
        
       | typosaur wrote:
       | Do you really need 4 SaaS providers to run _this_?
       | 
       | From your docs: tinybird, turso, clerk, Resend
        
         | [deleted]
        
         | drorn wrote:
         | What are you specifically worried about?
        
         | tibozaurus wrote:
         | We use Tinybird to store the request data payload, Turso for
         | hosted SQLite, Clerk for auth, and Resend to send email.
         | 
         | We could have built everything ourselves or just used some
         | providers to build faster. When we launched, we chose the
         | latter.
        
       | nodesocket wrote:
       | I run Uptime Kuma[1] in my home to monitor all my homelab and
       | Kubernetes services. It's really awesome. How does OpenStatus
       | compare to it?
       | 
       | [1] https://github.com/louislam/uptime-kuma
        
       | octagons wrote:
       | I'd recommend working on editing any public-facing copy and
       | unifying your message. I'm assuming I know what this tool does,
       | but these examples do not validate that assumption. Is it a
       | platform or a service? In 3 different locations, OpenStatus is
       | listed as the "open source monitoring XYZ with...": "on-call
       | managements", "Incident Management", and "beautiful status page".
       | 
       | Examples copied from the landing page and front page of the GH
       | repo.
       | 
       | > Open-source monitoring service
       | 
       | > OpenStatus is an open source monitoring services with on-call
       | managements.
       | 
       | > The Open-Source Synthetic Monitoring Platform with Incident
       | Management
       | 
       | > OpenStatus is open-source synthetic monitoring platform with
       | beautiful status page.
       | 
       | > The open-source monitoring platform
        
         | [deleted]
        
         | tibozaurus wrote:
         | Thanks for the feedback.
         | 
         | Tbh we have struggled to find the perfect messaging in the last
         | month.
         | 
         | But we will take it into account, thanks again!
        
       | wicktron wrote:
       | Speaking of status pages, are there any that exist that can
       | aggregate the status pages of various SaaS apps?
       | 
       | Meaning - Let's say I'm a company that subscribes to many SaaS
       | apps (ie: Google Workspace, Slack, Zoom, etc.), but want to
       | create an internal dashboard to monitor those SaaS apps and alert
       | my internal users. What options are available?
        
         | ebcase wrote:
         | Have a look at https://metrist.io/
         | 
         | (I've been meaning to try it out but haven't yet.)
        
         | messutied wrote:
         | Founder of StatusPal here. Our status pages can do this :)
         | 
         | We support configuring SaaS dependencies as part of your status
         | page and alert your team members:
         | https://www.statuspal.io/features/status-page
         | 
         | Alternatively, you can try our free Slack app, where you can do
         | similar directly in Slack:
         | https://statuspal.io/status-center/slack/
        
         | dandrew5 wrote:
         | Check out https://statusvista.com/ - it has lots of SaaS apps
        
         | vladvasiliu wrote:
         | I've built my own that can read statuspage.io-based status
         | pages and export them as Prometheus metrics. Grafana then shows
         | me the general status.
         | 
         | https://github.com/vladvasiliu/statuspage-exporter
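
The general idea of turning a status-page summary into a Prometheus metric can be sketched as below. This is not the linked exporter's code: the payload shape (`page.name`, `status.indicator`) is assumed from Statuspage-style `status.json` responses, and the metric name `saas_status_level` and the numeric mapping are hypothetical.

```python
import json

# Assumed mapping from Statuspage-style indicators to a numeric level.
INDICATOR_LEVEL = {"none": 0, "minor": 1, "major": 2, "critical": 3}


def escape_label(value: str) -> str:
    """Escape a label value for the Prometheus text exposition format."""
    return (
        value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )


def to_prometheus(payload: str) -> str:
    """Render one status summary (JSON text) as a gauge sample."""
    doc = json.loads(payload)
    name = escape_label(doc["page"]["name"])
    # Unknown indicators map to -1 so they stand out on a dashboard.
    level = INDICATOR_LEVEL.get(doc["status"]["indicator"], -1)
    return (
        "# HELP saas_status_level Reported status (0 = operational).\n"
        "# TYPE saas_status_level gauge\n"
        f'saas_status_level{{page="{name}"}} {level}\n'
    )
```

Serving this text over HTTP lets Prometheus scrape it, and a Grafana panel can then show the level per page.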
        
       | snowstormsun wrote:
       | 10 minute intervals and only 5 monitors is very limited for the
       | hobby plan. Why shouldn't I use UptimeRobot (or any other
       | alternative) instead which has 5 minute intervals and also only
       | one status page for free?
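
For a sense of scale, the check interval and monitor count translate directly into a number of checks per month. The helper below is plain arithmetic under the assumption of a 30-day month; only the 5-monitor and 10-minute or 5-minute figures come from the comment above.

```python
def checks_per_month(monitors: int, interval_minutes: int, days: int = 30) -> int:
    """Number of checks run in a month of `days` days (assumed 30)."""
    if monitors < 0 or interval_minutes <= 0:
        raise ValueError("monitors must be >= 0 and interval must be > 0")
    minutes_in_month = days * 24 * 60
    return monitors * (minutes_in_month // interval_minutes)


# 5 monitors at a 10-minute interval vs. the same 5 at a 5-minute interval.
ten_minute_plan = checks_per_month(5, 10)
five_minute_plan = checks_per_month(5, 5)
```

Halving the interval doubles the number of checks for the same set of monitors.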
        
       | kc10 wrote:
       | Congrats on the launch!!
       | 
       | I was previously the founder of a synthetic monitoring startup,
       | devraven.io.
       | 
       | Just sharing my experience - monitoring is brutally competitive.
       | From my conversations, most large enterprises have very little
       | synthetic monitoring; they use DDOG or other APM tools and do not
       | want to try any new tools for a few thousand dollars in savings.
       | And in a lot of cases they are comfortable with their custom test
       | frameworks that use Selenium. Some are even worried that setting
       | up synthetic monitoring will bring down their environment or
       | trash their database with junk data ::sigh::
       | 
       | Most smaller companies we spoke to are not mature enough to have
       | monitoring and did not have resources who could set up monitoring.
       | They used to ask us for help to build tests for them. Asks for
       | discounts on the $29.99/mo price point were not uncommon.
       | 
       | After a few months of operating the product, we did find a few
       | angels who were interested in investing in us (not the product).
       | But in the end, we did not feel that we could make good use of
       | investor money and provide a decent return to them, so we ended
       | up backing out of the investment and chose to shut down the
       | product.
        
       | jacooper wrote:
       | Looks good, however I find the pricing a bit on the high side,
       | especially compared to others like UptimeRobot and HetrixTools.
        
         | tibozaurus wrote:
         | Yep, but we are in the same pricing range as BetterStack or
         | Checkly, based on the number of requests we make per month.
        
       | jaxn wrote:
       | I had just been looking at open source status pages this morning,
       | and this was not in the list I was looking at.
       | 
       | OP, you might want to do a PR here:
       | https://github.com/ivbeg/awesome-status-pages
       | 
       | Everyone else might be interested in that list of similar
       | projects.
        
         | tibozaurus wrote:
         | We are already in it :)
        
           | jaxn wrote:
           | Ah, under Services, not Open Source.
        
       ___________________________________________________________________
       (page generated 2023-10-02 23:01 UTC)