[HN Gopher] Monitoring tiny web services
___________________________________________________________________
Monitoring tiny web services
Author : mfrw
Score : 108 points
Date : 2022-07-09 17:29 UTC (5 hours ago)
(HTM) web link (jvns.ca)
(TXT) w3m dump (jvns.ca)
| dafelst wrote:
| I like this apparent shift back to "small is okay" where not
| every service has to be an overengineered allegedly hyper-
| scalable distributed mess of five nines uptime with enterprise
| logging, alerting and monitoring.
|
| Those things are nice when you have a bazillion users and
| downtime means hordes of unhappy users and dollars flushing away
| at insane rates, but for the vast majority of hobby projects and
| even mid stage startups, what is described in this article is
| plenty good enough.
| is_true wrote:
| I've thought about posting an AskHN about simple infrastructure
| for some time but I'm not sure how to word it to attract as
| many responses as possible.
| rozenmd wrote:
| My particular favourite is how GraphQL servers respond with "200
| OK" while putting the actual errors in a key called "errors".
| It makes regular healthchecks almost useless.
|
| I ended up writing my own service[0] to detect problems with
| graphql responses, before expanding it to cover websites and web
| apps too.
|
| -[0]: https://onlineornot.com
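A GraphQL-aware check like the one described has to look inside the JSON body rather than trusting the status code. A minimal sketch of that idea (the endpoint URL and query here are placeholders, not onlineornot.com's actual implementation):

```python
import json
import urllib.request

def body_is_healthy(status: int, body: dict) -> bool:
    """A GraphQL response is healthy only if the HTTP status is 200
    AND the JSON body carries no non-empty top-level "errors" key
    (the GraphQL convention for reporting failures)."""
    return status == 200 and not body.get("errors")

def graphql_healthcheck(url: str, query: str = "{ __typename }") -> bool:
    """POST a trivial query and judge the reply semantically."""
    req = urllib.request.Request(
        url,
        data=json.dumps({"query": query}).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return body_is_healthy(resp.status, json.load(resp))
    except OSError:
        return False  # network failure, timeout, non-2xx, etc.
```

The point is the split: a plain uptime checker stops at `resp.status`, while `body_is_healthy` also rejects a 200 whose body says `{"errors": [...]}`.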
| BiteCode_dev wrote:
| Github answers 404 instead of a 403 when you try to access a
| private repository while not being logged in.
|
| I assume the rationale is to not leak information about what's
| private. But still, it's weird.
| [deleted]
| dmlittle wrote:
| AWS S3 does the opposite when querying objects that don't
| exist. If you don't have the s3:ListBucket permission on the
| bucket you'll get a 403 error (so you can't differentiate
| between the object not existing and not having access to
| it).
|
| I think either approach is valid as long as you're
| consistent. You can make a case for either 404 or 403 when
| you don't have enough permissions. In GitHub's case you can
| argue that it's a 404 because the resource does indeed not
| exist through your auth context. In AWS' case you can argue
| that a 403 makes sense because you don't have permission to
| know the answer to your query.
| OJFord wrote:
| I honestly hate that so much, it's a relief to read someone
| saying the same.
|
| I sort of almost made myself feel a bit better about it by
| thinking 'no, it's not REST, we _have_ reached the graphql
| server successfully and got a .. "successful" response from
| _it_ , it's sort of a "Layer 8" on top of HTTP'. The problem is
| that none of the bloody tooling is 'Layer 8', so you end up in
| browser dev tools with all these 200 responses and no idea
| which ones are errorful. If any.
| bdd wrote:
| Google's uptime monitoring also allows writing JSONPath checks,
| so one can monitor HTTP 200 JSON responses semantically.
| KronisLV wrote:
| I currently got the cheapest VPS that I could (in my case from
| Time4VPS; some others might prefer Hetzner, or Scaleway Stardust
| instances), set up Uptime Kuma on it
| (https://github.com/louislam/uptime-kuma), and now have checks
| every 5 minutes against 30+ URLs (I could easily check every
| minute, but don't need that sort of resolution yet).
|
| It's integrated with Mattermost currently, seems to work pretty
| well. Could also set it up on another VPS, for example on Hetzner
| (which also has excellent pricing), could also integrate another
| alerting method such as sending e-mails, or anything else that's
| supported out of the box: https://github.com/louislam/uptime-
| kuma/issues/284
|
| Oh, also Zabbix for the servers themselves. Honestly, when
| things are as simple to set up as they are nowadays and you have
| about 50 EUR per year per node that you want (1 is usually
| enough, 2 is better from a redundancy standpoint, since then it
| becomes feasible to monitor the monitoring; others might go for
| 3 nodes for important things, etc.), you don't even need to look
| for cloud services or complex systems out there.
|
| Of course, if someone knows of some affordable options for cloud
| services, feel free to share!
|
| I briefly checked the prices for a few and most of them are a
| little bit more expensive than just getting a VPS, setting up
| sshd to only use key based auth, throwing Let's Encrypt in front
| of the web UI (or maybe additional auth, or making it accessible
| only through VPN, whatever you want), adding fail2ban and
| unattended updates, and doing some other basic configuration that
| you probably have automated anyways.
|
| The good news is that if you prefer cloud services and would
| rather have that piece of your setup be someone else's problem,
| they're not even an order of magnitude off in most cases -
| though I'm yet to see how Uptime Kuma in particular scales once
| I get to 100 endpoints. It seems like at a certain scale it's a
| bit cheaper to run your own monitoring, but at that point you
| might still find it easier to just pay a vendor.
|
| At the end of the day, there's lots of great options out there,
| both cloud based and self-hosted, whichever is your personal
| preference.
| jacooper wrote:
| You can get a free 4 vCPU / 24 GB RAM / 200 GB storage VPS with
| Oracle Cloud's free tier.
| perth wrote:
| You can get a cheaper VPS through ramnode at $15/year atm
| KronisLV wrote:
| That's pretty cool!
|
| I guess I'd personally also mention Contabo as an affordable
| host in general (though their web UI is antiquated),
| especially their storage nodes:
| https://contabo.com/en/storage-vps/
|
| For the most part, though, use whichever host you've been
| with for a few years (though feel free to experiment with
| whatever new platforms catch your eye), but ideally still
| have local backups for everything (as long as you don't have
| to deal with regulations that'd make it not possible) so you
| can migrate elsewhere.
| tatoalo wrote:
| I have been using cronitor[0] for a few months now and I have
| been really satisfied with them so far!
|
| [0]: https://cronitor.io
| pkrumins wrote:
| If you have a popular service, then one of the best approaches is
| to have your users notify you when something is down or is
| broken. This pattern follows the famous quote: "Given enough
| eyeballs, all bugs are shallow." I have employed this approach to
| great success and haven't had a need for any monitoring services.
| redleader55 wrote:
| If users see the problem, it's too late. You will be seen as
| unable to keep the service up, and the service will come across
| as flaky.
|
| Also, the holy grail of monitoring is being able to remediate
| the problem automatically - which is pretty hard when users are
| the ones reporting it.
| dimitar wrote:
| If I have to do one thing to monitor a simple website I'm
| probably going to use something that takes a screenshot
| periodically and checks it for changes. There are open source
| solutions but I just prefer to pay a bit for a managed service to
| do it.
|
| I think it covers quite a lot of things - the servers are up, DNS
| is OK, assets are OK. It can also be a safety net in case of
| other, more sophisticated monitoring fails to detect an unusual
| state.
|
| This doesn't work well for websites with too much JavaScript,
| ads, or widgets.
| radus wrote:
| What are the OSS solutions for this?
| xrd wrote:
| I installed Uptime Kuma (https://github.com/louislam/uptime-kuma)
| on my dokku paas to monitor my dokku apps. It works great. It is
| great for pure HTTP services, but it can be used against things
| like RTMP servers because it also permits configuration of a
| health check with TCP pings. It gives me an email when things are
| down, and supports retry, heartbeat intervals, and can validate a
| string in the HTML retrieved. I love it.
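The "TCP ping" style of check mentioned above (for non-HTTP services like RTMP) can be sketched in a few lines; this is just the underlying idea, not Uptime Kuma's actual implementation, and the host/port are placeholders:

```python
import socket

def tcp_check(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened
    within the timeout - the service is at least accepting
    connections, even if it speaks no HTTP."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, unreachable, or timed out

# e.g. tcp_check("rtmp.example.com", 1935) for a hypothetical RTMP server
```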
| jslakro wrote:
| I considered this option but then realized that both sides, the
| API/services and the uptime checker, would be on the same
| server, so any problem impacting the server itself would take
| the monitoring offline too.
| jewel wrote:
| Another approach that has been working great for me:
| https://www.webalert.me. This app runs on your phone, you can
| configure it to check once an hour if any content on a page
| changes.
| blondin wrote:
| have to say, this is exactly what kubernetes was designed to
| solve. but the focus was on microservices and containers. and
| things also got out of hand.
| nickjj wrote:
| > have to say, this is exactly what kubernetes was designed to
| solve
|
| Kubernetes probes are much different in my opinion.
|
| Your Kubernetes liveness check will check if things are working
| inside of your cluster which is great for a high frequency
| checkup to potentially modify the state of your pod based on
| the result.
|
| But Uptime Robot is an end to end test. It tests a real
| connection over the internet to your domain which exercises
| external DNS, traffic flowing through any reverse proxies, your
| SSL certificate, etc..
|
| Both complement each other for different use cases.
| dinvlad wrote:
| I really wish managed Kubernetes offerings remained "free" for
| small use, and would only expose "empty" nodes ready for full
| utilization by end-user containers.
|
| The reality however is that every managed node (like on GKE)
| uses quite a lot of CPU and memory out of the box, for which
| the user pays. On top of that there're cluster fees, just for
| having it around. This makes it completely unfriendly to
| hobbyist projects, unless one is ready to pay dozens of $s just
| to have Kubernetes (prior to deploying any apps to it).
|
| (And sure, there're free tiers here and there, but they never
| solve this problem completely on any of the big cloud
| providers, at least)
|
| Compare that to managed "serverless" offerings (even pseudo-
| compatible with K8s API like Cloud Run), which eliminate the
| management fees, but impose a tax with latency. Oh well.
| epelesis wrote:
| One reason this is not feasible is that K8s is not designed
| for secure multitenancy, so for every tenant, you'll need to
| spin up an entire K8s control plane, which includes a
| database and several services - this is what's driving the
| cluster fees. Keep in mind that customers also expect managed
| K8s to be highly available, so this cost is also going into
| things like replicating data, setting up load balancers,
| etc...
|
| Compare this to a serverless offering that is multitenant by
| design: the control plane is shared, making the overhead cost
| of an extra user basically zero, which is why they don't
| charge you a fee like this.
|
| IMO if you're a hobbyist interested in K8s, your best way to
| go is to install K3s, which is a lightweight, API compatible
| K8s alternative that runs on a single node. It's pretty nice
| if you don't care about fault tolerance or High Availability.
|
| https://k3s.io/
| dinvlad wrote:
| I'm not so sure about the economics of what you describe. It
| could very well be that small customers don't really consume
| much "bandwidth" at all, so their resource requirements could
| be subsumed entirely by larger uses. It doesn't make much sense
| that both large and small customers have to pay the same
| cluster fee, for example - it would be much fairer to charge
| more the more you use, and approach "near zero" the less you
| use.
|
| At the end of the day, all resources are run by the cloud
| provider on KVMs sharing the same physical machines
| anyways, so it's up to them how much to charge. The fact
| that both small and large customers get to pay for the same
| amount of resources allocated for them, only means these
| resources are not allocated in the most efficient manner.
| So a cloud provider could fix this.
|
| We should also not discount the net positive effect of
| attracting more hobbyists and startups to your platform.
| That's how AWS and GCP started, for example, but now
| they're just focusing on more enterprise business so
| smaller ones mean less to them (although AWS arguably less
| so). But we shouldn't forget that while they don't
| contribute as much to the revenue, they're essentially a
| free advertising resource that makes your platform stay
| "relevant" (and especially more so for burgeoning startups
| that could grow to bring more revenue in the future!). The
| moment they leave, the platform just becomes another IBM
| that's bound to die, for better or worse.
|
| On top of that, the anti-analogy with serverless for
| control plane breaks down, because one could always run it
| on the same shared pool of resources in gVisor or
| Firecracker, just like with serverless.
| daverobbins1 wrote:
| Since everyone is posting their favorite free-tier monitoring
| products - does anyone have a recommendation for a cloud product
| that will allow us to create a group of ping monitors and alert
| only if all monitors in the group are down for N minutes?
| machinerychorus wrote:
| You could hack that together with huginn pretty easily
|
| https://github.com/huginn/huginn
| zoover2020 wrote:
| > [...] recommend a cloud product
|
| Hacker mentality never left this site since inception :)
| prakashn27 wrote:
| I am curious about the use case. What group of servers do you
| want to monitor?
| daverobbins1 wrote:
| We have dual internet connections coming into a satellite
| office and we only want to be alerted if both are down.
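If no vendor supports it out of the box, the "alert only when every monitor in the group has been down for N minutes" condition is simple enough to self-host. A sketch of that logic (function and monitor names are hypothetical):

```python
import time
from typing import Optional

def group_down(last_ok: dict, window_minutes: float,
               now: Optional[float] = None) -> bool:
    """Alert condition for a monitor group: True only when EVERY
    monitor's last successful check is older than the window.
    last_ok maps monitor name -> unix timestamp of last success."""
    now = time.time() if now is None else now
    cutoff = now - window_minutes * 60
    return all(ts < cutoff for ts in last_ok.values())

# For the dual-uplink case: alert only if BOTH connections have
# failed their pings for the last 10 minutes, e.g.
# group_down({"isp_a": ..., "isp_b": ...}, window_minutes=10)
```

A single monitor recovering resets the condition, so flapping on one uplink never pages anyone while the other is healthy.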
| bdd wrote:
| You can get free uptime monitoring from Google Cloud. The limit
| is 100 uptime checks per monitoring scope, which may mean either
| a project or an organization depending on how you configure it,
| IIUC: https://cloud.google.com/monitoring/uptime-checks. The
| checks are run from 6 locations around the world, so you can
| also catch network issues - though you likely cannot do much
| about those when you're running a tiny service. My uptime checks
| show the probes come from: usa-{virginia,oregon,iowa},
| eur-belgium, apac-singapore, sa-brazil-sao_paulo
|
| Another neat monitoring thing I rely on is
| https://healthchecks.io. Anything that needs to run periodically
| checks in with the API at the start and the end of execution so
| you can be sure they are running as they should, on time, and
| without errors. Its free tier allows 20 checks.
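The check-in pattern described above can be wrapped around any cron job in a few lines. A sketch, assuming a healthchecks.io-style ping API where `/start` signals the beginning of a run and `/fail` a failure (verify the exact endpoints against the service's docs; the ping URL is a placeholder):

```python
import subprocess
import urllib.request

def run_with_healthcheck(ping_url: str, cmd: list) -> int:
    """Run a cron job under dead-man's-switch monitoring: signal
    the start of execution, run the job, then report success or
    failure based on its exit code."""
    def ping(suffix: str = "") -> None:
        try:
            urllib.request.urlopen(ping_url + suffix, timeout=10)
        except OSError:
            pass  # never let monitoring break the job itself
    ping("/start")
    rc = subprocess.run(cmd).returncode
    ping("" if rc == 0 else "/fail")
    return rc

# e.g. run_with_healthcheck("https://hc-ping.com/<your-uuid>",
#                           ["/usr/local/bin/nightly-backup.sh"])
```

Because the pings are swallowed on network error, a monitoring outage degrades to "no data" rather than breaking the job; the service then alerts on the missing check-in.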
| ydant wrote:
| healthchecks.io is a great service (and apparently can be self-
| hosted - https://github.com/healthchecks/healthchecks) that I
| use for both personal projects and at work.
|
| It works really well for cron jobs - while it works with a
| single call, you can also hit a /start endpoint first and the
| regular ping when finished, and get extra insights such as
| runtime for your jobs.
|
| It would be nice if it had slightly more complex alerting rules
| available - for example, a "this service should run
| successfully at least once every X hours, but is fine to fail
| multiple times otherwise" type alert.
|
| We wanted to use it for monitoring some periodic downloads
| (like downloading partners' reports), where the expectation is
| that the call will often time out, fail, or have no data to
| download - which is technically a "failure", but only a problem
| if it goes on for more than a day. Since healthchecks.io
| doesn't really support this, we ended up writing our own "stale
| data" monitoring logic and alerting inside the downloader, and
| just use healthchecks.io to monitor that the script isn't
| crashing.
| jacooper wrote:
| What is the interval for the checks?
|
| It's written that it's 100 per metric scope, but I don't really
| know what that means. (2)
|
| Also, there seems to be no status monitor page?
|
| 2- https://cloud.google.com/monitoring/uptime-checks
| yawgmoth wrote:
| Continuing the tooling thread: the free tier of
| https://www.uptimetoolbox.com/ is quite good.
___________________________________________________________________
(page generated 2022-07-09 23:00 UTC)