[HN Gopher] Grafana releases OnCall open source project
___________________________________________________________________
Grafana releases OnCall open source project
Author : netingle
Score : 283 points
Date : 2022-06-14 15:28 UTC (7 hours ago)
(HTM) web link (grafana.com)
(TXT) w3m dump (grafana.com)
| pphysch wrote:
| Seems like a solid replacement for Alertmanager for those already
| using Grafana OSS. Anyone planning on using both OnCall and
| Alertmanager?
| dString wrote:
| Doesn't AlertManager evaluate metrics and fire alerts?
|
| A quick look at OnCall suggests it is more for managing fired
| alerts than firing alerts.
|
| Their own screenshot has AlertManager as an alert source.
| remram wrote:
| Grafana used to be so simple, I don't know if I'm a fan of
| this direction towards many services.
|
| Having to run alertmanager and configure it in addition to
| Grafana was bad enough, now you need to run and configure
| another service if you want some extra functionalities for
| those alerts? Are they going to keep maintaining
| acknowledgements and scheduled silence in AlertManager now
| that OnCall exists? Are we going to have "legacy
| notifications" in AlertManager when not running OnCall, the
| same way there are "legacy alerts" in Grafana when updating
| from Grafana 7 (pre-AlertManager)?
| pphysch wrote:
| AlertManager does not do the evaluations, it does not connect
| to any metrics database; those are done by Prometheus/etc and
| forwarded to AlertManager, which handles deduplication and
| routing among other things.
| juliennakache wrote:
| Looking forward to trying this out. I've always felt that
| PagerDuty was absurdly expensive for the feature set they were
| offering. It costs at least $250 per user for organizations
| larger than 5 people - even if you're not an engineer who is
| ever directly on call. At my previous company, IT had to
| regularly send surveys to employees to assess whether they
| _really_ needed to have a PagerDuty account. Alerts are key
| information in an organization that runs software in production,
| and you shouldn't have to pay $250/month just to have some
| visibility into them. I'm hoping Grafana OnCall is able to fully
| replace PagerDuty.
| CSMastermind wrote:
| > I've always felt that PagerDuty was absurdly expensive for
| the feature set they were offering
|
| For anyone out there in the same spot, I'll say that I switched
| my last company to Atlassian's OpsGenie and it was a 10x cost
| savings for the same feature set.
| arccy wrote:
| the opsgenie api is really bad though if you want to manage
| it as code/declaratively
| dijit wrote:
| I really can't bring myself to ever recommend Atlassian
| products though.
|
| If cost is the only measure: I understand. But time lost in
| various areas of the software package (performance alone!
| Before we get into weird UX paradigms and esoteric query
| languages, shoddy search systems etc;) surely has an impact
| on cost. Having your employees spending a lot of time
| navigating janky software has a cost too.
| jlg23 wrote:
| Thanks, I think I finally understand why some friends of mine,
| who can implement this for any company in half a day, take
| $2000/day...
| motakuk wrote:
| Check this ;)
| https://github.com/grafana/oncall/tree/dev/tools/pagerduty-m...
| ildari wrote:
| Hey HN, Ildar here, one of the co-founders of Amixr and one of
| the software engineers behind Grafana OnCall. We finally open-
| sourced the product, and I'm really excited about that. Please
| try it out and leave your feedback!
| sandstrom wrote:
| I think it would be great if it was easier to mix and match
| Grafana SaaS and self-hosted products.
|
| For example, we need to run Loki ourselves, for security /
| privacy reasons, but wouldn't mind using hosted versions of
| Tempo, Prometheus and OnCall.
|
| Right now it isn't super-easy to link e.g. self-hosted loki
| search queries with SaaS-Prometheus.
| netingle wrote:
| It's very much our aim to make this mix of self-hosted and cloud
| services as easy as going all-cloud; but I agree we're not
| quite there yet.
|
| Do you mind if I ask what isn't super-easy about linking self-
| hosted Loki search queries with SaaS-Prometheus? You should be
| able, e.g., to add a Prometheus data source to your local Grafana
| (or securely expose your Loki to the internet and add a Loki
| data source to your Cloud Grafana)
| [deleted]
| this_was_posted wrote:
| glad to hear this got open sourced!
|
| for someone at grafana; noticed a dead link in the post:
| https://grafana.com/docs/oncall/main/
| nojito wrote:
| Unfortunate that it's AGPL. But this looks really great!
| josephcsible wrote:
| There's nothing unfortunate about the AGPLv3. Everything that
| it doesn't let you do is stuff that you shouldn't be doing
| anyway.
| [deleted]
| ucosty wrote:
| Why is that unfortunate? Unless you're looking to make
| proprietary changes to Grafana OnCall and host it as a SaaS,
| it's the same as running any other GPL software.
| nojito wrote:
| GPL and its variants are a no go where I work.
| ketralnis wrote:
| To distribute I understand, but even just to use? Almost
| any desktop OS you run has GPL code somewhere in it
| dividedbyzero wrote:
| Almost any desktop OS? I may be wrong but I don't think
| Windows and macOS contain any GPL code.
| warp wrote:
| Doesn't Windows 10 ship with WSL2 now? (which includes a
| full Linux kernel).
|
| Apple still ships bash under GPLv2 on current macOS
| versions. Apple hates GPLv3, which is why they're trying
| to switch away from bash to zsh, but for the time being
| they're still shipping bash.
| eeZah7Ux wrote:
| Then the problem is in the company and not in the license.
| woadwarrior01 wrote:
| Is Linux verboten at work?
| to11mtm wrote:
| Probably not.
|
| Linux usually gets a pass, because most times you're just
| deploying it and not mucking with source code.
|
| But a lot of places (I've worked at more that do than
| don't) will have rules about GPL/AGPL for libraries/infra
| as a whole though. Often evaluated case-by-case, but it's rare
| that I've seen AGPL stuff get approved for usage.
|
| I think some of it is not wanting to deal with the cost
| of vigilance; i.e. you can make sure that someone is
| using %thing% in a way that doesn't run afoul of AGPL
| right now, but does legal and upper management have
| confidence in that being true forever and always?
| Engineers are still human, and corporate management +
| legal teams tend to hate licensing folk tromping around.
|
| This results in refusals ranging from "This is internal
| for now but we will open it up later" (a fair concern) to
| "Somebody is worried that exposing it over the VPN to
| contractors would count as making it public" (IDK, I'm
| not a lawyer.)
| ucosty wrote:
| > Linux usually gets a pass, because most times you're
| just deploying it and not mucking with source code.
|
| That would apply for most uses of software, wouldn't it?
|
| > This results in refusals ranging from "This is internal
| for now but we will open it up later" (a fair concern) to
| "Somebody is worried that exposing it over the VPN to
| contractors would count as making it public" (IDK, I'm
| not a lawyer.)
|
| I've encountered variations of this problem at places I
| have worked in. Education goes a long way to solving
| this, and this example of simple usage of (A)GPL software
| is easy enough to explain with examples.
| nojito wrote:
| Any Linux deploy is through Red Hat, but most local
| development here is on Windows.
|
| No idea why Linux gets a pass though.
| ucosty wrote:
| Must be quite the paranoid business, given even tier 1
| banks here (in the UK) will happily run GPL software.
| matsemann wrote:
| Running a service with a GPL license is different than
| including their code in your projects, though. So while it
| may be a blanket ban, it may be worth it to clarify the
| scope of that ban.
| bbkane wrote:
| LinkedIn built and uses https://iris.claims/ . I don't know how
| it compares to alternatives, but I find IRIS relatively easy
| to use.
| acatton wrote:
| https://drewdevault.com/2020/07/27/Anti-AGPL-propaganda.html
| Equiet wrote:
| It's surprising how seemingly difficult it is to build a good on-
| call scheduling system. Everything I tried so far (not naming the
| companies here) felt like the UX was the last thing on the
| developers' minds. Which is tolerable during business hours but
| really annoying at 2am.
|
| Is there some hidden complexity or is it just a consequence of
| engineers building a product for other engineers? Also, any tips
| what worked for you?
| matsemann wrote:
| Have had lots of bad experiences with that from Pagerduty at
| least. Want to generate a schedule far in advance, so people
| know when they will be oncall and can plan/switch.
|
| Of course, in a few months we may have some new people having
| joined, some quit, or other circumstances. A single misclick
| when fixing that can invalidate the whole schedule and generate
| another. Infuriating.
|
| Or the UI itself: it might have become better the last two
| years, but having to click "next week" tens of times to see when
| I was scheduled (since I wasn't just interested in my next
| scheduled time but all of them) was annoying.
| raffraffraff wrote:
| Production helm chart link on this page leads to 404:
| https://grafana.com/docs/grafana-cloud/oncall/open-source/#p...
| Deritio wrote:
| I like what grafana labs does with grafana.
|
| I'm annoyed by their license choice.
|
| But apparently when you are grafana everything looks like a
| dashboard UI?
|
| Joke aside, I will have a look, but I already didn't like the
| screenshots. I like the dashboardy thing for dashboards, but
| otherwise it's not a really good UI system for everything else.
| Maledictus wrote:
| What I really want is an Android app that keeps alerting until a
| page is ACKed or escalated.
| machinerychorus wrote:
| check out pushover, I use it for this exact case
|
| https://pushover.net/
| pphysch wrote:
| A bit disappointed by the architecture -- it's a Django stack
| with MySQL, Redis, RabbitMQ, and Celery -- for what is
| effectively AlertManager (a single golang binary) with a nicer
| web frontend + Grafana integration + etc.
|
| I'm curious why/if this architecture was chosen. I get that it
| started as a standalone product (Amixr), but in the current state
| it is hard to rationalize deploying this next to Grafana in my
| current containerless setting.
| alex_dev wrote:
| One of the most frustrating aspects of being a software
| engineer is dealing with others that love to over-engineer.
| Unfortunately, they make enough noise about complex solutions
| being necessary that it gets managers scared of choosing easier,
| simpler options.
| skullone wrote:
| That seems like a perfectly reasonable architecture. If only
| all of us could work on battle tested components like those
| during our job!
| contravariant wrote:
| For something that is supposed to add some more features to
| the basic email/HTTP message alert like grafana generates, I
| do wonder what extra features require an additional 2
| databases, a message queue and a separate task queue.
| skullone wrote:
| probably keeps history, state, escalation flow, etc?
| goodpoint wrote:
| That's very bad. 99% of organizations don't have a volume of
| alerts that justifies any of MySQL, Redis and RabbitMQ.
|
| Complexity comes at a steep price when something critical (e.g.
| OnCall) breaks and you have to debug it in a hurry.
|
| Shoving everything in a container and closing the lid does not
| help.
| [deleted]
| motakuk wrote:
| I agree that a multi-component architecture is harder to deploy.
| We did our best and prepared tooling to make deployment easy.
|
| Helm (https://github.com/grafana/oncall/tree/dev/helm/oncall),
| docker-composes for hobby and dev environments.
|
| Besides deployment, there are two main priorities for the OnCall
| architecture: 1) it should be as "default" as possible - no
| fancy tech, no hacking around; 2) it should deliver
| notifications no matter what.
|
| We chose the most "boring" (no offense, Django community -
| boring is a great quality in a framework) stack we know well:
| Django, Rabbit, Celery, MySQL, Redis. It's mature, reliable, and
| allows us to build a message-bus-based pipeline with reliable
| and predictable migrations.
|
| It's important for such a tool to be based on a message bus
| because it should have no single point of failure. If a worker
| dies, another will pick up the task and deliver the alert. If
| Slack goes down, you won't lose your data: OnCall will continue
| delivering to other destinations and will deliver to Slack once
| it's up.
|
| The architecture you see in the repo has been live for 3+ years
| now. We were able to perform a few hundred data migrations
| without downtime, and had no major downtime or data loss. So I'm
| pretty happy with this choice.
| Deritio wrote:
| Your message bus justification sounds like one of the most
| ridiculous claims I've heard.
|
| Sorry but why is rabbitmq really necessary?
| slotrans wrote:
| You don't need Rabbit, Celery, or Redis. You should be able
| to replace MySQL with SQLite. Then it would be _radically_
| easier to deploy.
| sergiomattei wrote:
| It's curious to see people questioning the stack choices of
| apps they haven't built and problems they haven't faced.
|
| They chose this stack, it works for them. They've put it
| through its paces in production.
|
| It's as boring as it gets.
| throwaway892238 wrote:
| A MySQL database cluster, and a local copy of a SQL
| database on a single file on a single filesystem, are not
| close to the same thing. Except they both have "SQL" in the
| name.
|
| One of them allows a thousand different nodes on different
| networks to share a single dataset with high availability.
| The other can't share data with any other application,
| doesn't have high availability, is constrained by the
| resources of the executing application node, has obvious
| performance limits, limited functionality, no commercial
| support, etc etc.
|
| And we're talking about a product that's intended for
| dealing with on-call alerts. The entire point is to alert
| when things are crashing, so you would want it to be highly
| available. As in, running on more than one node.
|
| I know the HN hipsters are all gung-ho for SQLite, but let's try
| to rein in the hype train.
| slotrans wrote:
| I don't need _any_ of that stuff, and nor does anyone who
| would use this. People who need clustered high-
| availability stuff are _paying for PagerDuty or
| VictorOps_.
|
| This is for tiny shops with 4 servers. And tiny shops
| with 4 servers don't have time to spin up a horrendous
| stack like this. I was excited to see this announcement
| until I saw all the moving pieces. No thanks!
| Spivak wrote:
| And this is the on-prem version of those tools. Just
| because it isn't the tool you wanted doesn't mean it's
| not good.
| throwaway892238 wrote:
| If you only have 4 servers, make a GitHub Action (or, hell,
| since we're assuming one node with SQLite, a cron job on one of
| your 4 servers) that _curls_ your servers every 5 minutes and
| sends you a text when they're down. You don't need a Lamborghini
| to get groceries.
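That cron-style check can be sketched in a few lines of Python; the helper names and URLs here are made up, and the "send a text" step is left as a stub:

```python
from urllib.request import urlopen


def is_up(url, fetch=urlopen, timeout=5):
    """Return True if `url` answers with a 2xx/3xx status."""
    try:
        with fetch(url, timeout=timeout) as resp:
            # urlopen responses expose .status; the default keeps stubs simple.
            return 200 <= getattr(resp, "status", 200) < 400
    except Exception:
        return False


def check_all(urls, fetch=urlopen):
    """Return the subset of `urls` that look down (to be texted out)."""
    return [u for u in urls if not is_up(u, fetch=fetch)]


if __name__ == "__main__":
    # Hypothetical hosts; wire the result into your SMS/email notifier.
    down = check_all(["https://app.example.com/healthz",
                      "https://api.example.com/healthz"])
    if down:
        print("DOWN:", ", ".join(down))
```

Run it from cron every 5 minutes and pipe any "DOWN" output to whatever notifier you already have.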
| pphysch wrote:
| This discussion is in the context of a self-contained app
| called Grafana OnCall, which is built on Django, which
| does not _particularly_ care which RDBMS you are using.
|
| At the very least, SQLite should be the default database
| for this product, and users can swap it out with their
| MySQL database cluster if they really are Google-scale.
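In Django that swap is just a settings change, since the ORM abstracts the backend; a minimal sketch with hypothetical `ONCALL_*` environment variables (not OnCall's actual configuration):

```python
# settings.py (sketch) - the ONCALL_* variable names are invented here.
import os

if os.environ.get("ONCALL_DB_ENGINE") == "mysql":
    # Google-scale option: point at an external MySQL cluster.
    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.mysql",
            "NAME": os.environ["ONCALL_DB_NAME"],
            "USER": os.environ["ONCALL_DB_USER"],
            "PASSWORD": os.environ["ONCALL_DB_PASSWORD"],
            "HOST": os.environ.get("ONCALL_DB_HOST", "127.0.0.1"),
            "PORT": os.environ.get("ONCALL_DB_PORT", "3306"),
        }
    }
else:
    # Default: zero-dependency SQLite file next to the app.
    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.sqlite3",
            "NAME": "oncall.sqlite3",
        }
    }
```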
| gjulianm wrote:
| > The entire point is to alert when things are crashing,
| so you would want it to be highly available. As in,
| running on more than one node.
|
| An important question to ask is how much availability are
| you actually gaining from the setup. It wouldn't be the
| first time I see a system moving from single-node to
| multinode and being less available than before due to the
| extra complexity and moving pieces.
| [deleted]
| gen220 wrote:
| I think your decisions were reasonable, as is the opinion of
| the person you're responding to.
|
| To be fair, even in its current form, it should be possible
| to operate this system with sqlite (i.e. no db server) and
| in-process celery workers (i.e. no rabbit MQ) if configured
| correctly, assuming they're not using MySQL-specific features
| in the app.
|
| Using a message bus, a persistent data store behind a SQL
| interface, and a caching layer are all good design choices. I
| think the OP's concern is less with your particular
| implementations, and more with the principle of preventing
| operators from bringing their own preferred implementation of
| those interfaces to the table.
|
| They mentioned that it makes sense because you were a
| standalone product, so stack portability was less of a
| concern. But as FOSS, you're opening yourself up to different
| standards on portability.
|
| It requires some work on the maintainer to make the
| application tolerant to different fulfillments of the same
| interfaces. But it's good work. It usually results in cleaner
| separation of concerns between application logic and
| caching/message bus/persistence logic, for one. It also
| allows your app to serve a wider audience: for example, those
| who are locked-in to using Postgres/Kafka/Memcached.
| raffraffraff wrote:
| Nothing wrong with that. I managed 7+ Sensu "clusters" at a
| previous job, and its stack was a Ruby server, Redis and
| RabbitMQ. But I completely ditched RabbitMQ and used Redis for
| the queue and data. Simpler, more performant and more reliable
| (even if the feature was marked _experimental_). Our alerts were
| really spammy, and we had ~8k servers (each running a bunch of
| containers) per cluster, so these things were busy. Each cluster
| was 3x small nodes (6GB memory, 2 CPU). Memory usage was
| minuscule, typically <300MB. Any box could be restarted without
| any impact because Redis just operated in (failover) mode and
| Sensu was horizontally scalable.
|
| I get why you would add a relational DB to the mix.
| Personally, I'd like a Rabbit-free option.
| minusf wrote:
| not gonna argue that a single binary is the ultimate deploy
| solution but running a django app is not that difficult
| (although i am biased cause i do that for a living).
|
| i love django projects but mysql, celery and rabbitmq -- no
| thanks.
| pphysch wrote:
| Don't get me wrong, I love Django and think it's a great
| framework for writing internal tools like this. Redis gets a
| pass too since Django has native support for it in 4.0+. It's
| really the (IMHO unnecessary) combo of MySQL+RabbitMQ+Celery
| that turns me off.
|
| Redis itself has had solid support for building reliable
| distributed task streaming for nearly 4 years (Redis
| ConsumerGroups introduced in 2018).
| lazyant wrote:
| Curious as to what architecture you would have preferred, or
| what this pretty standard stack (that can be deployed to k8s) is
| not giving you.
| pphysch wrote:
| Any of the following:
|
| Python(Django)+Redis+[SQLite]
|
| Python(Django)+Postgres
|
| [Compiled Go binary]+[SQLite]
|
| SQLite barely even counts as an architectural dependency TBH
| :)
| theptip wrote:
| For a simple low-scale app you can often do without Redis and
| Celery/RMQ if you just push everything into Postgres.
|
| Far less scalable, but it is dramatically simpler to deploy.
| Often gets you surprisingly far though. Would be interesting
| to know how many monitored integrations could be supported by
| that flow.
| picozeta wrote:
| How does a message queue work via Postgres? Many people
| (including me) use Redis to run background jobs.
| theptip wrote:
| Here's the option I'm familiar with (siblings have others
| too):
|
| https://github.com/malthe/pq
|
| Doesn't have all the plumbing you'd want, there is a
| wrapper (https://github.com/bretth/django-pq/) that seems
| to give you an entrypoint command more like `celery
| worker ...` but I've not investigated it closely.
| minusf wrote:
| https://github.com/procrastinate-org/procrastinate
|
| https://github.com/gavinwahl/django-postgres-queue
| infogulch wrote:
| lmgtfy https://www.crunchydata.com/blog/message-queuing-
| using-nativ...
| slotrans wrote:
| This is a very confused question. The data store you keep
| your queued items in is completely orthogonal to what a
| message queue actually is.
|
| A simple way to use an RDBMS as a message queue, one that has
| been in use since before most HN readers were born, is roughly:
|
| - enqueue an item by inserting a row into a table with a status
| of QUEUED
| - use a SELECT FOR UPDATE, or UPDATE...LIMIT 1, or similar, to
| atomically claim and return the first status=QUEUED item, while
| setting its status to RUNNING (setting a timestamp is also
| recommended)
| - when the work is complete, update the status to DONE
|
| There are more details to it obviously but that's the outline.
|
| The first software company I worked for was using this basic
| approach to queue outbound emails (and phone and fax... it was
| 2005!), millions per day, on an Oracle DB that _also_ ran the
| entire rest of the business. It's not hard.
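That outline can be sketched in Python, with SQLite standing in for the RDBMS (table and function names are illustrative; on Postgres or Oracle you would claim rows with SELECT ... FOR UPDATE instead of the compare-and-swap UPDATE shown):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE queue ("
    " id INTEGER PRIMARY KEY,"
    " payload TEXT NOT NULL,"
    " status TEXT NOT NULL DEFAULT 'QUEUED',"
    " claimed_at TEXT)"
)


def enqueue(conn, payload):
    # Step 1: insert a row with status QUEUED.
    conn.execute("INSERT INTO queue (payload) VALUES (?)", (payload,))
    conn.commit()


def claim(conn):
    # Step 2: atomically claim the oldest QUEUED item, flipping it to
    # RUNNING and stamping a timestamp. The guarded UPDATE acts as a
    # compare-and-swap: if another worker won the race, rowcount is 0
    # and we retry.
    while True:
        row = conn.execute(
            "SELECT id, payload FROM queue WHERE status = 'QUEUED' "
            "ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None  # queue drained
        cur = conn.execute(
            "UPDATE queue SET status = 'RUNNING', claimed_at = datetime('now') "
            "WHERE id = ? AND status = 'QUEUED'",
            (row[0],),
        )
        conn.commit()
        if cur.rowcount == 1:
            return row


def done(conn, item_id):
    # Step 3: when the work is complete, update the status to DONE.
    conn.execute("UPDATE queue SET status = 'DONE' WHERE id = ?", (item_id,))
    conn.commit()


enqueue(conn, "send-email")
enqueue(conn, "send-sms")
first = claim(conn)  # claims the "send-email" row
done(conn, first[0])
```

On Postgres, `SELECT ... FOR UPDATE SKIP LOCKED` lets many workers claim concurrently without the retry loop.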
| gjulianm wrote:
| I bet quite a lot, probably at least 10-50 per second
| without doing anything special for performance, i.e.
| multiple queries per alert, calling different APIs, things
| like that. I don't know of many places that are dealing
| with alerts measured in "per second" as a unit.
|
| Not to mention that having multiple components doesn't mean
| it's "scalable" by default, it could happen that some part
| of the pipeline doesn't like multiple instances of
| something.
| chrisandchris wrote:
| Not OP, but one may interpret your response as "I don't
| understand why you prefer a single binary over this
| architecture that requires 6 different services and prefers
| k8s".
|
| IMHO, OP just stated that one could solve this with fewer
| dependencies and get the same (if not a better) result.
| pphysch wrote:
| Yes, thank you. I would be surprised if this same product
| couldn't be delivered with just Python(Django) + SQLite +
| Redis (assuming writing everything in Go is unrealistic).
| Spinning up a venv and launching a local Redis instance is
| significantly more reasonable than having to configure
| MySQL, RabbitMQ, and Celery.
| lazyant wrote:
| I missed that interpretation :(
|
| IMHO a fat binary written from scratch would have been a
| way worse choice than to use a standard stack, both in
| terms of bugs and time, let alone Open Source contributions
| or any scalability.
|
| In terms of the number of services, what do you get rid of that
| produces a better result? Maybe RMQ, and use a worse queue?
| Celery, and write your own task manager or pull in another
| dependency?
| gjulianm wrote:
| Installation in a regular system without Kubernetes? Right
| now I can install Grafana, Prometheus and Alertmanager in a
| regular Linux system using distribution packages, and just
| worry about those programs themselves. If I want to install
| OnCall, I need not just OnCall but four other non-trivial
| dependencies that will still need configuration, management
| and troubleshooting. All for something that is going to deal
| with far less load than any of
| Grafana/Prometheus/Alertmanager. I honestly do not understand
| it.
| lazyant wrote:
| you can install this stack without kubernetes no? I don't
| see anything k8s-specific
| heavyset_go wrote:
| Yes, there is nothing Kubernetes specific here, and this
| can be deployed using whatever container orchestration
| system you want.
| gjulianm wrote:
| The problem still stands of adding dependencies, extra
| complexity and configuration. I'm usually happy about
| Grafana/Prometheus deployments because the base
| installation is fairly simple and self-contained, but
| this looks like a bit of a mess.
| vhold wrote:
| AlertManager is one component of a more complicated
| infrastructure.
|
| https://prometheus.io/docs/introduction/overview/#architectu...
|
| https://kubernetes.io/docs/concepts/overview/components/
| pphysch wrote:
| OnCall also does nothing unless you have something external
| firing alerts for you. They both fill similar niches in a
| larger monitoring system; this does not excuse OnCall having
| a drastically more complex internal architecture.
| mkl95 wrote:
| > Django stack with MySQL, Redis, RabbitMQ, and Celery
|
| MySQL is a weird, if not slightly disturbing, choice. Other than
| that it's a boring, battle-tested stack that is relatively easy
| to scale. I agree that Go is nicer, but I'm biased by several
| years of dealing with horrific Flask / Django projects.
| heavyset_go wrote:
| That's a tried and true stack, and a very good one for
| maintaining sane levels of reliability, consistency, durability
| etc. Resource wise, at least with Celery, RabbitMQ and Django,
| they're also pretty lean.
|
| It even ships in containers along with Docker Compose files and
| Helm charts, which would suit the deployment use cases of 99%
| of users. I understand that you're not using containers, but I
| don't think that's a limitation that many are inflicting upon
| themselves as of late, and if pressed, installing Docker
| Compose takes about 5 minutes and you don't have to think about
| it again.
| MarquesMa wrote:
| This. I find open source projects written in Go or Rust are
| usually more pleasant to work with than Java, Django or Rails,
| etc. They have less clunky dependencies, are less resource-
| hungry, and can ship with single executables which make
| people's life much easier.
|
| Just think about Gitea vs GitLab.
| matsemann wrote:
| Not sure why you include java in that, as you mostly get a
| standalone file. No such thing as a jre in modern java
| deployment.
|
| As for python, at least getting a dockerfile helps a lot.
| Otherwise it's a huge mess to get running, yes.
|
| Python is still a hassle anyways, since the lack of true
| multithreading means that you often need multiple
| deployments, which the Celery usage here for instance shows.
| Volundr wrote:
| > Not sure why you include java in that, as you mostly get
| a standalone file. No such thing as a jre in modern java
| deployment.
|
| Maybe I'm behind the times, but I can't figure out what you
| mean here. As far as I know 'java -jar' or servlets are
| still the most common ways of running a Java app. Are you
| talking graal and native image?
| matsemann wrote:
| For deploying your own stuff, most people do as before,
| yes. But even then, it's at least still only a single jar
| file, containing all dependencies. Not like a typical
| python project where they ask you to run some command to
| fetch dependencies and you have to pray it will work on
| your system.
|
| But using jlink for java, one can package everything to a
| smaller runtime distributed together with the
| application. So then I feel it will be not much different
| than a Go executable.
|
| > _The generated JRE with your sample application does
| not have any other dependencies..._
|
| > _You can distribute your application bundled with the
| custom runtime in custom-runtime. It includes your
| application._
|
| From the guide here
| https://access.redhat.com/documentation/en-
| us/openjdk/11/htm...
| FridgeSeal wrote:
| Python application deployments are all fun and games until
| suddenly the documentation starts unironically suggesting
| that you should "write your configuration as a Python
| script" that should get mounted to some random specific
| directory within the app as if that could ever be a sane
| and rational idea.
| eeZah7Ux wrote:
| Hell no, I want stuff like OnCall packaged in Linux
| distributions. I need something stable and reliable that
| receives security fixes.
|
| Maintaining tens of binaries pulled from random GitHub projects
| over the years is a nightmare.
|
| (Not to mention all the issues around supply chain management,
| licensing, phoning home, and so on.)
| morelisp wrote:
| At this point I trust the Go modules supply chain
| considerably more than any free distro's packaging, which
| is ultimately pulling from GitHub anyway.
| dijit wrote:
| > At this point I trust the Go modules supply chain
| considerably more than any free distro's packaging
|
| What has happened in the package ecosystem to make you
| believe this? Is it velocity of updates or actual trust?
|
| I haven't heard of any malicious package maintainers.
| eeZah7Ux wrote:
| This is plain false. Most production-grade distributions do
| extensive vetting of packages, both technical and legal.
|
| Additionally, distribution packages are tested by a
| significant number of users before the release.
|
| Nothing of this sort happens around any language-specific
| package manager. You just get whatever happens to be around on
| software forges.
|
| Unsurprisingly, there have been many serious supply chain
| attacks in the last 5 years. None of them affected the usual
| big distros.
| morelisp wrote:
| No, Go modules implement a global TOFU checksum database.
| Obviously a compromised upstream at initial pull would
| not be affected, but distros (other than the well-scoped
| commercial ones) don't do anything close to security
| audits of every module they package either. Real-world
| untargeted SCAs come from compromised upstreams, not
| long-term bad faith actors. Go modules protects against
| that (as well as other forms of upstream incompetence
| that break immutable artifacts / deterministic builds).
|
| MVS also prevents unexpected upgrades just because
| someone deleted a lockfile.
| goodpoint wrote:
| It's very nice to see Python and AGPL used for this.
| ucosty wrote:
| Looks very cool, will have to give this a shot.
| motakuk wrote:
| Hello HN!
|
| Matvey Kukuy, ex-CEO of Amixr and head of the OnCall project
| here. We've been working hard for a few months to make this OSS
| release happen. I believe it should make incident response
| features (on-call rotations, escalations, multi-channel
| notifications) and best practices more accessible to the wider
| audience of SRE and DevOps engineers.
|
| Hope someone will finally be able to sleep well at night, sure
| that OnCall will handle escalations and alert the right person
| :)
|
| Please join our community on GitHub! The whole Grafana OnCall
| team is here to help you and to make this thing better.
| knicholes wrote:
| Being on-call has never made me sleep better at night!
| krab wrote:
| If I know someone else is on call and he's competent, I can
| sleep better.
| the_duke wrote:
| The docs link [1] is 404.
|
| Seems like the /main is the culprit.
|
| [1] https://grafana.com/docs/oncall/main/.
| motakuk wrote:
| Fixed: https://grafana.com/docs/grafana-cloud/oncall/
| pachico wrote:
| I love Grafana, don't get me wrong, but I have the sensation
| that they are now in that position where companies that got a
| massive capital injection and, therefore, a massive increase in
| workforce, release too much, too soon.
|
| It doesn't have anything to do, of course, with the fact that
| this morning we suddenly found that all our dashboards stopped
| working because we were upgraded to Grafana v9, for which there
| is not a stable release nor documentation for breaking changes.
|
| Luckily they rolled back our account.
| danlimerick wrote:
| I apologize for the disruption we caused you when rolling out
| Grafana 9. We are working on improving our releases to Grafana
| Cloud and also on making sure that errors due to breaking
| changes in a major release won't affect customers in the
| future. As a Grafana Cloud customer, you shouldn't need to read
| docs about breaking changes when we upgrade your instance.
| pachico wrote:
| Dude, I hope you also read when I say that I love what you do
| and your reply just confirms I'm putting my money in the
| right hands.
|
| I just wouldn't mind to be the last to upgrade to a newer
| version :)
| greatgib wrote:
| I would give a huge marketing bullshit award for the following
| sentence:
|
| <<We offered Grafana OnCall to users as a SaaS tool first for a
| few reasons. It's a commonly shared belief that the more
| independent your on-call management system is, the better it will
| be for your entire operation. If something goes wrong, there will
| be a "designated survivor" outside of your infrastructure to help
| identify any issues. >>
|
| They tried to ensure that you use their SaaS offering because
| they care about your own good more than you do. So humanist...
| ezrast wrote:
| The point isn't that their infrastructure is more reliable than
| yours, but that it's decoupled from yours. If you run your
| monitoring on the same infra as production, it's liable to go
| down when production does, i.e. just when you need it most.
| This is a real reason to outsource monitoring to a SaaS, just
| like there are real reasons to self-host.
|
| I mean, obviously they chose to address the segment of the
| market they could get more money out of first; I'm not
| contesting that. But the bit you quoted is low-grade bullshit
| at best. Hardly award-winning.
| martypitt wrote:
| Congrats - this looks great, and definitely something I was
| wishing for during an incident earlier this week.
|
| A minor note, if anyone from Grafana is around - a bunch of the
| links on the bottom of the announcement go to a 404.
| motakuk wrote:
| We're fixing that, thank you ;)
| googletron wrote:
| Very cool. I love what the Grafana team is up to.
| anyfactor wrote:
| Here is the repo: https://github.com/grafana/oncall
|
| AGPL 3.0
___________________________________________________________________
(page generated 2022-06-14 23:00 UTC)