[HN Gopher] Grafana Incident: Smart incident management for your...
___________________________________________________________________
Grafana Incident: Smart incident management for your teams
Author : matryer
Score : 167 points
Date : 2022-02-02 18:10 UTC (4 hours ago)
(HTM) web link (grafana.com)
(TXT) w3m dump (grafana.com)
| palijer wrote:
| >Automatically create the online meeting spaces for collaboration
|
| >Manage TODO items so nothing falls through the cracks
|
| I work in incident response, and I feel a huge misunderstanding
| in incident response products is the failure to recognize that
| companies already have established tools for collaboration and
| meetings and for capturing planned work.
|
| I find adding these things is seen as nice and inclusive and it
| is easier to sell a product that does a lot, but it turns into
| complete bloat and makes adoption harder, and makes it harder to
| support a larger product.
| quartz wrote:
| This was a big learning for us when we were first building out
| Kintaba[1].
|
| Re: task management specifically-- having previously been at
| FAANG companies that built all their own tools I had not
| realized just how prevalent Jira is. It. is. EVERYWHERE. and IT
| orgs at companies from 3 to 300,000 people are absolutely
| married to their carefully customized version of it as a system
| of record for everything that happens or will happen.
|
| We see many on-premise implementations as well despite the
| announced sunsetting of that product.
|
| I'm sure there's a #2 and #3 out there but honestly I almost
| never see it (we do see clubhouse/shortcut from time to time...
| but even those folks tend to move to Jira within 6 months).
|
| OT but it really makes me doubly impressed that Slack was able
| to move into organizations so successfully from all corners such
| that it was able to dodge what would traditionally be a pretty
| big Atlassian-owned barrier.
|
| [1] shameless plug for our incident management tool @
| https://kintaba.com
| dharmab wrote:
| I've used products in this space that would integrate with your
| existing video, chat and ticketing tools.
| buscoquadnary wrote:
| I think the problem is trying to present an abstraction layer
| to management, because we have those same features of todo
| lists, and recording information, in Jira and ServiceNow and
| like a dozen other pieces whose purpose is to coordinate and
| track work, and often they are unpopular with developers
| because they end up trying to provide an abstraction layer to
| the Execs to replace their management by spreadsheets, but
| unfortunately as anyone who has worked in software for long
| enough can tell you, abstractions are leaky.
|
| Hence the dissatisfaction with a lot of these tools.
| aantix wrote:
| Interesting take..
|
| What do you think is the solution - when an enterprise already
| has Jira, Github and Confluence, how do you think a product
| like Grafana Incident should integrate with these somewhat
| overlapping products?
| ethbr0 wrote:
| This feels like a central question of post-cloud / post-SaaS
| outsourcing.
|
| In the end, it boils down to two options: offer deep APIs
| into your product, or don't.
|
| IMHO, what needs to happen to support the former is for every
| SaaS purchase to include full technical due diligence on
| external integration capabilities.
|
| Integration needs to start being a headline feature in
| purchasing. And less an afterthought when a horrified
| engineer looks at some new enterprise product that's already
| being adopted.
| encryptluks2 wrote:
| So does Grafana actually believe in open source or not?
| sjwhitworth wrote:
| capableweb wrote:
| You'd do your job as a CEO better if you didn't spam
| competitors HN threads with your own product, unless you have
| something relevant to bring to the table. This comment just
| looks like a shameless plug because you're in the same sector.
|
| One way you could approach it is to highlight what you think is
| good with Grafana's implementation, and what could be better,
| and then contrast that with your own offering, without sounding
| like a salesman.
| burkaman wrote:
| This is just incredibly rude. Please don't do it again.
| JshWright wrote:
| This is timely... I just started building out an internal
| "chatops" solution that leans heavily on OnCall. Looks like I may
| be able to set that aside.
|
| If this is implemented as cleanly as OnCall, I have high hopes.
| It isn't without bugs, but it's already miles ahead of solutions
| like PagerDuty (in my opinion).
| btables wrote:
| I'd check out FireHydrant, but I'm biased ;)
| JshWright wrote:
| Yeah, there are definitely already products in this space,
| but we're already invested in Grafana, so it makes sense to
| lean in that direction, even if it meant a little custom work
| on our end (though it looks like that may not be necessary
| now)
| bloodyplonker22 wrote:
| PagerDuty is a product that has not evolved much at all in the
| last 10 years, unfortunately.
| jtlisi wrote:
| This looks really sharp! Love the opinionated approach to how to
| handle incidents with assigned roles!
| amelius wrote:
| It seems like this is a special case of project management
| software. If the existing products can't handle incidents then
| that software should be improved, not new software written.
| It's the best way to ensure that everybody on the team knows
| how to use the software when it's most urgently needed.
|
| E.g. would you change your favorite editor to a different one,
| in case of an incident? Probably not. So why change project
| management systems?
| [deleted]
| bastardoperator wrote:
| Did we watch a different presentation? ChatOps isn't new.
| What you're describing is what I would consider an antiquated
| practice. Nobody wants to go sniffing around a PM tool at 3AM.
| hughrr wrote:
| Zero here!
| matryer wrote:
| You must have solid tech. :)
| JshWright wrote:
| While you certainly could cobble together incident response
| workflows in something like Jira, I think it makes more sense
| to extend the monitoring and paging tooling (in large part
| due to the reason you mention-- familiarity with the tools
| that you're using as part of that response).
| dijit wrote:
| It's funny what process can do.
|
| 13 years ago I was working on a SaaS eCommerce platform and it
| feels like this tool is a relatively minor improvement over what
| we had built on top of IRC.
|
| That said; it's pretty cool and I'm definitely going to evaluate
| it: as our current PagerDuty integration is not nearly as clean
| as this.
| cfors wrote:
| I wish Grafana would stop trying to make offerings that already
| exist and focus on making their dashboards and alerts as code
| usable.
|
| I would even pay money for an actual offering that worked.
| lukeqsee wrote:
| Grafana Cloud is the best ROI money my startup spends every
| month.
| wernerb wrote:
| Alert templating. Grafana is fussy about configuring alerts on
| dashboards that have variables. What this means is if you have
| 30 clusters and want to use a single dashboard with a drop-down
| variable selecting your cluster, you cannot define alerts on it.
| It will refuse to do it.
|
| Alerts are also integrated tightly in dashboards. That forces
| alerts to be saved/backed up/imported as a single JSON blob. We
| want separate management of alerts so they can be defined as
| code and not in the dashboard blob of JSON!
|
| What chagrins me is that, because of the above issues, we have
| to use Prometheus Alertmanager instead, while our colleagues
| absolutely LOVE Grafana itself! We can't duplicate alerts tens
| upon tens of times. We don't want that management, nor do we
| want to teach our colleagues jsonnet/ksonnet to generate it. We
| also don't want permission problems.
| mikewave wrote:
| The new Grafana alerts do absolutely nothing to help with
| this.
|
| I'm at the point where I would pay 5 figures a year for
| something purely to do better alerting inside or alongside
| Grafana. Clicking alerts together is a nightmare when I have
| a ton of identical systems I need to configure. Same for
| dashboards - the limitations of the current mechanism are too
| severe.
|
| I'd build my own templating mechanism for it, but I still
| want the alerts visible in Grafana itself. Zabbix has the
| power to do all this but with a UX that is not ideal....
| gotjosh- wrote:
| Hey there! I work with alerting in general at Grafana - what
| are the pain points of dashboards and alerts as code you're
| currently experiencing? Would love to deliver / capitalise on
| the feedback.
| wernerb wrote:
| Alert templating. Grafana is fussy about configuring alerts on
| dashboards that have variables. What this means is if you have
| 30 clusters and want to use a single dashboard with a drop-down
| variable selecting your cluster, you cannot define alerts on it.
| It will refuse to do it.
|
| Alerts are also integrated tightly in dashboards. That forces
| alerts to be saved/backed up/imported as a single JSON blob. We
| want separate management of alerts so they can be defined as
| code and not in the dashboard blob of JSON!
|
| What chagrins me is that, because of the above issues, we have
| to use Prometheus Alertmanager instead, while our colleagues
| absolutely LOVE Grafana itself! We can't duplicate alerts tens
| upon tens of times. We don't want that management, nor do we
| want to teach our colleagues jsonnet/ksonnet to generate it. We
| also don't want permission problems.
| wernerb wrote:
| I can't edit my above comment anymore but I see that at
| least alerting is now a separate system in grafana 8!
| Great, we will take a look again!
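
To make the templating complaint above concrete, here is a minimal sketch of what "alerts as code" generated from one template could look like. It emits rule dictionaries in the general shape of a Prometheus rule file (groups, each with a name and a list of rules). The cluster names, the `cluster` label, the alert name, and the choice of the `up` metric are all assumptions made for illustration; none of them come from the thread, and this is not how Grafana itself templates alerts.

```python
import json

# Hypothetical cluster names; a real setup would load these from inventory.
CLUSTERS = [f"cluster-{i:02d}" for i in range(1, 31)]


def rule_for(cluster: str) -> dict:
    """Render one alerting rule from a shared template for a single cluster."""
    return {
        "alert": "TargetDown",  # illustrative alert name
        # Assumes targets carry a `cluster` label; adjust to your labelling.
        "expr": f'up{{cluster="{cluster}"}} == 0',
        "for": "5m",
        "labels": {"severity": "page", "cluster": cluster},
        "annotations": {"summary": f"A scrape target in {cluster} is down"},
    }


def build_rule_file(clusters: list[str]) -> dict:
    """Collect the per-cluster rules into a single rule group."""
    return {
        "groups": [
            {
                "name": "per-cluster-availability",
                "rules": [rule_for(c) for c in clusters],
            }
        ]
    }


rule_file = build_rule_file(CLUSTERS)
# Serialise so the result can be committed and code reviewed like any file.
serialized = json.dumps(rule_file, indent=2)
print(len(rule_file["groups"][0]["rules"]), "rules generated")
```

The point of the sketch is only that one template plus a list of clusters yields a reviewable text artifact, which is the workflow the commenters say they cannot get from alerts embedded in dashboard JSON.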
| cfors wrote:
| For one, I'm not convinced that the Grafana 8 Alerting API
| Swagger docs are up to date or ready for the public [0].
|
| I've literally copied an alert's json format, and then tried
| to post it back and never got it to work.
|
| Here's an example from my bash history:
|
| > curl -X POST -H "Authorization: Bearer $GRAFANA_API_KEY" -H
| "accept: application/json" -d @rule.json
| some_endpoint/api/ruler/grafana/api/v1/rules/test1
|
| I spent a solid day trying to play around with this to get it
| to work. Because of this the alerts are impossible to code
| review or store in a git source. Which stinks because
| Grafana's datasource APIs would be amazing to use for
| alerting. But they're either unusable because anybody can
| change them or the administrator could bork them at any given
| point (which has happened before), or just undocumented to
| the point where they are useless.
|
| That's not even to begin on dealing with the "big blob of
| json" problem [1] that was clearly important enough to be
| given an entire spot at GrafanaCon, but even Grafonnet is not
| supported with Grafana 8. There is apparently some CUE way of
| doing this, but I can't seem to find any official
| documentation on that.
|
| Anyways, I've moved back to alertmanager for the time being.
|
| edit: is all of grafana labs downvoting the GP? this is very
| honest and candid feedback here.
|
| [0]: https://editor.swagger.io/?url=https://raw.githubusercontent...
|
| [1]: https://grafana.com/go/grafanaconline/2021/dashboards-as-cod...
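
For anyone retrying the POST from the comment above, here is a hedged Python sketch that builds the same request without sending it, so the URL, headers, and body can be inspected first. The ruler path and the `test1` namespace are taken from the curl command; the base URL, the API key, and the rule-group payload are placeholders and do not reflect the real schema. One detail worth ruling out: `curl -d` sends `Content-Type: application/x-www-form-urlencoded` unless a header overrides it, so the sketch sets `application/json` explicitly. Whether that was the cause of the failure described is not known.

```python
import json
import urllib.request


def build_rule_request(
    base_url: str, api_key: str, namespace: str, rule_group: dict
) -> urllib.request.Request:
    """Build (but do not send) a POST that carries a rule group as JSON."""
    # Path copied from the curl command in the comment above.
    url = f"{base_url.rstrip('/')}/api/ruler/grafana/api/v1/rules/{namespace}"
    body = json.dumps(rule_group).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
            # Explicit JSON content type; curl -d would default to form encoding.
            "Content-Type": "application/json",
        },
    )


# Placeholder values: hypothetical host, dummy key, and a stand-in payload.
req = build_rule_request(
    "https://grafana.example.com",
    "dummy-key",
    "test1",
    {"name": "placeholder-group", "rules": []},
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually send it; that step is left out here so nothing is posted to a real instance by accident.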
| wtfishackernews wrote:
| It's currently impossible to write alert rules for Prometheus
| vectors. https://github.com/grafana/grafana/issues/35663
|
| Missing basic functionality like that is a dealbreaker.
| antod wrote:
| Will it always be a Grafana Cloud only offering?
| netingle wrote:
| For now, yes. Long term we're trying to offer everything we do
| both on premise and in the cloud. It's a bit tricky, so we
| can't say when....
| zbhoy wrote:
| Have you heard of Replicated.com before? They might be able
| to get y'all to both on premise and in the cloud at the same
| time easier
| chosenken wrote:
| Would it be possible to have a split offering, with both on
| prem and cloud? In my mind I would prefer to have things like
| Prometheus, Logs, and Metrics stored on prem mainly due to
| the volume of logs and metrics we create. Then use Grafana
| cloud for Grafana Dashboards, Loki logs, and incident
| management that pull directly from my on prem data stores. I
| bring this up as it may be cost prohibitive for us to store
| our metrics in the cloud ( we make so many metrics and logs!
| ) but I would love to off load hosting the front end. Grafana
| cloud takes care of managing and maintaining Grafana
| Dashboard and backend database, Authentication, updates, etc.
| I'm fine hosting Prometheus and Loki locally, have been for a
| long time! I just get annoyed having to host Grafana and
| setting it up, the database up, configuring auth, etc.
| bboreham wrote:
| I'm pretty sure that is doable today: Hosted Grafana with
| data sources pointing at your on-prem Prometheus and Loki.
|
| https://grafana.com/docs/grafana-cloud/fundamentals/gs-visua...
|
| (I work for Grafana Labs, but not on this part)
| mikewave wrote:
| Is there any hope of a Grafana Cloud data access proxy that
| runs on prem and enables us to give the Cloud access to
| databases we cannot expose?
| netingle wrote:
| Yes! It's something we've been mulling for a while, and I was
| just talking to one of the PMs about it this morning. This
| year for sure I hope.
| matryer wrote:
| Yeah, building for Grafana Cloud has big dev benefits too. We
| can iterate quickly, run live experiments, and build a more
| complicated stack (e.g. for ML tasks). We're going to be
| integrating more and more with the rest of Grafana too. All of
| this is much easier to do in one place.
| encryptluks2 wrote:
| It also has drawbacks like being locked into SaaS products
| that you don't have a lot of insight into.
| shamiln wrote:
| Seems like the industry is headed in that direction.
| [deleted]
___________________________________________________________________
(page generated 2022-02-02 23:00 UTC)