[HN Gopher] Real World Examples of GPT-3 Plain Language Root Cau...
___________________________________________________________________
Real World Examples of GPT-3 Plain Language Root Cause Summaries
Author : Loke123
Score : 47 points
Date : 2021-03-25 17:57 UTC (5 hours ago)
(HTM) web link (www.zebrium.com)
(TXT) w3m dump (www.zebrium.com)
| Animats wrote:
| Microsoft had something similar in the Windows 7 era. They have a
| crash dump analyzer that produces long reports. They fed those
| into a classifier, to group similar dumps together. Then all the
| dumps grouped together were given to one person to find the
| common bug.
| solidasparagus wrote:
| This is interesting, but I'm not sure I'm a fan. One of the core
| problems in building businesses out of AI is that people pick
| problems they want to solve and try to get AI to solve them
| rather than picking problems that AI is likely to be able to
| solve. AI is unreliable, particularly in the long-tail, and
| cannot easily be used in domains where the cost of failure is
| high. Using AI to accelerate humans via human-in-the-loop AI
| tools is a great example of where things like GPT-3 can have real
| value.
|
| But good root cause analysis is a matter of establishing facts
| and building a chain of logic on top of those facts to get to the
| root cause. You cannot rely on models like GPT-3 to give you
| reliable baseline facts. Particularly when you are talking about
| a production issue that needs to be fixed ASAP. The key line that
| worries me from this blog post is "when results are suboptimal,
| they are mostly not misleading". 'Mostly not misleading' isn't
| going to cut it when I'm in the middle of an outage. I think that
| will prove to be a problem if this tool gets widespread usage.
|
| That being said, I'm a huge fan of applying AI to human-in-the-
| loop problems and this was a cool idea for how modern language
| models can be applied.
| cl42 wrote:
| I'm so with you on this. First, you don't need to turn root
| cause analysis into text like this via GPT-3; there are easier
| ways.
|
| Secondly, I imagine most cases of "root cause analysis" require
| you to be very, very clear in understanding the... root
| cause... So using generalized language models will probably
| lead to unacceptable errors, which means there are probably
| better ways of addressing this problem (as per the discussion
| here on error rates and unacceptable errors in ML products:
| https://phaseai.com/resources/how-to-build-ml-products)
| Ajs1 wrote:
| Fair point, but this blog and your comment are about a summary
| sentence. If you read through to how the underlying log reports
| are constructed, those are very accurate (and also quite
| concise).
| Loke123 wrote:
| This is a follow up to an earlier post describing the use of
| GPT-3 to summarize log events that describe software problems.
| https://news.ycombinator.com/item?id=25749820
|
| This post shares examples of real summaries generated during beta
| tests, as well as examples of some sub-optimal outcomes.
| londons_explore wrote:
| This kind of output is what the logs should have said in the
| first place...
|
| I wish projects like the linux kernel would work on making log
| messages, at least those for common events, more readable to an
| engineer who isn't familiar with kernel internals.
| FBT wrote:
| When it comes to the systemd logs, this is kind of what the -x
| flag to journalctl does (or tries to do.)
|
| Having detailed human-level descriptions of what's going on and
| how to fix it is great. But you also don't want to drown out
| any important details under waves of verbose text.
|
| The solution, then, is to show the extra detail only when it's
| requested with the -x flag.
|
| This works pretty well, all things considered. The detailed
| messages are fine, though they could be better--but that's
| probably always going to be true. It's a start, anyway.
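| For instance, a script pulling the recent log for one unit with
| the extra catalog text enabled might look like this (the unit
| name is made up; -x, -u, -n and --no-pager are the real
| journalctl options involved):
|
|     import subprocess
|
|     # Last 50 entries for one unit, with -x adding the extra
|     # explanatory "catalog" text. --no-pager keeps the output
|     # plain so it can be captured. "myapp.service" is invented.
|     result = subprocess.run(
|         ["journalctl", "-x", "-u", "myapp.service",
|          "-n", "50", "--no-pager"],
|         capture_output=True, text=True, check=False,
|     )
|     print(result.stdout)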
| Loke123 wrote:
| :) I'm sure you're not the only one to wish that.
| hooande wrote:
| The set of people who need to know the root cause of a problem
| but aren't familiar with reading logs seems like it might be
| pretty small. I find that as a developer I only need to see
| something like that a few times before I have a sense of which
| log messages mean what. This seems like it might be valuable for
| a tech executive to get a quick feel for why categories of
| errors are occurring, but I would generally have someone compile
| and summarize that data for me.
|
| Like most people, I love the idea of ingesting large amounts of
| data and making it readable. I guess what I would want,
| personally, is more like a GPT-3-powered Stack Overflow search
| where I can put in an arbitrary cryptic error and get a human-
| readable, root-cause-based explanation. This is a very
| interesting use case and I hope they continue to develop it.
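| A rough sketch of what that could look like against the GPT-3
| API of the time (the prompt wording, engine choice, parameters
| and sample error are all guesses, not anything from the
| article):
|
|     import os
|     import openai
|
|     openai.api_key = os.environ["OPENAI_API_KEY"]
|
|     def explain_error(raw_error: str) -> str:
|         """Ask GPT-3 for a plain-language guess at what a
|         cryptic error message means."""
|         prompt = (
|             "Explain the likely cause of the following error "
|             "message in plain language, as if answering a "
|             "Stack Overflow question:\n\n"
|             f"{raw_error}\n\nExplanation:"
|         )
|         response = openai.Completion.create(
|             engine="davinci",   # GPT-3 engine available in 2021
|             prompt=prompt,
|             max_tokens=120,
|             temperature=0.2,    # keep the answer conservative
|         )
|         return response.choices[0].text.strip()
|
|     print(explain_error(
|         "java.lang.OutOfMemoryError: GC overhead limit exceeded"))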
| draugadrotten wrote:
| The human-readable text is very nice. However, these are not Root
| Causes:
|
| * The root cause of the issue was that the Jenkins master was not
| able to connect to the vCenter server. ==> Why was it not?
|
| * The root cause was a drive failed error ==> Why did the drive
| fail?
|
| * The root cause was that the Slack API was rate limited. ==> Why
| was it rate limited?
|
| These examples from the article may be human-readable errors, but
| that doesn't make them root causes.
|
| To have a root cause analysis, try asking Why five times.
| https://en.wikipedia.org/wiki/Five_whys
| Ajs1 wrote:
| Good point. This is the unfiltered response from the GPT-3
| prompt, and the phrase "root cause" is a bit of an
| overstatement by GPT-3. However, the collection of log events
| in the actual reports is far more descriptive. You
| can find examples here: https://www.zebrium.com/blog/using-
| gpt-3-with-zebrium-for-pl... and here:
| https://www.zebrium.com/blog/is-autonomous-monitoring-the-an...
| travisjungroth wrote:
| This is the first use of GPT-3 to give me an "oh shit" reaction.
| Turning verbose, structured messages into something more human
| readable is a huge problem space. Seeing one instance of it
| working makes me think there will be many more.
| Loke123 wrote:
| Good to hear we're not alone in thinking that this is a
| promising use case.
| dcolkitt wrote:
| A similar approach might be pretty useful for C++ template errors
| and other notoriously complex compiler errors that you tend to
| get with higher-order type systems.
| Tossrock wrote:
| This could be an amazing IDE plugin.
| jeffbee wrote:
| You're only supposed to have novel outages. If you can train a
| machine to summarize outages, you might be doing it wrong.
| Loke123 wrote:
| Just to clarify - our machine is unsupervised, so it learns what
| is normal for any application and identifies log sequences that
| are novel for that application. We then feed those sequences to
| GPT-3, which does try to match against existing data sets in the
| public domain. So while the problem may well have been documented
| by someone else in the world, it is still novel for that
| particular application.
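| A minimal sketch of that two-stage idea (the novelty scoring
| here is a toy frequency count, not Zebrium's actual method, and
| the GPT-3 call, engine and prompt are assumptions):
|
|     import os
|     from collections import Counter
|
|     import openai
|
|     openai.api_key = os.environ["OPENAI_API_KEY"]
|
|     def rare_lines(history, recent, max_seen=1):
|         """Toy novelty filter: keep recent log lines whose exact
|         text was rarely or never seen in this application's own
|         history."""
|         seen = Counter(history)
|         return [line for line in recent if seen[line] <= max_seen]
|
|     def summarize(novel_lines):
|         """Hand the novel log sequence to GPT-3 and ask for a
|         one-sentence plain-English summary."""
|         prompt = (
|             "Summarize the problem described by these log events "
|             "in one plain-English sentence:\n\n"
|             + "\n".join(novel_lines)
|             + "\n\nSummary:"
|         )
|         response = openai.Completion.create(
|             engine="davinci", prompt=prompt,
|             max_tokens=60, temperature=0.2,
|         )
|         return response.choices[0].text.strip()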
___________________________________________________________________
(page generated 2021-03-25 23:01 UTC)