[HN Gopher] Real World Examples of GPT-3 Plain Language Root Cause Summaries
       ___________________________________________________________________
        
       Real World Examples of GPT-3 Plain Language Root Cause Summaries
        
       Author : Loke123
       Score  : 47 points
       Date   : 2021-03-25 17:57 UTC (5 hours ago)
        
 (HTM) web link (www.zebrium.com)
 (TXT) w3m dump (www.zebrium.com)
        
       | Animats wrote:
       | Microsoft had something similar in the Windows 7 era. They have
       | a crash dump analyzer that produces long reports. They fed those
       | into a classifier to group similar dumps together. Then all the
       | dumps grouped together were given to one person to find the
       | common bug.
        
       | solidasparagus wrote:
       | This is interesting, but I'm not sure I'm a fan. One of the core
       | problems in building businesses out of AI is that people pick
       | problems they want to solve and try to get AI to solve them
       | rather than picking problems that AI is likely to be able to
       | solve. AI is unreliable, particularly in the long-tail, and
       | cannot easily be used in domains where the cost of failure is
       | high. Using AI to accelerate humans via human-in-the-loop AI
       | tools is a great example of where things like GPT-3 can have real
       | value.
       | 
       | But good root cause analysis is a matter of establishing facts
       | and building a chain of logic on top of those facts to get to the
       | root cause. You cannot rely on models like GPT-3 to give you
       | reliable baseline facts. Particularly when you are talking about
       | a production issue that needs to be fixed ASAP. The key line that
       | worries me from this blog post is "when results are suboptimal,
       | they are mostly not misleading". 'Mostly not misleading' isn't
       | going to cut it when I'm in the middle of an outage. I think that
       | will prove to be a problem if this tool gets widespread usage.
       | 
       | That being said, I'm a huge fan of applying AI to human-in-the-
       | loop problems and this was a cool idea for how modern language
       | models can be applied.
        
         | cl42 wrote:
         | I'm so with you on this. First, you don't need to turn root
         | cause analysis into text like this via GPT-3; there are easier
         | ways.
         | 
         | Secondly, I imagine most cases of "root cause analysis" require
         | you to be very, very clear in understanding the... root
         | cause... So using generalized language models will probably
         | lead to unacceptable errors, which means there are probably
         | better ways of addressing this problem (as per the discussion
         | here on error rates and unacceptable errors in ML-products:
         | https://phaseai.com/resources/how-to-build-ml-products)
        
         | Ajs1 wrote:
         | Fair point, but this blog post and your comment are about a
         | summary sentence. If you read through to how the underlying
         | log reports are constructed, those are very accurate (and also
         | quite concise).
        
       | Loke123 wrote:
       | This is a follow-up to an earlier post describing the use of
       | GPT-3 to summarize log events that describe software problems.
       | https://news.ycombinator.com/item?id=25749820
       | 
       | This post shares examples of real summaries generated during beta
       | tests, as well as examples of some sub-optimal outcomes.
        
       | londons_explore wrote:
       | This kind of output is what the logs should have said in the
       | first place...
       | 
       | I wish projects like the linux kernel would work on making log
       | messages, at least those for common events, more readable to an
       | engineer who isn't familiar with kernel internals.
        
         | FBT wrote:
         | When it comes to the systemd logs, this is kind of what the -x
         | flag to journalctl does (or tries to do).
         | 
         | Having detailed human-level descriptions of what's going on and
         | how to fix it is great. But you also don't want to drown out
         | any important details under waves of verbose text.
         | 
         | The solution, then, is to show the extra detail only when it's
         | requested with the -x flag.
         | 
         | This works pretty well, all things considered. The detailed
         | messages are fine, though they could be better; then again,
         | that's probably always going to be true. It's a start, anyway.
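         | 
         | For example (the unit name here is just a placeholder),
         | running journalctl -x -u nginx.service appends the catalog
         | explanation text under each log entry that has one, while the
         | same command without -x prints only the raw messages.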
        
         | Loke123 wrote:
         | :) I'm sure you're not the only one to wish that.
        
       | hooande wrote:
       | The set of people who need to know the root cause of a problem
       | but aren't familiar with reading logs seems like it might be
       | pretty small. I find that as a developer I only need something
       | like that a few times before I have a sense of which log
       | messages mean what. This seems like it might be valuable for a
       | tech executive to get a quick feel for why categories of errors
       | are occurring, but I would generally have someone compile and
       | summarize that data for me.
       | 
       | Like most people, I love the idea of ingesting large amounts of
       | data and making it readable. I guess what I would want,
       | personally, is more like a GPT-3-powered Stack Overflow search
       | where I can put in an arbitrary cryptic error and get a
       | human-readable, root-cause-based explanation. This is a very
       | interesting use case and I hope they continue to develop it.
        
       | draugadrotten wrote:
       | The human-readable text is very nice. However, these are not Root
       | Causes:
       | 
       | * The root cause of the issue was that the Jenkins master was not
       | able to connect to the vCenter server. ==> Why was it not?
       | 
       | * The root cause was a drive failed error ==> Why did the drive
       | fail?
       | 
       | * The root cause was that the Slack API was rate limited. ==>
       | Why was it rate limited?
       | 
       | These examples from the article may be human-readable errors,
       | but that doesn't make them root causes.
       | 
       | To have a root cause analysis, try asking Why five times.
       | https://en.wikipedia.org/wiki/Five_whys
        
         | Ajs1 wrote:
         | Good point. This is the unfiltered response from the GPT-3
         | prompt, and the phrase "root cause" is a bit of an
         | overstatement by GPT-3. However, the collection of log events
         | in the actual reports is far more descriptive. You can find
         | examples here:
         | https://www.zebrium.com/blog/using-gpt-3-with-zebrium-for-pl...
         | and here:
         | https://www.zebrium.com/blog/is-autonomous-monitoring-the-an...
        
       | travisjungroth wrote:
       | This is the first use of GPT-3 to give me an "oh shit" reaction.
       | Turning verbose, structured messages into something more human
       | readable is a huge problem space. Seeing one instance of it
       | working makes me think there will be many more.
        
         | Loke123 wrote:
         | Good to hear we're not alone in thinking that this is a
         | promising use case.
        
       | dcolkitt wrote:
       | A similar approach might be pretty useful for C++ template errors
       | and other notoriously complex compiler errors that you tend to
       | get with higher-order type systems.
        
         | Tossrock wrote:
         | This could be an amazing IDE plugin.
        
       | jeffbee wrote:
       | You're only supposed to have novel outages. If you can train a
       | machine to summarize outages, you might be doing it wrong.
        
         | Loke123 wrote:
         | Just to clarify: our machine learning is unsupervised, so it
         | learns what is normal for any application and identifies log
         | sequences that are novel for that application. We then feed
         | those sequences to GPT-3, which indeed tries to match against
         | existing data sets in the public domain. So while the problem
         | may have been documented by someone else in the world, it is
         | still novel for that particular application.
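         | 
         | Conceptually, the GPT-3 step is just a completion call over
         | the novel log lines. A rough sketch in Python (illustrative
         | only: the prompt wording, engine choice and parameters are
         | placeholders, not what we actually use):
         | 
         |     import openai
         | 
         |     openai.api_key = "sk-..."  # your OpenAI API key
         | 
         |     # Log lines the unsupervised model flagged as novel for
         |     # this application (placeholders, not real output).
         |     novel_lines = [
         |         "<anomalous log line 1>",
         |         "<anomalous log line 2>",
         |     ]
         | 
         |     prompt = (
         |         "Summarize the root cause of the following log "
         |         "events in plain English:\n\n"
         |         + "\n".join(novel_lines)
         |         + "\n\nSummary:"
         |     )
         | 
         |     response = openai.Completion.create(
         |         engine="davinci",   # placeholder engine choice
         |         prompt=prompt,
         |         max_tokens=60,
         |         temperature=0.2,
         |         stop=["\n"],
         |     )
         |     print(response.choices[0].text.strip())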
        
       ___________________________________________________________________
       (page generated 2021-03-25 23:01 UTC)