[HN Gopher] Is GPT-4 a Good Data Analyst? (2023)
       ___________________________________________________________________
        
       Is GPT-4 a Good Data Analyst? (2023)
        
       Author : CharlesW
       Score  : 43 points
       Date   : 2024-03-25 20:00 UTC (2 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | einpoklum wrote:
       | I was somewhat put off by the abstract:
       | 
       | > LLMs... have demonstrated their powerful capabilities in ...
       | context understanding, code generation, language generation, data
       | storytelling, etc.,
       | 
       | LLMs have not demonstrated understanding (in fact, one could
       | argue that they are fundamentally incapable of understanding);
       | they have only AFAICT demonstrated the ability to generate
       | boilerplate-ish code; "language generation" is too general a task
       | to claim that LLMs have succeeded in; and as for data
        | storytelling I don't know, but they can spin yarns. The problem
        | is that those yarns are often divorced from reality; see:
       | 
       | https://www.ibm.com/topics/ai-hallucinations
       | 
       | --------
       | 
       | Leafing through the paper, and specifically tables 6 and 7, I
       | don't believe their conclusion, that "GPT-4 can perform
       | comparable [sic] to a data analyst", is well-founded.
        
         | mewpmewp2 wrote:
          | I don't even understand what understanding exactly means;
          | perhaps anyone who understands it can enlighten me?
         | 
          | Do I, myself, understand? Stand under what, exactly? What is
          | that supposed to mean?
        
           | ocbyc wrote:
           | Transformers are just pattern matching. So if you write "give
           | me a list of dog names" it knows that "Spot" should be in
           | that result set. Even though it doesn't really know what a
            | dog is, what a list is, or what a spot is.
        
             | bongodongobob wrote:
              | I don't think that's true. They clearly group related
              | things together and seem to be able to create concepts that
              | aren't specifically in the training data. For example, a
              | model will figure out the different features of a face
              | (eyes, nose, mouth) even if you don't explicitly tell it
              | what those are. Which is why they are so cool.
        
               | mewpmewp2 wrote:
               | They are cool, but then you are also cool.
        
             | rafaelero wrote:
             | > Transformers are just pattern matching.
             | 
             | That's trivially true. The question is: are we any
             | different?
        
             | ALittleLight wrote:
             | Can you describe a test that would separate trivial pattern
             | matching from true understanding?
        
               | lottin wrote:
               | A simple conversation would do.
        
               | mewpmewp2 wrote:
                | Could you share a conversation link with GPT-4, about
                | either a "list" or a "dog", to determine whether it
                | truly understands one of those things the way a
                | human does?
        
               | bongodongobob wrote:
               | Just did that. It seems to understand. Checkmate
               | /fingerguns
        
             | mewpmewp2 wrote:
             | How would I test whether I "know" or "understand" what a
             | dog is?
        
             | inopinatus wrote:
             | Even this seems too grand a claim. I'd water it down thus:
             | the LLM encodes that the token(s) for "Spot" are
             | probabilistically plausible in the ensuing output.
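              | 
              | A minimal sketch of what that means, with a made-up
              | vocabulary and made-up logits (nothing below comes from a
              | real model):
              | 
              |     import math
              | 
              |     vocab  = ["Spot", "Rex", "table", "Paris"]
              |     logits = [4.2, 3.7, 0.3, -1.0]   # hypothetical scores after
              |                                      # "give me a list of dog names:"
              | 
              |     exps  = [math.exp(x) for x in logits]
              |     probs = [e / sum(exps) for e in exps]   # softmax: scores -> distribution
              | 
              |     for tok, p in zip(vocab, probs):
              |         print(f"{tok:>6}: {p:.3f}")   # "Spot" and "Rex" take most of the mass
              | 
              | There is no claim about dogs or lists anywhere in that, just
              | a distribution that happens to put "Spot" near the top.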
        
           | advael wrote:
           | Solipsism is truly the best fully-general counterargument
        
         | unclebucknasty wrote:
          | Agreed, right down to their conclusion coming across as way
          | overstated. Actually, meaningless would be more accurate.
         | 
          | The thing about LLMs is exactly that they _don't_ understand,
         | by design. It often feels very distinctly like it's just
         | engaging in sophisticated wordplay. A parlor trick.
         | 
         | When ChatGPT 4 first came out I spent a couple of hours putting
         | together a chess game using ChatGPT as the engine. It was
         | shockingly bad, as in even attempting to make invalid moves.
         | 
         | I get it: it's not tuned for that purpose, and its chess
         | training corpus could probably be expanded to improve it as
         | well.
         | 
         | But, it actually served as a near-perfect demonstration of its
         | lack of understanding, as well as the confidence with which it
         | asserts things that are simply wrong.
         | 
         | On a recent integration project with a good bit of nuanced
         | functionality, it led me astray multiple times. I've gotten to
         | a point where I can feel when its answers are not quite right,
         | particularly if I know just a little about the topic. And, when
         | challenged, it does that strange thing of responding with
          | something along the lines of, "My apologies, you're completely
         | right that I was completely wrong".
         | 
          | Over time, you get the sense that there is no there there.
          | Even its writing capabilities, lauded by so many, are of a
         | style that is superficial and perfunctory or rote. That makes
         | sense when you know what it is, but that's the thing: we get
         | articles like these, lauding its wisdom.
        
           | bongodongobob wrote:
           | Idk. One of my first tests for GPT4 was writing a website
           | "for snakes." It was a flask app, and it did all the obvious
           | things you'd expect. There was a title that said "Snake.com -
           | A website for snakes" and a bunch of silly marketing stuff.
           | 
            | What impressed me is when I asked it to make the site more
            | snake-like (what does that even mean, right?).
           | 
           | It changed the colors to shades of green, used italic fonts,
           | added some hisssssing sssstuff to wordssss, and added a
           | diamond pattern through the background.
           | 
           | It was a dumb and not very fancy site, but I'm not sure you
           | can say it doesn't understand anything at all when you ask it
            | to make a website more snakelike and it actually makes a
            | pretty good attempt at doing it.
        
             | unclebucknasty wrote:
             | Yeah, that's kind of a different conception of
             | understanding though. The lines do get a little blurry at a
             | certain point, and a lot of what it does "feels" like
             | understanding, especially given how it "communicates".
             | 
             | But I think it comes down to whether it can reason about
             | things and whether it can draw new conclusions or create
             | new information as a result.
             | 
             | Your snake site is probably a good example. ChatGPT has a
             | bunch of words that it knows are associated with snakes.
             | It's pretty straightforward pattern matching. It doesn't
             | really "know" what those words mean, except that they have
             | relationships to other words.
             | 
             | But, if you were to ask it to reason and draw new
             | conclusions about these things beyond its training corpus,
             | it would be unable to reliably do so.
             | 
             | Similarly, it had no idea about the quality (and sometimes
             | legality) of the chess moves it generated.
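              | 
              | Legality, at least, is cheap to check outside the model. A
              | rough sketch using the python-chess library (the proposed
              | move string is just a stand-in for whatever the model
              | returns):
              | 
              |     import chess
              | 
              |     board = chess.Board()   # standard starting position
              |     proposed = "Nf3"        # stand-in for a move string from the model
              | 
              |     try:
              |         move = board.parse_san(proposed)  # raises if invalid or illegal here
              |         board.push(move)
              |         print(f"{proposed} is legal")
              |     except ValueError:
              |         print(f"{proposed} is not a legal move in this position")
              | 
              | Quality is another story, but at least the illegal moves
              | never reach the board.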
        
       | kva wrote:
       | Given the right prompt, I'm sure it is....but when do users ever
       | enter the right prompt? :(
        
         | viscanti wrote:
         | OpenAI should make something so that people can enter their
         | prompt and maybe even drop in a knowledge base and then share
         | with anyone else who wants that functionality.
        
           | snoman wrote:
            | That's pretty close to what GPTs are, with the exception of
           | knowledge bases.
           | 
           | There's more to it, but the tooling to create a GPT is
           | basically a hand-holding mechanism to create a prompt.
        
             | gregorymichael wrote:
             | GPTs have the knowledge base too. (Mixed results though)
        
           | wolpoli wrote:
            | Would the final product be similar to GitHub Copilot, but for
            | prompts?
        
         | williamcotton wrote:
         | Didn't you get the memo? If you're holding the hammer by the
          | head and wondering why it isn't driving the nail in, that is
          | clearly the fault of the manufacturer.
         | 
         | There's even a handy aphorism to remind you that the user is
         | never to blame: "You're holding it wrong."
         | 
         | Jokes aside, I wonder what the general writing abilities and
         | communication skills are for people that cannot for the life of
         | them get usable results from an LLM.
        
         | richardw wrote:
         | You can't depend on it at all. I mean, you can use it for a
         | tremendous amount of work, but until there is a way to
          | constrain the bullshit, LLMs can't be used for anything that
          | requires a correct answer.
         | 
         | The terms "depend" and "require" there are the hard versions.
          | You can't send people to the moon on the outputs of LLMs.
        
         | SV_BubbleTime wrote:
         | "42"
        
       | greenavocado wrote:
       | Even the latest commercial LLMs are happy to confidently bullshit
       | about what they think is in published research even if they
       | provide citations. Often the citations themselves are slightly
        | corrupted. I actually verify each LLM claim, so I know this is
        | happening a lot. Occasionally they are complete fabrications. It
        | really varies by research topic. It's really bad in esoteric
        | research areas. They even acknowledge the paper was actually
       | about something else if you call them out on it. What a disaster.
       | LLMs are still useful for information retrieval and exploration
       | as long as you understand you are having a conversation with a
       | habitual liar / expert beginner and adjust your prompts and
       | expectations accordingly.
        
         | bongodongobob wrote:
         | Unintuitively, I think you'll probably end up with better
         | answers if you don't ask for citations. The vast majority of
          | its training isn't white papers, so you're artificially
          | constraining its "imagination" to the cited-sources space. I
         | find the more constraints you add, the worse your answers are.
        
       | andy99 wrote:
       | May 2023 using GPT-4-0314.
        
       | mritchie712 wrote:
        | reminds me of this tweet [0]:
        | 
        |     Them: Can you just quickly pull this data for me?
        | 
        |     Me: Sure, let me just:
        | 
        |     SELECT * FROM some_ideal_clean_and_pristine.table_that_you_think_exists
       | 
        | GPT-4 is good on a single CSV, but breaks down quickly when
        | applied to a real database / data warehouse. I know they're using
        | multiple tables in the paper, but it appears to be a pristine
        | schema that's very easy to reason about. In the real world, when
        | you're trying to join postgres to hubspot and stripe data, an LLM
        | isn't able to write the SQL from scratch and get the right answer.
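        | 
        | As a rough illustration of why (every table, column and value
        | below is hypothetical, not from the paper or any real schema):
        | even a trivial three-source join only works after cleanup steps
        | the schema alone doesn't tell you about.
        | 
        |     import pandas as pd
        | 
        |     # Hypothetical exports: Postgres users, HubSpot contacts,
        |     # Stripe customers
        |     users = pd.DataFrame(
        |         {"user_id": [1, 2], "email": ["a@x.com", "b@x.com"]})
        |     hubspot = pd.DataFrame(
        |         {"contact_email": ["A@X.COM", "b@x.com"],
        |          "deal_stage": ["won", "open"]})
        |     stripe = pd.DataFrame(
        |         {"customer_email": ["a@x.com", "b@x.com"], "mrr": [99, 0]})
        | 
        |     # The join keys only line up after normalisation -- tribal
        |     # knowledge the prompt rarely contains, and the step a model
        |     # writing SQL blind tends to skip.
        |     hubspot["email"] = hubspot["contact_email"].str.lower()
        |     stripe["email"] = stripe["customer_email"].str.lower()
        | 
        |     report = users.merge(hubspot, on="email").merge(stripe, on="email")
        |     print(report[["email", "deal_stage", "mrr"]])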
       | 
       | We're working on an approach using a semantic layer at
       | https://www.definite.app/ if you're interested in this sort of
       | thing.
       | 
       | 0 -
       | https://twitter.com/sethrosen/status/1252291581320757249?lan...
        
       | dangoodmanUT wrote:
       | Not on useful datasets in real places
        
       | cstanley wrote:
        | This paper was published 154 days ago, probably a year after the
        | authors did the experiment. Sooo much has happened since then!
        | Even then, it showed that GPT-4 is a pretty darn good analyst.
       | 
       | All this real-world complexity can be tamed by stuffing the
       | prompt with a ton of relevant context and an amazing prompt
       | engine. We'll have bots that autonomously query the database
        | hundreds of times, building a 5-page "deep-dive" analytics report
       | in minutes.
       | 
       | At least that's what we're trying at patterns.app.
        
       ___________________________________________________________________
       (page generated 2024-03-25 23:00 UTC)