[HN Gopher] Is GPT-4 a Good Data Analyst? (2023)
___________________________________________________________________
Is GPT-4 a Good Data Analyst? (2023)
Author : CharlesW
Score : 43 points
Date : 2024-03-25 20:00 UTC (2 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| einpoklum wrote:
| I was somewhat put off by the abstract:
|
| > LLMs... have demonstrated their powerful capabilities in ...
| context understanding, code generation, language generation, data
| storytelling, etc.,
|
| LLMs have not demonstrated understanding (in fact, one could
| argue that they are fundamentally incapable of understanding);
| they have only AFAICT demonstrated the ability to generate
| boilerplate-ish code; "language generation" is too general a task
| to claim that LLMs have succeeded in; and as for data
| storytelling I don't know, but they can spin yarns. The problem
| is that those yarns are often divorced from reality; see:
|
| https://www.ibm.com/topics/ai-hallucinations
|
| --------
|
| Leafing through the paper, and specifically tables 6 and 7, I
| don't believe their conclusion, that "GPT-4 can perform
| comparable [sic] to a data analyst", is well-founded.
| mewpmewp2 wrote:
| I don't even understand what understanding exactly means;
| perhaps anyone who understands it can enlighten me?
|
| Do I, myself, understand? Stand under what exactly? What is that
| supposed to mean?
| ocbyc wrote:
| Transformers are just pattern matching. So if you write "give
| me a list of dog names" it knows that "Spot" should be in
| that result set. Even though it doesn't really know what a
| dog is, a list is, or what a spot is.
| bongodongobob wrote:
| I don't think that's true. They clearly group related
| things together and seem to be able to create concepts that
| aren't specifically in the training data. For example, it
| will figure out the different features of a face, eyes,
| nose, mouth even if you don't explicitly tell it what those
| are. Which is why they are so cool.
| mewpmewp2 wrote:
| They are cool, but then you are also cool.
| rafaelero wrote:
| > Transformers are just pattern matching.
|
| That's trivially true. The question is: are we any
| different?
| ALittleLight wrote:
| Can you describe a test that would separate trivial pattern
| matching from true understanding?
| lottin wrote:
| A simple conversation would do.
| mewpmewp2 wrote:
| Could you share a conversation link with GPT-4 with
| either about a "list" or a "dog", to determine whether it
| truly understands one of those things compared to a
| human?
| bongodongobob wrote:
| Just did that. It seems to understand. Checkmate
| /fingerguns
| mewpmewp2 wrote:
| How would I test whether I "know" or "understand" what a
| dog is?
| inopinatus wrote:
| Even this seems too grand a claim. I'd water it down thus:
| the LLM encodes that the token(s) for "Spot" are
| probabilistically plausible in the ensuing output.
| advael wrote:
| Solipsism is truly the best fully-general counterargument.
| unclebucknasty wrote:
| Agreed, right down to their conclusion resonating as way
| overstated. Actually, meaningless would be more accurate.
|
| The thing about LLMs is exactly that they _don't_ understand
| by design. It often feels very distinctly like it's just
| engaging in sophisticated wordplay. A parlor trick.
|
| When ChatGPT 4 first came out I spent a couple of hours putting
| together a chess game using ChatGPT as the engine. It was
| shockingly bad, as in even attempting to make invalid moves.
|
| I get it: it's not tuned for that purpose, and its chess
| training corpus could probably be expanded to improve it as
| well.
|
| But, it actually served as a near-perfect demonstration of its
| lack of understanding, as well as the confidence with which it
| asserts things that are simply wrong.
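|
| (For what it's worth, gating its moves isn't hard; here's a minimal
| sketch, assuming the python-chess library and treating the string
| "e4" as a stand-in for whatever the model proposed, of the kind of
| legality check you'd have to bolt on:)
|
|     import chess
|
|     def play_llm_move(board: chess.Board, san_move: str) -> bool:
|         # Apply the model's move only if it is legal in this position.
|         try:
|             move = board.parse_san(san_move)  # raises on illegal SAN
|         except ValueError:
|             return False
|         board.push(move)
|         return True
|
|     board = chess.Board()
|     if not play_llm_move(board, "e4"):  # "e4" = hypothetical model output
|         print("Illegal move proposed; re-prompt the model")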
|
| On a recent integration project with a good bit of nuanced
| functionality, it led me astray multiple times. I've gotten to
| a point where I can feel when its answers are not quite right,
| particularly if I know just a little about the topic. And, when
| challenged, it does that strange thing of responding with
| something along the lines of, "My apologies, you're completely
| right that I was completely wrong".
|
| Over time, you get a sense that there is no there there. Even
| its writing capabilities, lauded by so many, are of a style
| that is superficial, perfunctory, or rote. That makes
| sense when you know what it is, but that's the thing: we get
| articles like these, lauding its wisdom.
| bongodongobob wrote:
| Idk. One of my first tests for GPT-4 was writing a website
| "for snakes." It was a flask app, and it did all the obvious
| things you'd expect. There was a title that said "Snake.com -
| A website for snakes" and a bunch of silly marketing stuff.
|
| What impressed me is when I asked to make it more snake-like
| (what does that even mean right?).
|
| It changed the colors to shades of green, used italic fonts,
| added some hisssssing sssstuff to wordssss, and added a
| diamond pattern through the background.
|
| It was a dumb and not very fancy site, but I'm not sure you
| can say it doesn't understand anything at all when you ask it
| to make a website more snakelike and it makes a pretty
| good attempt at doing it.
| unclebucknasty wrote:
| Yeah, that's kind of a different conception of
| understanding though. The lines do get a little blurry at a
| certain point, and a lot of what it does "feels" like
| understanding, especially given how it "communicates".
|
| But I think it comes down to whether it can reason about
| things and whether it can draw new conclusions or create
| new information as a result.
|
| Your snake site is probably a good example. ChatGPT has a
| bunch of words that it knows are associated with snakes.
| It's pretty straightforward pattern matching. It doesn't
| really "know" what those words mean, except that they have
| relationships to other words.
|
| But, if you were to ask it to reason and draw new
| conclusions about these things beyond its training corpus,
| it would be unable to reliably do so.
|
| Similarly, it had no idea about the quality (and sometimes
| legality) of the chess moves it generated.
| kva wrote:
| Given the right prompt, I'm sure it is... but when do users ever
| enter the right prompt? :(
| viscanti wrote:
| OpenAI should make something so that people can enter their
| prompt and maybe even drop in a knowledge base and then share
| with anyone else who wants that functionality.
| snoman wrote:
| That's pretty close to what GPTs are, with the exception of
| knowledge bases.
|
| There's more to it, but the tooling to create a GPT is
| basically a hand-holding mechanism to create a prompt.
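|
| (A toy sketch of what I mean, assuming the openai Python client;
| the "GPT" is effectively just a saved system prompt wrapped around
| the same chat endpoint:)
|
|     from openai import OpenAI
|
|     client = OpenAI()  # reads OPENAI_API_KEY from the environment
|
|     # The "custom GPT" boils down to a reusable system prompt.
|     SYSTEM_PROMPT = ("You are a careful data analyst. Answer only "
|                      "from the figures the user pastes in.")
|
|     def ask(question: str) -> str:
|         resp = client.chat.completions.create(
|             model="gpt-4",
|             messages=[
|                 {"role": "system", "content": SYSTEM_PROMPT},
|                 {"role": "user", "content": question},
|             ],
|         )
|         return resp.choices[0].message.content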
| gregorymichael wrote:
| GPTs have the knowledge base too. (Mixed results though)
| wolpoli wrote:
| Would the final product be similar to GitHub Copilot, but for
| prompts?
| williamcotton wrote:
| Didn't you get the memo? If you're holding the hammer by the
| head and wondering why it isn't driving the nail in, that is
| clearly the fault of the manufacturer.
|
| There's even a handy aphorism to remind you that the user is
| never to blame: "You're holding it wrong."
|
| Jokes aside, I wonder what the general writing abilities and
| communication skills are of people who cannot, for the life of
| them, get usable results from an LLM.
| richardw wrote:
| You can't depend on it at all. I mean, you can use it for a
| tremendous amount of work, but until there is a way to
| constrain the bullshit, LLMs can't be used for anything that
| requires a correct answer.
|
| The terms "depend" and "require" there are the hard versions.
| You can't send people to the moon on the outputs of LLMs.
| SV_BubbleTime wrote:
| "42"
| greenavocado wrote:
| Even the latest commercial LLMs are happy to confidently bullshit
| about what they think is in published research even if they
| provide citations. Often the citations themselves are slightly
| corrupted. I actually verify each LLM claim so I know this is
| happening a lot. Occasionally they are complete fabrications. It
| really varies by research topic. It's really bad in esoteric
| research areas. They even acknowledge the paper was actually
| about something else if you call them out on it. What a disaster.
| LLMs are still useful for information retrieval and exploration
| as long as you understand you are having a conversation with a
| habitual liar / expert beginner and adjust your prompts and
| expectations accordingly.
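|
| (The verification doesn't have to be heroic. A rough sketch of one
| way to sanity-check a cited title, assuming the requests library
| and Crossref's public works endpoint; anything with no close match
| gets treated as suspect:)
|
|     import requests
|
|     def crossref_matches(title: str) -> list[dict]:
|         # Ask Crossref for the closest bibliographic matches.
|         resp = requests.get(
|             "https://api.crossref.org/works",
|             params={"query.bibliographic": title, "rows": 3},
|             timeout=30,
|         )
|         resp.raise_for_status()
|         items = resp.json()["message"]["items"]
|         return [{"title": (it.get("title") or [""])[0],
|                  "doi": it.get("DOI")} for it in items]
|
|     print(crossref_matches("Is GPT-4 a Good Data Analyst?"))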
| bongodongobob wrote:
| Unintuitively, I think you'll probably end up with better
| answers if you don't ask for citations. The vast majority of
| its training isn't white papers, so you're artificially
| constraining its "imagination" to the space of cited sources. I
| find the more constraints you add, the worse your answers are.
| andy99 wrote:
| May 2023 using GPT-4-0314.
| mritchie712 wrote:
| reminds me of this tweet [0]:
|
|       Them: Can you just quickly pull this data for me?
|
|       Me: Sure, let me just:
|
|       SELECT * FROM
|       some_ideal_clean_and_pristine.table_that_you_think_exists
|
| GPT-4 is good on a single CSV, but breaks down quickly when
| applied to a real database / data warehouse. I know they're using
| multiple tables in the paper, but it appears to be a pristine
| schema that's very easy to reason about. In the real world, when
| you're trying to join postgres to hubspot and stripe data, an LLM
| isn't able to write the SQL from scratch and get the right answer.
|
| We're working on an approach using a semantic layer at
| https://www.definite.app/ if you're interested in this sort of
| thing.
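|
| (A toy sketch of the general idea, not how our product actually
| works: hand the model a small, curated description of the tables,
| a stand-in for a semantic layer, instead of the raw warehouse
| schema. llm() below is a hypothetical "send prompt, get text back"
| helper:)
|
|     # Curated, human-written layer; the model never sees raw schemas.
|     SEMANTIC_LAYER = """
|     customers(customer_id, signup_date, plan)         -- postgres
|     deals(deal_id, customer_id, stage, amount)        -- hubspot sync
|     invoices(invoice_id, customer_id, paid_at, cents) -- stripe sync
|     Join on customer_id. Stripe amounts are in cents.
|     """
|
|     def question_to_sql(question: str) -> str:
|         # llm() is a hypothetical helper wrapping any chat model.
|         return llm(
|             "Write one ANSI SQL query using only these tables:\n"
|             + SEMANTIC_LAYER
|             + "\nQuestion: " + question
|         )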
|
| 0 -
| https://twitter.com/sethrosen/status/1252291581320757249?lan...
| dangoodmanUT wrote:
| Not on useful datasets in real places
| cstanley wrote:
| This paper was published 154 days ago, and it's probably been a
| year since the authors did the experiment. Sooo much has happened
| since then! It already showed that GPT-4 is a pretty darn good
| analyst.
|
| All this real-world complexity can be tamed by stuffing the
| prompt with a ton of relevant context and an amazing prompt
| engine. We'll have bots that autonomously query the database
| hundreds of times, building a 5-page "deep-dive" analytics report
| in minutes.
|
| At least that's what we're trying at patterns.app.
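|
| (Roughly the loop I mean, as a toy sketch; llm() and run_sql() are
| hypothetical helpers standing in for any chat model and any
| warehouse client, not anything we ship:)
|
|     def deep_dive(question: str, max_queries: int = 100) -> str:
|         findings = []
|         for _ in range(max_queries):
|             step = llm(f"Question: {question}\n"
|                        f"Findings so far: {findings}\n"
|                        "Reply with the next SQL query to run, or DONE.")
|             if step.strip() == "DONE":
|                 break
|             findings.append({"sql": step, "rows": run_sql(step)})
|         return llm(f"Write a deep-dive report answering: {question}\n"
|                    f"Use only these findings: {findings}")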
___________________________________________________________________
(page generated 2024-03-25 23:00 UTC)