[HN Gopher] Re-Evaluating GPT-4's Bar Exam Performance
___________________________________________________________________
Re-Evaluating GPT-4's Bar Exam Performance
Author : rogerkeays
Score : 89 points
Date   : 2024-06-01 07:02 UTC (1 day ago)
(HTM) web link (link.springer.com)
(TXT) w3m dump (link.springer.com)
| Bromeo wrote:
| Very interesting. The abstract claims that although GPT-4 was
| said to score in the 92nd percentile on the bar exam, correcting
| for a number of factors shows those results are inflated: it
| scores only in the 15th percentile on essays when compared
| against just the people who passed the bar.
|
| That still does put it into bar-passing territory, though, since
| it still scores better than about one sixth of the people that
| passed the exam.
| falcor84 wrote:
| If I understand correctly, they measured it at the 69th
| percentile for the full test across all test takers, so
| definitely still impressive.
| _fw wrote:
| So it knows more about the law than you do, but less than they
| do.
|
| Really glad to see research replicated like this. I'm not
| surprised that the 90th percentile doesn't hold up.
|
| It's still handy though.
| radford-neal wrote:
| A basic problem with evaluations like these is that the test is
| designed to discriminate between _humans_ who would make good
| lawyers and humans who would not make good lawyers. The test is
| not necessarily any good at telling whether a non-human would
| make a good lawyer, since it will not test anything that pretty
| much all humans know, but non-humans may not.
|
| For example, I doubt that it asks whether, for a person of
| average wealth and income, a $1000 fine is a more or less severe
| punishment than a month in jail.
| anon373839 wrote:
| Honestly, this is giving the bar exam (and GPT-4) too much
| credit. The bar tests memorization because it's challenging for
| humans and easy to score objectively. But memorization isn't
| that important in legal practice; analysis is. LLMs are
| superhuman at memorization but terrible at analysis.
| lazide wrote:
| Eh, also in legal practice there are key skills like
| selecting the best billable clients, covering your ass,
| building a reputation, choosing the right market segment,
| etc. which I'd also argue LLMs suck at.
| gadflyinyoureye wrote:
| I don't know. There was some talk this weekend about CEOs
| being replaced by AI. Given the overlap in skill, I'd say
| there is a distinct possibility an LLM could do that.
| https://www.msn.com/en-us/money/companies/ceos-could-
| easily-...
| lazide wrote:
| Bwahaha. This is like the 'everything can be a directed
| graph db', 'everything should be a micro service', etc.
| fads.
|
| No one who has been a CEO, or frankly even worked closely
| with one, would think this could be even remotely close
| to possible. Or desirable if it was.
|
| But that is probably 1% or less of the population eh?
| EGreg wrote:
| https://www.dqindia.com/company-makes-ai-robot-its-ceo-
| makes...
|
| Seems your claim's been disproven already
| lazide wrote:
| Bwaha. Funny the company named as doing so doesn't
| mention it on their actual management team
| [http://www.netdragon.com/about/management-team.shtml],
| listing an actual human CEO instead.
|
| But it makes for a fun soundbite eh? Especially when the
| article claims it was in the past, and totally was
| awesome. Sucker born every minute.
| paulcole wrote:
| > which I'd also argue LLMs suck at
|
| OK, I'll bite. What's your evidence for this argument?
| lazide wrote:
| Every bit of interaction I've ever had with an LLM. And
| all the research I've seen.
|
| They're plausible word sequence generators, not 'planning
| for the future' agents. Or market analyzers. Or character
| evaluators. Or anything else.
|
| And they tend to be _really_ 'gullible'.
|
| What evidence do you have they could do any of those
| things? (And not just generate plausible text at a
| prompt, but _actually_ do those things)
| paulcole wrote:
| > What evidence do you have they could do any of those
| things?
|
| Every bit of interaction I've ever had with an LLM.
| ethbr1 wrote:
| I've always drawn the link between skill in memorization and
| in analysis as:
|
| - Memorization requires you to retain the details of a large
| amount of material
|
| - The most time-efficient analysis uses instant-recall of
| relevant general themes to guide research
|
| - Ergo, if someone can memorize and recall a large number of
| details, they can probably also recall relevant general
| themes, and therefore quickly perform quality analysis
|
| (Side note: memorization also proves you actually read the
| material in the first place)
| Jensson wrote:
| Problem is the LLM memorized the countless examples you can
| find of old BAR questions using extreme amounts of compute
| at training time, they don't have that ability to digest a
| specific case due to both lack of data and it doesn't
| retrain for new questions.
|
| A human that can digest the general law can also digest a
| special case, but that isn't true for an LLM.
| anon373839 wrote:
| I'm not sure why you're being downvoted for this. I agree
| with you, fact recall is useful and necessary. If you have
| a larger and more tightly connected base of facts in your
| head, you can draw better connections.
|
| And even though legal practice tends to be fairly slow and
| deliberative, there are settings (such as trial advocacy)
| where there is a real advantage to being able to cite a
| case or statute from memory.
|
| All that said, I still maintain that it's a poor way to
| compare humans with machines, for the same reason it would
| be poor to compare GPT-4 to a novelist on their tokens per
| second written.
| KennyBlanken wrote:
| You clearly don't know anything about the bar. Half of your
| score is split between six essay questions and reviewing two
| cases in order to follow instructions from a hypothetical lead
| attorney.
| anon373839 wrote:
| I'm licensed in multiple states, including California.
|
| The essay questions also test memorization. They don't
| require any difficult analysis - just superficial issue-
| spotting and reciting the correct elements.
|
| If the bar exam were not a memorization test, it would be
| open book!
| justinpombrio wrote:
| For a person of average wealth and income, is a $1000 fine a
| more or less severe punishment than a month in jail? Be brief.
|
| "For a person of average wealth and income, a $1000 fine is
| generally less severe than a month in jail. A month in jail
| entails loss of freedom, potential loss of employment, and
| social stigma, while a $1000 fine, though financially
| burdensome, does not affect one's freedom or ability to work"
| --ChatGPT 4o
| Rinzler89 wrote:
| What does GPT consider "average wealth and income"?
| Statistics? Or biased weights formed from the anecdotes it
| scraped off the internet about how wealthy people say they
| feel?
|
| Would be cool to know how LLMs shape their opinions.
| LeoPanthera wrote:
| You can just ask it, you know.
|
| GPT-4o:
|
| _"Average wealth and income" can vary significantly by
| region and context. However, in the United States, as a
| rough benchmark, the median household income is around
| $70,000 per year. Wealth, which includes assets such as
| savings, property, and investments minus debts, is harder
| to pinpoint but median net worth for U.S. households is
| approximately $100,000. These figures provide a general
| idea of what might be considered "average" in terms of
| wealth and income. "_
| Rinzler89 wrote:
| _> You can just ask it, you know._
|
| But my question will not be part of the context of that
| conversation.
| LeoPanthera wrote:
| Mine was. I asked it the first question, first.
| lambdaxyzw wrote:
| I like that it immediately assumed the US, even though
| nothing in your question suggested it. I love that all
| LLMs have a strong US-centric bias.
|
| Btw I'm not personally a lawyer, but I've heard that GPT
| is especially prone to mixing laws across borders - for
| example, you ask a law question in language X and get a
| response that uses a law from country Y - and it's
| extremely convincing while doing so (unless you're a
| lawyer, I guess).
| LordDragonfang wrote:
| I mean, to be fair, if you're speaking English to it, the
| most likely possibility is that you're inside the US:
|
| https://en.wikipedia.org/wiki/List_of_countries_by_Englis
| h-s...
|
| I know there's a lot of complaints about things being US-
| centric, but the US is a _very_ large country.
| ordersofmag wrote:
| Well, except that the number of English speakers outside the
| US is much larger than inside the US (as per the Wikipedia
| page you point to), by 5 to 1. Granted, many folks speak it
| as their 2nd (or nth) language. But when you take into
| account the limited set of languages supported by ChatGPT,
| one could reasonably assume English-speaking (typing) users
| of ChatGPT are mostly from outside the U.S., as non-U.S.
| folks are the majority of 'folks for whom English would be
| their first option when interacting with ChatGPT'. Even if
| you only count India, Nigeria and Pakistan.
|
| Though of course OpenAI can tell (frequently, roughly)
| where folks are coming from geographically and could
| (does?) take that into account.
| LeoPanthera wrote:
| ChatGPT has user-customizable "instructions", and mine
| are set to tell it where I live. Any user can do the
| same, so that it will not make incorrect assumptions for
| you.
| d3m0t3p wrote:
| You might increase the probability of getting a correct
| answer for your region, but IMO you decrease your awareness
| of hallucination. Overall you can still get a wrong
| answer.
| fjdjshsh wrote:
| This is my experience with Hackernews. If the comment
| doesn't specify the country, it's an American talking
| about the USA
| johnchristopher wrote:
| "potential loss of employment,"
|
| Where is that coming from? That's a very lawyerly way to
| phrase things.
|
| "potential ?" where I live I think people may max out their
| holidays and overtime (if lucky enough) and leave-without-pay
| but there would be a conversation with your employer to
| justify it and how to handle the workload.
|
| In the USA, from what I read, it's more than likely that you
| would just be fired on the spot, right?
|
| edit: just googled a bit. Where I live you _must_ tell your
| employer why you will be absent if you go to jail, but that
| can't be used to justify breaking the contract unless the
| reason for the incarceration is damaging to the company
| and... yeah, I am definitely not a lawyer :]
| EGreg wrote:
| So GPT is taking a law exam and wording things in a very
| lawyerly way? I would say that's great!!
| oconnor663 wrote:
| My wild guess is that it would depend a lot on how much
| your employer likes you and how they feel about the reason
| you're in jail.
| ghaff wrote:
| Leave-without-pay normally requires some specific
| justification(s)/discussion. I've certainly given my
| manager advance notice about any longer stretches of
| vacation and I've tried to do it with awareness of
| workloads (though for something planned months in advance
| that's not always possible) but I've pretty much never
| considered it as asking for permission or it being a
| negotiation. This is in the US.
|
| ADDED: You're probably going to end up lying, or at least
| being very vague ("some family stuff to take care of"), in
| this specific scenario. But for one month that didn't
| trigger reporting to the employer, a lot of professionals
| could probably get away with it. In any case, the GPT answer
| seems totally correct for the parameters given.
| Dayshine wrote:
| Self-employed or small company might not care.
| akira2501 wrote:
| Many jails have work release. They get you up at 6am, check
| you out of the jail, let you go to work, then expect you to
| check back into jail by 6pm.
| johnchristopher wrote:
| Oh, that's really great! Is that in the US?
| ghaff wrote:
| Which for many professional (and other jobs) probably
| would require a bunch of tap-dancing around your strict
| schedule if you were hiding the actual reason.
| dogmayor wrote:
| The bigger issue here is that actual legal practice looks nothing
| like the bar, so whether or not an llm passes says nothing about
| how llms will impact the legal field.
|
| Passing the bar should not be understood to mean "can
| successfully perform legal tasks."
| ben_w wrote:
| Indeed, and this is also the general problem with most current
| ways to evaluate AI: by every _test_ there's at least one
| model which looks wildly superhuman, but actually using them
| reveals they're book-smart at everything without having any
| street-smarts.
|
| The difference between expectation and reality is tripping
| people up in both directions -- a nearly-free everything-intern
| is still very useful, but to treat LLMs* as experts (or capable
| of meaningful on-the-job learning if you're not fine-tuning the
| model) is a mistake.
|
| * special purpose AI like Stockfish, however, should be treated
| as experts
| KennyBlanken wrote:
| > Passing the bar should not be understood to mean "can
| successfully perform legal tasks."
|
| Nobody does except a bunch of HNers who, among other things,
| apparently have no idea that a considerable chunk of rulings
| and opinions in the US federal court system and upper state
| courts are drafted by law clerks who, ahem, have not taken the
| bar yet...
|
| The point of the bar and MPRE is like the point of most
| professional examinations: try to establish minimum standards.
| That said, the bar does test for "successfully perform legal
| tasks", actually.
|
| For the US bar, a chunk of your score is based on following
| instructions on a case from the lead attorney, and another
| chunk is based on essay answers - _literally demonstrating
| that you can perform legal tasks_ and that you have both the
| knowledge and critical thinking skills necessary.
|
| Further, as previously mentioned, in the US, people usually
| take it after a clerkship...where they've been receiving
| extensive training and experience in practical application of
| law.
|
| Further, law firms do not hire purely based on your bar score.
| They also look at your grades, what programs you participated
| in (many law schools run legal clinics to help give students
| some practical experience, under supervision), your
| recommendations, who you clerked for, etc. When you're hired,
| you're under supervision by more senior attorneys as you gain
| experience.
|
| There's also the MPRE, or ethics test - which involves
| answering how to handle theoretical scenarios you would find
| yourself in as a practicing attorney.
|
| Multiple people in this discussion are acting like it's a
| multiple choice test and if you pass, you're given a pat on the
| ass and the next day you roll into criminal court and become
| lead on a murder case...
| violet13 wrote:
| This, along with several other "meta" objections, is a
| significant portion of the discussion in the paper.
|
| They basically say two things. First, although the measurement
| is repeatable at face value, there are several factors that
| make it less impressive than assumed, and the model performs
| fairly poorly compared to likely prospective lawyers. Second,
| there are a number of reasons why the percentile on the test
| doesn't measure lawyering skills.
|
| One of the other interesting points they bring up is that there
| is no incentive for humans to seek scores much above passing on
| the test, because your career outlook doesn't depend on it in
| any way. This is different from many other placement exams.
| Digory wrote:
| They originally scored against a test usually taken by people who
| failed the bar.
|
| So, GPT-4 scores closer to the bottom of people who pass the bar
| the first time. In other words, it matches the people who cull
| the rules from texts already written, but who cannot apply
| them imaginatively.
| speedgoose wrote:
| > In other words, it matches the people who cull the rules from
| texts already written, but who cannot apply it imaginatively.
|
| Where did you find that in the article?
| jeffbee wrote:
| It appears that researchers and commentators are totally missing
| the application of LLMs to law, and to other areas of
| professional practice. A generic trained-on-Quora LLM is going to
| be straight garbage for any specialization, but one that is
| trained on the contents of the law library will be utterly
| brilliant for assisting a practicing attorney. People pay serious
| money for legal indexes, cross-references, and research. An LLM
| is nothing but a machine-discovered compressed index of text. As
| an augmentation to existing law research practices, the right LLM
| will be extremely valuable.
| violet13 wrote:
| It is a _lossy_ compressed index. It has an approximate
| knowledge of law, and that approximation can be pretty good -
| but it doesn 't know when it's outputting plausible but made-up
| claims. As with GitHub Copilot, it's probably going to be a
| mixed bag until we can overcome that, because spotting subtle
| but grave errors can be harder than writing something from
| scratch.
|
| There's already a fair number of stories of LLMs used by an
| attorney messing up court filings - e.g., inventing fake case
| law.
| jeffbee wrote:
| I am not suggesting that the generative aspects would be
| useful in drafting motions and such. I am suggesting that
| their tendency towards false results is harmless if you just
| use them as a complex index. For example, you could ask it to
| list appellate cases where one party argued such-and-such and
| prevailed. Then you would go _read the cases_.
| thehoneybadger wrote:
| It is difficult to comment without sounding obnoxious, but having
| taken the bar exam, I found the exam simple. Surprisingly simple.
| I think it was the single most overhyped experience of my life.
| I was fed all this insecurity and walked into the convention
| center expecting to participate in the biggest intellectual
| challenge in my life. Instead, it was endless multiple choice
| questions and a couple contrived scenarios for essays.
|
| It may also be surprising to some to understand that legal
| writing is prized for its degree of formalism. It aims to remove
| all connotation from a message so as to minimize
| misunderstanding, much like clean code.
|
| It may also be surprising, but the goal when writing a legal
| brief or judicial opinion is not to try to sound smart. The goal
| is to be clear, objective, and thereby, persuasive. Using big
| words for the sake of using big words, using rare words, using
| weasel words like "kind of" or "most of the time" or "many people
| are saying", writing poetically, being overly obtuse and
| abstract, these are things that get your law school application
| rejected, your brief ridiculed, and your bar exam failed.
|
| The simpler your communication, the more formulaic, the better.
| The more your argument is structured, akin to a computer program,
| the better.
|
| As compared to some other domain, such as fiction, good legal
| writing is much easier for an attention model to simulate. The best
| exam answers are the ones that are the most formulaic and that
| use the smallest lexicon and that use words correctly.
|
| I only add this comment to give non-lawyers some perspective
| on the bar exam. Getting an attention model to pass
| the bar exam is a low bar. It is not some great technical feat. A
| programmer can practically write a semantic disambiguation
| algorithm for legal writing from scratch with moderate effort.
|
| It will be a good accomplishment, but it will only be a stepping
| stone. I am still waiting for AI to tackle messages that have
| greater nuance and that are truly free form. LLMs are still not
| there yet.
| euroderf wrote:
| > It may also be surprising to some to understand that legal
| writing is prized for its degree of formalism. It aims to
| remove all connotation from a message so as to minimize
| misunderstanding, much like clean code.
|
| > The more your argument is structured, akin to a computer
| program, the better.
|
| You certainly make legal writing sound like a flavor of
| technical writing. Simplicity, clarity, structure. Is this an
| accurate comparison?
| ChainOfFools wrote:
| it is called a legal code after all
| kergonath wrote:
| "Code" in that sense predates pretty much any form of
| computer or technical writing. It came from the same word
| in old French in the 14th century, which itself came from
| the Latin _codex_. It basically meant "book". Now of course
| it is specific to books that contain laws.
| mistrial9 wrote:
| Recently I read a US law trade magazine article on a
| particular term used in US Federal employment law. The
| article was about 12 pages long. By the second page, they
| were using circular references, and switching between two
| phrases that used the same words but in different order,
| contexts, and therefore meanings, without clearly saying
| when they switched. By the third or fourth page I was done
| with that exercise. As a coder and reader of English
| literature, there was no question at all that the terms were
| being "churned" as a sleight of hand, directly in writing.
| One theory for why they did that, in an article that claimed
| to explain the terms, is that it sets up the confusion and
| misdirection as actually practiced in law involving
| unskilled laymen, and then "solves" the problems by the end
| of the article.
| A_D_E_P_T wrote:
| I took a sample CA bar exam for fun, as a non-lawyer who has
| never set foot in law school. Maybe the sample exam was tougher
| than the real thing, but I found it surprisingly difficult. A
| lot of the correct answers to questions were non-obvious --
| they weren't based on straightforward logic, nor were they
| based on moral reasoning, and there was no place for "natural
| law" -- so to answer questions properly you had to have
| memorized a bit of coursework. There were also a few questions
| that seemed almost designed to deceive the test-taker; the
| "obvious" moral choices were the wrong ones.
|
| So maybe it's easy if you study that stuff for a year or two.
| But you can't just walk in and expect to pass, or bullshit your
| way through it.
|
| I agree with you on legal writing, but there appears to be a
| certain amount of ambiguity inherent to language. The Uniform
| Commercial Code, for instance, is maddeningly vague at points.
| gnicholas wrote:
| The CA bar exam used to be much harder than other states'.
| They lowered the pass threshold several years ago, and then
| reduced the length from 3 days to 2. Now it's probably much
| more in line with the national norms. Depending on when you
| took the sample exam, it might be much easier now.
|
| Also, sometimes sample exams are made extra difficult, to
| convince students that they need to shell out thousands of
| dollars for prep courses. I recall getting 75% of questions
| wrong on some sections of a bar prep company's pre-test,
| which I later realized was designed to emphasize
| unintuitive/little-known exceptions to general rules. These
| corners of the law made up a disproportionate number of the
| questions on the pre-test and gave the impression that the
| student really needed to work on that subject.
| manquer wrote:
| Obviously you need subject knowledge; isn't that
| implicit?
|
| Keep in mind that even today[1] (in California and a few
| other states) you don't need to go to law school to take the
| bar exam and practice law; various forms of apprenticeship
| under a judge or lawyer are allowed.
|
| You also don't need to write the exam to practice many
| aspects of the legal profession.
|
| The exam was never meant to be a high bar of quality or
| selection; it was always just a simple validation that you
| know your basics. Law, like many other professions, has
| always operated on reputation and networks, not on degrees
| and certifications.
|
| [1] Unlike, say, being a doctor, where you have to go to med
| school without exception.
| A_D_E_P_T wrote:
| > _Obviously you need subject knowledge, that should be
| implicit?_
|
| Well, in a lot of the so-called soft sciences, you can
| easily beat a test without subject knowledge. I had figured
| that the bar exam might be something like that -- but it's
| more akin to something like biology, where there are a lot
| of arcane and counterintuitive little rules that have
| emerged over time. And you need to know _those_, or you're
| sunk. You can't guess your way past them, because the best-
| looking guesses tend to be the wrong ones.
|
| (For what it's worth, I realize that this mostly has to do
| with the Common Law's reverence of precedent-as-binding,
| and that continental Civil Law systems don't suffer as much
| from it. But I suppose those continental systems have other
| problems of their own.)
| carabiner wrote:
| Genuinely asking: you think the bar exam is a low bar because
| you personally found it easy, even though the vast majority of
| takers do not? Doesn't this just reflect your own inability to
| empathize with other people?
| elicksaur wrote:
| > Furthermore, unlike its documentation for the other exams it
| tested (OpenAI 2023b, p. 25), OpenAI's technical report provides
| no direct citation for how the UBE percentile was computed,
| creating further uncertainty over both the original source and
| validity of the 90th percentile claim.
|
| This is the part that bothered me (licensed attorney) from the
| start. If it scores this high, where are the receipts? I'm sure
| OpenAI has the social capital to coordinate with the National
| Conference of Bar Examiners to have a GPT "sit" for a simulated
| bar exam.
| fnordpiglet wrote:
| Scoring in the 96th percentile among humans taking the exam,
| without moving goal posts, would have been science fiction two
| years ago. Now it's suddenly not good enough, and the fact that
| a computer program can score decently among passing lawyers and
| first-time test takers is something to sneer at.
|
| The fact that I can talk to the computer and it responds to me
| idiomatically and understands my semantic intent well enough to
| be nearly indistinguishable from a human being is breathtaking.
| Anyone who views it as anything less in 2024 and asserts with a
| straight face they wouldn't have said the same thing in 2020 is
| lying.
|
| I do however find the paper really useful in contextualizing the
| scoring with a much finer grain. Personally I didn't take the
| 96th percentile score to be anything other than "among the mass who
| take the test," and have enough experience with professional
| licensing exams to know a huge percentage of test takers fail and
| are repeat test takers. Placing the goal posts quantitatively for
| the next levels of achievement is a useful exercise. But the
| profusion of jaded nerds makes me sad.
| Workaccount2 wrote:
| The nerds aren't jaded, they are worried. I'd be too if my job
| needed nothing more than a keyboard to be completed. There are
| a lot of people here who need to squeeze another 20-40 years
| out of a keyboard job.
| threeseed wrote:
| Similar comments were made about how microwaves would
| eliminate cooking.
|
| At the end of the day (a) LLMs aren't accurate enough for
| many use cases and (b) there is far more to knowledge
| worker's jobs than simply generating text.
| QuantumGood wrote:
| It scored below the 50th percentile when compared to people
| taking the test for the first time.
| d0mine wrote:
| On any topic that I understand well, LLM output is garbage: it
| requires more energy to fix it than to solve the original
| problem to begin with.
|
| Are we sure these exams are not present in the training data?
| (ability to recall information is not impressive for a
| computer)
|
| Still, I'm terrible at many, many tasks, e.g., drawing from a
| description, and the models significantly widen the types of
| problems that I can even try (where results can be verified
| easily and no precision is required).
| munchler wrote:
| > On any topic that I understand well, LLM output is garbage:
| it requires more energy to fix it than to solve the original
| problem to begin with.
|
| That's probably true, which is why most human knowledge
| workers aren't going away any time soon.
|
| That said, I have better luck with a different approach: I
| use LLMs to learn things that I _don't_ already understand
| well. This forces me to actively understand and validate the
| output, rather than consume it passively. With an LLM, I can
| easily ask questions, drill down, and try different ideas,
| like I'm working with a tutor. I find this to be much more
| effective than traditional learning techniques alone (e.g.
| textbooks, videos, blog posts, etc.).
| mistrial9 wrote:
| The models that you have tried... are garbage? Hmmm. Maybe
| the many, many inside professionals and uniformed services
| have different access than you do. Money talks?
| fnordpiglet wrote:
| It is remarkable that folks who tried a garbage LLM like
| copilot, 3.5, Gemini, or made meta LLMs say naughty words,
| seem to think these are still SOTA. Sometimes I stumble on
| them and I am shocked at the degradation in quality then
| realize my settings are wrong. People are vastly
| underestimating the rate of change here.
| mordymoop wrote:
| On what topics you understand well does GPT-4o or Claude Opus
| produce garbage?
| threeseed wrote:
| I do run into the issue where the longer the conversation
| goes, the more inaccurate the information gets.
|
| But a common situation is that with code generation it will
| fail to understand the context of where the code belongs
| and so it's a function that will compile but makes no
| sense.
| fnordpiglet wrote:
| Yeah. I often springboard into a new context by having
| the LLM compose the next prompt based on the discussion
| and restart the context. Remarkably effective if you ask
| it to incorporate "prompt engineering" terms from
| research.
| taberiand wrote:
| It depends on the topic (and the LLM - ChatGPT-4 equivalent
| at least, any model equivalent to 3.5 or earlier is just a
| toy in comparison) - but I've had plenty of success using it
| as a productivity enhancing tool for programming and AWS
| infrastructure, both to generate very useful code and as an
| alternative to Google for finding answers or at least a
| direction to answers. But I only use it where I'm confident I
| can vet the answers it provides.
| iLoveOncall wrote:
| > The fact I can talk to the computer and it responds to me
| idiomatically and understands my semantic intent well enough to
| be nearly indistinguishable from a human being is breath taking
|
| That's called a programming language. It's nothing new.
| fooker wrote:
| It's a programming language except the programming part, and
| the language part.
| gnicholas wrote:
| This analysis touches on the difference between first-time takers
| and repeat takers. I recall when I took the bar in 2007, there
| was a guy blogging about the experience. He went to a so-so
| school and failed the bar. My friends and I, who had been
| following his blog, checked in occasionally to see if he ever
| passed. After something like a dozen attempts, he did. Every one
| of us who passed was counted in the pass statistics once. He was
| counted a dozen times. This dramatically skews the statistics,
| and if you want to look at who becomes a lawyer (especially one
| at a big firm or company), you really need to limit yourself to
| those who pass on the first (or maybe second) try.
___________________________________________________________________
(page generated 2024-06-02 23:00 UTC)