[HN Gopher] Re-Evaluating GPT-4's Bar Exam Performance
___________________________________________________________________
Re-Evaluating GPT-4's Bar Exam Performance
Author : rogerkeays
Score : 89 points
Date   : 2024-06-01 07:02 UTC (1 day ago)
(HTM) web link (link.springer.com)
(TXT) w3m dump (link.springer.com)
| Bromeo wrote:
| Very interesting. The abstract claims that although GPT-4 was
| said to score in the 92nd percentile on the bar exam, correcting
| for a number of factors shows those results are inflated: it
| scores only in the 15th percentile on essays when compared
| against just the people who passed the bar.
|
| That still does put it into bar-passing territory, though, since
| it still scores better than about one sixth of the people that
| passed the exam.
| falcor84 wrote:
| If I understand correctly, they measured it at the 69th
| percentile for the full test across all test takers, so
| definitely still impressive.
| _fw wrote:
| So it knows more about the law than you do, but less than they
| do.
|
| Really glad to see research replicated like this. I'm not
| surprised that the 90th percentile doesn't hold up.
|
| It's still handy though.
| radford-neal wrote:
| A basic problem with evaluations like these is that the test is
| designed to discriminate between _humans_ who would make good
| lawyers and humans who would not make good lawyers. The test is
| not necessarily any good at telling whether a non-human would
| make a good lawyer, since it will not test anything that pretty
| much all humans know, but non-humans may not.
|
| For example, I doubt that it asks whether, for a person of
| average wealth and income, a $1000 fine is a more or less severe
| punishment than a month in jail.
| anon373839 wrote:
| Honestly, this is giving the bar exam (and GPT-4) too much
| credit. The bar tests memorization because it's challenging for
| humans and easy to score objectively. But memorization isn't
| that important in legal practice; analysis is. LLMs are
| superhuman at memorization but terrible at analysis.
| lazide wrote:
| Eh, also in legal practice there are key skills like
| selecting the best billable clients, covering your ass,
| building a reputation, choosing the right market segment,
| etc. which I'd also argue LLMs suck at.
| gadflyinyoureye wrote:
| I don't know. There was some talk this weekend about CEOs
| being replaced by AI. Given the overlap in skill, I'd say
| there is a distinct possibility an LLM could do that.
| https://www.msn.com/en-us/money/companies/ceos-could-
| easily-...
| lazide wrote:
| Bwahaha. This is like the 'everything can be a directed
| graph db', 'everything should be a micro service', etc.
| fads.
|
| No one who has been a CEO, or frankly even worked closely
| with one, would think this could be even remotely close
| to possible. Or desirable if it was.
|
| But that is probably 1% or less of the population eh?
| EGreg wrote:
| https://www.dqindia.com/company-makes-ai-robot-its-ceo-
| makes...
|
| Seems your claim's been disproven already
| lazide wrote:
| Bwaha. Funny the company named as doing so doesn't
| mention it on their actual management team
| [http://www.netdragon.com/about/management-team.shtml],
| listing an actual human CEO instead.
|
| But it makes for a fun soundbite eh? Especially when the
| article claims it was in the past, and totally was
| awesome. Sucker born every minute.
| paulcole wrote:
| > which I'd also argue LLMs suck at
|
| OK, I'll bite. What's your evidence for this argument?
| lazide wrote:
| Every bit of interaction I've ever had with an LLM. And
| all the research I've seen.
|
| They're plausible word sequence generators, not 'planning
| for the future' agents. Or market analyzers. Or character
| evaluators. Or anything else.
|
| And they tend to be _really_ 'gullible'.
|
| What evidence do you have they could do any of those
| things? (And not just generate plausible text at a
| prompt, but _actually_ do those things)
| paulcole wrote:
| > What evidence do you have they could do any of those
| things?
|
| Every bit of interaction I've ever had with an LLM.
| ethbr1 wrote:
| I've always drawn the link between skill in memorization and
| in analysis as:
|
| - Memorization requires you to retain the details of a large
| amount of material
|
| - The most time-efficient analysis uses instant-recall of
| relevant general themes to guide research
|
| - Ergo, if someone can memorize and recall a large number of
| details, they can probably also recall relevant general
| themes, and therefore quickly perform quality analysis
|
| (Side note: memorization also proves you actually read the
| material in the first place)
| Jensson wrote:
| Problem is the LLM memorized the countless examples you can
| find of old BAR questions using extreme amounts of compute
| at training time, they don't have that ability to digest a
| specific case due to both lack of data and it doesn't
| retrain for new questions.
|
| A human that can digest the general law can also digest a
| special case, but that isn't true for an LLM.
| anon373839 wrote:
| I'm not sure why you're being downvoted for this. I agree
| with you, fact recall is useful and necessary. If you have
| a larger and more tightly connected base of facts in your
| head, you can draw better connections.
|
| And even though legal practice tends to be fairly slow and
| deliberative, there are settings (such as trial advocacy)
| where there is a real advantage to being able to cite a
| case or statute from memory.
|
| All that said, I still maintain that it's a poor way to
| compare humans with machines, for the same reason it would
| be poor to compare GPT-4 to a novelist on their tokens per
| second written.
| KennyBlanken wrote:
| You clearly don't know anything about the bar. Half of your
| score is split between six essay questions and reviewing two
| cases in order to follow instructions from a hypothetical lead
| attorney.
| anon373839 wrote:
| I'm licensed in multiple states, including California.
|
| The essay questions also test memorization. They don't
| require any difficult analysis - just superficial issue-
| spotting and reciting the correct elements.
|
| If the bar exam were not a memorization test, it would be
| open book!
| justinpombrio wrote:
| For a person of average wealth and income, is a $1000 fine a
| more or less severe punishment than a month in jail? Be brief.
|
| "For a person of average wealth and income, a $1000 fine is
| generally less severe than a month in jail. A month in jail
| entails loss of freedom, potential loss of employment, and
| social stigma, while a $1000 fine, though financially
| burdensome, does not affect one's freedom or ability to work"
| --ChatGPT 4o
| Rinzler89 wrote:
| What does GPT consider "average wealth and income"?
| Statistics? Or biased weights formed from the anecdotes it
| scraped off the internet about how wealthy people say they
| feel?
|
| Would be cool to know how LLMs shape their opinions.
| LeoPanthera wrote:
| You can just ask it, you know.
|
| GPT-4o:
|
| _"Average wealth and income" can vary significantly by
| region and context. However, in the United States, as a
| rough benchmark, the median household income is around
| $70,000 per year. Wealth, which includes assets such as
| savings, property, and investments minus debts, is harder
| to pinpoint but median net worth for U.S. households is
| approximately $100,000. These figures provide a general
| idea of what might be considered "average" in terms of
| wealth and income. "_
| Rinzler89 wrote:
| _> You can just ask it, you know._
|
| But my question will not be part of the context of that
| conversation.
| LeoPanthera wrote:
| Mine was. I asked it the first question, first.
| lambdaxyzw wrote:
| I like that it immediately assumed the US, even though
| nothing in your question suggested it. I love that all
| LLMs have a strong US-centric bias.
|
| Btw I'm not personally a lawyer, but I've heard that GPT
| is especially prone to mixing laws across borders - for
| example, you ask a law question in language X and get a
| response that uses a law from country Y - and it's
| extremely convincing while doing so (unless you're a
| lawyer, I guess).
| LordDragonfang wrote:
| I mean, to be fair, if you're speaking English to it, the
| most likely possibility is that you're inside the US:
|
| https://en.wikipedia.org/wiki/List_of_countries_by_Englis
| h-s...
|
| I know there's a lot of complaints about things being US-
| centric, but the US is a _very_ large country.
| ordersofmag wrote:
| Well, except that the number of English speakers outside the
| US is much larger than inside the US (as per the Wikipedia
| page you point to), by 5 to 1. Granted, many folks speak it
| as their 2nd (or nth) language. But when you take into
| account the limited set of languages supported by ChatGPT,
| one could reasonably assume English-speaking (typing) users
| of ChatGPT are mostly from outside the U.S., as non-U.S.
| folks are the majority of 'folks for whom English would be
| their first option when interacting with ChatGPT'. Even if
| you only count India, Nigeria and Pakistan.
|
| Though of course OpenAI can tell (frequently, roughly)
| where folks are coming from geographically and could
| (does?) take that into account.
| LeoPanthera wrote:
| ChatGPT has user-customizable "instructions", and mine
| are set to tell it where I live. Any user can do the
| same, so that it will not make incorrect assumptions for
| you.
| d3m0t3p wrote:
| You might increase the probability of getting a correct
| answer for your region, but IMO you decrease your awareness
| of hallucination. Overall you can still get a wrong
| answer.
| fjdjshsh wrote:
| This is my experience with Hackernews. If the comment
| doesn't specify the country, it's an American talking
| about the USA
| johnchristopher wrote:
| "potential loss of employment,"
|
| Where is that coming from? That's a very lawyerly way to
| phrase things.
|
| "potential ?" where I live I think people may max out their
| holidays and overtime (if lucky enough) and leave-without-pay
| but there would be a conversation with your employer to
| justify it and how to handle the workload.
|
| In the USA, from what I read, it's more than likely that you
| would just be fired on the spot, right?
|
| edit: just googled a bit. Where I live you _must_ tell your
| employer why you will be absent if you go to jail, but that
| can't be used to justify breaking the contract unless the
| reason for the incarceration is damaging to the company
| and... yeah, I am definitely not a lawyer :]
| EGreg wrote:
| So GPT is taking a law exam and wording things in a very
| lawyerly way? I would say that's great!!
| oconnor663 wrote:
| My wild guess is that it would depend a lot on how much
| your employer likes you and how they feel about the reason
| you're in jail.
| ghaff wrote:
| Leave-without-pay normally requires some specific
| justification(s)/discussion. I've certainly given my
| manager advance notice about any longer stretches of
| vacation and I've tried to do it with awareness of
| workloads (though for something planned months in advance
| that's not always possible) but I've pretty much never
| considered it as asking for permission or it being a
| negotiation. This is in the US.
|
| ADDED: You're probably going to end up lying, or at least
| being very vague ("some family stuff to take care of"), in
| this specific scenario. But for one month that didn't
| trigger reporting to the employer, a lot of professionals
| could probably get away with it. In any case, the GPT answer
| seems totally correct for the parameters given.
| Dayshine wrote:
| Self-employed or small company might not care.
| akira2501 wrote:
| Many jails have work release. They get you up at 6am, check
| you out of the jail, let you go to work, then expect you to
| check back into jail by 6pm.
| johnchristopher wrote:
| Oh, that's really great! Is that in the US?
| ghaff wrote:
| Which for many professional (and other jobs) probably
| would require a bunch of tap-dancing around your strict
| schedule if you were hiding the actual reason.
| dogmayor wrote:
| The bigger issue here is that actual legal practice looks nothing
| like the bar, so whether or not an llm passes says nothing about
| how llms will impact the legal field.
|
| Passing the bar should not be understood to mean "can
| successfully perform legal tasks."
| ben_w wrote:
| Indeed, and this is also the general problem with most current
| ways to evaluate AI: by every _test_ there's at least one
| model which looks wildly superhuman, but actually using them
| reveals they're book-smart at everything without having any
| street-smarts.
|
| The difference between expectation and reality is tripping
| people up in both directions -- a nearly-free everything-intern
| is still very useful, but to treat LLMs* as experts (or capable
| of meaningful on-the-job learning if you're not fine-tuning the
| model) is a mistake.
|
| * special purpose AI like Stockfish, however, should be treated
| as experts
| KennyBlanken wrote:
| > Passing the bar should not be understood to mean "can
| successfully perform legal tasks."
|
| Nobody does except a bunch of HNers who, among other things,
| apparently have no idea that a considerable chunk of rulings
| and opinions in the US federal court system and upper state
| courts are drafted by law clerks who, ahem, have not taken the
| bar yet...
|
| The point of the bar and MPRE is like the point of most
| professional examinations: try to establish minimum standards.
| That said, the bar does test for "successfully perform legal
| tasks", actually.
|
| For the US bar, a chunk of your score is based on following
| instructions on a case from the lead attorney, and another
| chunk is based on essay answers - _literally demonstrating
| that you can perform legal tasks_ and that you have both the
| knowledge and critical thinking skills necessary.
|
| Further, as previously mentioned, in the US, people usually
| take it after a clerkship...where they've been receiving
| extensive training and experience in practical application of
| law.
|
| Further, law firms do not hire purely based on your bar score.
| They also look at your grades, what programs you participated
| in (many law schools run legal clinics to help give students
| some practical experience, under supervision), your
| recommendations, who you clerked for, etc. When you're hired,
| you're under supervision by more senior attorneys as you gain
| experience.
|
| There's also the MPRE, or ethics test - which involves
| answering how to handle theoretical scenarios you would find
| yourself in as a practicing attorney.
|
| Multiple people in this discussion are acting like it's a
| multiple choice test and if you pass, you're given a pat on the
| ass and the next day you roll into criminal court and become
| lead on a murder case...
| violet13 wrote:
| This, along with several other "meta" objections, is a
| significant portion of the discussion in the paper.
|
| They basically say two things. First, although the measurement
| is repeatable at face value, there are several factors that
| make it less impressive than assumed, and the model performs
| fairly poorly compared to likely prospective lawyers. Second,
| there are a number of reasons why the percentile on the test
| doesn't measure lawyering skills.
|
| One of the other interesting points they bring up is that there
| is no incentive for humans to seek scores much above passing on
| the test, because your career outlook doesn't depend on it in
| any way. This is different from many other placement exams.
| Digory wrote:
| They originally scored against a test usually taken by people who
| failed the bar.
|
| So, GPT-4 scores closer to the bottom of people who pass the bar
| the first time. In other words, it matches the people who cull
| the rules from texts already written, but who cannot apply
| them imaginatively.
| speedgoose wrote:
| > In other words, it matches the people who cull the rules from
| texts already written, but who cannot apply it imaginatively.
|
| Where did you find that in the article?
| jeffbee wrote:
| It appears that researchers and commentators are totally missing
| the application of LLMs to law, and to other areas of
| professional practice. A generic trained-on-Quora LLM is going to
| be straight garbage for any specialization, but one that is
| trained on the contents of the law library will be utterly
| brilliant for assisting a practicing attorney. People pay serious
| money for legal indexes, cross-references, and research. An LLM
| is nothing but a machine-discovered compressed index of text. As
| an augmentation to existing law research practices, the right LLM
| will be extremely valuable.
| violet13 wrote:
| It is a _lossy_ compressed index. It has an approximate
| knowledge of law, and that approximation can be pretty good -
| but it doesn 't know when it's outputting plausible but made-up
| claims. As with GitHub Copilot, it's probably going to be a
| mixed bag until we can overcome that, because spotting subtle
| but grave errors can be harder than writing something from
| scratch.
|
| There's already a fair number of stories of LLMs used by an
| attorney messing up court filings - e.g., inventing fake case
| law.
| jeffbee wrote:
| I am not suggesting that the generative aspects would be
| useful in drafting motions and such. I am suggesting that
| their tendency towards false results is harmless if you just
| use them as a complex index. For example, you could ask it to
| list appellate cases where one party argued such-and-such and
| prevailed. Then you would go _read the cases_.
| thehoneybadger wrote:
| It is difficult to comment without sounding obnoxious, but having
| taken the bar exam, I found the exam simple. Surprisingly simple.
| I think it was the single most overhyped experience of my life.
| I was fed all this insecurity and walked into the convention
| center expecting to participate in the biggest intellectual
| challenge in my life. Instead, it was endless multiple choice
| questions and a couple contrived scenarios for essays.
|
| It may also be surprising to some to understand that legal
| writing is prized for its degree of formalism. It aims to remove
| all connotation from a message so as to minimize
| misunderstanding, much like clean code.
|
| It may also be surprising, but the goal when writing a legal
| brief or judicial opinion is not to try to sound smart. The goal
| is to be clear, objective, and thereby, persuasive. Using big
| words for the sake of using big words, using rare words, using
| weasel words like "kind of" or "most of the time" or "many people
| are saying", writing poetically, being overly obtuse and
| abstract, these are things that get your law school application
| rejected, your brief ridiculed, and your bar exam failed.
|
| The simpler your communication, the more formulaic, the better.
| The more your argument is structured, akin to a computer program,
| the better.
|
| As compared to some other domain, such as fiction, good legal
| writing is much easier for an attention model to simulate. The best
| exam answers are the ones that are the most formulaic and that
| use the smallest lexicon and that use words correctly.
|
| I only add this comment to give non-lawyers some perspective
| on the bar exam. Getting an attention model to pass
| the bar exam is a low bar. It is not some great technical feat. A
| programmer can practically write a semantic disambiguation
| algorithm for legal writing from scratch with moderate effort.
|
| It will be a good accomplishment, but it will only be a stepping
| stone. I am still waiting for AI to tackle messages that have
| greater nuance and that are truly free form. LLMs are still not
| there yet.
| euroderf wrote:
| > It may also be surprising to some to understand that legal
| writing is prized for its degree of formalism. It aims to
| remove all connotation from a message so as to minimize
| misunderstanding, much like clean code.
|
| > The more your argument is structured, akin to a computer
| program, the better.
|
| You certainly make legal writing sound like a flavor of
| technical writing. Simplicity, clarity, structure. Is this an
| accurate comparison?
| ChainOfFools wrote:
| it is called a legal code after all
| kergonath wrote:
| "Code" in that sense predates pretty much any form of
| computer or technical writing. It came from the same word
| in old French in the 14th century, which itself came from
| the Latin _codex_. It basically meant "book". Now of course
| it is specific to books that contain laws.
| mistrial9 wrote:
| Recently I read a US law trade magazine article on a
| particular term used in US Federal employment law. The
| article was about 12 pages long. By the second page, they
| were using circular references, and switching between two
| phrases that used the same words but in different order,
| contexts, and therefore meanings, without clearly saying
| when they switched. By the third or fourth page I was done
| with that exercise. As a coder and reader of English
| literature, there was no question at all that the terms were
| being "churned" as a sleight of hand, directly in writing.
| One theory for why they did that, in an article that claimed
| to explain the terms, is that it sets up the confusion and
| misdirection as actually practiced in law involving
| unskilled laymen, and then "solves" the problems by the end
| of the article.
| A_D_E_P_T wrote:
| I took a sample CA bar exam for fun, as a non-lawyer who has
| never set foot in law school. Maybe the sample exam was tougher
| than the real thing, but I found it surprisingly difficult. A
| lot of the correct answers to questions were non-obvious --
| they weren't based on straightforward logic, nor were they
| based on moral reasoning, and there was no place for "natural
| law" -- so to answer questions properly you had to have
| memorized a bit of coursework. There were also a few questions
| that seemed almost designed to deceive the test-taker; the
| "obvious" moral choices were the wrong ones.
|
| So maybe it's easy if you study that stuff for a year or two.
| But you can't just walk in and expect to pass, or bullshit your
| way through it.
|
| I agree with you on legal writing, but there appears to be a
| certain amount of ambiguity inherent to language. The Uniform
| Commercial Code, for instance, is maddeningly vague at points.
| gnicholas wrote:
| The CA bar exam used to be much harder than other states'.
| They lowered the pass threshold several years ago, and then
| reduced the length from 3 days to 2. Now it's probably much
| more in line with the national norms. Depending on when you
| took the sample exam, it might be much easier now.
|
| Also, sometimes sample exams are made extra difficult, to
| convince students that they need to shell out thousands of
| dollars for prep courses. I recall getting 75% of questions
| wrong on some sections of a bar prep company's pre-test,
| which I later realized was designed to emphasize
| unintuitive/little-known exceptions to general rules. These
| corners of the law made up a disproportionate number of the
| questions on the pre-test and gave the impression that the
| student really needed to work on that subject.
| manquer wrote:
| Obviously you need subject knowledge; isn't that
| implicit?
|
| Keep in mind that even today[1] (in California and a few
| other states) you don't need to go to law school to take the
| bar exam and practice law; various forms of apprenticeship
| under a judge or lawyer are allowed.
|
| You also don't need to write the exam to practice many
| aspects of the legal profession.
|
| The exam was never meant to be a high bar of quality or
| selection; it was always just a simple validation that you
| know your basics. Law, like many other professions, has
| always operated on reputation and networks, not on degrees
| and certifications.
|
| [1] Unlike, say, being a doctor, where you have to go to med
| school without exception.
| A_D_E_P_T wrote:
| > _Obviously you need subject knowledge, that should be
| implicit?_
|
| Well, in a lot of the so-called soft sciences, you can
| easily beat a test without subject knowledge. I had figured
| that the bar exam might be something like that -- but it's
| more akin to something like biology, where there are a lot
| of arcane and counterintuitive little rules that have
| emerged over time. And you need to know _those_, or you're
| sunk. You can't guess your way past them, because the best-
| looking guesses tend to be the wrong ones.
|
| (For what it's worth, I realize that this mostly has to do
| with the Common Law's reverence of precedent-as-binding,
| and that continental Civil Law systems don't suffer as much
| from it. But I suppose those continental systems have other
| problems of their own.)
| carabiner wrote:
| Genuinely asking: you think the bar exam is a low bar because
| you personally found it easy, even though the vast majority of
| takers do not? Doesn't this just reflect your own inability to
| empathize with other people?
| elicksaur wrote:
| > Furthermore, unlike its documentation for the other exams it
| tested (OpenAI 2023b, p. 25), OpenAI's technical report provides
| no direct citation for how the UBE percentile was computed,
| creating further uncertainty over both the original source and
| validity of the 90th percentile claim.
|
| This is the part that bothered me (licensed attorney) from the
| start. If it scores this high, where are the receipts? I'm sure
| OpenAI has the social capital to coordinate with the National
| Conference of Bar Examiners to have a GPT "sit" for a simulated
| bar exam.
| fnordpiglet wrote:
| Scoring in the 96th percentile among humans taking the exam,
| without moving goal posts, would have been science fiction two
| years ago. Now it's suddenly not good enough, and the fact that
| a computer program can score decently among passing lawyers and
| first-time test takers is something to sneer at.
|
| The fact that I can talk to the computer and it responds to me
| idiomatically and understands my semantic intent well enough to
| be nearly indistinguishable from a human being is breathtaking.
| Anyone who views it as anything less in 2024 and asserts with a
| straight face they wouldn't have said the same thing in 2020 is
| lying.
|
| I do however find the paper really useful in contextualizing the
| scoring with a much finer grain. Personally I didn't take the
| 96th percentile score to be anything other than "among the mass who
| take the test," and have enough experience with professional
| licensing exams to know a huge percentage of test takers fail and
| are repeat test takers. Placing the goal posts quantitatively for
| the next levels of achievement is a useful exercise. But the
| profusion of jaded nerds makes me sad.
| Workaccount2 wrote:
| The nerds aren't jaded, they are worried. I'd be too if my job
| needed nothing more than a keyboard to be completed. There are
| a lot of people here who need to squeeze another 20-40 years
| out of a keyboard job.
| threeseed wrote:
| Similar comments were made about how microwaves would
| eliminate cooking.
|
| At the end of the day (a) LLMs aren't accurate enough for
| many use cases and (b) there is far more to knowledge
| worker's jobs than simply generating text.
| QuantumGood wrote:
| It scored below the 50th percentile when compared to people
| taking the test for the first time.
| d0mine wrote:
| On any topic that I understand well, LLM output is garbage: it
| requires more energy to fix it than to solve the original
| problem to begin with.
|
| Are we sure these exams are not present in the training data?
| (ability to recall information is not impressive for a
| computer)
|
| Still, I'm terrible at many, many tasks, e.g., drawing from a
| description, and the models significantly widen the types of
| problems that I can even try (where results can be verified
| easily and no precision is required).
| munchler wrote:
| > On any topic that I understand well, LLM output is garbage:
| it requires more energy to fix it than to solve the original
| problem to begin with.
|
| That's probably true, which is why most human knowledge
| workers aren't going away any time soon.
|
| That said, I have better luck with a different approach: I
| use LLMs to learn things that I _don't_ already understand
| well. This forces me to actively understand and validate the
| output, rather than consume it passively. With an LLM, I can
| easily ask questions, drill down, and try different ideas,
| like I'm working with a tutor. I find this to be much more
| effective than traditional learning techniques alone (e.g.
| textbooks, videos, blog posts, etc.).
| mistrial9 wrote:
| The models that you have tried... are garbage? Hmmm. Maybe
| the many, many inside professionals and uniformed services
| have different access than you do. Money talks?
| fnordpiglet wrote:
| It is remarkable that folks who tried a garbage LLM like
| copilot, 3.5, Gemini, or made meta LLMs say naughty words,
| seem to think these are still SOTA. Sometimes I stumble on
| them and I am shocked at the degradation in quality then
| realize my settings are wrong. People are vastly
| underestimating the rate of change here.
| mordymoop wrote:
| On what topics you understand well does GPT-4o or Claude Opus
| produce garbage?
| threeseed wrote:
| I do run into the issue where the longer the conversation
| goes, the more inaccurate the information gets.
|
| But a common situation is that with code generation it will
| fail to understand the context of where the code belongs
| and so it's a function that will compile but makes no
| sense.
| fnordpiglet wrote:
| Yeah. I often springboard into a new context by having
| the LLM compose the next prompt based on the discussion
| and restart the context. Remarkably effective if you ask
| it to incorporate "prompt engineering" terms from
| research.
| taberiand wrote:
| It depends on the topic (and the LLM - ChatGPT-4 equivalent
| at least, any model equivalent to 3.5 or earlier is just a
| toy in comparison) - but I've had plenty of success using it
| as a productivity enhancing tool for programming and AWS
| infrastructure, both to generate very useful code and as an
| alternative to Google for finding answers or at least a
| direction to answers. But I only use it where I'm confident I
| can vet the answers it provides.
| iLoveOncall wrote:
| > The fact I can talk to the computer and it responds to me
| idiomatically and understands my semantic intent well enough to
| be nearly indistinguishable from a human being is breath taking
|
| That's called a programming language. It's nothing new.
| fooker wrote:
| It's a programming language except the programming part, and
| the language part.
| gnicholas wrote:
| This analysis touches on the difference between first-time takers
| and repeat takers. I recall when I took the bar in 2007, there
| was a guy blogging about the experience. He went to a so-so
| school and failed the bar. My friends and I, who had been
| following his blog, checked in occasionally to see if he ever
| passed. After something like a dozen attempts, he did. Every one
| of us who passed was counted in the pass statistics once. He was
| counted a dozen times. This dramatically skews the statistics,
| and if you want to look at who becomes a lawyer (especially one
| at a big firm or company), you really need to limit yourself to
| those who pass on the first (or maybe second) try.
___________________________________________________________________
(page generated 2024-06-02 23:00 UTC)