[HN Gopher] Field experimental evidence of AI on knowledge worke...
___________________________________________________________________
Field experimental evidence of AI on knowledge worker productivity
and quality
Author : CharlesW
Score : 117 points
Date : 2024-01-16 15:41 UTC (7 hours ago)
(HTM) web link (www.oneusefulthing.org)
(TXT) w3m dump (www.oneusefulthing.org)
| datadrivenangel wrote:
| "Consultants across the skills distribution benefited
| significantly from having AI augmentation, with those below the
| average performance threshold increasing by 43% and those above
| increasing by 17% compared to their own scores. For a task
| selected to be outside the frontier, however, consultants using
| AI were 19 percentage points less likely to produce correct
| solutions compared to those without AI."
|
| So AI makes consultants faster but worse. As a consultant, this
| should make my expertise even more valuable.
| tempest_ wrote:
| Only if being wrong actually matters.
| datadrivenangel wrote:
| The research was partially conducted by Boston Consulting
| Group, so in this case being wrong doesn't really matter as
| long as the Partners are smooth and the associates create
| billable hours.
| CharlesW wrote:
| > _So AI makes consultants faster but worse._
|
| That was just for tasks not suited to GPT-4 (and I assume, LLMs
| in general).
|
| > _" For a task selected to be outside the frontier, however,
| consultants using AI were 19 percentage points less likely to
| produce correct solutions compared to those without AI."_
|
| For tasks suited to GPT-4, consultants produced 40% higher
| quality results by leveraging GPT-4.
|
| My take on that is that people who have at least some
| understanding of how LLMs work will be at a significant
| advantage, while people who think LLMs "think" will misuse LLMs
| and get predictably worse results.
| ParetoOptimal wrote:
| > while people who think LLMs "think" will misuse LLMs and
| get predictably worse results.
|
| I've found that now that I see LLMs less as thinking machines
| and more as stochastic parrots, I get a lot less use out of
| them, because I give them 3-5 turns at most versus 10 turns.
| NemoNobody wrote:
| Well, maybe a balance between a stochastic parrot and a
| "thinking machine" might be the best approach.
|
| You know, everything in moderation and what not.
| ParetoOptimal wrote:
| Yes, ideally.
|
| The problem for me is that it used to add to my energy and
| confidence, but when you "take the magic out of it" it
| doesn't help there so much.
|
| In antirez's blog he mentioned how it made things that were
| "not worth it" worth it, and I agreed, but that happens much
| less often if you think an LLM helping you is a fluke.
| NemoNobody wrote:
| Seriously, you cherry-picked the paragraph you cite for your
| opinion, which runs contrary to the findings of the article
| itself, and completely ignored the first part of the second
| sentence of your own quote, "for a task selected to be outside
| the frontier" - so, something beyond the capabilities of AI at
| this time ONLY decreased their rate of correct solutions by 19
| percentage points.
|
| It made below-average people 43% better and above-average
| people 17% better.
|
| Same paragraph.
| steve1977 wrote:
| BCG consultants. Not average people.
| datadrivenangel wrote:
| You know, I missed that part. Oops.
|
| Looking into the full paper, it looks like the task identified
| as inside the frontier is much more broken down, with 18
| numbered steps. The outside-the-frontier task is two bullet
| points.
|
| So for clearly scoped tasks, ChatGPT is hands-down good. For
| less well-scoped work, it reduces quality.
|
| Interestingly, the authors also looked at the amount of text
| retained from ChatGPT, and found that people who received
| ChatGPT training ended up retaining a larger portion of the
| ChatGPT output in their final answers and performed better in
| the quality assessment.
| quickcheque wrote:
| ' As a consultant, this should make my expertise even more
| valuable. '
|
| Consultant logic at its finest.
| j4yav wrote:
| More garbage outputs being produced faster, paired with
| expertise which makes it credibly sound like more fast
| garbage will solve any remaining issues, probably _is_
| approaching peak consultant value.
| datadrivenangel wrote:
| Only if that's what the clients want or need to justify
| actual improvements!
|
| Usually the goal is to produce fewer garbage outputs, more
| slowly.
| j4yav wrote:
| The goal isn't to generate billable hours?
| ThrowawayTestr wrote:
| I just started a new job and used ChatGPT to create some cool
| Excel automation with Python.
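|
| For anyone curious, the kind of thing I mean looks roughly like
| this (a minimal, hypothetical sketch using pandas + openpyxl;
| the file and column names are made up, not my actual script):
|
|     # pip install pandas openpyxl
|     import pandas as pd
|
|     # Read a raw export and build a per-region summary.
|     df = pd.read_csv("orders.csv")
|     summary = df.groupby("region", as_index=False)["revenue"].sum()
|
|     # Write both sheets into a single workbook.
|     with pd.ExcelWriter("report.xlsx", engine="openpyxl") as xl:
|         df.to_excel(xl, sheet_name="raw", index=False)
|         summary.to_excel(xl, sheet_name="summary", index=False)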
| Ecoste wrote:
| I probably should know but I don't even have the vaguest of
| ideas. What do consultants actually do?
| paulsutter wrote:
| Bill. Sometimes hourly, sometimes by project.
|
| They wake up every morning hell bent on maintaining existing
| customer accounts, growing existing accounts, and sometimes
| adding a new account. They maintain a professional demeanor,
| with just the right kind of smalltalk. Finally, they produce a
| report that justifies whatever decision the CEO had already
| made before he engaged the consultants.
|
| That's pretty much the whole job.
| sockaddr wrote:
| Yeah, I've struggled to understand what people mean by
| "consultant", and once heard that it's basically a person or
| group you can hire to take the blame for an unpopular decision
| and/or to weaponize it against others in the company, since it
| was "someone else" that made the call.
| tokai wrote:
| Provides you with a neat document that states that what you
| wanted to do is the right thing to do.
| fhd2 wrote:
| I'm a consultant - for about a year now. Not at BCG or
| anywhere like that; I'm a one-man company.
|
| What I do is pretty similar to what I did before in my CTO
| roles, just a bit more strategy and less execution. Clients
| come to me with questions and problems and I help them solve
| them based on my experience. If you've got multiple C-levels,
| middle management and a board, you're in a similar consulting
| position even as a full-time CTO, from what I've seen.
|
| I don't go anywhere _near_ LLMs for this, I'd find it somewhere
| between disrespectful and unethical. They're paying for _my_
| expertise. I can imagine it could make a decent alternative to
| Google and Wikipedia for research purposes, but I'd have to
| double check it all anyway. I don't see how it'd make my work
| any easier.
| danielbln wrote:
| > I don't go anywhere _near_ search engines for this, I'd
| find it somewhere between disrespectful and unethical.
| They're paying for _my_ expertise.
|
| I bet someone said this 30 years ago. As it stands today, LLMs
| can't do the job for you wholesale; they merely augment it. And
| they are finicky and fallible, so expertise is needed to
| actually produce useful results and validate them. There are
| many reasons why an LLM isn't the right tool for a certain
| task, but wanting to drive on some sort of moral or ethical
| high road is IMO _not_ a great reason.
|
| Continue down that road and people who have no qualms about
| using whatever tools are at their disposal will eat your
| lunch. Maybe not today or tomorrow, but eventually.
| fhd2 wrote:
| The value of advice is largely estimated based on who gives
| it. If a client is hiring me, it seems clear to me that
| they're paying for mine.
|
| For support tasks (like research or even supporting a
| writeup) I'm not ethically opposed to using LLMs. It's just
| that I try every couple of months and conclude it's not a
| net improvement. Your mileage may vary of course.
| czbond wrote:
| You're going to get across the board replies here. To each
| constituent, they do different things. There are consultants
| across the board in speciality, quality, etc.
|
| On HN, they get a bad rap from the "I'm smart, I know
| everything" worker-bee crowd who think business is just
| slinging JS code and product updates. Consultants are often
| brought into those people's orgs to deal with (or work around)
| valuable know-it-alls, fiefdoms, poor vision / strategy, or
| politics that are holding the company back. They're often
| dismissed as "just doing reports", when in fact consultants
| sometimes stay out of implementation to make sure the owning
| team owns & implements the solution.
|
| Consultants often provide the hiring manager a shield on dicey
| projects or risky outcomes. Where if it goes wrong, the manager
| can say "it was those consultants".
|
| Teams are brought in as trained, skilled, up-to-date staff that
| the company cannot hire or fire, or shouldn't, due to duration
| of need or skillset.
|
| Sometimes they're brought in because the politics of the
| internal company lead to stalemates, poor strategy, etc.
|
| Often at the higher levels, they're brought in due to focusing
| on a speciality, market, or vertical to have large experience
| that isn't possible to get in house.
|
| One's experience with consultants frames one's opinion. I've
| only worked with very high-quality teams in the past, ones that
| provide a healthy balance of vision, strategy, implementation,
| etc.
| warner25 wrote:
| I've mostly dealt with RAND consultants in the military /
| government. They were brought in to help answer specific
| questions about what we should do and how, presumably by
| writing up and delivering a report. These weren't questions
| that the staff couldn't handle itself, but nobody on staff had
| the time to focus on answering these questions (by digging deep
| and doing research, not just expressing a gut opinion) given
| their other duties and responsibilities. So the RAND people
| basically augmented the staff.
|
| I guess they have the advantage of experience doing these
| things over and over again for similar organizations. It's also
| an outside perspective, which has both advantages and
| disadvantages. In the conversations that I was invited into, we
| spent most of our time just explaining things that any mid-
| level member of the organization would know, trying to get the
| RAND people up-to-speed, so that seemed wasteful. But the
| military is a huge bureaucracy dominated by people who've
| climbed up the ranks for 30-40 years, so there isn't a lot of
| fresh outside thinking, and it seems like it could be valuable
| to inject some.
| paxys wrote:
| In simplest terms, they solve problems for companies. If your
| company wants to achieve something and doesn't want to spend
| the time/effort/money to set up a new department and staff it
| with full-time experts, you can call a consulting firm and they
| will figure it out for you.
| sgt101 wrote:
| Don't worry, it's quite common for people who are busy and
| important like you not to know this. In our experience that's
| because the people that work for them are simply not using
| modern working practices.
|
| It is a problem though, because frankly, although you might
| think that moving to a more modern way of working is a simple
| fix, we have found that your employees are actually your
| enemies and are plotting to kill you and your family. You might
| think that you depend on them to get work done, but we can fix
| that for you and get our much more skilled and low paid workers
| to do the job instead.
|
| You will find that if you allow us to take the required steps
| to protect your family from rape and murder, your shareholders
| will get a bump as well! And of course so will you: you can
| expect your bonus to at least triple, and you are worth it for
| sure.
|
| We have already set up the culling stations filled with
| whirling blades to deal with the scum downstairs, if you say
| nothing at all we will act and slaughter them instantly. There
| is no guilt, only reward.
|
| We love you.
| mediaman wrote:
| This study had management consultants use GPT-4 on a variety
| of management consulting tasks.
|
| Some were more creative in nature, and some were more analytical.
| For example, ideas on marketing sneakers versus an analysis of
| store performance across a retailer's portfolio.
|
| In general they found that GPT4 was helpful for creative tasks,
| but didn't help much (and in fact reduced quality) for analytical
| questions.
|
| I think these kinds of studies are of limited use. I don't
| believe raw GPT4 is that helpful in the enterprise. Whether it is
| useful or not comes down to whether engineers can harness it
| within a pipeline of content and tools.
|
| For example, when engineers create a system to summarize issues a
| customer has had from the CRM, that can help a customer service
| person be more informed. Structured semantic search on a
| knowledge base can help them also find the right solution to
| common customer problems.
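|
| As a rough illustration of the semantic-search piece (a minimal
| sketch only, not any particular product's implementation; the
| embed() helper below is a hypothetical stand-in for a real
| embedding model, and the knowledge base is made up):
|
|     import numpy as np
|
|     def embed(text: str) -> np.ndarray:
|         # Stand-in for a call to a real embedding model/API.
|         rng = np.random.default_rng(abs(hash(text)) % (2**32))
|         v = rng.normal(size=256)
|         return v / np.linalg.norm(v)
|
|     kb = ["How to reset a customer password",
|           "Refund policy for damaged goods",
|           "Troubleshooting failed card payments"]
|     kb_vecs = np.array([embed(d) for d in kb])
|
|     query = "customer says their payment keeps getting declined"
|     scores = kb_vecs @ embed(query)   # cosine similarity
|     best = kb[int(np.argmax(scores))]
|     # `best` is then passed to the LLM as context for the answer.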
|
| McKinsey made a retrieval augmented generation system that
| searched all their research reports and created summaries of
| content they could use, so a consultant could quickly find prior
| work that would be relevant on a client project. If they built
| that correctly, I imagine that is pretty useful.
|
| GPT4 alone will especially not be that useful for analytical
| work. However, developers can make it a semi-capable analyst for
| some use cases by connecting it to data lakes, describing
| schemas, and giving it tools to do analysis. Usually this is not
| a generalist solution, and needs to be built for each company's
| application.
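|
| To make the "describe schemas and give it tools" part concrete,
| a hedged sketch of the pattern (the llm_complete() wrapper is a
| hypothetical stand-in for a GPT-4 call; real systems add
| validation, read-only credentials, and retries):
|
|     import sqlite3
|
|     SCHEMA = """CREATE TABLE orders (
|         id INTEGER, region TEXT, status TEXT, revenue REAL);"""
|
|     def llm_complete(prompt: str) -> str:
|         # Stand-in for a GPT-4 call with the schema in the prompt.
|         return ("SELECT region, SUM(revenue) FROM orders "
|                 "GROUP BY region;")
|
|     question = "What is total revenue by region?"
|     prompt = (f"Schema:\n{SCHEMA}\nWrite one SQLite query.\n"
|               f"Question: {question}")
|     sql = llm_complete(prompt)
|
|     conn = sqlite3.connect(":memory:")
|     conn.executescript(SCHEMA)
|     print(conn.execute(sql).fetchall())  # runs the generated query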
|
| Many of the studies so far only look at vanilla GPT-4 via
| ChatGPT, and it seems unlikely that, if LLMs do transform the
| workplace, a standalone ChatGPT is what it will look like.
| ebiester wrote:
| The first generation of company-focused LLMs will need to be
| bespoke. Teams involved with this will learn the necessary
| abstractions and will build new companies to lower the on-ramp
| or join the LLM providers to do the same.
| Fin_Code wrote:
| This is entirely unsurprising. A language generator is not
| designed for analytical work. The fact that it codes well is, I
| think, what's throwing people: since code is a form of
| language, it works well for a language model. The issues with
| code will come from larger contexts where analytical thought is
| needed, and there it will break down.
| svachalek wrote:
| People are trying really hard to make "search" or something
| like it the killer app for LLMs but from what I've seen, it's
| translation. Coding in its purest form is a kind of
| translation, from natural language requirements to more
| precise computer languages and vice versa. Keeping an LLM
| properly focused on that task, it's pretty astonishing what
| it can do. But as you say, assuming it can do anything more
| than that is a trap.
| chewxy wrote:
| > Coding in its purest form is a kind of translation, from
| natural language requirements to more precise computer
| languages
|
| Unfortunately not so true. A simple way to check this is to
| look at what the brain is doing. Coding barely triggers the
| language components of the brain (your usual suspects like
| Wernicke's and Broca's areas + other miscellaneous
| things). In fact, what you'll find is that coding triggers
| the IP1, IP2 and AVI pathways. These are part of the
| multiple demand network. We call them programming
| "languages" because we can represent them in a token-based
| game we call "language" but the act of programming is far
| from mere language manipulation.
| roveo wrote:
| > GPT4 alone will especially not be that useful for analytical
| work. However, developers can make it a semi-capable analyst
| for some use cases by connecting it to data lakes, describing
| schemas, and giving it tools to do analysis. Usually this is
| not a generalist solution, and needs to be built for each
| company's application.
|
| I've been thinking about this a lot lately. I'm a data analyst,
| and all these "give GPT your data warehouse schema, let it
| generate SQL to answer users' queries" products completely miss
| the point. An analyst has value as a curator of organizational
| knowledge, not as a translator from "business" to SQL. Things
| like knowing that when a product manager asks us for
| revenue/GMV, we exclude canceled orders but include purchases
| made with bonus currency or promo codes.
|
| Things like this are not documented; they are decided during
| meetings and in Slack chats. So my idea is that in order to
| make LLMs truly useful, we'll create "hybrid" programs that are
| half-written by humans, where the part written by the LLM when
| translating from "human" language is simple enough for it to do
| reliably. I even made some weekend prototypes with pretty
| interesting results.
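|
| To make that concrete, a toy illustration of the general idea
| (not my actual prototype; names and rules are made up):
|
|     # Human-written: the organizational knowledge lives here.
|     def revenue(orders, month):
|         """GMV: exclude canceled orders, keep bonus/promo purchases."""
|         kept = [o for o in orders
|                 if o["status"] != "canceled" and o["month"] == month]
|         return sum(o["amount"] for o in kept)
|
|     # LLM-written: only the trivial mapping from the question into
|     # the curated vocabulary above, e.g.
|     #   "what was our GMV in March?"  ->  revenue(orders, month=3)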
| jampekka wrote:
| > I think these kinds of studies are of limited use. I don't
| believe raw GPT4 is that helpful in the enterprise.
|
| I'm not so sure about this. A lot of the work in and for large
| organizations, especially, is writing text that often doesn't
| even get read, or is at most glanced at. LLMs are great at
| writing useless text.
| bitzun wrote:
| At first I thought this was a flippant joke, but I can
| actually see the value in saving time writing junk without
| having to alter existing bureaucratic processes from which
| nobody gets much value.
| jampekka wrote:
| I am using ChatGPT to write the generic boilerplate that e.g.
| many funding applications require: ethics statements or
| inclusivity policies, etc., where the contents are largely
| predefined and known by everybody but still have to be written
| in prose instead of, e.g., clicking boxes.
| jprete wrote:
| That kind of text is valuable as a signaling exercise - the
| author is willing to claim the text as indicating their
| actual beliefs. Writing it with AI would actually damage the
| author's credibility since it's no longer their words -
| although this shouldn't be a problem if the author is
| carefully checking the text every single time to make sure it
| represents their position.
|
| The effect would be even worse the first time the author
| missed a mistake from the AI and had to walk it back
| publicly. Nobody's going to trust anything they put in
| writing for a while after that.
| jampekka wrote:
| There are genres of text that have little to do with the
| writer's beliefs and are a kind of formality with contents
| known by all interested parties beforehand. Sort of
| elaborate prose versions of "I Agree With the Terms and
| Conditions" textboxes.
| jart wrote:
| Isn't most of science like that too?
| jampekka wrote:
| There is a lot of boilerplate and fluff in scientific
| papers that everybody knows is fluff.
| totoglazer wrote:
| Note this paper was published in September 2023.
| clbrmbr wrote:
| Can someone explain Centaurs vs Cyborgs? (I didn't make an
| account to read the full paper...)
| araes wrote:
| The actual paper can be found at: https://www.iab.cl/wp-
| content/uploads/2023/11/SSRN-id4573321...
|
| Note: after reading the paper, I think there are serious
| methodology issues. ChatGPT without any human involvement
| (simply reading the
| task as instructions and producing an answer) produced a better
| "quality" result than any human involvement in many tasks.
|
| My read: ChatGPT produces what we pre-view as "correct".
|
| They even kind of state this: "when human subjects use ChatGPT
| there is a reduction in the variation in the eventual ideas they
| produce. This result is perhaps surprising; one would assume that
| ChatGPT, with its expansive knowledge base, would instead be able
| to produce many very distinct ideas, compared to human subjects
| alone."
|
| My read: The humans converge towards what is already viewed as
| "correct". It's like being in a business where your boss
| already knows exactly what they want, and any variation you
| produce is automatically "bad" anyway.
|
| There is also a drastic difference in the treatment of the
| in/out/on "frontier" subject. Lots of talk about how great adding
| AI is to "inside frontier" tasks, no similar graphs/charts for
| "outside frontier" tasks. Finally, if these people are
| consultants and business profs, they produce some crazily bad
| charts. Figure 3 in the appendix is so difficult to read.
| gandalfgeek wrote:
| If you don't want to read the full thing, Mollick's Substack
| has a more accessible summary of this paper:
|
| https://www.oneusefulthing.org/p/centaurs-and-cyborgs-on-the...
|
| There are two sides to this thing. The first author of this
| paper has also written another paper titled "Falling Asleep at
| the Wheel: Human/AI Collaboration in a Field Experiment".
| Abstract:
|
| "As AI quality increases, humans have fewer incentives to exert
| effort and remain attentive, allowing the AI to substitute,
| rather than augment their performance... I found that subjects
| with higher quality AI were less accurate in their assessments of
| job applications than subjects with lower quality AI. On average,
| recruiters receiving lower quality AI exerted more effort and
| spent more time evaluating the resumes, and were less likely to
| automatically select the AI-recommended candidate. The recruiters
| collaborating with low-quality AI learned to interact better with
| their assigned AI and improved their performance. Crucially,
| these effects were driven by more experienced recruiters.
| Overall, the results show that maximizing human/AI performance
| may require lower quality AI, depending on the effort, learning,
| and skillset of the humans involved."
|
| https://static1.squarespace.com/static/604b23e38c22a96e9c788...
| esafak wrote:
| If you don't need a skill, you won't practice it, and it will
| atrophy. This frees you to learn different things. Like how
| most software engineers today don't know computer engineering
| (hardware) but they can write more complex applications.
| SkyBelow wrote:
| That is if the AI fully replaces the need for a skill. What
| happens if it mostly replaces it, enough so the skill
| atrophies, but not enough that the user never needs the
| skill?
| esafak wrote:
| Good point. That sounds like a temporary product problem,
| like a level 2-3 self-driving car, which still needs you to
| pay attention.
|
| I think you also need to consider whether you can afford
| not to have the skill; what would happen if the AI were
| taken away, or malfunctioned? Airplane pilots are an
| example of this. In that case you simply have to learn the
| skill as if the AI will not be there.
| hn_throwaway_99 wrote:
| > That sounds like a temporary product problem
|
| The jury is still out on how "temporary" of a problem
| that will be. Improvements have been made but nobody
| really knows how to get an LLM, for example, to just say
| "I don't know".
| Retric wrote:
| And this is before people start poisoning the well.
| Google search worked better in a world without Google
| search. I suspect many AI systems will run into similar
| issues.
| hn_throwaway_99 wrote:
| > Google search worked better in a world without Google
| search.
|
| Oooh, I really like that. Very succinct and topical
| statement of Goodhart's Law.
| taway_6PplYu5 wrote:
| Oh, you mean like self-driving cars?
| vlovich123 wrote:
| Depends on what you mean by computer engineering but they
| should definitely have a good mental model of how computers
| work. Otherwise your complex applications will have trouble
| scaling. That matters less in many cases, but early
| architecture mistakes can easily put you down an evolutionary
| dead end, forcing you into exponentially rising maintenance
| costs to add new features / fix bugs, or into a rewrite by
| someone who understands computer architecture better (and can
| take advantage of a better understanding of the problem
| domain).
| WJW wrote:
| I see what you mean but there's a huge gap between a "good
| enough" mental model of how computers work and the actual
| in-depth details that GP means and that have been
| successfully abstracted away from us. For example: the
| number of cases in which I have needed to know the details
| of how the transistors in RAM chips are laid out has been
| extremely minimal. Nor have I ever had to worry
| about the encoding scheme used by the transceivers for the
| optical fibers that my TCP packets move across.
|
| Obviously there are jobs out there that _do_ have to worry
| about these things (mostly at companies like TSMC and
| Cisco), but in practice I doubt regular engineers at even
| the largest scale software companies have to worry about
| stuff like that.
| hn_throwaway_99 wrote:
| But that is the exact problem with AI - 99% of the time it's
| good, so your skills atrophy, but then there's that one
| percent of the time it decides the reflective side of that
| 18-wheeler is the sun and drives you into it. That's very
| different from, say, a compiler, which as a programmer I
| completely trust to turn my source code into machine code
| accurately.
| tikhonj wrote:
| > _completely trust to turn my source code into machine
| code accurately_
|
| Say hello to undefined behavior :)
|
| Appropriately enough, much of the reason undefined behavior
| is a consistent problem is exactly the same--it can do
| something reasonable 99% of the time and then totally screw
| you over out of nowhere.
| Jensson wrote:
| Undefined behavior is a defined thing: you know about it, so
| you can avoid it if you want, or look up what the compiler you
| use does in that case to see if it is useful to you. There's no
| such thing for a black-box AI model.
| BeetleB wrote:
| You're just mirroring the parent example. Prior to AI (or
| even computers), this was a problem.
|
| I used to be a more conventional engineering guy. We do a
| ton of calculus to get our degree. At my first job you just
| didn't need calculus (although all the associated courses
| in school used it to develop the theory). As a result, for
| the one-offs (less than once a year) when a problem did come
| up where calculus knowledge was helpful, people couldn't do
| the task and would go to the one guy (me) who still
| remembered calculus.
|
| BTW, I guarantee all the employees there got A's in their
| calculus courses.
| hn_throwaway_99 wrote:
| > You're just mirroring the parent example. Prior to AI
| (or even computers), this was a problem.
|
| Not at all, and your argument isn't the same thing.
|
| The problem with current AI, which _is_ very different
| from past forms of automation, is that, essentially, its
| failure mode is undefined. Past forms of automation, for
| example, did not "hallucinate" seemingly correct but
| false answers. Even taking your own example, it was
| obvious to your colleagues that _they didn't know how to
| get the answer_. So they did the correct thing - they
| went to someone who knew how to do it. What they did
| _not_ do, and which is what most current AI solutions
| will do, is just "wing it" with a solution that is wrong
| but looks correct to those without more expertise.
| warner25 wrote:
| This is a nice, tight statement of the big picture.
|
| This isn't new, like since LLMs were born, either. In the
| aviation community, and probably many other domains, there
| has been a running debate for decades about the pros and cons
| of introducing more automation into the cockpit and what
| still should or shouldn't be taught and practiced in training
| (to be able to effectively recognize and handle the edge
| cases where things fail).
|
| And in military aviation, this isn't just a question of
| safety; it's about productivity just like it is for knowledge
| workers. If an aircraft can fly itself most of the time, the
| crew can do more tactical decision-making and use more
| sensors and weapons. Instead of just flying their own
| aircraft, the crew can also control a number of unmanned
| aircraft with which they're teamed up.
| digging wrote:
| While this is great information, I wish more could be said
| about Centaurs & Cyborgs, the title of the piece.
|
| There's a brief example of a type of task that each approach
| excels at, but I'd definitely like more. The crux of this piece
| is that our human judgment of when/how to use AI tools is the
| most important factor in work quality when we're on the edge of
| the frontier. But there's no analysis of whether centaurs or
| cyborgs did better at those outside-the-frontier tasks; it's
| not clear why those categories are even mentioned since they
| appear to have no relevance to the preceding research results.
| And as the article mentions, the frontier is "invisible" (or at
| least difficult to see); learning to detect tasks that are in,
| on, or outside of the frontier seems like an immensely
| important skill. (I also realize it may become completely
| obsolete in <5 years as the frontier expands exponentially.)
|
| I understand the goal of this research was not to find these
| "edges" and to determine how we can improve our judgment about
| when & how to use AI. But after reading these strong results,
| that's definitely the only thing I'm interested in. I use AI
| almost not at all in my work (web development). It has been
| most useful to me in getting unstuck from a thorny, under-
| documented problem. Over a year after ChatGPT's release, I
| still don't know _if_ modern LLMs can actually be a force
| multiplier for my work, or if I'm correctly judging that
| they're not appropriate for the majority of my tasks. The
| latter seems increasingly unlikely as capabilities advance.
| dang wrote:
| We've changed the URL from that to
| https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4573321.
| Thanks!
|
| (In cases like this, it's nice to have both the readable
| summary and the link to the paper.)
| SlightlyLeftPad wrote:
| I think this is a fascinating, yet unsurprising, finding. We've
| known for a long time that automation leads to skillset atrophy
| on the part of the user. Perhaps most famously, in aviation
| where some automated systems make complicated procedures so
| easy that pilots effectively lose the skills to do the proper
| procedure themselves.
|
| The discussions I hope are happening are about intentional
| capability limits on AI in critical industries/areas where
| humans must absolutely and always intervene. It seems analogous
| to having a superpower without knowing how to control it,
| creating a potentially deadly situation.
|
| Then, perhaps longer term at the generational scale, will
| humans cognitively devolve to a point where AI is making all of
| our decisions for us? Essentially we make ourselves obsolete
| and are eventually reclassified as a "regular/inferior" species
| of animal. Profit and return on investment here cannot get in
| the way of handling these details with great oversight and
| responsibility.
| ethbr1 wrote:
| That bears interestingly on only allowing full self-driving
| cars on pre-approved, high-assurance routes.
|
| It either needs to be 100% or (bad).
|
| Anywhere in the grey zone, and you retard human performance
| while still occasionally needing it.
| rich_sasha wrote:
| I am yet to come across a programming task where ChatGPT saved
| me time.
|
| For a start, most of my work is about using internal APIs. But
| never mind. Sometimes I come across a generic programming
| problem, and ChatGPT gets me to 80% of a solution.
|
| Getting it from there to a 100% solution takes as long as doing
| it from scratch.
|
| Just my $0.02
| snapcaster wrote:
| Have you considered that it may be a personal skill issue? I'm
| not saying it is, but people with this opinion never seem to
| consider that maybe they aren't using it properly or don't get
| it.
| elzbardico wrote:
| The reverse could also be true.
|
| Some people might not be able to figure out that the code GPT
| produced is not good, because they lack the skill level to
| review it effectively.
| rich_sasha wrote:
| Well, maybe. But it seems to me this somewhat defeats the
| purpose. If I tell ChatGPT to write me, say, a sorting
| algorithm (which I'm sure it can do), but asking it _wrongly_
| gives me a deceptive-looking lemon of a solution, that's a
| liability.
|
| Conversely, could you share an example of a nontrivial,
| practical programming problem ChatGPT can solve, but only if
| you use it right?
| mewpmewp2 wrote:
| For coding it's mostly Copilot that saves time. Although
| ChatGPT in some cases as well, but not as reliably or
| frequently.
| throwaway4aday wrote:
| Agree, Copilot is a huge speed boost for stubbing things
| out, auto-generating the dead simple parts, and
| refactoring. ChatGPT is awesome for rubberducking, pair
| programming, brainstorming, and figuring out what that
| thing that you kind of remember is called. We aren't at
| the point where we can just say "robot, do my work" but
| it's for sure good enough to take care of the boring
| stuff and be a sounding board.
| pphysch wrote:
| ChatGPT/Bard is good for answering documentation or common
| syntax questions. For actual programming, you should use a
| proper AI "copilot" that actually integrates into the context
| of your IDE, and therefore understands the syntax of your
| internal APIs to some extent. You will save time copy-pasting
| and get better results.
| BeetleB wrote:
| Where it saves me time is when using a framework I'm not used
| to - it more often than not solves my problem (but not by a
| large stretch - perhaps 60% of the time).
|
| Even more common is getting shell/UNIX commands to do what I
| want. I don't have all the standard UNIX tools (cut, etc.) in
| my head, and I _definitely_ don't keep all the command-line
| options in my head. GPT-4 tells me what I need, and it's much
| quicker to confirm it via the --help or man pages than craft it
| myself by reading those pages.
| freedryk wrote:
| The biggest question I see with this study is that it doesn't
| seem like subjects had access to tools other than the AI. Were
| the subjects without AI able to do Google searches? If not,
| then what is the performance gain of the AI users over people
| who can just google stuff?
| siliconc0w wrote:
| There is a lot of noise from VC types about how AI copilots
| turn random outsourced devs into 10x engineers. My experience
| is that they're pretty good at writing boilerplate, that is,
| they save me typing things I would have expended little
| thought in typing anyway. I suspect this is where most of the
| productivity statistics are coming from. Copilot especially
| seems prone to hallucinating APIs or suggesting entire
| functions that are wrong, repetitive, or even dangerous. This
| is the case even for side projects where I'm working with
| public Python libraries, which should be the best-case
| scenario. It's generally better than IDE autocomplete when I'm
| already familiar with the API and so can catch the errors.
|
| I will say GPT-4 is mostly better than referencing stack-overflow
| or library documentation. Maybe once that gets fast/cheap enough
| to use as co-pilot we'll see some of these mythical productivity
| gains.
| smackeyacky wrote:
| I don't think so. I think it will be more like a flare that
| burns out.
|
| The reason being that the source of material for GPT-4 is
| rotting underneath. As the internet gets ruined by AI garbage,
| the quality of output of AI will get worse, not better.
|
| edit: This is a modern day tower of Babel event. We had a
| magnificent resource and then polluted it with garbage and now
| it's quickly becoming useless.
| throwaway4aday wrote:
| This is wrong for at least two reasons:
|
| 1) The data scraped from the web is filtered, deduped,
| ranked, and cleaned in a variety of other ways before being
| used for training. While quantity is necessary for a model to
| learn the structure of language, quality is even more
| important once you have a model that can produce coherent
| output so a lot of work goes into grooming the data.
|
| 2) There have been a bunch of papers and training runs that
| show synthetic data created specifically for training a model
| is as good or better than scraped human produced data. The
| importance of scraped web content is quickly declining
| because you can now generate infinite higher quality examples
| using existing trained models. The only relevance it has now
| is for knowledge about new developments and that is a much
| easier stream to filter since most of the important stuff
| comes from official sources and you don't need as many
| variations since you can just generate your own using one or
| more examples.
| smackeyacky wrote:
| How much time is going to be wasted cleaning the data? It
| seems like AI can generate junk much faster than we could
| possibly filter it.
| spunker540 wrote:
| Fortunately there are now AIs that can help with the data
| cleaning task.
| majewsky wrote:
| But who monitors the monitor?
| ethanwillis wrote:
| Infinite?
| RecycledEle wrote:
| From the article: "consultants using AI were significantly more
| productive (they completed 12.2% more tasks on average, and
| completed tasks 25.1% more quickly), and produced significantly
| higher quality results (more than 40% higher quality compared to
| a control group)."
| lgleason wrote:
| I was underwhelmed when using it for code generation of things
| like tests.
___________________________________________________________________
(page generated 2024-01-16 23:01 UTC)