[HN Gopher] Field experimental evidence of AI on knowledge worke...
       ___________________________________________________________________
        
       Field experimental evidence of AI on knowledge worker productivity
       and quality
        
       Author : CharlesW
       Score  : 117 points
       Date   : 2024-01-16 15:41 UTC (7 hours ago)
        
 (HTM) web link (www.oneusefulthing.org)
 (TXT) w3m dump (www.oneusefulthing.org)
        
       | datadrivenangel wrote:
       | "Consultants across the skills distribution benefited
       | significantly from having AI augmentation, with those below the
       | average performance threshold increasing by 43% and those above
       | increasing by 17% compared to their own scores. For a task
       | selected to be outside the frontier, however, consultants using
       | AI were 19 percentage points less likely to produce correct
       | solutions compared to those without AI."
       | 
       | So AI makes consultants faster but worse. As a consultant, this
       | should make my expertise even more valuable.
        
         | tempest_ wrote:
         | Only if being wrong actually matters.
        
           | datadrivenangel wrote:
           | The research was partially conducted by Boston Consulting
           | Group, so in this case being wrong doesn't really matter as
           | long as the Partners are smooth and the associates create
           | billable hours.
        
         | CharlesW wrote:
         | > _So AI makes consultants faster but worse._
         | 
          | That was just for tasks not suited to GPT-4 (and, I assume,
          | LLMs in general).
         | 
         | > _" For a task selected to be outside the frontier, however,
         | consultants using AI were 19 percentage points less likely to
         | produce correct solutions compared to those without AI."_
         | 
         | For tasks suited to GPT-4, consultants produced 40% higher
         | quality results by leveraging GPT-4.
         | 
         | My take on that is that people who have at least some
         | understanding of how LLMs work will be at a significant
         | advantage, while people who think LLMs "think" will misuse LLMs
         | and get predictably worse results.
        
           | ParetoOptimal wrote:
           | > while people who think LLMs "think" will misuse LLMs and
           | get predictably worse results.
           | 
            | I've found that now that I see LLMs less as thinking and
            | more as stochastic parrots, I get a lot less use out of
            | them, because I give them 3-5 turns at most versus 10.
        
             | NemoNobody wrote:
             | Well, maybe a balance between a stoic parrot and a
             | "thinking machine" might be the best approach.
             | 
             | You know, everything in moderation and what not.
        
               | ParetoOptimal wrote:
               | Yes, ideally.
               | 
                | The problem for me is that it used to add to my
                | energy and confidence, but when you "take the magic
                | out of it", it doesn't help there so much.
                | 
                | In antirez's blog he mentioned how it made things that
                | were "not worth it" become "worth it", and I agreed,
                | but that happens much less often if you think of an
                | LLM helping you as a fluke.
        
         | NemoNobody wrote:
          | Seriously, you cherry-picked the paragraph you use to
          | support your opinion, which runs contrary to the findings
          | of the article itself, and you completely ignore the first
          | part of the second sentence of your paragraph: "for a task
          | selected outside the frontier". So something beyond the
          | capabilities of AI at this time ONLY decreased the rate of
          | correct solutions by 19 percentage points.
          | 
          | It made below-average people 43% better and above-average
          | people 17% better.
          | 
          | Same paragraph.
        
           | steve1977 wrote:
           | BCG consultants. Not average people.
        
           | datadrivenangel wrote:
           | You know, I missed that part. Oops.
           | 
            | Looking into the full paper, it looks like the task
            | identified as inside the frontier is much more broken
            | down, with 18 numbered steps, while the outside-the-
            | frontier task is two bullet points.
            | 
            | So for clearly scoped tasks, ChatGPT is hands-down good.
            | For less well-scoped work it reduces quality.
           | 
           | Interestingly, the authors also looked at the amount of text
           | retained from ChatGPT, and found that people who received
           | ChatGPT training ended up retaining a larger portion of the
           | ChatGPT output in their final answers and performed better in
           | the quality assessment.
        
         | quickcheque wrote:
         | ' As a consultant, this should make my expertise even more
         | valuable. '
         | 
         | Consultant logic at its finest.
        
           | j4yav wrote:
           | More garbage outputs being produced faster, paired with
           | expertise which makes it credibly sound like more fast
           | garbage will solve any remaining issues, probably _is_
           | approaching peak consultant value.
        
             | datadrivenangel wrote:
             | Only if that's what the clients want or need to justify
             | actual improvements!
             | 
              | Usually the goal is to produce fewer garbage outputs,
              | more slowly.
        
               | j4yav wrote:
               | The goal isn't to generate billable hours?
        
       | ThrowawayTestr wrote:
        | I just started a new job and used ChatGPT to create some cool
        | Excel automation with Python.
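        | 
        | For the curious, the kind of thing I mean - a minimal sketch
        | (openpyxl is just my example here; the actual script depends
        | on your workbook):
        | 
        |     # Build a tiny workbook, then total a numeric column and
        |     # append the sum below the last row.
        |     import openpyxl
        | 
        |     wb = openpyxl.Workbook()
        |     ws = wb.active
        |     ws.append(["month", "sales"])
        |     for row in [("Jan", 120), ("Feb", 95), ("Mar", 143)]:
        |         ws.append(row)
        | 
        |     # ws["B"] is the whole column; skip the header cell.
        |     total = sum(c.value for c in ws["B"][1:])
        |     ws.cell(row=ws.max_row + 1, column=2, value=total)
        | 
        |     wb.save("report.xlsx")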
        
       | Ecoste wrote:
        | I probably should know, but I don't have even the vaguest
        | idea: what do consultants actually do?
        
         | paulsutter wrote:
         | Bill. Sometimes hourly, sometimes by project.
         | 
         | They wake up every morning hell bent on maintaining existing
         | customer accounts, growing existing accounts, and sometimes
         | adding a new account. They maintain a professional demeanor,
         | with just the right kind of smalltalk. Finally, they produce a
          | report that justifies whatever decision the CEO had already
          | made before he engaged the consultants.
          | 
          | That's pretty much the whole job.
        
           | sockaddr wrote:
            | Yeah, I've struggled to understand what people mean by
            | "consultant". I once heard that it's basically a person
            | or group you can hire to take the blame for an unpopular
            | decision, and/or to weaponize it against others in the
            | company, since it was "someone else" that made the call.
        
         | tokai wrote:
          | Provides you with a neat document stating that what you
          | wanted to do is the right thing to do.
        
         | fhd2 wrote:
          | I'm a consultant - for about a year now. Not at BCG or
          | anywhere like that; I'm a one-man company.
          | 
          | What I do is pretty similar to what I did before in my CTO
          | roles, just a bit more strategy and less execution. Clients
          | come to me with questions and problems, and I help them
          | solve them based on my experience. If you've got multiple C
          | levels, middle management and a board, you're in a similar
          | consulting position even as a full-time CTO, from what I've
          | seen.
         | 
          | I don't go anywhere _near_ LLMs for this; I'd find it
          | somewhere between disrespectful and unethical. They're
          | paying for _my_ expertise. I can imagine they could make a
          | decent alternative to Google and Wikipedia for research
          | purposes, but I'd have to double-check it all anyway. I
          | don't see how they'd make my work any easier.
        
           | danielbln wrote:
           | > I don't go anywhere _near_ search engines for this, I'd
           | find it somewhere between disrespectful and unethical.
           | They're paying for _my_ expertise.
           | 
            | I bet someone said this 30 years ago. As it stands today,
            | LLMs can't do the job for you wholesale; they merely
            | augment it. And they are finicky and fallible, so
            | expertise is needed to actually produce useful results
            | and validate them. There are many reasons why an LLM
            | isn't the right tool for a certain task; wanting to take
            | some sort of moral high road, however, is IMO _not_ a
            | great reason.
            | 
            | Continue down that road, and people who have no qualms
            | about using whatever tools are at their disposal will eat
            | your lunch. Maybe not today or tomorrow, but eventually.
        
             | fhd2 wrote:
             | The value of advice is largely estimated based on who gives
             | it. If a client is hiring me, it seems clear to me that
             | they're paying for mine.
             | 
             | For support tasks (like research or even supporting a
             | writeup) I'm not ethically opposed to using LLMs. It's just
             | that I try every couple of months and conclude it's not a
             | net improvement. Your mileage may vary of course.
        
         | czbond wrote:
          | You're going to get across-the-board replies here. To each
          | constituent, they do different things. There are
          | consultants across the board in speciality, quality, etc.
          | 
          | On HN, they get a bad rap from the "I'm smart, I know
          | everything" worker-bee crowd who think business is just
          | slinging JS code and product updates. Consultants are often
          | brought into those people's orgs to deal with, or work
          | around, valuable know-it-alls, fiefdoms, poor vision /
          | strategy, or politics that are holding the company back.
          | They're often cited as "just doing reports", when in fact
          | consultants sometimes stay out of implementation to make
          | sure the owning team owns & implements the solution.
          | 
          | Consultants often provide the hiring manager a shield on
          | dicey projects or risky outcomes, where if it goes wrong,
          | the manager can say "it was those consultants".
          | 
          | Teams are brought in as trained, skilled, up-to-date staff
          | that the company cannot hire or fire, or shouldn't, given
          | the duration of need or the skillset.
         | 
         | Sometimes they're brought in because the politics of the
         | internal company lead to stalemates, poor strategy, etc.
         | 
          | Often, at the higher levels, they're brought in because
          | they focus on a speciality, market, or vertical, and so
          | have broad experience that isn't possible to get in house.
         | 
          | One's experience with consultants frames one's opinion.
          | I've only worked with very high-quality teams in the past,
          | teams that provide a healthy balance of vision, strategy,
          | implementation, etc.
        
         | warner25 wrote:
         | I've mostly dealt with RAND consultants in the military /
         | government. They were brought in to help answer specific
         | questions about what we should do and how, presumably by
         | writing up and delivering a report. These weren't questions
         | that the staff couldn't handle itself, but nobody on staff had
         | the time to focus on answering these questions (by digging deep
         | and doing research, not just expressing a gut opinion) given
         | their other duties and responsibilities. So the RAND people
         | basically augmented the staff.
         | 
         | I guess they have the advantage of experience doing these
         | things over and over again for similar organizations. It's also
         | an outside perspective, which has both advantages and
         | disadvantages. In the conversations that I was invited into, we
         | spent most of our time just explaining things that any mid-
         | level member of the organization would know, trying to get the
         | RAND people up-to-speed, so that seemed wasteful. But the
         | military is a huge bureaucracy dominated by people who've
         | climbed up the ranks for 30-40 years, so there isn't a lot of
         | fresh outside thinking, and it seems like it could be valuable
         | to inject some.
        
         | paxys wrote:
         | In simplest terms, they solve problems for companies. If your
         | company wants to achieve something and doesn't want to spend
         | the time/effort/money to set up a new department and staff it
         | with full-time experts, you can call a consulting firm and they
         | will figure it out for you.
        
         | sgt101 wrote:
         | Don't worry, it's quite common for people who are busy and
         | important like you not to know this. In our experience that's
         | because the people that work for them are simply not using
         | modern working practices.
         | 
         | It is a problem though, because frankly, although you might
         | think that moving to a more modern way of working is a simple
         | fix, we have found that your employees are actually your
         | enemies and are plotting to kill you and your family. You might
         | think that you depend on them to get work done, but we can fix
         | that for you and get our much more skilled and low paid workers
         | to do the job instead.
         | 
          | You will find out that if you allow us to take the required
          | steps to protect your family from rape and murder, your
          | shareholders will get a bump as well! And of course you
          | will too; you can expect your bonus to at least triple, and
          | you are worth it for sure.
         | 
         | We have already set up the culling stations filled with
         | whirling blades to deal with the scum downstairs, if you say
         | nothing at all we will act and slaughter them instantly. There
         | is no guilt, only reward.
         | 
         | We love you.
        
       | mediaman wrote:
        | This study had management consultants use GPT-4 on a variety
        | of different management-consulting tasks.
       | 
       | Some were more creative in nature, and some were more analytical.
       | For example, ideas on marketing sneakers versus an analysis of
       | store performance across a retailer's portfolio.
       | 
        | In general they found that GPT-4 was helpful for creative
        | tasks, but didn't help much (and in fact reduced quality) for
        | analytical questions.
       | 
       | I think these kinds of studies are of limited use. I don't
        | believe raw GPT-4 is that helpful in the enterprise. Whether it is
       | useful or not comes down to whether engineers can harness it
       | within a pipeline of content and tools.
       | 
       | For example, when engineers create a system to summarize issues a
       | customer has had from the CRM, that can help a customer service
       | person be more informed. Structured semantic search on a
       | knowledge base can help them also find the right solution to
       | common customer problems.
       | 
        | McKinsey made a retrieval-augmented generation (RAG) system that
       | searched all their research reports and created summaries of
       | content they could use, so a consultant could quickly find prior
       | work that would be relevant on a client project. If they built
       | that correctly, I imagine that is pretty useful.
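        | 
        | To make the retrieval piece concrete, here's a toy sketch of
        | how such a system finds relevant prior work (the hash-based
        | "embedding" and the sample reports are stand-ins; a real
        | pipeline would use a proper embedding model and a vector
        | store):
        | 
        |     # Toy retrieval-augmented generation: embed documents,
        |     # rank by cosine similarity, stuff top hits into a prompt.
        |     import hashlib, math
        | 
        |     def embed(text, dim=64):
        |         v = [0.0] * dim
        |         for w in text.lower().split():
        |             h = int(hashlib.md5(w.encode()).hexdigest(), 16)
        |             v[h % dim] += 1.0
        |         n = math.sqrt(sum(x * x for x in v)) or 1.0
        |         return [x / n for x in v]
        | 
        |     def top_k(query, docs, k=2):
        |         q = embed(query)
        |         sim = lambda d: sum(a * b for a, b in zip(q, embed(d)))
        |         return sorted(docs, key=sim, reverse=True)[:k]
        | 
        |     reports = [
        |         "2022 sneaker launch: influencer-led campaign...",
        |         "2021 pricing study: discounts eroded margins...",
        |         "2020 store footprint analysis: closures helped...",
        |     ]
        |     q = "prior work on marketing sneakers?"
        |     ctx = "\n".join(top_k(q, reports))
        |     prompt = f"Using only these excerpts:\n{ctx}\n\nQ: {q}"
        |     # `prompt` then goes to the LLM; that call is elided here.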
       | 
        | GPT-4 alone, especially, will not be that useful for
        | analytical work. However, developers can make it a
        | semi-capable analyst for some use cases by connecting it to
        | data lakes, describing schemas, and giving it tools to do
        | analysis. Usually this is not a generalist solution, and it
        | needs to be built for each company's application.
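        | 
        | A minimal sketch of the "describing schemas" part: pull the
        | warehouse's own DDL into the system prompt (sqlite3 stands in
        | for a real data lake here, and the model call itself is
        | elided):
        | 
        |     # Ground the model in the real schema so the SQL it
        |     # writes refers to tables and columns that actually exist.
        |     import sqlite3
        | 
        |     con = sqlite3.connect(":memory:")
        |     con.executescript("""
        |         CREATE TABLE stores (id INTEGER, region TEXT);
        |         CREATE TABLE sales (store_id INTEGER, day TEXT,
        |                             revenue REAL);
        |     """)
        | 
        |     ddl = "\n".join(r[0] for r in con.execute(
        |         "SELECT sql FROM sqlite_master WHERE sql IS NOT NULL"))
        | 
        |     system_prompt = ("You translate questions into SQLite SQL."
        |                      "\nSchema:\n" + ddl +
        |                      "\nReturn one SELECT statement only.")
        |     # Send (system_prompt, user_question) to the model, run
        |     # the returned SQL against con, and hand the rows back.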
       | 
        | Many of the studies so far only look at vanilla GPT-4 via
        | ChatGPT, and it seems unlikely that, if LLMs do transform the
        | workplace, a standalone ChatGPT is what it will look like.
        
         | ebiester wrote:
          | The first generation of company-focused LLMs will need to
          | be bespoke. Teams involved with this will learn the
          | necessary abstractions and will build new companies to
          | lower the on-ramp, or join the LLM providers to do the
          | same.
        
         | Fin_Code wrote:
          | This is entirely unsurprising. A language generator is not
          | designed for analytical work. The fact that it codes well
          | is, I think, what is throwing people. Since code is a form
          | of language, it works well for the language model. The
          | issue with code will come in larger contexts where
          | analytical thought is needed; there it will break down.
        
           | svachalek wrote:
            | People are trying really hard to make "search" or
            | something like it the killer app for LLMs, but from what
            | I've seen, it's translation. Coding in its purest form is
            | a kind of translation, from natural-language requirements
            | to more precise computer languages and vice versa. Kept
            | properly focused on that task, an LLM can do pretty
            | astonishing things. But as you say, assuming it can do
            | anything more than that is a trap.
        
             | chewxy wrote:
             | > Coding in its purest form is a kind of translation, from
             | natural language requirements to more precise computer
             | languages
             | 
              | Unfortunately, not so true. A simple way to check this
              | is to look at what the brain is doing. Coding barely
              | triggers the language components of the brain (your
              | usual suspects like Wernicke's and Broca's areas, plus
              | other miscellaneous things). In fact, what you'll find
              | is that coding triggers the IP1, IP2 and AVI pathways,
              | which are part of the multiple-demand network. We call
              | them programming "languages" because we can represent
              | them in a token-based game we call "language", but the
              | act of programming is far from mere language
              | manipulation.
        
         | roveo wrote:
         | > GPT4 alone will especially not be that useful for analytical
         | work. However, developers can make it a semi-capable analyst
         | for some use cases by connecting it to data lakes, describing
         | schemas, and giving it tools to do analysis. Usually this is
         | not a generalist solution, and needs to be built for each
         | company's application.
         | 
          | I've been thinking about this a lot lately. I'm a data
          | analyst, and all these "give GPT your data warehouse
          | schema, let it generate SQL to answer users' queries"
          | products completely miss the point. An analyst has value as
          | a curator of organizational knowledge, not as a translator
          | from "business" to SQL. Things like knowing that when a
          | product manager asks us for revenue/GMV, we exclude
          | canceled orders, but include purchases made with bonus
          | currency or promo codes.
          | 
          | Things like this are not documented; they are decided
          | during meetings and in Slack chats. So my idea is that in
          | order to make LLMs truly useful, we'll create "hybrid"
          | programming languages that are half-written by humans,
          | where the part written by the LLM when translating from
          | "human" language is simple enough for it to do reliably. I
          | even made some weekend prototypes with pretty interesting
          | results.
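          | 
          | To sketch the shape of the idea (illustrative only, not one
          | of my actual prototypes; llm_pick is a hypothetical
          | stand-in for a model call):
          | 
          |     # Humans curate the business logic; the LLM only maps a
          |     # question to a metric name, which is simple enough for
          |     # it to do reliably.
          |     METRICS = {
          |         # Encodes decisions from meetings and Slack:
          |         # cancellations out, bonus/promo purchases in.
          |         "revenue": (
          |             "SELECT SUM(amount) FROM orders "
          |             "WHERE status != 'canceled' "
          |             "AND pay_type IN ('card', 'bonus', 'promo')"
          |         ),
          |     }
          | 
          |     def llm_pick(question):
          |         # Stand-in: a real model call returns one name.
          |         q = question.lower()
          |         return ("revenue" if "revenue" in q or "gmv" in q
          |                 else "unknown")
          | 
          |     def to_sql(question):
          |         # The SQL that actually runs is human-written.
          |         return METRICS.get(llm_pick(question),
          |                            "-- ask an analyst")
          | 
          |     print(to_sql("What was our GMV last month?"))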
        
         | jampekka wrote:
         | > I think these kinds of studies are of limited use. I don't
         | believe raw GPT4 is that helpful in the enterprise.
         | 
          | I'm not so sure about this. A lot of the work in and for
          | especially large organizations is writing text that often
          | doesn't even get read, or is at most glanced at. LLMs are
          | great at writing useless text.
        
           | bitzun wrote:
           | At first I thought this was a flippant joke, but I can
           | actually see the value in saving time writing junk without
           | having to alter existing bureaucratic processes from which
           | nobody gets much value.
        
             | jampekka wrote:
              | I am using ChatGPT to write the generic boilerplate
              | that e.g. many funding applications require: ethics
              | statements, inclusivity policies, etc., where the
              | contents are largely predefined and known by everybody
              | but still have to be written in prose instead of e.g.
              | ticking boxes.
        
           | jprete wrote:
           | That kind of text is valuable as a signaling exercise - the
           | author is willing to claim the text as indicating their
           | actual beliefs. Writing it with AI would actually damage the
           | author's credibility since it's no longer their words -
           | although this shouldn't be a problem if the author is
           | carefully checking the text every single time to make sure it
           | represents their position.
           | 
           | The effect would be even worse the first time the author
           | missed a mistake from the AI and had to walk it back
           | publicly. Nobody's going to trust anything they put in
           | writing for a while after that.
        
             | jampekka wrote:
             | There are genres of text that have little to do with the
             | writer's beliefs and are a kind of formality with contents
             | known by all interested parties beforehand. Sort of
             | elaborate prose versions of "I Agree With the Terms and
             | Conditions" textboxes.
        
           | jart wrote:
           | Isn't most of science like that too?
        
             | jampekka wrote:
             | There is a lot of boilerplate and fluff in scientific
             | papers that everybody knows is fluff.
        
       | totoglazer wrote:
       | Note this paper was published in September 2023.
        
       | clbrmbr wrote:
       | Can someone explain Centaurs vs Cyborgs? (I didn't make an
       | account to read the full paper...)
        
       | araes wrote:
       | The actual paper can be found at: https://www.iab.cl/wp-
       | content/uploads/2023/11/SSRN-id4573321...
       | 
        | Note: After reading the paper, I think there are serious
        | methodology issues. ChatGPT without any human involvement
        | (simply reading the task as instructions and producing an
        | answer) produced a better "quality" result than any human
        | involvement did, on many tasks.
       | 
       | My read: ChatGPT produces what we pre-view as "correct".
       | 
       | They even kind of state this: "when human subjects use ChatGPT
       | there is a reduction in the variation in the eventual ideas they
        | produce. This result is perhaps surprising; one would assume that
       | ChatGPT, with its expansive knowledge base, would instead be able
       | to produce many very distinct ideas, compared to human subjects
       | alone."
       | 
        | My read: The humans converge towards what is already viewed as
        | "correct". It's like being in a business where your boss
        | already knows exactly what they want, and any variation you
        | produce is automatically "bad" anyway.
       | 
       | There is also a drastic difference in the treatment of the
       | in/out/on "frontier" subject. Lots of talk about how great adding
       | AI is to "inside frontier" tasks, no similar graphs/charts for
       | "outside frontier" tasks. Finally, if these people are
       | consultants and business profs, they produce some crazily bad
       | charts. Figure 3 in the appendix is so difficult to read.
        
       | gandalfgeek wrote:
        | If you don't want to read the full thing, Mollick's Substack
        | has a more accessible summary of this paper:
        | 
        | https://www.oneusefulthing.org/p/centaurs-and-cyborgs-on-the...
        | 
        | There are two sides to this thing. The first author of this
        | paper has also written another paper, titled "Falling Asleep
        | at the Wheel: Human/AI Collaboration in a Field Experiment".
        | Abstract:
       | 
       | "As AI quality increases, humans have fewer incentives to exert
       | effort and remain attentive, allowing the AI to substitute,
       | rather than augment their performance... I found that subjects
       | with higher quality AI were less accurate in their assessments of
       | job applications than subjects with lower quality AI. On average,
       | recruiters receiving lower quality AI exerted more effort and
       | spent more time evaluating the resumes, and were less likely to
       | automatically select the AI-recommended candidate. The recruiters
       | collaborating with low-quality AI learned to interact better with
       | their assigned AI and improved their performance. Crucially,
       | these effects were driven by more experienced recruiters.
       | Overall, the results show that maximizing human/AI performance
       | may require lower quality AI, depending on the effort, learning,
       | and skillset of the humans involved."
       | 
       | https://static1.squarespace.com/static/604b23e38c22a96e9c788...
        
         | esafak wrote:
          | If you don't need a skill, you won't practice it, and it
          | will atrophy. This frees you to learn different things -
          | like how most software engineers today don't know computer
          | engineering (hardware) but can write more complex
          | applications.
        
           | SkyBelow wrote:
           | That is if the AI fully replaces the need for a skill. What
           | happens if it mostly replaces it, enough so the skill
           | atrophies, but not enough that the user never needs the
           | skill?
        
             | esafak wrote:
             | Good point. That sounds like a temporary product problem,
             | like a level 2-3 self-driving car, which still needs you to
             | pay attention.
             | 
             | I think you also need to consider whether you can afford
             | not to have the skill; what would happen if the AI were
             | taken away, or malfunctioned? Airplane pilots are an
             | example of this. In that case you simply have to learn the
             | skill as if the AI will not be there.
        
               | hn_throwaway_99 wrote:
               | > That sounds like a temporary product problem
               | 
               | The jury is still out on how "temporary" of a problem
               | that will be. Improvements have been made but nobody
               | really knows how to get an LLM, for example, to just say
               | "I don't know".
        
               | Retric wrote:
               | And this is before people start poisoning the well.
               | Google search worked better in a world without Google
               | search. I suspect many AI systems will run into similar
               | issues.
        
               | hn_throwaway_99 wrote:
               | > Google search worked better in a world without Google
               | search.
               | 
               | Oooh, I really like that. Very succinct and topical
               | statement of Goodhart's Law.
        
             | taway_6PplYu5 wrote:
             | Oh, you mean like self-driving cars?
        
           | vlovich123 wrote:
           | Depends on what you mean by computer engineering but they
           | should definitely have a good mental model of how computers
           | work. Otherwise your complex applications will have trouble
           | scaling. That can matter less in many cases but early
           | architecture mistakes can easily put you down an evolutionary
           | dead end forcing you to experience exponentially rising
           | maintenance costs to add new features / fix bugs or rewrite
           | the thing by someone who understands computer architecture
           | better (+ take advantage of a better understanding the
           | problem domain)
        
             | WJW wrote:
              | I see what you mean, but there's a huge gap between a
              | "good enough" mental model of how computers work and
              | the actual in-depth details that GP means and that have
              | been successfully abstracted away from us. For example,
              | the number of cases in which I have needed to know how
              | the transistors in RAM chips are laid out has been
              | extremely small. Nor have I ever had to worry about the
              | encoding scheme used by the transceivers for the
              | optical fibers that my TCP packets move across.
              | 
              | Obviously there are jobs out there that _do_ have to
              | worry about these things (mostly at companies like TSMC
              | and Cisco), but in practice I doubt regular engineers
              | at even the largest-scale software companies have to
              | worry about stuff like that.
        
           | hn_throwaway_99 wrote:
           | But that is the exact problem with AI - 99% of the time it's
           | good, so your skills atrophy, but then there's that one
           | percent of the time it decides the reflective side of that
           | 18-wheeler is the sun and drives you into it. That's very
           | different from, say, a compiler, which as a programmer I
           | completely trust to turn my source code into machine code
           | accurately.
        
             | tikhonj wrote:
             | > _completely trust to turn my source code into machine
             | code accurately_
             | 
             | Say hello to undefined behavior :)
             | 
             | Appropriately enough, much of the reason undefined behavior
             | is a consistent problem is exactly the same--it can do
             | something reasonable 99% of the time and then totally screw
             | you over out of nowhere.
        
               | Jensson wrote:
                | Undefined behavior is a defined thing: you know about
                | it, so you can avoid it if you want, or look up what
                | the compiler you use does in that case to see if it
                | is useful to you. There's no such thing for a
                | black-box AI model.
        
             | BeetleB wrote:
             | You're just mirroring the parent example. Prior to AI (or
             | even computers), this was a problem.
             | 
              | I used to be a more conventional engineering guy. We do
              | a ton of calculus to get our degree. At my first job,
              | you just didn't need calculus (although all the
              | associated courses in school used it to develop the
              | theory). As a result, for the one-offs (less than once
              | a year) when a problem did come up where calculus
              | knowledge was helpful, people couldn't do the task and
              | would go to that one guy (me) who still remembered
              | calculus.
              | 
              | BTW, I guarantee all the employees there got A's in
              | their calculus courses.
        
               | hn_throwaway_99 wrote:
               | > You're just mirroring the parent example. Prior to AI
               | (or even computers), this was a problem.
               | 
               | Not at all, and your argument isn't the same thing.
               | 
               | The problem with current AI, which _is_ very different
               | from past forms of automation, is that, essentially, its
               | failure mode is undefined. Past forms of automation, for
               | example, did not  "hallucinate" seemingly correct but
               | false answers. Even taking your own example, it was
                | obvious to your colleagues that _they didn't know how to
               | get the answer_. So they did the correct thing - they
               | went to someone who knew how to do it. What they did
               | _not_ do, and which is what most current AI solutions
               | will do, is just  "wing it" with a solution that is wrong
               | but looks correct to those without more expertise.
        
           | warner25 wrote:
            | This is a nice, tight statement of the big picture.
           | 
            | This isn't new - it didn't start when LLMs were born. In the
           | aviation community, and probably many other domains, there
           | has been a running debate for decades about the pros and cons
           | of introducing more automation into the cockpit and what
           | still should or shouldn't be taught and practiced in training
           | (to be able to effectively recognize and handle the edge
           | cases where things fail).
           | 
           | And in military aviation, this isn't just a question of
           | safety; it's about productivity just like it is for knowledge
           | workers. If an aircraft can fly itself most of the time, the
           | crew can do more tactical decision-making and use more
           | sensors and weapons. Instead of just flying their own
           | aircraft, the crew can also control a number of unmanned
           | aircraft with which they're teamed up.
        
         | digging wrote:
         | While this is great information, I wish more could be said
         | about Centaurs & Cyborgs, the title of the piece.
         | 
         | There's a brief example of a type of task that each approach
         | excels at, but I'd definitely like more. The crux of this piece
         | is that our human judgment of when/how to use AI tools is the
         | most important factor in work quality when we're on the edge of
         | the frontier. But there's no analysis of whether centaurs or
         | cyborgs did better at those outside-the-frontier tasks; it's
         | not clear why those categories are even mentioned since they
         | appear to have no relevance to the preceding research results.
         | And as the article mentions, the frontier is "invisible" (or at
         | least difficult to see); learning to detect tasks that are in,
         | on, or outside of the frontier seems like an immensely
         | important skill. (I also realize it may become completely
         | obsolete in <5 years as the frontier expands exponentially.)
         | 
          | I understand the goal of this research was not to find
          | these "edges" or to determine how we can improve our
          | judgment about when & how to use AI. But after reading
          | these strong results, that's definitely the only thing I'm
          | interested in. I hardly use AI at all in my work (web
          | development). It has been most useful in getting me unstuck
          | from a thorny, under-documented problem. Over a year after
          | ChatGPT's release, I still don't know _if_ modern LLMs can
          | actually be a force multiplier for my work, or if I'm
          | correctly judging that they're not appropriate for the
          | majority of my tasks. The latter seems increasingly
          | unlikely as capabilities advance.
        
         | dang wrote:
         | We've changed the URL from that to
         | https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4573321.
         | Thanks!
         | 
         | (In cases like this, it's nice to have both the readable
         | summary and the link to the paper.)
        
         | SlightlyLeftPad wrote:
          | I think this is a fascinating, yet unsurprising, finding.
          | We've known for a long time that automation leads to
          | skillset atrophy on the part of the user - perhaps most
          | famously in aviation, where some automated systems make
          | complicated procedures so easy that pilots effectively lose
          | the skills to do the proper procedure themselves.
          | 
          | The discussion I hope is happening is about intentional
          | capability limits on AI in critical industries/areas where
          | humans must absolutely and always be able to intervene. It
          | seems analogous to having a superpower without knowing how
          | to control it - a potentially deadly situation.
          | 
          | Then, perhaps longer term at the generational scale, will
          | humans cognitively devolve to a point where AI is making
          | all of our decisions for us? Essentially, we make ourselves
          | obsolete, and eventually we're reclassified as a
          | "regular/inferior" species of animal. Profit and return on
          | investment here cannot get in the way of handling these
          | details with great oversight and responsibility.
        
           | ethbr1 wrote:
            | That bears interestingly on only allowing full
            | self-driving cars on pre-approved, high-assurance routes.
            | 
            | It either needs to be 100% reliable, or it's bad.
            | 
            | Anywhere in the grey zone, you retard human performance
            | while still occasionally needing it.
        
       | rich_sasha wrote:
        | I have yet to come across a programming task where ChatGPT
        | saved me time.
        | 
        | For a start, most of my work is about using internal APIs.
        | But never mind. Sometimes I come across a generic programming
        | problem, and ChatGPT gets me to 80% of a solution.
        | 
        | Getting it from there to a 100% solution takes as long as
        | doing it from scratch.
        | 
        | Just my $0.02.
        
         | snapcaster wrote:
          | Have you considered that it may be a personal skill issue?
          | I'm not saying it is, but people with this opinion never
          | seem to consider that maybe they aren't using it properly
          | or don't get it.
        
           | elzbardico wrote:
           | The reverse could also be true.
           | 
            | Some people might not be able to figure out that the code
            | GPT produced is not good, because they lack the skill to
            | review it effectively.
        
           | rich_sasha wrote:
            | Well, maybe. But it seems to me this somewhat defeats the
            | purpose. If I tell ChatGPT to write me, say, a sorting
            | algorithm (which I'm sure it can do), but asking it
            | _wrongly_ gets me a deceptive-looking lemon of a
            | solution, that's a liability.
           | 
           | Conversely, could you share an example of a nontrivial,
           | practical programming problem ChatGPT can solve, but only if
           | you use it right?
        
             | mewpmewp2 wrote:
              | For coding, it's mostly Copilot that saves time.
              | ChatGPT helps in some cases as well, but not as
              | reliably or frequently.
        
               | throwaway4aday wrote:
               | Agree, Copilot is a huge speed boost for stubbing things
               | out, auto-generating the dead simple parts, and
               | refactoring. ChatGPT is awesome for rubberducking, pair
               | programming, brainstorming, and figuring out what that
               | thing that you kind of remember is called. We aren't at
               | the point where we can just say "robot, do my work" but
               | it's for sure good enough to take care of the boring
               | stuff and be a sounding board.
        
         | pphysch wrote:
         | ChatGPT/Bard is good for answering documentation or common
         | syntax questions. For actual programming, you should use a
         | proper AI "copilot" that actually integrates into the context
         | of your IDE, and therefore understands the syntax of your
         | internal APIs to some extent. You will save time copy-pasting
         | and get better results.
        
         | BeetleB wrote:
         | Where it saves me time is when using a framework I'm not used
         | to - it more often than not solves my problem (but not by a
         | large stretch - perhaps 60% of the time).
         | 
         | Even more common is getting shell/UNIX commands to do what I
         | want. I don't have all the standard UNIX tools (cut, etc) in my
          | head, and I _definitely_ don't keep all the command-line
         | options in my head. GPT4 tells me what I need, and it's much
         | quicker to confirm it via the --help or man pages than craft it
         | myself by reading those pages.
        
       | freedryk wrote:
        | The biggest question I have about this study is that it
        | doesn't seem like subjects had access to tools outside of the
        | AI. Were the subjects without AI able to do Google searches?
        | If not, what is the performance gain of the AI users over
        | people who can google stuff?
        
       | siliconc0w wrote:
        | There is a lot of noise from VC types about how AI copilots
        | turn random outsourced devs into 10x engineers. My experience
        | is that they're pretty good at writing boilerplate - that is,
        | they save me typing that I would have expended little thought
        | on anyway. I suspect this is where most of the productivity
        | statistics are coming from. Copilot especially seems prone to
        | hallucinating APIs or suggesting entire functions that are
        | wrong, repetitive, or even dangerous. This is the case even
        | for side projects where I'm working with public Python
        | libraries, which should be the best-case scenario. It's
        | generally better than IDE autocomplete when I'm already
        | familiar with the API and so can catch the errors.
        | 
        | I will say GPT-4 is mostly better than referencing Stack
        | Overflow or library documentation. Maybe once it gets
        | fast/cheap enough to use as a copilot we'll see some of these
        | mythical productivity gains.
        
         | smackeyacky wrote:
         | I don't think so. I think it will be more like a flare that
         | burns out.
         | 
          | The reason is that the source material for GPT-4 is rotting
          | underneath it. As the internet gets ruined by AI garbage,
          | the quality of AI output will get worse, not better.
          | 
          | edit: This is a modern-day Tower of Babel event. We had a
          | magnificent resource, and then we polluted it with garbage,
          | and now it's quickly becoming useless.
        
           | throwaway4aday wrote:
           | This is wrong for at least two reasons:
           | 
            | 1) The data scraped from the web is filtered, deduped,
            | ranked, and cleaned in a variety of other ways before
            | being used for training. While quantity is necessary for
            | a model to learn the structure of language, quality is
            | even more important once you have a model that can
            | produce coherent output, so a lot of work goes into
            | grooming the data (toy sketch at the end of this
            | comment).
           | 
            | 2) There have been a bunch of papers and training runs
            | showing that synthetic data created specifically for
            | training a model is as good as or better than scraped,
            | human-produced data. The importance of scraped web
            | content is quickly declining because you can now generate
            | infinite higher-quality examples using existing trained
            | models. The only relevance it still has is for knowledge
            | about new developments, and that is a much easier stream
            | to filter, since most of the important stuff comes from
            | official sources and you don't need as many variations -
            | you can just generate your own using one or more
            | examples.
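            | 
            | For point 1, a toy sketch of that grooming (exact-hash
            | dedupe plus crude heuristics; real pipelines use fuzzy
            | dedup like MinHash and learned quality classifiers):
            | 
            |     # Drop duplicates, fragments, and repetitive spam
            |     # before text is used for training.
            |     import hashlib
            | 
            |     def clean(docs):
            |         seen, kept = set(), []
            |         for d in docs:
            |             key = hashlib.sha256(
            |                 d.strip().lower().encode()).hexdigest()
            |             if key in seen:
            |                 continue  # exact duplicate
            |             seen.add(key)
            |             words = d.split()
            |             if len(words) < 5:
            |                 continue  # fragment
            |             if len(set(words)) / len(words) < 0.3:
            |                 continue  # repetitive spam
            |             kept.append(d)
            |         return kept
            | 
            |     corpus = [
            |         "buy pills " * 10,
            |         "Retrieval ranks documents by similarity.",
            |         "Retrieval ranks documents by similarity.",
            |         "too short",
            |     ]
            |     print(clean(corpus))  # one document survives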
        
             | smackeyacky wrote:
             | How much time is going to be wasted cleaning the data? It
             | seems like AI can generate junk much faster than we could
             | possibly filter it.
        
               | spunker540 wrote:
               | Fortunately there are now AIs that can help with the data
               | cleaning task.
        
               | majewsky wrote:
               | But who monitors the monitor?
        
             | ethanwillis wrote:
             | Infinite?
        
       | RecycledEle wrote:
       | From the article: "consultants using AI were significantly more
       | productive (they completed 12.2% more tasks on average, and
        | completed tasks 25.1% more quickly), and produced significantly
       | higher quality results (more than 40% higher quality compared to
       | a control group)."
        
       | lgleason wrote:
       | I was underwhelmed when using it for code generation of things
       | like tests.
        
       ___________________________________________________________________
       (page generated 2024-01-16 23:01 UTC)