[HN Gopher] Measuring the impact of AI on experienced open-sourc...
___________________________________________________________________
Measuring the impact of AI on experienced open-source developer
productivity
Author : dheerajvs
Score : 442 points
Date : 2025-07-10 16:29 UTC (6 hours ago)
(HTM) web link (metr.org)
(TXT) w3m dump (metr.org)
| Jabrov wrote:
| Very interesting methodology, but the sample size (16) is way too
| low. Would love to see this repeated with more participants.
| IshKebab wrote:
| They paid the developers about $75k in total to do this so I
| wouldn't hold your breath!
| barbazoo wrote:
| That's a lot of money for many of us. Do you know those folks
| were in a HCOL area?
| IshKebab wrote:
| No idea. They don't say who they were; just random popular
| GitHub projects.
|
| To be clear it wasn't $75k _each_.
| narush wrote:
| You can see a list of repositories with participating
| developers in the appendix! Section G.7.
|
| Paper is here:
| https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf
| mapt wrote:
| It isn't a lot of money for industry research. Changes of
| ±40% in productivity are an enormous
| advantage/disadvantage for a large tech company moving tens
| of billions of dollars a year in cashflow through a
| pipeline that their software engineers built.
| lawlessone wrote:
| Neat, how to sign up??
| IshKebab wrote:
| Go back in time, create a popular github repo with lots of
| stars, be lucky.
| asdff wrote:
| I see these things posted on LinkedIn, usually asking
| $40/hr though. But it's essentially the same thing as the OP
| outlines: you do some domain-related task, assigned either
| with or without an AI tool. Check LinkedIn. They will have
| really vague titles like "data scientist", even
| though that's not what is being described; it's being a study
| subject. Maybe set $40/hr as a filter on LinkedIn and see if
| you can get a few to come up.
| narush wrote:
| Noting that most of our power comes from the number of tasks
| that developers complete; it's 246 total completed issues in
| the course of this study -- developers do about 15 issues (7.5
| with AI and 7.5 without AI) on average.
| biophysboy wrote:
| Did you compare the variance within individuals (due to
| treatment) to the variance between individuals (due to other
| stuff)?
| kokanee wrote:
| > developers expected AI to speed them up by 24%, and even after
| experiencing the slowdown, they still believed AI had sped them
| up by 20%.
|
| I feel like there are two challenges causing this. One is that
| it's difficult to get good data on how long the same person in
| the same context would have taken to do a task without AI vs
| with. The other is that it's tempting to time an AI with metrics
| like how long until the PR was opened or merged. But the AI
| workflow fundamentally shifts engineering hours so that a greater
| percentage of time is spent on refactoring, testing, and
| resolving issues later in the process, including after the code
| was initially approved and merged. I can see how it's easy for a
| developer to report that AI completed a task quickly because the
| PR was opened quickly, discounting the amount of future work that
| the PR created.
| qsort wrote:
| It's really hard to attribute productivity gains/losses to
| specific technologies or practices, I'm very wary of self-
| reported anecdotes in any direction precisely because it's so
| easy to fool ourselves.
|
| I'm not making any claim in either direction, the authors
| themselves recognize the study's limitations, I'm just trying
| to say that everyone should have far greater error bars. This
| technology is the weirdest shit I've seen in my lifetime,
| making deductions about productivity from anecdotes and dubious
| benchmarks is basically reading tea leaves.
| yorwba wrote:
| Figure 21 shows that initial implementation time (which I take
| to be time to PR) increased as well, although post-review time
| increased even more (but doesn't seem to have a significant
| impact on the total).
|
| But Figure 18 shows that time spent actively coding decreased
| (which might be where the feeling of a speed-up was coming
| from) and the gains were eaten up by time spent prompting,
| waiting for and then reviewing the AI output and generally
| being idle.
|
| So maybe it's not a good idea to use LLMs for tasks that you
| could've done yourself in under 5 minutes.
| narush wrote:
| Qualitatively, we don't see a drop in PR quality between the
| AI-allowed and AI-disallowed conditions in the study; the devs
| who participate are generally excellent, know their
| repositories' standards super well, and aren't really into the
| 'put up a bad PR' vibe -- the median review time on the PRs in
| the study is about a minute.
|
| Developers totally spend time differently, though -- this is
| a great callout! On page 10 of the paper [1], you can see a
| breakdown of how developers spend time when they have AI vs.
| not - in general, when these devs have AI, they spend a smaller
| % of time writing code, and a larger % of time working with AI
| (which... makes sense).
|
| [1]
| https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf
| gitremote wrote:
| > I feel like there are two challenges causing this. One is
| that it's difficult to get good data on how long the same
| person in the same context would have taken to do a task
| without AI vs with.
|
| The standard experimental design that solves this is to
| _randomly_ assign participants to the experiment group (with
| AI) and the control group (without AI), which is what they did.
| This isolates the variable (with or without AI), taking into
| account uncontrollable individual, context, and environmental
| differences. You don't need to know how the single individual
| and context would have behaved in the other group. With a large
| enough sample size and effect size, you can determine
| statistical significance and conclude that the with-or-without-
| AI variable was the only systematic difference.
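|
| To make that concrete, a toy sketch of the mechanics in Java
| (all numbers made up; a simple permutation test stands in for
| the study's actual analysis, which randomized at the issue
| level rather than per participant):
|
|     import java.util.ArrayList;
|     import java.util.Collections;
|     import java.util.List;
|     import java.util.Random;
|
|     public class RandomAssignmentDemo {
|         static double mean(List<Double> xs) {
|             return xs.stream().mapToDouble(Double::doubleValue)
|                     .average().orElse(0);
|         }
|
|         public static void main(String[] args) {
|             Random rng = new Random(42);
|             // Hypothetical completion times (hours) for 16
|             // comparable tasks.
|             List<Double> tasks = new ArrayList<>(List.of(
|                     3.1, 4.7, 2.2, 5.0, 3.9, 6.1, 2.8, 4.4,
|                     3.5, 5.3, 2.9, 4.0, 3.3, 5.8, 4.9, 3.7));
|
|             // Random assignment: shuffle, then split into
|             // the two conditions.
|             Collections.shuffle(tasks, rng);
|             double observed = mean(tasks.subList(0, 8))
|                     - mean(tasks.subList(8, 16));
|
|             // Permutation test: how often does chance alone
|             // produce a difference at least this large?
|             int extreme = 0, trials = 10_000;
|             for (int i = 0; i < trials; i++) {
|                 Collections.shuffle(tasks, rng);
|                 double d = mean(tasks.subList(0, 8))
|                         - mean(tasks.subList(8, 16));
|                 if (Math.abs(d) >= Math.abs(observed))
|                     extreme++;
|             }
|             System.out.printf("diff = %.2f h, p ~ %.3f%n",
|                     observed, (double) extreme / trials);
|         }
|     }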
| dash2 wrote:
| The authors say "High developer familiarity with repositories" is
| a likely reason for the surprising negative result, so I wonder
| if this generalizes beyond that.
| kennywinker wrote:
| Like if it generalizes to situations where the developer is not
| familiar with the repo? That doesn't seem like generalizing,
| that seems like specifying. Am I wrong in saying that the
| majority of developer time is spent in repos that they're
| familiar with? Every job and project I've worked has been on a
| fixed set of repos the entire time. If AI is only helpful for
| the first week or two on a project, that's not very many cases
| it's useful for.
| jbeninger wrote:
| I'd say I write the majority of my code in areas I'm familiar
| with, but spend the majority of my _time_ on sections I'm not
| familiar with, and AI helps a lot more with the latter than
| the former. I've always felt my coding life is speeding
| through a hundred lines of easy code then getting stuck on
| the 101st. Then as I get more experienced that hundred
| becomes 150, then 200, but always speeding through the easy
| part until I have to learn something new.
|
| So I never feel like I'm getting any faster. 90% of my time
| is still spent in frustration, even when I'm producing twice
| the code at higher quality
| add-sub-mul-div wrote:
| Without the familiarity would the work be getting done
| effectively? What does it mean for someone to commit AI code
| that they can't fully understand?
| noisy_boy wrote:
| It is 80/20 again - it gets you 80% of the way in 20% of the time
| and then you spend 80% of the time to get the rest of the 20%
| done. And since it always feels like it is almost there, sunk-
| cost fallacy comes into play as well and you just don't want to
| give up.
|
| An approach I tried recently is to use it as a friction
| remover instead of a solution provider. I do the
| programming but use it to remove pebbles such as that small bit
| of syntax I forgot, basically to keep up the velocity. However, I
| don't look at the wholesale code it offers. I think keeping the
| active thinking cap on results in code I actually understand
| while avoiding skill atrophy.
| wmeredith wrote:
| > and then you spend 80% of the time to get the rest of the 20%
| done
|
| This was my pre-AI experience anyway, so getting that first
| chunk of time back is helpful.
|
| Related: One of the better takes I've seen on AI from an
| experienced developer was, "90% of my skills just became
| worthless, and the other 10% just became 1,000 times more
| valuable." There's some hyperbole there, I but I like the gist.
| skydhash wrote:
| It's not funny when you find yourself redoing the first 80%,
| as the only way to complete the second 80%.
| bluefirebrand wrote:
| Let us know if that dev you're talking about winds up working
| 90% less for the same amount, or earning 1000x more
|
| Otherwise he can shut the fuck up about being 1000x more
| valuable imo
| emodendroket wrote:
| I think it's most useful when you basically need Stack Overflow
| on steroids: I basically know what I want to do but I'm not
| sure how to achieve it using this environment. It can also be
| helpful for debugging and rubber ducking generally.
| threetonesun wrote:
| Absolutely this. For a while I was working with a language I
| was only partially familiar with, and I'd say "here's how I
| would do this in [primary language], rewrite it in [new
| language]" and I'd get a decent piece of code back. A little
| searching in the project to make sure it was stylistically
| correct and then done.
| some-guy wrote:
| All those things are true, but it's such a small part of my
| workflow at this point that the savings, while nice, aren't
| nearly as life-changing to my job as my CEO is forcing us to
| think it is.
|
| Even once AI can actually untangle our 14-year-old codebase
| full of hodge-podge code, and read every commit message, JIRA
| ticket, and Slack conversation related to the changes in full
| context, it's still not going to solve a lot of the hard
| problems at my job.
| skydhash wrote:
| The issue is that it is slow and verbose, at least in its
| default configuration. The amount of reading is non-trivial.
| There's a reason most references are dense.
| lukan wrote:
| Those issues you can partly solve by changing the prompt to
| tell it to be concise and not explain its code.
|
| But nothing will make them stick to the one API version I
| use.
| diggan wrote:
| > But nothing will make them stick to the one API version
| I use.
|
| Models trained for tool use can do that. When I use Codex
| for some Rust stuff for example, it can grep from source
| files in the directory dependencies are stored, so
| looking up the current APIs is trivial for them. Same
| works for JavaScript and a bunch of other languages too,
| as long as it's accessible somewhere via the tools they
| have available.
| lukan wrote:
| Hm, I never tried Codex so far, but quite some other
| tools and models, and none could help me in a consistent
| way. But I am sceptical, because even if I tell them
| explicitly to only use one specific version, they might
| or might not use it, depending on their training corpus
| and temperature, I assume.
| malfist wrote:
| The less verbosity you allow, the dumber the LLM is. It
| thinks in tokens, and if you keep it from using tokens it's
| lobotomized.
| GuinansEyebrows wrote:
| > rubber ducking
|
| i don't mean to pick on your usage of this specifically, but
| i think it's noteworthy that the colloquial definition of
| "rubber ducking" seems to have expanded to include "using a
| software tool to generate advice/confirm hunches". I always
| understood the term to mean a personal process of talking
| through a problem out loud in order to methodically,
| explicitly understand a theoretical plan/process and expose
| gaps.
|
| based on a lot of articles/studies i've seen (admittedly
| haven't dug into them too deeply) it seems like the use of
| chatbots to perform this type of task actually has negative
| cognitive impacts on some groups of users - the opposite of
| the personal value i thought rubber-ducking was supposed to
| provide.
| eknkc wrote:
| It works great on adding stuff to an already established
| codebase. Things like "we have these search parameters, also
| add foo". Remove anything related to x...
| antonvs wrote:
| Exactly. If you can give it a contract and a context,
| essentially, and it doesn't need to write a large amount of
| code to fulfill it, it can be great.
|
| I just used it to write about 80 lines of new code like that,
| and there's no question it saves time.
| reverendsteveii wrote:
| well we used to have a sort of inverse pareto where 80% of the
| work took 80% of the effort and the remaining 20% of the work
| also took 80% of the effort.
|
| I do think you're onto something with getting pebbles out of
| the road inasmuch as once I know what I need to do AI coding
| makes the doing _much_ faster. Just yesterday I was playing
| around with removing things from a List object using the Java
| streams API and I kept running into
| ConcurrentModificationExceptions, which happen when the list
| is structurally modified while it's being iterated (not
| necessarily by another thread). I spent about an hour trying to
| write a method that deep copies the list, makes the change and
| then returns the copy and running into all sorts of problems
| til I asked AI to build me a thread-safe list mutation method
| and it was like "Sure, this is how I'd do it but also the API
| you're working with already has a method that just....does
| this." Cases like this are where AI is supremely useful -
| intricate but well-defined problems.
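|
| For the curious, a minimal sketch of the shape of it (assuming
| a recent JDK; names made up, not the actual code from that
| day):
|
|     import java.util.ArrayList;
|     import java.util.List;
|
|     public class RemoveDemo {
|         public static void main(String[] args) {
|             List<Integer> nums =
|                     new ArrayList<>(List.of(1, 2, 3, 4, 5, 6));
|
|             // Mutating the list while iterating it throws
|             // ConcurrentModificationException:
|             // for (Integer n : nums)
|             //     if (n % 2 == 0) nums.remove(n);
|
|             // The method the API already has: removeIf
|             // mutates the list safely in place.
|             nums.removeIf(n -> n % 2 == 0);
|
|             // Stream alternative: build a new list instead
|             // of mutating the old one.
|             List<Integer> odds = nums.stream()
|                     .filter(n -> n % 2 != 0).toList();
|
|             System.out.println(nums); // [1, 3, 5]
|             System.out.println(odds); // [1, 3, 5]
|         }
|     }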
| cwmoore wrote:
| Code reuse at scale: 80 + 80 = 160% ~ phi...coincidence?
|
| I think this may become a long horizon harvest for the
| rigorous OOP strategy, may Bill Joy be disproved.
|
| Gray goo may not [taste] like steel-cut oatmeal.
| Sharlin wrote:
| It's often said that _π_ is the factor by which one should
| multiply all estimates - reducing it to _φ_ would be a
| significant improvement in estimation accuracy!
| visarga wrote:
| 1.6x multiplier is low, we usually need to apply 5x
| 01100011 wrote:
| As an old dev this is really all I want: a sort of autocorrect
| for my syntactical errors to save me a couple compile-edit
| cycles.
| pferde wrote:
| What I want is not autocorrect, because that won't teach me
| anything. I want it to yell at me loudly and point to the
| syntactical error.
|
| Autocorrect is a scourge of humanity.
| causal wrote:
| Agreed and +1 on "always feels like it is almost there" leading
| to time sink. AI is especially good at making you feel like
| it's doing something useful; it takes a lot of skill to discern
| the truth.
| i_love_retros wrote:
| The problem is I then have to also figure out the code it wrote
| to be able to complete the final 20%. I have no momentum and am
| starting from almost scratch mentally.
| fritzo wrote:
| As an open source maintainer on the brink of tech debt
| bankruptcy, I feel like AI is a savior, helping me keep up with
| rapid changes to dependencies, build systems, release
| methodology, and idioms.
| aerhardt wrote:
| But what about producing actual code?
| fritzo wrote:
| Producing code is overrated. There's lots of old code whose
| lifetime we can extend.
| fhd2 wrote:
| Very, very much this.
| resource_waste wrote:
| I find it useful for simple algorithms and error solving.
| candiddevmike wrote:
| If you stewarded that much tech debt in the first place, how
| can you be sure LLM will help prevent it going forward? In my
| experience, LLMs add more tech debt due to a lack of cohesion
| in their edits.
| IshKebab wrote:
| I wonder if the discrepancy is that it felt like it was taking
| less time because they were having to do less thinking, which
| feels easier and hence faster.
|
| Even so... I still would be really surprised if there wasn't some
| systematic error here skewing the results, like the developers
| deliberately picked "easy" tasks that they already knew how to
| do, so implementing them themselves was particularly fast.
|
| Seems like the authors had about as good a methodology as you can
| get for something like this. It's just really hard to test stuff
| like this. I've seen studies proving that code comments don't
| matter for example... are you going to stop writing comments? No.
| narush wrote:
| > which feels easier and hence faster.
|
| We explore this factor in section (C.2.5) - "Trading speed for
| ease" - in the paper [1]. It's labeled as a factor with an
| unclear effect; some developers seem to think so, and others
| don't!
|
| > like the developers deliberately picked "easy" tasks that
| they already knew how to do
|
| We explore this factor in (C.2.2) - "Unrepresentative task
| distribution." I think the effect here is unclear; these are
| certainly real tasks, but they are sampled from the smaller end
| of tasks developers would work on. I think the relative effect
| on AI vs. human performance is not super clear...
|
| [1]
| https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf
| IshKebab wrote:
| Sounds like you've thought of everything!
| tcdent wrote:
| This study neglects to incorporate the fact that I have forgotten
| how to write code.
| resource_waste wrote:
| I'm curious what space people are working in where AI does
| their job entirely.
|
| I can use it for parts of code, algorithms, error solving, and
| maybe sometimes a 'first draft'.
|
| But there is no way I could finish an entire piece of software
| with AI only.
| asdff wrote:
| Not a lot of people are empowered to create an entire piece
| of software. Most are probably in the trenches squashing
| tickets.
| tcdent wrote:
| I do create entire pieces of software, and while my
| workflow is always evolving, it goes something like this:
|
| Define schemas, interfaces, and perhaps some base classes
| that define the attributes I'm thinking about.
|
| Research libraries that support my cause, and include them.
|
| Reference patterns I have established in other parts of the
| codebase; internal tooling for database, HTTP services,
| etc.
|
| Instruct the agent to come up with a plan for a first pass
| at execution in markdown format. Iterate on this plan;
| "what about X?"
|
| Splat a bunch of code down that supports the structure I'm
| looking for. Iterate. Cleanup. Iterate. Implement unit
| tests and get them to pass.
|
| Go back through everything manually and adjust it to suit
| my personal style, while at the same time fully
| understanding what's being done and why.
|
| I use STT a lot to have conversations with the agent as we
| go, and very rarely allow it to make sequential edits
| without reviewing first; this is a great opportunity to go
| back and forth and refine what's being written.
| asdff wrote:
| You are going well above and beyond what a lot of people
| do to be fair. There are people in senior roles who are
| just futzing with json files.
| joks wrote:
| I think the question still stands.
| narush wrote:
| Honestly, this is a fair point -- and speaks to the difficulty
| of figuring out the right baseline to measure against here!
|
| If we studied folks with _no_ AI experience, then we might
| underestimate speedup, as these folks are learning tools (see a
| discussion of learning effects in section (C.2.7) - Below-
| average use of AI tools - in the paper). If we studied folks
| with _only_ AI experience, then we might overestimate speedup,
| as perhaps these folks can't really program without AI at all.
|
| In some sense, these are just two separate and interesting
| questions - I'm excited for future work to really dig in on
| both!
| NewsaHackO wrote:
| So they paid developers 300 x 246 = about 73K just for developer
| recruitment for the study, which is not in any academic journal
| and has no peer reviews? The underlying paper _looks_ quite
| polished and not overtly AI-generated, so I don't want to say
| it's entirely made up, but how were they even able to get
| funding for this?
| iLoveOncall wrote:
| https://metr.org/about Seems like they get paid by AI
| companies, and they also get government funding.
| narush wrote:
| Our largest funding was through The Audacious Project -- you
| can see an announcement here:
| https://metr.org/blog/2024-10-09-new-support-through-the-aud...
|
| Per our website, "To date, April 2025, we have not accepted
| compensation from AI companies for the evaluations we have
| conducted." You can check out the footnote on this page:
| https://metr.org/donate
| iLoveOncall wrote:
| This is really disingenuous when you also say that OpenAI and
| Anthropic have provided you with access and compute credits
| (on https://metr.org/about).
|
| Not all payment is cash. Compute credits is still by all
| means compensation.
| gtsop wrote:
| Are you willing to be compensated with compute credits for
| your job?
|
| Such companies spit out "credits" all over the place in
| order to gain traction and establish themselves. I
| remember when cloud providers gave VPS credits to startups
| like they were peanuts. To me, it really means absolutely
| nothing.
| bawolff wrote:
| I wouldn't do my job for $10, but if somehow someone did
| pay me $10 to do something, i wouldn't claim i wasn't
| compensated.
|
| In-kind compensation is still compensation.
| iLoveOncall wrote:
| > Are you willing to be compensated with compute credits
| for your job?
|
| Well, yes? I use compute for some personal projects so I
| would be absolutely fine if a part of my compensation was
| in compute credits.
|
| As a company, even more so.
| dolebirchwood wrote:
| Is it "really" disingenuous, or is it just a
| misinterpretation of what it means to be "compensated for"?
| Seems more like quibbling to me.
| iLoveOncall wrote:
| I was actually being kind by saying it's disingenuous. I
| think it's an outright lie.
| golly_ned wrote:
| Those are compute credits that are directly spent on the
| experiment itself. It's no more "compensation" than a
| chemistry researcher being "compensated" with test tubes.
| iLoveOncall wrote:
| > Those are compute credits that are directly spent on
| the experiment itself.
|
| You're extrapolating; it doesn't say that anywhere.
|
| > It's no more "compensation" than a chemistry researcher
| being "compensated" with test tubes.
|
| Yes, that's compensation too. Thanks for contributing
| another example. Here's another one: it's no more
| compensation than a software engineer being compensated
| with a new computer.
|
| Actually the situation here is way worse than your
| example. Unless the chemistry researcher is commissioned
| by Big Test Tube Corp. to conduct research on the outcome
| of using their test tubes, there's no conflict of
| interest here. But there is an obvious conflict of
| interest in AI research being financed by credits given
| by AI companies to use their own AI tools.
| bee_rider wrote:
| Companies produce whitepapers all the time, right? They are
| typically some combination of technical report, policy
| suggestion, and advertisement for the organization.
| fabianhjr wrote:
| Most of the world provides funding for research; the US used to
| provide funding, but now that has been mostly gutted.
| resource_waste wrote:
| >which is not in any academic journal, or has no peer reviews?
|
| As a philosopher who is into epistemology and ontology, I find
| this to be as abhorrent as religion.
|
| With 'science', it doesn't matter who publishes it. Science
| needs to be replicated.
|
| The psychology replication crisis is a prime example of why
| peer reviews and publishing in a journal matters 0.
| bee_rider wrote:
| > The psychology replication crisis is a prime example of why
| peer reviews and publishing in a journal matters 0.
|
| Specifically, it works as an example of a specific case where
| peer review doesn't help as much. Peer review checks your
| arguments, not your data collection process (which the
| reviewer can't audit for obvious reasons). It works fine in
| other scenarios.
|
| Peer review is unrelated to replication problems, except to
| the extent to which confused people expect peer review to fix
| totally unrelated replication problems.
| raincole wrote:
| Peer reviews are very important to filter out obviously low
| effort stuff.
|
| ...Or should I say "were" very important? With the help of
| today's GenAI, any low-effort stuff can look high-effort
| without much extra effort.
| 30minAdayHN wrote:
| This study focused on experienced OSS maintainers. Here is my
| personal experience, but from a very different persona (the
| opposite of the one in the study). I always wanted to
| contribute to OSS but
| never had time to. Finally was able to do that, thanks to AI.
| Last month, I was able to contribute to 4 different repositories
| which I would never have dreamed of doing. I was using an
| async coding agent I built [1] to generate PRs given a GitHub
| issue. Some PRs took a lot of back and forth, and some PRs were
| accepted as-is. Without AI, there is no way I would have
| contributed to those repositories.
|
| One thing that did work in my favor is that I was clearly
| creating a failing repro test case, and adding before and after
| results along with the PR. That helped get the PR landed.
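|
| A minimal sketch of the shape (JUnit 5; the issue number,
| PathUtil, and the bug itself are all hypothetical):
|
|     import static org.junit.jupiter.api.Assertions.assertEquals;
|
|     import org.junit.jupiter.api.Test;
|
|     // Stand-in for the code under test (hypothetical).
|     class PathUtil {
|         static String normalizePath(String p) {
|             return p.endsWith("/") && p.length() > 1
|                     ? p.substring(0, p.length() - 1) : p;
|         }
|     }
|
|     class Issue1234ReproTest {
|         // Fails on the "before" commit, passes "after" the
|         // fix, which makes the PR easy to evaluate.
|         @Test
|         void trailingSlashIsNormalized() {
|             assertEquals("/api/users",
|                     PathUtil.normalizePath("/api/users/"));
|         }
|     }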
|
| There are also a few PRs that never got accepted because the
| repro is not as strong or clear.
|
| [1] https://workback.ai
| MYEUHD wrote:
| This does not mention the open-source developer time wasted while
| reviewing vibe-coded PRs.
| narush wrote:
| Yeah, I'll note that this study does _not_ capture the entire
| OS dev workflow -- you're totally right that reviewing PRs is a
| big portion of the time that many maintainers spend on their
| projects (and thanks to them for doing this [often hard] work).
| In the paper [1], we explore this factor in more detail -- see
| section (C.2.2) - Unrepresentative task distribution.
|
| There's some existing lit about increased contributions to OS
| repositories after the introduction of AI -- I've also
| personally heard a few anecdotes about an increase in the
| number of low-quality PRs from first-time contributors,
| seemingly as a result of AI making it easier to get started --
| ofc, the tradeoff is that making it easier to get started has
| pros to it too!
|
| [1]
| https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf
| castratikron wrote:
| I really like those graphics, does anyone know what tool was used
| to create them?
| narush wrote:
| The graphs are all matplotlib. The methodology figure is built
| in Figma! (Source: I'm a paper author :)).
| narush wrote:
| Hey HN, study author here. I'm a long-time HN user -- and I'll be
| in the comments today to answer questions/comments when possible!
|
| If you're short on time, I'd recommend just reading the linked
| blogpost or the announcement thread here [1], rather than the
| full paper.
|
| [1] https://x.com/METR_Evals/status/1943360399220388093
| jsnider3 wrote:
| It's good to know that Claude 3.7 isn't enough to build Skynet!
| causal wrote:
| Hey I just wanted to say this is one of the better studies I've
| seen - not clickbaity, very forthright about what is being
| claimed, and presented in such an easy-to-digest format. Thanks
| so much for doing this.
| narush wrote:
| Thanks for the kind words!
| igorkraw wrote:
| Could you either release the dataset (raw but anonymized) for
| independent statistical evaluation or at least add the absolute
| times of each dev per task to the paper? I'm curious what the
| absolute times of each dev with/without AI were, and whether
| the one guy with lots of Cursor experience was actually faster
| than the rest or just a slow typist getting a big boost out of
| LLMs.
|
| Also, cool work, very happy to see actually good evaluations
| instead of just vibes or observational studies that don't
| account for the Hawthorne effect.
| narush wrote:
| Yep, sorry, meant to post this somewhere but forgot in final-
| paper-polishing-sprint yesterday!
|
| We'll be releasing anonymized data and some basic analysis
| code to replicate core results within the next few weeks
| (probably next week, depending).
|
| Our GitHub is here (http://github.com/METR/) -- or you can
| follow us (https://x.com/metr_evals) and we'll probably tweet
| about it.
| igorkraw wrote:
| Cool, thanks a lot. Btw, I have a very tiny (50 to 100
| audience) podcast where we try to give context to what we
| call the "muck" of AI discourse (trying to ground claims
| in both what we would call objectively observable
| facts/evidence, and then _separately_ giving our own biased
| takes). If you would be interested to come on it and chat
| => contact email in my profile.
| ryanar wrote:
| podcast link?
| antonvs wrote:
| Was any attention paid to whether the tickets being implemented
| with AI assistance were an appropriate use case for AI?
|
| If the instruction is just "implement this ticket with AI",
| then that's very realistic in that it's how management often
| tries to operate, but it's also likely to be quite suboptimal.
| There are ways to use AI that help a lot, and other ways that
| hurt more than it helps.
|
| If your developers had sufficient experience with AI to tell
| the difference, then they might have compensated for that, but
| reading the paper I didn't see any indication of that.
| narush wrote:
| The instructions given to developers were not just "implement
| with AI" - but rather that they could use AI if they deemed
| it would be helpful, but indeed did _not need to use AI if
| they didn't think it would be helpful_. In about ~16% of
| labeled screen recordings where developers were allowed to
| use AI, they chose to use no AI at all!
|
| That being said, we can't rule out that the experiment drove
| them to use more AI than they would have outside of the
| experiment (in a way that made them less productive). You can
| see more in section "Experimentally driven overuse of AI
| (C.2.1)" [1]
|
| [1]
| https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf
| isoprophlex wrote:
| I'll just say that the methodology of the paper and the
| professionalism with which you are answering us here is top
| notch. Great work.
| narush wrote:
| Thank you!
| JackC wrote:
| (I read the post but not paper.)
|
| Did you measure subjective fatigue as one way to explain the
| misperception that AI was faster? As a developer-turned-manager
| I like AI because it's easier when my brain is tired.
| narush wrote:
| We attempted to! We explore this more in the section Trading
| speed for ease (C.2.5) in the paper
| (https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf).
|
| TLDR: mixed evidence, from quantitative and qualitative
| reports, that AI makes the work less effortful. Unclear effect.
| incomingpain wrote:
| Essentially an advertisement against Cursor Pro and/or Claude
| Sonnet 3.5/3.7
|
| I think personally, when I tried tools like Void IDE, I was
| fighting with Void too much. It is beta software, it is buggy,
| but also the big one... learning curve of the tool.
|
| I haven't had the chance to try Cursor, but I imagine it's
| going to have a learning curve as a new tool.
|
| So perhaps a slowdown is expected at first; but later, after
| you get your context and prompting down pat, asking
| specifically for what you want, you get your speedup.
| achenet wrote:
| I find agents useful for showing me how to do something I don't
| already know how to do, but I could see how for tasks I'm an
| expert on, I'd be faster without an extra thing to have to worry
| about (the AI).
| dboreham wrote:
| Any time you see the word "measuring" in the context of software
| development, you know what follows will be nonsense and probably
| in service of someone's business model.
| simonw wrote:
| Here's the full paper, which has a lot of details missing from
| the summary linked above:
| https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf
|
| My personal theory is that getting a significant productivity
| boost from LLM assistance and AI tools has a much steeper
| learning curve than most people expect.
|
| This study had 16 participants, with a mix of previous exposure
| to AI tools - 56% of them had never used Cursor before, and the
| study was mainly about Cursor.
|
| They then had those 16 participants work on issues (about 15
| each), where each issue was randomly assigned a "you can use AI"
| v.s. "you can't use AI" rule.
|
| So each developer worked on a mix of AI-tasks and no-AI-tasks
| during the study.
|
| A quarter of the participants saw increased performance, 3/4 saw
| reduced performance.
|
| One of the top performers for AI was also someone with the most
| previous Cursor experience. The paper acknowledges that here:
|
| > However, we see positive speedup for the one developer who has
| more than 50 hours of Cursor experience, so it's plausible that
| there is a high skill ceiling for using Cursor, such that
| developers with significant experience see positive speedup.
|
| My intuition here is that this study mainly demonstrated that the
| learning curve on AI-assisted development is high enough that
| asking developers to bake it into their existing workflows
| reduces their performance while they climb that learning curve.
| mjr00 wrote:
| > My intuition here is that this study mainly demonstrated that
| the learning curve on AI-assisted development is high enough
| that asking developers to bake it into their existing workflows
| reduces their performance while they climb that learning curve.
|
| Definitely. Effective LLM usage is not as straightforward as
| people believe. Two big things I see a lot of developers do
| when they share chats:
|
| 1. Talk to the LLM like a human. Remember when internet search
| first came out, and people were literally "Asking Jeeves" in
| full natural language? Eventually people learned that you don't
| need to type, "What is the current weather in San Francisco?"
| because "san francisco weather" gave you the same, or better,
| results. Now we've come full circle and people talk to LLMs
| like humans again; not out of any advanced prompt engineering,
| but just because it's so anthropomorphized it feels natural.
| But I can assure you that "pandas count unique values column
| 'Foo'" is just as effective an LLM prompt as "Using pandas, how
| do I get the count of unique values in the column named 'Foo'?"
| The LLM is also not insulted by you talking to it like this.
|
| 2. Don't know when to stop using the LLM. Rather than let the
| LLM take you 80% of the way there and then handle the remaining
| 20% "manually", they'll keep trying to prompt to get the LLM to
| generate what they want. Sometimes this works, but often it's
| just a waste of time and it's far more efficient to just take
| the LLM output and adjust it manually.
|
| Much like so-called Google-fu, LLM usage is a skill and people
| who don't know what they're doing are going to get substandard
| results.
| Jaxan wrote:
| > Effective LLM usage is not as straightforward as people
| believe
|
| It is not as straightforward as people are told to believe!
| sleepybrett wrote:
| ^ this, so much this. The amount of bullshit that gets
| shoveled into hacker news threads about the supposed
| capabilities of these models is epic.
| gedy wrote:
| > Talk to the LLM like a human
|
| Maybe the LLM doesn't strictly need it, but typing it out does
| bring some clarity for the asker. I've found it helps a lot
| to catch myself - what am I even wanting from this?
| frotaur wrote:
| I'm not sure about your example about talking to LLMs. There
| is good reason to think that speaking to it like a human
| might produce better results, as that's what most of the
| training data is composed of.
|
| I don't have any studies, but it seems to me reasonable to
| assume.
|
| (Unlike google, where presumably it actually used keywords
| anyway)
| mjr00 wrote:
| > I'm not sure about your example about talking to LLMs.
| There is good reason to think that speaking to it like a
| human might produce better results, as that's what most of
| the training data is composed of.
|
| In practice I have not had any issues getting information
| out of an LLM when speaking to them like a computer, rather
| than a human. At least not for factual or code-related
| information; I'm not sure how it impacts responses for e.g.
| creative writing, but that's not what I'm using them for
| anyway.
| lukan wrote:
| "But I can assure you that "pandas count unique values column
| 'Foo'" is just as effective an LLM prompt as "Using pandas,
| how do I get the count of unique values in the column named
| 'Foo'?""
|
| How can you be so sure? Did you compare in a systematic way
| or read papers by people who did it?
|
| Now I surely get results giving the LLM only snippets and
| keywords, but for anything complex, I do notice differences
| depending on the way I articulate. Not claiming there _is_ a
| significant difference, but it seems that way to me.
| mjr00 wrote:
| > How can you be so sure? Did you compare in a systematic
| way or read papers by people who did it?
|
| No, but I didn't need to read scientific papers to figure
| how to use Google effectively, either. I'm just using a
| results-based analysis after a lot of LLM usage.
| lukan wrote:
| Well, I did need some tutorials to use Google
| efficiently in the old days, when + meant something
| specific.
| skybrian wrote:
| Other people don't have the benefit of your experience,
| though, so there's a communications gap here: this boils
| down to "trust me, bro."
|
| How do we get beyond that?
| mjr00 wrote:
| This is the gap between capability (what can this tool
| do?) versus workflow (what is the best way to use this
| tool to accomplish a goal?). Capabilities can be strictly
| evaluated, but workflow is subjective. Saying "Google has
| the site: and before: operators" is capability, saying
| "you should use site:reddit.com before:2020 in Google
| queries" is workflow.
|
| LLMs have made the distinction ambiguous because their
| capabilities are so poorly understood. When I say "you
| should talk to an LLM like it's a computer", that's a
| workflow statement; it's a more efficient way to
| accomplish the same goal. You can try it for yourself and
| see if you agree. I personally liken people who talk to
| LLMs in full, proper English, capitalization and all, to
| boomers who still type in full sentences when running a
| Google query. Is there anything _strictly_ wrong with it?
| Not really. Do I believe it's a more efficient workflow
| to just type the keywords that will give you the same
| result? Yes.
|
| Workflow efficiencies can't really be scientifically
| evaluated. Some people still prefer to have desktop icons
| for programs on Windows; my workflow is pressing winkey
| -> typing the first few characters of the program ->
| enter. Is one of these methods scientifically more
| correct? Not really.
|
| So, yeah -- eventually you'll either find your own
| workflow or copy the workflow of someone you see who is
| using LLMs effectively. It really _is_ "just trust me,
| bro."
| skybrian wrote:
| Maybe it would help if more people wrote tutorials? It
| doesn't seem reasonable for people who don't have a buddy
| to learn from to have to figure it out on their own.
| bit1993 wrote:
| > Rather than let the LLM take you 80% of the way there and
| then handle the remaining 20% "manually"
|
| IMO 80% is way too much. LLMs are probably good for things
| that are not your domain knowledge and where you can afford to
| not be 100% correct, like rendering the Mandelbrot set --
| simple functions like that.
|
| LLMs are not deterministic: sometimes they produce correct
| code and other times they produce wrong code. This means one
| has to audit LLM-generated code, and auditing code takes more
| effort than writing it, especially if you are not the
| original author of the code being audited.
|
| Code has to be 100% deterministic. As programmers we write
| code, detailed instructions for the computer (CPU); we have
| developed a lot of tools, such as unit tests, to make sure the
| computer does exactly what we wrote.
|
| A codebase has a lot of context that you gain by writing the
| code: some things just look wrong, and you know exactly why,
| because you wrote the code. There is also a lot of context
| that you should keep in your head as you write the code --
| context that you miss by simply prompting an LLM.
| narush wrote:
| Hey Simon -- thanks for the detailed read of the paper - I'm a
| big fan of your OS projects!
|
| Noting a few important points here:
|
| 1. Some prior studies that find speedup do so with developers
| that have similar (or less!) experience with the tools they
| use. In other words, the "steep learning curve" theory doesn't
| differentially explain our results vs. other results.
|
| 2. Prior to the study, 90+% of developers had reasonable
| experience prompting LLMs. Before we found slowdown, the only
| concern most external reviewers had about experience was about
| prompting -- as prompting was considered the primary skill. In
| general, the standard wisdom was/is that Cursor is very easy to
| pick up if you're used to VSCode, which
| most developers used prior to the study.
|
| 3. Imagine all these developers had a TON of AI experience. One
| thing this might do is make them worse programmers when not
| using AI (relatable, at least for me), which in turn would
| raise the speedup we find (not because AI got better, but just
| because without AI they're much worse). In other words, we're
| sorta in between a rock and a hard place here -- it's just
| plain hard to figure out what the right baseline should be!
|
| 4. We shared information on developer prior experience with
| expert forecasters. Even with this information, forecasters
| were still dramatically over-optimistic about speedup.
|
| 5. As you say, it's totally possible that there is a long-tail
| of skills to using these tools -- things you only pick up and
| realize after hundreds of hours of usage. Our study doesn't
| really speak to this. I'd be excited for future literature to
| explore this more.
|
| In general, these results being surprising makes it easy to
| read the paper, find one factor that resonates, and conclude
| "ah, this one factor probably just explains slowdown." My
| guess: there is no one factor -- there's a bunch of factors
| that contribute to this result -- at least 5 seem likely, and
| at least 9 we can't rule out (see the factors table on page
| 11).
|
| I'll also note that one really important takeaway -- that
| developer self-reports after using AI are overoptimistic to the
| point of being on the wrong side of speedup/slowdown -- isn't a
| function of which tool they use. The need for robust, on-the-
| ground measurements to accurately judge productivity gains is a
| key takeaway here for me!
|
| (You can see a lot more detail in section C.2.7 of the paper
| ("Below-average use of AI tools") -- where we explore the
| points here in more detail.)
| simonw wrote:
| Thanks for the detailed reply! I need to spend a bunch more
| time with this I think - above was initial hunches from
| skimming the paper.
| narush wrote:
| Sounds great. Looking forward to hearing more detailed
| thoughts -- my emails in the paper :)
| paulmist wrote:
| Were participants given time to customize their Cursor
| settings? In my experience tool/convention mismatch kills
| Cursor's productivity - once it gets going with a wrong
| library or doesn't use the project's functions, I will almost
| always reject code and re-prompt. But, especially for large
| projects, having a well-crafted repo prompt mitigates most of
| these issues.
| jdp23 wrote:
| Really interesting paper, and thanks for the follow-on points.
|
| The over-optimism is indeed a really important takeaway, and
| agreed that it's not tool-dependent.
| gojomo wrote:
| Did each developer do a large enough mix of AI/non-AI tasks,
| in varying orders, that you have any hints in your data
| whether the "AI penalty" grew or shrunk over time?
| narush wrote:
| You can see this analysis in the factor analysis of "Below-
| average use of AI tools" (C.2.7) in the paper [1], which we
| mark as an unclear effect.
|
| TLDR: over the first 8 issues, developers do not appear to
| get majorly less slowed down.
|
| [1]
| https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf
| gojomo wrote:
| Thanks, that's great!
|
| But: if all developers did 136 AI-assisted issues, why
| only analyze excluding the 1st 8, rather than, say, the
| first 68 (half)?
| narush wrote:
| Sorry, this is the first 8 issues per-developer!
| amirhirsch wrote:
| Figure 6, which breaks down the time spent doing different
| tasks, is very informative -- it suggests: 15% less active
| coding, 5% less testing, 8% less research and reading; 4%
| more idle time, 20% more AI interaction time.
|
| The 28% less coding/testing/research is why developers
| reported 20% less work. You might be spending 20% more time
| overall "working" while you are really idle 5% more time and
| feel like you've worked less because you were drinking coffee
| and eating a sandwich between waiting for the AI and reading
| AI output.
|
| I think the AI skill-boost comes from having workflows that
| let you shave half that git-ops time and cut an extra 5% off
| coding; if you also cut the idle/waiting and do more prompting
| of parallel agents and a bit more testing, then you really are
| a 2x dev.
| amirhirsch wrote:
| I just realized the figure is showing the time breakdown as
| a percentage of total time; it would be more useful to show
| absolute time (hours) for those side-by-side comparisons,
| since the implied hours would boost the AI bars' height by
| 18%.
| narush wrote:
| There's additional breakdown per-minute in the appendix
| -- see appendix E.4!
| smokel wrote:
| I notice that some people have become more productive thanks to
| AI tools, while others have not.
|
| My working hypothesis is that people who are fast at scanning
| lots of text (or code for that matter) have a serious
| advantage. Being able to dismiss unhelpful suggestions quickly
| and then iterating to get to helpful assistance is key.
|
| Being fast at scanning code correlates with seniority, but
| there are also senior developers who can write at a solid pace,
| but prefer to take their time to read and understand code
| thoroughly. I wouldn't assume that this kind of developer gains
| little profit from typical AI coding assistance. There are also
| juniors who can quickly read text, and possibly these have an
| advantage.
|
| A similar effect has been around with being able to quickly
| "Google" something. I wouldn't be surprised if this is the same
| trait at work.
| luxpir wrote:
| Just to thank you for that point. I think it's likely more
| true than most of us realise. That and maybe the ability to
| mentally scaffold or outline a system or solution ahead of
| time.
| Filligree wrote:
| An interesting point. I wonder how much my decades-old habit
| of watching subtitled anime helps there--it's definitely made
| me dramatically faster at scanning text.
| onlyrealcuzzo wrote:
| How were "experienced engineers" defined?
|
| I've found AI to be quite helpful in pointing me in the right
| direction when navigating an entirely new code-base.
|
| When it's code I already know like the back of my hand, it's
| not super helpful, other than maybe doing a few automated tasks
| like refactoring, where there have already been some good tools
| for a while.
| smj-edison wrote:
| > To directly measure the real-world impact of AI tools on
| software development, we recruited 16 experienced developers
| from large open-source repositories (averaging 22k+ stars and
| 1M+ lines of code) that they've contributed to for multiple
| years.
| furyofantares wrote:
| > My personal theory is that getting a significant productivity
| boost from LLM assistance and AI tools has a much steeper
| learning curve than most people expect.
|
| I totally agree with this. Although also, you can end up in a
| bad spot even after you've gotten pretty good at getting the AI
| tools to give you good output, because you fail to learn the
| code you're producing well.
|
| A developer gets better at the code they're working on over
| time. An LLM gets worse.
|
| You can use an LLM to write a lot of code fast, but if you
| don't pay enough attention, you aren't getting any better at
| the code while the LLM is getting worse. This is why you can
| get like two months of greenfield work done in a weekend but
| then hit a brick wall - you didn't learn anything about the
| code that was written, and while the LLM started out producing
| reasonable code, it got worse until you have a ball of mud that
| neither the LLM nor you can effectively work on.
|
| So a really difficult skill in my mind is continually avoiding
| temptation to vibe. Take a whole week to do a month's worth of
| features, not a weekend to do two month's worth, and put in the
| effort to guide the LLM to keep producing clean code, and to be
| sure you know the code. You do want to know the code and you
| can't do that without putting in work yourself.
| danieldk wrote:
| _So a really difficult skill in my mind is continually
| avoiding temptation to vibe._
|
| I agree. I have found that I can use agents most effectively
| by letting them write code in small steps. After each step I
| review the changes and polish them up (either by doing the
| fixups myself or by prompting). I have found that this helps
| me understand the code, but also avoids the model getting
| into a bad solution space or producing unmaintainable code.
|
| I also think this kind of closed loop is necessary. Like
| yesterday I let an LLM write a relatively complex data
| structure. It got the implementation nearly correct, but got
| stuck, unable to find an off-by-one in a comparison. In this
| case it was easy to catch, because I let it write property-
| based tests (which I had to fix up to work properly), but it's
| easy for things to slip through the cracks if you don't review
| carefully.
|
| (This is all using Cursor + Claude 4.)
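|
| For a flavor of that kind of test, a minimal jqwik sketch
| (hypothetical stand-in code; the real data structure was
| more complex):
|
|     import java.util.ArrayList;
|     import java.util.Collections;
|     import java.util.List;
|
|     import net.jqwik.api.ForAll;
|     import net.jqwik.api.Property;
|
|     class InsertionIndexProperties {
|         // Code under test: binary search for the insertion
|         // point in a sorted list -- off-by-one-prone.
|         static int insertionIndex(List<Integer> xs, int key) {
|             int lo = 0, hi = xs.size();
|             while (lo < hi) {
|                 int mid = (lo + hi) / 2;
|                 if (xs.get(mid) < key) lo = mid + 1;
|                 else hi = mid;
|             }
|             return lo;
|         }
|
|         // Property: inserting at the returned index keeps
|         // the list sorted, for any generated list and key.
|         @Property
|         boolean insertKeepsOrder(
|                 @ForAll List<Integer> xs, @ForAll int key) {
|             List<Integer> sorted = new ArrayList<>(xs);
|             Collections.sort(sorted);
|             sorted.add(insertionIndex(sorted, key), key);
|             List<Integer> resorted = new ArrayList<>(sorted);
|             Collections.sort(resorted);
|             return sorted.equals(resorted);
|         }
|     }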
| bluefirebrand wrote:
| > Take a whole week to do a month's worth of features
|
| Everything else in your post is so reasonable and then you
| still somehow ended up suggesting that LLMs should be
| quadrupling our output
| furyofantares wrote:
| I'm specifically talking about greenfield work. I do a lot
| of game prototypes, it definitely does that at the very
| beginning.
| bluefirebrand wrote:
| Greenfield is still such a tiny percentage of all
| software work going on in the world though :/
| furyofantares wrote:
| I agree, that's fair. I think a lot of people are playing
| around with AI on side projects and making some bad
| extrapolations from their initial experiences.
|
| It'll also apply to isolated-enough features, which is
| still a small amount of someone's work (not often
| something you'd work on for a full month straight), but
| more people will have experience with this.
| lurking_swe wrote:
| greenfield development is also the "easiest" and most fun
| part of software development. As the famous saying goes,
| the last 10% of the project takes 90% of the time lol.
|
| I've also noticed that, generally, nobody likes
| maintaining old systems.
|
| so where does this leave us as software engineers? Should
| I be excited that it's easy to spin up a bunch of code
| that I don't deeply understand at the beginning of my
| project, while removing the fun parts of the project?
|
| I'm still grappling with what this means for our industry
| in 5-10 years...
| Filligree wrote:
| It's a tiny percentage of software work because the
| programming is slow, and setting up new projects is even
| slower.
|
| It's been a majority of my projects for the past two
| months. Not because work changed, but because I've
| written a dozen tiny, personalised tools that I wouldn't
| have written at all if I didn't have Claude to do it.
|
| Most of them were completed in less than an hour, to give
| you an idea of the size. Though it would have easily been
| a day on my own.
| Dzugaru wrote:
| This is really interesting, because I do gamejams from
| time to time - and I try every time to make it work, but
| I'm still quite a lot faster doing stuff myself.
|
| This is visible under extreme time pressure of producing
| a working game in 72 hours (our team consistently scores
| top 100 in Ludum Dare, which is a somewhat high standard).
|
| We use a popular Unity game engine that all LLMs have a
| wealth of experience with (as with game development in
| general), but 80% of the output is so strangely "almost
| correct but not usable" that I cannot take the luxury of
| letting it figure things out, so I use it as fancy
| autocomplete. And I also still check docs and
| Stackoverflow-style forums a lot, because of stuff it
| plainly makes up.
|
| One of the reasons is maybe that our game mechanics are
| often a bit off the beaten road, though the last game we made
| was literally a platformer with rope physics (the LLM could
| not produce a good idea for how to make stable and simple
| rope physics, codeable in 3 hours' time, under our
| constraints).
| WD-42 wrote:
| I feel the same way. I use it for super small chunks, still
| understand everything it outputs, and often manually
| copy/paste or straight up write it myself. I don't know if I'm
| actually faster than before, but it feels more comfy than alt-
| tabbing to stack overflow, which is what I feel like it's
| mostly replaced.
|
| Poor stack overflow, it looks like they are the ones really
| hurting from all this.
| jona777than wrote:
| > but then hit a brick wall
|
| This is my intuition as well. I had a teammate use a pretty
| good analogy today. He likened vibe coding to vacuuming up a
| string in four tries when it only takes one try to reach down
| and pick it up. I thought that aligned well with my
| experience with LLM assisted coding. We have to vacuum the
| floor while exercising the "difficult skill [of] continually
| avoiding temptation to vibe"
| Uehreka wrote:
| > My personal theory is that getting a significant productivity
| boost from LLM assistance and AI tools has a much steeper
| learning curve than most people expect.
|
| You hit the nail on the head here.
|
| I feel like I've seen a lot of people trying to make strong
| arguments that AI coding assistants aren't useful. As someone
| who uses and enjoys AI coding assistants, I don't find this
| research angle to be... uh... very grounded in reality?
|
| Like, if you're using these things, the fact that they are
| useful is pretty irrefutable. If one thinks there's some sort
| of "productivity mirage" going on here, well OK, but to
| demonstrate that it might be better to start by acknowledging
| areas where they _are_ useful, and show that your method
| explains the reality we're seeing before using that method to
| show areas where we might be fooling ourselves.
|
| I can maybe buy that AI might not be useful for certain kinds
| of tasks or contexts. But I keep pushing their boundaries and
| they keep surprising me with how capable they are, so it feels
| like it'll be difficult to prove otherwise in a durable
| fashion.
| TechDebtDevin wrote:
| Still odd to me that the only vibe-coded software that gets
| acquired is acquired by companies selling tools or wanting to
| promote vibe coding.
| furyofantares wrote:
| That's not odd. These things are incredibly useful and vibe
| coding mostly sucks.
| Uehreka wrote:
| Pardon my caps, but WHO CARES about acquisitions?!
|
| You've been given a dubiously capable genie that can write
| code without you having to do it! If this thing can build
| first drafts of those side projects you always think about
| and never get around to, that in and of itself is useful!
| If it can do the yak-shaving required to set up those e2e
| tests you know you should have but never have time for it
| is useful!
|
| Have it try out all the dumb ideas you have that might be
| cool but don't feel worth your time to boilerplate out!
|
| I like to think we're a bunch of creative people here! Stop
| thinking about how it can make you money and use it for
| fun!
| fwip wrote:
| Unfortunately, HN is YC-backed, and attracts these types
| by design.
| Uehreka wrote:
| I mean sure, but HN/YC's founder was always going on
| about the kinship between "Hackers and Painters" (or at
| least he used to). It hasn't always been like this, and
| definitely doesn't have to be. We can and should aspire
| to better.
| furyofantares wrote:
| I think the thing is there IS a learning curve, AND there is
| a productivity mirage, AND they are immensely useful, AND it
| is context dependent. All of this leads to a lot of confusion
| when communicating with people who are having a different
| experience.
| GoatInGrey wrote:
| It always comes back to nuance!
| Uehreka wrote:
| Right, my problem is that while some people may be correct
| about the productivity mirage, many of those people are
| getting out over their skis and making bigger claims than
| they can reasonably prove. I'm arguing that they should be
| more nuanced and tactical.
| rcruzeiro wrote:
| Exactly. The people who say that these assistants are useless
| or "not good enough" are basically burying their heads in the
| sand. The people who claim that there is no mirage are
| burying their head in the sand as well...
| grey-area wrote:
| Well, there are two possible interpretations here of 75% of
| participants (all of whom had some experience using LLMs) being
| slower using generative AI:
|
| 1. LLMs have a v. steep and long learning curve as you posit
| (though note the points from the paper authors in the other
| reply).
|
| 2. Current LLMs just are not as good as they are sold to be
| as a programming assistant and people consistently predict
| and self-report in the wrong direction on how useful they
| are.
| Terr_ wrote:
| > people consistently predict and self-report in the wrong
| direction
|
| I recall an adage about work-estimation: as chunks get too
| big, people unconsciously substitute "how long will the
| work take to do?" with "how possible does the final outcome
| feel?"
|
| People asked "how long did it take" could be substituting
| something else, such as "how alone did I feel while working
| on it."
| sandinmyjoints wrote:
| That's an interesting adage. Any ideas of its source?
| Dilettante_ wrote:
| It might have been in Kahneman's "Thinking, Fast and
| Slow"
| Terr_ wrote:
| I'm not sure, but something involving Kahneman _et al._
| seems very plausible: The relevant term is probably
| "Attribute Substitution."
|
| https://en.wikipedia.org/wiki/Attribute_substitution
| steveklabnik wrote:
| > Current LLMs
|
| One thing that happened here is that they aren't using
| current LLMs:
|
| > Most issues were completed in February and March 2025,
| before models like Claude 4 Opus or Gemini 2.5 Pro were
| released.
|
| That doesn't mean this study is bad! In fact, I'd be very
| curious to see it done again, but with newer models, to see
| if that has an impact.
| blibble wrote:
| > One thing that happened here is that they aren't using
| current LLMs
|
| I've been hearing this for 2 years now
|
| the previous model retroactively becomes total dogshit the
| moment a new one is released
|
| convenient, isn't it?
| simonw wrote:
| The previous model retroactively becomes not as good as
| the best available models. I don't think that's a huge
| surprise.
| cwillu wrote:
| The surprise is the implication that the crossover
| between net-negative and net-positive impact happened to
| be in the last 4 months, in light of the initial release
| 2 years ago and sufficient public attention for a study
| to be funded and completed.
|
| Yes, it might make a difference, but it _is_ a little
| tiresome that there's _always_ a "this is based on a
| model that is x months old!" comment, because it will
| always be true: an academic study does not get funded,
| executed, written up, and published in less time.
| Ntrails wrote:
| Some of it is just that (probably different) people said
| the same damn things 6 months ago.
|
| "No, the 2.8 release is the first good one. It massively
| improves workflows"
|
| Then, 6 months later, the study comes out.
|
| "Ah man, 2.8 was useless, 3.0 really crossed the
| threshold on value add"
|
| At some point, you roll your eyes and assume it is just
| snake oil sales
| Filligree wrote:
| Or you accept that different people have different skill
| levels, workflows and goals, and therefore the AIs reach
| usability at different times.
| steveklabnik wrote:
| There's a lot of confounding factors here. For example,
| you could point to any of these things in the last ~8
| months as being significant changes:
|
| * the release of agentic workflow tools
|
| * the release of MCPs
|
| * the release of new models, Claude 4 and Gemini 2.5 in
| particular
|
| * subagents
|
| * asynchronous agents
|
| All or any of these could have made for a big or small
| impact. For example, I'm big on agentic tools, skeptical
| of MCPs, and don't think we yet understand subagents.
| That's different from those who, for example, think MCPs
| are the future.
|
| > At some point, you roll your eyes and assume it is just
| snake oil sales
|
| No, you have to realize you're talking to a population of
| people, and not necessarily the same person. Opinions are
| going to vary, they're not literally the same person each
| time.
|
| There are surely snake oil salesman, but you can't buy
| anything from me.
| foobarqux wrote:
| That's not the argument being made though, which is that
| it does "work" _now_ and implying that actually it didn't
| quite work before; except that that is the same thing the
| same people say for every model release, including at the
| time of release of the previous one, which is now
| acknowledged to be seriously flawed; and including the
| future one, at which time the current models will similarly
| be acknowledged to be, not only less performant than the
| future models, but inherently flawed.
|
| Of course it's possible that at some point you get to a
| model that really works, irrespective of the history of
| false claims from the zealots, but it does mean you
| should take their comments with a grain of salt.
| steveklabnik wrote:
| > That's not the argument being made though, which is
| that it does "work" now and implying that actually it
| didn't quite work before
|
| Right.
|
| > except that that is the same thing the same people say
| for every model release,
|
| I did not say that, no.
|
| I am sure you can find someone who is in a Groundhog Day
| about this, but it's just simpler than that: as tools
| improve, more people find them useful than before. You're
| not talking to the same people, you are talking to new
| people each time who now have had their threshold
| crossed.
| blibble wrote:
| > You're not talking to the same people, you are talking
| to new people each time who now have had their threshold
| crossed.
|
| no, it's the same names, again and again
| simonw wrote:
| Got receipts?
|
| That sounds like a claim you could back up with a little
| bit of time spent using Hacker News search or similar.
|
| (I might try to get a tool like o3 to run those searches
| for me.)
| blibble wrote:
| try asking it what sealioning is
| pdabbadabba wrote:
| Maybe it's convenient. But isn't it also just a fact that
| some of the models available today are better than the
| ones available five months ago?
| bryanrasmussen wrote:
| sure, but after having spent some time trying to get
| anything useful - programmatically - out of previous models
| and not getting anything, once a new one is announced how
| much time should one spend?
|
| Sure, you may end up missing out on a good thing and then
| having to come late to the party, but coming early to the
| party too many times, only to find the beer watered down
| and the food full of grubs, is apt to make you cynical the
| next time a party announcement comes your way.
| Terr_ wrote:
| Plus it's not even _possible_ to miss the metaphorical
| party: If it gets going, it will be quite obvious long
| before it peaks.
|
| (Unless one believes the most grandiose prophecies of a
| technological-singularity apocalypse, that is.)
| Terr_ wrote:
| That's not the issue. Their complaint is that proponents
| keep revising what ought to be _fixed_ goalposts... Well,
| fixed unless you believe unassisted human developers are
| _also_ getting dramatically better at their jobs every
| year.
|
| Like the boy who cried wolf, it'll _eventually_ be true
| with enough time... But we should stop giving them the
| benefit of the doubt.
|
| _____
|
| Jan 2025: "Ignore last month's models, they aren't good
| enough to show a marked increase in human productivity,
| test with _this_ month's models and the benefits are
| obvious."
|
| Feb 2025: "Ignore last month's models, they aren't good
| enough to show a marked increase in human productivity,
| test with _this_ month's models and the benefits are
| obvious."
|
| Mar 2025: "Ignore last month's models, they aren't good
| enough to show a marked increase in human productivity,
| test with _this_ month's models and the benefits are
| obvious."
|
| Apr 2025: [Ad nauseam, you get the idea]
| pdabbadabba wrote:
| Fair enough. For what it's worth, I've always thought
| that the more reasonable claim is that AI tools make
| poor-average developers more productive, not necessarily
| _expert_ developers.
| steveklabnik wrote:
| Sorry, that's not my take. I didn't think these tools
| were useful _until_ the latest set of models, that is,
| they crossed the threshold of usefulness to me.
|
| Even then though, "technology gets better over time"
| shouldn't be surprising, as it's pretty common.
| mattmanser wrote:
| Do you really see a massive jump?
|
| For context, I've been using AI, a mix of OpenAi +
| Claude, mainly for bashing out quick React stuff. For
| over a year now. Anything else it's generally rubbish and
| slower than working without. Though I still use it to
| rubber duck, so I'm still seeing the level of quality for
| backend.
|
| I'd say they're only marginally better today than they
| were even 2 years ago.
|
| Every time a new model comes out you get a bunch of
| people raving how great the new one is and I honestly
| can't really tell the difference. The only real
| difference is reasoning models actually slowed everything
| down, but now I see its reasoning. It's only useful
| because I often spot it leaving out important stuff from
| the final answer.
| hombre_fatal wrote:
| I see a massive jump every time.
|
| Just two years ago, this failed:
|
| > Me: What language is this: "esto esta escrito en
| ingles"
|
| > LLM: English
|
| (The sentence is Spanish for "this is written in English.")
|
| Gemini and Opus have solved questions that took me weeks
| to solve myself. And I'll feed some complex code into
| each new iteration and it will catch a race condition I
| missed even with testing and line by line scrutiny.
|
| Consider how many more years of experience a software
| engineer needs to catch a hard race condition just from
| reading code, compared to one who couldn't do it after
| trying 100 times. We take it for granted already since we
| see it as "it caught it or it didn't", but these are
| massive jumps in capability.
| steveklabnik wrote:
| Yes. In January I would have told you AI tools are
| bullshit. Today I'm on the $200/month Claude Max plan.
|
| As with anything, your miles may vary: I'm not here to
| tell anyone that thinks they still suck that their
| experience is invalid, but to me it's been a pretty big
| swing.
| Uehreka wrote:
| > In January I would have told you AI tools are bullshit.
| Today I'm on the $200/month Claude Max plan.
|
| Same. For me the turning point was VS Code's Copilot
| Agent mode in April. That changed everything about how I
| work, though it had a lot of drawbacks due to its
| glitches (many of these were fixed within 6 or so weeks).
|
| When Claude Sonnet 4 came out in May, I could immediately
| tell it was a step-function increase in capability. It
| was the first time an AI, faced with ambiguous and
| complicated situations, would be willing to answer a
| question with a definitive and confident "No".
|
| After a few weeks, it became clear that VS Code's
| interface and usage limits were becoming the bottleneck.
| I went to my boss, bullet points in hand, and easily got
| approval for the Claude Max $200 plan. Boom, another
| step-function increase.
|
| We're living in an incredibly exciting time to be a
| skilled developer. I understand the need to stay
| skeptical and measure the real benefits, but I feel like
| a lot of people are getting caught up in the culture war
| aspect and are missing out on something truly wonderful.
| mattmanser wrote:
| Ok, I'll have to try it out then. I've got a side project
| that's 3/4 finished and will let it loose on it.
|
| So are you using Claude Code via the max plan, Cursor, or
| what?
|
| I think I'd definitely hit AI news exhaustion and was
| viewing people raving about this agentic stuff as yet
| more AI fanbois. I'd just continued using the AI
| separately, as setting up a new IDE seemed like too much
| work for the fractional gains I'd been seeing.
| steveklabnik wrote:
| I had a bad time with Cursor. I use Claude Code inside of
| VS Code. You don't necessarily need Max, but you can
| spend a lot of money very quickly on API tokens, so I'd
| recommend to anyone trying, start with the $20/month one,
| no need to spend a ton of money just to try something
| out.
|
| There is a skill gap, like, I think of it like vim: at
| first it slows you down, but then as you learn it, you
| end up speeding up. So you may also find that it doesn't
| really vibe with the way you work, even if I am having a
| good time with it. I know people who are great engineers
| who still don't like this stuff, just like I know ones
| that do too.
| simonw wrote:
| The massive jump in the last six months is that the new
| set of "reasoning" models got really good at reasoning
| about when to call tools, and they were accompanied by a
| flurry of tools-in-loop coding agents - Claude Code,
| OpenAI Codex, Cursor in Agent mode, etc.
|
| An LLM that can test the code it is writing and then
| iterate to fix the bugs turns out to be a huge step
| forward from LLMs that just write code without trying to
| then exercise it.
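|
| To make that loop concrete, here's a minimal sketch of the
| tools-in-loop pattern (nothing vendor-specific; "generate"
| is a stand-in for whatever model call an agent uses):
|
|     # Hypothetical sketch: "write code, run the tests,
|     # feed the failures back" until the suite passes.
|     import subprocess
|     from typing import Callable
|
|     def code_until_green(generate: Callable[[str], str],
|                          task: str,
|                          max_rounds: int = 5) -> bool:
|         feedback = ""
|         for _ in range(max_rounds):
|             # Ask the model for code, with any prior test
|             # failures included as context.
|             code = generate(task + "\n" + feedback)
|             with open("solution.py", "w") as f:
|                 f.write(code)
|             # Exercise the code instead of trusting it.
|             result = subprocess.run(["pytest", "-q"],
|                                     capture_output=True,
|                                     text=True)
|             if result.returncode == 0:
|                 return True  # tests pass; stop iterating
|             feedback = result.stdout  # loop with the errors
|         return False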
| vidarh wrote:
| I've gone from asking the tools how to do things and
| cutting and pasting the (often small) bits that'd be
| helpful, via using assistants whose every decision I'd
| review, often having to start over, to now often starting
| an assistant with broad permissions and just reviewing the
| diff later, after it has made the changes pass the test
| suite, run a linter, fixed all the issues the linter
| brought up, and written a draft commit message.
|
| The jump has been massive.
| ipaddr wrote:
| Wait until the next set. You will find the previous ones
| weren't useful after all.
| steveklabnik wrote:
| This makes no sense to me. I'm well aware that I'm
| getting value today, that's not going to change in the
| future: it's already happened.
|
| Sure they may get _even more_ useful in the future but
| that doesn't change my present.
| jstummbillig wrote:
| Convenient for whom and what...? There is nothing
| tangible to gain from you believing or not believing that
| someone else does (or does not) get a productivity boost
| from AI. This is not a religion and it's not crypto. The
| AI users' net worth is not tied to another ones use of or
| stance on AI (if anything, it's the opposite).
|
| More generally, the phenomenon here is quite simply
| explained and is nothing surprising: new things improve,
| quickly. That does not mean that something is good or
| valuable, but it's how new tech gets introduced every
| single time, and it readily explains changing sentiment.
| card_zero wrote:
| I saw that edit. Indeed you can't predict that rejecting
| a new thing is part of a routine of being wrong. It's
| true that "it's strange and new, therefore I hate it" is
| a very human (and adorable) instinct, but sometimes it's
| reasonable.
| jstummbillig wrote:
| "I saw that edit" lol
| card_zero wrote:
| Sorry, just happened to. Slightly rude of me.
| jstummbillig wrote:
| Ah, you do you. It's just a fairly kindergarten thing to
| point out and not something I was actively trying to
| hide. Whatever it was.
|
| Generally, I do a couple of edits for clarity after
| posting and reading again. Sometimes that involves
| removing something that I feel could have been said
| better. If it does not work, I will just delete the
| comment. Whatever it was must not have been a super huge
| deal (to me).
| grey-area wrote:
| Honestly the hype cycle feels very like crypto, and just
| as with crypto, prominent VCs have a lot of money riding
| on the outcome.
| steveklabnik wrote:
| I agree with you, and I think that's coloring a lot of
| people's perceptions. I am not a crypto fan but am an LLM
| fan.
|
| Every hype cycle feels like this, and some of them are
| nonsense and some of them are real. We'll see.
| jstummbillig wrote:
| Of course, lots of hype, but my point is that the reason
| why is very different, and it matters: as an early bc
| adopter, making you believe in bc is super important to my
| net worth (and you not believing in bc makes me look
| like an idiot and lose a lot of money).
|
| In contrast, what do I care if you believe in code
| generation AI? If you do, you are probably driving up
| pricing. I mean, I am sure that there are people that
| care very much, but there is little inherent value for me
| in you doing so, as long as the people who are building
| the AI are making enough profit to keep it running.
|
| With regards to the VCs, well, how many VCs are there in
| the world? How many of the people who have something good
| to say about AI are likely VCs? I might be off by an
| order of magnitude, but even then it would really not be
| driving the discussion.
| leshow wrote:
| I don't find that a compelling argument, lots of people
| get taken in by hype cycles even when they don't profit
| directly from it.
| leshow wrote:
| I think you're missing the broader context. There are a
| lot of people very invested in the maximalist outcome,
| which does create pressure for people to be boosters. You
| don't need a digital token for that to happen. There's a
| social media aspect as well that creates a feedback loop
| about claims.
|
| We're in a hype cycle, and it means we should be extra
| critical when evaluating the tech so we don't get taken
| in by exaggerated claims.
| jstummbillig wrote:
| I mostly don't agree. Yes, there is always social
| pressure with these things, and we are in a hype cycle,
| but the people "buying in" are simply not doing much at
| all. They are mostly consumers, waiting for the next
| model, which they have no control over or stake in
| creating (by and large).
|
| The people _not_ buying into the hype, on the other hand,
| are actually the ones that have a very good reason to be
| invested, because if they turn out to be wrong they might
| face some very uncomfortable adjustments in the job
| landscape and in the value of the skills that they worked
| so hard to gain.
|
| As always, be wary of any claims, but the tension here is
| very much the reverse of crypto, and I don't think that's
| well appreciated.
| cfst wrote:
| The current batch of models, specifically Claude Sonnet
| and Opus 4, are the first I've used that have actually
| been more helpful than annoying on the large mixed-
| language codebases I work in. I suspect that dividing
| line differs greatly between developers and applications.
| nalllar wrote:
| If you interact with internet comments and discussions as
| an amorphous blob of people you'll see a constant trickle
| of the view that models now are useful, and before were
| useless.
|
| If you pay attention to who says it, you'll find that
| people have different personal thresholds for finding
| LLMs useful, not that any given person like steveklabnik
| above keeps flip-flopping on their view.
|
| This is a variant on the goomba fallacy:
| https://englishinprogress.net/gen-z-slang/goomba-fallacy-
| exp...
| bix6 wrote:
| Everything actually got better. Look at the image
| generation improvements as an easily visible benchmark.
|
| I do not program for my day job and I vibe coded two
| different web projects: one in twenty minutes as a test
| with Cloudflare deployment, having never used Cloudflare,
| and one in a week over vacation (and then fixed a deep
| Safari bug two weeks later by hammering the LLM). These
| tools
| massively raise the capabilities for sub-average people
| like me and decrease the time / brain requirements
| significantly.
|
| I had to make a little update to reset the KV store on
| Cloudflare and the LLM did it in 20s after failing the
| syntax twice. I would've spent at least a few minutes
| looking it up otherwise.
| Aeolun wrote:
| It's true though? Previous models could do well in
| specifically created settings. You can throw practically
| everything at Opus, and it'll work mostly fine.
| burnte wrote:
| > Current LLMs just are not as good as they are sold to be as
| a programming assistant and people consistently predict and
| self-report in the wrong direction on how useful they are.
|
| I would argue you don't need the "as a programming assistant"
| phrase as right now from my experience over the past 2 years,
| literally every single AI tool is massively oversold as to
| its utility. I've literally not seen a single one that
| delivers on what it's billed as capable of.
|
| They're useful, but right now they need a lot of handholding
| and I don't have time for that. Too much fact checking. If I
| want a tool I always have to double check, I was born with a
| memory so I'm already good there. I don't want to have to
| fact check my fact checker.
|
| LLMs are great at small tasks. The larger the single task is,
| or the more tasks you try to cram into one session, the
| more they fall apart.
| atiedebee wrote:
| Let me bring you a third (not necessarily true)
| interpretation:
|
| The developer who has experience using cursor saw a
| productivity increase not because he became better at using
| cursor, but because he became worse at _not_ using it.
| card_zero wrote:
| Or, one person in 16 has a particular personality, inclined
| to LLM dependence.
| runarberg wrote:
| Invoking personality is to the behavioral sciences as
| invoking God is to the natural sciences. One can explain
| anything by appealing to personality, and as such it
| explains nothing. Psychologists have been trying to make
| sense of personality for over a century without much
| success (the best effort so far has been a five-factor
| model [Big 5], which has ultimately pretty minor
| predictive value), which is why most behavioral
| scientists have learned to simply leave personality to
| the philosophers and concentrate on a much simpler
| theoretical framework.
|
| A much simpler explanation is what your parent offered.
| And to many behaviorists it is actually the same
| explanation, as to a true scotsm... [ _cough_ ]
| behaviorist, personality is simply learned habits, so--by
| Occam's razor--you should omit personality from your
| model.
| card_zero wrote:
| Fair comment, but I'm not down with behaviorism, and
| people have personalities, regrettably.
| runarberg wrote:
| This is still ultimately research within the field of the
| behavioral sciences, and as such the laws of human
| behavior apply, where behaviorism offers a far more
| successful theoretical framework than personality
| psychology.
|
| Nobody is denying that people have personalities, btw. Not
| even true behaviorists do that; they simply argue from
| reductionism that personality can be explained with
| learning contingencies and reinforcement history. Very
| few people are true behaviorists these days though, but
| within the behavioral sciences, scientists are much more
| likely to borrow missing factors (i.e. things that
| learning contingencies fail to explain) from fields such
| as cognitive science (or, further down, neuroscience) and
| (less often) social science.
|
| What I am arguing here, however, is that the appeal to
| personality is unnecessary when explaining behavior.
|
| As for figuring out what personality is, that is still
| within the realm of philosophy. Maybe cognitive science
| will do a better job at explaining it than
| psychometricians have done for the past century. I
| certainly hope so, it would be nice to have a better
| model of human behavior. But I think even if we could
| explain personality, it still wouldn't help us here. At
| best we would be in a similar situation as physics, where
| one model can explain things traveling at the speed of
| light, while another model can explain things at the sub-
| atomic scale, but the two models cannot be applied
| together.
| cutemonster wrote:
| Didn't they rather mean:
|
| Developers' own skills might atrophy, when they don't
| write that much code themselves, relying on AI instead.
|
| And now, when comparing with/without AI, they're faster
| with it. But a year ago they might have been that fast or
| faster _without_ an AI.
|
| I'm not saying that that's how things are. Just pointing
| out another way to interpret what GP said
| robwwilliams wrote:
| Or a sampling artifact. 4 vs 12 does seem significant within
| a study, but consider a set of N such studies.
|
| I assume that many large companies have tested the
| efficiency gains and losses of their programmers much more
| extensively than the authors of this tiny study.
|
| A survey of companies and their evaluation and conclusions
| would carry more weight---excluding companies selling AI
| products, of course.
| rs186 wrote:
| If you use a binomial test with n = 16 and a 50:50 null,
| P(X <= 4) is about 0.038, which gives a two-sided p of
| about 0.077.
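|
| A quick check of that arithmetic (a sketch; assumes all 16
| participants count and a 50:50 null of faster-vs-slower):
|
|     from scipy.stats import binom, binomtest
|
|     # P(X <= 4) for X ~ Binomial(n=16, p=0.5):
|     print(binom.cdf(4, 16, 0.5))         # ~0.0384
|
|     # Exact two-sided test, 4 "faster" out of 16:
|     print(binomtest(4, 16, 0.5).pvalue)  # ~0.0768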
| bgwalter wrote:
| We have heard variations of that narrative for at least a year
| now. It is not hard to use these chatbots and no one who was
| very productive in open source before "AI" has any higher
| output now.
|
| Most people who subscribe to that narrative have some
| connection to "AI" money, but there might be some misguided
| believers as well.
| bc1000003 wrote:
| "My intiution is that..." - AGREED.
|
| I've found that there are a couple of things you need to do to
| be very efficient.
|
| - Maintain an architecture.md file (with AI assistance) that
| answers many of the questions and clarifies a lot of the
| ambiguity in the design and structure of the code.
|
| - A bootstrap.md file (or files) is also useful: having the
| AI read it and start with a correct idea about the subject
| is a time saver for a variety of kinds of tasks.
|
| - Regularly asking the AI to refactor code, simplify it,
| modularize it - this is what the experienced dev is for.
| VIBE coding generally doesn't work, as AIs tend to write
| messy, non-modular code unless you tell them otherwise. But
| if you review the code and ask for specific changes.. they
| happily comply.
|
| - Read the code produced, and carefully review it. And notice
| and address areas where there are issues, have the AI fix all
| of these.
|
| - Take over when there are editing tasks you can do more
| efficiently.
|
| - Structure the solution/architecture in ways that you know the
| AI will work well with.. things it knows about.. it's general
| sweet spots.
|
| - Know when to stop using the AI and code it yourself..
| particularly when the AI has entered the confusion doom
| loop. Time wasted trying to get the AI to figure out what
| it never will is better spent just fixing it yourself.
|
| - Know when to just not ever try to use AI. Intuitively you
| know there's just certain code you can't trust the AI to safely
| work on. Don't be a fool and break your software.
|
| ----
|
| I've found there's no guarantee that AI assistance will
| speed up any one project (and in some cases it will slow
| things down).. but measured across all tasks and projects,
| the benefits are pretty substantial. That's probably
| others' experience at this point too.
| ericmcer wrote:
| Looking at the example tasks in the pdf ("Sentencize wrongly
| splits sentence with multiple...") these look like really
| discrete and well-defined bug fixes. AI should smash tasks like
| that so this is even less hopeful.
| rafaelmn wrote:
| >My personal theory is that getting a significant productivity
| boost from LLM assistance and AI tools has a much steeper
| learning curve than most people expect.
|
| Are we still selling the "you are an expert senior
| developer" meme? I can completely see how once you are working
| on a mature codebase LLMs would only slow you down. Especially
| one that was not created by an LLM and where you are the
| expert.
| bicx wrote:
| I think it depends on the kind of work you're doing, but I
| use it on mature codebases where I am the expert, and I
| heavily delegate to Claude Code. By being knowledgeable of
| the codebase, I know exactly how to specify a task I need
| performed. I set it to work on one task, then I monitor it
| while personally starting on other work.
|
| I think LLMs shine when you need to write a higher volume of
| code that extends a proven pattern, quickly explore
| experiments that require a lot of boilerplate, or have
| multiple smaller tasks that you can set multiple agents upon
| to parallelize. I've also had success in using LLMs to do a
| lot of external documentation research in order to integrate
| findings into code.
|
| If you are fine-tuning an algorithm or doing domain-expert-
| level tweaks that require a lot of contextual input-output
| expert analysis, then you're probably better off just coding
| on your own.
|
| Context engineering has been mentioned a lot lately, but it's
| not a meme. It's the real trick to successful LLM agent
| usage. Good context documentation, guides, and well-defined
| processes (just like with a human intern) will mean the
| difference between success and failure.
| dmezzetti wrote:
| I'm the developer of txtai, a fairly popular open-source
| project. I don't use any AI-generated code and it's not
| integrated into my workflows at the moment.
|
| AI has a lot of potential but it's way over-hyped right now.
| Listen to the people on the ground who are doing real work
| and building real projects: none of them are over-hyping
| it. The hype mostly comes from those who have only
| tangentially used LLMs.
|
| It's also not surprising that many in this thread are clinging
| to a basic premise that it's 3 steps backwards to go 5 steps
| forward. Perhaps that is true but I'll take the study at face
| value, it seems very plausible to me.
| mnky9800n wrote:
| I feel like I get better at it as I use Claude code more
| because I both understand its strength and weaknesses and also
| understand what context it's usually missing. Like today I was
| struggling to debug an issue and realised that Claude's idea of
| a coordinate system was 90 degrees rotated from mine and thus
| it was getting confused because I was confusing it.
| throwawayoldie wrote:
| One of the major findings is that people's perception--that
| is, what it felt like--was incorrect.
| devin wrote:
| It seems really surprising to me that anyone would call 50
| hours of experience a "high skill ceiling".
| keeda wrote:
| _> My personal theory is that getting a significant
| productivity boost from LLM assistance and AI tools has a much
| steeper learning curve than most people expect._
|
| Yes, and I'll add that there is likely no single "golden
| workflow" that works for everybody, and everybody needs to
| figure it out for themselves. It took me _months_ to figure out
| how to be effective with these tools, and I doubt my approach
| will transfer over to others' situations.
|
| For instance, I'm working solo on smallish, research-y projects
| and I had the freedom to structure my code and workflows in a
| way that works best for me and the AI. Briefly: I follow an ad-
| hoc, pair-programming paradigm, fluidly switching between
| manual coding and AI-codegen depending on an instinctive
| evaluation of whether a prompt would be faster. This rapid
| manual-vs-prompt assessment is second nature to me now, but it
| took me a while to build that muscle.
|
| I've not worked with coding agents, but I doubt this approach
| will transfer over well to them.
|
| I've said it before, but this is technology that behaves like
| people, and so you have to approach it like working with a
| colleague, with all their quirks and fallibilities and
| potentially-unbound capabilities, rather than a deterministic,
| single-purpose tool.
|
| I'd love to see a follow-up of the study where they let the
| same developers get more familiar with AI-assisted coding for a
| few months and repeat the experiment.
| Filligree wrote:
| > I've not worked with coding agents, but I doubt this
| approach will transfer over well to them.
|
| Actually, it works well so long as you tell them when you've
| made a change. Claude gets confused if things randomly change
| underneath it, but it has no trouble so long as you give it a
| short explanation.
| ummonk wrote:
| Devil's advocate: it's also possible the one developer hasn't
| become more productive with Cursor, but rather has atrophied
| their non-AI productivity due to becoming reliant on Cursor.
| thesz wrote:
| > My personal theory is that getting a significant productivity
| boost from LLM assistance and AI tools has a much steeper
| learning curve than most people expect.
|
| This is what I heard about strong type systems (especially
| Haskell's) about 15-20 years ago.
|
| "History does not repeat, but it rhymes."
|
| If we rhyme "strong types will change the world" with "agentic
| LLMs will change the world," what do we get?
|
| My personal theory is that we will get the same: some people
| will get modest-to-substantial benefits there, but changes in
| the world will be small if noticeable at all.
| ruszki wrote:
| Maybe it depends on the task. I'm 100% sure that if you
| think a type system is a drawback, then you have never
| coded in a diverse, large codebase. Our 1.5-million-LOC,
| 30-year-old monolith would be completely unmaintainable
| without it. But seriously, anything above 10 LOC without a
| formal type system is unmaintainable after a few years. An
| informal one is fine for a while, but not for long. In a
| 30-year-old codebase, basically every single informal rule
| has been broken.
|
| Also, my long experience is that even in the PoC phase,
| using a type system adds almost zero extra time... of
| course, provided you know the type system, which should be
| trivial in any case after you've seen a few.
| leshow wrote:
| I don't think that's a fair comparison. Type systems don't
| produce probabilistic output. Their entire purpose is to
| reduce the scope of possible errors you can write. They kind
| of did change the world, didn't they? I mean, not everyone is
| writing Haskell but Rust exists and it's doing pretty well.
| There was also not really a case to be made that type
| systems made software in general _worse_. But you could
| definitely make the case that LLMs might make software
| worse.
| atlintots wrote:
| It's too bad the management people never pushed Haskell as
| hard as they're pushing AI today! Alas.
| Aurornis wrote:
| > A quarter of the participants saw increased performance, 3/4
| saw reduced performance.
|
| The study used 246 tasks across 16 developers, for an average
| of 15 tasks per developer. Divide that further in half because
| tasks were assigned as AI or not-AI assisted, and the sample
| size per developer is still relatively small. Someone would
| have to take the time to review the statistics, but I don't
| think this is a case where you can start inferring that the
| developers who benefited from AI were just better at using AI
| tools than those who were not.
|
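| A purely illustrative simulation of that point (all numbers
| are assumed, none come from the study; lognormal task-time
| noise and no true AI effect):
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|     n_devs, n_tasks = 16, 8  # ~8 tasks per condition
|
|     # Task times with sigma = 0.6 lognormal noise and no
|     # real difference between the two conditions.
|     ai = rng.lognormal(0.0, 0.6, (n_devs, n_tasks))
|     no_ai = rng.lognormal(0.0, 0.6, (n_devs, n_tasks))
|
|     # Per-developer "speedup" estimates from so few tasks
|     # scatter widely around the true value of 1.0:
|     est = no_ai.mean(axis=1) / ai.mean(axis=1)
|     print(est.round(2))
|     print((est > 1).sum(), "of", n_devs, "look faster with AI")
|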
| I do agree that it would be interesting to repeat a similar
| test on developers who have more AI tool assistance, but then
| there is a potential confounding effect that AI-enthusiastic
| developers could actually lose some of their practice in
| writing code without the tools.
| th0ma5 wrote:
| Simon's opinion is unsurprisingly that people need to read his
| blog and spam on every story on HN lest we be left behind.
| eightysixfour wrote:
| I have been teaching people at my company how to use AI code
| tools, the learning curve is way worse for developers and I
| have had to come up with some exercises to try and breakthrough
| the curve. Some seemingly can't get it.
|
| The short version is that devs want to give instructions
| instead of asking for the outcome they want. When it doesn't
| follow the instructions, they double down by being more
| precise, the worst thing you can do. When non-devs don't
| get what they want, they add more detail to the description
| of the desired outcome.
|
| Once you get past the control problem, then you have a second
| set of issues for devs where the things that should be easy or
| hard don't necessarily map to their mental model of what is
| easy or hard, so they get frustrated with the LLM when it can't
| do something "easy."
|
| Lastly, devs keep a shit load of context in their head - the
| project, what they are working on, application state, etc. -
| and they need to do that for LLMs too, but you have to
| repeat yourself often and "be" the external memory for the
| LLM. Most devs I have taught hate that; they would rather
| have it the other way around, where they get help with
| context and state but instruct the computer on their own.
|
| Interestingly, the best AI assisted devs have often moved to
| management/solution architecture, and they find the AI code
| tools brought back some of the love of coding. I have a
| hypothesis that they're wired a bit differently and that
| their role with AI tools is actually closer to management
| than it is to development in a number of ways.
| heavyset_go wrote:
| Any "tricks" you learn for one model may not be applicable to
| another, it isn't a given that previous experience with a
| company's product will increase the likelihood of productivity
| increases. When models change out from under you, the
| heuristics you've built up might be useless.
| inetknght wrote:
| > _We pay developers $150 /hr as compensation for their
| participation in the study._
|
| Can someone point me to these 300k/yr jobs?
| recursive wrote:
| These are not 300k/yr jobs.
| akavi wrote:
| L5 ("Senior") at any FAANG co, L6 ("Staff") at pretty much any
| VC-backed startup in the bay.
| nestorD wrote:
| One thing I could not find on a cursory read is how
| accustomed those developers were to AI tools. I would
| expect someone using them regularly to benefit, while
| someone who has only played with them a couple of times
| would likely be slowed down by the friction of learning to
| be productive with the tool.
| uludag wrote:
| In this case though you still wouldn't necessarily know if the
| AI tools had a positive causal effect. For example, I
| practically live in Emacs. Take that away and no doubt I would
| be immensely less effective. That Emacs improves my
| productivity and without it I am much worse in no way implies
| that Emacs is better than the alternatives.
|
| I feel like a proper study for this would involve following
| multiple developers over time, tracking how their contribution
| patterns and social standing changes. For example, take three
| cohorts of relatively new developers: instruct one to go all in
| on agentic development, one to freely use AI tools, and one
| prohibited from AI tools. Then teach these developers open
| source (like a course off of this book:
| https://pragprog.com/titles/a-vbopens/forge-your-future-
| with...) and have them work for a year to become part of a
| project of their choosing. Then in the end, track a number of
| metrics such as leadership position in community, coding/non-
| coding contributions, emotional connection to project, social
| connections made with community, knowledge of code base, etc.
|
| Personally, my prior is that the no-AI group would likely
| still be ahead overall.
| swayvil wrote:
| AI by design can only repeat and recombine past material.
| Therefore actual invention is out.
| elpakal wrote:
| underrated comment
| atleastoptimal wrote:
| HN moment
| luibelgo wrote:
| Is that actually proven?
| greenchair wrote:
| The easiest way to see this for yourself is with an image
| generator. Try asking for a very specific combination of
| things that would not normally appear together in an
| art piece.
| keeda wrote:
| Pretty much all invention is novel combination of known
| techniques. Anything that introduces a fundamental new
| technique is usually in the realm of groundbreaking papers and
| Nobel prizes.
| zzzeek wrote:
| As a project for work, I've been using Claude CLI all week to do
| as many tasks as possible. So with my week's experience, I'm now
| an expert in this subject and can weigh in.
|
| Two things that stand out to me are 1. it depends a lot on what
| kind of task you are having the LLM do, and 2. if the LLM process
| takes more time, _it is very likely your cognitive effort was
| still way less_ - for sysadmin kinds of tasks, working with less
| often accessed systems, LLMs can read --help, man pages, doc
| sites, all for you, and give you the working command right there
| (And then run it, and then look at the output and tell you why it
| failed, or how it worked, and what it did). There is absolutely
| no question that second part is a big deal. Sticking it onto my
| large open source project to fix a deep, esoteric issue or write
| some subtle documentation where it doesn't really "get" what
| I'm doing, yeah it is not as productive in that realm and you
| might
| want to skip it for the thinking part there. I think everyone is
| trying to figure out this question of "when and how" for LLMs. I
| think the sweet spot is for tasks involving systems and
| technologies where you'd otherwise be spending a lot of time
| googling, stackoverflowing, reading man pages to get just the
| right parameters into commands and so forth. This is cognitive
| grunt work and the LLMs can do that part very well.
|
| My week of effort with it was not really "coding on my open
| source project"; two examples were, 1. running a bunch of ansible
| playbooks that I wrote years ago on a new host, where OS upgrades
| had lots of snags; I worked with Claude to debug all the various
| error messages and places where the newer OS distribution had
| different packages, missing packages, etc. It was
| ENORMOUSLY helpful since I never look at these playbooks
| and I don't even
| remember what I did, Claude can read it for you and interpret it
| as well as you can. 2. I got a bugzilla for a fedora package that
| I packaged years ago, where they have some change to the
| directives used in specfiles that everyone has to make. I look at
| fedora packaging workflows once every three years. I told Claude
| to read the BZ and just do it. IT DID IT. I had to get involved
| running the "mock" suite as it needed sudo but Claude gave me the
| commands. _zero googling_. _zero even reading the new format of
| the specfile_ (the bz linked to a tool that does the conversion).
| From bug received to bug closed and I didnt do any typing at all
| outside of the prompt. Had it done before breakfast since I didnt
| even need any glucose for mental energy expended. This would have
| been a painful and frustrating mental effort otherwise.
|
| so the studies have to get more nuanced and survey a lot more
| than 16 devs I think
| geerlingguy wrote:
| So far in my own hobby OSS projects, AI has only hampered things
| as code generation/scaffolding is probably the least of my
| concerns, whereas code review, community wrangling, etc. are more
| impactful. And AI tooling can only do so much.
|
| But it's hampered me in the fact that others, uninvited, toss an
| AI code review tool at some of my open PRs, and that spits out a
| 2-page document with cute emoji and formatted bullet points going
| over all aspects of a 30 line PR.
|
| Just adds to the noise, so now I spend time deleting or hiding
| those comments in PRs, which means I have even _less_ time for
| actual useful maintenance work. (Not that I have much already.)
| heisenbit wrote:
| AI sometimes points out hygiene issues that may be swept under
| the carpet but once pointed out can't be ignored anymore. I know
| I don't need that error handling, I'm certain for the near future
| but maybe it is needed... Also the code produced by the AI has
| some impedance mismatch with my natural code. Then one needs to
| figure out whether that is due to moving best practices, until
| now ignored best practices or the AI being overwhelmed with code
| from beginners. This all takes time - some of it is transient,
| some of it is actually improving things and some of it is waste.
| The jury is still out there.
| ChrisMarshallNY wrote:
| It's been _very_ helpful for me. I find ChatGPT the easiest to
| use; not because it's more accurate (it isn't), but because it
| seems to understand the intent of my questions most clearly. I
| don't usually have to iterate much.
|
| I use it like a know-it-all personal assistant that I can ask any
| question to; even [especially] the embarrassing, "stupid" ones.
|
| _> The only stupid question is the one we don't ask.
|
| - On an old art teacher's wall_
| 0xmusubi wrote:
| I find myself having discussions with AI about different design
| possibilities and it sometimes comes up with ideas I hadn't
| thought of or features I wasn't aware of. I wouldn't classify
| this as "overuse" as I often find the discussions useful, even if
| it's just to get my thoughts down. This might be more relevant
| for larger scoped tasks or ones where the programmer isn't as
| familiar with certain features or libraries though.
| groos wrote:
| One thing I've experienced in trying to use LLMs to code in an
| existing large code base is that it's _extremely_ hard to
| accurately describe what you want to do. Oftentimes, you are
| working on a problem with a web of interactions all over the code
| and describing the problem to an LLM will take far longer than
| just doing it manually. This is not the case with generating new
| (boilerplate) code for projects, which is where users report the
| most favorable interaction with LLMs.
| 9dev wrote:
| That's my experience as well. It's where Knuth comes in again:
| the program doesn't just live in the code, but also in the
| minds of its creators. Unless I communicate all that context
| from the start, I can't just dump years of concepts and
| strategy out of my brain into the LLM without missing details
| that would be relevant.
| AvAn12 wrote:
| N = 16 developers. Is this enough to draw any meaningful
| conclusions?
| sarchertech wrote:
| That depends on the size of the effect you're trying to
| measure. If Cursor provides a 5x, 10x, or 100x productivity
| boost, as many people are claiming, you'd expect to see
| that in a sample size of 16 unless there's something
| seriously wrong with your sample selection.
|
| If you are looking for a 0.1% increase in productivity, then 16
| is too small.
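|
| A back-of-envelope version of that argument (a sketch;
| suppose each of the 16 developers independently comes out
| faster with AI with probability q, and we observed 4 of 16):
|
|     from scipy.stats import binom
|
|     # If the tools helped nearly everyone (q = 0.9), seeing
|     # at most 4 of 16 devs speed up would be essentially
|     # impossible:
|     print(binom.cdf(4, 16, 0.9))  # ~1.2e-09
|
|     # Even under "no effect" (q = 0.5) it is fairly unlikely:
|     print(binom.cdf(4, 16, 0.5))  # ~0.038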
| biophysboy wrote:
| Well it depends on the variance of the random variable
| itself. You're right that with big, obvious effects, a larger
| n is less "necessary". I could see individuals having very
| different "productivities", especially when the idea is
| flattened down to completion time.
| AvAn12 wrote:
| "A quarter of the participants saw increased performance, 3/4
| saw reduced performance." So I think any conclusions drawn on
| these 16 people don't signify much one way or the other.
| Cool paper but how is this anything other than a null
| finding?
| atleastoptimal wrote:
| I'm not surprised that AI doesn't help people with 5+ years
| experience in open source contribution, but I'd imagine most
| people aren't claiming AI tools are at senior engineer level yet.
|
| Soon, once the tools and how people use them improve, AI
| won't be a hindrance for advanced tasks like this, and soon
| after, AI will be able to do these PRs on its own. It's
| inevitable given the rate of improvement even since this
| study.
| artee_49 wrote:
| Even for senior levels the claim has been that AI will speed up
| their coding (take it over) so they can focus on higher level
| decisions and abstract-level concepts. These contributions
| are not those, and based on prior predictions, productivity
| should have gone up.
| pera wrote:
| Wow these are extremely interesting results, especially this part:
|
| > _This gap between perception and reality is striking:
| developers expected AI to speed them up by 24%, and even after
| experiencing the slowdown, they still believed AI had sped them
| up by 20%._
|
| I wonder what could explain such large difference between
| estimation/experience vs reality, any ideas?
|
| Maybe our brains are measuring mental effort and distorting our
| experience of time?
| alfalfasprout wrote:
| I would speculate that it's because there's been a huge
| concerted effort to make people want to believe that these
| tools are better than they are.
|
| The "economic experts" and "ml experts" are in many cases
| effectively the same group-- companies pushing AI coding tools
| have a vested interest in people believing they're more useful
| than they are. Executives take this at face value and broadly
| promise major wins. Economic experts take this at face value
| and use this for their forecasts.
|
| This propagates further, and now novices and casual individuals
| begin to believe in the hype. Eventually, for an experienced
| engineer, it moves the "baseline" expectation much higher.
|
| Unfortunately this is very difficult to capture empirically.
| longwave wrote:
| I also wonder how many of the numerous AI proponents in HN
| comments are subject to the same effect. Unless they are truly
| measuring their own performance, is AI really making them more
| productive?
| chamomeal wrote:
| It's funny cause I sometimes have the opposite experience. I
| tried to use Claude Code today to make a demo app to show off a
| small library I'm working on. I needed it to set up some very
| boilerplatey example app stuff.
|
| It was fun to watch, it's super polished and sci-fi-esque. But
| after 15 minutes I felt braindead and was bored out of my mind
| lol
| evanelias wrote:
| Here's a scary thought, which I'm admittedly basing on
| absolutely nothing scientific:
|
| What if agentic coding sessions are triggering a similar
| dopamine feedback loop as social media apps? Obviously not to
| the same degree as social media apps, I mean coding for work is
| still "work"... but there's maybe some similarity in getting
| iterative solutions from the agent, triggering something in
| your brain each time, yes?
|
| If that was the case, wouldn't we expect developers to have an
| overly positive perception of AI because they're literally
| becoming addicted to it?
| EarthLaunch wrote:
| > The LLMentalist Effect: how chat-based Large Language
| Models replicate the mechanisms of a psychic's con
|
| https://softwarecrisis.dev/letters/llmentalist/
|
| Plus there's a gambling mechanic: Push the button, sometimes
| get things for free.
| csherratt wrote:
| That's my suspicion too.
|
| My issue with this being a 'negative' thing is that I'm not
| sure it is. It works off of the same hunting / foraging
| instincts that keep us alive. If you feel addiction to
| something positive, is it bad?
|
| Social media is negative because it addicts you to mostly low
| quality filler content. Content that doesn't challenge you.
| You are reading shit posts instead of reading a book or
| doing something better for you in the long run.
|
| One could argue that's true for AI, but I'm not confident
| enough to make such a statement.
| evanelias wrote:
| The study found AI caused a "significant slowdown" in
| developer efficiency though, so that doesn't seem positive!
| afro88 wrote:
| Early 2025. I imagine the results would be quite different with
| mid 2025 models and tools.
| gmaster1440 wrote:
| What if the slowdown isn't a bug but a feature? What if AI tools
| are forcing developers to think more carefully about their code,
| making them slower but potentially producing better results?
| AFAIK the study measured speed, not quality, maintainability, or
| correctness.
|
| The developers might feel more productive because they're
| engaging with their code at a higher level of abstraction, even
| if it takes longer. This would be consistent with why they
| maintained positive perceptions despite the slowdown.
| PessimalDecimal wrote:
| In my experience, LLMs are not causing people to think more
| carefully about their code.
| doctoboggan wrote:
| For me, the measurable gain in productivity comes when I am
| working with a new language or new technology. If I were to
| use Claude Code to help implement a feature of a Python
| library I've worked on for years, then I don't think it
| would help much (maybe even hurt). However, if I use Claude
| Code on some Go code I have very little experience with, or
| use it to write/modify Helm charts, then I can definitely
| say it speeds me up.
|
| But, taking a broader view, it's possible that these
| initial speed-ups are negated by the fact that I never
| really learn Go or Helm charts as deeply now that I use
| Claude Code. Over time, it's possible that my net
| productivity is still reduced. Hard to say for sure,
| especially considering I might not have even considered
| tackling those more difficult Go library modifications if I
| didn't have Claude Code to hold my hand.
|
| Regardless, these tools are out there, increasing in
| effectiveness and I do feel like I need to jump on the train
| before it leaves me at the station.
| LegNeato wrote:
| For certain tasks it can speed me up 30x compared to an expert in
| the space: https://rust-gpu.github.io/blog/2025/06/24/vulkan-
| shader-por...
| lpghatguy wrote:
| This is very disingenuous: we don't know how much spare time
| Sascha spent, and much of that time was likely spent learning,
| experimenting, and reporting issues to Slang.
| _jayhack_ wrote:
| This does not take into account the fact that experienced
| developers working with AI have shifted into roles of management
| and triage, working on several tasks simultaneously.
|
| Would be interesting (and in fact necessary to derive conclusions
| from this study) to see aggregate number of tasks completed per
| developer with AI augmentation. That is, if time per task has
| gone up by 20% but we clear 2x as many tasks, that is a pretty
| important caveat to the results published here.
| isoprophlex wrote:
| Ed Zitron was 100% right. The mask is off and the AI subprime
| crisis is coming. Reading TFA, it would be hilarious if the
| bubble burst AND it turns out there's actually no value to be
| had, at ANY price. I for one can't wait for this era of hype to
| end. We'll see.
|
| _you 're addicted to the FEELING of productivity more than
| actual productivity. even knowing this, even seeing the data,
| even acknowledging the complete fuckery of it all, you're still
| gonna use me. i'm still gonna exist. you're all still gonna
| pretend this helps because the alternative is admitting you spent
| billions of dollars on spicy autocomplete._
| keerthiko wrote:
| IME AI coding is excellent for one-off scripts, personal
| automation tooling (I iterate on a tool to scrape receipts and
| submit expenses for my specific needs) and generally stuff that
| can be run in environments where the creator and the end user are
| effectively the same (and only) entity.
|
| Scaled up slightly, we use it to build plenty of internal tooling
| in our video content production pipeline (syncing between
| encoding tools and a status dashboard for our non-technical
| content team).
|
| Using it for anything more than boilerplate code, well-defined
| but tedious refactors, or quickly demonstrating how to use an
| unfamiliar API in production code, before a human takes a full
| pass at everything is something I'm going to be wary of for a
| long time.
| mrwaffle wrote:
| My overall concern has to do with our developer ecosystem,
| given the important points mentioned by simonw and narush.
| I've been concerned about this for years, but AI reliance
| seems to be
| pouring jet fuel on the fire. Particularly troubling is the lack
| of understanding less-experienced devs will have over time. Does
| anyone have a counter-argument for this they can share on why
| this is a good thing?
| partdavid wrote:
| The shallow analogy is like "why worry about not being able to
| do arithmetic without a calculator"? Like... the dev of the
| future just won't need it.
|
| I feel like programming has become increasingly specialized and
| even before the AI tool explosion, it's way more possible to be
| ignorant of an enormous amount of "computing" than it used to
| be. I feel like a lot of "full stack" developers only
| understand things to the margin of their frameworks but above
| and below it they kind of barely know how a computer works or
| what different wire protocols actually are or what an OS might
| actually _do_ at a lower level. Let alone the context in which
| an application sits beyond, let's say, a level above a
| Kubernetes pod and a trial-and-error approach to poking at some
| YAML templates.
|
| Do we all need to know about processor architectures and
| microcode and L2 caches and paging and OS distributions and
| system software and installers and openssl engines and how to
| make sure you have the one that uses native instructions and
| TCP packets and envoy and controllers and raft systems and
| topic partitions and cloud IAM and CDN and DNS? Since that's
| not the case--nearly everyone has vast areas of ignorance yet
| still does a bunch of stuff--it's harder to sell the idea that
| whatever skills we lose to AI tools will somehow matter in the
| future.
|
| I kind of miss when you had to know a little of everything and
| it also seemed like "a little bit" was a bigger slice of what
| there was to know. Now you talk to people who use a different
| framework in your own language and you feel like you're talking
| to deep specialists whose concerns you can barely understand
| the existence of, let alone have an opinion on.
| OpenSourceWard wrote:
| Very cool work! And I love the nuance in your methodology and
| findings. Anyway, I'm preparing myself for all the "Bombshell
| news: AI is slowing down developers" posts that are coming.
| asciimov wrote:
| I'll be curious about the long-term impacts of AI.
|
| Such as: do you end up spending more time finding and fixing
| issues, does AI use reduce institutional knowledge, and will you
| be more inclined to start projects over from scratch?
| lmeyerov wrote:
| As someone who has been doing hardcore genai for 2+ years, my
| experience has been, and what we advise internally:
|
| * 3 weeks to transition from AI pairing to AI delegation to AI
| multitasking. So work gains are mostly week 3+. That's 120+ hours
| in, as someone pretty senior here.
|
| * Speedup is the wrong metric. Think throughput, not latency.
| Some finite amount of work might take longer, but the volume of
| work should go up because AI can do more on a task and handle
| different tasks/projects in parallel. (For example, if each task
| takes 20% longer but three proceed in parallel, throughput is
| still 2.5x.)
|
| Both perspectives seem consistent with the paper description...
| ieie3366 wrote:
| LLMs are godtier if you know what you're doing and prompt them
| with "do X", where X is a SELF-CONTAINED change you would know
| how to implement manually.
|
| For example, today I asked Claude to implement per-user rate
| limiting in my nestjs service, then iterated by asking for
| specific unit tests and some refactoring. It one-shotted
| everything. I would say 90% time savings.
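|
| (To make "self-contained" concrete, here is a minimal sketch of
| what such a per-user rate-limiting change might look like,
| assuming @nestjs/throttler's guard API; the guard name and the
| req.user shape are illustrative, not the commenter's actual
| code.)
|
|       import { Injectable } from '@nestjs/common';
|       import { ThrottlerGuard } from '@nestjs/throttler';
|
|       // Key rate-limit buckets by user id rather than client
|       // IP, falling back to IP for unauthenticated requests
|       // (assumes the auth layer populates req.user).
|       @Injectable()
|       export class PerUserThrottlerGuard extends ThrottlerGuard {
|         protected async getTracker(
|           req: Record<string, any>,
|         ): Promise<string> {
|           return req.user?.id ?? req.ip;
|         }
|       }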
|
| Unskilled people ask them "i have giant problem X solve it" and
| end up with slop
| thepasswordis wrote:
| I actually think that pasting questions into ChatGPT etc. and
| then getting general answers to put into your code is the way.
|
| "One-shotting" apps, or even Cursor and so forth, seems like a
| waste of time. It feels like if you prompt it _just right_ it
| might help, but then it never really does.
| partdavid wrote:
| I've done okay with copilot as a very smart autocomplete on: a)
| very typical codebase, with b) lots of boilerplate, where c)
| I'm not terribly familiar with the languages and frameworks,
| which are d) very, very popular but e) I don't really like, so
| I'm not particularly motivated to become familiar with them.
| I'm not a frontend developer, I don't like it, but I'm in a
| position now where I need to do frontend things with a verbose
| TypeScript/React application that is not interesting from a
| technical point of view (a good product, just not one whose
| value comes from an interesting or demanding front end). Copilot
| (I use Emacs, so Cursor is a non-starter, but copilot-mode
| works very well for TypeScript) has been pretty invaluable for
| just sort of slogging through stuff.
|
| For everything else, I think you're right, and actually the
| dialog-oriented method is way better. If I learn an approach
| and apply some general example from ChatGPT, but I do the
| typing and implementation myself so I need to understand what
| I'm doing, I'm actually leveling up and I know what I'm
| finished with. If I weren't "experienced", I'd worry about what
| it was doing to my critical thinking skills, but I know enough
| about learning on my own at this point to know I'm doing
| something.
|
| I'm not interested in vibe coding at all--it seems like a one-
| way process to automate what was already not the hard part of
| software engineering: generating tutorial-level initial
| implementations. Just more scaffolding that eventually needs to
| be cleared away.
| thesz wrote:
| What is interesting here is that all predictions were positive,
| but the results were negative.
|
| This suggests that everyone in the study (economic experts, ML
| experts, and even the developers themselves, even after gaining
| experience) was a novice if we look at them from the Dunning-
| Kruger effect [1] perspective.
|
| [1] https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
|
| "The Dunning-Kruger effect is a cognitive bias in which people
| with limited competence in a particular domain overestimate their
| abilities."
| mattl wrote:
| I don't understand how anyone doing open source can use something
| trained on other people's code as a tool for contributions.
|
| I wouldn't accept someone's copy-and-pasted code from another
| project if it were under an incompatible license, let alone
| something with unknown origin.
| AIorNot wrote:
| Hey guys, why are we making it so complicated? Do we really need
| a paper and study?
|
| Anyway -- AI as the tech currently stands is a new skill to use,
| and it takes us humans time to learn, but once we learn it well,
| it becomes a force multiplier.
|
| i.e. see this:
| https://claude.ai/public/artifacts/221821f0-0677-409b-8294-3...
| tarofchaos wrote:
| Totally flawed study
| bit1993 wrote:
| It used to be that all you needed to program was a computer and
| the willingness to RTFM, but now we need to pay for API "tokens"
| and pray that there is no rug pull in the future.
| cadamsdotcom wrote:
| My hot take: Cursor is a bad tool for agentic coding. Had a
| subscription and canceled it in favor of Claude Code. I don't
| want to spend 90% of my time approving every line the agent wants
| to write. With Claude Code I review whole diffs - 1-2 minutes of
| the agent's work at a time. Then I work with the agent at the
| level of its overall approach, almost never asking about
| specific lines of code. I can look at 5 files at once in git
| diff and then ask
| "why'd you choose that way?" "Can we undo that and try to find a
| simpler way?"
|
| Cursor's workflow exposes how differently people track context.
| The best ways to work with Cursor may simply not work
| for some of us.
|
| If Cursor isn't working for you, I strongly encourage you to try
| CLI agents like Claude Code.
| codyb wrote:
| So it's slow until the learning curve is climbed (or, as one
| user posited, "until you forget how to work without it").
|
| But isn't the important thing to measure... how long does it take
| to debug the resulting code at 3AM when you get a PagerDuty
| alert?
|
| Similarly... how about the quality of this code over time? It's
| taken a lot of effort to bring some of the code bases I work in
| into a more portable, less coupled, more concise state through
| the hard work of
|
| - bringing shared business logic up into shared folders
|
| - working to ensure call chains flow top down towards root then
| back up through exposed APIs from other modules as opposed to
| criss-crossing through the directory structure
|
| - working to separate business logic from API logic from display
| logic
|
| - working to provide encapsulation through the use of wrapper
| functions, creating portability
|
| - using techniques like dependency injection to decouple
| concepts, allowing for easier testing (see the sketch after
| this list)
|
| etc.
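|
| As a sketch of the dependency-injection point above (TypeScript,
| with purely illustrative names): injecting a dependency through
| the constructor lets a test substitute a fake without mocks or
| global state.
|
|       // Depend on an interface, not a concrete implementation.
|       interface Clock {
|         now(): Date;
|       }
|
|       class ReportService {
|         constructor(private readonly clock: Clock) {}
|
|         header(): string {
|           return `Report at ${this.clock.now().toISOString()}`;
|         }
|       }
|
|       // In tests, inject a deterministic fake:
|       const fixedClock: Clock = {
|         now: () => new Date('2025-07-10T00:00:00Z'),
|       };
|       const svc = new ReportService(fixedClock);
|       // svc.header() is now fully deterministic to assert on.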
|
| So, do we end up with better code quality -- code that is more
| maintainable, extensible, portable, and composable? Or do we just
| end up with lots of poor-quality code that eventually grows into
| a tangled mess we spend 50% of our time fighting bugs in?
___________________________________________________________________
(page generated 2025-07-10 23:00 UTC)