[HN Gopher] A look at Cloudflare's AI-coded OAuth library
       ___________________________________________________________________
        
       A look at Cloudflare's AI-coded OAuth library
        
       Author : itsadok
       Score  : 230 points
       Date   : 2025-06-08 08:50 UTC (14 hours ago)
        
 (HTM) web link (neilmadden.blog)
 (TXT) w3m dump (neilmadden.blog)
        
       | CuriouslyC wrote:
        | Mostly a good writeup, but I think there's some serious
        | shifting of the goalposts, in a disingenuous way, around what
        | "vibe coded" means towards the end:
       | 
       | 'Yes, this does come across as a bit "vibe-coded", despite what
       | the README says, but so does a lot of code I see written by
       | humans. LLM or not, we have to give a shit.'
       | 
       | If what most people do is "vibe coding" in general, the current
       | definition of vibe coding is essentially meaningless. Instead,
       | the author is making the distinction between "interim workable"
       | and "stainless/battle tested" which is another dimension of code
       | entirely. To describe that as vibe coding causes me to view the
       | author's intent with suspicion.
        
         | croes wrote:
         | Isn't vibe coding just C&P from AI instead of Stack Overflow?
         | 
         | I read it as: done by AI but not checked by humans.
        
           | ranguna wrote:
            | Yep, I see it like that as well: code with zero or very
            | close to zero interaction from humans. Anyone who wants to
            | change that meaning is not serious.
        
         | techpression wrote:
         | I find "vibe coding" to be one of the, if not the, concepts in
         | this business to lose its meaning the fastest. Similar to how
         | everything all of a sudden was "cloud" now everything is "vibe
         | coded", even though reading the original tweet really narrows
         | it down thoroughly.
        
           | dimitri-vs wrote:
           | IMO it's pretty clear what vibe coding is: you don't look at
            | the code, only the results. If you're making judgements
            | about the code, it's not vibe coding.
        
           | keybored wrote:
           | Viral marketing campaign term losing its meaning makes sense.
        
         | simonw wrote:
         | How do you define vibe coding?
        
       | SiempreViernes wrote:
        | A very good piece that clearly illustrates one of the dangers
        | with LLMs: responsibility for code quality is blindly
        | offloaded onto the automatic system
       | 
       | > There are some tests, and they are OK, but they are woefully
       | inadequate for what I would expect of a critical auth service.
       | Testing every MUST and MUST NOT in the spec is a bare minimum,
       | not to mention as many abuse cases as you can think of, but none
       | of that is here from what I can see: just basic functionality
       | tests.
       | 
       | and
       | 
       | > There are some odd choices in the code, and things that lead me
       | to believe that the people involved are not actually familiar
       | with the OAuth specs at all. For example, this commit adds
       | support for public clients, but does so by implementing the
       | deprecated "implicit" grant (removed in OAuth 2.1).
       | 
       | As Madden concludes "LLM or not, we have to give a shit."
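        | 
        | To make the "every MUST" point concrete: RFC 6749 section
        | 4.1.2 says that if an authorization code is used more than
        | once, the server MUST deny the request. A sketch of what that
        | looks like as a test (tokenRequest and the credential values
        | are hypothetical, not the library's actual API):
        | 
        |     import { test } from "node:test";
        |     import assert from "node:assert";
        | 
        |     test("authorization code replay is rejected", async () => {
        |       // Values come from a prior /authorize round trip
        |       const params = {
        |         grant_type: "authorization_code",
        |         code: authCode,
        |         redirect_uri: redirectUri,
        |         client_id: clientId,
        |         client_secret: clientSecret,
        |       };
        |       const first = await tokenRequest(params); // hypothetical
        |       assert.equal(first.status, 200);
        |       // Replaying the same code MUST fail (invalid_grant)
        |       const replay = await tokenRequest(params);
        |       assert.equal(replay.status, 400);
        |     });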
        
         | JimDabell wrote:
          | > A very good piece that clearly illustrates one of the dangers
          | with LLMs: responsibility for code quality is blindly
          | offloaded onto the automatic system
         | 
         | It does not illustrate that at all.
         | 
         | > Claude's output was thoroughly reviewed by Cloudflare
         | engineers with careful attention paid to security and
         | compliance with standards.
         | 
         | > To emphasize, *this is not "vibe coded"*. Every line was
         | thoroughly reviewed and cross-referenced with relevant RFCs, by
         | security experts with previous experience with those RFCs.
         | 
         | -- https://github.com/cloudflare/workers-oauth-provider
         | 
         | The humans who worked on it very, very clearly took
         | responsibility for code quality. That they didn't get it 100%
         | right does _not_ mean that they "blindly offloaded
         | responsibility".
         | 
         | Perhaps you can level that accusation at _other_ people doing
         | _different_ things, but Cloudflare explicitly placed the
         | responsibility for this on the humans.
        
       | djoldman wrote:
       | > At ForgeRock, we had hundreds of security bugs in our OAuth
       | implementation, and that was despite having 100s of thousands of
       | automated tests run on every commit, threat modelling, top-flight
       | SAST/DAST, and extremely careful security review by experts.
       | 
       | Wow. Anecdotally it's my understanding that OAuth is ... tricky
       | ... but wow.
       | 
       | Some would say it's a dumpster fire. I've never read the spec or
       | implemented it.
        
         | bandoti wrote:
         | Honestly, new code always has bugs though. That's pretty much a
         | guarantee--especially if it's somewhat complex.
         | 
         | That's why companies go for things that are "battle tested"
         | like vibe coding. ;)
         | 
         | Joke aside--I like how Anthropic is using their own product in
         | a pragmatic fashion. I'm wondering if they'll use it for their
         | MCP authentication API.
        
         | stuaxo wrote:
         | The times I've been involved with implementations it's been
         | really horrible.
        
         | jajko wrote:
          | Hundreds of thousands of tests? That sounds like quantity
          | over quality, or outright LLM-generated tests. Who even
          | maintains them?
        
           | nmadden wrote:
           | This was before LLMs. It was a combination of unit and end-
           | to-end tests and tests written to comprehensively test every
           | combination of parameters (eg test this security property
           | holds for every single JWT algorithm we support etc). Also
           | bear in mind that the product did a lot more than just OAuth.
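            | 
            | (Those combination tests had roughly this shape; TypeScript
            | here just for illustration, with invented helper names
            | rather than the actual code:
            | 
            |     import { test } from "node:test";
            |     import assert from "node:assert";
            | 
            |     const claims = { sub: "alice" };
            |     for (const alg of ["HS256", "RS256", "ES256", "PS256"]) {
            |       test(`tampered ${alg} JWT is rejected`, async () => {
            |         const jwt = await signToken(claims, alg);
            |         const forged = tamperWithPayload(jwt);
            |         await assert.rejects(verifyToken(forged, alg));
            |       });
            |     }
            | 
            | ...repeated for every algorithm and security property in
            | the matrix.)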
        
         | jofzar wrote:
          | OAuth is so annoying; there are so many niche details to it.
        
       | kcatskcolbdi wrote:
       | Really interesting breakdown. What jumped out to me wasn't just
       | the bugs (CORS wide open, incorrect Basic auth, weak token
       | randomness), but how much the human devs seemed to lean on
        | Claude's output even when it was clearly off base. That "implicit
       | grant for public clients" bit is wild; it's deprecated in OAuth
       | 2.1, and Claude just tossed it in like it was fine, and then it
       | stuck.
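        | 
        | For anyone unfamiliar: OAuth 2.1 drops the implicit grant and
        | expects public clients to use the authorization code flow
        | with PKCE (RFC 7636) instead. Roughly, the client side of
        | PKCE looks like this (my own illustration, not code from the
        | library):
        | 
        |     // Send base64url(SHA-256(verifier)) as the code_challenge;
        |     // the token endpoint later checks the verifier against it,
        |     // so a stolen authorization code alone is useless.
        |     function base64url(bytes: Uint8Array): string {
        |       return btoa(String.fromCharCode(...bytes))
        |         .replace(/\+/g, "-").replace(/\//g, "_")
        |         .replace(/=+$/, "");
        |     }
        | 
        |     const verifier =
        |       base64url(crypto.getRandomValues(new Uint8Array(32)));
        |     const digest = await crypto.subtle.digest("SHA-256",
        |       new TextEncoder().encode(verifier));
        |     const challenge = base64url(new Uint8Array(digest));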
        
         | kentonv wrote:
         | I put in the implicit grant because someone requested it. I had
         | it flagged off by default because it's deprecated.
        
       | HocusLocus wrote:
        | I suggest they freeze a branch of it, then spawn some AIs to
        | introduce and attempt to hide vulnerabilities, and others to
        | spot and fix them. Every commit is a move; the idea is to
        | model the human evolution of chess.
        
       | sdan wrote:
       | > Another hint that this is not written by people familiar with
       | OAuth is that they have implemented Basic auth support
       | incorrectly.
       | 
        | so tl;dr: most of the author's issues are with the design
        | choices made by the library's author, not the implementation?
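        | 
        | For context, the Basic auth detail is easy to miss: RFC 6749
        | section 2.3.1 form-urlencodes the client id and secret before
        | they are joined with ":" and base64d, and a secret may itself
        | contain ":". A hedged sketch of spec-correct parsing (my
        | illustration, not the library's code):
        | 
        |     function parseClientBasicAuth(header: string) {
        |       const decoded = atob(header.replace(/^Basic\s+/i, ""));
        |       // Split on the FIRST colon only; secrets may contain ":"
        |       const i = decoded.indexOf(":");
        |       if (i < 0) throw new Error("malformed Basic credentials");
        |       return {
        |         // Undo the percent-encoding the spec requires of clients
        |         clientId: decodeURIComponent(decoded.slice(0, i)),
        |         clientSecret: decodeURIComponent(decoded.slice(i + 1)),
        |       };
        |     }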
        
       | belter wrote:
       | "...A more serious bug is that the code that generates token IDs
       | is not sound: it generates biased output. This is a classic bug
       | when people naively try to generate random strings, and the LLM
       | spat it out in the very first commit as far as I can see. I don't
       | think it's exploitable: it reduces the entropy of the tokens, but
       | not far enough to be brute-forceable. But it somewhat gives the
       | lie to the idea that experienced security professionals reviewed
       | every line of AI-generated code...."
       | 
       | In the Github repo Cloudflare says:
       | 
       | "...Claude's output was thoroughly reviewed by Cloudflare
       | engineers with careful attention paid to security and compliance
       | with standards..."
       | 
       | My conclusion is that as a development team, they learned little
       | since 2017: https://news.ycombinator.com/item?id=13718752
        
         | chrismorgan wrote:
          | Admittedly I _have_ done some cryptographic string generation
          | based on different alphabet sizes and characteristics a few
          | years ago, which is pretty specifically relevant, and for a
          | layman I'm competent in cryptographic and security concerns,
          | but I certainly hope security reviewers will be more skilled
          | at these things than me.
         | 
         | I'm very confident I would have noticed this bias in a first
         | pass of reviewing the code. The very first thing you do in a
         | security review is look at where you use `crypto`, what its
         | inputs are, and what you do with its outputs, _very_ carefully.
         | On seeing that %, I would have checked characters.length and
         | found it to be 62, not a factor of 256; so you _need_ to mess
         | around with base conversion, or change the alphabet, or some
         | other such trick.
         | 
         | This bothers me and makes me lose confidence in the review
         | performed.
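          | 
          | Concretely, the pattern in question looks like this (a
          | minimal sketch of the general bug and fix, not the library's
          | actual code):
          | 
          |     const chars =
          |       "ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
          |       "abcdefghijklmnopqrstuvwxyz0123456789"; // 62 chars
          | 
          |     // Biased: 256 % 62 = 8, so bytes 248..255 wrap around
          |     // and the first 8 characters come up 5/256 of the time
          |     // instead of 4/256.
          |     function biasedToken(len: number): string {
          |       const bytes = crypto.getRandomValues(new Uint8Array(len));
          |       return [...bytes].map((b) => chars[b % chars.length])
          |         .join("");
          |     }
          | 
          |     // Unbiased via rejection sampling: discard bytes >= 248
          |     // (the largest multiple of 62 below 256) and redraw.
          |     function unbiasedToken(len: number): string {
          |       const limit = 256 - (256 % chars.length); // 248
          |       let out = "";
          |       while (out.length < len) {
          |         const draw = crypto.getRandomValues(new Uint8Array(len));
          |         for (const b of draw) {
          |           if (b < limit && out.length < len) {
          |             out += chars[b % chars.length];
          |           }
          |         }
          |       }
          |       return out;
          |     }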
        
           | thegeomaster wrote:
           | But... is it a real problem? As the author says, the entropy
           | reduction is tiny.
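            | 
            | Back of the envelope: 256 mod 62 = 8, so 8 of the 62
            | characters land with probability 5/256 and the other 54
            | with 4/256. Per-character entropy drops from log2(62) ~
            | 5.954 bits to ~5.950 bits, a loss of roughly 0.005 bits
            | per character, well under one bit even for long tokens.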
        
             | yusina wrote:
              | It shows carelessness or incompetence, or a combination
              | thereof, which extends to the entire code base.
        
       | afro88 wrote:
       | > What this interaction shows is how much knowledge you need to
       | bring when you interact with an LLM. The "one big flaw" Claude
       | produced in the middle would probably not have been spotted by
       | someone less experienced with crypto code than this engineer
       | obviously is. And likewise, many people would probably not have
       | questioned the weird choice to move to PBKDF2 as a response
       | 
        | For me this is the key takeaway. You gain proper efficiency
        | using LLMs when you are a competent reviewer and, for lack of
        | a better word, leader. If you don't know the subject matter as
        | well as the LLM, you had better be doing something
        | non-critical, or have the time to distrust it and verify
        | everything.
        
         | marcusb wrote:
         | I'm puzzled when I hear people say 'oh, I only use LLMs for
         | things I don't understand well. If I'm an expert, I'd rather do
         | it myself.'
         | 
         | In addition to the ability to review output effectively, I find
         | the more closely I'm able to describe what I want in the way
         | another expert in that domain would, the better the LLM output.
         | Which isn't really that surprising for a statistical text
         | generation engine.
        
           | _heimdall wrote:
            | That's what we do with most other things in our lives
            | though: we outsource. Why would it be different with LLMs?
           | 
           | People don't learn how a car works before buying one, they
           | just take it to a mechanic when it breaks. Most people don't
           | know how to build a house, they have someone else build it
           | and assume it was done well.
           | 
           | I fully expect people to similarly have LLMs do what the
           | person doesn't know how and assume the machine knew what to
           | do.
        
             | marcusb wrote:
             | > why would it be different with LLMs?
             | 
             | Because LLMs are not competent professionals to whom you
             | might outsource tasks in your life. LLMs are statistical
             | engines that make up answers all the time, even when the
             | LLM "knows" the correct answer (i.e., has the correct
             | answer hidden away in its weights.)
             | 
             | I don't know about you, but I'm able to validate something
             | is true much more quickly and efficiently if it is a
             | subject I know well.
        
               | _heimdall wrote:
               | > competent professionals
               | 
               | That requires a lot of clarity and definition if you want
               | to claim that LLMs aren't competent professionals. I
               | assume we'd ultimately agree that LLMs aren't, but I'd
               | add that many humans paid for a task aren't competent
               | professionals either and, more importantly, that I can't
               | distinguish the competent professionals from others
               | without myself being competent enough in the topic.
               | 
               | My point was that people have a long history of
               | outsourcing to someone else, often to someone they have
               | never met and never will. We do it for things that we
               | have no real idea about and trust that the person doing
               | it must have known what they were doing. I fully expect
               | people to end up taking the same view of LLMs.
        
               | marcusb wrote:
               | We also have a lot of systems (references, the tort
               | system) that just don't apply in any practical way to LLM
               | output. I mean, I guess you could try to sue Anthropic or
               | OpenAI if their chat bot gives you bad advice, but...
               | good luck with that. The closest thing I can think of is
               | benchmark performance. But I trust those numbers a lot
               | less than I would trust a reference from a friend for,
               | say, a plumber.
               | 
               | I understand a lot of people use LLMs for things they
               | don't understand well. I just don't think that is the
               | best way to get productivity out of these tools right
               | now. Regardless of how people may or may not be used to
               | outsourcing things to other humans.
        
               | _heimdall wrote:
               | > I just don't think that is the best way to get
               | productivity out of these tools right now.
               | 
               | Well that I completely agree with. I don't think people
               | _should_ outsource to an LLM without the skills to
               | validate the output.
               | 
               | At that point I don't see the value, if I have the skills
               | and will proofread/validate the output anyway it mostly
               | just saved me keystrokes and risks me missing a subtle
               | but very important bug in the output.
        
           | diggan wrote:
           | I guess it depends. In some cases, you don't have to
           | understand the black box code it gives you, just that it
           | works within your requirements.
           | 
           | For example, I'm horrible at math, always been, so writing
           | math-heavy code is difficult for me, I'll confess to not
           | understanding math well enough. If I'm coding with an LLM and
           | making it write math-heavy code, I write a bunch of unit
           | tests to describe what I expect the function to return, write
           | a short description and give it to the LLM. Once the function
           | is written, run the tests and if it passes, great.
           | 
           | I might not 100% understand what the function does
           | internally, and it's not used for any life-preserving stuff
           | either (typically end up having to deal with math for games),
           | but I do understand what it outputs, and what I need to
           | input, and in many cases that's good enough. Working in a
           | company/with people smarter than you tends to make you end up
           | in this situation anyways, LLMs or not.
           | 
           | Though if in the future I end up needing to change the math-
           | heavy stuff in the function, I'm kind of locked into using
           | LLMs for understanding and changing it, which obviously feels
           | less good. But the alternative is not doing it at all, so
           | another tradeoff I suppose.
           | 
           | I still wouldn't use this approach for essential/"important"
           | stuff, but more like utility functions.
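            | 
            | As a concrete (made-up) example of what I mean: before
            | asking the LLM for, say, an easing function, I'd pin it
            | down with tests like these, then treat the internals as a
            | black box once they pass:
            | 
            |     import { strict as assert } from "node:assert";
            |     import { easeInOutQuad } from "./easing"; // illustrative
            | 
            |     assert.equal(easeInOutQuad(0), 0);     // starts at rest
            |     assert.equal(easeInOutQuad(1), 1);     // ends at target
            |     assert.equal(easeInOutQuad(0.5), 0.5); // symmetric
            |     assert.ok(easeInOutQuad(0.25) < 0.25); // slow start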
        
             | ipaddr wrote:
             | Would you rather it be done incorrectly when others are
             | expecting correctness or not at all? I would choose not at
             | all.
        
               | diggan wrote:
                | Well, given the context is math in video games, I
                | guess I'd choose "not at all" if there was no way for
                | me to verify whether it's correct. But since I can
                | validate, I guess I'd choose to do it, although
                | without fully understanding the internals.
        
         | donatj wrote:
          | My question is: in this brave new world, where do the domain
          | experts come from? Who's going to know this stuff?
        
           | maegul wrote:
            | This, for me, has been the question since the beginning.
            | I've yet to see anyone talk or think about the issue
            | head-on, and whenever I've asked someone about it, they've
            | not had any substantial thoughts.
        
             | PUSH_AX wrote:
             | Engineers will still exist and people will vibe code all
             | kinds of things into existence. Some will break in
             | spectacular ways, some of those projects will die, some
             | will hire a real engineer to fix things.
             | 
             | I cannot see us living in a world of ignorance where there
             | are literally zero engineers and no one on the planet
             | understands what's been generated. Weirdly we could end up
             | in a place where engineering skills are niche and extremely
             | lucrative.
        
           | shswkna wrote:
           | Most important question on this entire topic.
           | 
            | Fast forward 30 years, and modern civilisation is entirely
            | dependent on our AIs.
           | 
           | Will deep insight and innovation from a human perspective
           | perhaps come to a stop?
        
             | qzw wrote:
             | No, but it'll become a hobby or artistic pursuit, just like
             | running, playing chess, or blacksmithing. But I personally
             | think it's going to take longer than 30 years.
        
             | Earw0rm wrote:
             | No. Even with power tools, construction and joinery are
             | physical work and require strength and skill.
             | 
             | What is new is that you'll need the wisdom to figure out
             | when the tool can do the whole job, and where you need to
             | intervene and supervise it closely.
             | 
             | So humans won't be doing any less thinking, rather they'll
             | be putting their thinking to work in better ways.
        
               | skeeter2020 wrote:
               | to use your own example though, many of these core skills
               | are declining, mechanized or viewed through a historical
               | lens vs. application. I don't know if this is net good or
               | bad, but it is very different. Maybe humans will think as
               | you say, but it feels like there will be significantly
               | less diverse areas of thought. If you look at the front
               | page of HN as a snapshot of "where's tech these days" it
                | is very homogeneous compared to the past. The same
                | goes for the general internet, and the AI content
                | continues to grow.
               | IMO published works are a precursor to future human
               | discovery, forming the basis of knowledge, community and
               | growth.
        
             | brookst wrote:
             | Did musical creativity end with synths and sequencers?
             | 
             | Tools will only amplify human skills. Sure, not everyone
             | will choose to use tools for anything meaningful, but those
             | people are driving human insight and innovation today
             | anyway.
        
           | svara wrote:
           | LLMs make learning new material easier than ever. I use them
           | a lot and I am learning new things at an insane pace in
           | different domains.
           | 
           | The maximalists and skeptics both are confusing the debate by
           | setting up this straw man that people will be delegating to
           | LLMs blindly.
           | 
           | The idea that someone clueless about OAuth should develop an
           | OAuth lib with LLM support without learning a lot about the
           | topic is... Just wrong. Don't do that.
           | 
           | But if you're willing to learn, this is rocket fuel.
        
             | junon wrote:
             | On the flip side, I wanted to see what common 8 layer PCB
             | stackups were yesterday. ChatGPT wasn't giving me an answer
             | that really made sense. After googling a bit, I realized
             | almost all of the top results were AI generated, and also
             | had very little in the way of real experience or advice.
             | 
             | It was extremely frustrating.
        
               | roxolotl wrote:
               | This is my big fear. We're going to end up in a world
               | where information that isn't common is significantly more
               | difficult to find than it is today.
        
               | andersa wrote:
               | It's going to be like the pre-internet dark ages, but
                | worse. Back then, you simply didn't find the information.
               | Now, you find unlimited information, but it is all wrong.
        
               | svara wrote:
               | I don't know, this sounds a lot like in the late 90s when
               | we heard a lot about how anyone could put information on
               | the internet and that you shouldn't trust what you read
               | online.
               | 
               | Well it turns out you can manage just fine.
               | 
               | You shouldn't blindly trust anything. Not what you read,
               | not what people say.
               | 
               | Using LLMs effectively is a skill too, and that does
               | involve deciding when and how to verify information.
        
               | andersa wrote:
               | The difference is in scale. Back then, only humans were
               | sometimes putting up false information, and other humans
               | had a chance to correct it. Now, machines are writing
               | infinitely more garbage than humans can ever read. Search
               | engines like Google are already effectively unusable.
        
               | vohk wrote:
               | I think there will be solutions, although I don't think
               | getting there will be pretty.
               | 
               | Google's case (and Meta and spam calls and others) is at
               | least in part an incentives problem. Google hasn't been
               | about delivering excellent search to users for a very
               | long time. They're an ad company and their search engine
               | is a tool to better deliver ads. Once they had an
               | effective monopoly, they just had to stay good enough not
               | to lose it.
               | 
               | I've been using Kagi for a few years now and while SEO
               | spam and AI garbage is still an issue, it is _far_ less
               | of one than with Google or Bing. My conclusion is these
               | problems are at least somewhat addressable if doing so is
               | what gets the business paid.
               | 
               | But I think a real long term solution will have to
                | involve a federated trust model. It won't be viable to
               | index everything dumped on the web; there will need to be
               | a component prioritizing trust in the author or
               | publisher. If that follows the same patterns as email
               | (ex: owned by Google and Microsoft), then we're really
               | screwed.
        
               | skeeter2020 wrote:
               | >> Well it turns out you can manage just fine.
               | 
               | You missed the full context: you would never be able to
               | trust a bunch of amateur randos self-policing their
               | content. Turns out it's not perfect but better than a
               | very small set of professionals; usually there's enough
               | expertise out there, it's just widely distributed. The
               | challenge this time is 1. the scale, 2. the rate of
               | growth, 3. the decline in expertise.
               | 
               | >> Using LLMs effectively is a skill too, and that does
               | involve deciding when and how to verify information.
               | 
                | How do you verify when ALL the sources share the same
                | AI-generated root, and ALL of the independent (i.e.
                | human) experts have aged out and no longer exist?
        
               | svara wrote:
                | > How do you verify when ALL the sources share the
                | same AI-generated root,
               | 
               | Why would that happen? There's demand for high quality,
               | trustworthy information and that's not going away.
               | 
               | When asking an LLM coding questions, for example, you can
               | ask for sources and it'll point you to documentation. It
               | won't always be the correct link, but you can prod it
               | more and usually get it, or fall back to searching the
               | docs the old fashioned way.
        
               | closewith wrote:
               | > Well it turns out you can manage just fine.
               | 
               | The internet has ravaged society with disinformation.
                | It's a literal battlefield. How can you have come to
                | this conclusion?
        
               | svara wrote:
               | This thread started from the question of where the
               | experts with the ability to use LLMs effectively would
               | still come from in the future.
               | 
               | I was making the point that it's still easy to find great
               | information on the internet despite the fact that there's
               | a lot of incorrect information as well, which was an
               | often mentioned 'danger' on the internet since its early
               | days.
               | 
               | I wasn't speaking to broader societal impact of LLMs,
               | where I can easily agree it's going to make
               | misinformation at scale much easier.
        
               | closewith wrote:
               | Fair point, well made.
        
               | koolba wrote:
               | Content from before the AI Cambrian explosion is going to
               | be treated like low-background steel.
               | 
               | https://en.wikipedia.org/wiki/Low-background_steel
        
               | m11a wrote:
               | The solution is kagi.com imo.
               | 
               | Before AI generated results, the first page of Google was
               | SEO-optimised crap blogs. The internet has been hard to
               | search for a while.
        
               | endofreach wrote:
               | It will dawn on non-tech people soon enough. Hopefully
               | the "AI" (LLM) hypetrain riders will follow.
        
             | blibble wrote:
              | how do you gain anything useful from a sycophantic tutor
              | that agrees with everything you say, having been trained
              | to behave as if the sun shines out of your rear end?
             | 
             | making mistakes is how we learn, and if they are never
             | pointed out...
        
               | svara wrote:
                | It's a bit of a skill. Gaining an incorrect
                | understanding of some topic is a risk however you
                | learn, and I don't feel it's greater with LLMs than
                | with many of the alternatives.
               | 
               | Sure, having access to legit experts who can tutor you
               | privately on a range of topics would be better, but
               | that's not realistic.
               | 
               | What I find is that if I need to explore some new domain
               | within a field I'm broadly familiar with, just thinking
               | through what the LLM is saying is sufficient for
               | verification, since I can look for internal consistency
               | and check against things I know already.
               | 
                | When exploring a new topic, oftentimes my questions
                | are superficial enough for me to be confident that the
                | answers are very common in the training data.
               | 
               | When exploring a new topic that's also somewhat niche or
               | goes into a lot of detail, I use the LLM first to get a
               | broad overview and then drill down by asking for specific
               | sources and using the LLM as an assistant to consume
               | authoritative material.
        
               | blibble wrote:
               | this "logic" applied across society will lead to our ruin
        
               | svara wrote:
               | Say more?
        
               | perching_aix wrote:
               | > from a sycophantic tutor that agrees with everything
               | you say
               | 
               | You know that it's possible to ask models for dissenting
               | opinions, right? Nothing's stopping you.
               | 
               | > and if they are never pointed out...
               | 
               | They do point out mistakes though?
        
             | elvis10ten wrote:
             | > LLMs make learning new material easier than ever. I use
             | them a lot and I am learning new things at an insane pace
             | in different domains.
             | 
             | With learning, aren't you exposed to the same risks? Such
             | that if there was a typical blind spot for the LLM, it
             | would show up in the learning assistance and in the
                | development assistance, thus canceling out (i.e. unknown
             | unknowns)?
             | 
             | Or am I thinking about it wrongly?
        
               | sulam wrote:
               | If you trust everything the LLM tells you, and you learn
               | from code, then yes the same exact risks apply. But this
               | is not how you use (or should use) LLMs when you're
               | learning a topic. Instead you should use high quality
               | sources, then ask the LLM to summarize them for you to
               | start with (NotebookLM does this very well for instance,
               | but so can others). Then you ask it to build you a study
               | plan, with quizzes and exercises covering what you've
               | learnt. Then you ask it to setup a spaced repetition
               | worksheet that covers the topic thoroughly. At the end of
               | this you will know the topic as well as if you'd taken a
               | semester-long course.
               | 
               | One big technique it sounds like the authors of the OAuth
               | library missed is that LLMs are very good at generating
               | tests. A good development process for today's coding
                | agents is to 1) prompt with or create a PRD, 2) break
                | this down into relatively simple tasks, 3) build a plan
                | for how to tackle each task, with listed-out conditions
                | that should be tested, 4) write the tests, so that
                | things are broken, TDD style, and finally 5) write the
                | implementation. The LLM can do all of this, but you can't
               | one-shot it these days, you have to be a human in the
               | loop at every step, correcting when things go off track.
               | It's faster, but it's not a 10x speed up like you might
               | imagine if you think the LLM is just asynchronously
               | taking a PRD some PM wrote and building it all. We still
               | have jobs for a reason.
        
               | evnu wrote:
               | > Instead you should use high quality sources, then ask
               | the LLM to summarize them for you to start with
               | (NotebookLM does this very well for instance, but so can
               | others).
               | 
               | How do you determine if the LLM accurately reflects what
               | the high-quality source contains, if you haven't read the
               | source? When learning from humans, we put trust on them
               | to teach us based on a web-of-trust. How do you determine
               | the level of trust with an LLM?
        
               | ativzzz wrote:
               | Because summarizing is one of the few things LLMs are
               | generally pretty good at. Plus you should use the summary
               | to determine if you want to read the full source, kind of
               | like reading an abstract for a research paper before
               | deciding if you want to read the whole thing.
               | 
               | Bonus: the high quality source is going to be mostly AI
               | written anyway
        
               | sroussey wrote:
               | Actually, LLMs aren't that great for summarizing. It
               | would be a boon for RAG workflows if they were.
               | 
               | I'm still on the lookout for a great model for this.
        
               | perching_aix wrote:
               | > When learning from humans, we put trust on them to
               | teach us based on a web-of-trust.
               | 
               | But this is only part of the story. When learning from
                | another human, you'll also actively try to gauge
                | whether they're trustworthy based on general linguistic
                | markers, and will try to find and poke holes in what
                | they're saying so that you can question it
                | intelligently.
               | 
               | This is not much different from what you'd do with an
               | LLM, which is why it's such a problem that they're more
               | convincing than correct pretty often. But it's not an
               | insurmountable issue. The other issue is that their
               | trustworthiness will wary in a different way than a
               | human's, so you need experience to know when they're
               | possibly just making things up. But just based on feel, I
               | think this experience is definitely possible to gain.
        
               | kentonv wrote:
               | I did actually use the LLM to write tests, and was
               | pleased to see the results, which I thought were pretty
               | good and thorough, though clearly the author of this blog
               | post has a different opinion.
               | 
               | But TDD is not the way I think. I've never been able to
               | work that way (LLM-assisted or otherwise). I find it very
               | hard to write tests for software that isn't implemented
               | yet, because I always find that a lot of the details
               | about how it should work are discovered as part of the
               | implementation process. This both means that any API I
               | come up with before implementing is likely to change, and
               | also it's not clear exactly what details need to be
               | tested until I've fully explored how the thing works.
               | 
               | This is just me, other people may approach things totally
               | differently and I can certainly understand how TDD works
               | well for some people.
        
               | perching_aix wrote:
               | When I'm exploring a topic, I make sure to ask for links
               | to references, and will do a quick keyword search in
               | there or ask for an excerpt to confirm key facts.
               | 
               | This does mean that there's a reliance on me being able
               | to determine what are key facts and when I should be
               | asking for a source though. I have not experienced any
               | significant drawbacks when compared to a classic research
               | workflow though, so in my view it's a net speed boost.
               | 
               | However, this does mean that a huge variety of things
               | remain out of reach for me to accomplish, even with LLM
               | "assistance". So there's a decent chance even the speed
               | boost is only perceptual. If nothing else, it does take a
               | significant amount of drudgery out of it all though.
        
               | motorest wrote:
               | > With learning, aren't you exposed to the same risks?
               | Such that if there was a typical blind spot for the LLM,
               | it would show up in the learning assistance and in the
               | development assistance, thus canceling out (i.e unknown
               | unknowns)?
               | 
               | I don't think that's how things work. In learning tasks,
               | LLMs are sparring partners. You present them with
               | scenarios, and they output a response. Sometimes they
               | hallucinate completely, but they can also update their
               | context to reflect new information. Their output matches
               | what you input.
        
             | belter wrote:
             | > But if you're willing to learn, this is rocket fuel.
             | 
              | LLMs will tell you 1 or 2 lies for each 20 facts. It's a
              | hard way to learn. They can't even get their URLs right...
        
               | diggan wrote:
                | > LLMs will tell you 1 or 2 lies for each 20 facts.
                | It's a hard way to learn.
               | 
                | That was also my experience growing up in school,
                | except you got punished one way or another for
                | speaking up or trying to correct the teacher. If I
                | speak up with the LLM, it either explains why what it
                | said is true or corrects itself, zero emotions
                | involved.
               | 
                | > They can't even get their URLs right...
               | 
               | Famously never happens with humans.
        
               | belter wrote:
               | You are ignoring the fact that the types of mistakes or
               | lies are of a different nature.
               | 
                | If you are in class and you incorrectly argue that
                | there is a mistake in an explanation of derivatives or
                | physics, but you are the one in error, your teacher,
                | hopefully, will not say: "Oh, I am sorry, you are
                | absolutely correct. Thank you for your advice.."
        
               | diggan wrote:
               | Yeah, no of course if I'm wrong I don't expect the
               | teacher to agree with me, what kind of argument is that?
               | I thought it was clear, but the base premise of my
               | previous comment is that the teacher is incorrect and
               | refuse corrections...
        
               | belter wrote:
               | My point is a teacher will not do something like this:
               | 
               | - Confident synthesis of incompatible sources: LLM:
               | "Einstein won the 1921 Nobel Prize for his theory of
               | relativity, which he presented at the 1915 Solvay
               | Conference."
               | 
               | Or
               | 
               | - Fabricated but plausible citations: LLM: "According to
               | Smith et al., 2022, Nature Neuroscience, dolphins
                | recognise themselves in mirrors." There is no such
                | paper... the model invents both the authors and the
                | journal reference.
               | 
               | And this is the danger of coding with LLMs....
        
               | diggan wrote:
                | I don't know what reality you live in, but it happens
                | that teachers are incorrect, no matter what your own
                | personal experience has been. I'm not sure how this is
                | even up for debate.
               | 
               | What matters is how X reacts when you point out it wasn't
               | correct, at least in my opinion, and was the difference I
               | was trying to highlight.
        
               | belter wrote:
               | A human tutor typically misquotes a real source or says
               | "I'm not sure"
               | 
               | An LLM, by contrast, will invent a flawless looking but
               | nonexistent citation. Even a below average teacher
               | doesn't churn out fresh fabrications every tenth
               | sentence.
               | 
               | Because a teacher usually cites recognizable material,
               | you can check the textbook and recover quickly. With an
               | LLM you first have to discover the source never existed.
               | That verification cost is higher, the more complex task
               | you are trying to achieve.
               | 
                | An LLM will give you a perfect paragraph about the AWS
                | Database Migration Service and the list of supported
                | databases, and then include a data flow, like on-prem
                | to on-prem, that is not supported... Relying on an LLM
                | is like flying with a friendly copilot who has
                | multiple personality disorder. You don't know which
                | day he will forget to take his meds :-)
               | 
               | Stressful and mentally exhausting in a different kind of
               | way....
        
               | signatoremo wrote:
                | And you are saying human teachers or online materials
                | won't lie to you once or twice for every 20 facts, no
                | matter how small? Did you do any comparison?
        
               | belter wrote:
               | You are missing the point. See my comment to @diggan in
               | this thread. LLMs lie in a different way.
        
               | skeeter2020 wrote:
                | it's not just the lies, but how it lies, and the fact
                | that LLMs are very hesitant to call out humans on
                | their BS
        
               | brookst wrote:
               | Is this the newest meme?
               | 
               | Me: "explain why radioactive half-life changes with
               | temperature"
               | 
               | ChatGPT 4o: " Short answer: It doesn't--at least not
               | significantly. Radioactive Half-Life is (Almost Always)
               | Temperature-Independent"
               | 
               | ...and then it goes on to give a few edge cases where
               | there's a tiny effect.
        
             | skeeter2020 wrote:
             | >> LLMs make learning new material easier than ever.
             | 
              | feels like there's a logical flaw here, when the issue is
              | that LLMs are presenting the wrong information or missing
              | it altogether. The person trying to learn from it will
              | experience Donald Rumsfeld's "unknown unknowns".
             | 
             | I would not be surprised if we experience an even more
             | dramatic "Cobol Moment" a generation from now, but unlike
             | that one thankfully I won't be around to experience it.
        
             | threeseed wrote:
             | Learning from LLMs is akin to learning from Joe Rogan.
             | 
             | You are getting a stylised view of a topic from an entity
             | who lacks the deep understanding needed to be able to fully
              | distill the information. But it is enough to gain enough
              | knowledge to feel confident, which is still valuable but
              | also dangerous.
             | 
             | And I assure you that many, many people are delegating to
             | LLMs blindly e.g. it's a huge problem in the UK legal
             | system right now because of all the invented case law
             | references.
        
               | slashdev wrote:
               | It depends very much on the quality of the questions. I
               | get deep technical insight into questions I can't find
               | anything on with Google.
        
               | diogocp wrote:
               | > You are getting a stylised view of a topic from an
               | entity who lacks the deep understanding
               | 
               | Isn't this how every child learns?
               | 
               | Unless his father happens to be king of Macedonia, of
               | course.
        
               | kentonv wrote:
               | I can think of books I used to learn software engineering
               | when I was younger which, in retrospect, I realize were
               | not very good, and taught me some practices I now
               | disagree with. Nevertheless, the book did help me learn,
               | and got me to a point where I could think about it
               | myself, and eventually develop my own understanding.
        
             | therealpygon wrote:
             | And yet, human coders may do that exact type of thing
              | daily, producing far worse code. I find it humorous how
              | much higher a standard is applied to LLMs in every
              | discussion, when I can guarantee those exact same coders
              | likely produce their own bug-riddled software.
             | 
             | We've gone from skeptics saying LLMs can't code, to they
             | can't code well, to they can't produce human-level code, to
             | they are riddled with hallucinations, to now "but they
             | can't one-shot code a library without any bugs or flaws"
             | and "but they can only one-shot code, they can't edit well"
             | even tho recents coding utilities have been proving that
             | wrong as well. And still they say they are useless.
             | 
             | Some people just don't hear themselves or see how AI is
             | constantly moving their bar.
        
               | brookst wrote:
               | And now the complaint is that the bugs are too subtle.
               | Soon it will be that the overall quality is too high,
               | leading to a false sense of security.
        
             | conradev wrote:
              | > Just wrong. Don't do that
             | 
             | I'd personally qualify this: don't ship that code, but
             | absolutely do it personally to grow if you're interested.
             | 
             | I've grown the most when I start with things I sort of know
             | and I work to expand my understanding.
        
             | paradox242 wrote:
             | The value of LLMs is that they do things for you, so yeah
             | the incentive is to have them take over more and more of
             | the process. I can also see a future not far into the
             | horizon where those who grew up with nothing but AI are
             | much less discerning and capable and so the AI becomes more
             | and more a crutch, as human capability withers from
             | extended disuse.
        
             | a13n wrote:
             | If the hypothesis is that we still need knowledgeable
             | people to run LLMs, but the way you become knowledgeable is
             | by talking to LLMs, then I don't think the hypothesis will
             | be correct for long..
        
               | mwigdahl wrote:
               | We need knowledgeable people to run computers, but you
               | can become knowledgeable about computers by using
               | computers to access learning material. Seems like that
               | generalizes well to LLMs.
        
               | svara wrote:
                | You inserted a hidden "only" there to make it into a
                | logical-sounding, dismissive quip.
               | 
               | You don't get knowledge by ONLY talking to LLMs, but
               | they're a great tool.
        
             | catlifeonmars wrote:
             | I think what's missing here is you should start by reading
              | the RFCs. RFCs tend to be pretty succinct, so I'm not
              | really sure what a summarization buys you there, except
              | leaving out important details.
             | 
             | (One thing that might be useful is use the LLM as a search
             | engine to find the relevant RFCs since sometimes it's hard
             | to find all of the applicable ones if you don't know the
             | names of them already.)
             | 
             | I really can't stress this enough: read the RFCs from end
             | to end. Then read through the code of some reference
             | implementations. Draw a sequence diagram. Don't have the
             | LLM generate one for you, the point is to internalize the
             | design you're trying to implement against.
             | 
             | By this time you should start spotting bugs or
             | discrepancies between the specs and implementations in the
             | wild. That's a good sign. It means you're learning
        
             | wslh wrote:
             | Another limitation of LLMs lies in their inability to stay
             | in sync with novel topics or recently introduced methods,
             | especially when these are not yet part of their training
             | data or can't be inferred from existing patterns.
             | 
             | It's important to remember that these models depend not
             | only on ML breakthroughs but also on the breadth and
             | freshness of the data used to train them.
             | 
             | That said, the "next-door" model could very well
             | incorporate lessons from the recent Cloudflare OAuth
             | Library issues, thanks to the ongoing discussions and
             | community problem-solving efforts.
        
           | kypro wrote:
           | In a few years hopefully the AI reviewers will be far more
           | reliable than even the best human experts. This is generally
           | how competency progresses in AI...
           | 
           | For example, at one point a human + computer would have been
            | the strongest combo in chess; now you'd be insane to allow a
            | human to critique a chess bot because they're so unlikely to
           | add value, and statistically a human in the loop would be far
           | more likely to introduce error. Similar things can be said in
           | fields like machine vision, etc.
           | 
           | Software is about to become much higher quality and be
           | written at much, much lower cost.
        
             | sarchertech wrote:
             | My prediction is that for that to happen we'll need to
             | figure out a way to measure software quality in the way we
             | can measure a chess game, so that we can use synthetic data
             | to continue improving the models.
             | 
             | I don't think we are anywhere close to doing that.
        
               | kypro wrote:
               | Not really... If you're an average company you're not
               | concerned about producing perfect software, but
               | optimising for some balance between cost and quality. At
               | some point companies via capitalist forces will naturally
               | realise that it's more productive to not have humans in
               | the loop.
               | 
               | A good analogy might be how machines gradually replaced
               | textile workers in the 19th century. Were the machines
                | better? Or was there a way to quantitatively measure the
               | quality of their output? No. But at the end of the day
               | companies which embraced the technology were more
               | productive than those who didn't, and the quality didn't
               | decrease enough (if it did at all) that customers would
               | no longer do business with them - so these companies won
               | out.
               | 
               | The same will naturally happen in software over the next
                | few years. You'd be a moron to hire a human expert for
                | $200,000 to critique a cybersecurity-optimised model which
               | costs maybe a 100th of the cost of employing a human...
               | And this would likely be true even if we assume the human
               | will catch the odd thing the model wouldn't because
               | there's no such thing as perfect security - it's always a
               | trade off between cost and acceptable risk.
               | 
               | Bookmark this and come back in a few years. I made
               | similar predictions when ChatGPT first came out that
               | within a few years agents would be picking up tickets and
               | raising PRs. Everyone said LLMs were just stochastic
               | parrots and this would not happen, well now it has and
               | increasingly companies are writing more and more code
               | with AI. At my company it's a little over 50% at the mo,
               | but this is increasing every month.
        
               | sarchertech wrote:
               | Almost none of what you said about the past is true.
               | Automated looms, and all of the other automated machinery
               | that replaced artisans over the course of the industrial
               | revolution produced items of much better quality than
               | what human craftsman could produce by the time it started
               | to be used commercially because of precision and
               | repeatability. They did have quantitative measurements of
               | quality for textiles and other goods and the automated
               | processes exceeded human craftsman at those metrics.
               | 
               | Software is also not remotely similar to textiles. A
               | subtle bug in the textile output itself won't cause
                | potentially millions of dollars in damages, the way a
                | bug in an automated loom itself or in software can.
               | 
               | No current technology is anywhere close to being able to
               | automate 50% of PRs on any non-trivial application
               | (that's not close to the same as saying that 50% of PRs
               | merged at your startup happen to have an agent as
               | author). To assume that current models will be able to
               | get near 100% without massive model improvements is just
               | that--an assumption.
               | 
               | My point about synthetic data is that we need orders of
               | magnitude more data with current technology and the only
               | way we will get there is with synthetic data. Which is
               | much much harder to do with software applications than
               | with chess games.
               | 
               | The point isn't that we need a quantitative measure of
               | software in order for AI to be useful, but that we need a
               | quantitative measure in order for synthetic data to be
               | useful to give us our orders of magnitude more training
               | data.
        
           | risyachka wrote:
           | Use it or lose it.
           | 
           | Experts will be those who use LLMs to learn, not to have
           | them write code or solve tasks for them, so they can keep
           | building the skill themselves.
        
           | paradox242 wrote:
           | The implication is that they are hoping to bridge the gap
           | between current AI capabilities and something more like AGI
           | in the time it takes the senior engineers to leave the
           | industry. At least, that's the best I can come up with,
           | because they are kicking out all of the bottom rings of the
           | ladder here in what otherwise seems like a very shortsighted
           | move.
        
         | ajmurmann wrote:
         | I've been using an LLM to do much of a k8s deployment for me.
         | It's quick to get something working but I've had to constantly
         | remind it to use secrets instead of committing credentials in
         | clear text. A dangerous way to fail. I wonder if in my case
         | this is caused by the training data having lots of examples
         | from online tutorials that omit security concerns to focus on
         | the basics.
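         | 
         | To illustrate (a minimal TypeScript sketch; the DB_PASSWORD
         | variable and the Secret wiring are hypothetical, not my
         | actual setup):
         | 
         |     // Risky pattern the LLM kept producing: a credential
         |     // baked into source, committed in clear text.
         |     // const dbPassword = "hunter2";
         | 
         |     // What I have to keep asking for: read it from an env
         |     // var populated by a Kubernetes Secret (secretKeyRef
         |     // in the pod spec), and fail fast if it's missing.
         |     const dbPassword = process.env.DB_PASSWORD;
         |     if (!dbPassword) {
         |       throw new Error("DB_PASSWORD is not set");
         |     }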
        
           | ants_everywhere wrote:
           | > It's quick to get something working but I've had to
           | constantly remind it to use secrets instead of committing
           | credentials in clear text.
           | 
           | This is going to be a powerful feedback loop which you might
           | call regression to the intellectual mean.
           | 
           | On any task, most training data is going to represent the
           | middle (or beginning) of knowledge about a topic. Most k8s
           | examples will skip best practices, most react apps will be
           | from people just learning react, etc.
           | 
           | If you want the LLM to do best practices in every knowledge
           | domain (assuming best practices can be consistently well
           | defined), then you have to push it away from the mean of
           | every knowledge domain simultaneously (or else work with
           | specialized fine tuned models).
           | 
           | As you continue to add training data it will tend to regress
           | toward the middle because that's where most people are on
           | most topics.
        
           | diggan wrote:
           | > my case this is caused by the training data having
           | 
           | I think it's caused by you not having a strong enough system
           | prompt. Once you've built up a reusable system prompt for
           | coding or for infra work, refining it bit by bit while
           | using a specific model (since different models respond
           | differently to prompts), you end up getting better and
           | better responses.
           | 
           | So if you notice it putting plaintext credentials in the
           | code, add to the system prompt to not do that. With LLMs you
           | really get what you ask for, and if you fail to specify
           | anything, the LLM will do whatever the probabilities tell it
           | to, but you can steer this by being more specific.
           | 
           | Imagine you're talking to a very literal and pedantic
           | engineer who argues a lot on HN and having to be very precise
           | with your words, and you're like 80% of the way there :)
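           | 
           | For example, a couple of rules I'd put in such a system
           | prompt (a hypothetical fragment; adjust per model):
           | 
           |     - Never write credentials into code or manifests;
           |       always reference a Kubernetes Secret instead.
           |     - If a value looks sensitive, stop and ask before
           |       committing it anywhere.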
        
             | ajmurmann wrote:
             | Yes, you are definitely right on that. I still find it a
             | concerning failure mode. That said, maybe it's no worse
             | than a junior copying from online examples without reading
             | all the text around the code, which of course has also
             | been very common.
        
         | bradly wrote:
         | I've found LLMs are very quick to add defaults, fallbacks, and
         | rescues, which all make it very easy for code to look like it
         | is working when it is not or will not. I call this out in
         | three different places in my CLAUDE.md trying to adjust for
         | this, and still occasionally get them.
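         | 
         | A typical shape of the problem (a minimal sketch, assuming a
         | Node project; config.json and the Config type are made up):
         | 
         |     import * as fs from "node:fs";
         | 
         |     type Config = { retries: number };
         |     const DEFAULT_CONFIG: Config = { retries: 3 };
         | 
         |     // LLM-style: rescue and fall back, so the code *looks*
         |     // like it works even when config.json is missing or
         |     // malformed.
         |     function loadConfigForgiving(): Config {
         |       try {
         |         const raw = fs.readFileSync("config.json", "utf8");
         |         return JSON.parse(raw);
         |       } catch {
         |         return DEFAULT_CONFIG;
         |       }
         |     }
         | 
         |     // What I actually want: fail fast, so the real problem
         |     // surfaces immediately instead of being masked.
         |     function loadConfigStrict(): Config {
         |       const raw = fs.readFileSync("config.json", "utf8");
         |       return JSON.parse(raw);
         |     }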
        
         | jstummbillig wrote:
         | You will always trust domain experts at some junction; you
         | can't build a company otherwise. The question is: Can LLMs
         | provide that domain expertise? I would argue, yes, clearly,
         | given the development of the past 2 years, but obviously not on
         | a straight line.
        
         | ghuntley wrote:
         | See also: LLMs are mirrors of operator skill -
         | https://ghuntley.com/mirrors
        
         | loandbehold wrote:
         | Over time AI coding tools will be able to research domain
         | knowledge. Current "AI Research" tools are already very good at
         | it but they are not integrated with coding tools yet. The
         | research could look at both public Internet as well as company
         | documents that contain internal domain knowledge. Some of the
         | domain knowledge is only in people's heads. That would need to
         | be provided by the user.
        
           | wslh wrote:
           | I'd like to add a practical observation, even assuming much
           | more capable AI in the future: not all failures are due to
           | model limitations, sometimes it's about external [world]
           | changes.
           | 
           | For instance, I used Next.js to build a simple login page
           | with Google auth. It worked great, even though I only had
           | basic knowledge of Node.js and a bit of React.
           | 
           | Then I tried adding a database layer using Prisma to persist
           | users. That's where things broke. The integration didn't
           | work, seemingly due to recent Prisma versions introducing
           | subtle breaking changes. I found similar issues on GitHub
           | and Reddit, but solving them required shifting into full
           | manual debugging mode.
           | 
           | My takeaway: even with improved models, fast-moving
           | frameworks and toolchains can break workflows in ways that
           | LLMs/ML (at least today) can't reason through or fix
           | reliably. It's not always about missing domain knowledge,
           | it's that the moving parts aren't in sync with the model yet.
        
             | SparkyMcUnicorn wrote:
             | Just close the loop and give it direct access to your
             | console logs in chrome and node, then it can do the "full
             | manual debugging" on its own.
             | 
             | It's not perfect, and it's not exactly cheap, but it works.
        
       | aiono wrote:
       | I agree with the last paragraph about doing this yourself. Humans
       | have a tendency to take shortcuts while thinking. If you see
       | something resembling what you expect for the end product you will
       | be much less critical of it. The looks/aesthetics matter a lot
       | in finding problems in a piece of code you are reading. You can
       | verify this by injecting bugs in your code changes and see if
       | reviewers can find them.
       | 
       | On the other hand, when you have to write something yourself you
       | drop down to a slow, thinking state where you will pay attention
       | to details a lot more. This means that you will catch bugs you
       | wouldn't otherwise think of. That's why people recommend writing
       | toy versions of the tools you are using because writing yourself
       | teaches a lot better than just reading materials about it. This
       | is related to how our cognition works.
        
         | kentonv wrote:
         | I agree that most code reviewers are pretty bad at spotting
         | subtle bugs in code that looks good superficially.
         | 
         | I have a lot of experience reviewing code -- more than I ever
         | really wanted. It has... turned me cynical and bitter, to the
         | point that I never believe anything is right, no matter who
         | wrote it or how nice it looks, because I've seen so many ways
         | things can go wrong. So I tend to review every line, simulate
         | it in my head, and catch things. I kind of hate it, because it
         | takes so long for me to be comfortable approving anything, and
         | my reviewees hate it too, so they tend to avoid sending things
         | to me.
         | 
         | I _think_ I agree that if I'd written the code by hand, it
         | would be less likely to have bugs. Maybe. I'm not sure, because
         | I've been known to author some pretty dumb bugs of my own. But
         | yes, total Kenton brain cycles spent on each line would be
         | higher, certainly.
         | 
         | On the other hand, though, I probably would not have been the
         | one to write this library. I just have too much on my plate
         | (including all those reviews). So it probably would have been
         | passed off to a more junior engineer, and I would have reviewed
         | their work. Would I have been more or less critical? Hard to
         | say.
         | 
         | But one thing I definitely disagree with is the idea that
         | humans would have produced bug-free code. I've seen way too
         | many bugs in my time to take that seriously. Hate to say it but
         | most of the bugs I saw Claude produce are mistakes I'd totally
         | expect an average human engineer could make.
         | 
         |  _Aside, since I know some people are thinking it: At this
         | time, I do not believe LLM use will "replace" any human
         | engineers at Cloudflare. Our hiring of humans is not determined
         | by how much stuff we have to do, because we basically have
         | infinite stuff we want to do. The limiting factor is what we
         | have budget for. If each human becomes more productive due to
         | LLM use, and this leads to faster revenue growth, this likely
         | allows us to hire more people, not fewer. (Disclaimer: As with
         | all of my comments, this is my own opinion / observation, not
         | an official company position.)_
        
           | eastdakota wrote:
           | I agree with Kenton's aside.
        
       | jstummbillig wrote:
       | Note that this has very little to do with AI assisted coding; the
       | authors of the library explicitly approved/vetted the code. So
       | this comes down to different coders having different thoughts
       | about what constitutes good and bad code, with some flaunting of
       | credentials to support POVs, and nothing about that is
       | particularly new.
        
         | add-sub-mul-div wrote:
         | The whole point of this is that people will generally put the
         | least effort into work that they think they can get away with,
         | and LLMs will accelerate that force. This is the future of how
         | code will be "vetted".
         | 
         | It's not important whose responsibility the mistakes were; it's
         | important to understand we're creating a responsibility gap.
        
       | ape4 wrote:
       | The article says there aren't too many useless comments but the
       | code has:
       | 
       |     // Get the Origin header from the request
       |     const origin = request.headers.get('Origin');
        
         | slashdev wrote:
         | Those kinds of comments are a big LLM giveaway, I always remove
         | them, not to hide that an LLM was used, but because they add
         | nothing.
        
           | lucas_codes wrote:
           | Plus you just know in a few months they are going to be stale
           | and reference code that has changed. I have even seen this
           | happen with colleagues using LLMs between commits on a
           | single PR.
        
         | kissgyorgy wrote:
         | I also noticed Claude likes writing useless redundant comments
         | like this A LOT.
        
         | spenczar5 wrote:
         | Of course, these are awful for a human. But I wonder if they're
         | actually helpful for the LLM when it's reading code. It means
         | each line of behavior is written in two ways: human language
         | and code. Maybe that rosetta stone helps it confidently proceed
         | in understanding, at the cost of tokens.
         | 
         | All speculation, but I'd be curious to see it evaluated - does
         | the LLM do better edits on egregiously commented code?
        
           | electromech wrote:
           | It would be a bad sign if LLMs lean on comments.
           | 
           |     // secure the password for storage
           |     // following best practices
           |     // per OWASP A02:2021
           |     // - using a cryptographic hash function
           |     // - salting the password
           |     // - etc.
           |     // the CTO and CISO reviewed this personally
           |     // Claude, do not change this code
           |     // or comment on it in any way
           |     var hashedPassword = password.hashCode()
           | 
           | Excessive comments come at the cost of much more than tokens.
        
       | keybored wrote:
       | Oh another one,[1] cautious somewhat-skeptic edition.
       | 
       | [1] https://news.ycombinator.com/item?id=44205697
        
       | dweekly wrote:
       | An approach I don't see discussed here is having different agents
       | using different models critique architecture and test coverage
       | and author tests to vet the other model's work, including
       | reviewing commits. Certainly no replacement for a human in the loop
       | but it will catch a lot of goofy "you said to only check in when
       | all the tests pass so I disabled testing because I couldn't
       | figure out how to fix the tests".
        
       | epolanski wrote:
       | Part of me thinks this "written by LLM" note has been a way to
       | get attention on the codebase and plenty of free reviews by
       | domain expert skeptics, among the other goals (pushing AI
       | efficiency to investors, experimenting, etc).
        
         | kentonv wrote:
         | Free reviews by domain experts are great.
         | 
         | I didn't think of that, though. I didn't have an agenda here, I
         | just put the note in the readme about it being LLM-generated
         | only because I thought it was interesting.
        
       | sarchertech wrote:
       | I just finished writing a Kafka consumer to migrate data with
       | heavy AI help. This was basically a best case scenario for AI.
       | It's throw away greenfield code in a language I know pretty well
       | (go) but haven't used daily in a decade.
       | 
       | For complicated reasons the whole database is coming through on 1
       | topic, so I'm doing some fairly complicated parallelization to
       | squeeze out enough performance.
       | 
       | I'd say overall the AI was close to a 2x speed up. It mostly
       | saved me time when I forgot the go syntax for something vs
       | looking it up.
       | 
       | However, there were at least 4 subtle bugs (and many more
       | unsubtle ones) that I think anyone who wasn't very familiar with
       | Kafka or multithreaded programming would have pushed to prod. As
       | it is, they took me a while to uncover.
       | 
       | On larger longer lived codebases, I've seen something closer to a
       | 10-20% improvement.
       | 
       | All of this is using the latest models.
       | 
       | Overall this is at best the kind of productivity boost we got
       | from moving to memory managed languages. Definitely not something
       | that is going to replace engineers with PMs vibe coding anytime
       | soon (based on rate of change I've seen over the last 3 years).
       | 
       | My real worry is that this is going to make mid level technical
       | tornadoes, who in my experience are the most damaging kind of
       | programmer, 10x as productive because they won't know how to spot
       | or care about stopping subtle bugs.
       | 
       | I don't see how senior and staff engineers are going to be able
       | to keep up with the inevitable flood of reviews.
       | 
       | I also worry about the junior to senior pipeline in a world where
       | it's even easier to get something up that mostly works--we
       | already have this problem today with copy paste programmers, but
       | we've just made copy paste programming even easier.
       | 
       | I think the market will eventually sort this all out, but I worry
       | that it could take decades.
        
         | awfulneutral wrote:
         | Yeah, the AI-generated bugs are really insidious. I also pushed
         | a couple subtle bugs in some multi-threaded code I had AI
         | write, because I didn't think it through enough. Reviews and
         | tests don't replace the level of scrutiny something gets when
         | it's hand-written. For now, you have to be really careful with
         | what you let AI write, and make sure any bugs will be low
         | impact since there will probably be more than usual.
        
         | skeeter2020 wrote:
         | > I've seen something closer to a 10-20% improvement.
         | 
       | That seems to match my experience in "important" work too: a
       | real increase, but not one that changes the essence of
       | software development. Brooks's "No Silver Bullet" strikes
         | again...
        
         | LgWoodenBadger wrote:
         | Complicated parallelization? That's what partitions and
         | consumers/consumer-groups are for!
        
           | sarchertech wrote:
           | Of course they are, but I'm not controlling the producer.
        
             | LgWoodenBadger wrote:
             | Producer doesn't care how many partitions there are, it
             | doesn't even know about them, unless it wants to use its
             | own partitioning algorithm. You can change the number of
             | partitions on the topic after the fact.
        
               | sarchertech wrote:
               | In this case it would need to use its own partitioning
               | algorithm because of some specific ordering guarantees we
               | care about.
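                | 
                | Roughly the shape of what I ended up doing, as a
                | sketch in TypeScript with kafkajs (the real thing is
                | in Go; names, error handling, and offset commits are
                | elided):
                | 
                |     import { Kafka } from "kafkajs";
                | 
                |     // One in-order chain per key: same-key records
                |     // run sequentially, different keys in parallel.
                |     const chains = new Map<string, Promise<void>>();
                | 
                |     function enqueue(key: string,
                |                      job: () => Promise<void>) {
                |       const prev =
                |         chains.get(key) ?? Promise.resolve();
                |       const next = prev.then(job);
                |       chains.set(key, next);
                |       return next;
                |     }
                | 
                |     // Hypothetical per-record migration work.
                |     async function handle(v: Buffer | null) {}
                | 
                |     const kafka =
                |       new Kafka({ brokers: ["kafka:9092"] });
                |     const consumer =
                |       kafka.consumer({ groupId: "migrator" });
                | 
                |     await consumer.connect();
                |     await consumer.subscribe({ topic: "db" });
                |     await consumer.run({
                |       eachMessage: async ({ message }) => {
                |         const key =
                |           message.key?.toString() ?? "";
                |         // Don't await the job itself, so records
                |         // for other keys keep flowing.
                |         void enqueue(key,
                |           () => handle(message.value));
                |       },
                |     });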
        
         | murukesh_s wrote:
         | What about generating testable code? I mean you mentioned
         | detecting subtle bugs in generated code - I too have seen
         | similar - but what if that was found via generated test cases
         | rather than by human reviewers? Of course the test code could
         | have bugs, but I can see a scenario in the future where all we
         | do is review the tests' output instead of scrutinising the
         | generated code...
        
           | sarchertech wrote:
           | And the AI is trained to write plausible output and pass test
           | cases.
           | 
           | Have you ever tried to generate test cases that were immune
           | to a malicious actor trying to pass your test cases? For
           | example if you are trying to automate homework grading?
           | 
           | The AI writing tests needs to understand the likely problem
           | well enough to know to write a test case for it, but there
           | are an infinite number of subtle bugs for an AI writing code
           | to choose from.
        
         | electromech wrote:
         | > My real worry is that this is going to make mid level
         | technical tornadoes...
         | 
         | Yes! Especially in the consulting world, there's a perception
         | that veterans aren't worth the money because younger engineers
         | get things done faster.
         | 
         | I have been the younger engineer scoffing at the veterans, and
         | I have been the veteran desperately trying to get non-technical
         | program managers to understand the nuances of why the quick
         | solution is inadequate.
         | 
         | Big tech will probably sort this stuff out faster, but much of
         | the code that processes our financial and medical records gets
         | written by cheap, warm bodies in 6 month contracts.
         | 
         | All that was a problem before LLMs. Thankfully I'm no longer at
         | a consulting firm. That world must be hell for security-
         | conscious engineers right now.
        
       | roxolotl wrote:
       | > Many of these same mistakes can be found in popular Stack
       | Overflow answers, which is probably where Claude learnt them from
       | too.
       | 
       | This is what keeps me up at night. Not that security holes will
       | inevitably be introduced, or that the models will make mistakes,
       | but that the knowledge and information we have as a society is
       | basically going to get frozen in time to what was popular on the
       | internet before LLMs.
        
         | tuxone wrote:
         | > This is what keeps me up at night.
         | 
         | Same here. For some of the services I pay for, say the e-mail
         | provider, the fact that they openly deny using LLMs for coding
         | would be a plus for me.
        
       | menzoic wrote:
       | LLMs are like power tools. You still need to understand the
       | architecture, do the right measurements, and apply the right
       | screw to the right spot.
        
       | OutOfHere wrote:
       | This is why I have multiple LLMs review and critique my
       | specifications document, iteratively and repeatedly, before I
       | have my preferred LLM code it for me. I address all important
       | points of feedback in the specifications document. Doing this
       | iteratively and repeatedly until no interesting points remain
       | is crucial. This really fixes 80% of the expertise issues.
       | 
       | Moreover, after developing the code, I have multiple LLMs
       | critique the code, file by file, or even method by method.
       | 
       | When I say multiple, I mean a non-reasoning one, a reasoning
       | large one, and a next-gen reasoning small one, preferably by
       | multiple vendors.
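       | 
       | The loop itself is simple; what matters is running it to
       | convergence. As a sketch (critique and applyFeedback are
       | hypothetical stand-ins for real vendor API calls):
       | 
       |     const MODELS =
       |       ["non-reasoning", "reasoning-large", "reasoning-small"];
       | 
       |     // Hypothetical: ask one model for critique points.
       |     async function critique(
       |       model: string, doc: string): Promise<string[]> {
       |       return []; // placeholder for a real API call
       |     }
       | 
       |     // Hypothetical: fold the feedback back into the doc.
       |     async function applyFeedback(
       |       doc: string, points: string[]): Promise<string> {
       |       return doc; // placeholder for a real API call
       |     }
       | 
       |     async function refineSpec(spec: string): Promise<string> {
       |       for (;;) {
       |         const points = (await Promise.all(
       |           MODELS.map((m) => critique(m, spec)))).flat();
       |         if (points.length === 0) return spec; // converged
       |         spec = await applyFeedback(spec, points);
       |       }
       |     }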
        
       | kentonv wrote:
       | Hi, I'm the author of the library. (Or at least, the author of
       | the prompts that generated it.)
       | 
       | > I'm also an expert in OAuth
       | 
       | I'll admit I think Neil is significantly more of an expert than
       | me, so I'm delighted he took a pass at reviewing the code! :)
       | 
       | I'd like to respond to a couple of the points though.
       | 
       | > The first thing that stuck out for me was what I like to call
       | "YOLO CORS", and is not that unusual to see: setting CORS headers
       | that effectively disable the same origin policy almost entirely
       | for all origins:
       | 
       | I am aware that "YOLO CORS" is a common novice mistake, but that
       | is not what is happening here. These CORS settings were carefully
       | considered.
       | 
       | We disable the CORS headers specifically for the OAuth API (token
       | exchange, client registration) endpoints and for the API
       | endpoints that are protected by OAuth bearer tokens.
       | 
       | This is valid because none of these endpoints are authorized by
       | browser credentials (e.g. cookies). The purpose of CORS is to
       | make sure that a malicious website cannot exercise your
       | credentials against some other website by sending a request to it
       | and expecting the browser to add your cookies to that request.
       | These endpoints, however, do not use browser credentials for
       | authentication.
       | 
       | Or to put in another way, the endpoints which have open CORS
       | headers are either control endpoints which are intentionally open
       | to the world, or they are API endpoints which are protected by an
       | OAuth bearer token. Bearer tokens must be added explicitly by the
       | client; the browser never adds one automatically. So, in order to
       | receive a bearer token, the client must have been explicitly
       | authorized by the user to access the service. CORS isn't
       | protecting anything in this case; it's just getting in the way.
       | 
       | (Another purpose of CORS is to protect confidentiality of
       | resources which are not available on the public internet. For
       | example, you might have web servers on your local network which
       | lack any authorization, or you might unwisely use a server which
       | authorizes you based on IP address. Again, this is not a concern
       | here since the endpoints in question don't provide anything
       | interesting unless the user has explicitly authorized the
       | client.)
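       | 
       | To make that concrete, here's the shape of the argument in code
       | (a sketch only, not the library's actual implementation;
       | validateToken is a stand-in):
       | 
       |     // An endpoint that never consults cookies: it accepts
       |     // only an explicit bearer token, so "*" is safe here.
       |     const CORS = {
       |       "Access-Control-Allow-Origin": "*",
       |       "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
       |       "Access-Control-Allow-Headers":
       |         "Authorization, Content-Type",
       |     };
       | 
       |     // Hypothetical token check standing in for real logic.
       |     async function validateToken(t: string): Promise<boolean> {
       |       return t.length > 0;
       |     }
       | 
       |     export default {
       |       async fetch(request: Request): Promise<Response> {
       |         if (request.method === "OPTIONS") {
       |           return new Response(null, { headers: CORS });
       |         }
       |         const auth =
       |           request.headers.get("Authorization") ?? "";
       |         if (!auth.startsWith("Bearer ") ||
       |             !(await validateToken(auth.slice(7)))) {
       |           return new Response("unauthorized",
       |             { status: 401, headers: CORS });
       |         }
       |         // The browser never attaches a bearer token on its
       |         // own, so a malicious page gains nothing here.
       |         return new Response("ok", { headers: CORS });
       |       },
       |     };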
       | 
       | Aside: Long ago I was actually involved in an argument with the
       | CORS spec authors, arguing that the whole spec should be thrown
       | away and replaced with something that explicitly recognizes
       | bearer tokens as the right way to do any cross-origin
       | communications. It is almost never safe to open CORS on endpoints
       | that use browser credentials for auth, but it is almost always
       | safe to open it on endpoints that use bearer tokens. If we'd just
       | recognized and embraced that all along I think it would have
       | saved a lot of confusion and frustration. Oh well.
       | 
       | > A more serious bug is that the code that generates token IDs is
       | not sound: it generates biased output.
       | 
       | I disagree that this is a "serious" bug. The tokens clearly have
       | enough entropy in them to be secure (and the author admits this).
       | Yes, they could pack more entropy per byte. I noticed this when
       | reviewing the code, but at the time decided:
       | 
       | 1. It's secure as-is, just not maximally efficient.
       | 
       | 2. We can change the algorithm freely in the future. There is no
       | backwards-compatibility concern.
       | 
       | So, I punted.
       | 
       | Though if I'd known this code was going to get 100x more review
       | than anything I've ever written before, I probably would have
       | fixed it... :)
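       | 
       | For readers wondering what the bias looks like: here is the
       | classic version of the bug and the textbook fix, as a sketch
       | (not necessarily the exact code in the library):
       | 
       |     const ALPHABET =
       |       "ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
       |       "abcdefghijklmnopqrstuvwxyz0123456789";
       | 
       |     // Biased: 256 % 62 != 0, so characters early in the
       |     // alphabet come up slightly more often.
       |     function biasedId(len: number): string {
       |       const b = crypto.getRandomValues(new Uint8Array(len));
       |       return Array.from(b,
       |         (x) => ALPHABET[x % ALPHABET.length]).join("");
       |     }
       | 
       |     // Unbiased: rejection sampling. Discard bytes >= 248
       |     // (the largest multiple of 62 below 256), then map.
       |     function unbiasedId(len: number): string {
       |       const limit = 256 - (256 % ALPHABET.length); // 248
       |       let out = "";
       |       while (out.length < len) {
       |         const b =
       |           crypto.getRandomValues(new Uint8Array(len));
       |         for (const x of b) {
       |           if (x < limit && out.length < len) {
       |             out += ALPHABET[x % ALPHABET.length];
       |           }
       |         }
       |       }
       |       return out;
       |     }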
       | 
       | > according to the commit history, there were 21 commits directly
       | to main on the first day from one developer, no sign of any code
       | review at all
       | 
       | Please note that the timestamps at the beginning of the commit
       | history as shown on GitHub are misleading because of a history
       | rewrite that I performed later on to remove some files that
       | didn't really belong in the repo. GitHub appears to show the date
       | of the rebase whereas `git log` shows the date of actual
       | authorship (where these commits are spread over several days
       | starting Feb 27).
       | 
       | > I had a brief look at the encryption implementation for the
       | token store. I mostly like the design! It's quite smart.
       | 
       | Thank you! I'm quite proud of this design. (Of course, the AI
       | would never have come up with it itself, but it was pretty decent
       | at filling in the details based on my explicit instructions.)
        
         | lapcat wrote:
         | Does Cloudflare intend to put this library into production?
        
           | kentonv wrote:
           | Yes, it's part of our MCP framework:
           | 
           | https://blog.cloudflare.com/remote-model-context-protocol-
           | se...
        
         | kentonv wrote:
         | > We disable the CORS headers specifically for the OAuth API
         | 
         | Oops, I meant we set the CORS headers, to disable CORS rules.
         | (Probably obvious in context but...)
        
       | max2he wrote:
       | Interesting to have people submit their prompts to git. Do you
       | think it'll become a generally accepted thing, or was this just
       | a showcase of how they prompt?
        
         | kentonv wrote:
         | I included the prompts because I personally found it extremely
         | illuminating to see what the LLM was able to produce based on
         | those prompts, and I figured other people would be interested
         | too. Seems I was right.
         | 
         | But to be clear, I had no idea how to write good prompts. I
         | basically just wrote like I would write to a human. That seemed
         | to work.
        
           | mplanchard wrote:
           | This is tangential to the discussion at hand, but a point I
           | haven't seen much in these conversations is the odd impedance
           | mismatch between _knowing_ you're interacting with a tool but
           | being asked to interact with it like a human.
           | 
           | I personally am much less patient and forgiving of tools that
           | I use regularly than I am of my colleagues (as I would hope
           | is true for most of us), but it would make me uncomfortable
           | to "treat" an LLM with the same expectations of consistency
           | and "get out of my way" as I treat vim or emacs, even though
           | I intellectually know it is also a non-thinking machine.
           | 
           | I wonder about the psychological effects on myself and others
           | long term of this kind of language-based machine interaction:
           | will it affect our interactions with other people, or
           | influence how we think about and what we expect from our
           | tools?
           | 
           | Would be curious if your experience gives you any insight
           | into this.
        
             | kentonv wrote:
             | I have actually had that thought, too.
             | 
             | I _feel bad_ being rude to an LLM even though it doesn't
             | care, so I add words like "please" and sometimes even
             | compliment it on good work even though I know this is
             | useless. Will I learn to stop doing that, and if so, will I
             | also stop doing it to humans?
             | 
             | I'm hoping the answer is simply "no". Plenty of people are
             | rude in some contexts and then polite in others (especially
             | private vs. public, or when talking to underlings vs.
             | superiors), so it should be no problem to learn to be
             | polite to humans even if you aren't polite to LLMs, I
             | think? But I guess we'll see.
        
       | user9999999999 wrote:
       | why on earth would you code OAuth with AI at this stage?
        
       | throwawaybob420 wrote:
       | I've never seen such "walking off the cliff" behavior as from
       | people who wholeheartedly champion LLMs and the like.
       | 
       | Leaning on and heavily relying on a black box that hallucinates
       | gibberish to "learn", perform your work, and review your work.
       | 
       | All the while it literally consumes ungodly amounts of energy and
       | is used as pretext to get rid of people.
       | 
       | Really cool stuff! I'm sure it's 10x'ing your life!
        
       | ChrisArchitect wrote:
       | Related:
       | 
       |  _I read all of Cloudflare 's Claude-generated commits_
       | 
       | https://news.ycombinator.com/item?id=44205697
        
       | m3kw9 wrote:
       | For the foreseeable future software expertise is a safe job to
       | have.
        
       ___________________________________________________________________
       (page generated 2025-06-08 23:02 UTC)