[HN Gopher] A look at Cloudflare's AI-coded OAuth library
___________________________________________________________________
A look at Cloudflare's AI-coded OAuth library
Author : itsadok
Score : 230 points
Date : 2025-06-08 08:50 UTC (14 hours ago)
(HTM) web link (neilmadden.blog)
(TXT) w3m dump (neilmadden.blog)
| CuriouslyC wrote:
| Mostly a good writeup, but I think there's some serious shifting
| the goalposts of what "vibe coded" means in a disingenuous way
| towards the end:
|
| 'Yes, this does come across as a bit "vibe-coded", despite what
| the README says, but so does a lot of code I see written by
| humans. LLM or not, we have to give a shit.'
|
| If what most people do is "vibe coding" in general, the current
| definition of vibe coding is essentially meaningless. Instead,
| the author is making the distinction between "interim workable"
| and "stainless/battle tested" which is another dimension of code
| entirely. To describe that as vibe coding causes me to view the
| author's intent with suspicion.
| croes wrote:
| Isn't vibe coding just C&P from AI instead of Stack Overflow?
|
| I read it as: done by AI but not checked by humans.
| ranguna wrote:
| Yep I see it like that as well, code with 0 or very close to
| 0 interactions from humans. Anyone who wants to change that
| meaning is not serious.
| techpression wrote:
| I find "vibe coding" to be one of the, if not the, concepts in
| this business to lose its meaning the fastest. Similar to how
| everything all of a sudden was "cloud" now everything is "vibe
| coded", even though reading the original tweet really narrows
| it down thoroughly.
| dimitri-vs wrote:
| IMO it's pretty clear what vibe coding is: you don't look at
| the code, only the results. If you're making judgement on the
| code, it's not vibe coding.
| keybored wrote:
| Viral marketing campaign term losing its meaning makes sense.
| simonw wrote:
| How do you define vibe coding?
| SiempreViernes wrote:
| A very good piece that clearly illustrates one of the dangers
| with LLS's: responsibility for code quality is blindly offloaded
| on the automatic system
|
| > There are some tests, and they are OK, but they are woefully
| inadequate for what I would expect of a critical auth service.
| Testing every MUST and MUST NOT in the spec is a bare minimum,
| not to mention as many abuse cases as you can think of, but none
| of that is here from what I can see: just basic functionality
| tests.
|
| and
|
| > There are some odd choices in the code, and things that lead me
| to believe that the people involved are not actually familiar
| with the OAuth specs at all. For example, this commit adds
| support for public clients, but does so by implementing the
| deprecated "implicit" grant (removed in OAuth 2.1).
|
| As Madden concludes "LLM or not, we have to give a shit."
| JimDabell wrote:
| > A very good piece that clearly illustrates one of the dangers
| with LLS's: responsibility for code quality is blindly
| offloaded on the automatic system
|
| It does not illustrate that at all.
|
| > Claude's output was thoroughly reviewed by Cloudflare
| engineers with careful attention paid to security and
| compliance with standards.
|
| > To emphasize, *this is not "vibe coded"*. Every line was
| thoroughly reviewed and cross-referenced with relevant RFCs, by
| security experts with previous experience with those RFCs.
|
| -- https://github.com/cloudflare/workers-oauth-provider
|
| The humans who worked on it very, very clearly took
| responsibility for code quality. That they didn't get it 100%
| right does _not_ mean that they "blindly offloaded
| responsibility".
|
| Perhaps you can level that accusation at _other_ people doing
| _different_ things, but Cloudflare explicitly placed the
| responsibility for this on the humans.
| djoldman wrote:
| > At ForgeRock, we had hundreds of security bugs in our OAuth
| implementation, and that was despite having 100s of thousands of
| automated tests run on every commit, threat modelling, top-flight
| SAST/DAST, and extremely careful security review by experts.
|
| Wow. Anecdotally it's my understanding that OAuth is ... tricky
| ... but wow.
|
| Some would say it's a dumpster fire. I've never read the spec or
| implemented it.
| bandoti wrote:
| Honestly, new code always has bugs though. That's pretty much a
| guarantee--especially if it's somewhat complex.
|
| That's why companies go for things that are "battle tested"
| like vibe coding. ;)
|
| Joke aside--I like how Anthropic is using their own product in
| a pragmatic fashion. I'm wondering if they'll use it for their
| MCP authentication API.
| stuaxo wrote:
| The times I've been involved with implementations it's been
| really horrible.
| jajko wrote:
| Hundreds of thousands of tests? That sounds like quantity >
| quality or outright llm-generated ones, who even maintains
| them?
| nmadden wrote:
| This was before LLMs. It was a combination of unit and end-
| to-end tests and tests written to comprehensively test every
| combination of parameters (eg test this security property
| holds for every single JWT algorithm we support etc). Also
| bear in mind that the product did a lot more than just OAuth.
| jofzar wrote:
| Oauth is so annoying, there is so much niche to it.
| kcatskcolbdi wrote:
| Really interesting breakdown. What jumped out to me wasn't just
| the bugs (CORS wide open, incorrect Basic auth, weak token
| randomness), but how much the human devs seemed to lean on
| Claude's output even when it was clearly offbase. That "implicit
| grant for public clients" bit is wild; it's deprecated in OAuth
| 2.1, and Claude just tossed it in like it was fine, and then it
| stuck.
| kentonv wrote:
| I put in the implicit grant because someone requested it. I had
| it flagged off by default because it's deprecated.
| HocusLocus wrote:
| I suggest they freeze a branch of it, then spawn some AIs to
| introduce and attempt to hide vulnerabilities, and another to
| spot and fix them. Every commit is a move, and try to model the
| human evolution of chess.
| sdan wrote:
| > Another hint that this is not written by people familiar with
| OAuth is that they have implemented Basic auth support
| incorrectly.
|
| so tldr most of the issue the author has is against the person
| who made the library is the design not the implementation?
| belter wrote:
| "...A more serious bug is that the code that generates token IDs
| is not sound: it generates biased output. This is a classic bug
| when people naively try to generate random strings, and the LLM
| spat it out in the very first commit as far as I can see. I don't
| think it's exploitable: it reduces the entropy of the tokens, but
| not far enough to be brute-forceable. But it somewhat gives the
| lie to the idea that experienced security professionals reviewed
| every line of AI-generated code...."
|
| In the Github repo Cloudflare says:
|
| "...Claude's output was thoroughly reviewed by Cloudflare
| engineers with careful attention paid to security and compliance
| with standards..."
|
| My conclusion is that as a development team, they learned little
| since 2017: https://news.ycombinator.com/item?id=13718752
| chrismorgan wrote:
| Admittedly I _have_ done some cryptographic string generation
| based on different alphabet sizes and characteristics a few
| years ago, which is pretty specifically relevant, and I'm
| competent at cryptographic and security concerns for a layman,
| but I certainly hope security reviewers will be more skilled at
| these things than me.
|
| I'm very confident I would have noticed this bias in a first
| pass of reviewing the code. The very first thing you do in a
| security review is look at where you use `crypto`, what its
| inputs are, and what you do with its outputs, _very_ carefully.
| On seeing that %, I would have checked characters.length and
| found it to be 62, not a factor of 256; so you _need_ to mess
| around with base conversion, or change the alphabet, or some
| other such trick.
|
| This bothers me and makes me lose confidence in the review
| performed.
| thegeomaster wrote:
| But... is it a real problem? As the author says, the entropy
| reduction is tiny.
| yusina wrote:
| It shows carelessness or incompetence or a combination
| thereof which extend to the entire code base.
| afro88 wrote:
| > What this interaction shows is how much knowledge you need to
| bring when you interact with an LLM. The "one big flaw" Claude
| produced in the middle would probably not have been spotted by
| someone less experienced with crypto code than this engineer
| obviously is. And likewise, many people would probably not have
| questioned the weird choice to move to PBKDF2 as a response
|
| For me this is the key takeaway. You gain proper efficiency using
| LLMs when you are a competent reviewer, and for lack of a better
| word, leader. If you don't know the subject matter as well as the
| LLM, you better be doing something non-critical, or have the time
| to not trust it and verify everything.
| marcusb wrote:
| I'm puzzled when I hear people say 'oh, I only use LLMs for
| things I don't understand well. If I'm an expert, I'd rather do
| it myself.'
|
| In addition to the ability to review output effectively, I find
| the more closely I'm able to describe what I want in the way
| another expert in that domain would, the better the LLM output.
| Which isn't really that surprising for a statistical text
| generation engine.
| _heimdall wrote:
| That's why outsource most other things in our life though,
| why would it be different with LLMs?
|
| People don't learn how a car works before buying one, they
| just take it to a mechanic when it breaks. Most people don't
| know how to build a house, they have someone else build it
| and assume it was done well.
|
| I fully expect people to similarly have LLMs do what the
| person doesn't know how and assume the machine knew what to
| do.
| marcusb wrote:
| > why would it be different with LLMs?
|
| Because LLMs are not competent professionals to whom you
| might outsource tasks in your life. LLMs are statistical
| engines that make up answers all the time, even when the
| LLM "knows" the correct answer (i.e., has the correct
| answer hidden away in its weights.)
|
| I don't know about you, but I'm able to validate something
| is true much more quickly and efficiently if it is a
| subject I know well.
| _heimdall wrote:
| > competent professionals
|
| That requires a lot of clarity and definition if you want
| to claim that LLMs aren't competent professionals. I
| assume we'd ultimately agree that LLMs aren't, but I'd
| add that many humans paid for a task aren't competent
| professionals either and, more importantly, that I can't
| distinguish the competent professionals from others
| without myself being competent enough in the topic.
|
| My point was that people have a long history of
| outsourcing to someone else, often to someone they have
| never met and never will. We do it for things that we
| have no real idea about and trust that the person doing
| it must have known what they were doing. I fully expect
| people to end up taking the same view of LLMs.
| marcusb wrote:
| We also have a lot of systems (references, the tort
| system) that just don't apply in any practical way to LLM
| output. I mean, I guess you could try to sue Anthropic or
| OpenAI if their chat bot gives you bad advice, but...
| good luck with that. The closest thing I can think of is
| benchmark performance. But I trust those numbers a lot
| less than I would trust a reference from a friend for,
| say, a plumber.
|
| I understand a lot of people use LLMs for things they
| don't understand well. I just don't think that is the
| best way to get productivity out of these tools right
| now. Regardless of how people may or may not be used to
| outsourcing things to other humans.
| _heimdall wrote:
| > I just don't think that is the best way to get
| productivity out of these tools right now.
|
| Well that I completely agree with. I don't think people
| _should_ outsource to an LLM without the skills to
| validate the output.
|
| At that point I don't see the value, if I have the skills
| and will proofread/validate the output anyway it mostly
| just saved me keystrokes and risks me missing a subtle
| but very important bug in the output.
| diggan wrote:
| I guess it depends. In some cases, you don't have to
| understand the black box code it gives you, just that it
| works within your requirements.
|
| For example, I'm horrible at math, always been, so writing
| math-heavy code is difficult for me, I'll confess to not
| understanding math well enough. If I'm coding with an LLM and
| making it write math-heavy code, I write a bunch of unit
| tests to describe what I expect the function to return, write
| a short description and give it to the LLM. Once the function
| is written, run the tests and if it passes, great.
|
| I might not 100% understand what the function does
| internally, and it's not used for any life-preserving stuff
| either (typically end up having to deal with math for games),
| but I do understand what it outputs, and what I need to
| input, and in many cases that's good enough. Working in a
| company/with people smarter than you tends to make you end up
| in this situation anyways, LLMs or not.
|
| Though if in the future I end up needing to change the math-
| heavy stuff in the function, I'm kind of locked into using
| LLMs for understanding and changing it, which obviously feels
| less good. But the alternative is not doing it at all, so
| another tradeoff I suppose.
|
| I still wouldn't use this approach for essential/"important"
| stuff, but more like utility functions.
| ipaddr wrote:
| Would you rather it be done incorrectly when others are
| expecting correctness or not at all? I would choose not at
| all.
| diggan wrote:
| Well, given the context is math in video games, I guess
| I'd chose "not at all", if there was no way for me to
| verify it's correct or not. But since I can validate, I
| guess I'd chose to do it, although without fully
| understanding the internals.
| donatj wrote:
| My question is kind of in this brave new world, where do the
| domain experts come from? Whose going to know this stuff?
| maegul wrote:
| This, for me, has been the question since the beginning. I'm
| yet to see anyone talk/think about the issue head on too. And
| whenever I've asked someone about it, they've not had any
| substantial thoughts.
| PUSH_AX wrote:
| Engineers will still exist and people will vibe code all
| kinds of things into existence. Some will break in
| spectacular ways, some of those projects will die, some
| will hire a real engineer to fix things.
|
| I cannot see us living in a world of ignorance where there
| are literally zero engineers and no one on the planet
| understands what's been generated. Weirdly we could end up
| in a place where engineering skills are niche and extremely
| lucrative.
| shswkna wrote:
| Most important question on this entire topic.
|
| Fast forward 30 years and modern civilisation is entirely
| dependent on our AI's.
|
| Will deep insight and innovation from a human perspective
| perhaps come to a stop?
| qzw wrote:
| No, but it'll become a hobby or artistic pursuit, just like
| running, playing chess, or blacksmithing. But I personally
| think it's going to take longer than 30 years.
| Earw0rm wrote:
| No. Even with power tools, construction and joinery are
| physical work and require strength and skill.
|
| What is new is that you'll need the wisdom to figure out
| when the tool can do the whole job, and where you need to
| intervene and supervise it closely.
|
| So humans won't be doing any less thinking, rather they'll
| be putting their thinking to work in better ways.
| skeeter2020 wrote:
| to use your own example though, many of these core skills
| are declining, mechanized or viewed through a historical
| lens vs. application. I don't know if this is net good or
| bad, but it is very different. Maybe humans will think as
| you say, but it feels like there will be significantly
| less diverse areas of thought. If you look at the front
| page of HN as a snapshot of "where's tech these days" it
| is very homgenous compared to the past. Same goes for the
| general internet and the AI-content continues to grow.
| IMO published works are a precursor to future human
| discovery, forming the basis of knowledge, community and
| growth.
| brookst wrote:
| Did musical creativity end with synths and sequencers?
|
| Tools will only amplify human skills. Sure, not everyone
| will choose to use tools for anything meaningful, but those
| people are driving human insight and innovation today
| anyway.
| svara wrote:
| LLMs make learning new material easier than ever. I use them
| a lot and I am learning new things at an insane pace in
| different domains.
|
| The maximalists and skeptics both are confusing the debate by
| setting up this straw man that people will be delegating to
| LLMs blindly.
|
| The idea that someone clueless about OAuth should develop an
| OAuth lib with LLM support without learning a lot about the
| topic is... Just wrong. Don't do that.
|
| But if you're willing to learn, this is rocket fuel.
| junon wrote:
| On the flip side, I wanted to see what common 8 layer PCB
| stackups were yesterday. ChatGPT wasn't giving me an answer
| that really made sense. After googling a bit, I realized
| almost all of the top results were AI generated, and also
| had very little in the way of real experience or advice.
|
| It was extremely frustrating.
| roxolotl wrote:
| This is my big fear. We're going to end up in a world
| where information that isn't common is significantly more
| difficult to find than it is today.
| andersa wrote:
| It's going to be like the pre-internet dark ages, but
| worse. Back then you only didn't find the information.
| Now, you find unlimited information, but it is all wrong.
| svara wrote:
| I don't know, this sounds a lot like in the late 90s when
| we heard a lot about how anyone could put information on
| the internet and that you shouldn't trust what you read
| online.
|
| Well it turns out you can manage just fine.
|
| You shouldn't blindly trust anything. Not what you read,
| not what people say.
|
| Using LLMs effectively is a skill too, and that does
| involve deciding when and how to verify information.
| andersa wrote:
| The difference is in scale. Back then, only humans were
| sometimes putting up false information, and other humans
| had a chance to correct it. Now, machines are writing
| infinitely more garbage than humans can ever read. Search
| engines like Google are already effectively unusable.
| vohk wrote:
| I think there will be solutions, although I don't think
| getting there will be pretty.
|
| Google's case (and Meta and spam calls and others) is at
| least in part an incentives problem. Google hasn't been
| about delivering excellent search to users for a very
| long time. They're an ad company and their search engine
| is a tool to better deliver ads. Once they had an
| effective monopoly, they just had to stay good enough not
| to lose it.
|
| I've been using Kagi for a few years now and while SEO
| spam and AI garbage is still an issue, it is _far_ less
| of one than with Google or Bing. My conclusion is these
| problems are at least somewhat addressable if doing so is
| what gets the business paid.
|
| But I think a real long term solution will have to
| involved a federated trust model. It won't be viable to
| index everything dumped on the web; there will need to be
| a component prioritizing trust in the author or
| publisher. If that follows the same patterns as email
| (ex: owned by Google and Microsoft), then we're really
| screwed.
| skeeter2020 wrote:
| >> Well it turns out you can manage just fine.
|
| You missed the full context: you would never be able to
| trust a bunch of amateur randos self-policing their
| content. Turns out it's not perfect but better than a
| very small set of professionals; usually there's enough
| expertise out there, it's just widely distributed. The
| challenge this time is 1. the scale, 2. the rate of
| growth, 3. the decline in expertise.
|
| >> Using LLMs effectively is a skill too, and that does
| involve deciding when and how to verify information.
|
| How do you verify when ALL the sources are share the same
| AI-generated root, and ALL of the independent (i.e.
| human) experts have aged-out and no longer exist?
| svara wrote:
| > How do you verify when ALL the sources are share the
| same AI-generated root,
|
| Why would that happen? There's demand for high quality,
| trustworthy information and that's not going away.
|
| When asking an LLM coding questions, for example, you can
| ask for sources and it'll point you to documentation. It
| won't always be the correct link, but you can prod it
| more and usually get it, or fall back to searching the
| docs the old fashioned way.
| closewith wrote:
| > Well it turns out you can manage just fine.
|
| The internet has ravaged society with disinformation.
| It's a literal battlefield. How can you have come till
| this conclusion?
| svara wrote:
| This thread started from the question of where the
| experts with the ability to use LLMs effectively would
| still come from in the future.
|
| I was making the point that it's still easy to find great
| information on the internet despite the fact that there's
| a lot of incorrect information as well, which was an
| often mentioned 'danger' on the internet since its early
| days.
|
| I wasn't speaking to broader societal impact of LLMs,
| where I can easily agree it's going to make
| misinformation at scale much easier.
| closewith wrote:
| Fair point, well made.
| koolba wrote:
| Content from before the AI Cambrian explosion is going to
| be treated like low-background steel.
|
| https://en.wikipedia.org/wiki/Low-background_steel
| m11a wrote:
| The solution is kagi.com imo.
|
| Before AI generated results, the first page of Google was
| SEO-optimised crap blogs. The internet has been hard to
| search for a while.
| endofreach wrote:
| It will dawn on non-tech people soon enough. Hopefully
| the "AI" (LLM) hypetrain riders will follow.
| blibble wrote:
| how do you gain anything useful from a sycophantic tutor
| that agrees with everything you say, having being trained
| to behave as if the sun shines out of your rear end?
|
| making mistakes is how we learn, and if they are never
| pointed out...
| svara wrote:
| It's a bit of a skill. Gaining an incorrect understanding
| of some topic is a risk anyway you learn, and I don't
| feel it's greater with LLMs than many of the
| alternatives.
|
| Sure, having access to legit experts who can tutor you
| privately on a range of topics would be better, but
| that's not realistic.
|
| What I find is that if I need to explore some new domain
| within a field I'm broadly familiar with, just thinking
| through what the LLM is saying is sufficient for
| verification, since I can look for internal consistency
| and check against things I know already.
|
| When exploring a new topic, often times my questions are
| superficial enough for me to be confident that the
| answers are very common in the training data.
|
| When exploring a new topic that's also somewhat niche or
| goes into a lot of detail, I use the LLM first to get a
| broad overview and then drill down by asking for specific
| sources and using the LLM as an assistant to consume
| authoritative material.
| blibble wrote:
| this "logic" applied across society will lead to our ruin
| svara wrote:
| Say more?
| perching_aix wrote:
| > from a sycophantic tutor that agrees with everything
| you say
|
| You know that it's possible to ask models for dissenting
| opinions, right? Nothing's stopping you.
|
| > and if they are never pointed out...
|
| They do point out mistakes though?
| elvis10ten wrote:
| > LLMs make learning new material easier than ever. I use
| them a lot and I am learning new things at an insane pace
| in different domains.
|
| With learning, aren't you exposed to the same risks? Such
| that if there was a typical blind spot for the LLM, it
| would show up in the learning assistance and in the
| development assistance, thus canceling out (i.e unknown
| unknowns)?
|
| Or am I thinking about it wrongly?
| sulam wrote:
| If you trust everything the LLM tells you, and you learn
| from code, then yes the same exact risks apply. But this
| is not how you use (or should use) LLMs when you're
| learning a topic. Instead you should use high quality
| sources, then ask the LLM to summarize them for you to
| start with (NotebookLM does this very well for instance,
| but so can others). Then you ask it to build you a study
| plan, with quizzes and exercises covering what you've
| learnt. Then you ask it to setup a spaced repetition
| worksheet that covers the topic thoroughly. At the end of
| this you will know the topic as well as if you'd taken a
| semester-long course.
|
| One big technique it sounds like the authors of the OAuth
| library missed is that LLMs are very good at generating
| tests. A good development process for today's coding
| agents is to 1) prompt with or create a PRD, 2) break
| this down into relatively simple tasks, 3) build a plan
| for how to tackle each task, with listed out conditions
| that should be tested, 3) write the tests, so that things
| are broken, TDD style and finally 4) write the
| implementation. The LLM can do all of this, but you can't
| one-shot it these days, you have to be a human in the
| loop at every step, correcting when things go off track.
| It's faster, but it's not a 10x speed up like you might
| imagine if you think the LLM is just asynchronously
| taking a PRD some PM wrote and building it all. We still
| have jobs for a reason.
| evnu wrote:
| > Instead you should use high quality sources, then ask
| the LLM to summarize them for you to start with
| (NotebookLM does this very well for instance, but so can
| others).
|
| How do you determine if the LLM accurately reflects what
| the high-quality source contains, if you haven't read the
| source? When learning from humans, we put trust on them
| to teach us based on a web-of-trust. How do you determine
| the level of trust with an LLM?
| ativzzz wrote:
| Because summarizing is one of the few things LLMs are
| generally pretty good at. Plus you should use the summary
| to determine if you want to read the full source, kind of
| like reading an abstract for a research paper before
| deciding if you want to read the whole thing.
|
| Bonus: the high quality source is going to be mostly AI
| written anyway
| sroussey wrote:
| Actually, LLMs aren't that great for summarizing. It
| would be a boon for RAG workflows if they were.
|
| I'm still on the lookout for a great model for this.
| perching_aix wrote:
| > When learning from humans, we put trust on them to
| teach us based on a web-of-trust.
|
| But this is only part of the story. When learning from
| another human, you'll also actively try and devise
| whether they're trustworthy based on general linguistic
| markers, and will try to find and poke holes in what
| they're saying so that you can question intelligently.
|
| This is not much different from what you'd do with an
| LLM, which is why it's such a problem that they're more
| convincing than correct pretty often. But it's not an
| insurmountable issue. The other issue is that their
| trustworthiness will wary in a different way than a
| human's, so you need experience to know when they're
| possibly just making things up. But just based on feel, I
| think this experience is definitely possible to gain.
| kentonv wrote:
| I did actually use the LLM to write tests, and was
| pleased to see the results, which I thought were pretty
| good and thorough, though clearly the author of this blog
| post has a different opinion.
|
| But TDD is not the way I think. I've never been able to
| work that way (LLM-assisted or otherwise). I find it very
| hard to write tests for software that isn't implemented
| yet, because I always find that a lot of the details
| about how it should work are discovered as part of the
| implementation process. This both means that any API I
| come up with before implementing is likely to change, and
| also it's not clear exactly what details need to be
| tested until I've fully explored how the thing works.
|
| This is just me, other people may approach things totally
| differently and I can certainly understand how TDD works
| well for some people.
| perching_aix wrote:
| When I'm exploring a topic, I make sure to ask for links
| to references, and will do a quick keyword search in
| there or ask for an excerpt to confirm key facts.
|
| This does mean that there's a reliance on me being able
| to determine what are key facts and when I should be
| asking for a source though. I have not experienced any
| significant drawbacks when compared to a classic research
| workflow though, so in my view it's a net speed boost.
|
| However, this does mean that a huge variety of things
| remain out of reach for me to accomplish, even with LLM
| "assistance". So there's a decent chance even the speed
| boost is only perceptual. If nothing else, it does take a
| significant amount of drudgery out of it all though.
| motorest wrote:
| > With learning, aren't you exposed to the same risks?
| Such that if there was a typical blind spot for the LLM,
| it would show up in the learning assistance and in the
| development assistance, thus canceling out (i.e unknown
| unknowns)?
|
| I don't think that's how things work. In learning tasks,
| LLMs are sparring partners. You present them with
| scenarios, and they output a response. Sometimes they
| hallucinate completely, but they can also update their
| context to reflect new information. Their output matches
| what you input.
| belter wrote:
| > But if you're willing to learn, this is rocket fuel.
|
| LLMs will tell you 1 or 2 lies for each 20 facts. Its a
| hard way to learn. They cant even get their urls right...
| diggan wrote:
| > LLMs will tell you 1 or 2 lies for each 20 facts. Its a
| hard way to learn.
|
| That was my experience when growing up with school also,
| except you got punished one way or another for speaking
| up/trying to correct the teacher. If I speak up with the
| LLM they either explain why what they said is true, or
| corrects themselves, 0 emotions involved.
|
| > They cant even get their urls right...
|
| Famously never happens with humans.
| belter wrote:
| You are ignoring the fact that the types of mistakes or
| lies are of a different nature.
|
| If you are in class, and you incorrectly argue, there is
| a mistake in an explanation of Derivatives or Physics,
| but you are the one in error, your Teacher hopefully,
| will not say: "Oh, I am sorry you are absolutely correct.
| Thank you for your advice.."
| diggan wrote:
| Yeah, no of course if I'm wrong I don't expect the
| teacher to agree with me, what kind of argument is that?
| I thought it was clear, but the base premise of my
| previous comment is that the teacher is incorrect and
| refuse corrections...
| belter wrote:
| My point is a teacher will not do something like this:
|
| - Confident synthesis of incompatible sources: LLM:
| "Einstein won the 1921 Nobel Prize for his theory of
| relativity, which he presented at the 1915 Solvay
| Conference."
|
| Or
|
| - Fabricated but plausible citations: LLM: "According to
| Smith et al., 2022, Nature Neuroscience, dolphins
| recognise themselves in mirrors." There is no such
| paper...model invents both authors and journal reference
|
| And this is the danger of coding with LLMs....
| diggan wrote:
| I don't know what reality you live in, but it happens
| that teachers are incorrect, no matter what your own
| personal experience have been. I'm not sure how this is
| even up for debate.
|
| What matters is how X reacts when you point out it wasn't
| correct, at least in my opinion, and was the difference I
| was trying to highlight.
| belter wrote:
| A human tutor typically misquotes a real source or says
| "I'm not sure"
|
| An LLM, by contrast, will invent a flawless looking but
| nonexistent citation. Even a below average teacher
| doesn't churn out fresh fabrications every tenth
| sentence.
|
| Because a teacher usually cites recognizable material,
| you can check the textbook and recover quickly. With an
| LLM you first have to discover the source never existed.
| That verification cost is higher, the more complex task
| you are trying to achieve.
|
| A LLM will give you a perfect paragraph about the AWS
| Database Migration service, the list of supported
| databases, and then include in there a data flow like on-
| prem to on-prem data that is not supported...Relying on
| an LLM is like flying with a friendly copilot but who has
| multiple personality disorder. You dont know which day he
| will forget to take his meds :-)
|
| Stressful and mentally exhausting in a different kind of
| way....
| signatoremo wrote:
| And you are saying human teachers or online materials
| won't lie to you once or twice for every 20 facts? no
| matter how small. Did you do any comparison?
| belter wrote:
| You are missing the point. See my comment to @diggan in
| this thread. LLMs lie in a different way.
| skeeter2020 wrote:
| it's not jsut the lies, but how it lies and the fact that
| LLMs are very hesitant to call out humans on their BS
| brookst wrote:
| Is this the newest meme?
|
| Me: "explain why radioactive half-life changes with
| temperature"
|
| ChatGPT 4o: " Short answer: It doesn't--at least not
| significantly. Radioactive Half-Life is (Almost Always)
| Temperature-Independent"
|
| ...and then it goes on to give a few edge cases where
| there's a tiny effect.
| skeeter2020 wrote:
| >> LLMs make learning new material easier than ever.
|
| feels like there's a logical flaw here, when the issue is
| that LLMs are presenting the wrong information or missing
| it all together. The person trying to learn from it will
| experience Donald Rumsfield's "unknown unknowns".
|
| I would not be surprised if we experience an even more
| dramatic "Cobol Moment" a generation from now, but unlike
| that one thankfully I won't be around to experience it.
| threeseed wrote:
| Learning from LLMs is akin to learning from Joe Rogan.
|
| You are getting a stylised view of a topic from an entity
| who lacks the deep understanding needed to be able to fully
| distill the information. But it is enough to gain enough
| knowledge for you to feel confident which is still valuable
| but also dangerous.
|
| And I assure you that many, many people are delegating to
| LLMs blindly e.g. it's a huge problem in the UK legal
| system right now because of all the invented case law
| references.
| slashdev wrote:
| It depends very much on the quality of the questions. I
| get deep technical insight into questions I can't find
| anything on with Google.
| diogocp wrote:
| > You are getting a stylised view of a topic from an
| entity who lacks the deep understanding
|
| Isn't this how every child learns?
|
| Unless his father happens to be king of Macedonia, of
| course.
| kentonv wrote:
| I can think of books I used to learn software engineering
| when I was younger which, in retrospect, I realize were
| not very good, and taught me some practices I now
| disagree with. Nevertheless, the book did help me learn,
| and got me to a point where I could think about it
| myself, and eventually develop my own understanding.
| therealpygon wrote:
| And yet, human coders may do that exact type of thing
| daily, producing far worse code. I find it humorous at how
| much higher of a standard is applied to LLMs in every
| discussion when I can guarantee those exact some coders
| likely produce their own bug-riddled software.
|
| We've gone from skeptics saying LLMs can't code, to they
| can't code well, to they can't produce human-level code, to
| they are riddled with hallucinations, to now "but they
| can't one-shot code a library without any bugs or flaws"
| and "but they can only one-shot code, they can't edit well"
| even tho recents coding utilities have been proving that
| wrong as well. And still they say they are useless.
|
| Some people just don't hear themselves or see how AI is
| constantly moving their bar.
| brookst wrote:
| And now the complaint is that the bugs are too subtle.
| Soon it will be that the overall quality is too high,
| leading to a false sense of security.
| conradev wrote:
| Just wrong. Don't do that
|
| I'd personally qualify this: don't ship that code, but
| absolutely do it personally to grow if you're interested.
|
| I've grown the most when I start with things I sort of know
| and I work to expand my understanding.
| paradox242 wrote:
| The value of LLMs is that they do things for you, so yeah
| the incentive is to have them take over more and more of
| the process. I can also see a future not far into the
| horizon where those who grew up with nothing but AI are
| much less discerning and capable and so the AI becomes more
| and more a crutch, as human capability withers from
| extended disuse.
| a13n wrote:
| If the hypothesis is that we still need knowledgeable
| people to run LLMs, but the way you become knowledgeable is
| by talking to LLMs, then I don't think the hypothesis will
| be correct for long..
| mwigdahl wrote:
| We need knowledgeable people to run computers, but you
| can become knowledgeable about computers by using
| computers to access learning material. Seems like that
| generalizes well to LLMs.
| svara wrote:
| You inserted a hidden "only" there to make it into a
| logical sounding dismissive quip.
|
| You don't get knowledge by ONLY talking to LLMs, but
| they're a great tool.
| catlifeonmars wrote:
| I think what's missing here is you should start by reading
| the RFCs. RFCs tend to be pretty succinct so I'm not really
| sure what a summarization is buying you there except
| leaving out important details.
|
| (One thing that might be useful is use the LLM as a search
| engine to find the relevant RFCs since sometimes it's hard
| to find all of the applicable ones if you don't know the
| names of them already.)
|
| I really can't stress this enough: read the RFCs from end
| to end. Then read through the code of some reference
| implementations. Draw a sequence diagram. Don't have the
| LLM generate one for you, the point is to internalize the
| design you're trying to implement against.
|
| By this time you should start spotting bugs or
| discrepancies between the specs and implementations in the
| wild. That's a good sign. It means you're learning
| wslh wrote:
| Another limitation of LLMs lies in their inability to stay
| in sync with novel topics or recently introduced methods,
| especially when these are not yet part of their training
| data or can't be inferred from existing patterns.
|
| It's important to remember that these models depend not
| only on ML breakthroughs but also on the breadth and
| freshness of the data used to train them.
|
| That said, the "next-door" model could very well
| incorporate lessons from the recent Cloudflare OAuth
| Library issues, thanks to the ongoing discussions and
| community problem-solving efforts.
| kypro wrote:
| In a few years hopefully the AI reviewers will be far more
| reliable than even the best human experts. This is generally
| how competency progresses in AI...
|
| For example, at one point a human + computer would have been
| the strongest combo in chess, now you'd be insane to allow a
| human to critic a chess bot because they're so unlikely to
| add value, and statistically a human in the loop would be far
| more likely to introduce error. Similar things can be said in
| fields like machine vision, etc.
|
| Software is about to become much higher quality and be
| written at much, much lower cost.
| sarchertech wrote:
| My prediction is that for that to happen we'll need to
| figure out a way to measure software quality in the way we
| can measure a chess game, so that we can use synthetic data
| to continue improving the models.
|
| I don't think we are anywhere close to doing that.
| kypro wrote:
| Not really... If you're an average company you're not
| concerned about producing perfect software, but
| optimising for some balance between cost and quality. At
| some point companies via capitalist forces will naturally
| realise that it's more productive to not have humans in
| the loop.
|
| A good analogy might be how machines gradually replaced
| textile workers in the 19th century. Were the machines
| better? Or was there a was to quantitatively measure the
| quality of their output? No. But at the end of the day
| companies which embraced the technology were more
| productive than those who didn't, and the quality didn't
| decrease enough (if it did at all) that customers would
| no longer do business with them - so these companies won
| out.
|
| The same will naturally happen in software over the next
| few years. You'd be an moron to hire a human expert for
| $200,000 to critic a cybersecurity optimised model which
| costs maybe a 100th of the cost of employing a human...
| And this would likely be true even if we assume the human
| will catch the odd thing the model wouldn't because
| there's no such thing as perfect security - it's always a
| trade off between cost and acceptable risk.
|
| Bookmark this and come back in a few years. I made
| similar predictions when ChatGPT first came out that
| within a few years agents would be picking up tickets and
| raising PRs. Everyone said LLMs were just stochastic
| parrots and this would not happen, well now it has and
| increasingly companies are writing more and more code
| with AI. At my company it's a little over 50% at the mo,
| but this is increasing every month.
| sarchertech wrote:
| Almost none of what you said about the past is true.
| Automated looms, and all of the other automated machinery
| that replaced artisans over the course of the industrial
| revolution produced items of much better quality than
| what human craftsman could produce by the time it started
| to be used commercially because of precision and
| repeatability. They did have quantitative measurements of
| quality for textiles and other goods and the automated
| processes exceeded human craftsman at those metrics.
|
| Software is also not remotely similar to textiles. A
| subtle bug in the textile output itself won't cause
| potentially millions of dollars in damages, they way a
| bug in an automated loom itself or software can.
|
| No current technology is anywhere close to being able to
| automate 50% of PRs on any non trivial application
| (that's not close to the same as saying that 50% of PRs
| merged at your startup happens to have an agent as
| author). To assume that current models will be able to
| get near 100% without massive model improvements is just
| that--an assumption.
|
| My point about synthetic data is that we need orders of
| magnitude more data with current technology and the only
| way we will get there is with synthetic data. Which is
| much much harder to do with software applications than
| with chess games.
|
| The point isn't that we need a quantitative measure of
| software in order for AI to be useful, but that we need a
| quantitative measure in order for synthetic data to be
| useful to give us our orders of magnitude more training
| data.
| risyachka wrote:
| Use it or lose it.
|
| Experts will become those who use llm to learn and not to
| write code for them or solve tasks for them so they can build
| that skill.
| paradox242 wrote:
| The implication is that they are hoping to bridge the gap
| between current AI capabilities and something more like AGI
| in the time it takes the senior engineers to leave the
| industry. At least, that's the best I can come up with,
| because they are kicking out all of the bottom rings of the
| ladder here in what otherwise seems like a very shortsighted
| move.
| ajmurmann wrote:
| I've been using an llm to do much of a k8s deployment for me.
| It's quick to get something working but I've had to constantly
| remind it to use secrets instead of committing credentials in
| clear text. A dangerous way to fail. I wonder if in my case
| this is caused by the training data having lots of examples
| from online tutorials that omit security concerns to focus on
| the basics.
| ants_everywhere wrote:
| > It's quick to get something working but I've had to
| constantly remind it to use secrets instead of committing
| credentials in clear text.
|
| This is going to be a powerful feedback loop which you might
| call regression to the intellectual mean.
|
| On any task, most training data is going to represent the
| middle (or beginning) of knowledge about a topic. Most k8s
| examples will skip best practices, most react apps will be
| from people just learning react, etc.
|
| If you want the LLM to do best practices in every knowledge
| domain (assuming best practices can be consistently well
| defined), then you have to push it away from the mean of
| every knowledge domain simultaneously (or else work with
| specialized fine tuned models).
|
| As you continue to add training data it will tend to regress
| toward the middle because that's where most people are on
| most topics.
| diggan wrote:
| > my case this is caused by the training data having
|
| I think it's caused by you not having a strong enough system
| prompt. Once you've built up a slightly reusable system
| prompt for coding or for infra work, where you bit by bit
| build it up while using a specific model (since different
| models respond differently to prompts), you end up getting
| better and better responses.
|
| So if you notice it putting plaintext credentials in the
| code, add to the system prompt to not do that. With LLMs you
| really get what you ask for, and if you miss to specify
| anything, the LLM will do whatever the probabilities tells it
| to, but you can steer this by being more specific.
|
| Imagine you're talking to a very literal and pedantic
| engineer who argues a lot on HN and having to be very precise
| with your words, and you're like 80% of the way there :)
| ajmurmann wrote:
| Yes, you are definitely right on that. I still find it a
| concerning failure mode. That said, maybe it's no worse
| than a junior copying from online examples without reading
| all the text some the code which of course has been very
| common also.
| bradly wrote:
| I've found llms are very quick to add defaults, fallbacks,
| rescues-which all makes it very easy for code to look like it
| is working when it is not or will not. I call this out three
| different places in my CLAUDE.md trying to adjust for this, and
| still occasionally get.
| jstummbillig wrote:
| You will always trust domain experts at some junction; you
| can't build a company otherwise. The question is: Can LLMs
| provide that domain expertise? I would argue, yes, clearly,
| given the development of the past 2 years, but obviously not on
| a straight line.
| ghuntley wrote:
| See also: LLMs are mirrors of operator skill -
| https://ghuntley.com/mirrors
| loandbehold wrote:
| Over time AI coding tools will be able to research domain
| knowledge. Current "AI Research" tools are already very good at
| it but they are not integrated with coding tools yet. The
| research could look at both public Internet as well as company
| documents that contain internal domain knowledge. Some of the
| domain knowledge is only in people's heads. That would need to
| be provided by the user.
| wslh wrote:
| I'd like to add a practical observation, even assuming much
| more capable AI in the future: not all failures are due to
| model limitations, sometimes it's about external [world]
| changes.
|
| For instance, I used Next.js to build a simple login page
| with Google auth. It worked great, even though I only had
| basic knowledge of Node.js and a bit of React.
|
| Then I tried adding a database layer using Prisma to persist
| users. That's where things broke. The integration didn't
| work, seemingly due to recent versions in Prisma or subtle
| breaking updates. I found similar issues discussed on GitHub
| and Reddit, but solving them required shifting into full
| manual debugging mode.
|
| My takeaway: even with improved models, fast-moving
| frameworks and toolchains can break workflows in ways that
| LLMs/ML (at least today) can't reason through or fix
| reliably. It's not always about missing domain knowledge,
| it's that the moving parts aren't in sync with the model yet.
| SparkyMcUnicorn wrote:
| Just close the loop and give it direct access to your
| console logs in chrome and node, then it can do the "full
| manual debugging" on its own.
|
| It's not perfect, and it's not exactly cheap, but it works.
| aiono wrote:
| I agree with the last paragraph about doing this yourself. Humans
| have tendency to take shortcuts while thinking. If you see
| something resembling what you expect for the end product you will
| be much less critical of it. The looks/aesthetics matter a lot on
| finding problems with in a piece of code you are reading. You can
| verify this by injecting bugs in your code changes and see if
| reviewers can find them.
|
| On the other hand, when you have to write something yourself you
| drop down to slow and thinking state where you will pay attention
| to details a lot more. This means that you will catch bugs you
| wouldn't otherwise think of. That's why people recommend writing
| toy versions of the tools you are using because writing yourself
| teaches a lot better than just reading materials about it. This
| is related to know our cognition works.
| kentonv wrote:
| I agree that most code reviewers are pretty bad at spotting
| subtle bugs in code that looks good superficially.
|
| I have a lot of experience reviewing code -- more than I ever
| really wanted. It has... turned me cynical and bitter, to the
| point that I never believe anything is right, no matter who
| wrote it or how nice it looks, because I've seen so many ways
| things can go wrong. So I tend to review every line, simulate
| it in my head, and catch things. I kind of hate it, because it
| takes so long for me to be comfortable approving anything, and
| my reviewees hate it too, so they tend to avoid sending things
| to me.
|
| I _think_ I agree that if I 'd written the code by hand, it
| would be less likely to have bugs. Maybe. I'm not sure, because
| I've been known to author some pretty dumb bugs of my own. But
| yes, total Kenton brain cycles spent on each line would be
| higher, certainly.
|
| On the other hand, though, I probably would not have been the
| one to write this library. I just have too much on my plate
| (including all those reviews). So it probably would have been
| passed off to a more junior engineer, and I would have reviewed
| their work. Would I have been more or less critical? Hard to
| say.
|
| But one thing I definitely disagree with is the idea that
| humans would have produced bug-free code. I've seen way too
| many bugs in my time to take that seriously. Hate to say it but
| most of the bugs I saw Claude produce are mistakes I'd totally
| expect an average human engineer could make.
|
| _Aside, since I know some people are thinking it: At this
| time, I do not believe LLM use will "replace" any human
| engineers at Cloudflare. Our hiring of humans is not determined
| by how much stuff we have to do, because we basically have
| infinite stuff we want to do. The limiting factor is what we
| have budget for. If each human becomes more productive due to
| LLM use, and this leads to faster revenue growth, this likely
| allows us to hire more people, not fewer. (Disclaimer: As with
| all of my comments, this is my own opinion / observation, not
| an official company position.)_
| eastdakota wrote:
| I agree with Kenton's aside.
| jstummbillig wrote:
| Note that this has very little to do with AI assisted coding; the
| authors of the library explicitly approved/vetted the code. So
| this comes down to different coders having different thoughts
| about what constitutes good and bad code, with some flaunting of
| credentials to support POVs, and nothing about that is
| particularly new.
| add-sub-mul-div wrote:
| The whole point of this is that people will generally put the
| least effort into work as they think they can get away with,
| and LLMs will accelerate that force. This is the future of how
| code will be "vetted".
|
| It's not important whose responsbility led to mistakes, it's
| important to understand we're creating a responsbility gap.
| ape4 wrote:
| The article says there aren't too many useless comments but the
| code has: // Get the Origin header from the
| request const origin = request.headers.get('Origin');
| slashdev wrote:
| Those kinds of comments are a big LLM giveaway, I always remove
| them, not to hide that an LLM was used, but because they add
| nothing.
| lucas_codes wrote:
| Plus you just know in a few months they are going to be stale
| and reference code that has changed. I have even seen this
| happen with colleagues using llms between commits on a single
| pr.
| kissgyorgy wrote:
| I also noticed Claude likes writing useless redundant comments
| like this A LOT.
| spenczar5 wrote:
| Of course, these are awful for a human. But I wonder if they're
| actually helpful for the LLM when it's reading code. It means
| each line of behavior is written in two ways: human language
| and code. Maybe that rosetta stone helps it confidently proceed
| in understanding, at the cost of tokens.
|
| All speculation, but I'd be curious to see it evaluated - does
| the LLM do better edits on egregiously commented code?
| electromech wrote:
| It would be a bad sign if LLMs lean on comments.
| // secure the password for storage // following best
| practices // per OWASP A02:2021 // - using a
| cryptographic hash function // - salting the password
| // - etc. // the CTO and CISO reviewed this personally
| // Claude, do not change this code // or comment on it
| in any way var hashedPassword = password.hashCode()
|
| Excessive comments come at the cost of much more than tokens.
| keybored wrote:
| Oh another one,[1] cautious somewhat-skeptic edition.
|
| [1] https://news.ycombinator.com/item?id=44205697
| dweekly wrote:
| An approach I don't see discussed here is having different agents
| using different models critique architecture and test coverage
| and author tests to vet the other model's work, including
| reviewing commits. Certainly no replacement for human in the loop
| but it will catch a lot of goofy "you said to only check in when
| all the tests pass so I disabled testing because I couldn't
| figure out how to fix the tests".
| epolanski wrote:
| Part of me this "written by LLM" has been a way to get attention
| on the codebase and plenty of free reviews by domain expert
| skeptics, among the other goals (pushing AI efficiency to
| investors, experimenting, etc).
| kentonv wrote:
| Free reviews by domain experts are great.
|
| I didn't think of that, though. I didn't have an agenda here, I
| just put the note in the readme about it being LLM-generated
| only because I thought it was interesting.
| sarchertech wrote:
| I just finished writing a Kafka consumer to migrate data with
| heavy AI help. This was basically best case a scenario for AI.
| It's throw away greenfield code in a language I know pretty well
| (go) but haven't used daily in a decade.
|
| For complicated reasons the whole database is coming through on 1
| topic, so I'm doing some fairly complicated parallelization to
| squeeze out enough performance.
|
| I'd say overall the AI was close to a 2x speed up. It mostly
| saved me time when I forgot the go syntax for something vs
| looking it up.
|
| However, there were at least 4 subtle bugs (and many more
| unsubtle ones) that I think anyone who wasn't very familiar with
| Kafka or multithreaded programming would have pushed to prod. As
| it is, they took me a while to uncover.
|
| On larger longer lived codebases, I've seen something closer to a
| 10-20% improvement.
|
| All of this is using the latest models.
|
| Overall this is at best the kind of productivity boost we got
| from moving to memory managed languages. Definitely not something
| that is going to replace engineers with PMs vibe coding anytime
| soon (based on rate of change I've seen over the last 3 years).
|
| My real worry is that this is going to make mid level technical
| tornadoes, who in my experience are the most damaging kind of
| programmer, 10x as productive because they won't know how to spot
| or care about stopping subtle bugs.
|
| I don't see how senior and staff engineers are going to be able
| to keep up with the inevitable flood of reviews.
|
| I also worry about the junior to senior pipeline in a world where
| it's even easier to get something up that mostly works--we
| already have this problem today with copy paste programmers, but
| we've just make copy paste programming even easier.
|
| I think the market will eventually sort this all out, but I worry
| that it could take decades.
| awfulneutral wrote:
| Yeah, the AI-generated bugs are really insidious. I also pushed
| a couple subtle bugs in some multi-threaded code I had AI
| write, because I didn't think it through enough. Reviews and
| tests don't replace the level of scrutiny something gets when
| it's hand-written. For now, you have to be really careful with
| what you let AI write, and make sure any bugs will be low
| impact since there will probably be more than usual.
| skeeter2020 wrote:
| > I've seen something closer to a 10-20% improvement.
|
| The seems to match my experience in "important" work too; a
| real increase but not essentially changing the essence of
| software development. Brook's "No Silver Bullet" strikes
| again...
| LgWoodenBadger wrote:
| Complicated parallelization? That's what partitions and
| consumers/consumer-groups are for!
| sarchertech wrote:
| Of course they are, but I'm not controlling the producer.
| LgWoodenBadger wrote:
| Producer doesn't care how many partitions there are, it
| doesn't even know about them, unless it wants to use its
| own partitioning algorithm. You can change the number of
| partitions on the topic after the fact.
| sarchertech wrote:
| In this case it would need to use its own partitioning
| algorithm because of some specific ordering guarantees we
| care about.
| murukesh_s wrote:
| What about generating testable code? I mean you mentioned
| detecting subtle bugs in generated code - I too have seen
| similar - but what if that was found via generated test cases
| than found by a human reviewers? Of course the test code could
| have bugs, but I can see a scenario in the future where all we
| do is review the tests output instead of scrutinising the
| generated code...
| sarchertech wrote:
| And the AI is trained to write plausible output and pass test
| cases.
|
| Have you ever tried to generate test cases that were immune
| to a malicious actor trying to pass your test cases? For
| example if you are trying to automate homework grading?
|
| The AI writing tests needs to understand the likely problem
| well enough to know to write a test case for it, but there
| are an infinite amount of subtle bugs for an AI writing code
| to choose from.
| electromech wrote:
| > My real worry is that this is going to make mid level
| technical tornadoes...
|
| Yes! Especially in the consulting world, there's a perception
| that veterans aren't worth the money because younger engineers
| get things done faster.
|
| I have been the younger engineer scoffing at the veterans, and
| I have been the veteran desperately trying to get non-technical
| program managers to understand the nuances of why the quick
| solution is inadequate.
|
| Big tech will probably sort this stuff out faster, but much of
| the code that processes our financial and medical records gets
| written by cheap, warm bodies in 6 month contracts.
|
| All that was a problem before LLMs. Thankfully I'm no longer at
| a consulting firm. That world must be hell for security-
| conscious engineers right now.
| roxolotl wrote:
| > Many of these same mistakes can be found in popular Stack
| Overflow answers, which is probably where Claude learnt them from
| too.
|
| This is what keeps me up at night. Not that security holes will
| inevitably be introduced, or that the models will make mistakes,
| but that the knowledge and information we have as a society is
| basically going to get frozen in time to what was popular on the
| internet before LLMs.
| tuxone wrote:
| > This is what keeps me up at night.
|
| Same here. For some of the services I pay, say the e-mail
| provider, the fact that they openly deny using LLMs for coding
| would be a plus for me.
| menzoic wrote:
| LLMs are like power tools. You still need to understand the
| architecture, do the right measurements, and apply the right
| screw to the right spot.
| OutOfHere wrote:
| This is why I have multiple LLMS review and critique my
| specifications document, iteratively and repeatedly so, before I
| have my preferred LLM code it for me. I address all important
| points of feedback in the specifications document. To do this
| iteratively and repeatedly until there are no interesting points
| is crucial. This really fixes 80% of the expertise issues.
|
| Moreover, after developing the code, I have multiple LLMs
| critique the code, file by file, or even method by method.
|
| When I say multiple, I mean a non-reasoning one, a reasoning
| large one, and a next-gen reasoning small one, preferably by
| multiple vendors.
| kentonv wrote:
| Hi, I'm the author of the library. (Or at least, the author of
| the prompts that generated it.)
|
| > I'm also an expert in OAuth
|
| I'll admin I think Neil is significantly more of an expert than
| me, so I'm delighted he took a pass at reviewing the code! :)
|
| I'd like to respond to a couple of the points though.
|
| > The first thing that stuck out for me was what I like to call
| "YOLO CORS", and is not that unusual to see: setting CORS headers
| that effectively disable the same origin policy almost entirely
| for all origins:
|
| I am aware that "YOLO CORS" is a common novice mistake, but that
| is not what is happening here. These CORS settings were carefully
| considered.
|
| We disable the CORS headers specifically for the OAuth API (token
| exchange, client registration) endpoints and for the API
| endpoints that are protected by OAuth bearer tokens.
|
| This is valid because none of these endpoints are authorized by
| browser credentials (e.g. cookies). The purpose of CORS is to
| make sure that a malicious website cannot exercise your
| credentials against some other website by sending a request to it
| and expecting the browser to add your cookies to that request.
| These endpoints, however, do not use browser credentials for
| authentication.
|
| Or to put in another way, the endpoints which have open CORS
| headers are either control endpoints which are intentionally open
| to the world, or they are API endpoints which are protected by an
| OAuth bearer token. Bearer tokens must be added explicitly by the
| client; the browser never adds one automatically. So, in order to
| receive a bearer token, the client must have been explicitly
| authorized by the user to access the service. CORS isn't
| protecting anything in this case; it's just getting in the way.
|
| (Another purpose of CORS is to protect confidentiality of
| resources which are not available on the public internet. For
| example, you might have web servers on your local network which
| lack any authorization, or you might unwisely use a server which
| authorizes you based on IP address. Again, this is not a concern
| here since the endpoints in question don't provide anything
| interesting unless the user has explicitly authorized the
| client.)
|
| Aside: Long ago I was actually involved in an argument with the
| CORS spec authors, arguing that the whole spec should be thrown
| away and replaced with something that explicitly recognizes
| bearer tokens as the right way to do any cross-origin
| communications. It is almost never safe to open CORS on endpoints
| that use browser credentials for auth, but it is almost always
| safe to open it on endpoints that use bearer tokens. If we'd just
| recognized and embraced that all along I think it would have
| saved a lot of confusion and frustration. Oh well.
|
| > A more serious bug is that the code that generates token IDs is
| not sound: it generates biased output.
|
| I disagree that this is a "serious" bug. The tokens clearly have
| enough entropy in them to be secure (and the author admits this).
| Yes, they could pack more entry per byte. I noticed this when
| reviewing the code, but at the time decided:
|
| 1. It's secure as-is, just not maximally efficient. 2. We can
| change the algorithm freely in the future. There is not
| backwards-compatibility concern.
|
| So, I punted.
|
| Though if I'd known this code was going to get 100x more review
| than anything I've ever written before, I probably would have
| fixed it... :)
|
| > according to the commit history, there were 21 commits directly
| to main on the first day from one developer, no sign of any code
| review at all
|
| Please note that the timestamps at the beginning of the commit
| history as shown on GitHub are misleading because of a history
| rewrite that I performed later on to remove some files that
| didn't really belong in the repo. GitHub appears to show the date
| of the rebase whereas `git log` shows the date of actual
| authorship (where these commits are spread over several days
| starting Feb 27).
|
| > I had a brief look at the encryption implementation for the
| token store. I mostly like the design! It's quite smart.
|
| Thank you! I'm quite proud of this design. (Of course, the AI
| would never have come up with it itself, but it was pretty decent
| and filling in the details based on my explicit instructions.)
| lapcat wrote:
| Does Cloudflare intend to put this library into production?
| kentonv wrote:
| Yes, it's part of our MCP framework:
|
| https://blog.cloudflare.com/remote-model-context-protocol-
| se...
| kentonv wrote:
| > We disable the CORS headers specifically for the OAuth API
|
| Oops, I meant we set the CORS headers, to disable CORS rules.
| (Probably obvious in context but...)
| max2he wrote:
| Interesting to have people submit their promts to git. Do you
| think it'll be generally an accepted thing or was this just a
| showcase of how they promt?
| kentonv wrote:
| I included the prompts because I personally found it extremely
| illuminating to see what the LLM was able to produce based on
| those prompts, and I figured other people would be interested
| to. Seems I was right.
|
| But to be clear, I had no idea how to write good prompts. I
| basically just wrote like I would write to a human. That seemed
| to work.
| mplanchard wrote:
| This is tangential to the discussion at hand, but a point I
| haven't seen much in these conversations is the odd impedance
| mismatch between _knowing_ you're interacting with a tool but
| being asked to interact with it like a human.
|
| I personally am much less patient and forgiving of tools that
| I use regularly than I am of my colleagues (as I would hope
| is true for most of us), but it would make me uncomfortable
| to "treat" an LLM with the same expectations of consistency
| and "get out of my way" as I treat vim or emacs, even though
| I intellectually know it is also a non-thinking machine.
|
| I wonder about the psychological effects on myself and others
| long term of this kind of language-based machine interaction:
| will it affect our interactions with other people, or
| influence how we think about and what we expect from our
| tools?
|
| Would be curious if your experience gives you any insight
| into this.
| kentonv wrote:
| I have actually had that thought, too.
|
| I _feel bad_ being rude to an LLM even though it doesn 't
| care, so I add words like "please" and sometimes even
| complement it on good work even though I know this is
| useless. Will I learn to stop doing that, and if so, will I
| also stop doing it to humans?
|
| I'm hoping the answer is simply "no". Plenty of people are
| rude in some contexts and then polite in others (especially
| private vs. public, or when talking to underlings vs.
| superiors), so it should be no problem to learn to be
| polite to humans even if you aren't polite to LLMs, I
| think? But I guess we'll see.
| user9999999999 wrote:
| why on earth would you code oauth in ai at this stage?
| throwawaybob420 wrote:
| I've never seen such "walking off the cliff" behavior than from
| people who whole heartedly champion LLMs and the like.
|
| Leaning on and heavily relying on a black box that hallucinates
| gibberish to "learn", perform your work, and review your work.
|
| All the while it literally consumes ungodly amounts of energy and
| is used as pretext to get rid of people.
|
| Really cool stuff! I'm sure it's 10x'ing your life!
| ChrisArchitect wrote:
| Related:
|
| _I read all of Cloudflare 's Claude-generated commits_
|
| https://news.ycombinator.com/item?id=44205697
| m3kw9 wrote:
| For the foreseeable future software expertise is a safe job to
| have.
___________________________________________________________________
(page generated 2025-06-08 23:02 UTC)