[HN Gopher] Using LLMs at Oxide
___________________________________________________________________
Using LLMs at Oxide
Author : steveklabnik
Score : 639 points
Date : 2025-12-07 01:17 UTC (21 hours ago)
(HTM) web link (rfd.shared.oxide.computer)
(TXT) w3m dump (rfd.shared.oxide.computer)
| thatxliner wrote:
| The empathy section is quite interesting
| monkaiju wrote:
| Hmmm, I'm a bit confused by their conclusions (encouraging use)
| given some of the really damning caveats they point out. A tool
| they themselves determine to need such careful oversight probably
| just shouldn't be used near prod at all.
| gghffguhvc wrote:
| For the same quality and quantity output, if the cost of using
| LLMs + the cost of careful oversight is less than the cost of
| not using LLMs then the rational choice is to use them.
|
| Naturally this doesn't factor in things like human
| obsolescence, motivation and self-worth.
| zihotki wrote:
| And it doesn't factor in seniority/experience. What's good for a
| senior developer is not necessarily the same for a beginner.
| ahepp wrote:
| It seems like this would be a really interesting field to
| research. Does AI assisted coding result in fewer bugs, or
| more bugs, vs an unassisted human?
|
| I've been thinking about this as I do AoC with Copilot
| enabled. It's been nice for those "hmm how do I do that in
| $LANGUAGE again?" moments, but it's also wrote some nice
| looking snippets that don't do quite what I want it to. And
| many cases of "hmmm... that would work, but it would read the
| entire file twice for no reason".
|
| My guess, however, is that it's a net gain for quality and
| productivity. Humans make bugs too and there need to be
| processes in place to discover and remediate those
| regardless.
| sunshowers wrote:
| I'm not sure about research, but I've used LLMs for a few
| things here at Oxide with (what I hope is) appropriate
| judgment.
|
| I'm currently trying out using Opus 4.5 to take care of a
| gnarly code reorganization that would take a human most of
| a week to do -- I spent a day writing a spec (by hand, with
| some editing advice from Claude Code), having it reviewed
| as a document for humans by humans, and feeding it into
| Opus 4.5 on some test cases. It seems to work well. The
| spec is, of course, in the form of an RFD, which I hope to
| make public soon.
|
| I like to think of the spec as basically an extremely
| advanced sed script described in ~1000 English words.
| AlexCoventry wrote:
| Maybe it's not as necessary with a codebase as well-
| organized as Oxide's, but I found gemini 3 useful for a
| refactor of some completely test-free ML research code,
| recently. I got it to generate a test case which would
| exercise all the code subject to refactoring, got it to
| do the refactoring and verify that it leads to exactly
| the same state, then finally got it to randomize the test
| inputs and keep repeating the comparison.
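|
| A minimal sketch of that kind of equivalence check in Python
| (hypothetical names; it assumes the pre-refactor function is
| kept around as a reference implementation until the refactor
| is verified):
|
  import random

  def legacy_pipeline(xs):
      # original, pre-refactor implementation (kept as reference)
      return [x * 2 + 1 for x in xs]

  def refactored_pipeline(xs):
      # new implementation that should behave identically
      return [2 * x + 1 for x in xs]

  def test_refactor_preserves_behavior(trials=1000):
      rng = random.Random(42)  # fixed seed for reproducibility
      for _ in range(trials):
          n = rng.randint(0, 50)
          xs = [rng.randint(-1000, 1000) for _ in range(n)]
          # randomized inputs, repeated comparison of old vs new
          assert refactored_pipeline(xs) == legacy_pipeline(xs)

  test_refactor_preserves_behavior()
|
| Once the randomized comparison passes repeatedly, the legacy
| copy can be deleted.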
| Yeask wrote:
| These companies have trillions and they are not doing that
| research. Why?
| devmor wrote:
| The ultimate conclusion seems to be one that leaves it to
| personal responsibility - the user of the LLM is responsible
| for ensuring the LLM has done its job correctly. While this is
| the ethical conclusion to me, but the "gap" left to personal
| responsibility is so large that it makes me question how useful
| everything else in this document really is.
|
| I don't think it is easy to create a concise set of rules to
| apply in this gap for something as general as LLM use, but I do
| think such a ruleset is noticeably absent here.
| rgoulter wrote:
| What do you find confusing about the document encouraging use
| of LLMs?
|
| The document includes statements like "LLMs are superlative at
| reading comprehension", "LLMs can be excellent editors", "LLMs
| are amazingly good at writing code".
|
| The caveats are really useful: if you've anchored your
| expectations on "these tools are amazing", the caveats bring
| you closer to what they've observed.
|
| Or, if you're anchored on "the tools aren't to be used", the
| caveats give credibility to the document's suggestions of what
| LLMs are useful for.
| mathgeek wrote:
| Junior engineers are the usual comparison folks make to LLMs,
| which is apt as juniors need lots of oversight.
| ares623 wrote:
| I would think some of their engineers love using LLMs; it would
| be unfair to them to completely disallow it IMO (even as
| someone who hates LLMs).
| sudomateo wrote:
| Medication is littered with warning labels but humans still use
| it to combat illness. Social media can harm mental health yet
| people still use it. Pick whatever other example you'd like.
|
| There are things in life that have high risks of harm if
| misused yet people still use them because there are great
| benefits when carefully used. Being aware of the risks is the
| key to safely using something that can be harmful.
| saagarjha wrote:
| There's a lot of code that doesn't hit prod.
| thundergolfer wrote:
| A measured, comprehensive, and sensible take. Not surprising from
| Bryan. This was a nice line:
|
| > it's just embarrassing -- it's as if the writer is walking
| around with their intellectual fly open.
|
| I think Oxide didn't include this in the RFD because they
| exclusively hire senior engineers, but in an organization that
| contains junior engineers I'd add something specific to help
| junior engineers understand how they should approach LLM use.
|
| Bryan has 30+ years of challenging software (and now hardware)
| engineering experience. He memorably said that he's worked on and
| completed a "hard program" (an OS), which he defines as a program
| you doubt you can actually get working.
|
| The way Bryan approaches an LLM is super different to how a 2025
| junior engineer does so. That junior engineer possibly hasn't
| programmed without the tantalizing, even desperately tempting
| option to be assisted by an LLM.
| pests wrote:
| > That junior engineer possibly hasn't programmed without the
| tantalizing, even desperately tempting option to be assisted by
| an LLM.
|
| Years ago I had to spend many months building nothing but
| Models (as in MVC) for a huge data import / ingest the company
| I worked for was rewriting. It was just messy enough that it
| couldn't be automated. I almost lost my mind from the dull
| monotony and even started having attendance issues. I know
| today that could have been done with an LLM in minutes. Almost
| crazy how much time I put into that project compared to if I
| did it today.
| aatd86 wrote:
| The issue is that it might look good but an LLM often inserts
| weird mistakes. Or ellipses. Or overindex on the training
| data. If someone is not careful it is easy to completely
| wreck the codebase by piling on seemingly innocuous commits.
| So far I have developed a good sense for when I need to push
| the llm to avoid sloppy code. It is all in the details.
|
| But a junior engineer would never find/anticipate those
| issues.
|
| I am a bit concerned. Because the kind of software I am
| making, an LLM would never come up with on its own. A junior
| cannot make it; it requires research and programming experience
| that they do not have. But I know that if I were a junior today, I
| would probably try to use llms as much as possible and would
| probably know less programming over time.
|
| So it seems to me that we are likely to have worse software
| over time. Perhaps a boon for senior engineers but how do we
| train junior devs in that environment? Force them to build
| slowly, without llms? Is it aligned with business incentives?
|
| Do we create APIs expecting the code to be generated by LLMs
| or written by hand? Because the impact of verbosity is not
| necessarily the same. LLMs don't get tired as fast as humans.
| AlexCoventry wrote:
| > So it seems to me that we are likely to have worse
| software over time.
|
| IMO, it's already happening. I had to change some personal
| information on a bunch of online services recently, and two
| out of seven of them were down. One of them is still down,
| a week later. This is the website of a major utilities
| company. When I call them, they acknowledge that it's down,
| but say my timing is just bad. That combined with all the
| recent outages has left me with the impression that
| software has been getting (even more) unreliable, recently.
| agentultra wrote:
| They are trained on code people had to make sacrifices for:
| deadlines, shortcuts, etc. And code people were simply too
| ignorant to be writing in the first place. Lots of code
| with hardly any coding standards.
|
| So of course it's going to generate code that has non-
| obvious bugs in it.
|
| Ever play the Undefined Behaviour Game? Humans are bad at
| being compilers and catching mistakes.
|
| I'd hoped... maybe still do, that the future of programming
| isn't a shrug and, "good enough." I hope we'll keep
| developing languages and tools that let us better specify
| programs and optimize them.
| ambicapter wrote:
| If it's such a mind-numbing problem it's easy to check,
| though, and the checking you do after the LLM will be much
| smaller than writing every field yourself (implicitly
| "checking" it as you write it).
|
| Obviously if it's anything even minorly complex you can't
| trust the LLM hasn't found a new way to fool you.
| pests wrote:
| This is exactly it. There wasn't any complex logic. Just
| making sure the right fields were mapped, some renaming,
| and sometimes some more complex joins depending on the
| incoming data source and how it was represented (say
| multiple duplicate rows or a single field with comma
| delimited id's from somewhere else). I would have much
| rather scanned the LLM output line by line (and most
| would be simple, not very indented) than hand-writing
| from scratch. I do admit it would take some time to
| review and cross reference, but I have no doubt it would
| have been a fraction of the time and effort.
| zackerydev wrote:
| I remember in the very first class I ever took on Web Design
| the teacher spent an entire semester teaching "first
| principles" of HTML, CSS and JavaScript by writing it in
| Notepad.
|
| Only then did she introduce us to the glory that was
| Adobe Dreamweaver, which (obviously) increased our productivity
| tenfold.
| girvo wrote:
| I miss Dreamweaver. Combining it with Fireworks was a crazy
| productive combo for me back in the mid 00's!
|
| My first PHP scripts and games were written using nothing
| more than Notepad too funnily enough
| panzi wrote:
| Back in the early 00s I brought gvim.exe on a floppy disk
| to school because I refused to write XSLT, HTML, CSS, etc
| without auto-indent or syntax highlighting.
| frankest wrote:
| DreamWeaver absolutely destroyed the code with all kinds of
| tags and unnecessary stuff. Especially if you used the visual
| editor. It was fun for brainstorming but plain notepad with
| clean understandable code was far far better (and with the
| browser compatibility issues the only option if you were
| going to production).
| christophilus wrote:
| After 25 or so years doing this, I think there are two
| kinds of developers: craftsmen and practical "does it get
| the job done" types. I'm the former. The latter seem to be
| what makes the world go round.
| fragmede wrote:
| It takes both.
| ghurtado wrote:
| If you've been doing it for that long (about as long as I
| have), then surely you remember all the times you had to
| clean up after the "git 'er done" types.
|
| I'm not saying they don't have their place, but without
| us they would still be making the world go round. Only
| backwards.
| thebruce87m wrote:
| > all the times you had to clean up after the "git 'er
| done" types
|
| It's lovely to have the time to do that. This time comes
| once the other type of engineer has shipped the product
| and turned the money flow on. Both types have their
| place.
| bigfatkitten wrote:
| I work in digital forensics and incident response. The
| "git 'er done" software engineers have paid my mortgage
| and are putting my kids through private schooling.
| ambicapter wrote:
| Well, going round in a circle does project to going
| forwards then backwards in a line :)
| KronisLV wrote:
I think there are more dimensions that also matter a bunch:
* a bad craftsman will get pedantic about the wrong things
  (e.g. SOLID/DRY as dogma) and will create architectures that
  make development velocity plummet ("clever" code, deep
  inheritance chains, "magic" code with lots of reflection, etc.)
* a bad practitioner will not care about long-term
  maintainability either, or even about correctness enough not
  to introduce a bunch of bad bugs or slop; even worse when
  they're subtle enough to ship but mess up your schema or
  something
|
| So you can have both good and bad outcomes with either,
| just for slightly different reasons (caring about the
| wrong stuff vs not caring).
|
| I think the sweet spot is to strive for code that is easy
| to read and understand, easy to change, and easy to
| eventually replace or throw out. Obviously performant
| _enough_ but yadda yadda premature optimization, depends
| on the domain and so on...
| frankest wrote:
| After becoming a founder and having to deal with my own
| code for a decade, I've learned a balance. Prototype fast
| with AI crap to get the insight, then write slow with
| structure for stuff that goes to production. AI does not
| touch production code - ask when needed to fix a tiny
| bit, but keep the beast at arm's length.
| tarsinge wrote:
| I am both: I own a small agency where I have to be
| practical, and have fun crafting code on the hobby side.
|
| I think what craftsmen miss is the difference in goals.
| Projects fall on a spectrum from long-lived apps that
| constantly evolve with a huge team working on them, to code
| that is never opened again after release. In the latter, like
| movie or music production (or most video games), only the end
| result matters; the how is not part of the final product.
| Working for years with designers and artists really gave
| me perspective on process vs end result and what matters.
|
| That doesn't mean the end result is messy or doesn't have
| craftsmanship. If you call a general contractor or
| carpenter for something specific, you care that the end
| result is well made, but if they tell you that they built
| a whole factory for your little custom-made project (the
| equivalent of a nice codebase), not only does it not
| matter to you, it'll also be wildly overpriced and
| delayed. In my agency that means the website is good
| looking and bug-free after being built, no matter how
| messy the temporary construction site is.
|
| In contrast if you work on a SaaS or a long lived project
| (e.g. an OS) the factory (the code) is the product.
|
| So to me when people say they are into code craftsmanship
| I think they mean in reality they are more interested in
| factory building than end product crafting.
| jfreds wrote:
| I agree wholeheartedly. As for _why_ craftsmen
| care so much about the factory instead of the product, I
| believe the answer is pride. It's a bitter pill to
| swallow, but writing and shipping a hack is _sometimes_
| the high road
| arevno wrote:
| I also do third party software development, and my
| approach is always: bill (highly, $300+/hr) for the
| features and requirements, but do the manual refactoring
| and architecture/performance/detail work on your own
| time. It benefits you, it benefits the client, it
| benefits the relationship, and it handles the
| misunderstanding of your normie clients with regard to
| what constitutes "working".
|
| Say it takes 2 hours to implement a feature, and another
| hour making it logically/architecturally correct. You
| bill $600 and eat $300 for goodwill and your own
| personal/organizational development. You're still making
| $200/hr and you never find yourself in meetings with
| normie clients about why refactoring, cohesiveness, or
| quality was necessary.
| chrisweekly wrote:
| The HTML generated by Dreamweaver's WYSIWYG mode might not
| have been ideal, but it was far superior to the mess
| produced by MS FrontPage. With Dreamweaver, it was at least
| possible to use it as a starting point.
| BobbyTables2 wrote:
| MS FrontPage also went out of its way to do the same.
| pram wrote:
| It's funny this came up, because it was kinda similar to
| the whole "AI frauds" thing these days.
|
| I don't particularly remember why, but "hand writing"
| fancy HTML and CSS used to be a flex in some circles in
| the 90s. A bunch of junk and stuff like fixed positioning
| in the source was the telltale sign they "cheated" with
| FrontPage or Dreamweaver lol
| supriyo-biswas wrote:
| My only gripe was that they tended to generate gobs of
| "unsemantic" HTML. You resized a table and expect it to
| be based on viewport width? No! It's hardcoded "width: X
| px" to whatever your size the viewport was set to.
| _joel wrote:
| It might have been pretty horrible but I hold Frontpage
| 97 with fond memories, it started my IT career, although
| not for HTML reasons.
|
| The _vti_cnf dir left /etc/passwd downloadable, so I
| grabbed it from my school website. One John the Ripper
| later and the password was found.
|
| I told the teacher responsible for the IT it was insecure
| and that ended up getting me some work experience. Ended
| up working the summer (waiting for my GCSE results) for
| ICL which immeasurably helped me when it was time to
| properly start working.
|
| Did think about defacing it; I often wonder how things could
| have turned out very differently!
| msephton wrote:
| Judicious and careful use of Dreamweaver (its visual editor
| and properties bar) enabled me to write exactly the code I
| wanted. I used Dreamweaver for table layouts and Home Site
| (later Top Style) for broader code edits. At that time I
| was famous within the company for being able to make any
| layout. Good times!
| ghurtado wrote:
| > glory that was Adobe Dreamweaver
|
| Dreamweaver was to web development what ...
|
| I just sat here for 5 minutes and I wasn't able to finish
| that sentence. So I think that's a statement in itself.
| riffraff wrote:
| ..VB6 was to windows dev?
|
| People with very little competence could and did get things
| done, but it was a mess underneath.
| pjmlp wrote:
| I love how people speak about Dreamweaver in the past, while
| Adobe keeps getting money for it,
|
| https://developer.adobe.com/dreamweaver/
|
| And yes, as you can imagine from the kind of comments I make
| regarding high-level productive tooling and languages, I was
| a big Dreamweaver fan back in the 2000's.
| keyle wrote:
| > That junior engineer possibly hasn't programmed without the
| tantalizing, even desperately tempting option to be assisted by
| an LLM.
|
| This gives me somewhat of a knee jerk reaction.
|
| When I started programming professionally in the 90s, the
| internet came of age and I remember being told "in my days, we
| had books and we remembered things" which of course is
| hilarious because today you can't possibly retain ALL the
| knowledge needed to be a software engineer due to the sheer size
| of knowledge required today to produce a meaningful product.
| It's too big and it moves too fast.
|
| There was this long argument that you should know things and
| not have to look it up all the time. Altavista was a joke, and
| Google was cheating.
|
| Then syntax highlighting came around and there'd always be a
| guy going "yeah nah, you shouldn't need syntax highlighting to
| program, you screen looks like a Christmas tree".
|
| Then we got stuff like auto-complete, and it was amazing, the
| amount of keystrokes we saved. That too, was seen as heresy by
| the purists (followed later by LSP - which many today call
| heresy).
|
| That reminds me also, back in the day, people would have entire
| encyclopaedia collections on DVD. Did they use them? No. But
| they criticised Wikipedia for being inferior. Look at today,
| though.
|
| Same thing with LLMs. Whether you use them as a powerful
| context based auto-complete, as a research tool faster than
| wikipedia and google, as rubber-duck debugger, or as a text
| generator -- who cares: this is today, stop talking like a
| fossil.
|
| It's 2025 and junior developers can't work without LSP and LLM?
| It's fine. They're not in front of a 386 DX33 with 1 book of
| K&R C and a blue EDIT screen. They have massive challenged
| ahead of them, the IT world is complete shambles, and it's
| impossible to decipher how anything is made, even open source.
|
| Today is today. Use all the tools at hand. Don't shame kids for
| using the best tools.
|
| We should be talking about sustainability of such tools rather
| than what it means to use them (cf. enshittification, open
| source models etc.)
| sifar wrote:
| It is not clear though, which tools enable and which tools
| inhibit your development at the beginning of your journey.
| keyle wrote:
| Agreed, although LLMs definitely qualify as enabling
| developers compared to <social media, Steam, consoles, and
| other distractions> of today.
|
| The Internet itself is full of distractions. My younger
| self spent a crazy amount of time on IRC. So it's not
| different than spending time on say, Discord today.
|
| LLMs have pretty much a direct relationship with Google.
| The quality of the response has much to do with the quality
| of the prompt. If anything, it's the overwhelming nature of
| LLMs that might be the problem. Back in the day, if you
| had, say, library access, the problem was knowing what to
| look for. Discoverability with LLMs is exponential.
|
| As for LLM as auto-complete, there is an argument to be
| made that typing a lot reinforces knowledge in the human
| brain like writing. This is getting lost, but with
| productivity gains.
| girvo wrote:
| Watching my juniors constantly fight the nonsense auto
| completion suggestions their LLM editor of choice put in
| front of them, or worse watching them accept it and
| proceed to get entirely lost in the sauce, I'm not
| entirely convinced that the autocompletion part of it is
| the best one.
|
| Tools like Claude code with ask/plan mode seem to be
| better in my experience, though I absolutely do wonder
| about the lack of typing causing a lack of memory
| formation.
|
| A rule I set myself a long time ago was to never copy
| paste code from stack overflow or similar websites. I
| always typed it out again. Slower, but I swear it built
| the comprehension I have today.
| zx8080 wrote:
| > but I swear it built the comprehension I have today.
|
| For interns/junior engineers, the choice is:
| comprehension VS career.
|
| And I won't be surprised if most of them will go with
| career now, and comprehension.. well thanks maybe
| tomorrow (or never).
| christophilus wrote:
| I don't think that's the dichotomy. I've been in charge
| of hiring at a few companies, and comprehension is what I
| look for 10 times out of 10.
| sysguest wrote:
| well you could get "interview-optimized" interviewees
| with impressive-looking mini-projects
| xorcist wrote:
| There are plenty of companies today where "not using AI
| enough" is a career problem.
|
| It shouldn't be, but it is.
| sevensor wrote:
| I have worked with a lot of junior engineers, and I'll
| take comprehension any day. Developing their
| comprehension is a huge part of my responsibility to them
| and to the company. It's pretty wasteful to take a human
| being with a functioning brain and ask them to churn out
| half understood code that works accidentally. I'm going
| to have to fix that eventually anyway, so why not get
| ahead of it and have them understand it so they can fix
| it instead of me?
| keyle wrote:
| > Watching my juniors constantly fight the nonsense auto
| completion suggestions their LLM editor of choice put in
| front of them, or worse watching them accept it and
| proceed to get entirely lost in the sauce, I'm not
| entirely convinced that the autocompletion part of it is
| the best one.
|
| That's not an LLM problem, they'd do the same thing 10
| years ago with stack overflow: argue about which answer
| is best, or trust the answer blindly.
| girvo wrote:
| No, it is qualitatively different because it happens in-
| line and much faster. If it's not correct (which it seems
| it usually isn't), they spend more time removing whatever
| garbage it autocompleted.
| menaerus wrote:
| People do it with the autocomplete as well so I guess
| there's not that much of a difference wrt LLMs. It likely
| depends on the language but people who are inexperienced
| in C++ would be over-relying on autocomplete to the point
| that it looks hilarious, if you have a chance to sit next
| to them helping to debug something for example.
| girvo wrote:
| For sure, but these new tools spit out a lot more and a
| lot faster, and it's usually correct "enough" that the
| compiler won't yell. It's been wild to see its
| suggestions be wrong _far_ more often than they are
| right, so I wonder how useful they really are at all.
|
| Normal auto complete plus a code tool like Claude Code or
| similar seem far more useful to me.
| sevensor wrote:
| > never copy paste code from stack overflow
|
| I have the same policy. I do the same thing for example
| code in the official documentation. I also put in a
| comment linking to the source if I end up using it. For
| me, it's like the RFD says, it's about taking
| responsibility for your output. Whether you originated it
| or not, you're the reason it's in the codebase now.
| zdragnar wrote:
| I spent the first two years or so of my coding career
| writing PHP in notepad++ and only after that switched to
| an IDE. I rarely needed to consult the documentation on
| most of the weird quirks of the language because I'd
| memorized them.
|
| Nowadays I'm back to a text editor rather than an IDE,
| though fortunately one with much more creature comforts
| than n++ at least.
|
| I'm glad I went down that path, though I can't say I'd
| really recommend it, as things felt a bit simpler back then.
| intended wrote:
| LLMs are in a context where they are the promised
| solution for most of the expected economic growth on one
| end, and on the other a tool to improve programmer
| productivity and skill, while also being only better than
| doom scrolling?
|
| That comparison undermines the integrity of the argument
| you are trying to make.
| aprilthird2021 wrote:
| > When I started programming professionally in the 90s, the
| internet came of age and I remember being told "in my days,
| we had books and we remembered things" which of course is
| hilarious because today you can't possibly retain ALL the
| knowledge needed to be software engineer due to the sheer
| size of knowledge required today to produce a meaningful
| product. It's too big and it moves too fast.
|
| But I mean, you can get by without memorizing stuff sure, but
| memorizing stuff does work out your brain and does help out
| in the long run? Isn't it possible we've reached the cliff of
| "helpful" tools to the point we are atrophying enough to be
| worse at our jobs?
|
| Like, reading is surely better for the brain than watching
| TV. But constant cable TV wasn't enough to ruin our brains.
| What if we've got to the point it finally is enough?
| darkwater wrote:
| I'm sure I'm biased by my age (mid 40s) but I think you are
| onto something there. What if this constant decline in how
| people learn (on average) is not just a grumpy old man
| feeling? What if it's something real, that was smoothed
| out by the sheer increase of the student population between
| 1960 and 2010 and the improvements of tooling?
| Barrin92 wrote:
| >"in my days, we had books and we remembered things" which of
| course is hilarious
|
| it isn't hilarious, it's true. My father (now in his 60s) who
| came from a blue collar background with very little education
| taught himself programming by manually copying and editing
| software out of magazines, like a lot of people his age.
|
| I teach students now who have access to all the information
| in the world, but a lot of them are quite literally so
| scatterbrained and heedless that they can't process anything
| that isn't catered to them. Not having working focus and memory
| is like having muscle atrophy of the mind; you just turn into
| a vegetable. Professors across disciplines have seen a decline
| in student abilities for several decades now, not just
| due to LLMs.
| menaerus wrote:
| Information 30 years ago was more difficult to obtain. It
| required manual labor, but by today's standards there was not
| much information to be consumed. Today, we have the
| opposite - a vast amount of information that is easy to
| obtain but to process? Not so much. Decline is unavoidable.
| Human intelligence isn't increasing at the pace
| advancements are made.
| pjmlp wrote:
| Ah, but let's do leetcode on the whiteboard as an interview,
| re-balancing a red-black tree, regardless of how long
| those people have been in the industry and the job position
| they are actually applying for.
| discreteevent wrote:
| > "in my days, we had books and we remembered things" which
| of course is hilarious because today you can't possibly
| retain ALL the knowledge needed to be software engineer
|
| Reading books was never about knowledge. It was about
| knowhow. You didn't need to read all the books. Just some. I
| don't know how many developers I met who would keep asking
| questions that would be obvious to anyone who had read the
| book. They never got the big picture and just wasted
| everyone's time, including their own.
|
| "To know everything, you must first know one thing."
| dachris wrote:
| For the other non-native speakers wondering, "fly" means your
| trouser zipper.
|
| He surely has his fly closed when cutting through the hype with
| reflection and pragmatism (without the extreme positions on
| both sides often seen).
| vaylian wrote:
| I was also confused when I read that sentence. Wikipedia has
| an article on it:
| https://en.wikipedia.org/wiki/Fly_(clothing)
| govping wrote:
| Interesting tension between craft and speed with LLMs. I've
| been building with AI assistance for the past week (terminal
| clients, automation infrastructure) and found the key is: use
| AI for scaffolding and boilerplate, but hand-refine anything
| customer-facing or complex. The 'intellectual fly open' problem
| is real when you just ship AI output directly. But AI + human
| refinement can actually enable better craft by handling the
| tedious parts. Not either/or, but knowing which parts deserve
| human attention vs which can be delegated.
| dicytea wrote:
| It's funny that I've seen people both argue that LLMs are
| exclusively useful only to beginners who know next to nothing
| and also that they are only useful if you are a 50+ YoE veteran
| at the top of their craft who started programming with punch
| cards since they were 5-years-old.
|
| I wonder which of these camps is right.
| Mtinie wrote:
| Both camps, for different reasons.
|
| For novices, LLMs are infinitely patient rubber ducks. They
| unstick the stuck; helping people past the coding and system
| management hurdles that once required deep dives through
| Stack Overflow and esoteric blog posts. When an explanation
| doesn't land, they'll reframe until one does. And because
| they're confidently wrong often enough, learning to spot
| their errors becomes part of the curriculum.
|
| For experienced engineers, they're tireless boilerplate
| generators, dynamic linters, and a fresh set of eyes at 2am
| when no one else is around to ask. They handle the mechanical
| work so you can focus on the interesting problems.
|
| The caveat for both: intentionality matters. They reward
| users who know what they're looking for and punish those who
| outsource judgment entirely.
| govping wrote:
| The craft vs practical tension with LLMs is interesting. We've
| found LLMs excel when there's a clear validation mechanism -
| for security research, the POC either works or it doesn't. The
| LLM can iterate rapidly because success is unambiguous.
|
| Where it struggles: problems requiring taste or judgment
| without clear right answers. The LLM wants to satisfy you,
| which works great for 'make this exploit work' but less great
| for 'is this the right architectural approach?'
|
| The craftsman answer might be: use LLMs for the
| systematic/tedious parts (code generation, pattern matching,
| boilerplate) while keeping human judgment for the parts that
| matter. Let the tool handle what it's good at, you handle what
| requires actual thinking.
| smcameron wrote:
| I found it funny that in a sentence that mentions "those who
| can recognize an LLM's reveals", a few words later, there's an
| em-dash. I've often used em-dashes myself, so I find it a bit
| annoying that use of em-dashes is widely considered to be an AI
| tell.
| bcantrill wrote:
| The em-dash alone is not an LLM-reveal -- it's how the em-
| dash is used to pace a sentence. In my experience, with an
| LLM, em-dashes are used to even pacing; for humans (and
| certainly, for me!), the em-dash is used to deliberately
| change pacing -- to introduce a pause (like that one!),
| followed by a bit of a (metaphorical) punch. The goal is to
| have you read the sentence as I would read it -- and I think
| if you have heard me speak, you can hear me in my writing.
| thundergolfer wrote:
| Too much has been written about em-dashes and LLMs, but I'd
| highly recommend _If it cites em dashes as proof, it came
| from a tool_ from Scott Smitelli if you haven't read it.
|
| It's a brilliant skewering of the 'em dash means LLM'
| heuristic as a broken trick.
|
| 1. https://www.scottsmitelli.com/articles/em-dash-tool/
| btbuildem wrote:
| > The way Bryan approaches an LLM is super different to how a
| 2025 junior engineer does so
|
| This is a key difference. I've been writing software
| professionally for over two decades. It took me quite a long
| time to overcome certain invisible (to me) hesitations and
| objections to using LLMs in sdev workflows. At some point the
| realization came to me that this is simply the new way of doing
| things, and from this point onward, these tools will be deeply
| embedded in and synonymous with programming work. Recognizing
| this phenomenon for what it is somehow made me feel young again
| -- perhaps that's just the crust breaking around a calcified
| grump, but I do appreciate being able to tap into that all the
| same.
| bryancoxwell wrote:
| Find it interesting that the section about LLM tells when using
| it for writing is absolutely littered with em-dashes.
| matt_daemon wrote:
| I believe Bryan is a well known em dash addict
| bryancoxwell wrote:
| And I mean no disrespect to him for it, it's just kind of
| funny
| rl3 wrote:
| > _I believe Bryan is a well known em dash addict_
|
| I was hoping he'd make the leaderboard, but perhaps the
| addiction took proper hold in more recent years:
|
| https://www.gally.net/miscellaneous/hn-em-dash-user-
| leaderbo...
|
| https://news.ycombinator.com/user?id=bcantrill
|
| No doubt his em dashes are legit, of course.
| minimaxir wrote:
| You can stop LLMs from using em-dashes by just telling it to
| "never use em-dashes". This same type of prompt engineering
| works to mitigate almost every sign of AI-generated writing,
| which is one reason why AI writing heuristics/detectors can
| never be fully reliable.
| dcre wrote:
| This does not work on Bryan, however.
| jgalt212 wrote:
| I guess, but even if you set aside any obvious tells,
| pretty much all expository writing out of an LLM still reads
| like pablum, without any real conviction and with tons of
| hedges against observed opinions.
|
| "lack of conviction" would be a useful LLM metric.
| minimaxir wrote:
| I ran a test for a potential blog post where I took every
| indicator of AI writing and told the LLM "don't do any of
| these", and it resulted in high-school AP English quality
| writing. Which could be considered a lack-of-conviction
| level of writing.
| bccdee wrote:
| To be fair, LLMs usually use em-dashes correctly, whereas I
| think this document misuses them more often than not. For
| example:
|
| > This can be extraordinarily powerful for summarizing
| documents -- or of answering more specific questions of a large
| document like a datasheet or specification.
|
| That dash shouldn't be there. That's not a parenthetical
| clause, that's an element in a list separated by "or." You can
| just remove the dash and the sentence becomes more correct.
| NobodyNada wrote:
| LLMs also generally don't put spaces around em dashes -- but
| a lot of human writers do.
| kimixa wrote:
| I think you're thinking of british-style "en-dashes" -
| which is often used for something that could have been
| separated by brackets but _do_ have a space either side -
| rather than "em" dashes. They can also be used in a
| similar place as a colon - that is to separate two parts of
| a single sentence.
|
| British users regularly use that sort of construct with "-"
| hyphens, simply because they're pretty much the same and a
| whole lot easier to type on a keyboard.
| the_af wrote:
| I don't know whether that use of the em-dash is grammatically
| correct, but I've seen enough native English writers use it
| like that. One example is Philip K Dick.
| bccdee wrote:
| Perhaps you have--or perhaps you've seen this construction
| instead, where (despite also using "or") the phrase on the
| other side of the dash is properly parenthetical and has
| its own subject.
| anonnon wrote:
| There was a comment recently by HN's most enthusiastic LLM
| cheerleader, Simon Willison, that I stopped reading almost
| immediately (before seeing who posted it), because it exuded
| the slop stench of an LLM:
| https://news.ycombinator.com/item?id=46011877
|
| However, I was surprised to see that when someone (not me)
| accused him of using an LLM to write his comment, he flatly
| denied it: https://news.ycombinator.com/item?id=46011964
|
| Which I guess means (assuming he isn't lying) if you spend too
| much time interacting with LLMs, you eventually resemble one.
| Jweb_Guru wrote:
| > if you spend too much time interacting with LLMs, you
| eventually resemble one
|
| Pretty much. I think people who care about reducing their
| children's exposure to screen time should probably take care
| to do the same for themselves wrt LLMs.
| Philpax wrote:
| I don't know what to tell you: that really does not read like
| it was written by a LLM. You were perhaps set off by the very
| first sentence, which sounds like it was responding to a
| prompt?
| an_ko wrote:
| I would have expected at least some consideration of public
| perception, given the extremely negative opinions many people
| hold about LLMs being trained on stolen data. Whether it's an
| ethical issue or a brand hazard depends on your opinions about
| that, but it's definitely at least one of those currently.
| john01dav wrote:
| He speaks of trust and LLMs breaking that trust. Is this not
| what you mean, but by another name?
|
| > First, to those who can recognize an LLM's reveals (an
| expanding demographic!), it's just embarrassing -- it's as if
| the writer is walking around with their intellectual fly open.
| But there are deeper problems: LLM-generated writing undermines
| the authenticity of not just one's writing but of the thinking
| behind it as well. If the prose is automatically generated,
| might the ideas be too? The reader can't be sure -- and
| increasingly, the hallmarks of LLM generation cause readers to
| turn off (or worse).
|
| > Specifically, we must be careful to not use LLMs in such a
| way as to undermine the trust that we have in one another
|
| > our writing is an important vessel for building trust -- and
| that trust can be quickly eroded if we are not speaking with
| our own voice
| tolerance wrote:
| I made the mistake of first reading this as a document intended
| for all in spite of it being public.
|
| This is a technical document that is useful in illustrating how
| the guy who gave a talk once (that I didn't understand but was
| captivated by) and is well-respected in his field intends to
| guide his company's use of the technology so that other
| companies and individual programmers may learn from it too.
|
| I don't think the objective was to take any outright ethical
| stance, but to provide guidance about something ostensibly used
| at an employee's discretion.
| john01dav wrote:
| > it is presumed that of the reader and the writer, it is the
| writer that has undertaken the greater intellectual exertion.
| (That is, it is more work to write than to read!)
|
| This applies to natural language, but, interestingly, the
| opposite is true of code (in my experience and that of other
| people that I've discussed it with).
| worble wrote:
| See: Kernighan's Law
|
| > Everyone knows that debugging is twice as hard as writing a
| program in the first place. So if you're as clever as you can
| be when you write it, how will you ever debug it?
|
| https://www.laws-of-software.com/laws/kernighan/
| DrewADesign wrote:
| I think people misunderstand this quote. Cleverness in this
| context is referring to complexity, and generally stems from
| falling in love with some complex mechanism you dream up to
| solve a problem rather than challenging yourself to create
| something simpler and easier to maintain. Bolting together
| bits of LLM-created code is is far more likely to be "clever"
| rather than good.
| SilverSlash wrote:
| What an amazing quote!
| tikhonj wrote:
| That's because embarrassingly bad writing is useless, while
| embarrassingly bad code can still make the computer do
| (roughly) the right thing and lets you tick off a Jira ticket.
| So we end up having way more room for awful code than for awful
| prose.
|
| Reading _good_ code can be a better way to learn about
| something than reading prose. Writing code like that takes some
| real skill and insight, just like writing clear explanations.
| zeroonetwothree wrote:
| Some writing is functional, e.g. a letter notifying someone
| of some information. For that type of writing even bad
| quality can achieve its purpose. Indeed probably the majority
| of words written are for functional reasons.
| jhhh wrote:
| I've had the same thought about 'written' text with an LLM. If
| you didn't spend time writing it don't expect me to read it. I'm
| glad he seems to be taking a hard stance on that saying they
| won't use LLMs to write non-code artifacts. This principle
| extends to writing code as well to some degree. You shouldn't
| expect other people to peer review 'your' code which was simply
| generated because, again, you spent no time making it. You have
| to be the first reviewer. Whether these cultural norms are held
| firmly remains to be seen (I don't work there), but I think they
| represent thoughtful application of emerging technologies.
| john01dav wrote:
| > Wherever LLM-generated code is used, it becomes the
| responsibility of the engineer. As part of this process of taking
| responsibility, self-review becomes essential: LLM-generated code
| should not be reviewed by others if the responsible engineer has
| not themselves reviewed it. Moreover, once in the loop of peer
| review, generation should more or less be removed: if code review
| comments are addressed by wholesale re-generation, iterative
| review becomes impossible.
|
| My general procedure for using an LLM to write code, which is in
| the spirit of what is advocated here, is:
|
| 1) First, feed in the existing relevant code into an LLM. This is
| usually just a few source files in a larger project
|
| 2) Describe what I want to do, either giving an architecture or
| letting the LLM generate one. I tell it to not write code at this
| point.
|
| 3) Let it speak about the plan, and make sure that I like it. I
| will converse to address any deficiencies that I see, and I
| almost always do.
|
| 4) I then tell it to generate the code
|
| 5) I skim & test the code to see if it's generally correct, and
| have it make corrections as needed
|
| 6) Closely read the entire generated artifact at this point, and
| make manual corrections (occasionally automatic corrections like
| "replace all C style casts with the appropriate C++ style casts"
| then a review of the diff)
|
| The hardest part for me is #6, where I feel a strong emotional
| bias towards not doing it, since I am not yet aware of any errors
| compelling such action.
|
| This allows me to operate at a higher level of abstraction
| (architecture) and remove the drudgery of turning an
| architectural idea into written, precise, code. But, when doing
| so, you are abandoning those details to a non-deterministic
| system. This is different from, for example, using a compiler or
| higher level VM language. With these other tools, you can
| understand how they work and rapidly have a good idea of what
| you're going to get, and you have robust assurances.
| Understanding LLMs helps, but not to the same degree.
| ryandrake wrote:
| I've found that your step 6 takes the vast majority of the time
| I spend programming with LLMs. Like 10X+ the combined total of
| time steps 1-5 take. And that's if the code the LLM produced
| _actually works_. If it doesn't work (which happens quite
| often), then even more handholding and corrections are needed.
| It's really a grind. I'm still not sure whether I am net saving
| time using these tools.
|
| I always wonder about the people who say LLMs save them so much
| time: Do you just accept the edits they make without reviewing
| each and every line?
| Jaygles wrote:
| I exclusively use the autocomplete in cursor. I hate
| reviewing huge chunks of llm code at one time. With the
| autocomplete, I'm in full control of the larger design and am
| able to quickly review each piece of llm code. Very often it
| generates what I was going to type myself.
|
| Anything that involves math or complicated conditions I take
| extra time on.
|
| I feel I'm getting code written 2 to 3 times faster this way
| while maintaining high quality and confidence
| zeroonetwothree wrote:
| Maybe it subjectively feels like 2-3x faster but in studies
| that measure it we tend to see smaller improvements like in
| the range of 20-30% faster. It could be that you are an
| outlier, of course.
| Jaygles wrote:
| 2-3x faster on getting the code written. Fully completing
| a coding task maybe only 20-30% faster, if we count
| chasing down requirements, reviews, waiting for CI to
| pass so I can merge etc.
| NKjNkaka wrote:
| This is my preferred way as well. And when you think about
| it, it makes sense. With advanced autocomplete you are:
|
| 1. Keeping the context very small
| 2. Keeping the scope of the output very small
|
| With the added benefit of keeping you in the flow state
| (and in my experience making it more enjoyable).
|
| To anyone that even hates LLMs give autocomplete a shot
| (with a keybinding to toggle it if it annoys you, sometimes
| it's awful). It's really no different than typing it
| manually wrt quality etc, so the speed up isn't huge, but
| it feels a lot nicer.
| hedgehog wrote:
| You can have the tool start by writing an implementation plan
| describing the overall approach and key details including
| references, snippets of code, task list, etc. That is much
| faster than a raw diff to review and refine to make sure it
| matches your intent. Once that's acceptable the changes are
| quick, and having the machine do a few rounds of refinement
| to make sure the diff vs HEAD matches the plan helps iron out
| some of the easy issues before human eyes show up. The final
| review is then easier because you are only checking for
| smaller issues and consistency with the plan that you already
| signed off on.
|
| It's not magic though, this still takes some time to do.
| mythrwy wrote:
| If it's stuff I have been doing for years and isn't
| terribly complex, I've found it's generally quick to skim
| review. I don't need to read every line; I can glance at it,
| know it's a loop and why, a function call or whatever. If I
| see something unusual I take that as an opportunity to learn.
|
| I've seen LLMs write some really bad code a few times lately;
| it seems almost worse than what they were doing 6 or 8 months
| ago. Could be my imagination but it seems that way.
| ec109685 wrote:
| Don't make manual corrections.
|
| If you keep all edits to be driven by the LLM, you can use that
| knowledge later in the session or ask your model to commit the
| guidelines to long term memory.
| klauserc wrote:
| The best way to get an LLM to follow style is to make sure
| that this style is evident in the codebase. Excessive
| instructions (whether through memories or AGENT.md) do not
| help as much.
|
| Personally, I absolutely hate instructing agents to make
| corrections. It's like pushing a wet noodle. If there is lots
| to correct, fix one or two cases manually and tell the LLM to
| follow that pattern.
|
| https://www.humanlayer.dev/blog/writing-a-good-claude-md
| qudat wrote:
| Insert before 4: make it generate tests that fail, review, then
| have it implement and make sure the tests pass.
|
| Insert before that: have it create tasks with beads and force
| it to let you review before marking a task complete.
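|
| A minimal sketch of that test-first step in Python (hypothetical
| names): the failing test is what gets reviewed, and the
| generated implementation is only accepted once it passes.
|
  import re

  def slugify(text: str) -> str:
      # step 2: the implementation, generated afterwards and only
      # accepted once the test below passes
      return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

  def test_slugify():
      # step 1: this test was written and reviewed first, while it
      # still failed (no implementation existed yet)
      assert slugify("Hello, World!") == "hello-world"
      assert slugify("  spaces   everywhere ") == "spaces-everywhere"

  test_slugify()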
| CerryuDu wrote:
| How the heck it does not upset your engineering pride and
| integrity to limit your own contribution to _verifying and
| touching up machine slop_ is beyond me.
|
| You obviously cannot emotionally identify with the code you
| produce this way; the ownership you might feel towards such
| code is nowhere near what meticulously hand-written code
| elicits.
| 000ooo000 wrote:
| >Wherever LLM-generated code is used, it becomes the
| responsibility of the engineer. As part of this process of taking
| responsibility, self-review becomes essential: LLM-generated code
| should not be reviewed by others if the responsible engineer has
| not themselves reviewed it
|
| By this article's own standards, now there are 2 authors who
| don't understand what they've produced.
| MobiusHorizons wrote:
| This is exactly what the advice is trying to mitigate. At least
| as I see it, the responsible engineer (meaning author, not some
| quality of the engineer) needs to understand the intent of the
| code they will produce. Then if using an LLM, they must take
| full ownership of that code by carefully reviewing it or molding
| it until it reflects their intent. If at the end of this the
| "responsible" engineer does not understand the code the advice
| has not been followed.
| rgoulter wrote:
| > LLM-generated writing undermines the authenticity of not just
| one's writing but of the thinking behind it as well.
|
| I think this points out a key point.. but I'm not sure the right
| way to articulate it.
|
| A human-written comment may be worth something, but an LLM-
| generated one is cheap/worthless.
|
| The nicest phrase capturing the thought I saw was: "I'd rather
| read the prompt".
|
| It's probably just as good to let an LLM generate it again, as it
| is to publish something written by an LLM.
| averynicepen wrote:
| I'll give it a shot.
|
| Text, images, art, and music are all methods of expressing our
| internal ideas to other human beings. Our thoughts are the
| source, and these methods are how they are expressed. Our true
| goal in any form of communication is to understand the internal
| ideas of others.
|
| An LLM expresses itself in all the same ways, but the source
| doesn't come from an individual - it comes from a giant
| dataset. This could be considered an expression of the
| aggregate thoughts of humanity, which is fine in some contexts
| (like retrieval of ideas and information highly represented in
| the data/world), but not when presented in a context of
| expressing the thoughts of an individual.
|
| LLMs express the statistical summation of everyone's thoughts.
| It presents the mean, when what we're really interested in are
| the data points a couple standard deviations away from the
| mean. That's where all the interesting, unique, and thought
| provoking ideas are. Diversity is a core of the human
| experience.
|
| ---
|
| An interesting paradox is the use of LLMs for translation into
| a non-native language. LLMs are actively being used to
| express an individual's ideas using words better than they can
| with their limited language proficiency, but for those of us on
| the receiving end, we interpret the expression to mirror the
| source and have immediate suspicions on the legitimacy of the
| individual's thoughts. Which is a little unfortunate for those
| who just want to express themselves better.
| crabmusket wrote:
| I think more people should read Naur's "programming as theory
| building".
|
| A comment is an attempt to more fully document the theory the
| programmer has. Not all theory can be expressed in code. Both
| code and comment are lossy artefacts that are "projections" of
| the theory into text.
|
| LLMs currently, I believe, cannot have a theory of the program.
| But they can definitely perform a useful simulacrum of such. I
| have not yet seen an LLM generated comment that is truly
| valuable. Of course, lots of human generated comments are not
| valuable either. But the ceiling for human comments is much,
| much higher.
| teaearlgraycold wrote:
| One thing I've noticed is that when writing something I
| consider insightful or creative with LLMs for autocompletion
| the machine can't successfully predict any words in the
| sentence except maybe the last one.
|
| They seem to be good at either spitting out something very
| average, or something completely insane. But something
| genuinely indicative of the spark of intelligence isn't common
| at all. I'm happy to know that while my thoughts are likely not
| original, they are at least not statistically likely.
| leobg wrote:
| > I'd rather read the prompt.
|
| That's what I think when I see a news headline. What are you
| writing? Who cares. WHY are you writing it -- that is what I
| want to know.
| weitendorf wrote:
| This is something that I feel rather conflicted about, because
| while I greatly dislike the LLM-slop-style writing that so many
| people are trying to abuse our attention with, I've started
| noticing that there are a large number of people (varying
| across "audiences"/communities/platforms") who don't really
| notice it, or at least that whoever is behind the slop is
| making the "right kind" of slop so that they don't.
|
| For example, I recently was perusing the /r/SaaS subreddit and
| could tell that most of the submissions were obviously LLM-
| generated, but often by telling a story that was meant to spark
| outrage, resonate with the "audience" (eg being doubted and
| later proven right), and ultimately conclude by validating them
| by making the kind of decision they typically would.
|
| I also would never pass this off as anything else, but I've
| been finding it effective to have LLMs write certain kinds of
| documentation or benchmarks in my repos, just so that
| they/I/someone else have access to metrics and code snippets
| that I would otherwise not have time to write myself. I've seen
| non-native English speakers write pretty technically
| useful/interesting docs and tech articles by translating
| through LLMs too, though a lot more bad attempts than good (and
| you might not be able to tell if you can't speak the
| language)...
|
| Honestly the lines are starting to blur ever so slightly for
| me, I'd still not want someone using an LLM to chat with me
| directly, but if someone were to have an LLM build a simple
| WASM/interesting game and then write an
| interesting/informative/useful article about it, or steer it
| into doing so... I might actually enjoy it. And not because the
| prompt was good: instructions telling an LLM to go make a game
| and do a write up don't help me as much or in the same way as
| being able to quickly see how well it went and any useful
| takeaways/tricks/gotchas it uncovered. It would genuinely be
| giving me valuable information and probably wouldn't be
| something I'd speculatively try or run myself.
| mcqueenjordan wrote:
| As usual with Oxide's RFDs, I found myself vigorously head-
| nodding while reading. Somewhat rarely, I found a part that I
| found myself disagreeing with:
|
| > Unlike prose, however (which really should be handed in a
| polished form to an LLM to maximize the LLM's efficacy), LLMs can
| be quite effective writing code de novo.
|
| Don't the same arguments against using LLMs to write one's prose
| also apply to code? Was this structure of the code and ideas
| within the engineers'? Or was it from the LLM? And so on.
|
| Before I'm misunderstood as a LLM minimalist, I want to say that
| I think they're incredibly good at solving for the blank page
| syndrome -- just getting a starting point on the page is useful.
| But the code you actually want to ship is so far from what LLMs
| write that I think of them more as a crutch for blank page
| syndrome than as "good at writing code de novo".
|
| I'm open to being wrong and want to hear any discussion on the
| matter. My worry is that this is another one of the "illusion of
| progress" traps, similar to the one that currently fools people
| with the prose side of things.
| dcre wrote:
| In my experience, LLMs have been quite capable of producing
| code I am satisfied with (though of course it depends on the
| context -- I have much lower standards for one-off tools than
| long-lived apps). They are able to follow conventions already
| present in a codebase and produce something passable. Whereas
| with writing prose, I am almost never happy with the feel of
| what an LLM produces (worth noting that Sonnet and Opus 4.5's
| prose may be moving up from disgusting to tolerable). I think
| of it as prose being higher-dimensional -- for a given goal,
| often the way to express it in code is pretty obvious, and many
| developers would do essentially the same thing. Not so for
| prose.
| lukasb wrote:
| One difference is that cliched prose is bad and cliched code is
| generally good.
| joshka wrote:
| Depends on what your prose is for. If it's for documentation,
| then prose which matches the expected tone and form of other
| similar docs would count as cliched from this perspective. I think
| this is a really good use of LLMs - making docs consistent
| across a large library / codebase.
| minimaxir wrote:
| I have been testing agentic coding with Claude 4.5 Opus and
| the problem is that it's _too good_ at documentation and
| test cases. It's thorough in a way that goes out of
| scope, so I have to edit it down to increase the signal-to-
| noise.
| girvo wrote:
| The "change capture"/straight jacket style tests LLMs
| like to output drive me nuts. But humans write those all
| the time too so I shouldn't be that surprised either!
| mulmboy wrote:
| What do these look like?
| pmg101 wrote:
| 1. Take every single function, even private ones.
| 2. Mock every argument and collaborator.
| 3. Call the function.
| 4. Assert the mocks were called in the expected way.
|
| These tests help you find inadvertent changes, yes, but
| they also create constant noise about changes you intend
| (see the sketch below).
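|
| For concreteness, a minimal sketch of what such a test tends to
| look like (hypothetical names, Python with unittest.mock; not
| taken from any project discussed here):
|
|     # Mock the collaborator, call the function, then assert on
|     # the mock: the test encodes the implementation rather than
|     # the observable behaviour.
|     from unittest.mock import Mock
|
|     def charge_order(order, gateway):
|         gateway.charge(order["id"], order["total"])
|
|     def test_charge_order_calls_gateway():
|         gateway = Mock()
|         charge_order({"id": 42, "total": 9.99}, gateway)
|         # Renaming the method or reordering its arguments fails
|         # this test even when callers can't tell the difference.
|         gateway.charge.assert_called_once_with(42, 9.99)
|
| A behaviour-level test would instead assert on something the
| caller can observe, leaving the call structure free to change.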
| ornornor wrote:
| Juniors on one of the teams I work with only write this
| kind of test. It's tiring, and I have to tell them to
| test the behaviour, not the implementation. And yet every
| time they do the same thing. Or rather their AI IDE spits
| these out.
| senbrow wrote:
| These tests also break encapsulation in many cases
| because they're not testing the interface contract,
| they're testing the implementation.
| girvo wrote:
| You beat me to it, and yep these are exactly it.
|
| "Mock the world then test your mocks", I'm simply not
| convinced these have any value at all after my nearly two
| decades of doing this professionally
| diamond559 wrote:
| If the goal is to document the code and it gets
| sidetracked and focuses on only certain parts it failed
| the test. It just further proves LLMs are incapable of
| grasping meaning and context.
| dcre wrote:
| Docs also often don't have anyone's name on them, in which
| case they're _already_ attributed to an unknown composite
| author.
| danenania wrote:
| A problem I've found with LLMs for docs is that they are
| like ten times too wordy. They want to document _every_
| path and edge case rather than focusing on what really matters.
|
| It can be addressed with prompting, but you have to fight
| this constantly.
| bigiain wrote:
| I think probably my most common prompt is "Make it
| shorter. No more than ($x) (words|sentences|paragraphs)."
| pxc wrote:
| I've never been able to get that to work. LLMs can't
| count; they don't actually know how long their output is.
| pxc wrote:
| > A problem I've found with LLMs for docs is that they
| are like ten times too wordy
|
| This is one of the problems I feel with LLM-generated
| code, as well. It's almost always between 5x and
| 20x (!) as long as it needs to be. Though in the case of
| code verbosity, it's usually not because of thoroughness
| so much as extremely bad style.
| averynicepen wrote:
| Writing is an expression of an individual, while code is a tool
| used to solve a problem or achieve a purpose.
|
| The more examples of different types of problems being solved
| in similar ways present in an LLM's dataset, the better it gets
| at solving problems. Generally speaking, if it's a solution
| that works well, it gets used a lot, so "good solutions" become
| well represented in the dataset.
|
| Human expression, however, is diverse by definition. The
| expression of the human experience is the expression of a data
| point on a statistical field with standard deviations the size
| of chasms. An expression of the mean (which is what an LLM
| does) goes against why we care about human expression in the
| first place. "Interesting" is a value closely paired with
| "different".
|
| We value diversity of thought in expression, but we value
| efficiency of problem solving for code.
|
| There is definitely an argument to be made that LLM usage
| fundamentally restrains an individual from solving unsolved
| problems. It also doesn't consider the question of "where do we
| get more data from".
|
| >the code you actually want to ship is so far from what LLMs
| write
|
| I think this is a fairly common consensus, and my understanding
| is that the reason for this issue is the limited context window.
| twodave wrote:
| I argue that the intent of an engineer is contained
| coherently across the code of a project. I have yet to get an
| LLM to pick up on the deeper idioms present in a codebase
| that help constrain the overall solution towards these more
| particular patterns. I'm not talking about syntax or style,
| either. I'm talking about e.g. semantic connections within an
| object graph, understanding what sort of things belong in the
| data layer based on how it is intended to be read/written,
| etc. Even when I point it at a file and say, "Use the
| patterns you see there, with these small differences and a
| different target type," I find that LLMs struggle. Until they
| can clear that hurdle without requiring me to restructure my
| entire engineering org they will remain as fancy code
| completion suggestions, hobby project accelerators, and not
| much else.
| mac-attack wrote:
| Very well stated.
| themk wrote:
| I recently published an internal memo which covered the same
| point, but I included code. I feel like you still have a
| "voice" in code, and it provides important cues to the
| reviewer. I also consider review to be an important learning
| and collaboration moment, which becomes difficult with LLM
| code.
| AlexCoventry wrote:
| > I think that the code you actually want to ship is so far
| from what LLMs write
|
| It depends on the LLM, I think. A lot of people have a bad
| impression of them as a result of using cheap or outdated LLMs.
| mcqueenjordan wrote:
| I guess to follow up slightly more:
|
| - I think the "if you use another model" rebuttal is becoming
| like the No True Scotsman of the LLM world. We can get concrete
| and discuss a specific model if need be.
|
| - If the use case is "generate this function body for me", I
| agree that that's a pretty good use case. I've specifically
| seen problematic behavior for the other ways I'm seeing it
| OFTEN used, which is "write this feature for me", or trying to
| one shot too much functionality, where the LLM gets to touch
| data structures, abstractions, interface boundaries, etc.
|
| - To analogize it to writing: They shouldn't/cannot write the
| whole book, they shouldn't/cannot write the table of contents,
| they cannot write a chapter, IMO even a paragraph is too much
| -- but if you write the first sentence and the last sentence of
| a paragraph, I think the interpolation can be a pretty
| reasonable starting point. Bringing it back to code for me
| means: function bodies are OK. Everything else gets
| questionable fast IME.
| IgorPartola wrote:
| My suspicion is that this is a form of the paradox where you
| can recognize that the news being reported is wrong when it is
| on a subject in which you are an expert, but then you move on to
| the next article on a different subject and your trust resumes.
|
| Basically if you are a software engineer you can very easily
| judge quality of code. But if you aren't a writer then maybe it
| is hard for you to judge the quality of a piece of prose.
| make_it_sure wrote:
| Try Opus 4.5, you'll be surprised. It might have been true for
| past versions of LLMs, but they've advanced a lot.
| cheeseface wrote:
| There are cases where I would start the coding process by copy-
| pasting existing code (e.g. test suites, new screens in the UI)
| and this is where LLMs work especially well and produce code
| that is, the majority of the time, production-ready as-is.
|
| A common prompt I use is approximately "Write tests for file X,
| look at Y on how to setup mocks."
|
| This is probably not "de novo" and in terms of writing is maybe
| closer to something like updating a case study powerpoint with
| the current customer's data.
| fallat wrote:
| The problem with this text is it's a written anecdote. Could all
| be fake.
| bgwalter wrote:
| Cantrill jumps on every bandwagon. When he assisted in cancelling
| a Node developer (not a native English speaker) over pronouns he
| was following the Zeitgeist, now "Broadly speaking, LLM use is
| encouraged at Oxide."
|
| He is a long way from Sun.
| crabmusket wrote:
| For those interested, here's a take from Bryan after that
| incident https://bcantrill.dtrace.org/2013/11/30/the-power-of-
| a-prono...
| KronisLV wrote:
| The change: https://github.com/joyent/libuv/pull/1015/files
|
| > Sorry, not interested in trivial changes like that.
|
| - bnoordhuis
|
| As a non-native English speaker, I think the change itself is
| okay (women will also occasionally use computers), but saying
| you're not interested in merging it is kinda _cringe_, for
| lack of a better term - do you not realize that people will
| take issue with this and you're turning a trivial change into
| a messy discussion? Stop being a nerd and merge the damn
| changeset, it won't break anything either, read the room.
| Admittedly, I also view the people arguing in the thread to
| be similarly _cringe_ , purely on the basis that if someone
| is uninterested/opposed to stuff like this, you are
| exceedingly unlikely to be able to make them care.
|
| Feels the same as how allowlist/denylist reads more cleanly,
| and how main as a branch name uses a very common word as
| well - as long as updating your CI config isn't too much
| work. To show a bit of empathy the other way as well, maybe
| people get tired of too many changes like that (e.g. if most
| of the stuff you review is just people poking the docs by
| rewording stuff to be able to say that they contributed to
| project X). Or maybe people love to take principled stances
| and to argue idk
|
| > ...it's not the use of the gendered pronoun that's at issue
| (that's just sloppy), but rather the _insistence_ that
| pronouns should in fact be gendered.
|
| Yeah, odd thing to get so fixated on when the they/them
| version is more accurate in this circumstance. While I don't
| cause drama when I see gendered ones (again, most people here
| have English as a second language), I wouldn't argue with
| someone a bunch if they wanted to correct the docs or
| whatever.
| bgwalter wrote:
| Also for those interested, here is Bryan's take on criticism
| of Sun:
|
| https://landley.net/history/mirror/linux/kissedagirl.html
|
| He wasn't fired or canceled. It is great to see Gen-Xers and
| Boomers having all the fun in the 1980s and 1990s and then
| going all prissy on younger people in the 2010s and trying to
| ruin their careers.
| sunshowers wrote:
| I didn't know about that incident before starting at Oxide, but
| if I'd known about it, it absolutely would have attracted me.
| I've written a large amount of technical content and not _once_
| in over a decade have I needed to use he/him pronouns in it.
| Bryan was 100% correct.
| bgwalter wrote:
| Joyent took funding from Peter Thiel. I have not seen attacks
| from Cantrill against Thiel for his political opinions, so he
| just punches down for street cred and goes against those he
| considers expendable.
|
| What about Oxide? Oxide is funded by Eclipse ventures, which
| has now installed a Trump-friendly person:
|
| https://www.reuters.com/business/finance/vc-firm-eclipse-
| tap...
| kace91 wrote:
| The guide is generally very well thought out, but I see an issue in
| this part:
|
| It sets the rule that things must be actually read when there's a
| social expectation (code interviews for example) but otherwise...
| remarks that use of LLMs to assist comprehension has little
| downside.
|
| I find two problems with this:
|
| - there is incoherence there. If LLMs are flawless in reading and
| summarization, there is no difference from reading the original.
| And if they aren't flawless, then that flaw also extends to the
| non-social stuff.
|
| - in practice, I haven't found LLMs to be that good as reading
| assistants. I've sent them to check a linked doc and they've just
| read the index and inferred the context, for example. Just
| yesterday I asked for a comparison of three technical books on a
| similar topic, and it wrongly guessed the third one rather than
| follow the three links.
|
| There is a significant risk in placing a translation layer
| between content and reader.
| gpm wrote:
| > Just yesterday I asked for a comparison of three technical
| books on a similar topic, and it wrongly guessed the third one
| rather than follow the three links.
|
| I would consider this a failure in their tool use capabilities,
| not their reading ones.
|
| To use them to read things (without relying on their much less
| reliable tool use) take the thing and put it in the context
| window yourself.
|
| They still aren't perfect of course, but they are reasonably
| good.
|
| Three whole books likely exceed their context window size, of
| course; I'd take this as a sign that they aren't up to a task
| of that magnitude yet.
| kace91 wrote:
| >Three whole books likely exceeds their context window size
| of course
|
| This was not "read all three books", this was "check these
| three links with the (known) book synopsis/reviews there" and
| it made up the third one.
|
| >I would consider this a failure in their tool use
| capabilities, not their reading ones.
|
| I'd give it to you if I got an error message, but the text
| being enhanced with wrong-but-plausible data is clearly a
| failure of reliability.
| fastball wrote:
| > It sets the rule that things must be actually read when
| there's a social expectation (code interviews for example) but
| otherwise... remarks that use of LLMs to assist comprehension
| has little downside.
|
| I think you got this backwards, because I don't think the RFD
| said that at all. The point was about a social expectation for
| writing, not for reading.
| kace91 wrote:
| This is what I'm referencing:
|
| >using LLMs to assist comprehension should not substitute for
| actually reading a document where such reading is socially
| expected.
| tonkinai wrote:
| Based on paragraph length, I would assume that "LLMs as writers"
| is the most extensive use case.
| forrestthewoods wrote:
| > When debugging a vexing problem one has little to lose by using
| an LLM -- but perhaps also little to gain.
|
| This probably doesn't give them enough credit. If you can feed an
| LLM a list of crash dumps it can do a remarkable job producing
| both analyses and fixes. And I don't mean just for super obvious
| crashes. I was most impressed with a deadlock where numerous
| engineers had tried and failed to understand exactly how to fix
| it.
| nrhrjrjrjtntbt wrote:
| LLMs are good where there is a lot of detail but the answer to
| be found is simple.
|
| This is sort of the opposite of vibe coding, but LLMs are OK at
| that too.
| forrestthewoods wrote:
| > LLMs are good where there is a lot of detail but the answer
| to be found is simple.
|
| Oooo I like that. Will try and remember that one.
|
| Amusingly, my experience is that the longer an issue takes me
| to debug the simpler and dumber the fix is. It's tragic
| really.
| throwdbaaway wrote:
| After the latest production issue, I have a feeling that
| opus-4.5 and gpt-5.1-codex-max are perhaps better than me at
| debugging. Indeed my role was relegated to combing through the
| logs, finding the abnormal / suspicious ones, and feeding those
| to the models.
| bdangubic wrote:
| > assurance that the model will not use the document to train
| future iterations of itself.
|
| believing this in 2025 is really fascinating. this is like
| believing Meta won't use info they (il)legally collected about you
| to serve you ads
| AlexCoventry wrote:
| I wonder if they would be willing to publish the "LLMs at Oxide"
| advice, linked in the OP [1], but currently publicly
| inaccessible.
|
| [1]
| https://github.com/oxidecomputer/meta/tree/master/engineerin...
| sudomateo wrote:
| Disclaimer: Oxide employee here.
|
| To be honest there's really no secret sauce in there. It's
| primarily how to get started with agents, when to abandon your
| context and start anew, and advice on models, monitoring cost,
| and prompting. This is not to diminish the value of the
| information as it's good information written by great
| colleagues. I just wanted to note that most of the information
| can be obtained from the official AI provider documentation and
| blog posts from AI boosters like Thorsten Ball.
| AlexCoventry wrote:
| Thanks.
| sudomateo wrote:
| You're welcome. My colleague published the text for it:
| https://gist.github.com/david-
| crespo/5c5eaf36a2d20be8a3013ba...
| AlexCoventry wrote:
| Cool, thanks again to both of you. :-)
| StarterPro wrote:
| Nobody has yet explained how an LLM can be better than a well
| paid human expert.
| nrhrjrjrjtntbt wrote:
| The not needing to pay it well.
| WhyOhWhyQ wrote:
| A well paid human expert can find lots of uses of LLMs. I'm
| still not convinced that humans will ever be totally replaced,
| and what work will look like is human experts using LLMs as
| another tool in the toolbox, just like how an engineer would
| have used a slide rule or mechanical calculator back in the
| day. The kind of work they're good at doesn't cover the full
| range of necessary engineering tasks, but they do open up new
| avenues. For instance, yesterday I was able to get the basic
| gist of three solutions for a pretty complex task in about an
| hour. The result of that was me seeing that two of them were
| unlikely to work for what I'm doing, so that now I can invest
| actual effort in the third solution.
| felipeerias wrote:
| Tools can make individuals and teams more effective. This is
| just as true for LLM-based tools as it was for traditional
| ones.
|
| The question is not whether one (1) LLM can replace one (1)
| expert.
|
| Rather, it is how much farther an expert can get through better
| tooling. In my experience, it can be pretty far indeed.
| 0x0000000 wrote:
| > Ironically, LLMs are especially good at evaluating documents to
| assess the degree that an LLM assisted their creation
|
| Is there any evidence for this?
| fearnot wrote:
| no
| sethops1 wrote:
| If anything my experience has been the opposite of this. LLM
| detection is guesswork for an LLM.
| koolala wrote:
| I disagree with LLMs as Editors. The amount of -- in the post is
| crazy.
| keeda wrote:
| Here's the only simple, universal law that should apply:
|
| THOU SHALT OWN THE CODE THAT THOU DOST RENDER.
|
| All other values should flow from that, regardless of whether the
| code itself is written by you or AI or by your dog. If you look
| at the values in the article, they make sense even without LLMs
| in the picture.
|
| The source of workslop is not AI, it's a lack of ownership. This
| is especially true for Open Source projects, which are seeing a
| wave of AI slop PR's precisely because the onus of ownership is
| largely on the maintainers and not the upstart "contributors."
|
| Note also that this does not imply a universal set of values.
| Different organizations may well have different values for what
| ownership of code means -- E.g. in the "move fast, break things"
| era of Facebook, workslop may have been perfectly fine for Zuck!
| (I'd bet it may even have hastened the era of "Move fast with
| stable infrastructure.") But those values must be consistently
| applied regardless of how the code came to be.
| cobertos wrote:
| > LLMs are especially good at evaluating documents to assess the
| degree that an LLM assisted their creation!)
|
| That's a bold claim. Do they have data to back this up? I'd only
| have confidence to say this after testing this against multiple
| LLM outputs, but does this really work for, e.g. the em dash
| leaderboard of HN or people who tell an LLM to not do these 10
| LLM-y writing cliches? I would need to see their reasoning on why
| they think this to believe it.
| elAhmo wrote:
| I would be surprised if they have any data about this. There are
| so many ways LLMs can be involved, from writing everything, to
| making text more concise or just "simple proofreading".
| Detecting all this with certainty is not trivial and probably
| not possible with the current tools we have.
| yard2010 wrote:
| I thought about it - a quick way to verify whether something
| was created with an LLM is to feed an LLM half of the text and
| then let it complete token by token. At every step, check
| not just the single most likely next token but the n most
| probable tokens. If one of them is the one you have in the text,
| pick it and continue. This way, I think, you can measure how
| often the model is "correct" at predicting text it hasn't yet seen.
|
| I didn't test it and I'm far from an expert, maybe someone can
| challenge it?
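|
| A rough sketch of that idea (simplified to a single scoring pass
| over the text rather than actual generation, and assuming a local
| HuggingFace causal LM; "gpt2" is just a placeholder model):
|
|     # Check how often each actual token in the second half of a
|     # text falls within the model's top-n next-token guesses.
|     import torch
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|
|     def top_n_hit_rate(text, model_name="gpt2", n=5):
|         tok = AutoTokenizer.from_pretrained(model_name)
|         model = AutoModelForCausalLM.from_pretrained(model_name)
|         ids = tok(text, return_tensors="pt").input_ids[0]
|         half = len(ids) // 2
|         with torch.no_grad():
|             logits = model(ids.unsqueeze(0)).logits[0]
|         hits = 0
|         for i in range(half, len(ids) - 1):
|             # logits[i] scores the token at position i + 1
|             top = torch.topk(logits[i], n).indices
|             hits += int(ids[i + 1] in top)
|         return hits / max(1, len(ids) - 1 - half)
|
| A high hit rate means the model finds the text very predictable,
| which is at best weak evidence of LLM authorship, for the reasons
| given in the reply below.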
| jampekka wrote:
| That seems somewhat similar to perplexity based detection,
| although you can just get the probabilities of each token
| instead of picking n-best, and you don't have to generate.
|
| It kinda works, but is not very reliable and is quite
| sensitive to which model the text was generated with.
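|
| As a sketch, the perplexity-style variant just averages the
| log-probability the model assigns to each actual token (logits
| and ids obtained the same way as in the snippet above):
|
|     import torch
|     import torch.nn.functional as F
|
|     def log_perplexity(logits, ids):
|         # logits: [seq_len, vocab]; ids: [seq_len] token ids
|         logprobs = F.log_softmax(logits[:-1], dim=-1)
|         token_lp = logprobs[torch.arange(len(ids) - 1), ids[1:]]
|         return float(-token_lp.mean())
|
| Lower values mean the model finds the text more predictable.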
|
| This page has nice explanations:
|
| https://www.pangram.com/blog/why-perplexity-and-
| burstiness-f...
| akoboldfrying wrote:
| I expect that, for values of n for which this test
| consistently reports "LLM-generated" on LLM-generated inputs,
| it will also consistently report "LLM-generated" on human-
| generated inputs. But I haven't done the test either so I
| could be wrong.
| bcantrill wrote:
| I am really surprised that people are surprised by this, and
| honestly the reference was so casual in the RFD because it's
| probably the way that I use LLMs the most (so very much coming
| from my own personal experience). I will add a footnote to the
| RFD to explain this, but just for everyone's benefit here: at
| Oxide, we have a very writing-intensive hiring process.[0]
| Unsurprisingly, over the last six months, we have seen an
| explosion of LLM-authored materials (especially for our
| technical positions). We have told applicants to be careful
| about doing this[1], but they do it anyway. We have also seen
| this coupled with outright fraud (though less frequently).
| Speaking personally, I spend a _lot_ of time reviewing
| candidate materials, and my ear has become very sensitive to
| LLM-generated materials. So while I generally only engage an
| LLM to aid in detection when I already have a suspicion, they
| have proven adept. (I also elaborated on this a little in our
| podcast episode with Ben Shindel on using LLMs to explore the
| fraud of Aidan Toner-Rodgers.[2])
|
| I wasn't trying to assert that LLMs can find _all_ LLM-
| generated content (which feels tautologically impossible?),
| just that they are useful for the kind of LLM-generated content
| that we seek to detect.
|
| [0] https://rfd.shared.oxide.computer/rfd/0003
|
| [1] https://oxide.computer/careers
|
| [2] https://oxide-and-friends.transistor.fm/episodes/ai-
| material...
| 12300886574321 wrote:
| I debated not writing this, as I planned on re-applying
| again, as oxide is in many ways a dream company for me, and
| didn't want this to hurt my chances if I could be identified
| and it was seen as negative or critical (I hope not, I'm just
| relaying my experience, as honestly as I can!), but I felt
| like I needed to make this post (my first on HN, a longtime
| lurker). I applied in the last 6 months, and against my
| better judgement, encouraged by the perceived company
| culture, the various luminaries on the team, the varied
| technical and non-technical content on the podcasts, and my
| general (unfortunate) propensity for honesty, I was more
| vulnerable than normal in a tech application, and spent many
| hours writing it. (fwiw, it's not super relevant to what I'll
| get to, but you can and should assume I am a longtime Rust
| programmer (since 1.0) with successful open source libraries,
| even ones used by oxide, but also a very private person, no
| socials, no blogging, etc., so much to my chagrin, I assumed
| I would be a shoo-in :)) After almost 3 months, I was
| disappointed (and surprised if I'm being honest, hubris,
| indeed!) to receive a very bland, uninformative rejection
| email for the position, stating they received too many
| applications for the position (still not filled as of today!)
| and would not proceed at this time, and welcome to re-apply,
| etc. Let me state: this is fine, this is not my first rodeo!
| I have a well paying (taking the job would have been a
| significant paycut, but that's how much I wanted to work
| there!), albeit at the moment, unchallenging job at a large
| tech company. What I found particularly objectionable was
| that my writing samples (urls to my personal samples) _were
| never accessed_.
|
| This is or could be signal for a number of things, but what
| was particularly disappointing was the _heavy emphasis_ on
| writing in the application packet and the company culture, as
| e.g., reiterated by the founder I 'm replying to, and yet my
| writing samples were never even read? I have been in tech for
| many years, seen all the bullshit in recruiting, hiring,
| performed interviews many times myself, so it wouldn't be
| altogether surprising that a first line recruiter throws a
| resume into a reject pile for <insert reasons>, but then I
| have so many other questions - why the 3 months delay if
| tossed quickly, and if it truly was read by the/a founder or
| heavily scrutinized, as somewhat indicated by the post, why
| did they not access my writing samples? There are just more
| questions now. All of this was bothersome, and if I'm being
| honest, made me question joining the company, but what really
| made me write this response, is that I am now worried, given
| the content of the post I'm replying to, whether my
| application was flagged as LLM generated? I don't think my
| writing style is particularly LLMish, but in case that's in
| doubt, believe me or not, my application, and this response
| does not have a single word from an LLM. This is all, sui
| generis, me, myself, and I. (This doesn't quite explain why
| my samples weren't accessed, but if I'm being charitable,
| perhaps the content of the application packet seemed of
| dubious provenance?) Regardless, if it was flagged, I
| suppose the long and short of this little story is: are you
| sending applicants rejection letters noting this suspicion,
| at least as a courtesy? If I was the victim of a false
| positive, I would at least like to know. This isn't some last
| ditch attempt (the rejection was many months ago) to get re-
| eval'd; I have a job, I can reapply in my own time, and even
| if this was an oversight or mistake (although not accessing
| the writing samples at all is somewhat of a red flag _for me_
| ), there is no way they can contact me through this burner
| account, it's just, like, the _principle_ of it, and the
| words needed to be said :) Thank you, and PS, even through it
| all, I (perhaps now guiltily) _still_ love your podcast :D
| venturecruelty wrote:
| I mean this nicely: please don't prostrate yourself for
| these companies. Please have some more respect for
| yourself.
| csb6 wrote:
| Strange to see no mention of potential copyright violations found
| in LLM-generated code (e.g. LLMs reproducing code from Github
| verbatim without respecting the license). I would think that
| would be a pretty important consideration for any software
| development company, especially one that produces so much free
| software.
| dboreham wrote:
| Do current generation LLMs do this? I suppose I mean "do
| this any more than human developers do".
| theresistor wrote:
| A very recent example:
| https://github.com/ocaml/ocaml/pull/14369
| phyzome wrote:
| ...what a remarkable thread.
| menaerus wrote:
| Right? If this is really true, that some random folk
| without compiler engineering experience, implemented a
| completely new feature in ocaml compiler by prompting the
| LLM to produce the code for him, then I think it really
| is remarkable.
| ccortes wrote:
| Oh wow, is that what you got from this?
|
| It seems more like an inexperienced guy asked the LLM to
| implement something and the LLM just output what an
| experienced guy did before, and it even gave him the
| credit.
| rcxdude wrote:
| Copyright notices and signatures in generative AI output
| are generally a result of the expectation created by the
| training data that such things exist, and are generally
| unrelated to how much the output corresponds to any
| particular piece of training data, and especially to who
| exactly produced that work.
|
| (It is, of course, exceptionally lazy to leave such
| things in if you are using the LLM to assist you with a
| task, and can cause problems of false attribution.
| Especially in this case where it seems to have just
| picked a name of one of the maintainers of the project)
| menaerus wrote:
| Did you take a look at the code? Given your response I
| figure you did not because if you did you would see that
| the code was _not_ cloned but genuinely compiled by the
| LLM.
| kfajdsl wrote:
| It's one thing for you (yes, you, the user using the
| tool) to generate code you don't understand for a side
| project or one off tool. It's another thing to expect
| your code to be upstreamed into a large project and let
| others take on the maintenance burden, not to mention
| review code you haven't even reviewed yourself!
|
| Note: I, myself, am guilty of forking projects, adding
| some simple feature I need with an LLM quickly because I
| don't want to take the time to understand the codebase,
| and using it personally. I don't attempt to upstream
| changes like this and waste maintainers' time until I
| actually take the time myself to understand the project,
| the issue, and the solution.
| menaerus wrote:
| What are you talking about? It was a ridiculously useful
| debugging feature that nobody in their right mind would block
| because of "added maintenance". The MR was rejected purely
| because of political/social reasons.
| yard2010 wrote:
| >> Here's my question: why did the files that you submitted
| name Mark Shinwell as the author?
|
| > Beats me. AI decided to do so and I didn't question it. I
| did ask AI to look at the OxCaml implementation in the
| beginning.
|
| This shows that the problem with AI is philosophical, not
| practical
| don-bright wrote:
| Also, since LLM-generated content is not copyrightable, what
| happens to code you publish under a copyleft license? The entire
| copyleft system is based on the idea of a human holding
| copyright to copyleft code. Is a big chunk of it, the LLM part,
| basically public domain? How do you ensure there's enough human
| content to make it copyrightable and hence copyleftable....
| IshKebab wrote:
| > since LLM generated content is not copyrightable
|
| That's not how it works. If you ask an LLM to write Harry
| Potter and it writes something that is 99% the same as Harry
| Potter, it isn't magically free of copyright. That would
| obviously be insane.
|
| The legal system is still figuring out exactly what the rules
| are here but it seems likely that it's going to be on the LLM
| user to know if the output is protected by copyright. I
| imagine AI vendors will develop secondary search thingies to
| warn you (if they haven't already), and there will probably
| be some "reasonable belief" defence in the eventual laws.
|
| Either way it definitely isn't as simple as "LLM wrote it so
| we can ignore copyright".
| rcxdude wrote:
| I think the poster is looking at it from the other way:
| purely machine-generated content is not generally
| copyrightable, even if it can violate copyright. So it's
| more a question of: can a copyleft license like the GPL
| actually protect something that's original but primarily
| LLM generated? Should it do so?
|
| (From what I understand, the amount of human input that's
| required to make the result copyrightable can be pretty
| small, perhaps even as little as selecting from multiple
| options. But this is likely to be quite a gray area.)
| rafterydj wrote:
| >it seems likely that it's going to be on the LLM user to
| know if the output is protected by copyright.
|
| To me, this is what seems more insane! If you've never read
| Harry Potter, and you ask an LLM to write you a story about
| a wizard boy, and it outputs 80% Harry Potter - how would
| you even know?
|
| > there will probably be some "reasonable belief"
| defence in eventual laws.
|
| This is probably true, but it's irksome to shift all blame
| away from the LLM producers, using copyrighted data to
| peddle copyrighted output. This simply turns the business
| into copyright infringement as a service - what incentive
| would they have to actually build those "secondary search
| thingies" and build them well?
|
| > it definitely isn't as simple as "LLM wrote it so we can
| ignore copyright".
|
| Agreed. The copyright system is getting stress tested. It
| will be interesting to see how our legal systems can adapt
| to this.
| IshKebab wrote:
| > how would you even know?
|
| The obvious way is by searching the training data for
| close matches. LLMs need to do that and warn you about
| it. Of course the problem is they all trained on pirated
| books and then deleted them...
|
| But either way it's kind of a "your problem" thing. You
| can't really just say "I invented this great tool and it
| sometimes lets me violate copyright without realising.
| You don't mind do you, copyright holders?"
| fastball wrote:
| Has anything like this worked its way through the courts yet?
| adastra22 wrote:
| Yes, training is considered fair use, and output is non-
| copyrightable / public domain. With many asterisks and
| footnotes, of course.
| Madmallard wrote:
| Don't see how output being public domain makes sense when
| they could be outputting copyrighted code.
|
| Shouldn't the rights extend forward and simply require the
| LLM code to be deleted?
| menaerus wrote:
| First, you have to prove that it produced the
| copyrighted code. The question is what counts as copyrighted
| code in the first place. Literal copy-paste from source is
| easy but I think 99% of the time this isn't the case.
| adastra22 wrote:
| With many asterisks and footnotes. One of which is that
| if it literally output the exact code, of course that
| would be copyright infringement. Something that greatly
| resembled but with minor changes would be a gray area.
|
| Those kinds of cases, although they do happen, are
| exceptional. In a typical output that does not line-
| for-line resemble a single training input, it is
| considered a new, but non-copyrightable work.
| vegardx wrote:
| (I'm not a lawyer)
|
| You should be careful about speaking in absolute terms
| when talking about copyright.
|
| There is nothing that prevents multiple people from
| owning copyright to identical works. This is also why
| copyright infringement is such a mess to litigate.
|
| I'd also be interested in knowing why you think code
| generated by LLMs can't be copyrighted. That's quite a
| statement.
|
| There's also the problem with copyright law and different
| jurisdictions.
| fearnot wrote:
| I fully disagree with 1) the stance, 2) the conclusions.
| Simplita wrote:
| Oxide's approach is interesting because it treats LLMs as a tool
| inside a much stricter engineering boundary. Makes me wonder how
| many teams would avoid chaos if they adopted the same discipline.
| hexo wrote:
| "LLMs are amazingly good at writing code" that one was good. I
| can't stop laughing.
| Madmallard wrote:
| I wrote an entire multiplayer game in XNA that I've tried
| repeatedly to get LLMs to translate to javascript
|
| it's just utterly hopeless how bad they are at doing it
|
| even if I break it down into parts once you get into the stuff
| that actually matters i.e. the physics, event handling, and
| game logic induced by events, it just completely falls apart
| 100% of the time
| azemetre wrote:
| I felt this the other day. I wouldn't even consider my
| example exotic, p2p systems using electron? It just couldn't
| figure out how to work with YJS correctly.
|
| These things aren't hard if you're familiar with the
| documentation and have made them before, but there is
| an extreme dearth of information about them compared to web dev
| tutorials.
| leecommamichael wrote:
| I agree with your sentiment, but I do find it amazing that the
| underlying techniques of inference can emit code that is as
| apparently coherent as it is. (This does not imply actual
| coherence.)
| philippta wrote:
| > LLM-generated code should not be reviewed by others if the
| responsible engineer has not themselves reviewed it.
|
| To extend that: If the LLM is the author and the responsible
| engineer is the genuine first reviewer, do you need a second
| engineer at all?
|
| Typically in my experience one review is enough.
| bananapub wrote:
| yes, obviously?
|
| anyone who is doing serious enough engineering that they have
| the rule of "one human writes, one human reviews" wants two
| humans to actually put careful thought into a thing, and only
| one of them is deeply incentivised to just commit the code.
|
| your suggestion means less review and worse incentives.
| Yeask wrote:
| anyone who is doing serious enough engineering is not using
| LLMS.
| ares623 wrote:
| Yeesss this is what I've been (semi-sarcastically) thinking
| about. Historically it's one author and one reviewer before
| code gets shipped.
|
| Why introduce a second reviewer and reduce the rumoured
| velocity gained by LLMs? After all, "it doesn't matter what
| wrote the code" right.
|
| I say let her rip. Or as the kids say, code goes brrr.
| sevensor wrote:
| I disagree. Code review has a social purpose as well as a
| technical one. It reinforces a shared understanding of the
| code and requires one person to assure another that the code
| is ready for review. It develops consensus about design
| decisions and agreement about what the code is for. With only
| one person, this is impossible. "Code goes brrr" is a neutral
| property. It can just as easily take you to the wrong
| destination as the right one.
| K0nserv wrote:
| More eyes are better, but more importantly code review is also
| about knowledge dissemination. If only the original author and
| the LLM saw the code you have a bus factor of 1. If another
| person reviews the bus factor is closer to 2.
| Madmallard wrote:
| "LLMs can be quite effective writing code de novo."
|
| Maybe for simple braindead tasks you can do yourself anyway.
|
| Try doing it on something actually hard or complex and they get
| it wrong 100/100 if they don't have adequate training data, and
| 90/100 if they do.
| Iridescent_ wrote:
| > Oxide employees bear responsibility for the artifacts we
| create, whatever automation we might employ to create them.
|
| Yes, allow the use of LLMs, encourage your employees to use them
| to move faster by rewarding "performance" regardless of risks,
| but make sure to place responsibility for failure upon them so
| that when it happens, the company culture is not blamed.
| tizzy wrote:
| The idea that LLMs are amazing at comprehension but we are
| expected to read original documents seems contradictory to me?
| I'm also wary of using them as editors and losing the writer's
| voice, as that feels heavily dependent on the prompt and on
| whether the writer does a final pass without any LLM. Asking
| someone else to re-write is losing your voice if you don't have
| an opinion on how the re-write turns out.
| atmosx wrote:
| Nothing new here. Antirez, for one, has taken a similar stance on
| his YouTube video channel which has material on the topic. But
| it's worthwhile having a document like this publicly available by
| a company that the tech crowd seems to respect.
|
| <offtopic> The "RFD" here stands for "Reason/Request for
| Decision" or something else? (Request for Decision doesn't have a
| nice _ring_ on it tbh). I'm aware of RFCs ofc and the respective
| status changes (draft, review, accepted, rejected) or ADR
| (Architectural Decision Record) but have not come across the RFD
| acronym. Google gave several different answers. </offtopic>
| __jonas wrote:
| It stands for 'Request for Discussion':
|
| https://rfd.shared.oxide.computer/rfd/0001
| atmosx wrote:
| Thanks.
| peheje wrote:
| I know I'm walking into a den of wolves here and will probably
| get buried in downvotes, but I have to disagree with the idea
| that using LLMs for writing breaks some social contract.
|
| If you hand me a financial report, I expect you used Excel or a
| calculator. I don't feel cheated that you didn't do long division
| by hand to prove your understanding. Writing is no different. The
| value isn't in how much you sweated while producing it. The value
| is in how clear the final output is.
|
| Human communication is lossy. I think X, I write X' (because I'm
| imperfect), you understand Y. This is where so many
| misunderstandings and workplace conflicts come from. People
| overestimate how clear they are. LLMs help reduce that gap. They
| remove ambiguity, clean up grammar, and strip away the accidental
| noise that gets in the way of the actual point.
|
| Ultimately, outside of fiction and poetry, writing is data
| transmission. I don't need to know that the writer struggled with
| the text. I need to understand the point clearly, quickly, and
| without friction. Using a tool that delivers that is the highest
| form of respect for the reader.
| growse wrote:
| > The value is in how clear the final output is.
|
| Clarity is useless if it's inaccurate.
|
| Excel is deterministic. ChatGPT isn't.
| kstrauser wrote:
| While I understand the point you're making, the idea that
| Excel is deterministic is not commonly shared among Excel
| experts. It's all fun and games until it guesses that your
| 10th separator value, "SEP-10", is a date.
| grufkork wrote:
| I think the main problem is people using the tool badly and not
| producing concise material. If what they produced was really
| lean and correct it'd be great, but you grow a bit tired when
| you have to expend time reviewing and parsing long, winding and
| straight-up wrong PRs and messages from _people_ who have not put
| in the time.
| mft_ wrote:
| I'm with you, and further, I'd apply this (with some caveats)
| to images created by generative AI too.
|
| I've come across a lot of people recently online expressing
| anger and revulsion at any images or artwork that have been
| created by genAI.
|
| For relatively mundane purposes, like marketing materials, or
| diagrams, or the sort of images that would anyway be sourced
| from a low-cost image library, I don't think there's an
| inherent value to the "art", and don't see any problem with
| such things being created via genAI.
|
| Possible consequences:
|
| 1) Yes, this will likely lead to loss/shifts in employment, but
| wasn't progress ever like this? People have historically
| reacted strongly against many such shifts when advancing
| technology threatens some sector, but somehow we always figure
| it out and move on.
|
| 2) For genuine art, I suspect this will in time lead to a
| _greater_ value being placed on demonstrably human-created
| originals. Related, there's probably money to be made by
| whoever can create a trusted system somehow capturing proof of
| human work, in a way that can't be cheated or faked.
| Libidinalecon wrote:
| Totally agree. The output is what matters.
|
| At this point, who really cares what the person who sees
| everything as "AI slop" thinks?
|
| I would rather just interact with Gemini anyway. I don't need
| to read/listen to the "AI slop hunter" regurgitate their social
| media feed and NY Times headlines back to me like a bad
| language model.
| Yeask wrote:
| If the output is what matters, then by definition using a
| non-deterministic tool does not sound like a good idea.
| throw4847285 wrote:
| Something only a bad writer would write.
| rcxdude wrote:
| I think often, though, people use LLMs as a substitute for
| thinking about what they want to express in a clear manner. The
| result is often a large document which locally looks reasonable
| and well written but overall doesn't communicate a coherent
| point because there wasn't one expressed to the LLM to begin
| with, and even a good human writer can only mind-read so much.
| MobiusHorizons wrote:
| The point made in the article was about social contract, not
| about efficacy. Basically if you use an llm in such a way that
| the reader detects the style, you lose the trust of the reader
| that you as the author rigorously understand what has been
| written, and the reader easily loses the incentive to pay
| attention.
|
| I would extend the argument further to say it applies to lots
| of human generated content as well. Especially sales and
| marketing information which similarly elicit very low trust.
| cvcderringer wrote:
| I had trouble getting past the Early Modern English tinge of the
| language used in this. It's fun, but it distracts from the
| comprehension in an attempt to just sound epic. It's fine if you're
| writing literature, but it comes off sounding uppity in a
| practical doc for devs. Writing is not just about conveying
| something in a mood you wish to set. Study how Richard Feynman
| and Warren Buffett communicated to their audiences; part of their
| success is that they speak to their people in the language all
| can easily understand.
| batney wrote:
| Here it is, rewritten in accessible English:
|
| Using Large Language Models (LLMs) at Oxide
|
| This document explains how we should think about using LLMs
| (like ChatGPT or similar tools) at Oxide.
|
| What are LLMs?
|
| LLMs are very advanced computer programs that can understand
| and generate text. They've become a big deal in the last five
| years and can change how we work. But, like any powerful tool,
| they have good and bad sides. They are very flexible, so it's
| hard to give strict rules about how to use them. Still, because
| they are changing so fast, we need to think carefully about
| when and how we use them at Oxide.
|
| What is Important When Using LLMs
|
| We believe using LLMs should follow our core values:
|
| Responsibility:
|
| We are responsible for the work we produce. Even if we use an
| LLM to help, a human must make the final decisions. The person
| using the LLM is responsible for what comes out.
|
| Rigor (Care and Precision):
|
| LLMs can help us think better or find mistakes, but if we use
| them carelessly, they can cause confusion. We should use them
| to improve our work, not to cut corners.
|
| Empathy:
|
| Remember, real people read and write what we produce. We should
| be kind and respectful in our language, whether we are writing
| ourselves or letting an LLM help.
|
| Teamwork:
|
| We work as a team. Using LLMs should not break trust among team
| members. If we tell others we used an LLM, it might seem like
| we're avoiding responsibility, which can hurt trust.
|
| Urgency (Doing Things Quickly):
|
| LLMs can help us work faster, but we shouldn't rush so much
| that we forget responsibility, care, and teamwork. Speed is
| good, but not at the cost of quality and trust.
|
| How We Use LLMs
|
| LLMs can be used in many ways. Here are some common uses:
|
| 1. As Readers
|
| LLMs are great at quickly understanding documents, summaries,
| or answering questions about texts.
|
| Important: When sharing documents with an LLM, make sure your
| data is private. Also, remember that uploading files might
| allow the LLM to learn from your data unless you turn that off.
|
| Note: Use LLMs to help understand documents, but don't skip
| reading them yourself. LLMs are tools, not replacements for
| reading carefully.
|
| 2. As Editors
|
| LLMs can give helpful feedback on writing, especially after
| you've written a draft. They can suggest improvements in
| structure and wording.
|
| Caution: Sometimes, LLMs may flatter your work too much or
| change your style if used too early. Use them after you've done
| some work yourself.
|
| 3. As Writers
|
| LLMs can write text, but their writing can be basic or obvious.
| Sometimes, they produce text that shows it was made by a
| machine.
|
| Why be careful? If readers see that the writing is from an LLM,
| they might think the author didn't put in enough effort or
| doesn't truly understand the ideas.
|
| Our rule: Usually don't let LLMs write your final drafts. Use
| them to help, but own your words and ideas.
|
| 4. As Code Reviewers
|
| LLMs can review code and find problems, but they can also miss
| issues or give bad advice. Use them as a helper, not a
| replacement for human review.
|
| 5. As Debuggers
|
| LLMs can sometimes help find solutions to tricky problems. They
| might give helpful hints. But don't rely on them too much--use
| them as a second opinion.
|
| 6. As Programmers
|
| LLMs are very good at writing code, especially simple or
| experimental code. They can be useful for quick tasks like
| writing tests or prototypes.
|
| Important: When an LLM writes code, the person responsible must
| review it carefully. Responsibility for the code stays with the
| human.
|
| Teamwork: If you use an LLM to generate code, make sure you
| understand and review it yourself first.
|
| How to Use LLMs Properly
|
| There are detailed guidelines and tips in the internal document
| called "LLMs at Oxide."
|
| In general:
|
| Using LLMs is encouraged, but always remember your
| responsibilities--to your product, your customers, and your
| team.
| MobiusHorizons wrote:
| What do you mean? The document seemed incredibly digestible to
| me.
|
| Are you speaking about words like "shall"? I didn't notice
| them, but in RFCs those are technical terms which carry precise
| meaning.
| dcre wrote:
| Feynman at the 1965 Nobel banquet: "Each joy, though transient
| thrill, repeated in so many places amounts to a considerable
| sum of human happiness. And, each note of affection released
| thus one upon another has permitted me to realize a depth of
| love for my friends and acquaintances, which I had never felt
| so poignantly before."
|
| https://www.nobelprize.org/prizes/physics/1965/feynman/speec...
| petetnt wrote:
| Funny how the article states that "LLMs can be excellent editors"
| and then the post repeats all the mistakes that no editor would
| make:
|
| 1. Because reading posts like this
| 2. Is actually frustrating as hell
| 3. When everything gets dragged around and filled with useless
| anecdotes and 3-adjective mumbojumbos and endless emdashes --
| because somehow it's better than actually just writing something
| up.
|
| Which just means that people in tech or in general have no
| understanding what an editor does.
| xondono wrote:
| Funnily enough, the text is so distinctively Cantrillian that I
| have no doubts this is 100% an "organic intelligence" product.
| CerryuDu wrote:
| > LLMs are superlative at reading comprehension, able to process
| and meaningfully comprehend documents effectively instantly.
|
| I couldn't disagree more. (In fact I'm shocked that Bryan
| Cantrill uses words like "comprehension" and "meaningfully" in
| relation to LLMs.)
|
| Summaries provided by ChatGPT, conclusions drawn by it, contain
| exaggerations and half-truths that are _NOT_ there in the actual
| original sources, if you bother enough to ask ChatGPT for those,
| and to read them yourself. If your question is only slightly
| suggestive, ChatGPT 's tuning is all too happy to tilt the
| summary in your favor; it tells you what you seem to want to
| hear, based on the phrasing of your prompt. ChatGPT presents,
| using confident and authoritative language, total falsehoods and
| deceptive half-truths, after parsing human-written originals, be
| the latter natural language text, or source code. I now only
| trust ChatGPT to recommend sources to me, and I read those --
| especially the relevant-looking parts -- myself. ChatGPT has been
| tuned by its masters to be a lying sack of shit.
|
| I've recently asked ChatGPT a factual question: I asked it about
| the identity of a public figure (an artist) whom I had seen in a
| video on youtube. ChatGPT answered with "Person X", and even
| explained why Person X's contribution was so great to the piece
| of art in question. I knew the answer was wrong, so I retorted
| only with: "Source?". Then ChatGPT apologized, and did the _exact
| same thing_ , just with "Person Y"; again explaining why Person Y
| was so influential in making that piece of art so great. I knew
| the answer was wrong _still_ , so I again said: "Source?". And at
| third attempt, ChatGPT finally said "Person Z", with a verifiable
| reference to a human-written document that identified the artist.
|
| _FUCK_ ChatGPT.
| OptionOfT wrote:
| > Wherever LLM-generated code is used, it becomes the
| responsibility of the engineer. As part of this process of taking
| responsibility, self-review becomes essential: LLM-generated code
| should not be reviewed by others if the responsible engineer has
| not themselves reviewed it.
|
| I think the review by the prompt writer should be held to a
| higher standard than that of another person who reviews the code.
|
| If I know how to do something, it is easier for me to avoid
| mistakes while doing it. When I'm reviewing it, it requires
| different pathways in my brain. Since there is code out there I'm
| drawn to that path, and I might not always spot the problem
| points. Or code might be written in a way that I don't recognize,
| but still exhibits the same mistake.
|
| In the past, as a reviewer I used to be able to count on my
| colleagues' professionalism to be a moat.
|
| The size of the moat is inversely proportional to the amount of
| LLM-generated code in a PR / project. At a certain moment you
| can no longer guarantee that you stand behind everything.
|
| Combine that with the push to do more faster, with less, meaning
| we're increasing the amount of tech debt we're taking on.
| leecommamichael wrote:
| What is the downside of using them to prototype? to generate
| throwaway code? What do we lose if we default to that behavior?
| j2kun wrote:
| Read the article, which discusses this already, and maybe
| respond to that.
| gpm wrote:
| Time wasted on failed prototypes? Understanding that could have
| been generated by the act of prototyping?
|
| Doesn't mean you shouldn't ever do so, but there are tradeoffs
| that become obvious as soon as you start attempting it.
| davexunit wrote:
| > Large language models (LLMs) are an indisputable breakthrough
| of the last five years
|
| Actually a lot of people dispute this and I'm sure the author knows
| that!
___________________________________________________________________
(page generated 2025-12-07 23:01 UTC)