[HN Gopher] How I use LLMs as a staff engineer
___________________________________________________________________
How I use LLMs as a staff engineer
Author : gfysfm
Score : 135 points
Date : 2025-02-04 20:49 UTC (2 hours ago)
(HTM) web link (www.seangoedecke.com)
(TXT) w3m dump (www.seangoedecke.com)
| stuartd wrote:
| > is this idiomatic C?
|
| This is how I use AI at work for maintaining Python projects, a
| language in which I am not really versed. Sometimes I might add
| "this is how I would do it in ..., how would I do this in
| Python?"
|
| I find this extremely helpful and productive, especially as I
| have to pull the code onto a server to test it.
| synthc wrote:
| This year I switched to a new job, using programming languages
| that I was less familiar with.
|
| Asking an LLM to translate between languages works really well
| most of the time. It's also a great way to learn which
| libraries are the standard solution in a given language. It
| really accelerated my learning process.
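| For a flavor of it, here's a made-up exchange (the code and the
| library suggestion are illustrative, not a real transcript):
|
|     # Me: "In Java I'd reach for java.net.http.HttpClient;
|     # what's the standard Python way?"
|     # The usual answer: the requests library.
|     import requests
|
|     # URL is illustrative
|     resp = requests.get("https://api.example.com/items",
|                         timeout=10)
|     resp.raise_for_status()  # surface HTTP errors early
|     items = resp.json()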
|
| Sure, there's the occasional overly literal translation or
| hallucination, but I've found it useful enough.
| brianstrimp wrote:
| Have you noticed any difference in picking up the language(s)
| yourself? As in, do you think you'd be more fluent in it by
| now without all the help? Or perhaps less? Genuine question.
| mewpmewp2 wrote:
| I do tons of TypeScript in my side projects and in real life,
| and I usually feel heavy frustration when I stray away from it
| (e.g. I've started doing a lot of IoT, ML, and robotics
| projects, where I can't always use TypeScript). One key thing
| LLMs have given me is that I can ask why something is X without
| having to worry about sounding stupid or annoying.
|
| So I think they have enabled me to get out of the TypeScript
| zone relatively worry-free without losing productivity. And I
| do think I learn a lot, although I'm relating much of it to my
| JS/TS-heavy experience.
|
| To me, the ability to ask stupid questions without fear of
| judgment or of accidentally offending someone is just amazing.
|
| I used to overthink a lot before LLMs, but I think they have
| helped me a lot with that aspect.
|
| I sometimes think that no one except an LLM would have the
| patience for me if I didn't always filter my thoughts.
| foobazgt wrote:
| I wonder if the first bullet point, "smart autocomplete", is
| much less beneficial if you're already using a statically typed
| language with a good IDE. I already feel like IntelliJ's
| autocomplete reads my mind most of the time.
| Klathmon wrote:
| LLM autocomplete is an entirely different beast.
|
| Traditional autocomplete can finish the statement you started
| typing; LLMs often suggest whole lines before I even type
| anything, and sometimes whole functions.
|
| And static types can assist the LLM too. It's not an either-or
| choice.
| foobazgt wrote:
| The author says they do the literal opposite:
|
| "Almost all the completions I accept are complete boilerplate
| (filling out function arguments or types, for instance). It's
| rare that I let Copilot produce business logic for me"
|
| My experience is similar, except I get my IDE to complete
| these for me instead of an LLM.
| neeleshs wrote:
| I use an LLM to generate complete solutions to small technical
| problems: "Write an input stream implementation that skips
| lines based on a regex".
|
| That's hard for an IDE autocomplete to do.
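|
| In Python terms, a minimal sketch of the kind of thing I mean
| (illustrative, not actual LLM output):
|
|     import re
|     from typing import IO, Iterator
|
|     class RegexSkippingStream:
|         """Wrap a text stream, skip lines matching a regex."""
|
|         def __init__(self, stream: IO[str], pattern: str):
|             self._stream = stream
|             self._regex = re.compile(pattern)
|
|         def __iter__(self) -> Iterator[str]:
|             for line in self._stream:
|                 if not self._regex.search(line):
|                     yield line
|
|     # usage: skip comment lines in a config file
|     # with open("app.conf") as f:
|     #     for line in RegexSkippingStream(f, r"^\s*#"):
|     #         handle(line)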
| AOsborn wrote:
| Yeah, absolutely.
|
| I find Copilot is great if you add a small comment describing
| the logic or function. Taking 10 seconds to write a one-line
| sentence in English can save 5-10 minutes of writing the code
| from scratch. Subjectively, it feels much faster to QA and
| review code that's already written.
|
| Having good typing and DTOs helps too.
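|
| To illustrate the comment-first flow (a hypothetical sketch:
| the comment is the sort of thing I type, and the body is the
| kind of completion Copilot then offers):
|
|     # parse KEY=VALUE lines from an .env-style file into a
|     # dict, skipping blanks and comment lines
|     def parse_env(path: str) -> dict[str, str]:
|         result: dict[str, str] = {}
|         with open(path) as f:
|             for raw in f:
|                 line = raw.strip()
|                 if not line or line.startswith("#"):
|                     continue
|                 key, _, value = line.partition("=")
|                 result[key.strip()] = value.strip()
|         return result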
| baq wrote:
| > Copilot
|
| ...needn't say more.
|
| Copilot was utter garbage when I switched to Cursor+Claude; it
| was like some alien tech upgrade at first.
| unregistereddev wrote:
| Does IntelliJ try to complete the word you are typing, or does
| it suggest an entire line of code? Newer versions of IntelliJ
| incorporate LLMs to beef up autocomplete, so you may already be
| using one.
| foobazgt wrote:
| The new LLM-based completion in Intellij is not useful. :(
| mrguyorama wrote:
| I know I'm not using it because Intellij is constantly
| complaining that my version does not support the AI plugins.
|
| The "dumb" autogenerated stuff is incredible. It's like going
| from bad autocomplete to Intellisense all over again.
|
| The world of python tooling (at least as used by my former
| coworkers) put my expectations in the toilet.
| hansvm wrote:
| It's completely different. If I start writing an itertools
| library with comptime inlining and my favorite selection of
| other features, completing map/reduce/take/skip/... exactly how
| I want them to look, LLM autocomplete can finish the rest of
| the library exactly as I would have written it, even for
| languages it doesn't otherwise know well, outside of the
| interesting bits (in the context of itertools, that'd be
| utilities with memory tradeoffs, like tee and groupby).
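|
| A rough Python rendering of the boring parts I mean (ignoring
| the comptime angle; a sketch, not my actual code):
|
|     import itertools
|     from typing import Iterable, Iterator, TypeVar
|
|     T = TypeVar("T")
|
|     def take(items: Iterable[T], n: int) -> Iterator[T]:
|         """Yield only the first n items."""
|         return itertools.islice(items, n)
|
|     def skip(items: Iterable[T], n: int) -> Iterator[T]:
|         """Drop the first n items, yield the rest."""
|         return itertools.islice(items, n, None)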
| delduca wrote:
| > Disclaimer: I work for GitHub, and for a year I worked directly
| on Copilot.
|
| Ah, now it makes sense.
| brianstrimp wrote:
| Yeah, the submission heading should indicate that there is a
| high risk of a sales pitch in there.
| arcticfox wrote:
| > How I use LLMs as a staff engineer
|
| With all the talk of o1-pro as a superb staff-engineer-level
| architect, it took me a while to re-parse this headline and
| understand what the author, apparently a staff engineer, meant.
| iamwil wrote:
| I've been using 4o and o3 to read research papers and ask about
| topics that are a little out of my depth for a while now, and I
| get a massive amount of value out of that. What used to take me
| a week of googling and squinting at Wikipedia or digging for
| slightly less theoretical blog posts, I now get by uploading a
| paper or a transcript of a talk and just asking questions until
| I feel like I've gotten all the insights and a-ha moments.
|
| At the end, I ask it to give me a quiz on everything we talked
| about and any other insights I might have missed. Instead of
| typing out the answers, I just use Apple Dictation to transcribe
| my answers directly.
|
| It's only recently that I thought to take the conversation I
| just had and have it write a blog post of the insights and a-ha
| moments I'd had. It takes a fair bit of curation to get it to
| do that, however. I can't just say "write me a blog post on all
| we talked about". I have to first get it to write an outline
| with the key insights, and then, based on the outline, write
| each section. Then I'll use ChatGPT's canvas to guide and fine-
| tune each section.
|
| However, at no point do I have to specifically write the actual
| text. I mostly do curation.
|
| I feel OK about doing this, and don't consider it AI slop,
| because I clearly mark at the top that I didn't write a word of
| it and that it's the result of a curated conversation with 4o.
| In addition, I think that if most people did this as a result
| of their own Socratic sessions with an AI, it'd build up enough
| training data for the next generation of AI to do a better job
| of writing pedagogical explanations, posts, and quizzes that
| help people learn topics that are just out of reach, but where
| there haven't been many people able to bridge the gap.
|
| The two I had it write are: Effects as Protocols and Contexts as
| Agents: https://interjectedfuture.com/effects-as-protocols-and-
| conte...
|
| How free monads and functors represent syntax for algebraic
| effects: https://interjectedfuture.com/how-the-free-monad-and-
| functor...
| brianstrimp wrote:
| "as a staff engineer"
|
| Such an unnecessary flex.
| simonw wrote:
| It's entirely relevant here. The opinions of a staff engineer
| on this stuff should be interpreted very differently from the
| opinions of a developer with much less experience.
| phist_mcgee wrote:
| Reminds me of TechLead
|
| https://www.youtube.com/watch?v=AbUU-D2Hil0
| twalla wrote:
| Not really; once you get past senior, the "shape" of staff+
| engineers varies greatly. At that level the scope is typically
| larger, which can limit the usefulness of LLMs. I'd agree that
| the greatest value I've gotten is from being able to quickly
| get up to speed on something I've been asked to have an opinion
| on, and from sanity-checking my work when I'm using an
| unfamiliar language or framework.
|
| It also helps if you realize staff+ is just a way to
| financially reward people who don't want to be managers, so you
| end up with these unholy engineer/architect/project
| manager/product manager hybrids who have to influence without
| authority.
| jppope wrote:
| My experience is similar: great for boilerplate, great for
| autocomplete, starts to fall apart on complex tasks, and
| doesn't do much as far as business logic goes (how would it
| know?). All in all, very useful, but not replacing a decent
| practitioner any time soon.
|
| LLMs can absolutely bust out corporate docs crazy fast too...
| though that's probably a reason to re-evaluate the value of
| those docs.
| callamdelaney wrote:
| My experience of Copilot is that it's completely useless and
| almost completely incapable of anything. 4o is reasonable,
| though.
| ddgflorida wrote:
| You summed it up well and your experience matches mine.
| nvarsj wrote:
| I'm trying to understand the point of the qualifier "as a staff
| engineer", but I cannot.
| piuantiderp wrote:
| Any time you read "as an X", your spidey senses should tingle
| and you should be careful. Caveat lector.
| pgm8705 wrote:
| I used to feel they just served as a great autocomplete or
| Stack Overflow replacement, until I switched from VSCode to
| Cursor. Cursor's agent mode with Sonnet is pretty remarkable in
| what it can generate just from prompts. It is such a better
| experience than any of the AI tools VSCode provides, imo. I
| think tools like this, when paired with an experienced
| developer to guide them and oversee the output, can result in
| major productivity boosts. I agree with the sentiment that it
| falls apart with complex tasks or unique business logic, but I
| do think it can take you far beyond boilerplate.
| t8sr wrote:
| I guess I'm officially listed as a "staff engineer". I have been
| at this for 20 years, and I work with multiple teams in pretty
| different areas, like the kernel, some media/audio logic,
| security, database stuff... I end up alternating a lot between
| Rust, Java, C++, C, Python, and Go.
|
| Coding assistant LLMs have changed how I work in a couple of
| ways:
|
| 1) They make it a lot easier to context switch between e.g.
| writing kernel code one day and a Pandas notebook the next,
| because you're no longer handicapped by slightly forgetting the
| idiosyncrasies of every single language. It's like having smart
| code search and documentation search built into the autocomplete.
|
| 2) They can do simple transformations of existing code really
| well, like generating a match expression from an enum. They can
| extrapolate the rest from 2-3 examples of something repetitive,
| like converting from Rust types into corresponding Arrow types.
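|
| In Python terms, the kind of mechanical transformation I mean
| looks like this (a sketch; the enum is illustrative):
|
|     from enum import Enum
|
|     class Status(Enum):
|         ACTIVE = 1
|         SUSPENDED = 2
|         DELETED = 3
|
|     def describe(status: Status) -> str:
|         # after one or two arms, autocomplete fills the rest
|         match status:
|             case Status.ACTIVE:
|                 return "account is live"
|             case Status.SUSPENDED:
|                 return "account is temporarily disabled"
|             case Status.DELETED:
|                 return "account is gone"
|         raise ValueError(status)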
|
| I don't find the other use cases the author brings up realistic.
| The AI is terrible at code review and I have never seen it spot a
| logic error I missed. Asking the AI to explain how e.g. Unity
| works might feel nice, but the answers are at least 40% total
| bullshit and I think it's easier to just read the documentation.
|
| I still get a lot of use out of Copilot. The speed boost and
| removal of friction lets me work on more stacks and,
| consequently, lead a much bigger span of related projects.
| Instead of explaining how to do something to a junior engineer, I
| can often just do it myself.
|
| I don't understand how fresh grads can get use out of these
| things, though. Tools like Copilot need a lot of hand-holding.
| You can get them to follow simple instructions over a moderate
| amount of existing code, which works most of the time, or ask
| them to do something you don't exactly know how to do without
| looking it up, and then it's a crapshoot.
|
| The main reason I get a lot of mileage out of Copilot is exactly
| because I have been doing this job for two decades and understand
| what's happening. People who are starting in the industry today,
| IMO, should be very judicious with how they use these tools, lest
| they end up with only a superficial knowledge of computing. Every
| project is a chance to learn, and by going all trial-and-error
| with a chatbot you're robbing yourself of that. (Not to mention
| the resulting code is almost certainly half-broken.)
| chasd00 wrote:
| This is pretty much how I use LLMs for coding. I already know
| what I want; I just don't want to type it out. I ask the LLM to
| do the typing for me, then I check it over, copy/paste it in,
| and make any adjustments or extensions.
| toprerules wrote:
| As a fellow "staff engineer" LLMs are terrible at writing or
| teaching how to write idiomatic code, and they are actually
| causing me to spend more time reviewing than I was previously due
| to the influx of junior to senior engineers trying to sneak in
| LLM garbage.
|
| In my opinion, using LLMs to write code is a Faustian bargain
| where you learn terrible practices and come to rely on code
| quantity, boilerplate, and nondeterministic outputs - all
| hallmarks of poor software craftsmanship. Until ML can actually
| go end to end, from requirements to product, and they fire all
| of us, you can't cut corners on building intuition as a human
| by forgoing reading and writing code yourself.
|
| I do think there is a place for LLMs in generating ideas or
| exploring an untrusted knowledge base of information, but using
| code generated by an LLM is pure madness unless what you are
| building is truly going to be thrown away and rewritten from
| scratch, as is relying on one for linting, for debugging, or as
| a source of truth.
| doug_durham wrote:
| I've had exactly the opposite experience with generating
| idiomatic code. I find that the models have a lot of
| information on the standard idioms of a particular language. If
| I have to write in a language I'm new to, I find it very useful
| to have the LLM do an idiomatic rewrite. I learn a lot, and it
| helps me get up to speed more quickly.
| qqtt wrote:
| I wonder if a big disconnect comes partially from the fact that
| people are talking about different models. The top-tier coding
| models (Sonnet, o1, DeepSeek) are all pretty good, but it takes
| a paid subscription to make use of them, or 400GB of local
| memory to run DeepSeek.
|
| All the other distilled models, Qwen Coder, and the like are a
| large step below those models on most benchmarks. Someone
| running a small 20GB model locally will not have the same
| experience as those running the top-of-the-line models.
| baq wrote:
| I iterated on 1k lines of React slop in 4h the other day:
| changed table components twice, handled errors, loading
| widgets, modals, you name it. It'd easily take me a couple of
| days to get maybe 80% of that done.
|
| The result works OK, and nobody cares if the code is good or
| bad. If it's bad and there are bugs, it doesn't matter; no
| humans will look at it anymore. Claude will remix the slop
| until it works, or a new model will rewrite the whole thing
| from scratch.
|
| I realized while writing this that I should've added an extract
| of the requirements in a comment in the package's index.ts, or
| maybe a README.CURSOR.md.
| mirkodrummer wrote:
| I'd pay to review one of your PRs. Maybe a substantial one,
| with proof of AI usage.
| baq wrote:
| It would be great comedic relief for sure, since I'm mostly
| working in the backend mines, where LLM-friendly boilerplate is
| admittedly harder to come by.
|
| My defense is that Karpathy does the same thing; he admitted it
| himself in a tweet
| https://x.com/karpathy/status/1886192184808149383 - I know
| _exactly_ what he means by this.
| mrtesthah wrote:
| My experience having Claude 3.5 Sonnet or Google Gemini 2.0
| Exp-12-06 rewrite a complex function is that it slowly
| introduces slippage of the original intention behind the code,
| and the more rewrites or refactorings, the more likely it is to
| do something other than what was originally intended.
|
| At the absolute minimum, this should require including the
| function's specification in the prompt context and sending the
| output through a full unit test suite.
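|
| A minimal sketch of that guardrail (Python/pytest; the module
| and function names are made up):
|
|     # test_normalize.py - the docstring/spec of the function
|     # goes into the prompt; these tests gate every LLM rewrite.
|     import pytest
|     from mymodule import normalize_username  # hypothetical
|
|     @pytest.mark.parametrize("raw,expected", [
|         ("  Alice ", "alice"),
|         ("BOB", "bob"),
|         ("carol-99", "carol-99"),
|     ])
|     def test_lowercases_and_trims(raw, expected):
|         assert normalize_username(raw) == expected
|
|     def test_rejects_blank_input():
|         with pytest.raises(ValueError):
|             normalize_username("   ")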
| n4r9 wrote:
| > Sometimes the LLMs can't fix a bug so I just work around
| it or ask for random changes until it goes away.
|
| Lordy. Is this where software development is going over the
| next few years?
| aeonik wrote:
| It's actually where we have been the whole time.
| tokioyoyo wrote:
| I will probably get heavily crucified for this, but to people
| who are ideologically opposed to AI-generated code: executives,
| directors, and managerial staff think the opposite. Being very
| anti-LLM-code, instead of trying to understand how it can
| improve your speed, might be detrimental to your career.
|
| Personally, I'm on the fence. But conversations with others,
| and some requests from execs to build various AI utilities into
| our processes, are making me lean toward the safer side of job
| security rather than dismiss it and be adamantly against it.
| rectang wrote:
| This has been true for every heavily marketed development aid
| (beneficial or not) for as long as the industry has existed.
| Managing the politics and the expectations of non-technical
| management is part of career development.
| hansvm wrote:
| "Yes, of course, I'm using AI at every single opportunity
| where I think it'll improve my output"
|
| <<never uses AI>>
| feoren wrote:
| > executives, directors and managerial staff think the
| opposite
|
| Executives, directors, and managerial staff have had their
| heads up their own asses since the dawn of civilization.
| Riding the waves of terrible executive decisions is
| unfortunately part of professional life. Executives like the
| idea of LLMs because it means they can lay you off; they're
| not going to care about your opinion on it one way or
| another.
|
| > Being very anti-LLM code instead of trying to understand
| how it can improve the speed might be detrimental for your
| career.
|
| You're making the assumption that LLMs _can_ improve your
| speed. That's the very assumption being questioned by GP.
| Heaps of low-quality code do not improve development speed.
| simonw wrote:
| I'm willing to stake my reputation on the idea that yes,
| LLMs can improve your speed. You have to learn how to use
| them effectively and responsibly but the productivity
| boosts they can give you once you figure that out are very
| real.
| chefandy wrote:
| Many bosses are willing to stake their subordinates'
| reputations on it, too.
| subw00f wrote:
| I think I'm having the same experience as you. I've heard
| multiple times from execs in my company that "software" will
| have less value and that, in a few years, there won't be as
| many developer jobs.
|
| Don't get me wrong--I've seen productivity gains both in LLMs
| explaining code/ideation and in actual implementation, and I
| use them regularly in my workflow now. I quite like it. But
| these people are itching to eliminate the cost of maintaining
| a dev team, and it shows in the level of wishful thinking
| they display. They write a snake game one day using ChatGPT,
| and the next, they're telling you that you might be too slow --
| despite a string of record-breaking quarters driven by
| successful product iterations.
|
| I really don't want to be a naysayer here, but it's pretty
| demoralizing when these are the same people who decide your
| compensation and overall employment status.
| jondwillis wrote:
| It isn't like you can't write tests for the code, reason about
| it, or iterate on it manually just because it is generated. You
| can also give examples of idioms or patterns you would like it
| to follow. It isn't perfect, and I agree that writing code is
| the best way to build a mental model, but writing code doesn't
| guarantee intuition either. I have written spaghetti that I
| could not hope to explain many times, especially when exploring
| or working in a domain I'm unfamiliar with.
| ajmurmann wrote:
| I described how I liked doing ping-pong pairing TDD with
| Cursor elsewhere. One of the benefits of that approach is
| that I write at least half the implementation and tests and
| review every single line. That means that there is always
| code that follows the patterns I want and it's right there
| for the LLM to see and be its work of.
| scudsworth wrote:
| i love when the llm can be its work of
| axlee wrote:
| What's your stack? I have the complete opposite experience.
| LLMs are amazing at writing idiomatic code, less so at dealing
| with esoteric use cases.
|
| And very often, if the LLM produces a poopoo, asking it to fix
| it again works just well enough.
| Bjartr wrote:
| > asking it to fix it again works just well enough.
|
| I've yet to encounter any LLM, from ChatGPT to Cursor, that
| doesn't choke within 10-20 minutes: it starts to repeat itself,
| says it changed code when it didn't, or gets stuck changing
| something back and forth repeatedly. Just a handful of
| exchanges and it's worthless. Are people who make this workflow
| effective summarizing and creating a fresh prompt every 5
| minutes or something?
| simonw wrote:
| One of the most important skills to develop when using LLMs
| is learning how to manage your context. If an LLM starts
| misbehaving or making repeated mistakes, start a fresh
| conversation and paste in just the working pieces that are
| needed to continue.
|
| I estimate a sizable portion of my successful LLM coding
| sessions included at least a few resets of this nature.
| brandall10 wrote:
| It's helpful to view working solutions and quality code as
| separate things to the LLM.
|
| * If you ask it to solve a problem and nothing more, chances
| are the code isn't the best as it will default to the most
| common solutions in the training data.
|
| * If you ask it to refactor some code idiomatically, it will
| apply most common idiomatic concepts found in the training
| data.
|
| * If you ask it to do both at the same time, you're more likely
| to get higher-quality but incorrect code.
|
| It's better to get a working solution first, then ask it to
| improve that solution, rinse/repeat in smallish chunks of
| 50-100 loc at a time. This is kinda why reasoning models are of
| some benefit, as they allow a certain amount of reflection to
| tie together disparate portions of the training data into more
| cohesive, higher quality responses.
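|
| A toy Python example of that two-step flow (illustrative):
|
|     # pass 1: "solve the problem" - the most common solution
|     def dedupe(items: list) -> list:
|         seen = []
|         for item in items:
|             if item not in seen:
|                 seen.append(item)
|         return seen
|
|     # pass 2: "now make it idiomatic, same behavior" - for
|     # hashable items, O(n) and order-preserving
|     def dedupe_idiomatic(items: list) -> list:
|         return list(dict.fromkeys(items))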
| the_mitsuhiko wrote:
| > but using code generated from an LLM is pure madness unless
| what you are building is truly going to be thrown away and
| rewritten from scratch, as is relying on it as a linting,
| debugging, or source of truth tool.
|
| That does not match my experience at all. You obviously have to
| use your brain to review it, but for a lot of problems LLMs
| produce close to perfect code in record time. It depends a lot
| on your prompting skills though.
| why-el wrote:
| I was hoping the LLM was the staff engineer? The headline can
| be read both ways.
| mvdtnz wrote:
| > What about hallucinations? Honestly, since GPT-3.5, I haven't
| noticed ChatGPT or Claude doing a lot of hallucinating.
|
| See, this is what I don't get about the AI evangelists. Every
| time I use the technology I am astounded at the amount of
| incorrect information and straight-up fantasy it invents. When
| someone tells me that they just don't see it, I have to wonder
| what is motivating them to lie. There is simply no way you're
| using the same technology as me with such wildly different
| results.
| mrguyorama wrote:
| > I have to wonder what is motivating them to lie.
|
| Most of these people who aren't salesmen _aren't lying_.
|
| They just cannot tell when the LLM is making up code. Which is
| very very sad.
|
| That, or they could literally be replaced by a script that
| copy/pastes from Stack Overflow. My friend did that a lot, and
| it definitely helped features ship, but it doesn't make
| maintainable code.
| simonw wrote:
| > There is simply no way you're using the same technology as me
| with such wildly different results.
|
| Prompting styles are _incredibly_ different between different
| people. It's very possible that they are using the same
| technology that you are with wildly different results.
|
| I think learning to use LLMs to their maximum effectiveness
| takes months (maybe even years) of effort. How much time have
| you spent with them so far?
| dlvhdr wrote:
| Another article that doesn't introduce anything new.
| softwaredoug wrote:
| I think people mistakenly use LLMs as research tools, thinking in
| terms of search, when they're better as collaborators / co-
| creators of scaffolding you know you need to edit.
| ur-whale wrote:
| This article matches my so far very positive experience of
| using LLMs to assist me in writing code.
| simonw wrote:
| On last-resort bug fixes:
|
| > I don't do this a lot, but sometimes when I'm really stuck on a
| bug, I'll attach the entire file or files to Copilot chat, paste
| the error message, and just ask "can you help?"
|
| The "reasoning" models are MUCH better than this. I've had
| genuinely fantastic results with this kind of thing against o1
| and Gemini Thinking and the new o3-mini - I paste in the whole
| codebase (usually via my https://github.com/simonw/files-to-
| prompt tool) and describe the bug or just paste in the error
| message and the model frequently finds the source, sometimes
| following the path through several modules to get there.
|
| Here's a slightly older example:
| https://gist.github.com/simonw/03776d9f80534aa8e5348580dc6a8... -
| finding a bug in some Django middleware
| powersnail wrote:
| The "attach the entire file" part is very critical.
|
| I've had the experience of seeing some junior dev posting error
| messages into ChatGPT, applying the suggestions of ChatGPT, and
| posting the next error message into ChatGPT again. They ended
| up applying fixes for 3 different kinds of bugs that didn't
| exist in the code base.
|
| ---
|
| Another cause, I think, is that they didn't try to understand
| any of it (neither the solutions nor the problems those
| solutions were supposed to fix). If they had, they would have
| figured out that the solutions didn't match what they were
| witnessing.
|
| There's a big difference between using an LLM as a tool and
| treating it like an oracle.
| n144q wrote:
| Agree with many of the points here, especially the part about
| one-off, non-production code. I've had great experiences
| letting ChatGPT write utility code. Once it produced Go code
| for an ad-hoc task that ran exactly as expected on the first
| try, when it would have cost me at least 30 minutes, mostly
| spent looking up APIs I'm not familiar with. Another time it
| created an HTTP server that worked with only minor tweaks. I
| don't want to think about life before LLMs existed.
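|
| For the flavor of one-off utility I mean, here's a minimal
| sketch (in Python rather than Go; purely illustrative):
|
|     import json
|     from http.server import (BaseHTTPRequestHandler,
|                              HTTPServer)
|
|     class Handler(BaseHTTPRequestHandler):
|         def do_GET(self):
|             body = json.dumps({"status": "ok"}).encode()
|             self.send_response(200)
|             self.send_header("Content-Type",
|                              "application/json")
|             self.send_header("Content-Length", str(len(body)))
|             self.end_headers()
|             self.wfile.write(body)
|
|     if __name__ == "__main__":
|         server = HTTPServer(("127.0.0.1", 8000), Handler)
|         server.serve_forever()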
|
| One thing that is not mentioned: code review. LLMs are not
| great at it, often pointing out trivial issues or non-issues.
| But if one finds a single genuine improvement out of 10 bullet
| points, that's still worth it; most human code reviewers don't
| catch every issue in the code anyway.
| elwillbo wrote:
| I'm in your boat, having to write a significant number of
| English documents. I always write them myself, and have ChatGPT
| analyze them as well. I just had a thought: I wonder if I could
| paste in technical documentation, and code, to validate my
| documentation? I'll have to try that later.
|
| Copilot is used for simple boilerplate code, and also for the
| autocomplete. It's often a starting point for unit tests (but a
| thorough review is needed - you can't just accept it, I've seen
| it misinterpret code). I started experimenting with RA.Aid
| (https://github.com/ai-christianson/RA.Aid) after seeing a post
| on it here today. The multi-step actions are very promising. I'm
| about to try files-to-prompt (https://github.com/simonw/files-to-
| prompt) mentioned elsewhere in the thread.
|
| For now, LLMs are a level-up in tooling, but not a replacement
| for developers (at least not yet).
___________________________________________________________________
(page generated 2025-02-04 23:00 UTC)