[HN Gopher] I trusted an LLM, now I'm on day 4 of an afternoon project
___________________________________________________________________
I trusted an LLM, now I'm on day 4 of an afternoon project
Author : nemofoo
Score : 85 points
Date : 2025-01-27 21:37 UTC (1 hour ago)
(HTM) web link (nemo.foo)
(TXT) w3m dump (nemo.foo)
| BigParm wrote:
| LLM == WGCM = Wild Goose Chase Model
| tanseydavid wrote:
| I ask this question without a hint of tone or sarcasm. You said:
| "*it's a junior dev faking competence. Trust it at your own
| risk.*" My question is simply: "wouldn't you expect to personally
| be able to tell that a human junior dev was faking competence?"
| Why should it be different with the LLM?
| latexr wrote:
| Obviously, it depends on context. When talking to someone live
| you can pick up on subtle hints such as tone of voice, or where
| they look, or how they gesticulate, or a myriad other signals
| which give you a hint to their knowledge gaps. If you're
| communicating via text, the signals change. Furthermore, as you
| interact with people more often you understand them better and
| refine your understanding of them. LLMs always forget and
| "reset" and are in flux. They aren't as consistent. Plus, they
| don't grow with you and pick up on _your_ signals and wants.
|
| It's incredibly worrying that it needs to be explained again
| and again that LLMs are different from people, do not behave
| like people, and should not be compared to people or interacted
| with like people, because they are not people.
| appleorchard46 wrote:
| Interestingly, the social cues you describe picking up on are
| the exact sort of social cues I struggle with.
| If someone says something, generally speaking I expect it to
| be true unless there is an issue with it that suggests
| otherwise.
|
| I suppose the wide range of negative and positive experiences
| people seem to have working with LLMs is related to the wide
| range of expectations people have for their interactions in
| general.
| layer8 wrote:
| Not instantly. You'd give the human junior dev the benefit of
| the doubt at first. But when it becomes clear that the junior
| dev is faking competence all the time (that might take longer
| than the four days in TFA -- yes I know it's not exactly
| comparable, just saying) and won't drop the act and start
| being honest instead, you'd eventually let them go, because
| that's no way to work with someone.
| zitterbewegung wrote:
| I have used LLMs as a tool, and I start to "give up" working with
| them after a few tries. They excel at simple tasks, boilerplate,
| or scripts, but for larger programs you really have to know
| exactly what you want to do.
|
| I do see the LLMs ingesting more and more documentation and
| content, and they are improving at giving me the right answers.
| Almost two years ago I don't believe they had every Python
| package indexed; now they appear to have at least the
| documentation or source code for most of them.
| XorNot wrote:
| The trouble is the only reliable use-case LLMs actually seem
| good at is "augmented search engine". Any attempts at coding
| with them just end up feeling like trying to code via a worse
| interface.
|
| So it's handy to get a quick list of "all packages which do X",
| but it's worse than useless to have it speculate as to which
| one to use or why, because of the hallucination problem.
| anigbrowl wrote:
| I've found them to be quite a time saver, within limits. The blog
| post seemed scattered and disorganized to me, and the author
| admits having no experience with using LLMs to this end, so
| perhaps the problem lies behind their eyes.
| addaon wrote:
| There's not much actual LLM-generated text in this post to go by,
| but it seems like each of the tokens generated by the LLM would
| be reasonable to have high probability. It sounds like the
| developer here thought that the sequence of tokens then carried
| meaning, where instead any possible meaning came from the
| reading. I wonder if this developer would be as irritated by the
| inaccuracies if they had cast sticks onto the ground to manage
| their stock portfolio and found the prophecy's "meaning" to be
| plausible but inaccurate.
| pieix wrote:
| > AI isn't a co-pilot; it's a junior dev faking competence. Trust
| it at your own risk.
|
| This is a good take that tracks with my (heavy) usage of LLMs for
| coding. Leveraging productive-but-often-misguided junior devs is
| a skill every dev should actively cultivate!
| codr7 wrote:
| What you're doing is sacrificing learning for speed.
|
| Which is fine, if it's a conscious choice for yourself.
| piva00 wrote:
| I don't think GP was talking about themselves being a junior
| using LLMs, at least my interpretation was that devs should
| learn how to leverage misguided juniors, and LLMs are more-or-
| less on the level of a misguided junior.
|
| Which I completely agree with. I use LLMs for the cases where I
| do know what I'm trying to do, I just can't remember some exact
| detail that would require reading documentation. It's much
| quicker to leverage an LLM than to go on a wild goose chase for
| the piece of information I know exists.
|
| Also it's a pretty good tool to scaffold the boring stuff,
| asking an LLM "generate test code for X asserting A, B, and C"
| and editing it to be a proper test frees up mental space for
| more important stuff.
|
| I wouldn't trust an LLM to generate any kind of business
| logic-heavy code, instead I use it as a quite smart
| template/scaffold generator.
| nrb wrote:
| > Leveraging productive-but-often-misguided junior devs is a
| skill every dev should actively cultivate!
|
| Feels like this is only worthwhile because the junior dev
| learns from the experience; an investment that yields benefits
| all around, in the broad sense. Nobody wants a junior around
| that refuses to learn in perpetuity, serving only as a drag on
| productivity and eventually your sanity.
| stuaxo wrote:
| The junior dev faking competence is useful but needs a lot of
| supervision (unlike a real junior dev, we don't know if this one
| will get better).
| tacoooooooo wrote:
| the "AI lies" takeaway is way off for those actually using these
| tools. Calling it a "junior dev faking competence" is catchy, but
| misses the point. We're not expecting a co-pilot, it's a tool, a
| super-powered intern that needs direction. The spaghetti code
| mess wasn't AI "lying", it was a lack of control and proper
| prompting.
|
| Experienced folks aren't surprised by this. LLMs are fast for
| boilerplate, research, and exploring ideas, but they're not
| autonomous coders. The key is you staying in charge: detailed
| prompts, critical code review, iterative refinement. Going back
| to web interfaces and manual pasting because editor integration
| felt "too easy" is a massive overcorrection. It's like ditching
| cars for walking after one fender bender.
|
| Ultimately, this wasn't an AI failure, it was an inexperienced
| user expecting too much, too fast. The "lessons learned" are
| valid, but not AI-specific. For those who use LLMs effectively,
| they're force multipliers, not replacements. Don't blame the tool
| for user error. Learn to drive it properly.
| cruffle_duffle wrote:
| "Experienced folks" in this case means folks who've used LLM's
| enough to somewhat understand how to "feed them" in ways that
| make the tools generate productive output.
|
| Learning to properly prompt an LLM to get a net gain in value
| is a skill in and of itself.
| latexr wrote:
| > We're not expecting a co-pilot
|
| Microsoft's offering is literally called "copilot". That is
| exactly what they're marketing it as.
| potsandpans wrote:
| Counterexample: I've been able to complete more side projects in
| the last month leveraging LLMs than I have ever in my life. One
| of which I believe to have potential as a viable product, and
| another which involved complicated rust `no_std` and linker setup
| for compiling rust code onto bare metal RISCV from scratch.
|
| I think the key to being successful here is to realize that
| you're still at the wheel as an engineer. The LLM is there to
| rapidly synthesize the universe of information.
|
| You still need to 1) have solid fundamentals in order to have an
| intuition against that synthesis, and 2) be experienced enough to
| translate that synthesis into actionable outcomes.
|
| If you're lacking in either, you're at the same whims of copypasta
| that have always existed.
| mythrwy wrote:
| I've had both experiences strangely enough.
| talldayo wrote:
| > which involved complicated rust `no_std` and linker setup for
| compiling rust code onto bare metal RISCV from scratch.
|
| That's complicated, but I wouldn't say the resulting software
| is complex. You gave an LLM a repetitive, translation-based
| job, and you got good results back. I can also believe that an
| LLM could write up a dopey SAAS in half the time it would take
| a human to do the same.
|
| But having the right parameters only takes you _so_ far. Once
| you click generate, you are trusting that the model has some
| familiarity with your problem and can guide you without needing
| assistance. Most people I've seen rely entirely on linting and
| runtime errors to debug AI code, not "solid fundamentals" that
| can fact-check a problem they needed ChatGPT to solve in the
| first place. And the "experience" required to iterate and deploy AI-
| generated code basically boils down to your copy-and-paste
| skills. I like my UNIX knowledge, but it's not a big enough
| gate to keep out ChatGPT Andy and his cohort of enthusiastic
| morons.
|
| We're going to see thousands of AI-assisted success stories
| come out of this. But we already had those "pennies on the
| dollar" success stories from hiring underpaid workers out of
| India and Pakistan. AI will not solve the unsolved problems of
| our industry and in many ways it will exacerbate the
| preexisting issues.
| baxtr wrote:
| Is it reasonable to assume that more senior devs benefit more
| from LLMs?
| dogma1138 wrote:
| It depends. I think it's less about how senior they are and more
| about how good they are at writing requirements, and knowing what
| directives should be explicitly stated and what can be safely
| inferred.
|
| Basically if they are good at utilizing junior developers and
| interns or apprentices they probably will do well with an LLM
| assistant.
| dogma1138 wrote:
| Indeed, LLMs are useful as an intern; they are at the "cocky
| grad" stage of their careers. If you don't understand the
| problem, can't steer the solution, and, worse, have only a
| limited understanding of the code they produce, you are unlikely
| to be productive.
|
| On the other hand if you understand what needs to be done, and
| how to direct the work, the productivity boost can be massive.
|
| Claude 3.5 Sonnet and o1 are awesome at code generation even
| with relatively complex tasks, and they have long enough
| context and attention windows that the code they produce even
| on relatively large projects can be consistent.
|
| I also found a useful method of using LLMs to "summarize" code
| in an instructive manner which can be used for future prompts.
| For example summarizing a large base class that may be reused
| in multiple other classes can be more effective than having to
| overload a large part of your context window with a bunch of
| code.
| nsavage wrote:
| Funny enough, I posted an article I wrote here yesterday with the
| same sort of thesis. Different technologies (mine was Docker) but
| same idea of an LLM leading me astray and causing a lot of
| frustration.
| powerset wrote:
| I've had a similar experience, shipping new features at
| incredible speed, then wasting a ton of time going down the wrong
| track trying to debug something because the LLM gave me a
| confidently wrong solution.
| williamcotton wrote:
| Well that's kind of on you for not noticing that it was the
| wrong solution, isn't it?
| cruffle_duffle wrote:
| I think what the parent describes has happened to everybody,
| and if it hasn't, it will.
|
| The line between being actually more productive and just
| "pretend productive" using large language models is something
| that we all haven't completely figured out yet.
| trinix912 wrote:
| Sometimes the solution is 99% correct but the other 1% is so
| subtly wrong that it both doesn't work and is a debugging
| hell.
| mythrwy wrote:
| Ya, but you kind of get painted into a corner sometimes. And
| the sunk cost fallacy kicks in.
| nyarlathotep_ wrote:
| often it's something you casually overlook, some minor
| implementation detail that you didn't give much thought to
| that ends up being a huge mess later on, IME
| KronisLV wrote:
| In my experience LLMs will help you with things that have been
| solved thousands of times before and are just a matter of finding
| some easily researched solution.
|
| The very moment when you try to go off the beaten path and do
| something unconventional or stuff that most people won't have
| written a lot about, it gets more tricky. Just consider how many
| people will know how to configure some middleware in a Node.js
| project... vs most things related to hardware or low level work.
| Or even working with complex legacy codebases that have bits of
| code with obscure ways of interacting and more levels of
| abstraction than can reasonably be put in context.
|
| Then again, if an LLM gets confused, then a person might as well.
| So, personally I try to write code that'd be understandable by
| juniors and LLMs alike.
| winocm wrote:
| In my experience, an LLM decided not to know the type alignment
| rules in C and confidently trotted out the wrong answer. It
| left a horrible taste in my mouth the one time I decided to
| look at using an LLM for anything, and it keeps leaving me
| wondering if I'd end up spending more time bashing the LLM into
| working than just working out the answer myself and learning
| the underlying reasoning why.
|
| It was so wrong that I wonder what version of the C standard it
| was even hallucinating.
| NitpickLawyer wrote:
| > vs most things related to hardware or low level work.
|
| Counterpoint:
|
| https://github.com/ggerganov/llama.cpp/pull/11453
|
| > This PR provides a big jump in speed for WASM by leveraging
| SIMD instructions for qX_K_q8_K and qX_0_q8_0 dot product
| functions.
|
| > Surprisingly, 99% of the code in this PR is written by
| DeepSeek-R1. The only thing I do is to develop tests and write
| prompts (with some trials and errors)
| alfalfasprout wrote:
| A single PR doesn't really "prove" anything. Optimization
| passes on well-tested narrowly scoped code are something that
| LLMs are already pretty good at.
| ryandrake wrote:
| I use CoPilot pretty much as a smarter autocomplete that can
| sometimes guess what I'm planning to type next. I find it's not
| so good at answering prompts, but if I type:
|
|     r = (rgba >> 24) & 0xff;
|
| ...and then pause, it's pretty good at guessing:
|
|     g = (rgba >> 16) & 0xff;
|     b = (rgba >> 8) & 0xff;
|     a = rgba & 0xff;
|
| ... for the next few lines. I don't really ask it to do more
| heavy lifting than that sort of thing. Certainly nothing like
| "Write this full app for me with these requirements [...]"
| anaisbetts wrote:
| Context matters a lot, copy-pasting snippets to a webpage is
| _way_ less effective than Cursor/Windsurf.
| cmdtab wrote:
| Today, I needed to write a proxy[0] that wraps an object and
| logs all method calls recursively.
|
| I asked Claude to write the initial version. It came up with a
| complicated class-based solution. I spent more than 30 minutes
| trying to get a good abstraction to come out. I was copy-pasting
| TypeScript errors and applying fixes it suggested without
| thinking much.
|
| In the end, I gave up and wrote what I wanted myself in 5
| minutes.
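|
| For illustration, a minimal sketch of that kind of recursive
| logging proxy in TypeScript (a hypothetical version, not the
| exact code in the linked repo):
|
|     // Wrap an object so every method call, including calls on
|     // nested objects, is logged before being forwarded.
|     function withLogging<T extends object>(target: T, path = ""): T {
|       return new Proxy(target, {
|         get(obj, prop, receiver) {
|           const value = Reflect.get(obj, prop, receiver);
|           const name = path ? `${path}.${String(prop)}` : String(prop);
|           if (typeof value === "function") {
|             return (...args: unknown[]) => {
|               console.log(`call ${name}`, args);
|               return value.apply(obj, args);
|             };
|           }
|           if (value !== null && typeof value === "object") {
|             // Recurse so method calls on nested objects are logged too.
|             return withLogging(value, name);
|           }
|           return value;
|         },
|       });
|     }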
|
| [0] https://github.com/cloudycotton/browser-
| operator/blob/main/s...
| transcriptase wrote:
| I've been able to do far more complex things with ESP32s and RPis
| in an evening without knowing the first thing about Python or
| C++.
|
| I can also tell when it's stuck in some kind of context swamp and
| won't be any more help, because it will just keep making the same
| stupid mistakes over and over and generally forgetting past
| instructions.
|
| At that point I take the last working code and paste it into a
| new chat.
| thot_experiment wrote:
| As opposed to not trusting an LLM, and ending up on day 4 of an
| afternoon project? :P
|
| I've been doing that since way before LLMs were a thing.
| ravroid wrote:
| I've found LLMs most useful for spinning up prototypes. But I'm
| able to offload fewer tasks to the LLM as the project grows in
| complexity and size.
|
| One strategy I've been experimenting with is maintaining a 'spec'
| document, outlining all of the features and any relevant
| technical notes. I include the spec with all of the relevant
| source files in my prompt before asking the LLM to implement a
| new change or feature. This way it doesn't have to do as much
| guessing as to what my code is doing, and I can avoid relying on
| long-running conversations to maintain context. Instead, for each
| big change I include an up-to-date spec and all of the relevant
| source files.
|
| I use an NPM script to automate concatenating the spec + source
| files + prompt, which I then copy/paste to o1. So far this has
| been working somewhat reliably for the early stages of a project
| but has diminishing returns.
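|
| A minimal sketch of the kind of concatenation script I mean, in
| TypeScript (the file names here are made up):
|
|     // build-prompt.ts: concatenate the spec, the relevant source
|     // files, and the task description into one paste-able prompt.
|     import { readFileSync } from "node:fs";
|
|     const files = ["SPEC.md", "src/index.ts", "src/api.ts"];
|     const task = process.argv.slice(2).join(" ");
|
|     const sections = files.map(
|       (f) => `--- ${f} ---\n${readFileSync(f, "utf8")}`
|     );
|     console.log([...sections, `--- TASK ---\n${task}`].join("\n\n"));
|
| Run it with the task as an argument, pipe the output to the
| clipboard, and paste the whole thing into o1.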
| mordymoop wrote:
| I am frankly tired of seeing this kind of post on HN. I feel like
| the population of programmers is bifurcating into those who are
| committed to mastering these tools, learning to work around their
| limitations and working to leverage their strengths... and those
| who are committed to complaining about how they aren't already
| perfect Culture Ship Minds.
|
| We get it. They're not superintelligent at everything yet. They
| couldn't infer what you must've really meant in your heart from
| your initial unskillful prompt. They couldn't foresee every
| possible bug and edge case from the first moment of
| conceptualizing the design, a flaw which I'm sure you don't have.
|
| The thing that pushes me over the line into ranting territory is
| that _computer programmers_ , of all people, should know that
| _computers do what you tell them to._
___________________________________________________________________
(page generated 2025-01-27 23:00 UTC)