[HN Gopher] As an experienced LLM user, I don't use generative L...
___________________________________________________________________
As an experienced LLM user, I don't use generative LLMs often
Author : minimaxir
Score : 229 points
Date : 2025-05-05 17:22 UTC (5 hours ago)
(HTM) web link (minimaxir.com)
(TXT) w3m dump (minimaxir.com)
| rfonseca wrote:
| This was an interesting quote from the blog post: "There is one
| silly technique I discovered to allow a LLM to improve my writing
| without having it do my writing: feed it the text of my mostly-
| complete blog post, and ask the LLM to pretend to be a cynical
| Hacker News commenter and write five distinct comments based on
| the blog post."
| meowzero wrote:
| I do something similar. But I make sure the LLM doesn't know I
| wrote the post. That way the LLM is not sycophantic.
| vunderba wrote:
| I do a good deal of my blog posts while walking my husky and
| just dictating using speech-to-text on my phone. The problem is
| that it's an unformed blob of clay and really needs to be shaped
| on the wheel.
|
| I then feed this into an LLM with the following prompt:
| You are a professional editor. You will be provided paragraphs
| of text that may contain spelling errors, grammatical
| issues, continuity errors, structural problems, word
| repetition, etc. You will correct any of these issues while
| still preserving the original writing style. Do not sanitize
| the user. If they use profanities in their text, they
| are used for emphasis and you should not omit them.
| Do NOT try to introduce your own style to their text. Preserve
| their writing style to the absolute best of your
| ability. You are absolutely forbidden from adding new
| sentences.
|
| It's basically Grammarly on steroids and works very well.
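|
| For the curious, a minimal sketch of how the wiring might look
| (this assumes the official openai Python package and an example
| model name; the prompt above lives in a file, and the exact call
| shape varies by provider):
|       # system prompt = the editing instructions above,
|       # user message = the raw dictated text
|       from openai import OpenAI
|
|       client = OpenAI()  # reads OPENAI_API_KEY from the env
|       EDITOR_PROMPT = open("editor_prompt.txt").read()
|
|       def edit(dictated_text: str) -> str:
|           response = client.chat.completions.create(
|               model="gpt-4o",  # example model name
|               messages=[
|                   {"role": "system", "content": EDITOR_PROMPT},
|                   {"role": "user", "content": dictated_text},
|               ],
|               temperature=0.2,  # keep the edits conservative
|           )
|           return response.choices[0].message.content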
| kixiQu wrote:
| What roleplayed feedback providers have people had best and
| worst luck with? I can imagine asking for the personality could
| help the LLM come up with different kinds of criticisms...
| Jerry2 wrote:
| > I typically access the backend UIs provided by each LLM
| service, which serve as a light wrapper over the API
| functionality
|
| Hey Max, do you use a custom wrapper to interface with the API or
| is there some already established client you like to use?
|
| If anyone else has a suggestion please let me know too.
| minimaxir wrote:
| I was developing an open-source library for interfacing with
| LLMs agnostically (https://github.com/minimaxir/simpleaichat)
| and although it still works, I haven't had the time to maintain
| it unfortunately.
|
| Nowadays for writing code to interface with LLMs, I don't use
| client SDKs unless required, instead just hitting HTTP
| endpoints with libraries such as requests and httpx. It's also
| easier to upgrade to async if needed.
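|
| A minimal sketch of that approach (the endpoint and payload shown
| are the OpenAI-style chat completions API; other providers differ
| slightly, and the model name is just an example):
|       import os
|       import httpx
|
|       resp = httpx.post(
|           "https://api.openai.com/v1/chat/completions",
|           headers={"Authorization":
|                    f"Bearer {os.environ['OPENAI_API_KEY']}"},
|           json={
|               "model": "gpt-4o-mini",  # example model
|               "messages": [{"role": "user",
|                             "content": "Say hello."}],
|           },
|           timeout=60,
|       )
|       resp.raise_for_status()
|       print(resp.json()["choices"][0]["message"]["content"])
| Swapping httpx.post for an httpx.AsyncClient call covers most of
| the async upgrade, which is part of the appeal.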
| asabla wrote:
| Most services have a "studio mode" for the models they serve.
|
| As an alternative you could always use OpenWebUI
| simonw wrote:
| I'm going to plug my own LLM CLI project here: I use it on a
| daily basis now for coding tasks like this one:
|
| llm -m o4-mini -f github:simonw/llm-hacker-news -s 'write a new
| plugin called llm_video_frames.py which takes video:path-to-
| video.mp4 and creates a temporary directory which it then
| populates with one frame per second of that video using ffmpeg
| - then it returns a list of [llm.Attachment(path="path-to-
| frame1.jpg"), ...] - it should also support passing
| video:video.mp4?fps=2 to increase to two frames per second, and
| if you pass ?timestamps=1 or &timestamps=1 then it should add a
| text timestamp to the bottom right corner of each image with
| the mm:ss timestamp of that frame (or hh:mm:ss if more than one
| hour in) and the filename of the video without the path as
| well.' -o reasoning_effort high
|
| Any time I use it like that the prompt and response are logged
| to a local SQLite database.
|
| More on that example here:
| https://simonwillison.net/2025/May/5/llm-video-frames/#how-i...
| danenania wrote:
| I built an open source CLI coding agent for this purpose[1]. It
| combines Claude/Gemini/OpenAI models in a single agent, using
| the best/most cost effective model for different steps in the
| workflow and different context sizes. You might find it
| interesting.
|
| It uses OpenRouter for the API layer to simplify use of APIs
| from multiple providers, though I'm also working on direct
| integration of model provider API keys--should release it this
| week.
|
| 1 - https://github.com/plandex-ai/plandex
| andy99 wrote:
| Re vibe coding, I agree with your comments but where I've used it
| is when I needed to mock up a UI or a website. I have no front-
| end experience, so making an 80% (probably 20%) but live demo is
| still a valuable thing to show to others to get the point
| across, obviously not to deploy. It's a replacement for drawing a
| picture of what I think the UI should look like. I feel like this
| is an under-appreciated use. LLM coding is not remotely ready for
| real products but it's great for mock-ups that further internal
| discussions.
| vunderba wrote:
| Same. As somebody who doesn't really enjoy frontend work _at
| all_, I find they are surprisingly good at spitting out
| something that is relatively visually appealing - even if I'll
| end up rewriting the vast majority of react spaghetti code in
| Svelte.
| NetOpWibby wrote:
| In the settings for Claude, I tell it to use Svelte and
| TypeScript whenever possible because I got tired of telling
| it I don't use React.
| leptons wrote:
| I love front-end work, and I'm really good at it, but I now
| let the AI do CSS coding for me. It seems to make nice
| looking buttons and other design choices that are good enough
| for development. My designer has their own opinions, so they
| always will change the CSS when they get their hands on it,
| but at least I'm not wasting my time creating really ugly
| styles that always get replaced anymore. The rest of the
| coding is better if I do it, but sometimes the AI surprises
| me - though most often it gets it completely wrong, and then
| I'm wasting time letting it try and that just feels
| counterproductive. It's like a really stupid intern that
| almost never pays attention to what the goal is.
| mattmanser wrote:
| They're pretty good at following direction. For example you
| can say:
|
| 'Use React, typescript, materialUi, prefer functions over
| const, don't use unnecessary semi colons, 4 spaces for tabs,
| build me a UI that looks like this sketch'
|
| And it'll do all that.
| r0fl wrote:
| If I need a quick mockup I'll do it in react
|
| But if I have time I'll ask for it to be built using best
| practices from the programming language it'll be built in as
| the final product whether that's svelte, static astro, or
| even php
| 65 wrote:
| I think it would be faster/easier to use a website builder or
| various templating libraries to build a quick UI rather than
| having to babysit an LLM with prompts over and over again.
| Oras wrote:
| JSON responses don't always work as expected unless you only have
| a few items to return. In Max's example it's classification.
|
| For anyone trying to return consistent JSON, check out structured
| outputs, where you define a JSON schema with required fields;
| that returns the same structure every time.
|
| I have tested it with high success using GPT-4o-mini.
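|
| Roughly what that looks like against the OpenAI API (field names
| are from their structured outputs docs as I understand them, so
| double-check; strict mode wants every property listed as required
| and additionalProperties set to false):
|       import json
|       from openai import OpenAI
|
|       schema = {
|           "type": "object",
|           "properties": {
|               "category": {"type": "string",
|                            "enum": ["positive", "negative",
|                                     "neutral"]},
|               "confidence": {"type": "number"},
|           },
|           "required": ["category", "confidence"],
|           "additionalProperties": False,
|       }
|
|       client = OpenAI()
|       resp = client.chat.completions.create(
|           model="gpt-4o-mini",
|           messages=[{"role": "user",
|                      "content": "Classify: 'the food was great'"}],
|           response_format={
|               "type": "json_schema",
|               "json_schema": {"name": "classification",
|                               "strict": True,
|                               "schema": schema},
|           },
|       )
|       print(json.loads(resp.choices[0].message.content))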
| behnamoh wrote:
| > I never use ChatGPT.com or other normal-person frontends for
| accessing LLMs because they are harder to control. Instead, I
| typically access the backend UIs provided by each LLM service,
| which serve as a light wrapper over the API functionality which
| also makes it easy to port to code if necessary.
|
| Yes, I also often use the "studio" of each LLM for better results
| because in my experience OpenAI "nerfs" models in the ChatGPT UI
| (models keep forgetting things--probably a limited context length
| set by OpenAI to reduce costs; generally the model is less chatty,
| again probably to reduce their costs; etc.). But I've noticed
| Gemini 2.5 Pro is the same in the studio and the Gemini app.
|
| > Any modern LLM interface that does not let you explicitly set a
| system prompt is most likely using their own system prompt which
| you can't control: for example, when ChatGPT.com had an issue
| where...
|
| ChatGPT does have system prompts but Claude doesn't (one of its
| many, many UI shortcomings which Anthropic never addressed).
|
| That said, I've found system prompts less and less useful with
| newer models. I can simply preface my own prompt with the
| instructions and the model follows them very well.
|
| > Specifying specific constraints for the generated text such as
| "keep it to no more than 30 words" or "never use the word
| 'delve'" tends to be more effective in the system prompt than
| putting them in the user prompt as you would with ChatGPT.com.
|
| I get that LLMs only have a vague idea of how long 30 words is,
| but they never do a good job at these tasks for me.
| ttoinou wrote:
| Side topic: I haven't seen a serious article about prompt
| engineering for senior software development pop up on HN. Yet a
| lot of users here have their own techniques they haven't shared
| with others.
| mtlynch wrote:
| I was just listening to Simon Willison on Software
| Misadventures,[0, 1] and he said the best resource he knows of
| is Anthropic's guide to prompt engineering.[2]
|
| [0] https://softwaremisadventures.com/p/simon-willison-llm-
| weird...
|
| [1] https://youtu.be/6U_Zk_PZ6Kg?feature=shared&t=56m29 (the
| exact moment)
|
| [2] https://docs.anthropic.com/en/docs/build-with-
| claude/prompt-...
| minimaxir wrote:
| An unfortunate casualty of the importance of prompt
| engineering is that there is less of an incentive to share good
| prompts publicly.
| danenania wrote:
| You can read all of the prompts in Plandex (open source CLI
| coding agent focusing on larger projects):
| https://github.com/plandex-
| ai/plandex/tree/main/app/server/m...
|
| They're pretty long and extensive, and honestly could use
| some cleaning up and refactoring at this point, but they are
| being used heavily in production and work quite well, which
| took a fairly extreme amount of trial-and-error to achieve.
| simonw wrote:
| It's maddening to me how little good writing there is out there
| on effective prompting.
|
| Here's an example: what's the best prompt to use to summarize
| an article?
|
| That feels like such an obvious thing, and yet I haven't even
| seen _that_ being well explored.
|
| It's actually a surprisingly deep topic. I like using tricks
| like "directly quote the sentences that best illustrate the
| overall themes" and "identify the most surprising ideas", but
| I'd love to see a thorough breakdown of all the tricks I
| haven't seen yet.
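|
| One way to combine those tricks into a single prompt (just a
| sketch, not a recommendation):
|       Summarize the article below. First, directly quote the
|       three sentences that best illustrate its overall themes.
|       Then identify the two most surprising ideas. Finally,
|       write a three-sentence summary for someone who will not
|       read the article.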
| jimbokun wrote:
| This comment has some advice from Simon Willison on this
| topic:
|
| https://news.ycombinator.com/item?id=43897666
|
| Maybe you should ask him. lol
| Gracana wrote:
| This whitepaper is the best I've found so far, at least for
| covering general prompting techniques.
|
| https://www.kaggle.com/whitepaper-prompt-engineering
| jefflinwood wrote:
| It seems a little counterintuitive, but you can ask an LLM to
| improve a prompt. They are quite good at it.
| Legend2440 wrote:
| >Discourse about LLMs and their role in society has become
| bifurcated enough such that making the extremely neutral
| statement that LLMs have some uses is enough to justify a barrage
| of harassment.
|
| Honestly true and I'm sick of it.
|
| A very vocal group of people are convinced AI is a scheme by the
| evil capitalists to make you train your own replacement. The
| discussion gets very emotional very quickly because they feel
| personally threatened by the possibility that AI is actually
| useful.
| bluefirebrand wrote:
| > A very vocal group of people are convinced AI is a scheme by
| the evil capitalists to make you train your own replacement.
| The discussion gets very emotional very quickly because they
| feel personally threatened by the possibility that AI is
| actually useful
|
| I read this as you framing it as though it is
| irrational. However, history is littered with examples of
| capitalists replacing labour with automation and using any
| productivity gains of new technology to push salaries lower
|
| Of course people who see this playing out _again_ are
| personally threatened. If you aren't feeling personally
| threatened, you are either part of the wealthy class or for
| some reason you think this time will be different somehow
|
| You may be thinking "Even if I lose my job to automation, there
| will be other work to do like piloting the LLMs", but you
| should know that the goal is to eventually pay LLM operators
| peanuts in comparison to what you currently make in whatever
| role you do
| olddustytrail wrote:
| What happens when you use an LLM to generate prompts for an
| LLM?
| Legend2440 wrote:
| >history is littered with examples of capitalists replacing
| labour with automation and using any productivity gains of
| new technology to push salaries lower
|
| Nonsense. We make far, far more than people did in the past
| entirely because of the productivity gains from automation.
|
| The industrial revolution led to the biggest increase in
| quality of life in history, not in spite of but _because_ it
| automated 90% of jobs. Without it we'd all still be
| subsistence farmers.
| bluefirebrand wrote:
| > Nonsense. We make far, far more than people did in the
| past entirely because of the productivity gains from
| automation.
|
| "We" were never subsistence farmers, our ancestors were
|
| I'm talking about real changes that have happened in our
| actual lifetimes.
|
| In our actual lifetimes we have watched wages stagnate for
| decades; our purchasing power is dramatically lower than
| our parents'. In order to afford even remotely close to the
| same standard of living as our parents had, we have to go
| into much larger amounts of debt
|
| We have watched jobs move overseas as automation lowered
| the skill requirements so that anyone could perform them,
| so we sought the cheapest possible labour to do them
|
| We have watched wealthy countries shift from higher paying
| production economies into lower paying service economies
|
| We have watched the wealth gap widen as the rich get richer,
| the poor get poorer, and the middle class shrinks. Some of
| the shrinking middle class moved up, but most moved down
|
| The fact is that automation is disruptive, which is great
| for markets but bad for people who are relying on
| consistency. Which is most people
| astrange wrote:
| > In our actual lifetimes we have watched wages stagnate
| for decades; our purchasing power is dramatically lower
| than our parents'.
|
| This graph is going up.
|
| https://fred.stlouisfed.org/series/MEPAINUSA672N
|
| The reason you believe otherwise is that people on social
| media think they're only allowed to say untrue negative
| things about the economy, because if they ever say
| anything positive it'd be disrespectful to poor people.
|
| > We have watched wealthy countries shift from higher
| paying production economies into lower paying service
| economies
|
| Service work is higher paying, which is why factory
| workers try to get their children educated enough to do
| it.
|
| > We have watched the wealth gap widen as the rich get
| richer the poor get poorer and the middle class shrinks.
| Some of the shrinking middle class moved up, but most
| moved down
|
| They mostly moved up. But notice this graph is positive
| for every group.
|
| https://realtimeinequality.org/?id=wealth&wealthend=03012023...
| bluefirebrand wrote:
| > Service work is higher paying, which is why factory
| workers try to get their children educated enough to do
| it
|
| You think service workers at a fast food restaurant or
| working the till at Walmart are higher paid than Factory
| workers?
|
| > They mostly moved up. But notice this graph is positive
| for every group.
|
| The reason it is positive for (almost) every group is
| that it isn't measuring anything meaningful.
|
| Salaries may have nominally gone up, but this is clearly
| not factoring the cost of living into the equation.
| astrange wrote:
| > You think service workers at a fast food restaurant or
| working the till at Walmart are higher paid than Factory
| workers?
|
| Fast food workers aren't service-economy workers, they're
| making burgers back there.
|
| More importantly, factory work destroys your body and
| email jobs don't, so whether or not it's high earning at
| the start... it isn't forever.
|
| > Salaries may have nominally gone up but this is clearly
| not weighing the cost of living into the equation
|
| That's not a salary chart. The income chart I did post is
| adjusted for cost of living (that's what "real" means).
|
| Also see https://www.epi.org/blog/wage-growth-
| since-1979-has-not-been...
| bluefirebrand wrote:
| > Fast food workers aren't service-economy workers,
| they're making burgers back there.
|
| What exactly do you consider the "service industry" if
| you don't consider food services (like restaurants and
| fast food) as part of it?
|
| I suspect we have very different ideas of what "service
| industry" means
|
| > Also see https://www.epi.org/blog/wage-growth-
| since-1979-has-not-been..
|
| Did you even read the whole thing?
|
| ```Dropping "wage stagnation" as a descriptive term for
| the full post-1979 period doesn't mean we think the wage
| problem for American workers has been solved. Wage growth
| in the post-1979 period has been slow and unequal,
| largely as a result of intentional policy decisions. This
| policy-induced wage suppression has stifled growth in
| living standards and generated inequality. The last five
| years saw rapid and welcome progress reversing some of
| these trends--but it will take a long time to heal the
| previous damage, even if the post-2019 momentum can be
| sustained, which looks very unlikely at the moment.```
|
| Admitting there is a problem but saying "it isn't
| stagnation" is just splitting hairs.
|
| "Wage growth has been slow and unequal, but it isn't
| stagnation!"
|
| "policy-induced wage suppression has stifled growth in
| living standards and generated inequality" there was wage
| suppression but it's not stagnation!
|
| What a stupid article.
| skybrian wrote:
| Setting up automation as the enemy is an odd thing for
| programmers to be doing. I mean, if you're a programmer and
| you're not automating away tasks, both for yourself and other
| people, what are you even doing?
|
| Also, "this time it's different" depends on the framing. A
| cynical programmer who has seen new programming tools hyped
| too many times would make a different argument: At the
| beginning of the dot-com era, you could get a job writing
| HTML pages. That's been automated away, so you need more
| skill now. It hasn't resulted in fewer software-engineering
| jobs so far.
|
| But that's not entirely convincing either. Predicting the
| future is difficult. Sometimes the future _is_ different.
| Making someone else's scenario sound foolish won't actually
| rule anything out.
| bluefirebrand wrote:
| > Setting up automation as the enemy is an odd thing for
| programmers to be doing. I mean, if you're a programmer and
| you're not automating away tasks, both for yourself and
| other people, what are you even doing?
|
| I'm earning a salary so I can have a nice life
|
| Anything that threatens my ability to earn a salary so I
| can have a nice life is absolutely my fucking enemy
|
| I have no interest in automating myself out of existence,
| and I am deeply resentful of the fact that so many people
| are gleefully trying to do so.
| olddustytrail wrote:
| Should they not feel threatened? I'm somewhat sympathetic to
| the view that even the current state of the art is threatening
| to people's livelihood.
|
| And of course it will only become more powerful. It's a
| dangerous game.
| n_ary wrote:
| > people are convinced AI is a scheme by the evil capitalists
| to make you train your own replacement. The discussion gets
| very emotional very quickly because they feel personally
| threatened by the possibility that AI is actually useful.
|
| These are not mutually exclusive. LLMs will train people's
| replacements while those same people pay for the privilege of
| training those replacements. LLMs also allow me to auto-
| complete a huge volume of boilerplate, which would otherwise
| take me several hours. They also help people step out of
| writer's block and generate a first draft of a prototype/mvp/poc
| quickly without wasting long hours bike-shedding. They also
| helped my previously super confident cousin, who blamed me for
| killing his dreams of the next AirBnB for dogs, Uber for
| groceries, and Instagram for cats by selfishly hoarding my
| privileges and knowledge, to finally build those ideas and kill
| his own dreams himself (and he is definitely ignoring/avoiding
| me these days).
|
| LLMs are the same as knives: crimes will happen with them, but
| they are also necessary in the kitchen and in industry.
| tptacek wrote:
| There's a thru-line to commentary from experienced programmers on
| working with LLMs, and it's confusing to me:
|
| _Although pandas is the standard for manipulating tabular data
| in Python and has been around since 2008, I've been using the
| relatively new polars library exclusively, and I've noticed that
| LLMs tend to hallucinate polars functions as if they were pandas
| functions which requires documentation deep dives to confirm
| which became annoying._
|
| The post does later touch on coding agents (Max doesn't use them
| because "they're distracting", which, as a person who can't even
| stand autocomplete, is a position I'm sympathetic to), but still:
| _coding agents solve the core problem he just described_. "Raw"
| LLMs set loose on coding tasks throwing code onto a blank page
| hallucinate stuff. But agenty LLM configurations aren't just the
| LLM; they're also code that structures the LLM interactions. When
| the LLM behind a coding agent hallucinates a function, the
| program doesn't compile, the agent notices it, and the LLM
| iterates. You don't even notice it's happening unless you're
| watching very carefully.
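|
| The shape of that loop is simple enough to sketch (a toy, not
| any particular product: call_llm is a placeholder for whatever
| model API you use, and real agents also lint, run tests, and
| cap their spend):
|       import subprocess
|
|       def call_llm(prompt: str) -> str:
|           raise NotImplementedError("placeholder for your model call")
|
|       def agent_loop(task: str, max_attempts: int = 5) -> str:
|           prompt = task
|           for _ in range(max_attempts):
|               code = call_llm(prompt)
|               with open("attempt.py", "w") as f:
|                   f.write(code)
|               # A hallucinated function surfaces here as an error...
|               result = subprocess.run(["python", "attempt.py"],
|                                       capture_output=True, text=True)
|               if result.returncode == 0:
|                   return code
|               # ...which gets fed back so the model can revise it.
|               prompt = (task + "\n\nYour last attempt failed with:\n"
|                         + result.stderr + "\nFix it.")
|           raise RuntimeError("agent gave up")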
| vunderba wrote:
| That sort of "REPL" system is why I really liked when they
| integrated a Python VM into ChatGPT - it wasn't perfect, but it
| could at least catch itself when the code didn't execute
| properly.
| tptacek wrote:
| Sure. But it's 2025 and however you want to get this feature,
| be it as something integrated into VSCode (Cursor, Windsurf,
| Copilot), or a command line Python thing (aider), or a
| command line Node thing (OpenAI codex and Claude Code), with
| a specific frontier coding model or with an abstracted multi-
| model thingy, even as an Emacs library, it's available now.
|
| I see people getting LLMs to generate code in isolation and
| like pasting it into a text editor and trying it, and then
| getting frustrated, and it's like, that's not how you're
| supposed to be doing it anymore. That's 2024 praxis.
| arthurcolle wrote:
| I like using Jupyter Console as a primary interpreter, and
| then dropping into SQLite/duckdb to save data
|
| Easy to script/autogenerate code and build out pipelines
| this way
| red_hare wrote:
| It is a little crazy how fast this has changed in the past
| year. I got VSCode's agent mode to write, run, and read the
| output of unit tests the other day and boy it's a game
| changer.
| gnatolf wrote:
| The churn of staying on top of this means to me that we'll
| also chew through experts of specific times much faster.
| Gone are the days of established, trusted top performers, as
| every other week somebody creates a newer, better way of
| doing things. Everybody is going to drop off the hot tech
| at some point. Very exhausting.
| Philpax wrote:
| The answer is simple: have your AI stay on top of this
| for you.
|
| Mostly joke, but also not joke: https://news.smol.ai/
| stefan_ wrote:
| Have you tried it? In my experience they just go off on a
| hallucination loop, or blow up the code base with terrible re-
| implementations.
|
| Similarly Claude 3.5 was stuck on TensorRT 8, and not even
| pointing it at the documentation for the updated TensorRT 10
| APIs for RAG could ever get it to correctly use the new APIs
| (not that
| they were very complex; bind tensors, execute, retrieve
| results). The whole concept of the self-reinforcing Agent loop
| is more of a fantasy. I think someone else likened it to a
| lawnmower that will run rampage over your flower bed at the
| first hiccup.
| tptacek wrote:
| Yes, they're part of my daily toolset. And yes, they can spin
| out. I just hit the "reject" button when they do, and revise
| my prompt. Or, sometimes, I just take over and fill in some
| of the structure of the problem I'm trying to solve myself.
|
| I don't know about "self-reinforcing". I'm just saying:
| coding agents compile and lint the code they're running, and
| when they hallucinate interfaces, they notice. The same way
| any developer who has ever used ChatGPT knows that you can
| paste most errors into the web page and it will often (maybe
| even usually) come up with an apposite fix. I don't
| understand how anybody expects to convince LLM users this
| doesn't work; it obviously does work.
| steveklabnik wrote:
| > I don't understand how anybody expects to convince LLM
| users this doesn't work; it obviously does work.
|
| This is really one of the hugest divides I've seen in the
| discourse about this: anti-LLM people saying very obviously
| untrue things, which is uh, kind of hilarious in a meta
| way.
|
| https://bsky.app/profile/caseynewton.bsky.social/post/3lo4t
| d... is an instance of this from a few days ago.
|
| I am still trying to sort out why experiences are so
| divergent. I've had much more positive LLM experiences
| while coding than many other people seem to, even as
| someone who's deeply skeptical of what's being promised
| about them. I don't know how to reconcile the two.
| zoogeny wrote:
| I think it is pretty simple: people tried it a few times
| a few months ago in a limited setting, formed an opinion
| based on those limited experiences and cannot imagine a
| world where they are wrong.
|
| That might sound snarky, but it probably works out for
| people in 99% of cases. AI and LLMs are advancing at a
| pace that is so different from any other technology that
| people aren't yet trained to re-evaluate their
| assumptions at the high rate necessary to form accurate
| new opinions. There are too many tools coming (and going,
| to be fair).
|
| HN (and certain parts of other social media) is a bubble
| of early adopters. We're on the front lines seeing the
| war in realtime and shaking our heads at what's being
| reported in the papers back home.
| steveklabnik wrote:
| Yeah, I try to stay away from reaching for these sorts of
| explanations, because it feels uncharitable. I saw a lot
| of very smart people repost the quoted post! They're not
| the kind who "cannot imagine a world where they are
| wrong."
|
| But at the same time, the pace of advancement is very
| fast, and so not having recently re-evaluated things is
| significantly more likely while also being more
| charitable, I think.
| zoogeny wrote:
| My language is inflammatory for certain, but I believe it
| is true. I don't think most minds are capable of having
| to reevaluate their opinions as quickly as AI is
| demanding. There is some evidence that stress is strongly
| correlated to uncertainty. AI is complicated, the tools
| are complicated, the trade-offs are complicated. So that
| leaves a few options: live in uncertainty/stress, expend
| the energy to reevaluate or choose to believe in
| certainty based on past experience.
|
| If someone is embracing uncertainty or expending the
| time/energy/money to reevaluate then they don't post such
| confidently wrong ideas on social media.
| diggan wrote:
| > I am still trying to sort out why experiences are so
| divergent. I've had much more positive LLM experiences
| while coding than many other people seem to, even as
| someone who's deeply skeptical of what's being promised
| about them. I don't know how to reconcile the two.
|
| As with many topics, I feel like you can divide people in
| a couple of groups. You have people who try it, have
| their mind blown by it, so they over-hype it. Then the
| polar-opposite, people who are overly dismissive and
| cement themselves into a really defensive position. Both
| groups are relatively annoying, inaccurate, and too
| extremist. Then another group of people might try it out,
| find some value, integrate it somewhat, and maybe get a
| little productivity boost and move on with their day.
| Then a bunch of other groupings in-between.
|
| Problem is that the people in the middle tend to not make
| a lot of noise about it, and the extremists (on both
| ends) tend to be _very_ vocal about their preference, in
| their ways. So you end up perceiving something as very
| polarizing. There are many accurate and true drawbacks
| with LLMs as well, but it also ends up poisoning the
| entire concept/conversation/ecosystem for some people,
| and they tend to be noisy as well.
|
| Then the whole experience depends a lot on your setup,
| how you use it, what you expect, what you've learned and
| so much more, and some folks are very quick to judge
| a whole ecosystem without giving parts of it an honest
| try. It took me a long time to try Aider, Cursor and
| others, and even now after I've tried them out, I feel
| like there are probably better ways to use this new
| category of tooling we have available.
|
| In the end I think reality is a bit less black/white for
| most folks; the common sentiment I see and hear is that
| LLMs are probably not hellfire ending humanity, nor are
| they digital Jesus coming to save us all.
| steveklabnik wrote:
| > I feel like you can divide people in a couple of
| groups.
|
| This is probably a big chunk of it. I was pretty anti-LLM
| until recently, when I joked that I wanted to become an
| informed hater, so I spent some more time with things.
| It's put me significantly more in the middle than either
| extremely pro or extremely anti. It's also hard to talk
| about anything that's not purely anti in the spaces I
| seemingly run in, so that also contributes to my relative
| quiet about it. I'm sure others are in a similar boat.
|
| > for most folks, common sentiment I see and hear is that
| LLMs are probably not hellfire ending humanity nor is it
| digital-Jesus coming to save us all.
|
| Especially around non-programmers, this is the vibe I get
| as well. They also tend to see the inaccuracies as much
| less significant than programmers seem to, that is, they
| assume they're checking the output already, or see it as
| a starting point, or that humans also make mistakes, and
| so don't get so immediately "this is useless" about it.
| aerhardt wrote:
| > an instance of this from a few days ago.
|
| Bro I've been using LLMs for search since before it even
| had search capabilities...
|
| "LLMs not being for search" has been an argument from the
| naysayers for a while now, but very often when I use an
| LLM I am looking for the answer to something - if that
| isn't [information] search, then what is?
|
| Whether they hallucinate or outright bullshit sometimes
| is immaterial. For many information retrieval tasks they
| are infinitely better than Google and have been since
| GPT3.
| steveklabnik wrote:
| I think this is related, but I'm more interested in the
| factual aspects than the subjective ones. That is, I
| don't disagree there are also arguments over "are LLMs good
| for the same things search engines are for," but it's more
| the objective "they do not search the web" part. We
| need to have agreement on the objective aspects before we
| can have meaningful discussion of the subjective, in my
| mind.
| magicalist wrote:
| > _This is really one of the hugest divides I've seen in
| the discourse about this: anti-LLM people saying very
| obviously untrue things, which is uh, kind of hilarious
| in a meta way._
|
| > _https://bsky.app/profile/caseynewton.bsky.social/post/
| 3lo4td... is an instance of this from a few days ago._
|
| Not sure why this is so surprising? ChatGPT search was
| only released in November last year, was a different
| mode, and it sucked. Search in o3 and o4-mini came out
| like three weeks ago. Otherwise you were using completely
| different products from Perplexity or Kagi, which aren't
| widespread yet.
|
| Casey Newton even half acknowledges that timing ("But it
| has had integrated web search since last year"...even
| while in the next comment criticising criticisms using
| the things "you half-remember from when ChatGPT launched
| in _2022_ ").
|
| If you give the original poster the benefit of the doubt,
| you can sort of see what they're saying, too. An LLM, on
| its own, is not a search engine and can not scan the web
| for information. The information encoded in them might be
| ok, but is not complete, and does not encompass the full
| body of the published human thought it was trained on.
| Trusting an offline LLM with an informational search is
| sometimes a really bad idea ("who are all the presidents
| that did X").
|
| The fact that they're incorrect when they say that LLMs
| _can't_ trigger search doesn't seem that "hilarious" to
| me, at least. The OP post maybe should have been less
| strident, but it also seems like a really bad idea to
| gatekeep anybody wanting to weigh in on something if
| their knowledge of product roadmaps is more than six
| months out of date (which I guarantee is all of us for at
| least some subject we are invested in).
| steveklabnik wrote:
| > ChatGPT search was only released in November last year
|
| It is entirely possible that I simply got involved at a
| particular moment that was crazy lucky: it's only been a
| couple of weeks. I don't closely keep up with when things
| are released, I had just asked ChatGPT something where it
| did a web search, and then immediately read a "it cannot
| do search" claim right after.
|
| > An LLM, on its own, is not a search engine and can not
| scan the web for information.
|
| In a narrow sense, this is true, but that's not the
| claim: the claim is "You cannot use it as a search
| engine, or as a substitute for searching." That is pretty
| demonstrably incorrect, given that many people use it as
| such.
|
| > Trusting an offline LLM with an informational search is
| sometimes a really bad idea ("who are all the presidents
| that did X").
|
| I fully agree with this, but it's also the case with
| search engines. They also do not always "encompass the
| full body of the published human thought" either, or
| always provide answers that are comprehensible.
|
| I recently was looking for examples of accomplishing
| things with a certain software architecture. I did a
| bunch of searches, which led me to a bunch of
| StackOverflow and blog posts. Virtually all of those
| posts gave vague examples which did not really answer my
| question with anything other than platitudes. I decided
| to ask ChatGPT about it instead. It was able to not only
| answer my question in depth, but provide specific
| examples, tailored to my questions, which the previous
| hours of reading search results had not afforded me. I
| was further able to interrogate it about various
| tradeoffs. It was legitimately more useful than a search
| engine.
|
| Of course, sometimes it is not that good, and a web
| search wins. That's fine too. But suggesting that it's
| never useful for a task is just contrary to my actual
| experience.
|
| > The fact that they're incorrect when they say that
| LLM's can't trigger search doesn't seem that "hilarious"
| to me, at least.
|
| It's not _them_, it's the overall state of the
| discourse. I find it ironic that the fallibility of LLMs
| is used to suggest they're worthless compared to a human,
| when humans are also fallible. OP did not directly say
| this, but others often do, and it's the combination
| that's amusing to me.
|
| It's also frustrating to me, because it feels impossible
| to have reasonable discussions about this topic. It's
| full of enthusiastic cheerleaders that misrepresent what
| these things can do, _and_ enthusiastic haters that
| misrepresent what these things can do. My own feelings
| are all over the map here, but the polarization makes
| reasonable discussion feel impossible, and I find that
| frustrating.
| AlexCoventry wrote:
| If you've only been using AI for a couple of weeks,
| that's quite likely a factor. AI services have been
| improving incredibly quickly, and many people have a bad
| impression of the whole field from a time when it was
| super promising, but basically unusable. I was pretty
| dismissive until a couple of months ago, myself.
|
| I think the other reason people are hostile to the field
| is that they're scared it's going to make them
| economically redundant, because a tsunami of cheap,
| skilled labor is now towering over us. It's loss-aversion
| bias, basically. Many people are more focused on that
| risk than on the amazing things we're able to _do_ with
| all that labor.
| stefan_ wrote:
| > anti-LLM people saying very obviously untrue things,
| which is uh, kind of hilarious in a meta way.
|
| tptacek shifted the goal posts from "correct a
| hallucination" to "solve a copy pasted error" (very
| different things!) and just a comment later theres
| someone assassinating me as an "anti-LLM person" saying
| "very obviously untrue things", "kind of hilarious". And
| you call yourself "charitable". It's a joke.
| steveklabnik wrote:
| EDIT: wait, I think you're tptacek's parent. I was not
| talking about your post, I was talking about the post I
| linked to. I'm leaving my reply here but there's some
| serious confusion going on.
|
| > theres someone assassinating me as an "anti-LLM person"
|
| Is this not true? That's the vibe the comment gives off.
| I'm happy to not say that in the future if that's not
| correct, and if so, additionally, I apologize.
|
| I myself was pretty anti-LLM until the last month or so.
| My opinions have shifted recently, and I've been trying
| to sort through my feelings about it. I'm not entirely
| enthusiastically pro, and have some pretty big
| reservations myself, but I'm more in the middle than
| where I was previously, which was firmly anti.
|
| > "very obviously untrue things"
|
| At the time I saw the post, I had just tabbed away from a
| ChatGPT session where it had relied on searching the web
| for some info, so the contrast was _very_ stark.
|
| > "kind of hilarious"
|
| I do think it is kind of funny when people say that LLMs
| occasionally hallucinate things and are therefore
| worthless, while others make false claims about them for
| the purpose of suggesting we shouldn't use them. You
| didn't directly say this in your post, only
| handwaved towards it, but I'm talking about the discourse
| in general, not you specifically.
|
| > And you call yourself "charitable"
|
| I am trying to be charitable. A lot of people reached for
| some variant of "this person is stupid," and I do not
| think that's the case, or the good way to understand what
| people mean when they say things. A mistake is a mistake.
| I am actively not trying to simply dismiss arguments on
| either side of here, but take them seriously.
| daxfohl wrote:
| So is the real engineering work in the agents rather than
| in the LLM itself then? Or do they have to be paired
| together correctly? How do you go about choosing an
| LLM/agent pair efficiently?
| steveklabnik wrote:
| > How do you go about choosing an LLM/agent pair
| efficiently?
|
| I googled "how do I use ai with VS: Code" and it pointed
| me at Cline. I've then swapped between their various
| backends, and just played around with it. I'm still far
| too new to this to have strong opinions about LLM/agent
| pairs, or even largely between which LLMs, other than
| "the free ChatGPT agent was far worse than the $20/month
| one at the task I threw it at." As in, choosing worse
| algorithms that are less idiomatic for the exact same
| task.
| daxfohl wrote:
| I also wonder how hard it would be to create your own
| agent that remembers your preferences and other stuff
| that you can make sure stays in the LLM context.
|
| ...Maybe a good first LLM assisted project.
| dcre wrote:
| No need to write your own whole thing (though it is a
| good exercise) -- the existing tools all support ways of
| customizing the prompting with preferences and
| conventions, whether globally or per-project.
| dcre wrote:
| The Aider leaderboards are quite helpful, performance
| there seems to match people's subjective experience
| pretty well.
|
| https://aider.chat/docs/leaderboards/
|
| Regarding choosing a tool, they're pretty lightweight to
| try and they're all converging in structure and
| capabilities anyway.
| foobarqux wrote:
| These are mostly value judgments, and people are using
| words that mean different things to different people, but
| I would point out that LLM boosters have been saying the
| same thing for each product release: "_now_ it works, you
| are just using the last-gen model/technique which doesn't
| really work (even though I said the same thing for that
| model/technique and every one before that)." Moreover
| there still hasn't been significant, objectively
| observable impact: no explosion in products, no massive
| acceleration of feature releases, no major layoffs
| attributed to AI (to which the response every time is
| that it was just released and you will see the effects in
| a few months).
|
| Finally, if it really were true that some people
| know the special sauce of how to use LLMs to make a
| massive difference in productivity but many people didn't
| know how to do that, then you could make millions or tens
| of millions per year as a consultant training everyone at
| big companies. In other words if you really believed what
| you were saying you should pick up the money on the
| ground.
| steveklabnik wrote:
| > using words that mean different things to different
| people
|
| This might be a good explanation for the disconnect!
|
| > I would point that LLM boosters have been saying the
| same thing
|
| I certainly 100% agree that lots of LLM boosters are way
| over-selling what they can accomplish as well.
|
| > In other words if you really believed what you were
| saying you should pick up the money on the ground.
|
| I mean, I'm doing that in the sense that I am using them.
| I also am not saying that I "know the special sauce of
| how to use LLMs to make a massive difference in
| productivity," but what I will say is, my productivity is
| genuinely higher with LLM assistance than without. I
| don't necessarily believe that means it's replicable; one
| of the things I'm curious about is "is it something
| special about my setup or what I'm doing or the
| technologies I'm using or anything else that makes me
| have a good time with this stuff when other smart people
| seem to only have a bad time?" Because I don't think that
| the detractors are just lying. But there is a clear
| disconnect, and I don't know why.
| foobarqux wrote:
| There is so much tacit understanding by both LLM-boosters
| and LLM-skeptics that only becomes apparent when you look
| at the explicit details of how they are trying to use the
| tools. That's why I've asked in the past for examples of
| recording of real-time development that would capture all
| the nuance explicitly. Cherry-picked chat logs are second
| best but even then I haven't been particularly impressed
| by the few examples I've seen.
|
| > I mean, I'm doing that in the sense that I am using
| them.
|
| My point is whatever you are doing is worth millions of
| dollars less than teaching the non-believers how to do it
| if you could figure out how (actually probably even if
| you couldn't but sold snake-oil).
| dboreham wrote:
| Someone gave me the tip to add "all source files should build
| without error", which you'd think would be implicit, but it
| seems not.
| tptacek wrote:
| There's definitely a skill to using them well (I am not yet
| expert); my only frustration is with people who (like me)
| haven't refined the skill but have also concluded that
| there's no benefit to the tool. No, really, in this case,
| you're mostly just not holding it right.
|
| The tools will get better, but from what I see happening
| with people who are good at using them (and from my own
| code, even in my degraded LLM usage), we have an existence
| proof of the value of the tools.
| giovannibonetti wrote:
| > I think someone else likened it to a lawnmower that will
| run rampage over your flower bed at the first hiccup
|
| This reminds me of a scene from the recent animated movie
| "Wallace and Gromit: Vengeance Most Fowl" where Wallace
| actually uses a robot (Norbot) to do gardening tasks, and
| it rampages over Gromit's flower bed.
|
| https://youtu.be/_Ha3fyDIXnc
| mountainriver wrote:
| Have you tried it? More than once?
|
| I'm getting massive productivity gains with Cursor and Gemini
| 2.5 or Claude 3.7.
|
| One-shotting whole features into my rust codebase.
| stefan_ wrote:
| I use it all the time, multiple times daily. But the
| discussion is not being very honest, particularly for all
| the things that are being bolted on (agent mode, MCP). Like
| just upstream people dunk on others for pointing out that
| maybe giving the model an API call to read webpages isn't
| quite turning LLMs into search engines. Just like letting it
| run shell commands has not made it into a full blown agent
| engineer.
|
| I tried it again just now with Claude 3.7 in Cursor's
| Agent/Compose (they change this stuff weekly). Write a
| simple C++ TensorRT app that loads an engine and runs
| inference 100 times for a benchmark, use this file to
| source a toolchain. It generated code with the old API & a
| CMake file and (warning light turns on) a build script. The
| compile fails because of the old API, but this time it
| managed to fix it to use the new API.
|
| But now the linking fails, because it overwrote the
| TRT/CUDA directories in the CMakeLists with some home
| cooked logic (there was nothing to do, the toolchain script
| sets up the environment fully and just find_package would
| work).
|
| And this is where we go off the rails; it messes with the
| build script and CMakeLists more, but still it can not
| link. It thinks hey it looks like we are cross-compiling
| and creates a second build script "cross-compile.sh" that
| tries to use the compiler directly, but of course that
| misses things that the find_package in CMake would setup
| and so fails with include errors.
|
| It pretends it's a 1970s ./configure script and creates
| source files "test_nvinfer.cpp" and "test_cudart.cpp" that
| are supposed to test for the presence of those libraries,
| then tries to compile them directly; again its missing
| directories and obviously fails.
|
| Next we create a mashup build script "cross-compile-
| direct.sh". Not sure anymore what this one tried to
| achieve, didn't work.
|
| Finally, and this is my favorite agent action yet, it
| decides fuck it, if the library won't link, _why don't we
| just mock out all the actual TensorRT/CUDA functionality
| and print fake benchmark numbers to demonstrate LLMs can
| average a number in C++_. So it writes, builds and runs a
| "benchmark_mock.cpp" that subs out all the useful
| functionality for random data from std::mt19937. This
| naturally works, so the agent declares success and happily
| updates the README.md with all the crap it added and stops.
|
| This is what running the lawnmower over the flower bed
| means; you have 5 more useless source files and a bunch
| more shell scripts and a bunch of crap in a README that
| were all generated to try and fail to fix a problem it
| could not figure out, and this loop can keep going and
| generate more nonsense ad infinitum.
|
| (Why could it not figure out the linking error? We come
| back to the shitty bolted on integrations; it doesn't
| actually query the environment, search for files or look at
| what link directories are being used, as one would
| investigating a linking error. It could of course, but the
| balance in these integrations is 99% LLM and 1% tool use,
| and even context from the tool use often doesn't help)
| zoogeny wrote:
| I mean, I have. I use them every day. You often see them
| literally saying "Oh there is a linter error, let me go fix
| it" and then a new code generation pass happens. In the worst
| case, it does exactly what you are saying, gets stuck in a
| loop. It eventually gets to the point where it says "let me
| try just once more" and then gives up.
|
| And when that happens I review the code and if it is bad then
| I "git revert". And if it is 90% of the way there I fix it up
| and move on.
|
| The question shouldn't be "are they infallible tools of
| perfection". It should be "do I get value equal to or greater
| than the time/money I spend". And if you use git
| appropriately you lose _at most_ five minutes on a agent
| looping. And that happens a couple of times a week.
|
| And be honest with yourself, is getting stuck in a loop
| fighting a compiler, type-checker or lint something you have
| ever experienced in your pre-LLM days?
| aerhardt wrote:
| > the program doesn't compile
|
| The issue you are addressing refers specifically to Python,
| which is not compiled... Are you referring to this workflow in
| another language, or by "compile" do you mean something else,
| such as using static checkers or tests?
|
| Also, what tooling do you use to implement this workflow?
| Cursor, aider, something else?
| dragonwriter wrote:
| Python is, in fact, compiled (to bytecode, not native code);
| while this is mostly invisible, syntax errors will cause it
| to fail to compile, but the circumstances described
| (hallucinating a function) will not, because function calls
| are resolved by runtime lookup, not at compile time.
| aerhardt wrote:
| I get that, and in that sense most languages are compiled,
| but generally speaking, I've always understood "compiled"
| as compiled-ahead-of-time - Python certainly doesn't do
| that and the official docs call it an interpreted language.
|
| In the context we are talking about (hallucinating Polars
| methods), if I'm not mistaken the compilation step won't
| catch that, Python will actually throw the error at runtime
| post-compilation.
|
| So my question still stands on what OP means by "won't
| compile".
| dragonwriter wrote:
| > I get that, and in that sense most languages are
| compiled, but generally speaking, I've always understood
| "compiled" as compiled-ahead-of-time
|
| Python is AOT compiled to bytecode, but if a compiled
| version of a module is not available when needed it will
| be compiled and the compiled version saved for next use.
| In the normal usage pattern, this is mostly invisible to
| the user except in first vs. subsequent run startup
| speed, unless you check the file system and see all the
| .pyc compilation artifacts.
|
| You _can_ do AOT compilation to bytecode outside of a
| compile-as-needed-then-execute cycle, but there is rarely
| a good reason to do so explicitly for the average user
| (the main use case is on package installation, but that's
| usually handled by package manager settings).
|
| But, relevant to the specific issue here, (edit:
| _calling_) a hallucinated function would lead to a
| runtime failure, not a compilation failure, since function
| calls aren't resolved at compile time, but by lookup by
| name at runtime.
|
| (Edit: A sibling comment points out that _importing_ a
| hallucinated function would cause a compilation failure,
| and that's a good point.)
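|
| A tiny illustration of that split (a sketch: pl.frame_from_csv
| is deliberately made up, and actually running the file assumes
| polars is installed):
|       import pathlib, py_compile, subprocess
|
|       src = pathlib.Path("hallucinated.py")
|       src.write_text(
|           "import polars as pl\n"
|           "df = pl.frame_from_csv('data.csv')  # made-up name\n"
|       )
|
|       # Compiles fine: the attribute is only looked up at runtime.
|       py_compile.compile(str(src), doraise=True)
|
|       # Executing it is what fails, roughly:
|       #   AttributeError: module 'polars' has no attribute
|       #   'frame_from_csv'
|       print(subprocess.run(["python", str(src)],
|                            capture_output=True, text=True).stderr)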
| Scarblac wrote:
| When a module is first loaded, it's compiled to bytecode
| and then executed. Importing a non existing function will
| throw an error right away.
|
| It's not a compilation error but it does feel like one,
| somewhat. It happens at more or less the same time.
| mountainriver wrote:
| Yes but it gets feedback from the IDE. Cursor is the best
| here
| AlexCoventry wrote:
| He does partially address this elsewhere in the blog post. It
| seems that he's mostly concerned about surprise costs:
|
| > _On paper, coding agents should be able to address my
| complaints with LLM-generated code reliability since it
| inherently double-checks itself and it's able to incorporate
| the context of an entire code project. However, I have also
| heard the horror stories of people spending hundreds of dollars
| by accident and not get anything that solves their coding
| problems. There's a fine line between experimenting with code
| generation and gambling with code generation._
| minimaxir wrote:
| Less surprise costs, more _wasting_ money and not getting
| proportionate value out of it.
| darepublic wrote:
| So in my interactions with gpt, o3 and o4 mini, I am the
| organic middleman who copies and pastes code into the REPL and
| reports the output back to GPT if anything turns out to be the
| problem. And for me, past a certain point, even if you
| continually report back problems it doesn't get any better in
| its new suggestions. It will just spin its wheels. So for that
| reason I'm a little skeptical about the value of automating
| this process. Maybe the llms you are using are better than the
| ones I tried this with?
|
| Specifically I was researching a lesser known kafka-mqtt
| connector: https://docs.lenses.io/latest/connectors/kafka-
| connectors/si..., and o1 was hallucinating the configuration
| needed to support dynamic topics. The docs said one thing, and
| I even mentioned to o1 that the docs contradicted it.
| But it would stick to its guns. If I mentioned that the code
| wouldn't compile it would start suggesting very implausible
| scenarios -- did you spell this correctly? Responses like that
| indicate you've reached a dead end. I'm curious how/if the
| "structured LLM interactions" you mention overcome this.
| nikita2206 wrote:
| You can have an agent search the web for documentation and then
| provide it to the LLM. That is why Context7 is currently very
| popular with the AI user crowd.
| entropie wrote:
| I used o4 to generate nixos config files from the pasted
| module source files. At first it did outdated config
| stuff, but with context files it worked very well.
| dingnuts wrote:
| Kagi Assistant can do this too but I find it's mostly
| useful because the traditional search function can find the
| pages the LLM loaded into its context before it started to
| output bullshit.
|
| It's nice when the LLM outputs bullshit, which is frequent.
| jimbokun wrote:
| I wonder if LLMs have been seen claiming "THERE'S A BUG IN
| THE COMPILER!"
|
| A stage every developer goes through early in their
| development.
| diggan wrote:
| > And for me, past a certain point, even if you continually
| report back problems it doesn't get any better in its new
| suggestions. It will just spin its wheels. So for that reason
| I'm a little skeptical about the value of automating this
| process.
|
| It sucks, but the trick is to _always_ restart the
| conversation/chat with a new message. I never go beyond one
| reply, and I also copy-paste a bunch. I got tired of copy-
| pasting, so I wrote something like a prompting manager
| (https://github.com/victorb/prompta) to make it easier and
| avoid having to neatly format code blocks and so on.
|
| Basically make one message, if they get the reply wrong,
| iterate on the prompt itself and start fresh, always. Don't
| try to correct by adding another message, but update initial
| prompt to make it clearer/steer more.
|
| But I've noticed that every model degrades really quickly
| past the initial reply, no matter the length of each
| individual message. The companies seem to continue to
| increase the theoretical and practical context limits, but
| the quality degrades a lot faster even within the context
| limits, and they don't seem to try to address that (nor have
| a way of measuring it).
| mr_toad wrote:
| > If I mentioned that the code wouldn't compile it would
| start suggesting very implausible scenarios
|
| I have to chuckle at that because it reminds me of a typical
| response on technical forums _long_ before LLMs were
| invented.
|
| Maybe the LLM has actually learned from those responses and
| is imitating them.
| -__---____-ZXyw wrote:
| It seems no discussion of LLMs on HN these days is complete
| without a commenter wryly observing how that one specific
| issue someone is pointing to with an LLM _is also_ ,
| funnily enough, an issue they've seen with humans. The
| implication always seems to be that this somehow bolsters
| the idea that LLMs are therefore in some sense and to some
| degree human-like.
|
| Humans not being infallible superintelligences does not
| mean that the thing that LLMs are doing is the same thing
| we do when we think, create, reason, etc. I would like to
| imagine that most serious people who use LLMs know this,
| but sometimes it's hard to be sure.
|
| Is there a name for the "humans stupid --> LLMs smart"
| fallacy?
| owebmaster wrote:
| > the idea that LLMs are therefore in some sense and to
| some degree human-like.
|
| This is 100% true, isn't it? It is based on the corpus of
| humankind's knowledge and interaction, so it is only to be
| expected that it would "repeat" human patterns. It also
| makes sense that the way to evolve the results we get from
| it is to mimic human organization, politics, and sociology
| in a new layer on top of LLMs to surpass current
| bottlenecks, just as they were used to evolve human
| societies.
| surgical_fire wrote:
| This has been my experience with any LLM I use as a code
| assistant. Currently I mostly use Claude 3.5, although I
| sometimes use Deepseek or Gemini.
|
| The more prominent and widely used a
| language/library/framework, and the more "common" what you are
| attempting, the more accurate LLMs tend to be. The more you
| deviate from mainstream paths, the more you will hit such
| problems.
|
| Which is why I find them most useful for helping me build
| things when I am very familiar with the subject matter, because
| at that point I can quickly spot misconceptions, errors, bugs,
| etc.
|
| That's when it hits the sweet spot of being a productivity
| tool, really improving the speed with which I write code (and
| sometimes improving the quality of what I write, by
| incorporating good practices I was unaware of).
| steveklabnik wrote:
| > The more prominent and widely used a
| language/library/framework, and the more "common" what you
| are attempting, the more accurate LLMs tend to be. The more
| you deviate from mainstream paths, the more you will hit such
| problems.
|
| One very interesting variant of this: I've been experimenting
| with LLMs in a react-router-based project. There's an
| interesting development history here: there was another
| project called Remix, and later versions of react-router
| effectively ate it, so as of December of last year,
| react-router 7 is effectively also Remix v3:
| https://remix.run/blog/merging-remix-and-react-router
|
| Sometimes, the LLM will be like "oh, I didn't realize you
| were using remix" and start importing from it, when I in fact
| want the same imports, but from react-router.
|
| All of this happened so recently, it doesn't surprise me that
| it's a bit wonky at this, but it's also kind of amusing.
| zoogeny wrote:
| In addition to choosing languages, patterns and frameworks
| that the LLM is likely to be well trained in, I also just ask
| it how it wants to do things.
|
| For example, I don't like ORMs. There are reasons which
| aren't super important but I tend to prefer SQL directly or a
| simple query builder pattern. But I did a chain of messages
| with LLMs asking which would be better for LLM-based
| development. The LLM made a compelling case as to why an ORM
| with a schema that generated a typed client would be better
| if I expected LLM coding agents to write a significant amount
| of the business logic that accessed the DB.
|
| My dislike of ORMs is something I hold lightly. If I was
| writing 100% of the code myself then I would have breezed
| past that decision. But with the agentic code assistants as
| my partners, I can make decisions that make their job easier
| from their point of view.
| beepbooptheory wrote:
| How much money do you spend a day working like this?
| steveklabnik wrote:
| I haven't spent many days or full days, but when I've toyed
| with this, it ends up at about $10/hour or maybe a bit less.
| zoogeny wrote:
| For several moments in the article I had to struggle to
| continue. He is literally saying "as an experienced LLM user I
| have no experience with the latest tools". He gives a rationale
| as to why he hasn't used the latest tools which is basically
| that he doesn't believe they will help and doesn't want to pay
| the cost to find out.
|
| I think if you are going to claim you have an opinion based on
| experience you should probably, at the least, experience the
| thing you are trying to state your opinion on. It's probably
| not enough to imagine the experience you would have and then go
| with that.
| satvikpendem wrote:
| Cursor also can read and store documentation so it's always up
| to date [0]. Surprised that many people I talk to about Cursor
| don't know about this, it's one of its biggest strengths
| compared to other tools.
|
| [0] https://docs.cursor.com/context/@-symbols/@-docs
| janalsncm wrote:
| There's an argument that library authors should consider
| implementing those hallucinated functions, not because it'll be
| easier for LLMs but because the hallucination is a statement
| about what an average user might expect to be there.
|
| I really dislike libraries that have their own bespoke ways of
| doing things for no especially good reason. Don't try to be
| cute. I don't want to remember your specific API, I want an
| intuitive API so I spend less time looking up syntax and more
| time solving the actual problem.
| dayvigo wrote:
| There's also an argument that developers of new software,
| including libraries, should consider making an earnest
| attempt to do The Right Thing instead of re-implementing old,
| flawed designs and APIs for familiarity's sake. We have
| enough regression to the mean already.
|
| The more LLMs are entrenched and required, the less we're
| able to do The Right Thing in the future. Time will be
| frozen, and we'll be stuck with the current mean forever.
| LLMs are notoriously bad at understanding anything that isn't
| mappable in some way to pre-existing constructs.
|
| > for no especially good reason
|
| That's a major qualifier.
| lxe wrote:
| This article reads like "I'm not like other LLM users" tech
| writing. There are good points about when LLMs are actually
| useful vs. overhyped, but the contrarian framing undermines what
| could have been straightforward practical advice. The whole "I'm
| more discerning than everyone else" positioning gets tiresome in
| tech discussions, especially when the actual content is useful.
| minimaxir wrote:
| I was not explicitly intending to be contrarian, but
| unfortunately the contrarian framing is inevitable when the
| practical advice is counterintuitive and against modern norms.
| I was second-guessing publishing this article at all because "I
| don't use ChatGPT.com" and "I don't see a use for
| Agents/MCP/Vibe coding" are both statements that are
| potentially damaging to my career as an engineer, but there's
| no point in writing if I can't be honest.
|
| Part of the reason I've been blogging about LLMs for so long is
| that a lot of it is counterintuitive (which I find
| interesting!) and there's a lot of misinformation and
| suboptimal workflows that result from it.
| kayodelycaon wrote:
| Tone and word choice are actually the problem here. :)
|
| One example: "normal-person frontends" immediately makes the
| statement a judgement about people. You could have said
| regular, typical, or normal instead of "normal-person".
|
| Saying your coworkers often come to you to fix problems and
| your solutions almost always work can come off as saying
| you're more intelligent than your coworkers.
|
| The only context your readers have are the words you write.
| This makes communication a damned nuisance because nobody
| knows who you are and they only know about you from what they
| read.
| minimaxir wrote:
| That descriptor was intended more as a self-deprecating joke
| at my own expense, but your interpretation is fair.
| lxe wrote:
| Your defense of the contrarian framing feels like it's
| missing the point. What you're describing as
| "counterintuitive" is actually pretty standard for anyone
| who's been working deeply with LLMs for a while.
|
| Most experienced LLM users already know about temperature
| controls and API access - that's not some secret knowledge.
| Many use both the public vanilla frontends and specialized
| interfaces (various HF workflows, custom setups, sillytavern,
| oobabooga (rip), ollama, lmstudio, etc) depending on the
| task.
|
| Your dismissal of LLMs for writing comes across as someone
| who scratched the surface and gave up. There's an entire
| ecosystem of techniques for effectively using LLMs to assist
| writing without replacing it - from ideation to restructuring
| to getting unstuck on specific sections.
|
| Throughout the article, you seem to dismiss tools and
| approaches after only minimal exploration. The depth and
| nuance that would be evident to anyone who's been integrating
| these tools into their workflow for the past couple years is
| missing.
|
| Being honest about your experiences is valuable, but framing
| basic observations as contrarian insights isn't
| counterintuitive - it's just incomplete.
| dragonwriter wrote:
| > oobabooga (rip)
|
| Why (rip) here?
| lxe wrote:
| Ah, I had it confused with a1111. oobabooga is alive and
| well :)
| mattgreenrocks wrote:
| > "I don't use ChatGPT.com" and "I don't see a use for
| Agents/MCP/Vibe coding" are both statements that are
| potentially damaging to my career as an engineer
|
| This is unfortunate, though I don't blame you. Tech shouldn't
| be about blind faith in any particular orthodoxy.
| qoez wrote:
| I've tried it out a ton but the only thing I end up using it for
| these days is teaching me new things (which I largely implement
| myself; it can rarely one-shot it anyway), or occasionally to
| make short throwaway scripts for things like file handling or
| ffmpeg.
| danbrooks wrote:
| As a data scientist, this mirrors my experience. Prompt
| engineering is surprisingly important for getting the expected
| output - and LLM POCs have quick turnaround times.
| Snuggly73 wrote:
| Emmm... why has Claude 'improved' the code by setting SQLite to
| be threadsafe and then adding locks on every db operation? (You
| can argue that maybe the callbacks are invoked from multiple
| threads, but they are not thread safe themselves).
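|
| For readers who haven't seen that pattern, it's roughly this
| kind of belt-and-suspenders code (a hypothetical
| reconstruction, not the actual code from the article):
|
|     import sqlite3
|     import threading
|
|     # check_same_thread=False only disables the same-thread
|     # check so the connection can be shared across threads...
|     conn = sqlite3.connect("app.db", check_same_thread=False)
|     conn.execute("CREATE TABLE IF NOT EXISTS items (value TEXT)")
|     db_lock = threading.Lock()
|
|     def insert_row(value):
|         # ...and then every operation is serialized behind a
|         # lock anyway, which is largely pointless if the
|         # callbacks never actually run concurrently.
|         with db_lock:
|             conn.execute("INSERT INTO items VALUES (?)", (value,))
|             conn.commit()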
| dboreham wrote:
| Interns don't understand concurrency either.
| daxfohl wrote:
| But if you teach them the right way to do it today and have
| them fix it, they won't go and do it the wrong way again
| tomorrow and the next day and every day for the rest of the
| summer.
| chowells wrote:
| With concurrency, they still might get it wrong the rest of
| the summer. It's a hard topic. But at least they might
| learn they need to ask for feedback when they're doing
| something that looks similar to stuff that's previously
| caused problems.
| iambateman wrote:
| > "feed it the text of my mostly-complete blog post, and ask the
| LLM to pretend to be a cynical Hacker News commenter and write
| five distinct comments based on the blog post."
|
| It feels weird to write something positive here...given the
| context...but this is a great idea. ;)
| jaggederest wrote:
| This is the kind of task that, before LLMs, I simply wouldn't
| have done. Maybe if it was something really important I'd
| circulate it to a couple of friends for rough feedback, but
| mostly I would just let it fly. I think it's pretty
| revolutionary to be able to get some useful feedback in
| seconds, with a similar knock-on effect in the pull request
| review space.
|
| The other thing I find LLMs most useful for is work that is
| simply unbearably tedious. Literature reviews are the perfect
| example of this - Sure, I could go read 30-50 journal articles,
| some of which are relevant, and form an opinion. But my
| confidence level in letting the AI do it in 90 seconds is
| reasonable-ish (~60%+) and 60% confidence in 90 seconds is
| infinitely better than 0% confidence because I just didn't
| bother.
|
| A lot of the other highly hyped uses for LLMs I personally
| don't find that compelling - my favorite uses are mostly like a
| notebook that actually talks back, like the Young Lady's
| Illustrated Primer from Diamond Age.
| barbazoo wrote:
| > But my confidence level in letting the AI do it in 90
| seconds is reasonable-ish (~60%+) and 60% confidence in 90
| seconds is infinitely better than 0% confidence because I
| just didn't bother.
|
| So you got the 30 to 50 articles summarized by the LLM, now
| how do you know what 60% you can trust and what's
| hallucinated without reading it? It's hard for it to be usable
| at all unless you already know what is real and what is not.
| jaggederest wrote:
| So, generally how I use it is to get background for further
| research. So you're right, you do have to do further
| reading, but it's at the second tier - "now that I know
| roughly what I'm looking for, I have targets for further
| reading", rather than "How does any of this work and what
| are relevant articles"
| simonw wrote:
| > However, for more complex code questions particularly around
| less popular libraries which have fewer code examples scraped
| from Stack Overflow and GitHub, I am more cautious of the LLM's
| outputs.
|
| That's changed for me in the past couple of months. I've been
| using the ChatGPT interface to o3 and o4-mini for a bunch of code
| questions against more recent libraries and finding that they're
| surprisingly good at using their search tool to look up new
| details. Best version of that so far:
|
| "This code needs to be upgraded to the new recommended JavaScript
| library from Google. Figure out what that is and then look up
| enough documentation to port this code to it."
|
| This actually worked! https://simonwillison.net/2025/Apr/21/ai-
| assisted-search/#la...
|
| The other trick I've been using a lot is pasting the
| documentation or even the entire codebase of a new library
| directly into a long context model as part of my prompt. This
| works great for any library under about 50,000 tokens total -
| more than that and you usually have to manually select the most
| relevant pieces, though Gemini 2.5 Pro can crunch through
| hundreds of thousands of tokens pretty well without getting
| distracted.
|
| Here's an example of that from yesterday:
| https://simonwillison.net/2025/May/5/llm-video-frames/#how-i...
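|
| A minimal sketch of that kind of prompt assembly (the paths
| are placeholders, and the ~4 characters per token estimate is
| very rough):
|
|     from pathlib import Path
|
|     def build_prompt(library_dir, question, max_tokens=50_000):
|         # Concatenate every source file, with a header per
|         # file so the model can tell where each one starts.
|         parts = []
|         for path in sorted(Path(library_dir).rglob("*.py")):
|             parts.append(f"### {path}\n{path.read_text()}")
|         corpus = "\n\n".join(parts)
|
|         # Very rough token estimate: ~4 characters per token.
|         if len(corpus) / 4 > max_tokens:
|             raise ValueError("Too big: pick the most relevant "
|                              "files by hand instead.")
|         return corpus + "\n\n" + question
|
|     prompt = build_prompt(
|         "path/to/new_library",  # placeholder
|         "Using the library above, write a script that ...",
|     )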
| zoogeny wrote:
| I think they might have made a change to Cursor recently as
| well. A few times I've caught it using old APIs of popular
| libraries that have since been updated. Shout out to all the
| library developers who log deprecations and known incorrect
| usages; that has been a huge win with LLMs. In most cases I can
| paste the deprecation warning back into the agent and it will
| say "Oh, looks like that API changed in vX.Y.Z, we should be
| doing <other thing>, let me fix that ..."
|
| So it is capable of integrating new API usage; it just isn't
| part of the default "memory" of the LLM. Given how quickly JS
| libraries tend to change (even on the API side) that isn't
| ideal. And given that the typical JS server project has dozens
| of libs, including the most recent documentation for each is
| not really feasible. So for now, I am just looking out for
| runtime deprecation errors.
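|
| If you maintain a library, the deprecation messages that help
| the most here are the ones that spell out the replacement; a
| small sketch (the function names and version numbers are made
| up for illustration):
|
|     import warnings
|
|     def connect(url, timeout=30):
|         # The new API.
|         return {"url": url, "timeout": timeout}
|
|     def open_connection(url):
|         # The old API. The warning text names the replacement
|         # and the version it changed in, which is exactly what
|         # gets pasted back into the coding agent.
|         warnings.warn(
|             "open_connection() is deprecated since v2.0; use "
|             "connect(url, timeout=...) instead.",
|             DeprecationWarning,
|             stacklevel=2,
|         )
|         return connect(url)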
|
| But I give the LLM some slack here, because even if I was
| programming myself using a library I've used in the past, I'm
| likely to make the same mistake.
| satvikpendem wrote:
| You can just use @Docs [0] to import the correct
| documentation for your libraries.
|
| [0] https://docs.cursor.com/context/@-symbols/@-docs
| Beijinger wrote:
| "To that end, I never use ChatGPT.com or other normal-person
| frontends for accessing LLMs because they are harder to control.
| Instead, I typically access the backend UIs provided by each LLM
| service, which serve as a light wrapper over the API
| functionality which also makes it easy to port to code if
| necessary."
|
| How do you do this? Do you have to be on a paid plan for this?
| minimaxir wrote:
| If you log into the API backend, there is usually a link to the
| UI. For OpenAI/ChatGPT, it's
| https://platform.openai.com/playground
|
| This is independent of ChatGPT+. You do need to have a credit
| card attached but you only pay for your usage.
| diggan wrote:
| I think they're talking about the Sandbox/Playground/Editor
| thingy that almost all companies who expose APIs also offer to
| quickly try out API features. For OpenAI it's https://platform.
| openai.com/playground/prompts?models=gpt-4...., Anthropic has
| https://console.anthropic.com/workbench and so on.
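|
| Those sandboxes map almost one-to-one onto the underlying
| APIs, which is what makes the "port to code if necessary" part
| easy. A rough sketch for the Anthropic case (the model name
| and prompts here are placeholders):
|
|     import os
|     import requests
|
|     resp = requests.post(
|         "https://api.anthropic.com/v1/messages",
|         headers={
|             "x-api-key": os.environ["ANTHROPIC_API_KEY"],
|             "anthropic-version": "2023-06-01",
|         },
|         json={
|             "model": "claude-3-7-sonnet-latest",  # placeholder
|             "max_tokens": 1024,
|             "temperature": 0.0,  # same knob as in the workbench
|             "system": "Answer tersely.",
|             "messages": [{"role": "user", "content": "..."}],
|         },
|         timeout=120,
|     )
|     resp.raise_for_status()
|     print(resp.json()["content"][0]["text"])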
| morgengold wrote:
| ... but when I do, I let it write regex, SQL commands,
| simple/complex if else stuff, apply tailwind classes, feed it my
| console log errors, propose frontend designs ... and other little
| stuff. Saves brain power for the complex problems.
| gcp123 wrote:
| While I think the title is misleading/clickbaity (no surprise
| given the buzzfeed connection), I'll say that the substance of
| the article might be one of the most honest takes on LLMs I've
| seen from someone who actually works in the field. The author
| describes exactly how I use LLMs - strategically, for specific
| tasks where they add value, not as a replacement for actual
| thinking.
|
| What resonated most was the distinction between knowing when to
| force the square peg through the round hole vs. when precision
| matters. I've found LLMs incredibly useful for generating regex
| (who hasn't?) and solving specific coding problems with unusual
| constraints, but nearly useless for my data visualization work.
|
| The part about using Claude to generate simulated HN criticism of
| drafts is brilliant - getting perspective without the usual "this
| is amazing!" LLM nonsense. That's the kind of creative tool use
| that actually leverages what these models are good at.
|
| I'm skeptical about the author's optimism regarding open-source
| models though. While Qwen3 and DeepSeek are impressive, the
| infrastructure costs for running these at scale remain
| prohibitive for most use cases. The economics still don't work.
|
| What's refreshing is how the author avoids both the "AGI will
| replace us all" hysteria and the "LLMs are useless toys"
| dismissiveness. They're just tools - sometimes useful, sometimes
| not, always imperfect.
| xandrius wrote:
| Just on the point about prohibitive infrastructure costs at
| scale: why does it need to be at scale?
|
| Over a few years, we went from literally impossible to being
| able to run a 72B model locally on a laptop. Give it 5-10
| years and we might not need any infrastructure at all:
| everything served locally with switchable (and differently
| sized) open source models.
| geor9e wrote:
| >Ridiculous headline implying the existence of non-generative
| LLMs
|
| >Baited into clicking
|
| >Article about generative LLMs
|
| >It's a buzzfeed employee
| minimaxir wrote:
| The reason I added "generative" to the headline is that the
| post mentions an important use of text embedding models, which
| are indeed non-generative LLMs, and I did not want to start a
| semantics war by not explicitly specifying "generative LLMs."
| (The headline would flow better without it.)
| ziml77 wrote:
| I like that the author included the chat logs. I know there
| are a lot of times when people can't share them because
| they'd expose
| too much info, but I really think it's important when people make
| big claims about what they've gotten an LLM to do that they back
| it up.
| minimaxir wrote:
| That is a relatively new workflow for me since getting the
| logs out of the Claude UI is a more manual copy/paste process.
| I'm likely going to work on something to automate it a bit.
| simonw wrote:
| I use this:
|     llm -m claude-3.7-sonnet "prompt"
|     llm logs -c | pbcopy
|
| Then paste into a Gist. Gets me things like this: https://gis
| t.github.com/simonw/0a5337d1de7f77b36d488fdd7651b...
| fudged71 wrote:
| Isn't your Observable notebook more applicable to what he's
| talking about (scraping the Claude UI)?
| https://x.com/simonw/status/1821649481001267651
| minimaxir wrote:
| I'm interested in any approach to better organize LLM
| outputs offline, but it would probably be more pragmatic
| to use some sort of scripting tool.
___________________________________________________________________
(page generated 2025-05-05 23:00 UTC)