[HN Gopher] As an experienced LLM user, I don't use generative L...
___________________________________________________________________
As an experienced LLM user, I don't use generative LLMs often
Author : minimaxir
Score : 229 points
Date : 2025-05-05 17:22 UTC (5 hours ago)
(HTM) web link (minimaxir.com)
(TXT) w3m dump (minimaxir.com)
| rfonseca wrote:
| This was an interesting quote from the blog post: "There is one
| silly technique I discovered to allow a LLM to improve my writing
| without having it do my writing: feed it the text of my mostly-
| complete blog post, and ask the LLM to pretend to be a cynical
| Hacker News commenter and write five distinct comments based on
| the blog post."
| meowzero wrote:
| I do something similar. But I make sure the LLM doesn't know I
| wrote the post. That way the LLM is not sycophantic.
| vunderba wrote:
| I do a good deal of my blog posts while walking my husky and
| just dictating using speech-to-text on my phone. The problem is
| that it's an unformed blob of clay and really needs to be shaped
| on the wheel.
|
| I then feed this into an LLM with the following prompt:
| You are a professional editor. You will be provided paragraphs
| of text that may contain spelling errors, grammatical
| issues, continuity errors, structural problems, word
| repetition, etc. You will correct any of these issues while
| still preserving the original writing style. Do not sanitize
| the user. If they use profanities in their text, they
| are used for emphasis and you should not omit them.
| Do NOT try to introduce your own style to their text. Preserve
| their writing style to the absolute best of your
| ability. You are absolutely forbidden from adding new
| sentences.
|
| It's basically Grammarly on steroids and works very well.
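|
| For the curious, a minimal sketch of how the wiring might look
| (this assumes the official openai Python package and an example
| model name; the prompt above lives in a file, and the exact call
| shape varies by provider):
|       # system prompt = the editing instructions above,
|       # user message = the raw dictated text
|       from openai import OpenAI
|
|       client = OpenAI()  # reads OPENAI_API_KEY from the env
|       EDITOR_PROMPT = open("editor_prompt.txt").read()
|
|       def edit(dictated_text: str) -> str:
|           response = client.chat.completions.create(
|               model="gpt-4o",  # example model name
|               messages=[
|                   {"role": "system", "content": EDITOR_PROMPT},
|                   {"role": "user", "content": dictated_text},
|               ],
|               temperature=0.2,  # keep the edits conservative
|           )
|           return response.choices[0].message.content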
| kixiQu wrote:
| What roleplayed feedback providers have people had best and
| worst luck with? I can imagine asking for the personality could
| help the LLM come up with different kinds of criticisms...
| Jerry2 wrote:
| > I typically access the backend UIs provided by each LLM
| service, which serve as a light wrapper over the API
| functionality
|
| Hey Max, do you use a custom wrapper to interface with the API or
| is there some already established client you like to use?
|
| If anyone else has a suggestion please let me know too.
| minimaxir wrote:
| I was developing an open-source library for interfacing with
| LLMs agnostically (https://github.com/minimaxir/simpleaichat)
| and although it still works, I haven't had the time to maintain
| it unfortunately.
|
| Nowadays for writing code to interface with LLMs, I don't use
| client SDKs unless required, instead just hitting HTTP
| endpoints with libraries such as requests and httpx. It's also
| easier to upgrade to async if needed.
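|
| A minimal sketch of that approach (the endpoint and payload shown
| are the OpenAI-style chat completions API; other providers differ
| slightly, and the model name is just an example):
|       import os
|       import httpx
|
|       resp = httpx.post(
|           "https://api.openai.com/v1/chat/completions",
|           headers={"Authorization":
|                    f"Bearer {os.environ['OPENAI_API_KEY']}"},
|           json={
|               "model": "gpt-4o-mini",  # example model
|               "messages": [{"role": "user",
|                             "content": "Say hello."}],
|           },
|           timeout=60,
|       )
|       resp.raise_for_status()
|       print(resp.json()["choices"][0]["message"]["content"])
| Swapping httpx.post for an httpx.AsyncClient call covers most of
| the async upgrade, which is part of the appeal.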
| asabla wrote:
| Most services have a "studio mode" for the models they serve.
|
| As an alternative you could always use OpenWebUI
| simonw wrote:
| I'm going to plug my own LLM CLI project here: I use it on a
| daily basis now for coding tasks like this one:
|
| llm -m o4-mini -f github:simonw/llm-hacker-news -s 'write a new
| plugin called llm_video_frames.py which takes video:path-to-
| video.mp4 and creates a temporary directory which it then
| populates with one frame per second of that video using ffmpeg
| - then it returns a list of [llm.Attachment(path="path-to-
| frame1.jpg"), ...] - it should also support passing
| video:video.mp4?fps=2 to increase to two frames per second, and
| if you pass ?timestamps=1 or &timestamps=1 then it should add a
| text timestamp to the bottom right corner of each image with
| the mm:ss timestamp of that frame (or hh:mm:ss if more than one
| hour in) and the filename of the video without the path as
| well.' -o reasoning_effort high
|
| Any time I use it like that the prompt and response are logged
| to a local SQLite database.
|
| More on that example here:
| https://simonwillison.net/2025/May/5/llm-video-frames/#how-i...
| danenania wrote:
| I built an open source CLI coding agent for this purpose[1]. It
| combines Claude/Gemini/OpenAI models in a single agent, using
| the best/most cost effective model for different steps in the
| workflow and different context sizes. You might find it
| interesting.
|
| It uses OpenRouter for the API layer to simplify use of APIs
| from multiple providers, though I'm also working on direct
| integration of model provider API keys--should release it this
| week.
|
| 1 - https://github.com/plandex-ai/plandex
| andy99 wrote:
| Re vibe coding, I agree with your comments but where I've used it
| is when I needed to mock up a UI or a website. I have no front-
| end experience, so making an 80% (probably 20%) but live demo is
| still a valuable thing to show to others to get the point
| across, obviously not to deploy. It's a replacement for drawing a
| picture of what I think the UI should look like. I feel like this
| is an under-appreciated use. LLM coding is not remotely ready for
| real products but it's great for mock-ups that further internal
| discussions.
| vunderba wrote:
| Same. As somebody who doesn't really enjoy frontend work _at
| all_, I find they are surprisingly good at spitting out
| something that is relatively visually appealing - even if I'll
| end up rewriting the vast majority of react spaghetti code in
| Svelte.
| NetOpWibby wrote:
| In the settings for Claude, I tell it to use Svelte and
| TypeScript whenever possible because I got tired of telling
| it I don't use React.
| leptons wrote:
| I love front-end work, and I'm really good at it, but I now
| let the AI do CSS coding for me. It seems to make nice
| looking buttons and other design choices that are good enough
| for development. My designer has their own opinions, so they
| always will change the CSS when they get their hands on it,
| but at least I'm not wasting my time creating really ugly
| styles that always get replaced anymore. The rest of the
| coding is better if I do it, but sometimes the AI surprises
| me - though most often it gets it completely wrong, and then
| I'm wasting time letting it try and that just feels
| counterproductive. It's like a really stupid intern that
| almost never pays attention to what the goal is.
| mattmanser wrote:
| They're pretty good at following direction. For example you
| can say:
|
| 'Use React, typescript, materialUi, prefer functions over
| const, don't use unnecessary semi colons, 4 spaces for tabs,
| build me a UI that looks like this sketch'
|
| And it'll do all that.
| r0fl wrote:
| If I need a quick mockup I'll do it in react
|
| But if I have time I'll ask for it to be built using best
| practices from the programming language it'll be built in as
| the final product whether that's svelte, static astro, or
| even php
| 65 wrote:
| I think it would be faster/easier to use a website builder or
| various templating libraries to build a quick UI rather than
| having to babysit an LLM with prompts over and over again.
| Oras wrote:
| JSON responses don't always work as expected unless you only have
| a few items to return. In Max's example it's classification.
|
| For anyone trying to return consistent JSON, check out structured
| outputs, where you define a JSON schema with required fields;
| that returns the same structure every time.
|
| I have tested it with high success using GPT-4o-mini.
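|
| Roughly what that looks like against the OpenAI API (field names
| are from their structured outputs docs as I understand them, so
| double-check; strict mode wants every property listed as required
| and additionalProperties set to false):
|       import json
|       from openai import OpenAI
|
|       schema = {
|           "type": "object",
|           "properties": {
|               "category": {"type": "string",
|                            "enum": ["positive", "negative",
|                                     "neutral"]},
|               "confidence": {"type": "number"},
|           },
|           "required": ["category", "confidence"],
|           "additionalProperties": False,
|       }
|
|       client = OpenAI()
|       resp = client.chat.completions.create(
|           model="gpt-4o-mini",
|           messages=[{"role": "user",
|                      "content": "Classify: 'the food was great'"}],
|           response_format={
|               "type": "json_schema",
|               "json_schema": {"name": "classification",
|                               "strict": True,
|                               "schema": schema},
|           },
|       )
|       print(json.loads(resp.choices[0].message.content))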
| behnamoh wrote:
| > I never use ChatGPT.com or other normal-person frontends for
| accessing LLMs because they are harder to control. Instead, I
| typically access the backend UIs provided by each LLM service,
| which serve as a light wrapper over the API functionality which
| also makes it easy to port to code if necessary.
|
| Yes, I also often use the "studio" of each LLM for better results
| because in my experience OpenAI "nerfs" models in the ChatGPT UI
| (models keep forgetting things--probably a limited context length
| set by OpenAI to reduce costs; generally the model is less chatty,
| again probably to reduce their costs; etc.). But I've noticed
| Gemini 2.5 Pro is the same in the studio and the Gemini app.
|
| > Any modern LLM interface that does not let you explicitly set a
| system prompt is most likely using their own system prompt which
| you can't control: for example, when ChatGPT.com had an issue
| where...
|
| ChatGPT does have system prompts but Claude doesn't (one of its
| many, many UI shortcomings which Anthropic never addressed).
|
| That said, I've found system prompts less and less useful with
| newer models. I can simply preface my own prompt with the
| instructions and the model follows them very well.
|
| > Specifying specific constraints for the generated text such as
| "keep it to no more than 30 words" or "never use the word
| 'delve'" tends to be more effective in the system prompt than
| putting them in the user prompt as you would with ChatGPT.com.
|
| I get that LLMs only have a vague idea of how long 30 words is,
| but they never do a good job at these tasks for me.
| ttoinou wrote:
| Side topic: I haven't seen a serious article about prompt
| engineering for senior software development pop up on HN. Yet a
| lot of users here have their own techniques they haven't shared
| with others.
| mtlynch wrote:
| I was just listening to Simon Willison on Software
| Misadventures,[0, 1] and he said the best resource he knows of
| is Anthropic's guide to prompt engineering.[2]
|
| [0] https://softwaremisadventures.com/p/simon-willison-llm-
| weird...
|
| [1] https://youtu.be/6U_Zk_PZ6Kg?feature=shared&t=56m29 (the
| exact moment)
|
| [2] https://docs.anthropic.com/en/docs/build-with-
| claude/prompt-...
| minimaxir wrote:
| An unfortunate casualty of the importance of prompt
| engineering is that there is less of an incentive to share good
| prompts publicly.
| danenania wrote:
| You can read all of the prompts in Plandex (open source CLI
| coding agent focusing on larger projects):
| https://github.com/plandex-
| ai/plandex/tree/main/app/server/m...
|
| They're pretty long and extensive, and honestly could use
| some cleaning up and refactoring at this point, but they are
| being used heavily in production and work quite well, which
| took a fairly extreme amount of trial-and-error to achieve.
| simonw wrote:
| It's maddening to me how little good writing there is out there
| on effective prompting.
|
| Here's an example: what's the best prompt to use to summarize
| an article?
|
| That feels like such an obvious thing, and yet I haven't even
| seen _that_ being well explored.
|
| It's actually a surprisingly deep topic. I like using tricks
| like "directly quote the sentences that best illustrate the
| overall themes" and "identify the most surprising ideas", but
| I'd love to see a thorough breakdown of all the tricks I
| haven't seen yet.
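|
| One way to combine those tricks into a single prompt (just a
| sketch, not a recommendation):
|       Summarize the article below. First, directly quote the
|       three sentences that best illustrate its overall themes.
|       Then identify the two most surprising ideas. Finally,
|       write a three-sentence summary for someone who will not
|       read the article.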
| jimbokun wrote:
| This comment has some advice from Simon Willison on this
| topic:
|
| https://news.ycombinator.com/item?id=43897666
|
| Maybe you should ask him. lol
| Gracana wrote:
| This whitepaper is the best I've found so far, at least for
| covering general prompting techniques.
|
| https://www.kaggle.com/whitepaper-prompt-engineering
| jefflinwood wrote:
| It seems a little counterintuitive, but you can ask an LLM to
| improve a prompt. They are quite good at it.
| Legend2440 wrote:
| >Discourse about LLMs and their role in society has become
| bifurcated enough such that making the extremely neutral
| statement that LLMs have some uses is enough to justify a barrage
| of harassment.
|
| Honestly true and I'm sick of it.
|
| A very vocal group of people are convinced AI is a scheme by the
| evil capitalists to make you train your own replacement. The
| discussion gets very emotional very quickly because they feel
| personally threatened by the possibility that AI is actually
| useful.
| bluefirebrand wrote:
| > A very vocal group of people are convinced AI is a scheme by
| the evil capitalists to make you train your own replacement.
| The discussion gets very emotional very quickly because they
| feel personally threatened by the possibility that AI is
| actually useful
|
| I read this as you framing it as though it is
| irrational. However, history is littered with examples of
| capitalists replacing labour with automation and using any
| productivity gains of new technology to push salaries lower
|
| Of course people who see this playing out _again_ are
| personally threatened. If you aren't feeling personally
| threatened, you are either part of the wealthy class or for
| some reason you think this time will be different somehow
|
| You may be thinking "Even if I lose my job to automation, there
| will be other work to do like piloting the LLMs", but you
| should know that the goal is to eventually pay LLM operators
| peanuts in comparison to what you currently make in whatever
| role you do
| olddustytrail wrote:
| What happens when you use an LLM to generate prompts for an
| LLM?
| Legend2440 wrote:
| >history is littered with examples of capitalists replacing
| labour with automation and using any productivity gains of
| new technology to push salaries lower
|
| Nonsense. We make far, far more than people did in the past
| entirely because of the productivity gains from automation.
|
| The industrial revolution led to the biggest increase in
| quality of life in history, not in spite of but _because_ it
| automated 90% of jobs. Without it we'd all still be
| subsistence farmers.
| bluefirebrand wrote:
| > Nonsense. We make far, far more than people did in the
| past entirely because of the productivity gains from
| automation.
|
| "We" were never subsistence farmers, our ancestors were
|
| I'm talking about real changes that have happened in our
| actual lifetimes.
|
| In our actual lifetimes we have watched wages stagnate for
| decades; our purchasing power is dramatically lower than
| our parents'. In order to afford even remotely close to the
| same standard of living as our parents had, we have to go
| into much larger amounts of debt
|
| We have watched jobs move overseas as automation lowered
| the skill requirements so that anyone could perform them,
| so we sought the cheapest possible labour to do them
|
| We have watched wealthy countries shift from higher paying
| production economies into lower paying service economies
|
| We have watched the wealth gap widen as the rich get richer,
| the poor get poorer, and the middle class shrinks. Some of
| the shrinking middle class moved up, but most moved down
|
| The fact is that automation is disruptive, which is great
| for markets but bad for people who are relying on
| consistency. Which is most people
| astrange wrote:
| > In our actual lifetimes we have watched wages stagnate
| for decades; our purchasing power is dramatically lower
| than our parents'.
|
| This graph is going up.
|
| https://fred.stlouisfed.org/series/MEPAINUSA672N
|
| The reason you believe otherwise is that people on social
| media think they're only allowed to say untrue negative
| things about the economy, because if they ever say
| anything positive it'd be disrespectful to poor people.
|
| > We have watched wealthy countries shift from higher
| paying production economies into lower paying service
| economies
|
| Service work is higher paying, which is why factory
| workers try to get their children educated enough to do
| it.
|
| > We have watched the wealth gap widen as the rich get
| richer the poor get poorer and the middle class shrinks.
| Some of the shrinking middle class moved up, but most
| moved down
|
| They mostly moved up. But notice this graph is positive
| for every group.
|
| https://realtimeinequality.org/?id=wealth&wealthend=03012023...
| bluefirebrand wrote:
| > Service work is higher paying, which is why factory
| workers try to get their children educated enough to do
| it
|
| You think service workers at a fast food restaurant or
| working the till at Walmart are higher paid than Factory
| workers?
|
| > They mostly moved up. But notice this graph is positive
| for every group.
|
| The reason it is positive for (almost) every group is
| that it isn't measuring anything meaningful.
|
| Salaries may have nominally gone up, but this is clearly
| not factoring the cost of living into the equation.
| astrange wrote:
| > You think service workers at a fast food restaurant or
| working the till at Walmart are higher paid than Factory
| workers?
|
| Fast food workers aren't service-economy workers, they're
| making burgers back there.
|
| More importantly, factory work destroys your body and
| email jobs don't, so whether or not it's high earning at
| the start... it isn't forever.
|
| > Salaries may have nominally gone up but this is clearly
| not weighing the cost of living into the equation
|
| That's not a salary chart. The income chart I did post is
| adjusted for cost of living (that's what "real" means).
|
| Also see https://www.epi.org/blog/wage-growth-
| since-1979-has-not-been...
| bluefirebrand wrote:
| > Fast food workers aren't service-economy workers,
| they're making burgers back there.
|
| What exactly do you consider the "service industry" if
| you don't consider food services (like restaurants and
| fast food) as part of it?
|
| I suspect we have very different ideas of what "service
| industry" means
|
| > Also see https://www.epi.org/blog/wage-growth-
| since-1979-has-not-been..
|
| Did you even read the whole thing?
|
| ```Dropping "wage stagnation" as a descriptive term for
| the full post-1979 period doesn't mean we think the wage
| problem for American workers has been solved. Wage growth
| in the post-1979 period has been slow and unequal,
| largely as a result of intentional policy decisions. This
| policy-induced wage suppression has stifled growth in
| living standards and generated inequality. The last five
| years saw rapid and welcome progress reversing some of
| these trends--but it will take a long time to heal the
| previous damage, even if the post-2019 momentum can be
| sustained, which looks very unlikely at the moment.```
|
| Admitting there is a problem but saying "it isn't
| stagnation" is just splitting hairs.
|
| "Wage growth has been slow and unequal, but it isn't
| stagnation!"
|
| "policy-induced wage suppression has stifled growth in
| living standards and generated inequality" there was wage
| suppression but it's not stagnation!
|
| What a stupid article.
| skybrian wrote:
| Setting up automation as the enemy is an odd thing for
| programmers to be doing. I mean, if you're a programmer and
| you're not automating away tasks, both for yourself and other
| people, what are you even doing?
|
| Also, "this time it's different" depends on the framing. A
| cynical programmer who has seen new programming tools hyped
| too many times would make a different argument: At the
| beginning of the dot-com era, you could get a job writing
| HTML pages. That's been automated away, so you need more
| skill now. It hasn't resulted in fewer software-engineering
| jobs so far.
|
| But that's not entirely convincing either. Predicting the
| future is difficult. Sometimes the future _is_ different.
| Making someone else's scenario sound foolish won't actually
| rule anything out.
| bluefirebrand wrote:
| > Setting up automation as the enemy is an odd thing for
| programmers to be doing. I mean, if you're a programmer and
| you're not automating away tasks, both for yourself and
| other people, what are you even doing?
|
| I'm earning a salary so I can have a nice life
|
| Anything that threatens my ability to earn a salary so I
| can have a nice life is absolutely my fucking enemy
|
| I have no interest in automating myself out of existence,
| and I am deeply resentful of the fact that so many people
| are gleefully trying to do so.
| olddustytrail wrote:
| Should they not feel threatened? I'm somewhat sympathetic to
| the view that even the current state of the art is threatening
| to people's livelihood.
|
| And of course it will only become more powerful. It's a
| dangerous game.
| n_ary wrote:
| > people are convinced AI is a scheme by the evil capitalists
| to make you train your own replacement. The discussion gets
| very emotional very quickly because they feel personally
| threatened by the possibility that AI is actually useful.
|
| These are not mutually exclusive. LLMs will train people's
| replacements while those same people pay for the privilege of
| training those replacements. LLMs also allow me to auto-
| complete a huge volume of boilerplate, which would otherwise
| take me several hours. They also help people step out of
| writer's block and generate a first draft of a prototype/mvp/poc
| quickly without wasting long hours bike-shedding. They also
| helped my previously super confident cousin, who blamed me for
| killing his dreams of the next AirBnB for dogs, Uber for
| groceries, and Instagram for cats by selfishly hoarding my
| privileges and knowledge, to finally build those ideas and kill
| his own dreams himself (and he is definitely ignoring/avoiding
| me these days).
|
| LLMs are the same as knives: crimes will happen with them, but
| they are also necessary in the kitchen and in industry.
| tptacek wrote:
| There's a thru-line to commentary from experienced programmers on
| working with LLMs, and it's confusing to me:
|
| _Although pandas is the standard for manipulating tabular data
| in Python and has been around since 2008, I've been using the
| relatively new polars library exclusively, and I've noticed that
| LLMs tend to hallucinate polars functions as if they were pandas
| functions which requires documentation deep dives to confirm
| which became annoying._
|
| The post does later touch on coding agents (Max doesn't use them
| because "they're distracting", which, as a person who can't even
| stand autocomplete, is a position I'm sympathetic to), but still:
| _coding agents solve the core problem he just described_. "Raw"
| LLMs set loose on coding tasks throwing code onto a blank page
| hallucinate stuff. But agenty LLM configurations aren't just the
| LLM; they're also code that structures the LLM interactions. When
| the LLM behind a coding agent hallucinates a function, the
| program doesn't compile, the agent notices it, and the LLM
| iterates. You don't even notice it's happening unless you're
| watching very carefully.
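|
| The shape of that loop is simple enough to sketch (a toy, not
| any particular product: call_llm is a placeholder for whatever
| model API you use, and real agents also lint, run tests, and
| cap their spend):
|       import subprocess
|
|       def call_llm(prompt: str) -> str:
|           raise NotImplementedError("placeholder for your model call")
|
|       def agent_loop(task: str, max_attempts: int = 5) -> str:
|           prompt = task
|           for _ in range(max_attempts):
|               code = call_llm(prompt)
|               with open("attempt.py", "w") as f:
|                   f.write(code)
|               # A hallucinated function surfaces here as an error...
|               result = subprocess.run(["python", "attempt.py"],
|                                       capture_output=True, text=True)
|               if result.returncode == 0:
|                   return code
|               # ...which gets fed back so the model can revise it.
|               prompt = (task + "\n\nYour last attempt failed with:\n"
|                         + result.stderr + "\nFix it.")
|           raise RuntimeError("agent gave up")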
| vunderba wrote:
| That sort of "REPL" system is why I really liked when they
| integrated a Python VM into ChatGPT - it wasn't perfect, but it
| could at least catch itself when the code didn't execute
| properly.
| tptacek wrote:
| Sure. But it's 2025 and however you want to get this feature,
| be it as something integrated into VSCode (Cursor, Windsurf,
| Copilot), or a command line Python thing (aider), or a
| command line Node thing (OpenAI codex and Claude Code), with
| a specific frontier coding model or with an abstracted multi-
| model thingy, even as an Emacs library, it's available now.
|
| I see people getting LLMs to generate code in isolation and
| like pasting it into a text editor and trying it, and then
| getting frustrated, and it's like, that's not how you're
| supposed to be doing it anymore. That's 2024 praxis.
| arthurcolle wrote:
| I like using Jupyter Console as a primary interpreter, and
| then dropping into SQLite/duckdb to save data
|
| Easy to script/autogenerate code and build out pipelines
| this way
| red_hare wrote:
| It is a little crazy how fast this has changed in the past
| year. I got VSCode's agent mode to write, run, and read the
| output of unit tests the other day and boy it's a game
| changer.
| gnatolf wrote:
| The churn of staying on top of this means to me that we'll
| also chew through experts of specific times much faster.
| Gone are the days of established, trusted top performers, as
| every other week somebody creates a newer, better way of
| doing things. Everybody is going to drop off the hot tech
| at some point. Very exhausting.
| Philpax wrote:
| The answer is simple: have your AI stay on top of this
| for you.
|
| Mostly joke, but also not joke: https://news.smol.ai/
| stefan_ wrote:
| Have you tried it? In my experience they just go off on a
| hallucination loop, or blow up the code base with terrible re-
| implementations.
|
| Similarly Claude 3.5 was stuck on TensorRT 8, and not even
| pointing it at the documentation for the updated TensorRT 10
| APIs for RAG could ever get it to correctly use the new APIs
| (not that
| they were very complex; bind tensors, execute, retrieve
| results). The whole concept of the self-reinforcing Agent loop
| is more of a fantasy. I think someone else likened it to a
| lawnmower that will run rampage over your flower bed at the
| first hiccup.
| tptacek wrote:
| Yes, they're part of my daily toolset. And yes, they can spin
| out. I just hit the "reject" button when they do, and revise
| my prompt. Or, sometimes, I just take over and fill in some
| of the structure of the problem I'm trying to solve myself.
|
| I don't know about "self-reinforcing". I'm just saying:
| coding agents compile and lint the code they're running, and
| when they hallucinate interfaces, they notice. The same way
| any developer who has ever used ChatGPT knows that you can
| paste most errors into the web page and it will often (maybe
| even usually) come up with an apposite fix. I don't
| understand how anybody expects to convince LLM users this
| doesn't work; it obviously does work.
| steveklabnik wrote:
| > I don't understand how anybody expects to convince LLM
| users this doesn't work; it obviously does work.
|
| This is really one of the hugest divides I've seen in the
| discourse about this: anti-LLM people saying very obviously
| untrue things, which is uh, kind of hilarious in a meta
| way.
|
| https://bsky.app/profile/caseynewton.bsky.social/post/3lo4t
| d... is an instance of this from a few days ago.
|
| I am still trying to sort out why experiences are so
| divergent. I've had much more positive LLM experiences
| while coding than many other people seem to, even as
| someone who's deeply skeptical of what's being promised
| about them. I don't know how to reconcile the two.
| zoogeny wrote:
| I think it is pretty simple: people tried it a few times
| a few months ago in a limited setting, formed an opinion
| based on those limited experiences and cannot imagine a
| world where they are wrong.
|
| That might sound snarky, but it probably works out for
| people in 99% of cases. AI and LLMs are advancing at a
| pace that is so different from any other technology that
| people aren't yet trained to re-evaluate their
| assumptions at the high rate necessary to form accurate
| new opinions. There are too many tools coming (and going,
| to be fair).
|
| HN (and certain parts of other social media) is a bubble
| of early adopters. We're on the front lines seeing the
| war in realtime and shaking our heads at what's being
| reported in the papers back home.
| steveklabnik wrote:
| Yeah, I try to stay away from reaching for these sorts of
| explanations, because it feels uncharitable. I saw a lot
| of very smart people repost the quoted post! They're not
| the kind who "cannot imagine a world where they are
| wrong."
|
| But at the same time, the pace of advancement is very
| fast, and so not having recently re-evaluated things is
| significantly more likely while also being more
| charitable, I think.
| zoogeny wrote:
| My language is inflammatory for certain, but I believe it
| is true. I don't think most minds are capable of having
| to reevaluate their opinions as quickly as AI is
| demanding. There is some evidence that stress is strongly
| correlated to uncertainty. AI is complicated, the tools
| are complicated, the trade-offs are complicated. So that
| leaves a few options: live in uncertainty/stress, expend
| the energy to reevaluate or choose to believe in
| certainty based on past experience.
|
| If someone is embracing uncertainty or expending the
| time/energy/money to reevaluate then they don't post such
| confidently wrong ideas on social media.
| diggan wrote:
| > I am still trying to sort out why experiences are so
| divergent. I've had much more positive LLM experiences
| while coding than many other people seem to, even as
| someone who's deeply skeptical of what's being promised
| about them. I don't know how to reconcile the two.
|
| As with many topics, I feel like you can divide people in
| a couple of groups. You have people who try it, have
| their mind blown by it, so they over-hype it. Then the
| polar-opposite, people who are overly dismissive and
| cement themselves into a really defensive position. Both
| groups are relatively annoying, inaccurate, and too
| extremist. Then another group of people might try it out,
| find some value, integrate it somewhat, and maybe get a
| little productivity boost and move on with their day.
| Then a bunch of other groupings in-between.
|
| Problem is that the people in the middle tend to not make
| a lot of noise about it, and the extremists (on both
| ends) tend to be _very_ vocal about their preference, in
| their ways. So you end up perceiving something as very
| polarizing. There are many accurate and true drawbacks
| with LLMs as well, but it also ends up poisoning the
| entire concept/conversation/ecosystem for some people,
| and they tend to be noisy as well.
|
| Then the whole experience depends a lot on your setup,
| how you use it, what you expect, what you've learned and
| so much more, and some folks are very quick to judge
| a whole ecosystem without giving parts of it an honest
| try. It took me a long time to try Aider, Cursor and
| others, and even now after I've tried them out, I feel
| like there are probably better ways to use this new
| category of tooling we have available.
|
| In the end I think reality is a bit less black/white for
| most folks; the common sentiment I see and hear is that
| LLMs are probably not hellfire ending humanity, nor are
| they digital Jesus coming to save us all.
| steveklabnik wrote:
| > I feel like you can divide people in a couple of
| groups.
|
| This is probably a big chunk of it. I was pretty anti-LLM
| until recently, when I joked that I wanted to become an
| informed hater, so I spent some more time with things.
| It's put me significantly more in the middle than either
| extremely pro or extremely anti. It's also hard to talk
| about anything that's not purely anti in the spaces I
| seemingly run in, so that also contributes to my relative
| quiet about it. I'm sure others are in a similar boat.
|
| > for most folks, common sentiment I see and hear is that
| LLMs are probably not hellfire ending humanity nor is it
| digital-Jesus coming to save us all.
|
| Especially around non-programmers, this is the vibe I get
| as well. They also tend to see the inaccuracies as much
| less significant than programmers seem to, that is, they
| assume they're checking the output already, or see it as
| a starting point, or that humans also make mistakes, and
| so don't get so immediately "this is useless" about it.
| aerhardt wrote:
| > an instance of this from a few days ago.
|
| Bro I've been using LLMs for search since before it even
| had search capabilities...
|
| "LLMs not being for search" has been an argument from the
| naysayers for a while now, but very often when I use an
| LLM I am looking for the answer to something - if that
| isn't [information] search, then what is?
|
| Whether they hallucinate or outright bullshit sometimes
| is immaterial. For many information retrieval tasks they
| are infinitely better than Google and have been since
| GPT3.
| steveklabnik wrote:
| I think this is related, but I'm more interested in the
| factual aspects than the subjective ones. That is, I
| don't disagree there are also arguments over "are LLMs good
| for the same things search engines are for," but it's more
| the objective "they do not search the web" part. We
| need to have agreement on the objective aspects before we
| can have meaningful discussion of the subjective, in my
| mind.
| magicalist wrote:
| > _This is really one of the hugest divides I've seen in
| the discourse about this: anti-LLM people saying very
| obviously untrue things, which is uh, kind of hilarious
| in a meta way._
|
| > _https://bsky.app/profile/caseynewton.bsky.social/post/
| 3lo4td... is an instance of this from a few days ago._
|
| Not sure why this is so surprising? ChatGPT search was
| only released in November last year, was a different
| mode, and it sucked. Search in o3 and o4-mini came out
| like three weeks ago. Otherwise you were using completely
| different products from Perplexity or Kagi, which aren't
| widespread yet.
|
| Casey Newton even half acknowledges that timing ("But it
| has had integrated web search since last year"...even
| while in the next comment criticising criticisms using
| the things "you half-remember from when ChatGPT launched
| in _2022_ ").
|
| If you give the original poster the benefit of the doubt,
| you can sort of see what they're saying, too. An LLM, on
| its own, is not a search engine and can not scan the web
| for information. The information encoded in them might be
| ok, but is not complete, and does not encompass the full
| body of the published human thought it was trained on.
| Trusting an offline LLM with an informational search is
| sometimes a really bad idea ("who are all the presidents
| that did X").
|
| The fact that they're incorrect when they say that LLMs
| _can't_ trigger search doesn't seem that "hilarious" to
| me, at least. The OP post maybe should have been less
| strident, but it also seems like a really bad idea to
| gatekeep anybody wanting to weigh in on something if
| their knowledge of product roadmaps is more than six
| months out of date (which I guarantee is all of us for at
| least some subject we are invested in).
| steveklabnik wrote:
| > ChatGPT search was only released in November last year
|
| It is entirely possible that I simply got involved at a
| particular moment that was crazy lucky: it's only been a
| couple of weeks. I don't closely keep up with when things
| are released, I had just asked ChatGPT something where it
| did a web search, and then immediately read a "it cannot
| do search" claim right after.
|
| > An LLM, on its own, is not a search engine and can not
| scan the web for information.
|
| In a narrow sense, this is true, but that's not the
| claim: the claim is "You cannot use it as a search
| engine, or as a substitute for searching." That is pretty
| demonstrably incorrect, given that many people use it as
| such.
|
| > Trusting an offline LLM with an informational search is
| sometimes a really bad idea ("who are all the presidents
| that did X").
|
| I fully agree with this, but it's also the case with
| search engines. They also do not always "encompass the
| full body of the published human thought" either, or
| always provide answers that are comprehensible.
|
| I recently was looking for examples of accomplishing
| things with a certain software architecture. I did a
| bunch of searches, which led me to a bunch of
| StackOverflow and blog posts. Virtually all of those
| posts gave vague examples which did not really answer my
| question with anything other than platitudes. I decided
| to ask ChatGPT about it instead. It was able to not only
| answer my question in depth, but provide specific
| examples, tailored to my questions, which the previous
| hours of reading search results had not afforded me. I
| was further able to interrogate it about various
| tradeoffs. It was legitimately more useful than a search
| engine.
|
| Of course, sometimes it is not that good, and a web
| search wins. That's fine too. But suggesting that it's
| never useful for a task is just contrary to my actual
| experience.
|
| > The fact that they're incorrect when they say that
| LLM's can't trigger search doesn't seem that "hilarious"
| to me, at least.
|
| It's not _them_, it's the overall state of the
| discourse. I find it ironic that the fallibility of LLMs
| is used to suggest they're worthless compared to a human,
| when humans are also fallible. OP did not directly say
| this, but others often do, and it's the combination
| that's amusing to me.
|
| It's also frustrating to me, because it feels impossible
| to have reasonable discussions about this topic. It's
| full of enthusiastic cheerleaders that misrepresent what
| these things can do, _and_ enthusiastic haters that
| misrepresent what these things can do. My own feelings
| are all over the map here, but the polarization makes
| reasonable discussion feel impossible, and I find that
| frustrating.
| AlexCoventry wrote:
| If you've only been using AI for a couple of weeks,
| that's quite likely a factor. AI services have been
| improving incredibly quickly, and many people have a bad
| impression of the whole field from a time when it was
| super promising, but basically unusable. I was pretty
| dismissive until a couple of months ago, myself.
|
| I think the other reason people are hostile to the field
| is that they're scared it's going to make them
| economically redundant, because a tsunami of cheap,
| skilled labor is now towering over us. It's loss-aversion
| bias, basically. Many people are more focused on that
| risk than on the amazing things we're able to _do_ with
| all that labor.
| stefan_ wrote:
| > anti-LLM people saying very obviously untrue things,
| which is uh, kind of hilarious in a meta way.
|
| tptacek shifted the goal posts from "correct a
| hallucination" to "solve a copy pasted error" (very
| different things!) and just a comment later theres
| someone assassinating me as an "anti-LLM person" saying
| "very obviously untrue things", "kind of hilarious". And
| you call yourself "charitable". It's a joke.
| steveklabnik wrote:
| EDIT: wait, I think you're tptacek's parent. I was not
| talking about your post, I was talking about the post I
| linked to. I'm leaving my reply here but there's some
| serious confusion going on.
|
| > theres someone assassinating me as an "anti-LLM person"
|
| Is this not true? That's the vibe the comment gives off.
| I'm happy to not say that in the future if that's not
| correct, and if so, additionally, I apologize.
|
| I myself was pretty anti-LLM until the last month or so.
| My opinions have shifted recently, and I've been trying
| to sort through my feelings about it. I'm not entirely
| enthusiastically pro, and have some pretty big
| reservations myself, but I'm more in the middle than
| where I was previously, which was firmly anti.
|
| > "very obviously untrue things"
|
| At the time I saw the post, I had just tabbed away from a
| ChatGPT session where it had relied on searching the web
| for some info, so the contrast was _very_ stark.
|
| > "kind of hilarious"
|
| I do think it is kind of funny when people say that LLMs
| occasionally hallucinate things and are therefore
| worthless, while others make false claims about them for
| the purpose of suggesting we shouldn't use them. You
| didn't directly say this in your post, only
| handwaved towards it, but I'm talking about the discourse
| in general, not you specifically.
|
| > And you call yourself "charitable"
|
| I am trying to be charitable. A lot of people reached for
| some variant of "this person is stupid," and I do not
| think that's the case, or the good way to understand what
| people mean when they say things. A mistake is a mistake.
| I am actively not trying to simply dismiss arguments on
| either side of here, but take them seriously.
| daxfohl wrote:
| So is the real engineering work in the agents rather than
| in the LLM itself then? Or do they have to be paired
| together correctly? How do you go about choosing an
| LLM/agent pair efficiently?
| steveklabnik wrote:
| > How do you go about choosing an LLM/agent pair
| efficiently?
|
| I googled "how do I use ai with VS: Code" and it pointed
| me at Cline. I've then swapped between their various
| backends, and just played around with it. I'm still far
| too new to this to have strong opinions about LLM/agent
| pairs, or even largely between which LLMs, other than
| "the free ChatGPT agent was far worse than the $20/month
| one at the task I threw it at." As in, choosing worse
| algorithms that are less idiomatic for the exact same
| task.
| daxfohl wrote:
| I also wonder how hard it would be to create your own
| agent that remembers your preferences and other stuff
| that you can make sure stays in the LLM context.
|
| ...Maybe a good first LLM assisted project.
| dcre wrote:
| No need to write your own whole thing (though it is a
| good exercise) -- the existing tools all support ways of
| customizing the prompting with preferences and
| conventions, whether globally or per-project.
| dcre wrote:
| The Aider leaderboards are quite helpful, performance
| there seems to match people's subjective experience
| pretty well.
|
| https://aider.chat/docs/leaderboards/
|
| Regarding choosing a tool, they're pretty lightweight to
| try and they're all converging in structure and
| capabilities anyway.
| foobarqux wrote:
| These are mostly value judgments, and people are using
| words that mean different things to different people, but
| I would point out that LLM boosters have been saying the
| same thing for each product release: "_now_ it works, you
| are just using the last-gen model/technique which doesn't
| really work (even though I said the same thing for that
| model/technique and every one before that)." Moreover
| there still hasn't been significant, objectively
| observable impact: no explosion in products, no massive
| acceleration of feature releases, no major layoffs
| attributed to AI (to which the response every time is
| that it was just released and you will see the effects in
| a few months).
|
| Finally, if it really were true that some people
| know the special sauce of how to use LLMs to make a
| massive difference in productivity but many people didn't
| know how to do that, then you could make millions or tens
| of millions per year as a consultant training everyone at
| big companies. In other words if you really believed what
| you were saying you should pick up the money on the
| ground.
| steveklabnik wrote:
| > using words that mean different things to different
| people
|
| This might be a good explanation for the disconnect!
|
| > I would point that LLM boosters have been saying the
| same thing
|
| I certainly 100% agree that lots of LLM boosters are way
| over-selling what they can accomplish as well.
|
| > In other words if you really believed what you were
| saying you should pick up the money on the ground.
|
| I mean, I'm doing that in the sense that I am using them.
| I also am not saying that I "know the special sauce of
| how to use LLMs to make a massive difference in
| productivity," but what I will say is, my productivity is
| genuinely higher with LLM assistance than without. I
| don't necessarily believe that means it's replicable; one
| of the things I'm curious about is "is it something
| special about my setup or what I'm doing or the
| technologies I'm using or anything else that makes me
| have a good time with this stuff when other smart people
| seem to only have a bad time?" Because I don't think that
| the detractors are just lying. But there is a clear
| disconnect, and I don't know why.
| foobarqux wrote:
| There is so much tacit understanding by both LLM-boosters
| and LLM-skeptics that only becomes apparent when you look
| at the explicit details of how they are trying to use the
| tools. That's why I've asked in the past for examples of
| recording of real-time development that would capture all
| the nuance explicitly. Cherry-picked chat logs are second
| best but even then I haven't been particularly impressed
| by the few examples I've seen.
|
| > I mean, I'm doing that in the sense that I am using
| them.
|
| My point is whatever you are doing is worth millions of
| dollars less than teaching the non-believers how to do it
| if you could figure out how (actually probably even if
| you couldn't but sold snake-oil).
| dboreham wrote:
| Someone gave me the tip to add "all source files should build
| without error", which you'd think would be implicit, but it
| seems not.
| tptacek wrote:
| There's definitely a skill to using them well (I am not yet
| expert); my only frustration is with people who (like me)
| haven't refined the skill but have also concluded that
| there's no benefit to the tool. No, really, in this case,
| you're mostly just not holding it right.
|
| The tools will get better, but from what I see happening
| with people who are good at using them (and from my own
| code, even in my degraded LLM usage), we have an existence
| proof of the value of the tools.
| giovannibonetti wrote:
| > I think someone else likened it to a lawnmower that will
| run rampage over your flower bed at the first hiccup
|
| This reminds me of a scene from the recent animated movie
| "Wallace and Gromit: Vengeance Most Fowl" where Wallace
| actually uses a robot (Norbot) to do gardening tasks, and
| it rampages over Gromit's flower bed.
|
| https://youtu.be/_Ha3fyDIXnc
| mountainriver wrote:
| Have you tried it? More than once?
|
| I'm getting massive productivity gains with Cursor and Gemini
| 2.5 or Claude 3.7.
|
| One-shotting whole features into my rust codebase.
| stefan_ wrote:
| I use it all the time, multiple times daily. But the
| discussion is not being very honest, particularly for all
| the things that are being bolted on (agent mode, MCP). Like
| just upstream people dunk on others for pointing out that
| maybe giving the model an API call to read webpages isn't
| quite turning LLMs into search engines. Just like letting it
| run shell commands has not made it into a full blown agent
| engineer.
|
| I tried it again just now with Claude 3.7 in Cursor's
| Agent/Compose (they change this stuff weekly). Write a
| simple C++ TensorRT app that loads an engine and runs
| inference 100 times for a benchmark, use this file to
| source a toolchain. It generated code with the old API & a
| CMake file and (warning light turns on) a build script. The
| compile fails because of the old API, but this time it
| managed to fix it to use the new API.
|
| But now the linking fails, because it overwrote the
| TRT/CUDA directories in the CMakeLists with some home
| cooked logic (there was nothing to do, the toolchain script
| sets up the environment fully and just find_package would
| work).
|
| And this is where we go off the rails; it messes with the
| build script and CMakeLists more, but still it can not
| link. It thinks hey it looks like we are cross-compiling
| and creates a second build script "cross-compile.sh" that
| tries to use the compiler directly, but of course that
| misses things that the find_package in CMake would setup
| and so fails with include errors.
|
| It pretends it's a 1970s ./configure script and creates
| source files "test_nvinfer.cpp" and "test_cudart.cpp" that
| are supposed to test for the presence of those libraries,
| then tries to compile them directly; again its missing
| directories and obviously fails.
|
| Next we create a mashup build script "cross-compile-
| direct.sh". Not sure anymore what this one tried to
| achieve, didn't work.
|
| Finally, and this is my favorite agent action yet, it
| decides fuck it, if the library won't link, _why don't we
| just mock out all the actual TensorRT/CUDA functionality
| and print fake benchmark numbers to demonstrate LLMs can
| average a number in C++_. So it writes, builds and runs a
| "benchmark_mock.cpp" that subs out all the useful
| functionality for random data from std::mt19937. This
| naturally works, so the agent declares success and happily
| updates the README.md with all the crap it added and stops.
|
| This is what running the lawnmower over the flower bed
| means; you have 5 more useless source files and a bunch
| more shell scripts and a bunch of crap in a README that
| were all generated to try and fail to fix a problem it
| could not figure out, and this loop can keep going and
| generate more nonsense ad infinitum.
|
| (Why could it not figure out the linking error? We come
| back to the shitty bolted on integrations; it doesn't
| actually query the environment, search for files or look at
| what link directories are being used, as one would
| investigating a linking error. It could of course, but the
| balance in these integrations is 99% LLM and 1% tool use,
| and even context from the tool use often doesn't help)
| zoogeny wrote:
| I mean, I have. I use them every day. You often see them
| literally saying "Oh there is a linter error, let me go fix
| it" and then a new code generation pass happens. In the worst
| case, it does exactly what you are saying, gets stuck in a
| loop. It eventually gets to the point where it says "let me
| try just once more" and then gives up.
|
| And when that happens I review the code and if it is bad then
| I "git revert". And if it is 90% of the way there I fix it up
| and move on.
|
| The question shouldn't be "are they infallible tools of
| perfection". It should be "do I get value equal to or greater
| than the time/money I spend". And if you use git
| appropriately you lose _at most_ five minutes on a agent
| looping. And that happens a couple of times a week.
|
| And be honest with yourself, is getting stuck in a loop
| fighting a compiler, type-checker or lint something you have
| ever experienced in your pre-LLM days?
| aerhardt wrote:
| > the program doesn't compile
|
| The issue you are addressing refers specifically to Python,
| which is not compiled... Are you referring to this workflow in
| another language, or by "compile" do you mean something else,
| such as using static checkers or tests?
|
| Also, what tooling do you use to implement this workflow?
| Cursor, aider, something else?
| dragonwriter wrote:
| Python is, in fact, compiled (to bytecode, not native code);
| while this is mostly invisible, syntax errors will cause it
| to fail to compile, but the circumstances described
| (hallucinating a function) will not, because function calls
| are resolved by runtime lookup, not at compile time.
| aerhardt wrote:
| I get that, and in that sense most languages are compiled,
| but generally speaking, I've always understood "compiled"
| as compiled-ahead-of-time - Python certainly doesn't do
| that and the official docs call it an interpreted language.
|
| In the context we are talking about (hallucinating Polars
| methods), if I'm not mistaken the compilation step won't
| catch that, Python will actually throw the error at runtime
| post-compilation.
|
| So my question still stands on what OP means by "won't
| compile".
| dragonwriter wrote:
| > I get that, and in that sense most languages are
| compiled, but generally speaking, I've always understood
| "compiled" as compiled-ahead-of-time
|
| Python is AOT compiled to bytecode, but if a compiled
| version of a module is not available when needed it will
| be compiled and the compiled version saved for next use.
| In the normal usage pattern, this is mostly invisible to
| the user except in first vs. subsequent run startup
| speed, unless you check the file system and see all the
| .pyc compilation artifacts.
|
| You _can_ do AOT compilation to bytecode outside of a
| compile-as-needed-then-execute cycle, but there is rarely
| a good reason to do so explicitly for the average user
| (the main use case is on package installation, but that's
| usually handled by package manager settings).
|
| But, relevant to the specific issue here, (edit:
| _calling_) a hallucinated function would lead to a
| runtime failure, not a compilation failure, since function
| calls aren't resolved at compile time, but by lookup by
| name at runtime.
|
| (Edit: A sibling comment points out that _importing_ a
| hallucinated function would cause a compilation failure,
| and that's a good point.)
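|
| A tiny illustration of that split (a sketch: pl.frame_from_csv
| is deliberately made up, and actually running the file assumes
| polars is installed):
|       import pathlib, py_compile, subprocess
|
|       src = pathlib.Path("hallucinated.py")
|       src.write_text(
|           "import polars as pl\n"
|           "df = pl.frame_from_csv('data.csv')  # made-up name\n"
|       )
|
|       # Compiles fine: the attribute is only looked up at runtime.
|       py_compile.compile(str(src), doraise=True)
|
|       # Executing it is what fails, roughly:
|       #   AttributeError: module 'polars' has no attribute
|       #   'frame_from_csv'
|       print(subprocess.run(["python", str(src)],
|                            capture_output=True, text=True).stderr)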
| Scarblac wrote:
| When a module is first loaded, it's compiled to bytecode
| and then executed. Importing a non existing function will
| throw an error right away.
|
| It's not a compilation error but it does feel like one,
| somewhat. It happens at more or less the same time.
| mountainriver wrote:
| Yes but it gets feedback from the IDE. Cursor is the best
| here
| AlexCoventry wrote:
| He does partially address this elsewhere in the blog post. It
| seems that he's mostly concerned about surprise costs:
|
| > _On paper, coding agents should be able to address my
| complaints with LLM-generated code reliability since it
| inherently double-checks itself and it's able to incorporate
| the context of an entire code project. However, I have also
| heard the horror stories of people spending hundreds of dollars
| by accident and not get anything that solves their coding
| problems. There's a fine line between experimenting with code
| generation and gambling with code generation._
| minimaxir wrote:
| Less surprise costs, more _wasting_ money and not getting
| proportionate value out of it.
| darepublic wrote:
| So in my interactions with gpt, o3 and o4 mini, I am the
| organic middleman who copies and pastes code into the REPL and
| reports the output back to GPT if anything turns out to be the
| problem. And for me, past a certain point, even if you
| continually report back problems it doesn't get any better in
| its new suggestions. It will just spin its wheels. So for that
| reason I'm a little skeptical about the value of automating
| this process. Maybe the llms you are using are better than the
| ones I tried this with?
|
| Specifically I was researching a lesser known kafka-mqtt
| connector: https://docs.lenses.io/latest/connectors/kafka-
| connectors/si..., and o1 was hallucinating the configuration
| needed to support dynamic topics. The docs said one thing, and
| I even mentioned to o1 that the docs contradicted it.
| But it would stick to its guns. If I mentioned that the code
| wouldn't compile it would start suggesting very implausible
| scenarios -- did you spell this correctly? Responses like that
| indicate you've reached a dead end. I'm curious how/if the
| "structured LLM interactions" you mention overcome this.
| nikita2206 wrote:
| You can have an agent search the web for documentation and then
| provide it to the LLM. That is why Context7 is currently very
| popular with the AI user crowd.
| entropie wrote:
| I used o4 to generate nixos config files from the pasted
| module source files. At first it did outdated config
| stuff, but with context files it worked very well.
| dingnuts wrote:
| Kagi Assistant can do this too but I find it's mostly
| useful because the traditional search function can find the
| pages the LLM loaded into its context before it started to
| output bullshit.
|
| It's nice when the LLM outputs bullshit, which is frequent.
| jimbokun wrote:
| I wonder if LLMs have been seen claiming "THERE'S A BUG IN
| THE COMPILER!"
|
| A stage every developer goes through early in their
| development.
| diggan wrote:
| > And for me, past a certain point, even if you continually
| report back problems it doesn't get any better in its new
| suggestions. It will just spin its wheels. So for that reason
| I'm a little skeptical about the value of automating this
| process.
|
| It sucks, but the trick is to _always_ restart the
| conversation/chat with a new message. I never go beyond one
| reply, and I also copy-paste a bunch. I got tired of copy-
| pasting, so I wrote something like a prompting manager
| (https://github.com/victorb/prompta) to make it easier and
| avoid having to neatly format code blocks and so on.
|
| Basically make one message, if they get the reply wrong,
| iterate on the prompt itself and start fresh, always. Don't
| try to correct by adding another message, but update initial
| prompt to make it clearer/steer more.
|
| But I've noticed that every model degrades really quickly
| past the initial reply, no matter the length of each
| individual message. The companies seem to continue to
| increase the theoretical and practical context limits, but
| the quality degrades a lot faster even within the context
| limits, and they don't seem to try to address that (nor have
| a way of measuring it).
| mr_toad wrote:
| > If I mentioned that the code wouldn't compile it would
| start suggesting very implausible scenarios
|
| I have to chuckle at that because it reminds me of a typical
| response on technical forums _long_ before LLMs were
| invented.
|
| Maybe the LLM has actually learned from those responses and
| is imitating them.
| -__---____-ZXyw wrote:
| It seems no discussion of LLMs on HN these days is complete
| without a commenter wryly observing how that one specific
| issue someone is pointing to with an LLM _is also_ ,
| funnily enough, an issue they've seen with humans. The
| implication always seems to be that this somehow bolsters
| the idea that LLMs are therefore in some sense and to some
| degree human-like.
|
| Humans not being infallible superintelligences does not
| mean that the thing that LLMs are doing is the same thing
| we do when we think, create, reason, etc. I would like to
| imagine that most serious people who use LLMs know this,
| but sometimes it's hard to be sure.
|
| Is there a name for the "humans stupid --> LLMs smart"
| fallacy?
| owebmaster wrote:
| > the idea that LLMs are therefore in some sense and to
| some degree human-like.
|
| This is 100% true, isn't it? It is based on the corpus of
| humankind's knowledge and interaction, so it is only to be
| expected that it would "repeat" human patterns. It also
| makes sense that the way to evolve the results we get from
| it is to mimic human organization, politics, and sociology
| in a new layer on top of LLMs to surpass current
| bottlenecks, just as they were used to evolve human
| societies.
| surgical_fire wrote:
| This has been my experience with any LLM I use as a code
| assistant. Currently I mostly use Claude 3.5, although I
| sometimes use Deepseek or Gemini.
|
| The more prominent and widely used a
| language/library/framework, and the more "common" what you are
| attempting, the more accurate LLMs tend to be. The more you
| deviate from mainstream paths, the more you will hit such
| problems.
|
| Which is why I find them most useful for helping me build
| things when I am very familiar with the subject matter, because
| at that point I can quickly spot misconceptions, errors, bugs,
| etc.
|
| That's when it hits the sweet spot of being a productivity
| tool, really improving the speed with which I write code (and
| sometimes improving the quality of what I write, by
| incorporating good practices I was unaware of).
| steveklabnik wrote:
| > The more prominent and widely used a
| language/library/framework, and the more "common" what you
| are attempting, the more accurate LLMs tend to be. The more
| you deviate from mainstream paths, the more you will hit such
| problems.
|
| One very interesting variant of this: I've been experimenting
| with LLMs in a react-router-based project. There's an
| interesting development history here: there was another
| project called Remix, and later versions of react-router
| effectively ate it, so as of December of last year,
| react-router 7 is effectively also Remix v3:
| https://remix.run/blog/merging-remix-and-react-router
|
| Sometimes, the LLM will be like "oh, I didn't realize you
| were using remix" and start importing from it, when I in fact
| want the same imports, but from react-router.
|
| All of this happened so recently, it doesn't surprise me that
| it's a bit wonky at this, but it's also kind of amusing.
| zoogeny wrote:
| In addition to choosing languages, patterns and frameworks
| that the LLM is likely to be well trained in, I also just ask
| it how it wants to do things.
|
| For example, I don't like ORMs. There are reasons which
| aren't super important but I tend to prefer SQL directly or a
| simple query builder pattern. But I did a chain of messages
| with LLMs asking which would be better for LLM-based
| development. The LLM made a compelling case as to why an ORM
| with a schema that generated a typed client would be better
| if I expected LLM coding agents to write a significant amount
| of the business logic that accessed the DB.
|
| My dislike of ORMs is something I hold lightly. If I was
| writing 100% of the code myself then I would have breezed
| past that decision. But with the agentic code assistants as
| my partners, I can make decisions that make their job easier
| from their point of view.
| beepbooptheory wrote:
| How much money do you spend a day working like this?
| steveklabnik wrote:
| I haven't spent many days or full days, but when I've toyed
| with this, it ends up at about $10/hour or maybe a bit less.
| zoogeny wrote:
| For several moments in the article I had to struggle to
| continue. He is literally saying "as an experienced LLM user I
| have no experience with the latest tools". He gives a rationale
| as to why he hasn't used the latest tools which is basically
| that he doesn't believe they will help and doesn't want to pay
| the cost to find out.
|
| I think if you are going to claim you have an opinion based on
| experience you should probably, at the least, experience the
| thing you are trying to state your opinion on. It's probably
| not enough to imagine the experience you would have and then go
| with that.
| satvikpendem wrote:
| Cursor also can read and store documentation so it's always up
| to date [0]. Surprised that many people I talk to about Cursor
| don't know about this, it's one of its biggest strengths
| compared to other tools.
|
| [0] https://docs.cursor.com/context/@-symbols/@-docs
| janalsncm wrote:
| There's an argument that library authors should consider
| implementing those hallucinated functions, not because it'll be
| easier for LLMs but because the hallucination is a statement
| about what an average user might expect to be there.
|
| I really dislike libraries that have their own bespoke ways of
| doing things for no especially good reason. Don't try to be
| cute. I don't want to remember your specific API, I want an
| intuitive API so I spend less time looking up syntax and more
| time solving the actual problem.
| dayvigo wrote:
| There's also an argument that developers of new software,
| including libraries, should consider making an earnest
| attempt to do The Right Thing instead of re-implementing old,
| flawed designs and APIs for familiarity's sake. We have
| enough regression to the mean already.
|
| The more LLMs are entrenched and required, the less we're
| able to do The Right Thing in the future. Time will be
| frozen, and we'll be stuck with the current mean forever.
| LLMs are notoriously bad at understanding anything that isn't
| mappable in some way to pre-existing constructs.
|
| > for no especially good reason
|
| That's a major qualifier.
| lxe wrote:
| This article reads like "I'm not like other LLM users" tech
| writing. There are good points about when LLMs are actually
| useful vs. overhyped, but the contrarian framing undermines what
| could have been straightforward practical advice. The whole "I'm
| more discerning than everyone else" positioning gets tiresome in
| tech discussions, especially when the actual content is useful.
| minimaxir wrote:
| I was not explicitly intending to be contrarian, but
| unfortunately the contrarian framing is inevitable when the
| practical advice is counterintuitive and against modern norms.
| I was second-guessing publishing this article at all because "I
| don't use ChatGPT.com" and "I don't see a use for
| Agents/MCP/Vibe coding" are both statements that are
| potentially damaging to my career as an engineer, but there's
| no point in writing if I can't be honest.
|
| Part of the reason I've been blogging about LLMs for so long is
| that a lot of it is counterintuitive (which I find
| interesting!) and there's a lot of misinformation and
| suboptimal workflows that result from it.
| kayodelycaon wrote:
| Tone and word choice are actually the problem here. :)
|
| One example: "normal-person frontends" immediately makes the
| statement a judgement about people. You could have said
| regular, typical, or normal instead of "normal-person".
|
| Saying your coworkers often come to you to fix problems and
| your solutions almost always work can come off as saying
| you're more intelligent than your coworkers.
|
| The only context your readers have are the words you write.
| This makes communication a damned nuisance because nobody
| knows who you are and they only know about you from what they
| read.
| minimaxir wrote:
| That descriptor was intended more as a self-deprecating joke
| at my own expense, but your interpretation is fair.
| lxe wrote:
| Your defense of the contrarian framing feels like it's
| missing the point. What you're describing as
| "counterintuitive" is actually pretty standard for anyone
| who's been working deeply with LLMs for a while.
|
| Most experienced LLM users already know about temperature
| controls and API access - that's not some secret knowledge.
| Many use both the public vanilla frontends and specialized
| interfaces (various HF workflows, custom setups, sillytavern,
| oobabooga (rip), ollama, lmstudio, etc) depending on the
| task.
|
| Your dismissal of LLMs for writing comes across as someone
| who scratched the surface and gave up. There's an entire
| ecosystem of techniques for effectively using LLMs to assist
| writing without replacing it - from ideation to restructuring
| to getting unstuck on specific sections.
|
| Throughout the article, you seem to dismiss tools and
| approaches after only minimal exploration. The depth and
| nuance that would be evident to anyone who's been integrating
| these tools into their workflow for the past couple years is
| missing.
|
| Being honest about your experiences is valuable, but framing
| basic observations as contrarian insights isn't
| counterintuitive - it's just incomplete.
| dragonwriter wrote:
| > oobabooga (rip)
|
| Why (rip) here?
| lxe wrote:
| Ah, I had it confused with a1111. oobabooga is alive and
| well :)
| mattgreenrocks wrote:
| > "I don't use ChatGPT.com" and "I don't see a use for
| Agents/MCP/Vibe coding" are both statements that are
| potentially damaging to my career as an engineer
|
| This is unfortunate, though I don't blame you. Tech shouldn't
| be about blind faith in any particular orthodoxy.
| qoez wrote:
| I've tried it out a ton but the only thing I end up using it for
| these days is teaching me new things (which I largely implement
| myself; it can rarely one-shot it anyway), or occasionally to
| make short throwaway scripts for things like file handling or
| ffmpeg.
| danbrooks wrote:
| As a data scientist, this mirrors my experience. Prompt
| engineering is surprisingly important for getting the expected
| output - and LLM POCs have quick turnaround times.
| Snuggly73 wrote:
| Emmm... why has Claude 'improved' the code by setting SQLite to
| be threadsafe and then adding locks on every db operation? (You
| can argue that maybe the callbacks are invoked from multiple
| threads, but they are not thread safe themselves).
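|
| For readers who haven't seen that pattern, it's roughly this
| kind of belt-and-suspenders code (a hypothetical
| reconstruction, not the actual code from the article):
|
|     import sqlite3
|     import threading
|
|     # check_same_thread=False only disables the same-thread
|     # check so the connection can be shared across threads...
|     conn = sqlite3.connect("app.db", check_same_thread=False)
|     conn.execute("CREATE TABLE IF NOT EXISTS items (value TEXT)")
|     db_lock = threading.Lock()
|
|     def insert_row(value):
|         # ...and then every operation is serialized behind a
|         # lock anyway, which is largely pointless if the
|         # callbacks never actually run concurrently.
|         with db_lock:
|             conn.execute("INSERT INTO items VALUES (?)", (value,))
|             conn.commit()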
| dboreham wrote:
| Interns don't understand concurrency either.
| daxfohl wrote:
| But if you teach them the right way to do it today and have
| them fix it, they won't go and do it the wrong way again
| tomorrow and the next day and every day for the rest of the
| summer.
| chowells wrote:
| With concurrency, they still might get it wrong the rest of
| the summer. It's a hard topic. But at least they might
| learn they need to ask for feedback when they're doing
| something that looks similar to stuff that's previously
| caused problems.
| iambateman wrote:
| > "feed it the text of my mostly-complete blog post, and ask the
| LLM to pretend to be a cynical Hacker News commenter and write
| five distinct comments based on the blog post."
|
| It feels weird to write something positive here...given the
| context...but this is a great idea. ;)
| jaggederest wrote:
| This is the kind of task that, before LLMs, I simply wouldn't
| have done. Maybe if it was something really important I'd
| circulate it to a couple of friends for rough feedback, but
| mostly I would just let it fly. I think it's pretty
| revolutionary to be able to get some useful feedback in
| seconds, with a similar knock-on effect in the pull request
| review space.
|
| The other thing I find LLMs most useful for is work that is
| simply unbearably tedious. Literature reviews are the perfect
| example of this - Sure, I could go read 30-50 journal articles,
| some of which are relevant, and form an opinion. But my
| confidence level in letting the AI do it in 90 seconds is
| reasonable-ish (~60%+) and 60% confidence in 90 seconds is
| infinitely better than 0% confidence because I just didn't
| bother.
|
| A lot of the other highly hyped uses for LLMs I personally
| don't find that compelling - my favorite uses are mostly like a
| notebook that actually talks back, like the Young Lady's
| Illustrated Primer from Diamond Age.
| barbazoo wrote:
| > But my confidence level in letting the AI do it in 90
| seconds is reasonable-ish (~60%+) and 60% confidence in 90
| seconds is infinitely better than 0% confidence because I
| just didn't bother.
|
| So you got the 30 to 50 articles summarized by the LLM, now
| how do you know what 60% you can trust and what's
| hallucinated without reading it? It's hard for it to be usable
| at all unless you already know what is real and what is not.
| jaggederest wrote:
| So, generally how I use it is to get background for further
| research. So you're right, you do have to do further
| reading, but it's at the second tier - "now that I know
| roughly what I'm looking for, I have targets for further
| reading", rather than "How does any of this work and what
| are relevant articles"
| simonw wrote:
| > However, for more complex code questions particularly around
| less popular libraries which have fewer code examples scraped
| from Stack Overflow and GitHub, I am more cautious of the LLM's
| outputs.
|
| That's changed for me in the past couple of months. I've been
| using the ChatGPT interface to o3 and o4-mini for a bunch of code
| questions against more recent libraries and finding that they're
| surprisingly good at using their search tool to look up new
| details. Best version of that so far:
|
| "This code needs to be upgraded to the new recommended JavaScript
| library from Google. Figure out what that is and then look up
| enough documentation to port this code to it."
|
| This actually worked! https://simonwillison.net/2025/Apr/21/ai-
| assisted-search/#la...
|
| The other trick I've been using a lot is pasting the
| documentation or even the entire codebase of a new library
| directly into a long context model as part of my prompt. This
| works great for any library under about 50,000 tokens total -
| more than that and you usually have to manually select the most
| relevant pieces, though Gemini 2.5 Pro can crunch through
| hundreds of thousands of tokens pretty well without getting
| distracted.
|
| Here's an example of that from yesterday:
| https://simonwillison.net/2025/May/5/llm-video-frames/#how-i...
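|
| A minimal sketch of that kind of prompt assembly (the paths
| are placeholders, and the ~4 characters per token estimate is
| very rough):
|
|     from pathlib import Path
|
|     def build_prompt(library_dir, question, max_tokens=50_000):
|         # Concatenate every source file, with a header per
|         # file so the model can tell where each one starts.
|         parts = []
|         for path in sorted(Path(library_dir).rglob("*.py")):
|             parts.append(f"### {path}\n{path.read_text()}")
|         corpus = "\n\n".join(parts)
|
|         # Very rough token estimate: ~4 characters per token.
|         if len(corpus) / 4 > max_tokens:
|             raise ValueError("Too big: pick the most relevant "
|                              "files by hand instead.")
|         return corpus + "\n\n" + question
|
|     prompt = build_prompt(
|         "path/to/new_library",  # placeholder
|         "Using the library above, write a script that ...",
|     )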
| zoogeny wrote:
| I think they might have made a change to Cursor recently as
| well. A few times I've caught it using old APIs of popular
| libraries that have since been updated. Shout out to all the
| library developers who log deprecations and known incorrect
| usages; that has been a huge win with LLMs. In most cases I can
| paste the deprecation warning back into the agent and it will
| say "Oh, looks like that API changed in vX.Y.Z, we should be
| doing <other thing>, let me fix that ..."
|
| So it is capable of integrating new API usage; it just isn't
| part of the default "memory" of the LLM. Given how quickly JS
| libraries tend to change (even on the API side) that isn't
| ideal. And given that the typical JS server project has dozens
| of libs, including the most recent documentation for each is
| not really feasible. So for now, I am just looking out for
| runtime deprecation errors.
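|
| If you maintain a library, the deprecation messages that help
| the most here are the ones that spell out the replacement; a
| small sketch (the function names and version numbers are made
| up for illustration):
|
|     import warnings
|
|     def connect(url, timeout=30):
|         # The new API.
|         return {"url": url, "timeout": timeout}
|
|     def open_connection(url):
|         # The old API. The warning text names the replacement
|         # and the version it changed in, which is exactly what
|         # gets pasted back into the coding agent.
|         warnings.warn(
|             "open_connection() is deprecated since v2.0; use "
|             "connect(url, timeout=...) instead.",
|             DeprecationWarning,
|             stacklevel=2,
|         )
|         return connect(url)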
|
| But I give the LLM some slack here, because even if I was
| programming myself using a library I've used in the past, I'm
| likely to make the same mistake.
| satvikpendem wrote:
| You can just use @Docs [0] to import the correct
| documentation for your libraries.
|
| [0] https://docs.cursor.com/context/@-symbols/@-docs
| Beijinger wrote:
| "To that end, I never use ChatGPT.com or other normal-person
| frontends for accessing LLMs because they are harder to control.
| Instead, I typically access the backend UIs provided by each LLM
| service, which serve as a light wrapper over the API
| functionality which also makes it easy to port to code if
| necessary."
|
| How do you do this? Do you have to be on a paid plan for this?
| minimaxir wrote:
| If you log into the API backend, there is usually a link to the
| UI. For OpenAI/ChatGPT, it's
| https://platform.openai.com/playground
|
| This is independent of ChatGPT+. You do need to have a credit
| card attached but you only pay for your usage.
| diggan wrote:
| I think they're talking about the Sandbox/Playground/Editor
| thingy that almost all companies who expose APIs also offer to
| quickly try out API features. For OpenAI it's https://platform.
| openai.com/playground/prompts?models=gpt-4...., Anthropic has
| https://console.anthropic.com/workbench and so on.
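|
| Those sandboxes map almost one-to-one onto the underlying
| APIs, which is what makes the "port to code if necessary" part
| easy. A rough sketch for the Anthropic case (the model name
| and prompts here are placeholders):
|
|     import os
|     import requests
|
|     resp = requests.post(
|         "https://api.anthropic.com/v1/messages",
|         headers={
|             "x-api-key": os.environ["ANTHROPIC_API_KEY"],
|             "anthropic-version": "2023-06-01",
|         },
|         json={
|             "model": "claude-3-7-sonnet-latest",  # placeholder
|             "max_tokens": 1024,
|             "temperature": 0.0,  # same knob as in the workbench
|             "system": "Answer tersely.",
|             "messages": [{"role": "user", "content": "..."}],
|         },
|         timeout=120,
|     )
|     resp.raise_for_status()
|     print(resp.json()["content"][0]["text"])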
| morgengold wrote:
| ... but when I do, I let it write regex, SQL commands,
| simple/complex if else stuff, apply tailwind classes, feed it my
| console log errors, propose frontend designs ... and other little
| stuff. Saves brain power for the complex problems.
| gcp123 wrote:
| While I think the title is misleading/clickbaity (no surprise
| given the buzzfeed connection), I'll say that the substance of
| the article might be one of the most honest takes on LLMs I've
| seen from someone who actually works in the field. The author
| describes exactly how I use LLMs - strategically, for specific
| tasks where they add value, not as a replacement for actual
| thinking.
|
| What resonated most was the distinction between knowing when to
| force the square peg through the round hole vs. when precision
| matters. I've found LLMs incredibly useful for generating regex
| (who hasn't?) and solving specific coding problems with unusual
| constraints, but nearly useless for my data visualization work.
|
| The part about using Claude to generate simulated HN criticism of
| drafts is brilliant - getting perspective without the usual "this
| is amazing!" LLM nonsense. That's the kind of creative tool use
| that actually leverages what these models are good at.
|
| I'm skeptical about the author's optimism regarding open-source
| models though. While Qwen3 and DeepSeek are impressive, the
| infrastructure costs for running these at scale remain
| prohibitive for most use cases. The economics still don't work.
|
| What's refreshing is how the author avoids both the "AGI will
| replace us all" hysteria and the "LLMs are useless toys"
| dismissiveness. They're just tools - sometimes useful, sometimes
| not, always imperfect.
| xandrius wrote:
| Just on the point about prohibitive infrastructure costs at
| scale: why does it need to be at scale?
|
| Over a few years, we went from literally impossible to being
| able to run a 72B model locally on a laptop. Give it 5-10
| years and we might not need any infrastructure at all:
| everything served locally with switchable (and differently
| sized) open source models.
| geor9e wrote:
| >Ridiculous headline implying the existence of non-generative
| LLMs
|
| >Baited into clicking
|
| >Article about generative LLMs
|
| >It's a buzzfeed employee
| minimaxir wrote:
| The reason I added "generative" to the headline is that the
| post mentions an important use of text embedding models, which
| are indeed non-generative LLMs, and I did not want to start a
| semantics war by not explicitly specifying "generative LLMs."
| (The headline would flow better without it.)
| ziml77 wrote:
| I like that the author included the chat logs. I know there
| are a lot of times when people can't share them because
| they'd expose
| too much info, but I really think it's important when people make
| big claims about what they've gotten an LLM to do that they back
| it up.
| minimaxir wrote:
| That is a relatively new workflow for me since getting the
| logs out of the Claude UI is a more manual copy/paste process.
| I'm likely going to work on something to automate it a bit.
| simonw wrote:
| I use this:
|     llm -m claude-3.7-sonnet "prompt"
|     llm logs -c | pbcopy
|
| Then paste into a Gist. Gets me things like this: https://gis
| t.github.com/simonw/0a5337d1de7f77b36d488fdd7651b...
| fudged71 wrote:
| Isn't your Observable notebook more applicable to what he's
| talking about (scraping the Claude UI)?
| https://x.com/simonw/status/1821649481001267651
| minimaxir wrote:
| I'm interested in any approach to better organize LLM
| outputs offline, but it would probably be more pragmatic
| to use some sort of scripting tool.
___________________________________________________________________
(page generated 2025-05-05 23:00 UTC)