[HN Gopher] Focus and Context and LLMs
___________________________________________________________________
Focus and Context and LLMs
Author : tarasglek
Score : 57 points
Date : 2025-06-08 09:09 UTC (13 hours ago)
(HTM) web link (taras.glek.net)
(TXT) w3m dump (taras.glek.net)
| quantum_state wrote:
| Context is all you need :-)
| tarasglek wrote:
| Indeed, that was my original working title
| max2he wrote:
| bruh that's Google's original working title
| summarity wrote:
| I found the same in my personal work. I have o3 chats (as in
| OAI's Chat interface) that are so large they crash the site, yet
| o3 still responds without hallucination and can debug across 5k+
| LOC. I've used it for DSP code, to debug a subtle error in an
| 800+ LOC Nim macro that sat in a 4k+ LOC module (it found the
| bug), work on compute shaders for audio analysis, work on
| optimizing graphics programs and other algorithms. Once I "vibe
| coded" (I hate that term) a fun demo using a color management lib
| I wrote, which encoded the tape state for a brainfuck interpreter
| in the deltaE differences between adjacent cells. Using the same
| prompts replayed in Claude chat and others doesn't even get
| close. It's spooky.
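|
| (For the curious, the core trick as a minimal Python sketch -- the
| real demo used my own color management lib, so treat the names and
| the CIE76 formula below as a stand-in: each tape value becomes the
| deltaE between adjacent cell colors in Lab space.)
|
| import math
|
| def encode_tape(cells):
|     """Map brainfuck tape values (0-255) to Lab colors whose
|     adjacent CIE76 deltaE equals each value (gamut ignored)."""
|     colors = [(50.0, 0.0, 0.0)]  # start at a neutral Lab grey
|     angle = 0.0
|     for v in cells:
|         L, a, b = colors[-1]
|         # step a distance of exactly v in the (a, b) plane
|         colors.append((L, a + v * math.cos(angle),
|                           b + v * math.sin(angle)))
|         angle += 2.399963  # golden angle, just to spread the walk
|     return colors
|
| def decode_tape(colors):
|     """Recover tape values as CIE76 deltaE of adjacent colors."""
|     return [round(math.dist(c1, c2))
|             for c1, c2 in zip(colors, colors[1:])]
|
| tape = [0, 72, 101, 108, 108, 111]
| assert decode_tape(encode_tape(tape)) == tape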
|
| Yet when I use the Codex CLI, or agent mode in any IDE, it feels
| like o3 regresses to below GPT-3.5 performance. All recent agent-
| mode models seem completely overfitted to tool calling. The most
| laughable attempt is Mistral's devstral-small - allegedly the #1
| agent model, but going outside of scenarios you'd encounter in
| SWEbench & co, it completely falls apart.
|
| I notice this at work as well: the more tools you give any model
| (reasoning or not), the more confused it gets. But the
| alternative is to stuff massive context into the prompts, and
| that has no ROI. There's a fine line to be walked here, but no
| one is even close to it yet.
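|
| (Sketching what walking that line might look like, purely as an
| illustration -- the tool names and the naive keyword score below
| are made up, not from any real system: hand the model only the
| few tool schemas that look relevant to the request, instead of
| the whole catalog or a giant stuffed prompt.)
|
| # Hypothetical tool catalog; real entries would be JSON schemas.
| TOOLS = {
|     "read_file":  "read a file from the repo",
|     "run_tests":  "run the project's test suite",
|     "grep_code":  "search the codebase for a pattern",
|     "edit_file":  "apply a patch to a file",
|     "web_search": "search the web",
|     "render_chart": "render a chart from tabular data",
| }
|
| def select_tools(user_request: str, limit: int = 3) -> list[str]:
|     """Naive relevance score: shared words between the request
|     and each tool description; keep only the top few tools."""
|     words = set(user_request.lower().split())
|     ranked = sorted(
|         TOOLS,
|         key=lambda name: len(words & set(TOOLS[name].split())),
|         reverse=True,
|     )
|     return ranked[:limit]
|
| # Only these few schemas go into the prompt, not all of them.
| print(select_tools("find where the flexbox bug is in the codebase"))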
| __mharrison__ wrote:
| Building complex software is certainly possible with no coding
| and minimal prompting.
|
| This YT video (from 2 days ago) demonstrates it
| https://youtu.be/fQL1A4WkuJk?si=7alp3O7uCHY7JB16
|
| The author builds a drawing app in an hour.
| emorning3 wrote:
| The article summed itself up as "Context is everything".
|
| But the article itself also makes the point that a human
| assistant was necessary. That's gonna be my takeaway.
| spmurrayzzz wrote:
| I agree. And the real lede was buried here IMO:
|
| > This is the single most impressive code-gen project I've seen
| so far. I did not think this was possible yet.
|
| To earn that sort of acclaim, a human had to build an embedded
| programming language from scratch. And
| even with all that effort, the agent itself took $631 and 119
| hours to complete the task. I actually don't think this is a
| knock on the idea at all, this is the direction I think most
| engineers should be thinking about.
|
| That agent-built HTTP/2 server they're referencing is
| apparently the only example of this sort of output they've seen
| to date. But if you're active in this particular space,
| especially on the open source side of the fence, this kind of
| work is everywhere. Since these projects don't manifest as
| super generic tooling that you can apply to broad task domains
| as a turnkey solution, they don't get much attention.
|
| I've continually held the line that if any given LLM agent
| platform works well for your use case and you haven't built
| said agent platform yourself, the underlying problem likely
| isn't that hard or complex. For the hard problems, you gotta do
| some first-principles engineering to make these tools work for
| you.
| artembugara wrote:
| What are some startups that help precisely with "feeding the LLM
| the right context" ?
| apwell23 wrote:
| cursor ?
| jsemrau wrote:
| Is that really a product? I think it should be solved through
| workflow and policies rather than handing it off to a 3rd-party
| provider. But I might be wrong.
|
| [1] https://jdsemrau.substack.com/p/memory-and-context
| Workaccount2 wrote:
| I don't know why software engineers think that LLM coding ability
| is purpose-made for them to use, and that because it sort of
| sucks at it, it's therefore useless...
|
| It's like listening to professional translators endlessly lament
| translation software and all its shortcomings and pitfalls,
| while totally missing that the software is primarily used by
| property managers wanting to ask the landscapers to cut the
| grass lower.
|
| LLMs are _excellent_ at writing code for people who have no idea
| what a programming language is, but a good idea of what computers
| can do when someone can speak this code language to them. I
| don't need an LLM to one-shot Excel.exe so I can track the number
| of members vs non-members who come to my community craft fair.
| Nevermark wrote:
| > LLMs are excellent at
|
| Writing hint: Your last paragraph stands well on its own.
| Especially if this is, in fact, your actual experience.
|
| Nothing in that paragraph requires the negativity or
| inaccuracies of the preceding two paragraphs.
|
| There should be a name for the human tendency (we have all
| done/do it) to weigh down good points with unnecessary and
| often inaccurate contrast/competition.
| jmward01 wrote:
| This is definitely the right problem to focus on. I think the
| answer is a different LLM structure that has unlimited context.
| Transformers trained with causal masks got us here, but they are
| now limiting us in massive ways.
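|
| (For anyone unfamiliar with the mechanism being blamed, a tiny
| numpy sketch of a causal mask -- illustrative only: each position
| may attend to itself and earlier positions, which ties training
| to a fixed-window, left-to-right view of context.)
|
| import numpy as np
|
| T = 5  # sequence length
| # Causal mask: position i may attend to positions j <= i only.
| mask = np.tril(np.ones((T, T), dtype=bool))
|
| scores = np.random.randn(T, T)            # raw attention scores
| scores = np.where(mask, scores, -np.inf)  # block future positions
| weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
| weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
|
| print(mask.astype(int))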
| briian wrote:
| The funny thing about vibe coding is that god-tier vibe coders
| think they're in DGAF mode. But people who are actually in DGAF
| mode and just say "Make Instagram for me" think they're
| god-tier.
|
| But agreed, there needs to be a better way for these agents to
| figure out what context to select. It doesn't seem like this
| will be too large an issue to solve, though?
| tptacek wrote:
| This article is knocking down a very expansive claim that most
| serious (ie: not vibe-coding) developers aren't making. Their
| point is that LLM agents have not yet reached the point where
| they can finish a complicated job end-to-end, and that if you
| want to do a _completely hands-off_ project, where only the LLM
| generates any code, it takes a lot of prompting effort to
| accomplish.
|
| This seems true, right now!
|
| But in building out stuff with LLMs, I don't expect (or want)
| them to do the job end-to-end. I've merged ~25 PRs into a
| project right now (out of ~40 PRs generated). For most merged
| PRs, I pulled them into Zed and cleaned _something_ up. At
| around PR #10 I went in
| and significantly restructured the code.
|
| The overall process has been much faster and more pleasant than
| writing from scratch, and, notably, did not involve me honing my
| LLM communications skills. The restructuring work I did was
| exactly the same kind of thing I do on all my projects; until
| you've got something working it's hard to see what the exact
| right shape is. I expect I'll do that 2-3 more times before the
| project is done.
|
| I feel like Kenton Varda was trying to make a point in the way
| they drove their LLM agent; the point of that project was in part
| to record the 2025 experience of doing something complicated end-
| to-end with an agent. That took some doing. But you don't have to
| do that to get a lot of acceleration from LLMs.
| ofjcihen wrote:
| It's almost like unrealistic expectations of LLMs, driven by
| those working for companies who have something to gain by
| labeling any skepticism as "crazy", do significant damage to
| our perception of their usefulness.
|
| Believe it or not I agree.
| threeseed wrote:
| The plural of anecdote is not data.
|
| Let's repeat this process for 100 coding examples and see how
| many it can complete "hands-off" especially where (a) it isn't
| a case of here is a spec and I need you to implement it and (b)
| it isn't for a a use for which there is already publicly
| available code.
|
| Otherwise your claim of "this seems true, right now!" is
| baseless.
| tptacek wrote:
| I can't tell if you're saying I'm being too generous towards
| LLMs or too skeptical.
| jumploops wrote:
| Has the author tried Claude Code?
|
| It's the first useful "agent" (LLM in a loop + tools) that I've
| tried.
|
| IME it is hard to explain why it's better than e.g. Aider or
| Cursor, but once you try it you'll migrate your workflow pretty
| quickly.
| padolsey wrote:
| How much transparency does Claude Code give you into what it's
| doing? I like IDE-integrated agents as they show diffs and
| allow focused prompting for specific areas of concern. And I
| get to control what's in context at any given time in a longer
| thread. I haven't tried Claude's thing in a while, but from
| what I gather it's more of a "prompt and pray" kind of agent...?
| _neil wrote:
| My experience is that you can be very targeted in your
| prompting with Claude Code and it mostly gets good results.
| You can also ask it early on to create a branch and create
| logical commits as it works. This way, you can examine
| smaller code changes later in a PR (or git log).
|
| Or if you want to work more manually, you could do the same
| but not allow full access to git commit. That way it will
| request access each time it's ready to commit and you can
| take that time to review diffs.
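|
| (Roughly what that commit-by-commit review loop looks like, as a
| small Python sketch -- the branch names are placeholders and the
| agent could just as well open a PR: walk the commits it made on
| its branch and inspect each diff in turn.)
|
| import subprocess
|
| def git(*args: str) -> str:
|     return subprocess.run(["git", *args], capture_output=True,
|                           text=True, check=True).stdout
|
| BASE, BRANCH = "main", "claude/feature-x"  # placeholder names
|
| # Commits the agent made on its branch, oldest first.
| commits = git("rev-list", "--reverse", f"{BASE}..{BRANCH}").split()
|
| for sha in commits:
|     print(git("show", "--stat", "--oneline", sha))  # summary
|     if input("show full diff? [y/N] ").lower() == "y":
|         print(git("show", sha))  # the actual changes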
| apwell23 wrote:
| > IME it is hard to explain why it's better than e.g. Aider or
| Cursor
|
| I have Cursor through work, but I am tempted to shell out $100
| because of this hype.
|
| Is it better than using Claude models in Cursor?
| troupo wrote:
| It can get surprisingly dumb surprisingly fast.
|
| Today I spent easily half an hour trying to make it solve a
| layout issue it itself introduced when porting a component.
|
| It was a complex port that it executed perfectly. And then it
| completely failed to even create a simple wrapper that fixed a
| flexbox issue.
|
| BTW. Claude (Code and Cursor) is over-indexed on "let's
| randomly add and remove h-full/overflow-auto and pretend it
| works ad infinitum"
| apwell23 wrote:
| > And then it completely failed to even create a simple
| wrapper that fixed a flexbox issue.
|
| yea this is the problem with vibe coding. it's hard to
| understand and keep tabs on the nitty-gritty when stuff is
| being generated for you. No matter how much you 'review' it,
| it just doesn't stick in the same way as when you write the
| code yourself. You are really screwed if you have to debug
| something that the LLM throws its hands up on.
| bob1029 wrote:
| I've found that CSS is among the more terrible things for an
| LLM to work on.
|
| It's definitely on point with some strategic layout items,
| flexbox, etc., but when it comes to anything like colors,
| margins, padding, typeface, borders, etc., you might as well
| be throwing darts into the void.
| mathgeek wrote:
| > In the meantime continue expecting mediocre results from
| mediocre people feeding LLMs mediocre context.
|
| I can't even with the ego here. The best teachers practice
| humility.
___________________________________________________________________
(page generated 2025-06-08 23:00 UTC)