[HN Gopher] Focus and Context and LLMs
___________________________________________________________________
Focus and Context and LLMs
Author : tarasglek
Score : 57 points
Date : 2025-06-08 09:09 UTC (13 hours ago)
(HTM) web link (taras.glek.net)
(TXT) w3m dump (taras.glek.net)
| quantum_state wrote:
| Context is all you need :-)
| tarasglek wrote:
| Indeed, that was my original working title
| max2he wrote:
| bruh that's Google's original working title
| summarity wrote:
| I found the same in my personal work. I have o3 chats (as in
| OAI's Chat interface) that are so large they crash the site, yet
| o3 still responds without hallucination and can debug across 5k+
| LOC. I've used it for DSP code, to debug a subtle error in an
| 800+ LOC Nim macro that sat in a 4k+ LOC module (it found the
| bug), work on compute shaders for audio analysis, work on
| optimizing graphics programs and other algorithms. Once I "vibe
| coded" (I hate that term) a fun demo using a color management lib
| I wrote, which encoded the tape state for a brainfuck interpreter
| in the deltaE differences between adjacent cells. Using the same
| prompts replayed in Claude chat and others doesn't even get
| close. It's spooky.
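|
| (For the curious, the core trick as a minimal Python sketch -- the
| real demo used my own color management lib, so treat the names and
| the CIE76 formula below as a stand-in: each tape value becomes the
| deltaE between adjacent cell colors in Lab space.)
|
| import math
|
| def encode_tape(cells):
|     """Map brainfuck tape values (0-255) to Lab colors whose
|     adjacent CIE76 deltaE equals each value (gamut ignored)."""
|     colors = [(50.0, 0.0, 0.0)]  # start at a neutral Lab grey
|     angle = 0.0
|     for v in cells:
|         L, a, b = colors[-1]
|         # step a distance of exactly v in the (a, b) plane
|         colors.append((L, a + v * math.cos(angle),
|                           b + v * math.sin(angle)))
|         angle += 2.399963  # golden angle, just to spread the walk
|     return colors
|
| def decode_tape(colors):
|     """Recover tape values as CIE76 deltaE of adjacent colors."""
|     return [round(math.dist(c1, c2))
|             for c1, c2 in zip(colors, colors[1:])]
|
| tape = [0, 72, 101, 108, 108, 111]
| assert decode_tape(encode_tape(tape)) == tape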
|
| Yet when I use the Codex CLI, or agent mode in any IDE, it feels
| like o3 regresses to below GPT-3.5 performance. All recent agent-
| mode models seem completely overfitted to tool calling. The most
| laughable attempt is Mistral's devstral-small - allegedly the #1
| agent model, but going outside of scenarios you'd encounter in
| SWEbench & co, it completely falls apart.
|
| I notice this at work as well: the more tools you give any model
| (reasoning or not), the more confused it gets. But the
| alternative is to stuff massive context into the prompts, and
| that has no ROI. There's a fine line to be walked here, but no
| one is even close to it yet.
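|
| (Sketching what walking that line might look like, purely as an
| illustration -- the tool names and the naive keyword score below
| are made up, not from any real system: hand the model only the
| few tool schemas that look relevant to the request, instead of
| the whole catalog or a giant stuffed prompt.)
|
| # Hypothetical tool catalog; real entries would be JSON schemas.
| TOOLS = {
|     "read_file":  "read a file from the repo",
|     "run_tests":  "run the project's test suite",
|     "grep_code":  "search the codebase for a pattern",
|     "edit_file":  "apply a patch to a file",
|     "web_search": "search the web",
|     "render_chart": "render a chart from tabular data",
| }
|
| def select_tools(user_request: str, limit: int = 3) -> list[str]:
|     """Naive relevance score: shared words between the request
|     and each tool description; keep only the top few tools."""
|     words = set(user_request.lower().split())
|     ranked = sorted(
|         TOOLS,
|         key=lambda name: len(words & set(TOOLS[name].split())),
|         reverse=True,
|     )
|     return ranked[:limit]
|
| # Only these few schemas go into the prompt, not all of them.
| print(select_tools("find where the flexbox bug is in the codebase"))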
| __mharrison__ wrote:
| Building complex software is certainly possible with no coding
| and minimal prompting.
|
| This YT video (from 2 days ago) demonstrates it
| https://youtu.be/fQL1A4WkuJk?si=7alp3O7uCHY7JB16
|
| The author builds a drawing app in an hour.
| emorning3 wrote:
| The article summed itself up as "Context is everything".
|
| But the article itself also makes the point that a human
| assistant was necessary. That's gonna be my takeaway.
| spmurrayzzz wrote:
| I agree. And the real lede was buried here IMO:
|
| > This is the single most impressive code-gen project I've seen
| so far. I did not think this was possible yet.
|
| To earn that sort of acclaim, a human had to build an embedded
| programming language from scratch. And
| even with all that effort, the agent itself took $631 and 119
| hours to complete the task. I actually don't think this is a
| knock on the idea at all, this is the direction I think most
| engineers should be thinking about.
|
| That agent-built HTTP/2 server they're referencing is
| apparently the only example of this sort of output they've seen
| to date. But if you're active in this particular space,
| especially on the open source side of the fence, this kind of
| work is everywhere. Since these projects don't manifest as
| super generic tooling that you can apply to broad task domains
| as a turnkey solution, they don't get much attention.
|
| I've continually held the line that if any given LLM agent
| platform works well for your use case and you haven't built
| said agent platform yourself, the underlying problem likely
| isn't that hard or complex. For the hard problems, you gotta do
| some first-principles engineering to make these tools work for
| you.
| artembugara wrote:
| What are some startups that help precisely with "feeding the LLM
| the right context" ?
| apwell23 wrote:
| cursor ?
| jsemrau wrote:
| Is that really a product? I think it should be solved through
| workflow and policies rather than handing it off to a 3rd-party
| provider. But I might be wrong.
|
| [1] https://jdsemrau.substack.com/p/memory-and-context
| Workaccount2 wrote:
| I don't know why software engineers think that LLM coding ability
| is purpose-made for them to use, and that because it sort of
| sucks at it, it's therefore useless...
|
| It's like listening to professional translators endlessly lament
| translation software and all its shortcomings and pitfalls,
| while totally missing that the software is primarily used by
| property managers wanting to ask the landscapers to cut the
| grass lower.
|
| LLMs are _excellent_ at writing code for people who have no idea
| what a programming language is, but a good idea of what computers
| can do when someone can speak this code language to them. I
| don't need an LLM to one-shot Excel.exe so I can track the number
| of members vs non-members who come to my community craft fair.
| Nevermark wrote:
| > LLMs are excellent at
|
| Writing hint: Your last paragraph stands well on its own.
| Especially if this is, in fact, your actual experience.
|
| Nothing in that paragraph requires the negativity or
| inaccuracies of the preceding two paragraphs.
|
| There should be a name for the human tendency (we have all
| done/do it) to weigh down good points with unnecessary and
| often inaccurate contrast/competition.
| jmward01 wrote:
| This is definitely the right problem to focus on. I think the
| answer is a different LLM structure that has unlimited context.
| Transformers trained with causal masks got us here, but they are
| now limiting us in massive ways.
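|
| (For anyone unfamiliar with the mechanism being blamed, a tiny
| numpy sketch of a causal mask -- illustrative only: each position
| may attend to itself and earlier positions, which ties training
| to a fixed-window, left-to-right view of context.)
|
| import numpy as np
|
| T = 5  # sequence length
| # Causal mask: position i may attend to positions j <= i only.
| mask = np.tril(np.ones((T, T), dtype=bool))
|
| scores = np.random.randn(T, T)            # raw attention scores
| scores = np.where(mask, scores, -np.inf)  # block future positions
| weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
| weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
|
| print(mask.astype(int))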
| briian wrote:
| The funny thing about vibe coding is that god-tier vibe coders
| think they're in DGAF mode. But people who are actually in DGAF
| mode and just say "Make Instagram for me" think they're
| god-tier.
|
| But agreed, there needs to be a better way for these agents to
| figure out what context to select. It doesn't seem like this
| will be too large an issue to solve, though?
| tptacek wrote:
| This article is knocking down a very expansive claim that most
| serious (ie: not vibe-coding) developers aren't making. Their
| point is that LLM agents have not yet reached the point where
| they can finish a complicated job end-to-end, and that if you
| want to do a _completely hands-off_ project, where only the LLM
| generates any code, it takes a lot of prompting effort to
| accomplish.
|
| This seems true, right now!
|
| But in building out stuff with LLMs, I don't expect (or want)
| them to do the job end-to-end. I've merged ~25 PRs into a
| project right now (out of ~40 PRs generated). For most merged
| PRs, I pulled them into Zed and cleaned _something_ up. At
| around PR #10 I went in
| and significantly restructured the code.
|
| The overall process has been much faster and more pleasant than
| writing from scratch, and, notably, did not involve me honing my
| LLM communications skills. The restructuring work I did was
| exactly the same kind of thing I do on all my projects; until
| you've got something working it's hard to see what the exact
| right shape is. I expect I'll do that 2-3 more times before the
| project is done.
|
| I feel like Kenton Varda was trying to make a point in the way
| they drove their LLM agent; the point of that project was in part
| to record the 2025 experience of doing something complicated end-
| to-end with an agent. That took some doing. But you don't have to
| do that to get a lot of acceleration from LLMs.
| ofjcihen wrote:
| It's almost like unrealistic expectations of LLMs, driven by
| those working for companies who have something to gain by
| labeling any skepticism as "crazy", do significant damage to
| our perception of their usefulness.
|
| Believe it or not I agree.
| threeseed wrote:
| The plural of anecdote is not data.
|
| Let's repeat this process for 100 coding examples and see how
| many it can complete "hands-off" especially where (a) it isn't
| a case of here is a spec and I need you to implement it and (b)
| it isn't for a a use for which there is already publicly
| available code.
|
| Otherwise your claim of "this seems true, right now!" is
| baseless.
| tptacek wrote:
| I can't tell if you're saying I'm being too generous towards
| LLMs or too skeptical.
| jumploops wrote:
| Has the author tried Claude Code?
|
| It's the first useful "agent" (LLM in a loop + tools) that I've
| tried.
|
| IME it is hard to explain why it's better than e.g. Aider or
| Cursor, but once you try it you'll migrate your workflow pretty
| quickly.
| padolsey wrote:
| How much transparency does Claude Code give you into what it's
| doing? I like IDE-integrated agents as they show diffs and
| allow focused prompting for specific areas of concern. And I
| get to control what's in context at any given time in a longer
| thread. I haven't tried Claude's thing in a while, but from
| what I gather it's more of a "prompt and pray" kind of agent...?
| _neil wrote:
| My experience is that you can be very targeted in your
| prompting with Claude Code and it mostly gets good results.
| You can also ask it early on to create a branch and create
| logical commits as it works. This way, you can examine
| smaller code changes later in a PR (or git log).
|
| Or if you want to work more manually, you could do the same
| but not allow full access to git commit. That way it will
| request access each time it's ready to commit and you can
| take that time to review diffs.
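|
| (Roughly what that commit-by-commit review loop looks like, as a
| small Python sketch -- the branch names are placeholders and the
| agent could just as well open a PR: walk the commits it made on
| its branch and inspect each diff in turn.)
|
| import subprocess
|
| def git(*args: str) -> str:
|     return subprocess.run(["git", *args], capture_output=True,
|                           text=True, check=True).stdout
|
| BASE, BRANCH = "main", "claude/feature-x"  # placeholder names
|
| # Commits the agent made on its branch, oldest first.
| commits = git("rev-list", "--reverse", f"{BASE}..{BRANCH}").split()
|
| for sha in commits:
|     print(git("show", "--stat", "--oneline", sha))  # summary
|     if input("show full diff? [y/N] ").lower() == "y":
|         print(git("show", sha))  # the actual changes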
| apwell23 wrote:
| > IME it is hard to explain why it's better than e.g. Aider or
| Cursor
|
| I have Cursor through work, but I am tempted to shell out $100
| because of this hype.
|
| Is it better than using Claude models in Cursor?
| troupo wrote:
| It can get surprisingly dumb surprisingly fast.
|
| Today I spent easily half an hour trying to make it solve a
| layout issue it itself introduced when porting a component.
|
| It was a complex port that it executed perfectly. And then it
| completely failed to even create a simple wrapper that fixed a
| flexbox issue.
|
| BTW. Claude (Code and Cursor) is over-indexed on "let's
| randomly add and remove h-full/overflow-auto and pretend it
| works ad infinitum"
| apwell23 wrote:
| > And then it completely failed to even create a simple
| wrapper that fixed a flexbox issue.
|
| yea this is the problem with vibe coding. it's hard to
| understand and keep tabs on the nitty-gritty when stuff is
| being generated for you. No matter how much you 'review' it,
| it just doesn't stick in the same way as when you write the
| code yourself. You are really screwed if you have to debug
| something that the LLM throws its hands up on.
| bob1029 wrote:
| I've found that CSS is among the more terrible things for an
| LLM to work on.
|
| It's definitely on point with some strategic layout items,
| flexbox, etc., but when it comes to anything like colors,
| margins, padding, typeface, borders, etc., you might as well
| be throwing darts into the void.
| mathgeek wrote:
| > In the meantime continue expecting mediocre results from
| mediocre people feeding LLMs mediocre context.
|
| I can't even with the ego here. The best teachers practice
| humility.
___________________________________________________________________
(page generated 2025-06-08 23:00 UTC)