[HN Gopher] The New Skill in AI Is Not Prompting, It's Context E...
       ___________________________________________________________________
        
       The New Skill in AI Is Not Prompting, It's Context Engineering
        
       Author : robotswantdata
       Score  : 219 points
       Date   : 2025-06-30 20:53 UTC (2 hours ago)
        
 (HTM) web link (www.philschmid.de)
 (TXT) w3m dump (www.philschmid.de)
        
       | baxtr wrote:
       | > _Conclusion
       | 
       | Building powerful and reliable AI Agents is becoming less about
       | finding a magic prompt or model updates. It is about the
       | engineering of context and providing the right information and
       | tools, in the right format, at the right time. It's a cross-
       | functional challenge that involves understanding your business
       | use case, defining your outputs, and structuring all the
       | necessary information so that an LLM can "accomplish the task."_
       | 
        | That's actually also true for humans: the more context (aka the
        | right info at the right time) you provide, the better they are
        | at solving tasks.
        
         | QuercusMax wrote:
         | Yeah... I'm always asking my UX and product folks for mocks,
         | requirements, acceptance criteria, sample inputs and outputs,
         | why we care about this feature, etc.
         | 
         | Until we can scan your brain and figure out what you really
         | want, it's going to be necessary to actually describe what you
         | want built, and not just rely on vibes.
        
         | lupire wrote:
         | Not "more" context. "Better" context.
         | 
         | (X-Y problem, for example.)
        
         | root_axis wrote:
         | I am not a fan of this banal trend of superficially comparing
         | aspects of machine learning to humans. It doesn't provide any
         | insight and is hardly ever accurate.
        
           | ModernMech wrote:
           | I agree, however I _do_ appreciate comparisons to other
           | human-made systems. For example,  "providing the right
           | information and tools, in the right format, at the right
           | time" sounds a lot like a bureaucracy, particularly because
           | "right" is decided for you, it's left undefined, and may
           | change at any time with no warning or recourse.
        
           | furyofantares wrote:
           | [delayed]
        
         | mentalgear wrote:
         | Basically, finding the right buttons to push within the
         | constraints of the environment. Not so much different from what
         | (SW) engineering is, only non-deterministic in the outcomes.
        
       | simonw wrote:
       | I wrote a bit about this the other day:
       | https://simonwillison.net/2025/Jun/27/context-engineering/
       | 
       | Drew Breunig has been doing some _fantastic_ writing on this
       | subject - coincidentally at the same time as the  "context
       | engineering" buzzword appeared but actually unrelated to that
       | meme.
       | 
       | How Long Contexts Fail - https://www.dbreunig.com/2025/06/22/how-
       | contexts-fail-and-ho... - talks about the various ways in which
       | longer contexts can start causing problems (also known as
       | "context rot")
       | 
       | How to Fix Your Context -
       | https://www.dbreunig.com/2025/06/26/how-to-fix-your-context.... -
       | gives names to a bunch of techniques for working around these
       | problems including Tool Loadout, Context Quarantine, Context
       | Pruning, Context Summarization, and Context Offloading.
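        | 
        | As a rough sketch of what "Tool Loadout" can look like in
        | practice (my own toy illustration in Python, not code from
        | Drew's post): score each tool description against the query and
        | only put the top few definitions into the model's context.
        | 
        |     # Toy "Tool Loadout": send only the most relevant tool
        |     # definitions to the model instead of the whole catalog.
        |     TOOLS = [
        |         {"name": "search_orders",
        |          "description": "look up customer orders by id or email"},
        |         {"name": "refund_payment",
        |          "description": "issue a refund for a payment"},
        |         {"name": "send_email",
        |          "description": "send an email to a customer"},
        |         {"name": "query_metrics",
        |          "description": "run aggregate queries over usage metrics"},
        |     ]
        | 
        |     def score(query: str, description: str) -> int:
        |         # Crude relevance score: count overlapping words. A real
        |         # system would use embeddings or a retrieval index here.
        |         return len(set(query.lower().split())
        |                    & set(description.lower().split()))
        | 
        |     def loadout(query, tools=TOOLS, top_n=2):
        |         # Only these selected definitions go into the context.
        |         ranked = sorted(tools, reverse=True,
        |                         key=lambda t: score(query, t["description"]))
        |         return ranked[:top_n]
        | 
        |     print(loadout("refund the payment for order 1234"))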
        
         | the_mitsuhiko wrote:
         | Drew Breunig's posts are a must read on this. This is not only
         | important for writing your own agents, it is also critical when
         | using agentic coding right now. These limitations/behaviors
         | will be with us for a while.
        
           | outofpaper wrote:
           | They might be good reads on the topic but Drew makes some
           | significant etymological mistakes. For example loadout
           | doesn't come from gaming but military terminology. It's
           | essentially the same as kit or gear.
        
             | ZYbCRq22HbJ2y7 wrote:
             | > They might be good reads on the topic but Drew makes some
             | significant etymological mistakes. For example loadout
             | doesn't come from gaming but military terminology. It's
             | essentially the same as kit or gear.
             | 
             | Doesn't seem that significant?
             | 
              | Not that those blog posts say much that any "prompt
              | engineer" (someone who uses LLMs frequently) doesn't
              | already know, but maybe it is useful to some at such an
              | early stage of _these things_.
        
             | simonw wrote:
             | Drew isn't using that term in a military context, he's
             | using it in a gaming context. He defines what he means very
             | clearly:
             | 
             | > _The term "loadout" is a gaming term that refers to the
             | specific combination of abilities, weapons, and equipment
             | you select before a level, match, or round._
             | 
             | In the military you don't select your abilities before
             | entering a level.
        
             | DiggyJohnson wrote:
              | This seems like a rather unimportant type of mistake,
              | especially because the definition is still accurate; it's
              | just that the etymology isn't complete.
        
         | storus wrote:
          | Those issues are considered artifacts of the current crop of
          | LLMs in academic circles; there is already research allowing
          | LLMs to use millions of different tools at the same time, and
          | stable long contexts, likely reducing the number of agents to
          | one for most use cases outside interfacing different providers.
          | 
          | Anyone basing their future agentic systems on current LLMs
          | would likely face LangChain's fate - built for GPT-3, made
          | obsolete by GPT-3.5.
        
           | simonw wrote:
            | Can you link to the research on millions of different tools
            | and stable long contexts? I haven't come across that yet.
        
             | storus wrote:
             | You can look at AnyTool, 2024 (16,000 tools) and start
             | looking at newer research from there.
             | 
             | https://arxiv.org/abs/2402.04253
             | 
             | For long contexts start with activation beacons and RoPE
             | scaling.
        
               | simonw wrote:
               | I would classify AnyTool as a context engineering trick.
               | It's using GPT-4 function calls (what we would call tool
               | calls today) to find the best tools for the current job
               | based on a 3-level hierarchy search.
               | 
               | Drew calls that one "Tool Loadout"
               | https://www.dbreunig.com/2025/06/26/how-to-fix-your-
               | context....
        
           | Foreignborn wrote:
           | yes, but those aren't released and even then you'll always
           | need glue code.
           | 
           | you just need to knowingly resource what glue code is needed,
           | and build it in a way it can scale with whatever new limits
           | that upgraded models give you.
           | 
           | i can't imagine a world where people aren't building products
           | that try to overcome the limitations of SOTA models
        
             | storus wrote:
             | My point is that newer models will have those baked in, so
             | instead of supporting ~30 tools before falling apart they
             | will reliably support 10,000 tools defined in their
             | context. That alone would dramatically change the need for
             | more than one agent in most cases as the architectural
             | split into multiple agents is often driven by the inability
             | to reliably run many tools within a single agent. Now you
              | can hack around it today by turning tools on/off depending
              | on the agent's state, but at some point in the future you
              | may not need to bother: just dump all your tools into a
              | long stable context, maybe cache it for performance, and
              | that will be it.
        
               | ZYbCRq22HbJ2y7 wrote:
               | There will likely be custom, large, and expensive models
               | at an enterprise level in the near future (some large
               | entities and governments already have them (niprgpt)).
               | 
               | With that in mind, what would be the business sense in
               | siloing a single "Agent" instead of using something like
               | a service discovery service that all benefit from?
        
               | storus wrote:
                | My guess is the main issue is latency and accuracy; a
                | single agent without all the routing/evaluation sub-agents
                | around it (which introduce cumulative errors, lead to
                | infinite loops and slow it down) would likely be much
                | faster and more accurate, and could be cached at the
                | token level on a GPU, reducing token preprocessing time
                | further. Now different companies would run different
                | "monorepo" agents and those would need something like MCP
                | to talk to each other at the business boundary, but
                | internally all this won't be necessary.
                | 
                | Also, current LLMs still have too many issues because
                | they are autoregressive and heavily biased towards the
                | first few generated tokens. They also still don't have
                | full bidirectional awareness of certain relationships due
                | to how they are masked during training. Discrete
                | diffusion looks interesting, but I am not sure how that
                | one deals with tools, as I've never seen a model from
                | that class using any tools.
        
           | ZYbCRq22HbJ2y7 wrote:
           | How would "a million different tool calls at the same time"
           | work? For instance, MCP is HTTP based, even at low latency in
           | incredibly parallel environments that would take forever.
        
           | dinvlad wrote:
           | > already research allowing LLMs to use millions of different
           | tools
           | 
           | Hmm first time hearing about this, could you share any
           | examples please?
        
             | simonw wrote:
             | See this comment
             | https://news.ycombinator.com/item?id=44428548
        
         | old_man_cato wrote:
         | First, you pay a human artist to draw a pelican on a bicycle.
         | 
         | Then, you provide that as "context".
         | 
         | Next, you prompt the model.
         | 
         | Voila!
        
           | d0gsg0w00f wrote:
           | This hits too close to home.
        
         | JoeOfTexas wrote:
          | So who will develop the first Logic Core that automates the
          | context engineer?
        
           | igravious wrote:
           | The first rule of automation: that which can be automated
           | _will_ be automated.
           | 
           | Observation: this isn't anything that can't be automated /
        
         | risyachka wrote:
         | "A month-long skill" after which it won't be a thing anymore,
         | like so many other.
        
           | simonw wrote:
           | Most of the LLM prompting skills I figured out ~three years
           | ago are still useful to me today. Even the ones that I've
           | dropped are useful because I know that things that used to be
           | helpful aren't helpful any more, which helps me build an
           | intuition for how the models have improved over time.
        
       | crystal_revenge wrote:
        | Definitely mirrors my experience. One heuristic I've often used
        | when providing context to the model is "is this enough
        | information for a human to solve this task?". Building some
        | text2SQL products in the past, it was very interesting to see
        | how often, when the model failed, a real data analyst would
        | reply something like "oh yea, that's an older table we don't use
        | any more, the correct table is...". This means the model was
        | likely making a mistake that a real human analyst would have
        | made without the proper context.
       | 
       | One thing that is _missing_ from this list is: _evaluations!_
       | 
       | I'm shocked how often I still see large AI projects being run
       | without any regard to evals. Evals are _more_ important for AI
       | projects than test suites are for traditional engineering ones.
        | You don't even need a big eval set, just one that covers your
        | problem surface reasonably well. However, without it you're
       | basically just "guessing" rather than iterating on your problem,
       | and you're not even guessing in a way where each guess is an
       | improvement on the last.
       | 
        | edit: To clarify, I ask _myself_ this question. It's frequently
       | the case that we expect LLMs to solve problems without the
       | necessary information for a _human_ to solve them.
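        | 
        | To make "you don't even need a big eval set" concrete, here's
        | roughly the kind of minimal harness I mean (the task and the
        | generate_sql() stub are hypothetical stand-ins for your own
        | model call):
        | 
        |     # Minimal eval sketch: a handful of cases covering the
        |     # problem surface, scored with a crude pass/fail check.
        |     EVAL_CASES = [
        |         {"q": "total revenue last month",
        |          "must_contain": ["orders", "sum"]},
        |         {"q": "top 5 customers by spend",
        |          "must_contain": ["limit 5"]},
        |         {"q": "daily active users in may",
        |          "must_contain": ["group by"]},
        |     ]
        | 
        |     def generate_sql(question: str) -> str:
        |         raise NotImplementedError("call your model here")
        | 
        |     def run_evals() -> float:
        |         passed = 0
        |         for case in EVAL_CASES:
        |             sql = generate_sql(case["q"]).lower()
        |             if all(frag in sql for frag in case["must_contain"]):
        |                 passed += 1
        |             else:
        |                 print("FAIL:", case["q"])
        |         return passed / len(EVAL_CASES)
        | 
        | Even a crude pass rate like this tells you whether a context
        | change made things better or worse, instead of guessing.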
        
         | kevin_thibedeau wrote:
          | Asking yes/no questions will get you a lie 50% of the time.
        
         | adriand wrote:
         | I have pretty good success with asking the model this question
         | before it starts working as well. I'll tell it to ask questions
         | about anything it's unsure of and to ask for examples of code
         | patterns that are in use in the application already that it can
         | use as a template.
        
         | hobs wrote:
          | The thing is, all the people cosplaying as data scientists
          | don't want evaluations, and that's why you saw so few of them
          | in fake C-level projects: telling people the emperor has no
          | clothes doesn't pay.
          | 
          | For those actually using the products to make money, well, hey
          | - all of those have evaluations.
        
       | bGl2YW5j wrote:
        | Saw this the other day and it made me think that too much effort
        | and credence are being given to this idea of crafting the
        | perfect environment for LLMs to thrive in. Which, to me, is
        | contrary to how powerful AI systems should function. We
        | shouldn't need to hold their hands so much.
       | 
       | Obviously we've got to tame the version of LLMs we've got now,
       | and this kind of thinking is a step in the right direction. What
       | I take issue with is the way this thinking is couched as a
       | revolutionary silver bullet.
        
         | gametorch wrote:
         | It's still way easier for me to say
         | 
         | "here's where to find the information to solve the task"
         | 
         | than for me to manually type out the code, 99% of the time
        
         | ramesh31 wrote:
          | We shouldn't, but it's analogous to how CPU usage used to
          | work. In the 8-bit days you could do some magical stuff that
          | was completely impossible before microcomputers existed. But
          | you had to have all kinds of tricks and heuristics to work
          | around the limited abilities. We're in the same place with
          | LLMs now. Some day we will have the equivalent of what
          | gigabytes of RAM are to a modern CPU now, but we're still
          | stuck in the 80s for now (which _was_ revolutionary at the
          | time).
        
           | smeej wrote:
           | It also reminds me of when you could structure an internet
           | search query and find exactly what you wanted. You just had
           | to ask it in the machine's language.
           | 
           | I hope the generalized future of this doesn't look like the
           | generalized future of that, though. Now it's darn near
           | impossible to find very specific things on the internet
           | because the search engines will ignore any "operators" you
           | try to use if they generate "too few" results (by which they
           | seem to mean "few enough that no one will pay for us to show
           | you an ad for this search"). I'm moderately afraid the
           | ability to get useful results out of AIs will be abstracted
           | away to some lowest common denominator of spammy garbage
           | people want to "consume" instead of _use_ for something.
        
             | skydhash wrote:
             | An empty set of results is a good signal just like a "I
             | don't know" or "You're wrong because <reason>" are good
             | replies to a question/query. It's how a program crashing,
             | while painful, is better than it corrupting data.
        
         | 4ndrewl wrote:
         | Reminds me of first gen chatbots where the user had to put in
         | the effort of trying to craft a phrase in a way that would
         | garner the expected result. It's a form of user-hostility.
        
         | aleksiy123 wrote:
          | It may not be a silver bullet, in that it needs lots of
          | low-level human guidance to do some complex tasks.
          | 
          | But looking at the trend of these tools, the help they require
          | is becoming higher and higher level, and they are becoming
          | more and more capable of doing longer, more complex tasks, as
          | well as finding the information they need from other
          | systems/tools (search, internet, docs, code, etc.).
          | 
          | I think it's that trend that really is the exciting part, not
          | just the current capabilities.
        
       | pwarner wrote:
       | It's an integration adventure. This is why much AI is failing in
       | the enterprise. MS Copilot is moderately interesting for data in
       | MS Office, but forget about it accessing 90% of your data that's
       | in other systems.
        
       | JohnMakin wrote:
       | > Building powerful and reliable AI Agents is becoming less about
       | finding a magic prompt or model updates.
       | 
       | Ok, I can buy this
       | 
       | > It is about the engineering of context and providing the right
       | information and tools, in the right format, at the right time.
       | 
       | when the "right" format and "right" time are essentially, and
       | maybe even necessarily, undefined, then aren't you still reaching
       | for a "magic" solution?
       | 
        | If the definition of "right" information is "information which
        | results in a sufficiently accurate answer from a language
        | model," then I fail to see how you are doing anything
        | fundamentally different from prompt engineering. Since these are
        | non-deterministic machines, I fail to see any reliable heuristic
        | that is fundamentally distinguishable from "trying and seeing"
        | with prompts.
        
         | edwardbernays wrote:
         | The state of the art theoretical frameworks typically separates
         | these into two distinct exploratory and discovery phases. The
         | first phase, which is exploratory, is best conceptualized as
         | utilizing an atmospheric dispersion device. An easily
         | identifiable marker material, usually a variety of feces, is
         | metaphorically introduced at high velocity. The discovery phase
         | is then conceptualized as analyzing the dispersal patterns of
         | the exploratory phase. These two phases are best summarized,
         | respectively, as "Fuck Around" followed by "Find Out."
        
         | mentalgear wrote:
          | It's magical thinking all the way down. Whether they call it
          | "prompt" or "context" engineering, it's the same tinkering to
          | find something that "sticks" in non-deterministic space.
        
         | dinvlad wrote:
         | > when the "right" format and "right" time are essentially, and
         | maybe even necessarily, undefined, then aren't you still
         | reaching for a "magic" solution?
         | 
         | Exactly the problem with all "knowing how to use AI correctly"
         | advice out there rn. Shamans with drums, at the end of the day
         | :-)
        
         | andy99 wrote:
         | It's called over-fitting, that's basically what prompt
         | engineering is.
        
       | ModernMech wrote:
       | "Wow, AI will replace programming languages by allowing us to
       | code in natural language!"
       | 
       | "Actually, you need to engineer the prompt to be very precise
       | about what you want to AI to do."
       | 
       | "Actually, you also need to add in a bunch of "context" so it can
       | disambiguate your intent."
       | 
       | "Actually English isn't a good way to express intent and
       | requirements, so we have introduced protocols to structure your
       | prompt, and various keywords to bring attention to specific
       | phrases."
       | 
       | "Actually, these meta languages could use some more features and
       | syntax so that we can better express intent and requirements
       | without ambiguity."
       | 
       | "Actually... wait we just reinvented the idea of a programming
       | language."
        
         | throwawayoldie wrote:
         | Only without all that pesky determinism and reproducibility.
         | 
         | (Whoever's about to say "well ackshually temperature of zero",
         | don't.)
        
         | mindok wrote:
         | "Actually - curly braces help save space in the context while
         | making meaning clearer"
        
         | georgeburdell wrote:
          | We should have known everything up through Step 4 for a while
          | now. See: the legal system.
        
       | eddythompson80 wrote:
        | Which is funny because everyone is already looking at AI as: I
        | have 30 TB of shit that is basically "my company". Can I dump
        | that into your AI and have another, magical, all-knowing co-
        | worker?
        
         | coliveira wrote:
          | Which I think is doubly funny because, given the zeal with
          | which companies are jumping on this bandwagon, AI will
          | bankrupt most businesses in record time! Just imagine the
          | typical company firing most workers and paying a fortune to
          | run on top of a schizophrenic AI system that gets things wrong
          | half of the time...
        
           | eddythompson80 wrote:
           | Yes, you can see the insanely accelerated pace of
           | bankruptcies or "strategic realignments" among AI startups.
           | 
            | I think it's just game theory in play and we can do nothing
            | but watch it play out. The "upside" is insane, potentially
            | unlimited. The price is high, but so is the potential reward.
            | By the rules of the game, you have to play. There is no other
            | move you can make. No one knows the odds, but we know the
            | potential reward. You could be the next $1T company, easy.
            | You could realistically go from startup to $1 trillion in
            | less than a year if you are right.
            | 
            | We need to give this time to play itself out. The "odds" will
            | eventually be better estimated and that will affect
            | investment. In the meantime, just give your VC Google's,
            | Microsoft's, or AWS's direct deposit info. It's easier that
            | way.
        
       | whimsicalism wrote:
       | i think context engineering as described is somewhat a subset of
       | 'environment engineering.' the gold-standard is when an outcome
       | reached with tools can be verified as correct and hillclimbed
       | with RL. most of the engineering effort is from building the
       | environment and verifier while the nuts and bolts of grpo/ppo
       | training and open-weight tool-using models are commodities.
        
       | intellectronica wrote:
       | See also: https://ai.intellectronica.net/context-engineering for
       | an overview.
        
       | jshorty wrote:
        | I have felt somewhat frustrated with what I perceive as a broad
        | tendency to malign "prompt engineering" as antiquated compared
        | to whatever the industry's newest technique is for building a
        | request body for a model API. Whether that's RAG years ago,
        | nuance in a model request's schema beyond simple text (tool
        | calls, structured outputs, etc.), or concepts of agentic
        | knowledge and memory more recently.
       | 
       | While models were less powerful a couple of years ago, there was
       | nothing stopping you at that time from taking a highly dynamic
       | approach to what you asked of them as a "prompt engineer"; you
       | were just more vulnerable to indeterminism in the contract with
       | the models at each step.
       | 
       | Context windows have grown larger; you can fit more in now, push
       | out the need for fine-tuning, and get more ambitious with what
       | you dump in to help guide the LLM. But I'm not immediately sure
       | what skill requirements fundamentally change here. You just have
       | more resources at your disposal, and can care less about counting
       | tokens.
        
         | simonw wrote:
         | I liked what Andrej Karpathy had to say about this:
         | 
         | https://twitter.com/karpathy/status/1937902205765607626
         | 
         | > _[..] in every industrial-strength LLM app, context
         | engineering is the delicate art and science of filling the
         | context window with just the right information for the next
         | step. Science because doing this right involves task
         | descriptions and explanations, few shot examples, RAG, related
         | (possibly multimodal) data, tools, state and history,
          | compacting... Too little or of the wrong form and the LLM
          | doesn't have the right context for optimal performance. Too
          | much or too irrelevant and the LLM costs might go up and
          | performance might come down. Doing this well is highly non-
          | trivial. And art because of the guiding intuition around LLM
          | psychology of people spirits._
        
       | saejox wrote:
        | Claude 3.5 was released 1 year ago. Current LLMs are not much
        | better at coding than it was. Sure, they are shinier and more
        | polished, but not much better at all. I think it is time to curb
        | our enthusiasm.
        | 
        | I almost always rewrite AI-written functions in my code a few
        | weeks later. It doesn't matter whether they have more context or
        | better context; they still fail to write code that is easily
        | understandable by humans.
        
         | simonw wrote:
         | Claude 3.5 was _remarkably_ good at writing code. If Claude 3.7
         | and Claude 4 are just incremental improvements on that then
         | even better!
         | 
         | I actually think they're a lot more than incremental. 3.7
         | introduced "thinking" mode and 4 doubled down on that and
         | thinking/reasoning/whatever-you-want-to-call-it is particularly
         | good at code challenges.
         | 
         | As always, if you're not getting great results out of coding
         | LLMs it's likely you haven't spent several months iterating on
         | your prompting techniques to figure out what works best for
         | your style of development.
        
       | davidclark wrote:
       | Good example of why I have been totally ignoring people who beat
       | the drum of needing to develop the skills of interacting with
       | models. "Learn to prompt" is already dead? Of course, the true
       | believers will just call this an evolution of prompting or some
       | such goalpost moving.
       | 
       | Personally, my goalpost still hasn't moved: I'll invest in using
       | AI when we are past this grand debate about its usefulness. The
       | utility of a calculator is self-evident. The utility of an LLM
       | requires 30k words of explanation and nuanced caveats. I just
       | can't even be bothered to read the sales pitch anymore.
        
         | simonw wrote:
         | We should be _so far_ past the  "grand debate about its
         | usefulness" at this point.
         | 
         | If you think that's still a debate, you might be listening to
         | the small pool of very loud people who insist nothing has
         | improved since the release of GPT-4.
        
       | _pdp_ wrote:
       | It is wrong. The new/old skill is reverse engineering.
       | 
       | If the majority of the code is generated by AI, you'll still need
       | people with technical expertise to make sense of it.
        
         | CamperBob2 wrote:
         | Not really. Got some code you don't understand? Feed it to a
         | model and ask it to add comments.
         | 
         | Ultimately humans will never need to look at most AI-generated
         | code, any more than we have to look at the machine language
         | emitted by a C compiler. We're a long way from that state of
         | affairs -- as anyone who struggled with code-generation bugs in
         | the first few generations of compilers will agree -- but we'll
         | get there.
        
           | rvz wrote:
           | > Not really. Got some code you don't understand? Feed it to
           | a model and ask it to add comments.
           | 
           | Absolutely not.
           | 
            | An experienced individual in their field can tell if the AI
            | made a mistake in the comments or code; the typical untrained
            | eye cannot.
            | 
            | So no, actually read the code and understand what it does.
           | 
           | > Ultimately humans will never need to look at most AI-
           | generated code, any more than we have to look at the machine
           | language emitted by a C compiler.
           | 
           | So for safety critical systems, one should not look or check
           | if code has been AI generated?
        
           | inspectorwadget wrote:
           | >any more than we have to look at the machine language
           | emitted by a C compiler.
           | 
           | Some developers do actually look at the output of C
           | compilers, and some of them even spend a lot of time
           | criticizing that output by a specific compiler (even writing
           | long blog posts about it). The C language has an ISO
           | specification, and if a compiler does not conform to that
           | specification, it is considered a bug in that compiler.
           | 
           | You can even go to godbolt.org / compilerexplorer.org and see
           | the output generated for different targets by different
           | compilers for different languages. It is a popular tool, also
           | for language development.
           | 
           | I do not know what prompt engineering will look like in the
           | future, but without AGI, I remain skeptical about
           | verification of different kinds of code not being required in
           | at least a sizable proportion of cases. That does not exclude
           | usefulness of course: for instance, if you have a case where
           | verification is not needed; or verification in a specific
           | case can be done efficiently and robustly by a relevant
           | expert; or some smart method for verification in some cases,
           | like a case where a few primitive tests are sufficient.
           | 
           | But I have no experience with LLMs or prompt engineering.
           | 
           | I do, however, sympathize with not wanting to deal with
           | paying programmers. Most are likely nice, but for instance a
           | few may be costly, or less than honest, or less than
           | competent, etc. But while I think it is fine to explore LLMs
           | and invest a lot into seeing what might come of them, I would
           | not personally bet everything on them, neither in the short
           | term nor the long term.
           | 
           | May I ask what your professional background and experience
           | is?
        
       | adhamsalama wrote:
       | There is no engineering involved in using AI. It's insulting to
       | call begging an LLM "engineering".
        
         | rednafi wrote:
          | This. Convincing a bullshit generator to give you the right
          | data isn't engineering, it's quackery. But I guess "context
          | quackery" wouldn't sell as much.
         | 
         | LLMs are quite useful and I leverage them all the time. But I
         | can't stand these AI yappers saying the same shit over and over
         | again in every media format and trying to sell AI usage as some
         | kind of profound wizardry when it's not.
        
       | 8organicbits wrote:
       | One thought experiment I was musing on recently was the minimal
       | context required to define a task (to an LLM, human, or
       | otherwise). In software, there's a whole discipline of human
       | centered design that aims to uncover the nuance of a task. I've
       | worked with some great designers, and they are incredibly
       | valuable to software development. They develop journey maps, user
       | stories, collect requirements, and produce a wealth of design
       | docs. I don't think you can successfully build large projects
       | without that context.
       | 
       | I've seen lots of AI demos that prompt "build me a TODO app",
       | pretend that is sufficient context, and then claim that the
       | output matches their needs. Without proper context, you can't
       | tell if the output is correct.
        
       | grafmax wrote:
       | There is no need to develop this 'skill'. This can all be
       | automated as a preprocessing step before the main request runs.
       | Then you can have agents with infinite context, etc.
        
         | simonw wrote:
         | You need this skill if you're the engineer that's designing and
         | implementing that preprocessing step.
        
           | yunwal wrote:
            | Non-rhetorical question: is this different enough from data
            | engineering that it needs its own name?
        
           | ofjcihen wrote:
           | Not at all, just ask the LLM to design and implement it.
           | 
           | AI turtles all the way down.
        
           | dolebirchwood wrote:
           | The skill amounts to determining "what information is
           | required for System A to achieve Outcome X." We already have
           | a term for this: Critical thinking.
        
           | grafmax wrote:
           | In the short term horizon I think you are right. But over a
           | longer horizon, we should expect model providers to
           | internalize these mechanisms, similar to how chain of thought
           | has been effectively "internalized" - which in turn has
           | reduced the effectiveness that prompt engineering used to
           | provide as models have gotten better.
        
       | lawlessone wrote:
       | I look forward to 5 million LinkedIn posts repeating this
        
       | labrador wrote:
       | I'm curious how this applies to systems like ChatGPT, which now
       | have two kinds of memory: user-configurable memory (a list of
       | facts or preferences) and an opaque chat history memory. If
       | context is the core unit of interaction, it seems important to
       | give users more control or at least visibility into both.
       | 
       | I know context engineering is critical for agents, but I wonder
       | if it's also useful for shaping personality and improving overall
       | relatability? I'm curious if anyone else has thought about that.
        
         | simonw wrote:
         | I really dislike the new ChatGPT memory feature (the one that
         | pulls details out of a summarized version of all of your
         | previous chats, as opposed to older memory feature that records
         | short notes to itself) for exactly this reason: it makes it
         | even harder for me to control the context when I'm using
         | ChatGPT.
         | 
         | If I'm debugging something with ChatGPT and I hit an error
         | loop, my fix is to start a new conversation.
         | 
         | Now I can't be sure ChatGPT won't include notes from that
         | previous conversation's context that I was trying to get rid
         | of!
         | 
         | Thankfully you can turn the new memory thing off, but it's on
         | by default.
         | 
         | I wrote more about that here:
         | https://simonwillison.net/2025/May/21/chatgpt-new-memory/
        
           | labrador wrote:
           | On the other hand, for my use case (I'm retired and enjoy
           | chatting with it), having it remember items from past chats
           | makes it feel much more personable. I actually prefer Claude,
           | but it doesn't have memory, so I unsubscribed and subscribed
           | to ChatGPT. That it remembers obscure but relevant details
           | about our past chats feels almost magical.
           | 
           | It's good that you can turn it off. I can see how it might
           | cause problems when trying to do technical work.
           | 
            | Edit: Note, the introduction of memory was a contributing
            | factor to "the sycophant" that OpenAI had to roll back. When
            | it could praise you while seeming to know you, it encouraged
            | addictive use.
           | 
           | Edit2: Here's the previous Hacker News discussion on Simon's
           | "I really don't like ChatGPT's new memory dossier"
           | 
           | https://news.ycombinator.com/item?id=44052246
        
       | ozim wrote:
        | Finding a magic prompt was never "prompt engineering"; it was
        | always "context engineering" - lots of "AI wannabe gurus" sold it
        | as such, but they never knew any better.
        | 
        | RAG wasn't invented this year.
        | 
        | Proper tooling that wraps esoteric knowledge like using
        | embeddings, vector DBs or graph DBs is becoming more mainstream.
        | Big players improve their tooling so more stuff is available.
        
       | semiinfinitely wrote:
        | context engineering is just a phrase that karpathy uttered for
        | the first time 6 days ago and now everyone is treating it like
        | it's a new field of science and engineering
        
       | hnthrow90348765 wrote:
       | Cool, but wait another year or two and context engineering will
       | be obsolete as well. It still feels like tinkering with the
       | machine, which is what AI is (supposed to be) moving us away
       | from.
        
         | hobs wrote:
         | Probably impossible unless computers themselves change in
         | another year or two.
        
       | alganet wrote:
       | If I need to do all this work (gather data, organize it, prepare
       | it, etc), there are other AI solutions I might decide to use
       | instead of an LLM.
        
         | joe5150 wrote:
         | You might as well use your natural intelligence instead of the
         | artificial stuff at that point.
        
           | coliveira wrote:
           | Yes, when all is said and done people will realize that
           | artificial intelligence is too expensive to replace natural
           | intelligence. AI companies want to avoid this realization for
           | as long as possible.
        
             | alganet wrote:
             | This is not what I'm talking about, see the other reply.
        
           | alganet wrote:
           | I'm assuming the post is about automated "context
           | engineering". It's not a human doing it.
           | 
           | In this arrangement, the LLM is a component. What I meant is
           | that it seems to me that other non-LLM AI technologies would
           | be a better fit for this kind of thing. Lighter, easier to
           | change and adapt, potentially even cheaper. Not for all
           | scenarios, but for a lot of them.
        
         | simonw wrote:
         | What kind of alternative AI solutions might you use here?
        
           | alganet wrote:
           | Classifiers to classify things, traditional neural nets to
           | identify things. Typical run of the mill.
           | 
           | In OpenAI hype language, this is a problem for "Software
           | 2.0", not "Software 3.0" in 99% of the cases.
           | 
           | The thing about matching an informal tone would be the hard
           | part. I have to concede that LLMs are probably better at
           | that. But I have the feeling that this is not exactly the
           | feature most companies are looking for, and they would be
           | willing to not have it for a cheaper alternative. Most of
           | them just don't know that's possible.
        
       | la64710 wrote:
        | Of course the best prompts automatically include providing the
        | best (not necessarily the most) context to extract the right
        | output.
        
       | m3kw9 wrote:
       | Well, it's still a prompt
        
       | bradhe wrote:
       | Back in my day we just called this "knowing what to google" but
       | alright, guys.
        
       | rednafi wrote:
       | I really don't get this rush to invent neologisms to describe
       | every single behavioral artifact of LLMs. Maybe it's just a
       | yearning to be known as the father of Deez Unseen Mind-blowing
       | Behaviors (DUMB).
       | 
       | LLM farts -- Stochastic Wind Release.
       | 
       | The latest one is yet another attempt to make prompting sound
       | like some kind of profound skill, when it's really not that
       | different from just knowing how to use search effectively.
       | 
       | Also, "context" is such an overloaded term at this point that you
       | might as well just call it "doing stuff" -- and you'd objectively
       | be more descriptive.
        
       | jongjong wrote:
       | Recently I started work on a new project and I 'vibe coded' a
       | test case for a complex OAuth token expiry bug entirely with AI
       | (with Cursor), complete with mocks and stubs... And it was on
       | someone else's project. I had no prior familiarity with the code.
       | 
       | That's when I understood that vibe coding is real and context is
       | the biggest hurdle.
       | 
       | That said, most of the context could not be pulled from the
       | codebase directly but came from me after asking the AI to
       | check/confirm certain things that I suspected could be the
       | problem.
       | 
        | I think vibe coding can be very powerful in the hands of a
        | senior developer, because if you're the kind of person who can
        | clearly explain their intuitions in words, that's exactly the
        | missing piece the AI needs to solve the problem... And you still
        | need to handle the code review aspect, which is also something
        | senior devs are generally good at. Sometimes it makes
        | mistakes/incorrect assumptions.
       | 
       | I'm feeling positive about LLMs. I was always complaining about
       | other people's ugly code before... I HATE over-modularized,
       | poorly abstracted code where I have to jump across 5+ different
       | files to figure out what a function is doing; with AI, I can just
       | ask it to read all the relevant code across all the files and
       | tell me WTF the spaghetti is doing... Then it generates new code
       | which 'follows' existing 'conventions' (same level of mess). The
       | AI basically automates the most horrible aspect of the work;
       | making sense of the complexity and churning out more complexity
       | that works. I love it.
       | 
       | That said, in the long run, to build sustainable projects, I
       | think it will require following good coding conventions and
       | minimal 'low code' coding... Because the codebase could explode
       | in complexity if not used carefully. Code quality can only drop
       | as the project grows. Poor abstractions tend to stick around and
       | have negative flow-on effects which impact just about everything.
        
       | colgandev wrote:
       | I've been finding a ton of success lately with speech to text as
       | the user prompt, and then using https://continue.dev in VSCode,
       | or Aider, to supply context from files from my projects and
       | having those tools run the inference.
       | 
       | I'm trying to figure out how to build a "Context Management
       | System" (as compared to a Content Management System) for all of
       | my prompts. I completely agree with the premise of this article,
       | if you aren't managing your context, you are losing all of the
       | context you create every time you create a new conversation. I
       | want to collect all of the reusable blocks from every
       | conversation I have, as well as from my research and reading
       | around the internet. Something like a mashup of Obsidian with
       | some custom Python scripts.
       | 
       | The ideal inner loop I'm envisioning is to create a "Project"
       | document that uses Jinja templating to allow transclusion of a
       | bunch of other context objects like code files, documentation,
       | articles, and then also my own other prompt fragments, and then
       | to compose them in a master document that I can "compile" into a
       | "superprompt" that has the precise context that I want for every
       | prompt.
       | 
       | Since with the chat interfaces they are always already just
       | sending the entire previous conversation message history anyway,
       | I don't even really want to use a chat style interface as much as
       | just "one shotting" the next step in development.
       | 
       | It's almost a turn based game: I'll fiddle with the code and the
       | prompts, and then run "end turn" and now it is the llm's turn. On
       | the llm's turn, it compiles the prompt and runs inference and
       | outputs the changes. With Aider it can actually apply those
       | changes itself. I'll then review the code using diffs and make
       | changes and then that's a full turn of the game of AI-assisted
       | code.
       | 
       | I love that I can just brain dump into speech to text, and llms
       | don't really care that much about grammar and syntax. I can
       | curate fragments of documentation and specifications for
       | features, and then just kind of rant and rave about what I want
       | for a while, and then paste that into the chat and with my
       | current LLM of choice being Claude, it seems to work really quite
       | well.
       | 
       | My Django work feels like it's been supercharged with just this
       | workflow, and my context management engine isn't even really that
       | polished.
       | 
       | If you aren't getting high quality output from llms, definitely
       | consider how you are supplying context.
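        | 
        | The "compile" step doesn't have to be fancy. A rough sketch of
        | what I mean, with made-up file names (the project.md.j2 template
        | just contains {{ spec }}, {{ conventions }}, {{ code }} and
        | {{ task }} placeholders):
        | 
        |     # Compile a "superprompt" from reusable context blocks.
        |     from pathlib import Path
        |     from jinja2 import Environment, FileSystemLoader
        | 
        |     env = Environment(loader=FileSystemLoader("context"))
        | 
        |     def compile_superprompt() -> str:
        |         template = env.get_template("project.md.j2")
        |         return template.render(
        |             spec=Path("context/feature_spec.md").read_text(),
        |             conventions=Path("context/conventions.md").read_text(),
        |             code=Path("app/views.py").read_text(),
        |             # the speech-to-text brain dump goes here
        |             task=Path("context/current_task.md").read_text(),
        |         )
        | 
        |     Path("superprompt.md").write_text(compile_superprompt())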
        
       | patrickhogan1 wrote:
       | OpenAI's o3 searches the web behind a curtain: you get a few
       | source links and a fuzzy reasoning trace, but never the full
       | chunk of text it actually pulled in. Without that raw context,
       | it's impossible to audit what really shaped the answer.
        
         | simonw wrote:
         | Yeah, I find that really frustrating.
         | 
         | I understand why they do it though: if they presented the
         | actual content that came back from search they would
         | _absolutely_ get in trouble for copyright-infringement.
         | 
         | I suspect that's why so much of the Claude 4 system prompt for
         | their search tool is the message "Always respect copyright by
         | NEVER reproducing large 20+ word chunks of content from search
         | results" repeated half a dozen times:
         | https://simonwillison.net/2025/May/25/claude-4-system-prompt...
        
       | rvz wrote:
        | This is just another "rebranding" of the failed "prompt
        | engineering" trend to promote another borderline pseudo-
        | scientific trend to attract more VC money to fund a new pyramid
        | scheme.
       | 
       | Assuming that this will be using the totally flawed MCP protocol,
       | I can only see more cases of data exfiltration attacks on these
       | AI systems just like before [0] [1].
       | 
       | Prompt injection + Data exfiltration is the new social
       | engineering in AI Agents.
       | 
       | [0] https://embracethered.com/blog/posts/2025/security-
       | advisory-...
       | 
       | [1] https://www.bleepingcomputer.com/news/security/zero-click-
       | ai...
        
       | slavapestov wrote:
       | I feel like if the first link in your post is a tweet from a tech
       | CEO the rest is unlikely to be insightful.
        
       | CharlieDigital wrote:
       | I was at a startup that started using OpenAI APIs pretty early
       | (almost 2 years ago now?).
       | 
       | "Back in the day", we had to be very sparing with context to get
       | great results so we really focused on how to build great context.
       | Indexing and retrieval were pretty much our core focus.
       | 
       | Now, even with the larger windows, I find this still to be true.
       | 
       | The moat for most companies is actually their data, data
       | indexing, and data retrieval[0]. Companies that 1) have the data
       | and 2) know how to use that data are going to win.
       | 
        | My analogy is this:
        | 
        | > The LLM is just an oven; a fantastical oven. But whether it
        | produces a good product still depends on picking good
        | ingredients, in the right ratio, and preparing them with care.
        | You hit the bake button, then you still need to finish it off
        | with presentation and decoration.
       | 
       | [0] https://chrlschn.dev/blog/2024/11/on-bakers-ovens-and-ai-
       | sta...
        
       | retinaros wrote:
        | it is still sending a string of chars and hoping the model
        | outputs something relevant. let's not do what finance did and
        | permanently obfuscate really simple stuff to make ourselves look
        | bigger than we are.
       | 
       | prompt engineering/context engineering : stringbuilder
       | 
       | Retrieval augmented generation: search+ adding strings to main
       | string
       | 
       | test time compute: running multiple generation and choosing the
       | best
       | 
       | agents: for loop and some ifs
        
       | jumploops wrote:
       | To anyone who has worked with LLMs extensively, this is obvious.
       | 
       | Single prompts can only get you so far (surprisingly far
       | actually, but then they fall over quickly).
       | 
       | This is actually the reason I built my own chat client (~2 years
       | ago), because I wanted to "fork" and "prune" the context easily;
       | using the hosted interfaces was too opaque.
       | 
       | In the age of (working) tool-use, this starts to resemble agents
       | calling sub-agents, partially to better abstract, but mostly to
       | avoid context pollution.
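        | 
        | ("Fork" and "prune" are nothing exotic, by the way; a minimal
        | sketch of what my client does, give or take:)
        | 
        |     # A conversation is just a list of {"role", "content"} dicts.
        |     from copy import deepcopy
        | 
        |     def fork(messages, upto):
        |         # New branch sharing history up to message index `upto`.
        |         return deepcopy(messages[:upto])
        | 
        |     def prune(messages, drop):
        |         # Drop messages polluting the context (dead ends,
        |         # oversized tool output, failed attempts).
        |         return [m for i, m in enumerate(messages)
        |                 if i not in drop]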
        
       | mgdev wrote:
       | If we zoom out far enough, and start to put more and more under
       | the execution umbrella of AI, what we're actually describing here
       | is... product development.
       | 
       | You are constructing the set of context, policies, directed
       | attention toward some intentional end, same as it ever was. The
       | difference is you need fewer meat bags to do it, even as your
       | projects get larger and larger.
       | 
       | To me this is wholly encouraging.
       | 
       | Some projects will remain outside what models are capable of, and
       | your role as a human will be to stitch many smaller projects
        | together into the whole. As models grow more capable, that
        | stitching will still happen - just at larger levels.
       | 
       | But as long as humans have imagination, there will always be a
       | role for the human in the process: as the orchestrator of will,
       | and ultimate fitness function for his own creations.
        
       | jcon321 wrote:
        | I thought this entire premise was obvious? Does it really take an
        | article and a Venn diagram to say you should only provide the
        | relevant content to your LLM when asking a question?
        
         | simonw wrote:
         | "Relevant content to your LLM when asking a question" is last
         | year's RAG.
         | 
         | If you look at how sophisticated current LLM systems work there
         | is _so much more_ to this.
         | 
         | Just one example: Microsoft open sourced VS Code Copilot Chat
         | today (MIT license). Their prompts are dynamically assembled
         | with tool instructions for various tools based on whether or
         | not they are enabled: https://github.com/microsoft/vscode-
         | copilot-chat/blob/v0.29....
         | 
          | And the autocomplete stuff has a _wealth_ of contextual
          | information included: https://github.com/microsoft/vscode-
          | copilot-chat/blob/v0.29....
          | 
          |     You have access to the following information to help you
          |     make informed suggestions:
          | 
          |     - recently_viewed_code_snippets: These are code snippets
          |       that the developer has recently looked at, which might
          |       provide context or examples relevant to the current
          |       task. They are listed from oldest to newest, with line
          |       numbers in the form #| to help you understand the edit
          |       diff history. It's possible these are entirely
          |       irrelevant to the developer's change.
          |     - current_file_content: The content of the file the
          |       developer is currently working on, providing the broader
          |       context of the code. Line numbers in the form #| are
          |       included to help you understand the edit diff history.
          |     - edit_diff_history: A record of changes made to the code,
          |       helping you understand the evolution of the code and the
          |       developer's intentions. These changes are listed from
          |       oldest to latest. It's possible a lot of old edit diff
          |       history is entirely irrelevant to the developer's
          |       change.
          |     - area_around_code_to_edit: The context showing the code
          |       surrounding the section to be edited.
          |     - cursor position marked as ${CURSOR_TAG}: Indicates where
          |       the developer's cursor is currently located, which can
          |       be crucial for understanding what part of the code they
          |       are focusing on.
        
       | liampulles wrote:
       | The only engineering going on here is Job Engineering(tm)
        
         | ryhanshannon wrote:
          | It is really funny to see the hyperfixation on relabeling
          | soft skills / product development as "<blank> Engineering" in
          | the AI space.
        
       | amelius wrote:
       | Yes, and it is a soft skill.
        
       | zacharyvoase wrote:
       | I love how we have such a poor model of how LLMs work (or more
       | aptly don't work) that we are developing an entire alchemical
       | practice around them. Definitely seems healthy for the industry
       | and the species.
        
         | simonw wrote:
         | The stuff that's showing up under the "context engineering"
         | banner feels a whole lot _less_ alchemical to me than the older
         | prompt engineering tricks.
         | 
         | Alchemical is "you are the world's top expert on marketing, and
         | if you get it right I'll tip you $100, and if you get it wrong
         | a kitten will die".
         | 
         | The techniques in https://www.dbreunig.com/2025/06/26/how-to-
         | fix-your-context.... seem a whole lot more rational to me than
         | that.
        
       | geeewhy wrote:
        | I've been experimenting with this for a while (I'm sure, in a
        | way, most of us have). It would be good to enumerate some
        | examples. When it comes to coding, here's a few:
        | 
        | - compile scripts that can grep / build a list of your relevant
        | files as files of interest
        | 
        | - make temp symlinks in relevant repos to each other for
        | documentation generation, and pass the documentation collected
        | from the respective repos along to enable cross-repo ops to be
        | performed atomically
        | 
        | - build scripts to copy schemas, db DDLs, DTOs, example records,
        | API specs, contracts (still works better than MCP in most cases)
        | 
        | I found these steps not only produce better output but also
        | greatly reduce cost by avoiding some "reasoning" hops. I'm sure
        | the practice can extend beyond coding.
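        | 
        | For the first point, for example, something as simple as this
        | goes a long way (repo path and search pattern are placeholders):
        | 
        |     # Compile a "files of interest" bundle to paste as context.
        |     import pathlib, subprocess
        | 
        |     def files_of_interest(repo, pattern):
        |         # `git grep -l` prints one matching path per line
        |         out = subprocess.run(
        |             ["git", "-C", repo, "grep", "-l", pattern],
        |             capture_output=True, text=True)
        |         return [p for p in out.stdout.splitlines() if p]
        | 
        |     def bundle(repo, pattern, dest="context_bundle.txt"):
        |         with open(dest, "w") as f:
        |             for rel in files_of_interest(repo, pattern):
        |                 f.write(f"\n===== {rel} =====\n")
        |                 f.write((pathlib.Path(repo) / rel).read_text())
        | 
        |     bundle(".", "TokenExpiry")  # placeholder pattern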
        
       | dinvlad wrote:
       | I feel like ppl just keep inventing concepts for the same old
       | things, which come down to dancing with the drums around the fire
       | and screaming shamanic incantations :-)
        
       | emporas wrote:
        | Prompting takes the back seat, while context is the driving
        | factor. 100% agree with this.
        | 
        | For programming I don't use any prompts. I give an already-solved
        | problem as context or an example, and I ask it to implement
        | something similar. One sentence or two, and that's it.
        | 
        | For other kinds of tasks, like writing, I use prompts, but even
        | then, context and examples are still the driving factor.
       | 
       | In my opinion, we are in an interesting point in history, in
       | which now individuals will need their own personal database. Like
       | companies the last 50 years, which had their own database records
       | of customers, products, prices and so on, now an individual will
       | operate using personal contextual information, saved over a long
       | period of time in wikis or Sqlite rows.
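        | 
        | The personal database really can be that small. A toy sketch of
        | the SQLite idea (schema invented for illustration):
        | 
        |     # Save personal context over time, pull relevant rows back
        |     # out later to paste into a model's context.
        |     import sqlite3
        | 
        |     db = sqlite3.connect("personal_context.db")
        |     db.execute("""CREATE TABLE IF NOT EXISTS notes (
        |         id INTEGER PRIMARY KEY,
        |         topic TEXT,
        |         body TEXT,
        |         added TEXT DEFAULT CURRENT_TIMESTAMP)""")
        | 
        |     def remember(topic, body):
        |         db.execute("INSERT INTO notes (topic, body) VALUES (?, ?)",
        |                    (topic, body))
        |         db.commit()
        | 
        |     def recall(topic):
        |         rows = db.execute("SELECT body FROM notes WHERE topic "
        |                           "LIKE ? ORDER BY added", (f"%{topic}%",))
        |         return [body for (body,) in rows]
        | 
        |     remember("taxes", "Accountant wants receipts as one PDF "
        |                       "per quarter.")
        |     print("\n".join(recall("taxes")))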
        
         | d0gsg0w00f wrote:
         | Yes, the other day I was telling a colleague that we all need
         | our own personal context to feed into every model we interact
         | with. You could carry it around on a thumb drive or something.
        
       ___________________________________________________________________
       (page generated 2025-06-30 23:00 UTC)