[HN Gopher] The New Skill in AI Is Not Prompting, It's Context E...
___________________________________________________________________
The New Skill in AI Is Not Prompting, It's Context Engineering
Author : robotswantdata
Score : 219 points
Date : 2025-06-30 20:53 UTC (2 hours ago)
(HTM) web link (www.philschmid.de)
(TXT) w3m dump (www.philschmid.de)
| baxtr wrote:
| > _Conclusion
|
| Building powerful and reliable AI Agents is becoming less about
| finding a magic prompt or model updates. It is about the
| engineering of context and providing the right information and
| tools, in the right format, at the right time. It's a cross-
| functional challenge that involves understanding your business
| use case, defining your outputs, and structuring all the
| necessary information so that an LLM can "accomplish the task."_
|
| That's actually also true for humans: the more context (aka the
| right info at the right time) you provide, the better they can
| solve tasks.
| QuercusMax wrote:
| Yeah... I'm always asking my UX and product folks for mocks,
| requirements, acceptance criteria, sample inputs and outputs,
| why we care about this feature, etc.
|
| Until we can scan your brain and figure out what you really
| want, it's going to be necessary to actually describe what you
| want built, and not just rely on vibes.
| lupire wrote:
| Not "more" context. "Better" context.
|
| (X-Y problem, for example.)
| root_axis wrote:
| I am not a fan of this banal trend of superficially comparing
| aspects of machine learning to humans. It doesn't provide any
| insight and is hardly ever accurate.
| ModernMech wrote:
| I agree, however I _do_ appreciate comparisons to other
| human-made systems. For example, "providing the right
| information and tools, in the right format, at the right
| time" sounds a lot like a bureaucracy, particularly because
| "right" is decided for you, it's left undefined, and may
| change at any time with no warning or recourse.
| furyofantares wrote:
| [delayed]
| mentalgear wrote:
| Basically, finding the right buttons to push within the
| constraints of the environment. Not so much different from what
| (SW) engineering is, only non-deterministic in the outcomes.
| simonw wrote:
| I wrote a bit about this the other day:
| https://simonwillison.net/2025/Jun/27/context-engineering/
|
| Drew Breunig has been doing some _fantastic_ writing on this
| subject - coincidentally at the same time as the "context
| engineering" buzzword appeared but actually unrelated to that
| meme.
|
| How Long Contexts Fail - https://www.dbreunig.com/2025/06/22/how-
| contexts-fail-and-ho... - talks about the various ways in which
| longer contexts can start causing problems (also known as
| "context rot")
|
| How to Fix Your Context -
| https://www.dbreunig.com/2025/06/26/how-to-fix-your-context.... -
| gives names to a bunch of techniques for working around these
| problems including Tool Loadout, Context Quarantine, Context
| Pruning, Context Summarization, and Context Offloading.
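|
| To make one of those concrete, here is a rough sketch of what
| "Context Summarization" might look like in code. This is my own
| toy illustration, not something from Drew's posts; the budget and
| the summarize() helper are placeholders:
|
|       # when history exceeds a budget, collapse older turns
|       # into a model-written summary and keep the recent ones
|       def prune_history(messages, summarize, max_chars=20_000):
|           # summarize: any function that condenses text via an LLM
|           total = sum(len(m["content"]) for m in messages)
|           if total <= max_chars:
|               return messages
|           old, recent = messages[:-6], messages[-6:]
|           digest = summarize(
|               "\n".join(m["content"] for m in old))
|           summary_msg = {"role": "system",
|                          "content": "Earlier turns: " + digest}
|           return [summary_msg] + recent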
| the_mitsuhiko wrote:
| Drew Breunig's posts are a must read on this. This is not only
| important for writing your own agents, it is also critical when
| using agentic coding right now. These limitations/behaviors
| will be with us for a while.
| outofpaper wrote:
| They might be good reads on the topic but Drew makes some
| significant etymological mistakes. For example loadout
| doesn't come from gaming but military terminology. It's
| essentially the same as kit or gear.
| ZYbCRq22HbJ2y7 wrote:
| > They might be good reads on the topic but Drew makes some
| significant etymological mistakes. For example loadout
| doesn't come from gaming but military terminology. It's
| essentially the same as kit or gear.
|
| Doesn't seem that significant?
|
| Not to say those blog posts say anything much anyway that
| any "prompt engineer" (someone who uses LLMs frequently)
| doesn't already know, but maybe it is useful to some at
| such an early stage of _these things_.
| simonw wrote:
| Drew isn't using that term in a military context, he's
| using it in a gaming context. He defines what he means very
| clearly:
|
| > _The term "loadout" is a gaming term that refers to the
| specific combination of abilities, weapons, and equipment
| you select before a level, match, or round._
|
| In the military you don't select your abilities before
| entering a level.
| DiggyJohnson wrote:
| This seems like a rather unimportant type of mistake,
| especially because the definition is still accurate; it's
| just that the etymology isn't complete.
| storus wrote:
| Those issues are considered artifacts of the current crop of
| LLMs in academic circles; there is already research allowing
| LLMs to use millions of different tools at the same time, and
| stable long contexts, likely reducing the number of agents to
| one for most use cases outside interfacing with different
| providers.
|
| Anyone basing their future agentic systems on current LLMs
| would likely face LangChain fate - built for GPT-3, made
| obsolete by GPT-3.5.
| simonw wrote:
| Can you link to the research on millions of different tools
| and stable long contexts? I haven't come across that yet.
| storus wrote:
| You can look at AnyTool, 2024 (16,000 tools) and start
| looking at newer research from there.
|
| https://arxiv.org/abs/2402.04253
|
| For long contexts start with activation beacons and RoPE
| scaling.
| simonw wrote:
| I would classify AnyTool as a context engineering trick.
| It's using GPT-4 function calls (what we would call tool
| calls today) to find the best tools for the current job
| based on a 3-level hierarchy search.
|
| Drew calls that one "Tool Loadout"
| https://www.dbreunig.com/2025/06/26/how-to-fix-your-
| context....
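|
| The general shape of that trick, in code, is something like the
| toy sketch below: rank tool descriptions against the request and
| only send the top few schemas to the model. This is my own
| illustration, not AnyTool's actual implementation; embed() is a
| placeholder for whatever embedding function you use.
|
|       # toy "Tool Loadout": pick the most relevant tool schemas
|       def select_tools(query, tools, embed, top_k=5):
|           # tools: list of {"name", "description", "schema"}
|           def cos(a, b):
|               dot = sum(x * y for x, y in zip(a, b))
|               na = sum(x * x for x in a) ** 0.5
|               nb = sum(x * x for x in b) ** 0.5
|               return dot / ((na * nb) or 1.0)
|           qv = embed(query)
|           scored = [(cos(qv, embed(t["description"])), t)
|                     for t in tools]
|           scored.sort(key=lambda s: s[0], reverse=True)
|           return [t for _, t in scored[:top_k]]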
| Foreignborn wrote:
| Yes, but those aren't released, and even then you'll always need
| glue code.
|
| You just need to be deliberate about what glue code is needed,
| and build it in a way that can scale with whatever new limits
| upgraded models give you.
|
| I can't imagine a world where people aren't building products
| that try to overcome the limitations of SOTA models.
| storus wrote:
| My point is that newer models will have those baked in, so
| instead of supporting ~30 tools before falling apart they
| will reliably support 10,000 tools defined in their
| context. That alone would dramatically change the need for
| more than one agent in most cases as the architectural
| split into multiple agents is often driven by the inability
| to reliably run many tools within a single agent. You can
| hack around it today by turning tools on/off depending on
| the agent's state, but at some point in the future you
| might be able to afford not to bother: just dump all your
| tools into a long stable context, maybe cache it for
| performance, and that will be it.
| ZYbCRq22HbJ2y7 wrote:
| There will likely be custom, large, and expensive models
| at an enterprise level in the near future (some large
| entities and governments already have them (niprgpt)).
|
| With that in mind, what would be the business sense in
| siloing a single "Agent" instead of using something like
| a service discovery service that all benefit from?
| storus wrote:
| My guess is the main issue is latency and accuracy; a
| single agent, without all the routing/evaluation sub-
| agents around it that introduce cumulative errors, lead
| to infinite loops and slow it down, would likely be much
| faster and more accurate, and could be cached at the
| token level on a GPU, reducing token preprocessing time
| further. Different companies would run different
| "monorepo" agents and those would need something like
| MCP to talk to each other at the business boundary, but
| internally all this won't be necessary.
|
| Also the current LLMs have still too many issues because
| they are autoregressive and heavily biased towards the
| first few generated tokens. They also still don't have
| full bidirectional awareness of certain relationships due
| to how they are masked during training. Discrete
| diffusion looks interesting, but I am not sure how that
| class of models deals with tools, as I've never seen one
| use any.
| ZYbCRq22HbJ2y7 wrote:
| How would "a million different tool calls at the same time"
| work? For instance, MCP is HTTP based, even at low latency in
| incredibly parallel environments that would take forever.
| dinvlad wrote:
| > already research allowing LLMs to use millions of different
| tools
|
| Hmm first time hearing about this, could you share any
| examples please?
| simonw wrote:
| See this comment
| https://news.ycombinator.com/item?id=44428548
| old_man_cato wrote:
| First, you pay a human artist to draw a pelican on a bicycle.
|
| Then, you provide that as "context".
|
| Next, you prompt the model.
|
| Voila!
| d0gsg0w00f wrote:
| This hits too close to home.
| JoeOfTexas wrote:
| So who will develop the first Logic Core that automates the
| context engineer?
| igravious wrote:
| The first rule of automation: that which can be automated
| _will_ be automated.
|
| Observation: this isn't anything that can't be automated /
| risyachka wrote:
| "A month-long skill" after which it won't be a thing anymore,
| like so many other.
| simonw wrote:
| Most of the LLM prompting skills I figured out ~three years
| ago are still useful to me today. Even the ones that I've
| dropped are useful because I know that things that used to be
| helpful aren't helpful any more, which helps me build an
| intuition for how the models have improved over time.
| crystal_revenge wrote:
| Definitely mirrors my experience. One heuristic I've often used
| when providing context to a model is: "is this enough information
| for a human to solve this task?". Building some text2SQL products
| in the past, it was very interesting to see how often, when the
| model failed, a real data analyst would reply something like "oh
| yea, that's an older table we don't use any more, the correct
| table is...". This means the model was likely making a mistake
| that a real human analyst would have made without the proper
| context.
|
| One thing that is _missing_ from this list is: _evaluations!_
|
| I'm shocked how often I still see large AI projects being run
| without any regard to evals. Evals are _more_ important for AI
| projects than test suites are for traditional engineering ones.
| You don't even need a big eval set, just one that covers your
| problem surface reasonably well. However, without it you're
| basically just "guessing" rather than iterating on your problem,
| and you're not even guessing in a way where each guess is an
| improvement on the last.
|
| edit: To clarify, I ask _myself_ this question. It's frequently
| the case that we expect LLMs to solve problems without the
| necessary information for a _human_ to solve them.
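|
| A minimal sketch of the kind of eval harness I mean, using
| text2SQL as the example (the scoring here is naive exact-match;
| generate_sql stands in for whatever your system does):
|
|       def run_evals(cases, generate_sql):
|           # cases: [{"question", "schema", "expected_sql"}, ...]
|           passed = 0
|           for case in cases:
|               predicted = generate_sql(case["question"],
|                                        case["schema"])
|               # naive scoring; execution-based checks are better
|               if (predicted.strip().lower()
|                       == case["expected_sql"].strip().lower()):
|                   passed += 1
|           print(f"{passed}/{len(cases)} passed")
|           return passed / len(cases)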
| kevin_thibedeau wrote:
| Asking yes no questions will get you a lie 50% of the time.
| adriand wrote:
| I have pretty good success with asking the model this question
| before it starts working as well. I'll tell it to ask questions
| about anything it's unsure of and to ask for examples of code
| patterns that are in use in the application already that it can
| use as a template.
| hobs wrote:
| The thing is, all the people cosplaying as data scientists
| don't want evaluations, and that's why you saw so few of them
| in fake C-level projects: telling people the emperor has no
| clothes doesn't pay.
|
| For those actually using the products to make money, well, hey -
| all of those have evaluations.
| bGl2YW5j wrote:
| Saw this the other day and it made me think that too much effort
| and credence is being given to this idea of crafting the perfect
| environment for LLMs to thrive in. Which to me, is contrary to
| how powerful AI systems should function. We shouldn't need to
| hold its hand so much.
|
| Obviously we've got to tame the version of LLMs we've got now,
| and this kind of thinking is a step in the right direction. What
| I take issue with is the way this thinking is couched as a
| revolutionary silver bullet.
| gametorch wrote:
| It's still way easier for me to say
|
| "here's where to find the information to solve the task"
|
| than for me to manually type out the code, 99% of the time
| ramesh31 wrote:
| We shouldn't but it's analogous to how CPU usage used to work.
| In the 8 bit days you could do some magical stuff that was
| completely impossible before microcomputers existed. But you
| had to have all kinds of tricks and heuristics to work around
| the limited abilities. We're in the same place with LLMs now.
| Some day we will have the equivalent of what gigabytes of RAM
| are to a modern CPU, but we're still stuck in the 80s for
| now (which _was_ revolutionary at the time).
| smeej wrote:
| It also reminds me of when you could structure an internet
| search query and find exactly what you wanted. You just had
| to ask it in the machine's language.
|
| I hope the generalized future of this doesn't look like the
| generalized future of that, though. Now it's darn near
| impossible to find very specific things on the internet
| because the search engines will ignore any "operators" you
| try to use if they generate "too few" results (by which they
| seem to mean "few enough that no one will pay for us to show
| you an ad for this search"). I'm moderately afraid the
| ability to get useful results out of AIs will be abstracted
| away to some lowest common denominator of spammy garbage
| people want to "consume" instead of _use_ for something.
| skydhash wrote:
| An empty set of results is a good signal just like a "I
| don't know" or "You're wrong because <reason>" are good
| replies to a question/query. It's how a program crashing,
| while painful, is better than it corrupting data.
| 4ndrewl wrote:
| Reminds me of first gen chatbots where the user had to put in
| the effort of trying to craft a phrase in a way that would
| garner the expected result. It's a form of user-hostility.
| aleksiy123 wrote:
| It may not be a silver bullet, in that it needs lots of low-
| level human guidance to do some complex tasks.
|
| But looking at the trend of these tools, the help they require
| is becoming higher and higher level, and they are becoming more
| and more capable of doing longer, more complex tasks, as well as
| finding the information they need from other systems/tools
| (search, internet, docs, code, etc.).
|
| I think it's that trend that really is the exciting part, not
| just the current capabilities.
| pwarner wrote:
| It's an integration adventure. This is why much AI is failing in
| the enterprise. MS Copilot is moderately interesting for data in
| MS Office, but forget about it accessing 90% of your data that's
| in other systems.
| JohnMakin wrote:
| > Building powerful and reliable AI Agents is becoming less about
| finding a magic prompt or model updates.
|
| Ok, I can buy this
|
| > It is about the engineering of context and providing the right
| information and tools, in the right format, at the right time.
|
| when the "right" format and "right" time are essentially, and
| maybe even necessarily, undefined, then aren't you still reaching
| for a "magic" solution?
|
| If the definition of "right" information is "information which
| results in a sufficiently accurate answer from a language model"
| then I fail to see how you are doing anything fundamentally
| differently from prompt engineering. Since these are non-
| deterministic machines, I fail to see any reliable heuristic that
| is fundamentally distinguishable from "trying and seeing" with
| prompts.
| edwardbernays wrote:
| The state of the art theoretical frameworks typically separates
| these into two distinct exploratory and discovery phases. The
| first phase, which is exploratory, is best conceptualized as
| utilizing an atmospheric dispersion device. An easily
| identifiable marker material, usually a variety of feces, is
| metaphorically introduced at high velocity. The discovery phase
| is then conceptualized as analyzing the dispersal patterns of
| the exploratory phase. These two phases are best summarized,
| respectively, as "Fuck Around" followed by "Find Out."
| mentalgear wrote:
| It's magical thinking all the way down. Whether they call it
| "prompt" or "context" engineering now, it's the same tinkering
| to find something that "sticks" in non-deterministic space.
| dinvlad wrote:
| > when the "right" format and "right" time are essentially, and
| maybe even necessarily, undefined, then aren't you still
| reaching for a "magic" solution?
|
| Exactly the problem with all "knowing how to use AI correctly"
| advice out there rn. Shamans with drums, at the end of the day
| :-)
| andy99 wrote:
| It's called over-fitting, that's basically what prompt
| engineering is.
| ModernMech wrote:
| "Wow, AI will replace programming languages by allowing us to
| code in natural language!"
|
| "Actually, you need to engineer the prompt to be very precise
| about what you want to AI to do."
|
| "Actually, you also need to add in a bunch of "context" so it can
| disambiguate your intent."
|
| "Actually English isn't a good way to express intent and
| requirements, so we have introduced protocols to structure your
| prompt, and various keywords to bring attention to specific
| phrases."
|
| "Actually, these meta languages could use some more features and
| syntax so that we can better express intent and requirements
| without ambiguity."
|
| "Actually... wait we just reinvented the idea of a programming
| language."
| throwawayoldie wrote:
| Only without all that pesky determinism and reproducibility.
|
| (Whoever's about to say "well ackshually temperature of zero",
| don't.)
| mindok wrote:
| "Actually - curly braces help save space in the context while
| making meaning clearer"
| georgeburdell wrote:
| We should have known up through Step 4 for a while. See: the
| legal system
| eddythompson80 wrote:
| Which is funny because everyone is already looking at AI as: I
| have 30 TB of shit that is basically "my company". Can I dump
| that into your AI and have another, magical, all-knowing co-
| worker?
| coliveira wrote:
| Which I think is doubly funny because, given the zeal with
| which companies are jumping on this bandwagon, AI will
| bankrupt most businesses in record time! Just imagine the
| typical company firing most workers and paying a fortune to run
| on top of a schizophrenic AI system that gets things wrong half
| of the time...
| eddythompson80 wrote:
| Yes, you can see the insanely accelerated pace of
| bankruptcies or "strategic realignments" among AI startups.
|
| I think it's just game theory in play and we can do nothing
| but watch it play out. The "up side" is insane, potentially
| unlimited. The price is high, but so is the potential reward.
| By the rules of the game, you have to play. There is no other
| move you can make. No one knows the odds, but we know the
| potential reward. You could be the next T company easy. You
| could realistically go from startup -> 1 Trillion in less
| than a year if you are right.
|
| We need to give this time to play itself out. The "odds" will
| eventually be better estimated and it'll affect investment.
| In the meantime, just give your VC Google's, Microsoft's, or
| AWS's direct deposit info. It's easier that way.
| whimsicalism wrote:
| I think context engineering as described is somewhat a subset of
| "environment engineering." The gold standard is when an outcome
| reached with tools can be verified as correct and hill-climbed
| with RL. Most of the engineering effort is in building the
| environment and verifier, while the nuts and bolts of GRPO/PPO
| training and open-weight tool-using models are commodities.
| intellectronica wrote:
| See also: https://ai.intellectronica.net/context-engineering for
| an overview.
| jshorty wrote:
| I have felt somewhat frustrated with what I perceive as a broad
| tendency to malign "prompt engineering" as an antiquated approach
| compared to whatever the industry's newest technique is for
| building a request body for a model API. Whether that's RAG years
| ago, nuance in a model request's schema beyond simple text (tool
| calls, structured outputs, etc), or concepts of agentic knowledge
| and memory more recently.
|
| While models were less powerful a couple of years ago, there was
| nothing stopping you at that time from taking a highly dynamic
| approach to what you asked of them as a "prompt engineer"; you
| were just more vulnerable to indeterminism in the contract with
| the models at each step.
|
| Context windows have grown larger; you can fit more in now, push
| out the need for fine-tuning, and get more ambitious with what
| you dump in to help guide the LLM. But I'm not immediately sure
| what skill requirements fundamentally change here. You just have
| more resources at your disposal, and can care less about counting
| tokens.
| simonw wrote:
| I liked what Andrej Karpathy had to say about this:
|
| https://twitter.com/karpathy/status/1937902205765607626
|
| > _[..] in every industrial-strength LLM app, context
| engineering is the delicate art and science of filling the
| context window with just the right information for the next
| step. Science because doing this right involves task
| descriptions and explanations, few shot examples, RAG, related
| (possibly multimodal) data, tools, state and history,
| compacting... Too little or of the wrong form and the LLM
| doesn't have the right context for optimal performance. Too much
| or
| too irrelevant and the LLM costs might go up and performance
| might come down. Doing this well is highly non-trivial. And art
| because of the guiding intuition around LLM psychology of
| people spirits._
| saejox wrote:
| Claude 3.5 was released 1 year ago. Current LLMs are not much
| better at coding than it. Sure they are more shiny and well
| polished, but not much better at all. I think it is time to curb
| our enthusiasm.
|
| I almost always rewrite AI-written functions in my code a few
| weeks later. It doesn't matter that they have more context or
| better context; they still fail to write code that is easily
| understandable by humans.
| simonw wrote:
| Claude 3.5 was _remarkably_ good at writing code. If Claude 3.7
| and Claude 4 are just incremental improvements on that then
| even better!
|
| I actually think they're a lot more than incremental. 3.7
| introduced "thinking" mode and 4 doubled down on that and
| thinking/reasoning/whatever-you-want-to-call-it is particularly
| good at code challenges.
|
| As always, if you're not getting great results out of coding
| LLMs it's likely you haven't spent several months iterating on
| your prompting techniques to figure out what works best for
| your style of development.
| davidclark wrote:
| Good example of why I have been totally ignoring people who beat
| the drum of needing to develop the skills of interacting with
| models. "Learn to prompt" is already dead? Of course, the true
| believers will just call this an evolution of prompting or some
| such goalpost moving.
|
| Personally, my goalpost still hasn't moved: I'll invest in using
| AI when we are past this grand debate about its usefulness. The
| utility of a calculator is self-evident. The utility of an LLM
| requires 30k words of explanation and nuanced caveats. I just
| can't even be bothered to read the sales pitch anymore.
| simonw wrote:
| We should be _so far_ past the "grand debate about its
| usefulness" at this point.
|
| If you think that's still a debate, you might be listening to
| the small pool of very loud people who insist nothing has
| improved since the release of GPT-4.
| _pdp_ wrote:
| It is wrong. The new/old skill is reverse engineering.
|
| If the majority of the code is generated by AI, you'll still need
| people with technical expertise to make sense of it.
| CamperBob2 wrote:
| Not really. Got some code you don't understand? Feed it to a
| model and ask it to add comments.
|
| Ultimately humans will never need to look at most AI-generated
| code, any more than we have to look at the machine language
| emitted by a C compiler. We're a long way from that state of
| affairs -- as anyone who struggled with code-generation bugs in
| the first few generations of compilers will agree -- but we'll
| get there.
| rvz wrote:
| > Not really. Got some code you don't understand? Feed it to
| a model and ask it to add comments.
|
| Absolutely not.
|
| An experienced individual in their field can tell if the AI
| made a mistake in the comments / code, unlike the typical
| untrained eye.
|
| So no, actually read the code and understand what it does.
|
| > Ultimately humans will never need to look at most AI-
| generated code, any more than we have to look at the machine
| language emitted by a C compiler.
|
| So for safety critical systems, one should not look or check
| if code has been AI generated?
| inspectorwadget wrote:
| >any more than we have to look at the machine language
| emitted by a C compiler.
|
| Some developers do actually look at the output of C
| compilers, and some of them even spend a lot of time
| criticizing that output by a specific compiler (even writing
| long blog posts about it). The C language has an ISO
| specification, and if a compiler does not conform to that
| specification, it is considered a bug in that compiler.
|
| You can even go to godbolt.org / compilerexplorer.org and see
| the output generated for different targets by different
| compilers for different languages. It is a popular tool, also
| for language development.
|
| I do not know what prompt engineering will look like in the
| future, but without AGI, I remain skeptical about
| verification of different kinds of code not being required in
| at least a sizable proportion of cases. That does not exclude
| usefulness of course: for instance, if you have a case where
| verification is not needed; or verification in a specific
| case can be done efficiently and robustly by a relevant
| expert; or some smart method for verification in some cases,
| like a case where a few primitive tests are sufficient.
|
| But I have no experience with LLMs or prompt engineering.
|
| I do, however, sympathize with not wanting to deal with
| paying programmers. Most are likely nice, but for instance a
| few may be costly, or less than honest, or less than
| competent, etc. But while I think it is fine to explore LLMs
| and invest a lot into seeing what might come of them, I would
| not personally bet everything on them, neither in the short
| term nor the long term.
|
| May I ask what your professional background and experience
| is?
| adhamsalama wrote:
| There is no engineering involved in using AI. It's insulting to
| call begging an LLM "engineering".
| rednafi wrote:
| This. Convincing a bullshit generator to give you the right
| data isn't engineering, it's quackery. But I guess "context
| quackery" wouldn't sell as much.
|
| LLMs are quite useful and I leverage them all the time. But I
| can't stand these AI yappers saying the same shit over and over
| again in every media format and trying to sell AI usage as some
| kind of profound wizardry when it's not.
| 8organicbits wrote:
| One thought experiment I was musing on recently was the minimal
| context required to define a task (to an LLM, human, or
| otherwise). In software, there's a whole discipline of human
| centered design that aims to uncover the nuance of a task. I've
| worked with some great designers, and they are incredibly
| valuable to software development. They develop journey maps, user
| stories, collect requirements, and produce a wealth of design
| docs. I don't think you can successfully build large projects
| without that context.
|
| I've seen lots of AI demos that prompt "build me a TODO app",
| pretend that is sufficient context, and then claim that the
| output matches their needs. Without proper context, you can't
| tell if the output is correct.
| grafmax wrote:
| There is no need to develop this 'skill'. This can all be
| automated as a preprocessing step before the main request runs.
| Then you can have agents with infinite context, etc.
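|
| Roughly, the preprocessing step I have in mind looks like this
| sketch (retrieve, trim, and ask are placeholders for a vector
| search, a token-budget truncator, and the model call):
|
|       def preprocess_and_ask(question, retrieve, trim, ask,
|                              budget=8_000):
|           docs = retrieve(question)          # e.g. vector search
|           context = trim("\n\n".join(docs), budget)
|           prompt = (f"Context:\n{context}\n\n"
|                     f"Question: {question}")
|           return ask(prompt)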
| simonw wrote:
| You need this skill if you're the engineer that's designing and
| implementing that preprocessing step.
| yunwal wrote:
| Non-rhetorical question: is this different enough from data
| engineering that it needs its own name?
| ofjcihen wrote:
| Not at all, just ask the LLM to design and implement it.
|
| AI turtles all the way down.
| dolebirchwood wrote:
| The skill amounts to determining "what information is
| required for System A to achieve Outcome X." We already have
| a term for this: Critical thinking.
| grafmax wrote:
| In the short term horizon I think you are right. But over a
| longer horizon, we should expect model providers to
| internalize these mechanisms, similar to how chain of thought
| has been effectively "internalized" - which in turn has
| reduced the effectiveness that prompt engineering used to
| provide as models have gotten better.
| lawlessone wrote:
| I look forward to 5 million LinkedIn posts repeating this
| labrador wrote:
| I'm curious how this applies to systems like ChatGPT, which now
| have two kinds of memory: user-configurable memory (a list of
| facts or preferences) and an opaque chat history memory. If
| context is the core unit of interaction, it seems important to
| give users more control or at least visibility into both.
|
| I know context engineering is critical for agents, but I wonder
| if it's also useful for shaping personality and improving overall
| relatability? I'm curious if anyone else has thought about that.
| simonw wrote:
| I really dislike the new ChatGPT memory feature (the one that
| pulls details out of a summarized version of all of your
| previous chats, as opposed to older memory feature that records
| short notes to itself) for exactly this reason: it makes it
| even harder for me to control the context when I'm using
| ChatGPT.
|
| If I'm debugging something with ChatGPT and I hit an error
| loop, my fix is to start a new conversation.
|
| Now I can't be sure ChatGPT won't include notes from that
| previous conversation's context that I was trying to get rid
| of!
|
| Thankfully you can turn the new memory thing off, but it's on
| by default.
|
| I wrote more about that here:
| https://simonwillison.net/2025/May/21/chatgpt-new-memory/
| labrador wrote:
| On the other hand, for my use case (I'm retired and enjoy
| chatting with it), having it remember items from past chats
| makes it feel much more personable. I actually prefer Claude,
| but it doesn't have memory, so I unsubscribed and subscribed
| to ChatGPT. That it remembers obscure but relevant details
| about our past chats feels almost magical.
|
| It's good that you can turn it off. I can see how it might
| cause problems when trying to do technical work.
|
| Edit: Note, the introduction of memory was a contributing
| factor to "the sycophant" that OpenAI had to roll back. When
| it could praise you while seeming to know you, it encouraged
| addictive use.
|
| Edit2: Here's the previous Hacker News discussion on Simon's
| "I really don't like ChatGPT's new memory dossier"
|
| https://news.ycombinator.com/item?id=44052246
| ozim wrote:
| Finding a magic prompt was never "prompt engineering" it was
| always "context engineering" - lots of "AI wannabe gurus" sold it
| as such but they never knew any better.
|
| RAG wasn't invented this year.
|
| Proper tooling that wraps esoteric knowledge like using
| embeddings, vector DBs, or graph DBs is becoming more mainstream.
| Big players are improving their tooling, so more stuff is
| available.
| semiinfinitely wrote:
| Context engineering is just a phrase that Karpathy uttered for
| the first time 6 days ago, and now everyone is treating it like
| it's a new field of science and engineering.
| hnthrow90348765 wrote:
| Cool, but wait another year or two and context engineering will
| be obsolete as well. It still feels like tinkering with the
| machine, which is what AI is (supposed to be) moving us away
| from.
| hobs wrote:
| Probably impossible unless computers themselves change in
| another year or two.
| alganet wrote:
| If I need to do all this work (gather data, organize it, prepare
| it, etc), there are other AI solutions I might decide to use
| instead of an LLM.
| joe5150 wrote:
| You might as well use your natural intelligence instead of the
| artificial stuff at that point.
| coliveira wrote:
| Yes, when all is said and done people will realize that
| artificial intelligence is too expensive to replace natural
| intelligence. AI companies want to avoid this realization for
| as long as possible.
| alganet wrote:
| This is not what I'm talking about, see the other reply.
| alganet wrote:
| I'm assuming the post is about automated "context
| engineering". It's not a human doing it.
|
| In this arrangement, the LLM is a component. What I meant is
| that it seems to me that other non-LLM AI technologies would
| be a better fit for this kind of thing. Lighter, easier to
| change and adapt, potentially even cheaper. Not for all
| scenarios, but for a lot of them.
| simonw wrote:
| What kind of alternative AI solutions might you use here?
| alganet wrote:
| Classifiers to classify things, traditional neural nets to
| identify things. Typical run of the mill.
|
| In OpenAI hype language, this is a problem for "Software
| 2.0", not "Software 3.0" in 99% of the cases.
|
| The thing about matching an informal tone would be the hard
| part. I have to concede that LLMs are probably better at
| that. But I have the feeling that this is not exactly the
| feature most companies are looking for, and they would be
| willing to not have it for a cheaper alternative. Most of
| them just don't know that's possible.
| la64710 wrote:
| Of course the best prompts automatically included providing the
| best (not necessarily most) context to extract the right output.
| m3kw9 wrote:
| Well, it's still a prompt
| bradhe wrote:
| Back in my day we just called this "knowing what to google" but
| alright, guys.
| rednafi wrote:
| I really don't get this rush to invent neologisms to describe
| every single behavioral artifact of LLMs. Maybe it's just a
| yearning to be known as the father of Deez Unseen Mind-blowing
| Behaviors (DUMB).
|
| LLM farts -- Stochastic Wind Release.
|
| The latest one is yet another attempt to make prompting sound
| like some kind of profound skill, when it's really not that
| different from just knowing how to use search effectively.
|
| Also, "context" is such an overloaded term at this point that you
| might as well just call it "doing stuff" -- and you'd objectively
| be more descriptive.
| jongjong wrote:
| Recently I started work on a new project and I 'vibe coded' a
| test case for a complex OAuth token expiry bug entirely with AI
| (with Cursor), complete with mocks and stubs... And it was on
| someone else's project. I had no prior familiarity with the code.
|
| That's when I understood that vibe coding is real and context is
| the biggest hurdle.
|
| That said, most of the context could not be pulled from the
| codebase directly but came from me after asking the AI to
| check/confirm certain things that I suspected could be the
| problem.
|
| I think vibe coding can be very powerful in the hands of a senior
| developer because if you're the kind of person who can clearly
| explain their intuitions with words, it's exactly the missing
| piece that the AI needs to solve the problem... And you still
| need to do the code review aspect, which is also something
| senior devs are generally good at. Sometimes it makes
| mistakes/incorrect assumptions.
|
| I'm feeling positive about LLMs. I was always complaining about
| other people's ugly code before... I HATE over-modularized,
| poorly abstracted code where I have to jump across 5+ different
| files to figure out what a function is doing; with AI, I can just
| ask it to read all the relevant code across all the files and
| tell me WTF the spaghetti is doing... Then it generates new code
| which 'follows' existing 'conventions' (same level of mess). The
| AI basically automates the most horrible aspect of the work;
| making sense of the complexity and churning out more complexity
| that works. I love it.
|
| That said, in the long run, to build sustainable projects, I
| think it will require following good coding conventions and
| minimal 'low code' coding... Because the codebase could explode
| in complexity if not used carefully. Code quality can only drop
| as the project grows. Poor abstractions tend to stick around and
| have negative flow-on effects which impact just about everything.
| colgandev wrote:
| I've been finding a ton of success lately with speech to text as
| the user prompt, and then using https://continue.dev in VSCode,
| or Aider, to supply context from files from my projects and
| having those tools run the inference.
|
| I'm trying to figure out how to build a "Context Management
| System" (as compared to a Content Management System) for all of
| my prompts. I completely agree with the premise of this article,
| if you aren't managing your context, you are losing all of the
| context you create every time you create a new conversation. I
| want to collect all of the reusable blocks from every
| conversation I have, as well as from my research and reading
| around the internet. Something like a mashup of Obsidian with
| some custom Python scripts.
|
| The ideal inner loop I'm envisioning is to create a "Project"
| document that uses Jinja templating to allow transclusion of a
| bunch of other context objects like code files, documentation,
| articles, and then also my own other prompt fragments, and then
| to compose them in a master document that I can "compile" into a
| "superprompt" that has the precise context that I want for every
| prompt.
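|
| As a sketch of that compile step (the file names and the template
| are made up for illustration, using plain Jinja2):
|
|       from jinja2 import Environment, FileSystemLoader
|
|       env = Environment(loader=FileSystemLoader("context_blocks"))
|       template = env.get_template("project.md.j2")
|       superprompt = template.render(
|           code=open("src/oauth.py").read(),
|           conventions=open("context_blocks/conventions.md").read(),
|           task="Add token expiry handling to the OAuth backend",
|       )
|       print(superprompt)  # paste into chat or send via the API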
|
| Since with the chat interfaces they are always already just
| sending the entire previous conversation message history anyway,
| I don't even really want to use a chat style interface as much as
| just "one shotting" the next step in development.
|
| It's almost a turn based game: I'll fiddle with the code and the
| prompts, and then run "end turn" and now it is the llm's turn. On
| the llm's turn, it compiles the prompt and runs inference and
| outputs the changes. With Aider it can actually apply those
| changes itself. I'll then review the code using diffs and make
| changes and then that's a full turn of the game of AI-assisted
| code.
|
| I love that I can just brain dump into speech to text, and llms
| don't really care that much about grammar and syntax. I can
| curate fragments of documentation and specifications for
| features, and then just kind of rant and rave about what I want
| for a while, and then paste that into the chat and with my
| current LLM of choice being Claude, it seems to work really quite
| well.
|
| My Django work feels like it's been supercharged with just this
| workflow, and my context management engine isn't even really that
| polished.
|
| If you aren't getting high quality output from llms, definitely
| consider how you are supplying context.
| patrickhogan1 wrote:
| OpenAI's o3 searches the web behind a curtain: you get a few
| source links and a fuzzy reasoning trace, but never the full
| chunk of text it actually pulled in. Without that raw context,
| it's impossible to audit what really shaped the answer.
| simonw wrote:
| Yeah, I find that really frustrating.
|
| I understand why they do it though: if they presented the
| actual content that came back from search they would
| _absolutely_ get in trouble for copyright-infringement.
|
| I suspect that's why so much of the Claude 4 system prompt for
| their search tool is the message "Always respect copyright by
| NEVER reproducing large 20+ word chunks of content from search
| results" repeated half a dozen times:
| https://simonwillison.net/2025/May/25/claude-4-system-prompt...
| rvz wrote:
| This is just another "rebranding" of the failed "prompt
| engineering" trend to promote another borderline pseudo-
| scientific trend to attact more VC money to fund a new pyramid
| scheme.
|
| Assuming that this will be using the totally flawed MCP protocol,
| I can only see more cases of data exfiltration attacks on these
| AI systems just like before [0] [1].
|
| Prompt injection + Data exfiltration is the new social
| engineering in AI Agents.
|
| [0] https://embracethered.com/blog/posts/2025/security-
| advisory-...
|
| [1] https://www.bleepingcomputer.com/news/security/zero-click-
| ai...
| slavapestov wrote:
| I feel like if the first link in your post is a tweet from a tech
| CEO the rest is unlikely to be insightful.
| CharlieDigital wrote:
| I was at a startup that started using OpenAI APIs pretty early
| (almost 2 years ago now?).
|
| "Back in the day", we had to be very sparing with context to get
| great results so we really focused on how to build great context.
| Indexing and retrieval were pretty much our core focus.
|
| Now, even with the larger windows, I find this still to be true.
|
| The moat for most companies is actually their data, data
| indexing, and data retrieval[0]. Companies that 1) have the data
| and 2) know how to use that data are going to win.
|
| My analogy is this:
|
| > The LLM is just an oven; a fantastical oven. But whether it
| produces a good product still depends on picking good
| ingredients, in the right ratio, and preparing them with care.
| You hit the bake button, then you still need to finish it off
| with presentation and decoration.
|
| [0] https://chrlschn.dev/blog/2024/11/on-bakers-ovens-and-ai-
| sta...
| retinaros wrote:
| It is still sending a string of chars and hoping the model
| outputs something relevant. Let's not do what finance did and
| permanently obfuscate really simple stuff to make ourselves look
| bigger than we are.
|
| prompt engineering/context engineering : stringbuilder
|
| Retrieval augmented generation: search+ adding strings to main
| string
|
| test time compute: running multiple generation and choosing the
| best
|
| agents: for loop and some ifs
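|
| To be fair, that last one really is about this much code (a toy
| sketch with placeholder llm and tools callables):
|
|       def run_agent(task, llm, tools, max_steps=10):
|           history = [{"role": "user", "content": task}]
|           for _ in range(max_steps):        # the for loop
|               reply = llm(history)
|               history.append({"role": "assistant",
|                               "content": str(reply)})
|               if reply.get("tool"):         # and some ifs
|                   result = tools[reply["tool"]](**reply["args"])
|                   history.append({"role": "tool",
|                                   "content": str(result)})
|               else:
|                   return reply["content"]
|           return "gave up"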
| jumploops wrote:
| To anyone who has worked with LLMs extensively, this is obvious.
|
| Single prompts can only get you so far (surprisingly far
| actually, but then they fall over quickly).
|
| This is actually the reason I built my own chat client (~2 years
| ago), because I wanted to "fork" and "prune" the context easily;
| using the hosted interfaces was too opaque.
|
| In the age of (working) tool-use, this starts to resemble agents
| calling sub-agents, partially to better abstract, but mostly to
| avoid context pollution.
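|
| The core of it is not much more than keeping the history as plain
| data so you can branch or drop turns (a rough sketch of what I
| mean by fork and prune):
|
|       import copy
|
|       def fork(history):
|           # branch the conversation, leaving the original intact
|           return copy.deepcopy(history)
|
|       def prune(history, keep_last=4):
|           # keep system messages plus the most recent turns
|           system = [m for m in history if m["role"] == "system"]
|           recent = [m for m in history[-keep_last:]
|                     if m["role"] != "system"]
|           return system + recent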
| mgdev wrote:
| If we zoom out far enough, and start to put more and more under
| the execution umbrella of AI, what we're actually describing here
| is... product development.
|
| You are constructing the set of context, policies, directed
| attention toward some intentional end, same as it ever was. The
| difference is you need fewer meat bags to do it, even as your
| projects get larger and larger.
|
| To me this is wholly encouraging.
|
| Some projects will remain outside what models are capable of, and
| your role as a human will be to stitch many smaller projects
| together into the whole. As models grow more capable, that
| stitching will still happen - just as larger levels.
|
| But as long as humans have imagination, there will always be a
| role for the human in the process: as the orchestrator of will,
| and ultimate fitness function for his own creations.
| jcon321 wrote:
| I thought this entire premise was obvious? Does it really take an
| article and a venn diagram to say you should only provide the
| relevant content to your LLM when asking a question?
| simonw wrote:
| "Relevant content to your LLM when asking a question" is last
| year's RAG.
|
| If you look at how sophisticated current LLM systems work there
| is _so much more_ to this.
|
| Just one example: Microsoft open sourced VS Code Copilot Chat
| today (MIT license). Their prompts are dynamically assembled
| with tool instructions for various tools based on whether or
| not they are enabled: https://github.com/microsoft/vscode-
| copilot-chat/blob/v0.29....
|
| And the autocomplete stuff has a _wealth_ of contextual
| information included: https://github.com/microsoft/vscode-
| copilot-chat/blob/v0.29.... You have access to
| the following information to help you make informed
| suggestions: - recently_viewed_code_snippets: These
| are code snippets that the developer has recently looked
| at, which might provide context or examples relevant to
| the current task. They are listed from oldest to newest,
| with line numbers in the form #| to help you understand
| the edit diff history. It's possible these are entirely
| irrelevant to the developer's change. -
| current_file_content: The content of the file the developer
| is currently working on, providing the broader context of the
| code. Line numbers in the form #| are included to help you
| understand the edit diff history. - edit_diff_history: A
| record of changes made to the code, helping you
| understand the evolution of the code and the developer's
| intentions. These changes are listed from oldest to
| latest. It's possible a lot of old edit diff history is
| entirely irrelevant to the developer's change. -
| area_around_code_to_edit: The context showing the code
| surrounding the section to be edited. - cursor position
| marked as ${CURSOR_TAG}: Indicates where the developer's
| cursor is currently located, which can be crucial for
| understanding what part of the code they are focusing on.
| liampulles wrote:
| The only engineering going on here is Job Engineering(tm)
| ryhanshannon wrote:
| It is really funny to see the hyper-fixation on relabeling
| soft skills / product development as "<blank> Engineering" in
| the AI space.
| amelius wrote:
| Yes, and it is a soft skill.
| zacharyvoase wrote:
| I love how we have such a poor model of how LLMs work (or more
| aptly don't work) that we are developing an entire alchemical
| practice around them. Definitely seems healthy for the industry
| and the species.
| simonw wrote:
| The stuff that's showing up under the "context engineering"
| banner feels a whole lot _less_ alchemical to me than the older
| prompt engineering tricks.
|
| Alchemical is "you are the world's top expert on marketing, and
| if you get it right I'll tip you $100, and if you get it wrong
| a kitten will die".
|
| The techniques in https://www.dbreunig.com/2025/06/26/how-to-
| fix-your-context.... seem a whole lot more rational to me than
| that.
| geeewhy wrote:
| I've been experimenting with this for a while (I'm sure, in a
| way, most of us have). It would be good to enumerate some
| examples. When it comes to coding, here are a few:
|
| - compile scripts that can grep / compile list of your relevant
| files as files of interest
|
| - make temp symlinks between relevant repos for documentation
| generation, and pass the documentation collected from the
| respective repos to enable cross-repo ops to be performed
| atomically
|
| - build scripts to copy schemas, db ddls, dtos, example records,
| api specs, contracts (still works better than MCP in most cases)
|
| I found these steps not only help produce better output but also
| reduce cost greatly by avoiding some "reasoning" hops. I'm sure
| the practice can extend beyond coding.
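|
| As a rough sketch of that last kind of script (the glob patterns
| and output file are made up for illustration):
|
|       # bundle schemas, DDLs and API specs into one context file
|       import pathlib
|
|       PATTERNS = ["**/*.sql", "**/schema/*.json",
|                   "**/openapi*.yaml"]
|       with open("context_bundle.md", "w") as out:
|           for pattern in PATTERNS:
|               for path in pathlib.Path(".").glob(pattern):
|                   out.write(f"\n--- {path} ---\n")
|                   out.write(path.read_text(errors="ignore"))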
| dinvlad wrote:
| I feel like ppl just keep inventing concepts for the same old
| things, which come down to dancing with the drums around the fire
| and screaming shamanic incantations :-)
| emporas wrote:
| Prompting takes the back seat, while context is the driving
| factor. 100% agree with this.
|
| For programming I don't use any prompts. I give a problem solved
| already, as a context or example, and I ask it to implement
| something similar. One sentence or two, and that's it.
|
| Other kind of tasks, like writing, I use prompts, but even then,
| context and examples are still the driving factor.
|
| In my opinion, we are at an interesting point in history, in
| which individuals will now need their own personal database. Like
| companies over the last 50 years, which had their own database
| records of customers, products, prices and so on, an individual
| will now operate using personal contextual information, saved
| over a long period of time in wikis or SQLite rows.
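|
| Something as small as a SQLite table would do for that personal
| store (a toy sketch; the schema is just my own invention):
|
|       import sqlite3
|
|       db = sqlite3.connect("personal_context.db")
|       db.execute("""CREATE TABLE IF NOT EXISTS notes
|                     (topic TEXT, content TEXT,
|                      added TEXT DEFAULT CURRENT_TIMESTAMP)""")
|       db.execute("INSERT INTO notes (topic, content) "
|                  "VALUES (?, ?)",
|                  ("preferences", "Prefers concise Python examples"))
|       db.commit()
|       prefs = db.execute("SELECT content FROM notes "
|                          "WHERE topic = ?",
|                          ("preferences",)).fetchall()
|       db.close()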
| d0gsg0w00f wrote:
| Yes, the other day I was telling a colleague that we all need
| our own personal context to feed into every model we interact
| with. You could carry it around on a thumb drive or something.
___________________________________________________________________
(page generated 2025-06-30 23:00 UTC)