[HN Gopher] Andrej Karpathy: Software in the era of AI [video]
       ___________________________________________________________________
        
       Andrej Karpathy: Software in the era of AI [video]
        
       Author : sandslash
       Score  : 1056 points
       Date   : 2025-06-19 00:33 UTC (22 hours ago)
        
 (HTM) web link (www.youtube.com)
 (TXT) w3m dump (www.youtube.com)
        
       | gchamonlive wrote:
        | I think it's interesting to juxtapose traditional coding, neural
        | network weights and prompts, because in many areas -- like the
        | example of the self-driving module having code replaced by
        | neural networks tuned to a target dataset representing the
        | domain -- this will be quite useful.
       | 
       | However I think it's important to make it clear that given the
       | hardware constraints of many environments the applicability of
       | what's being called software 2.0 and 3.0 will be severely
       | limited.
       | 
        | So instead of being replacements, these paradigms are more like
        | extra tools in the tool belt. Code and prompts will live side by
        | side, each used when convenient, but none of them a panacea.
        
         | karpathy wrote:
          | I kind of say it in words (agreeing with you), but I agree the
          | versioning is a bit of a confusing analogy because it usually
          | additionally implies some kind of improvement, when I'm just
          | trying to distinguish them as very different software
          | categories.
        
           | miki123211 wrote:
           | What do you think about structured outputs / JSON mode /
           | constrained decoding / whatever you wish to call it?
           | 
           | To me, it's a criminally underused tool. While "raw" LLMs are
           | cool, they're annoying to use as anything but chatbots, as
           | their output is unpredictable and basically impossible to
           | parse programmatically.
           | 
           | Structured outputs solve that problem neatly. In a way,
           | they're "neural networks without the training". They can be
           | used to solve similar problems as traditional neural
           | networks, things like image classification or extracting
           | information from messy text, but all they require is a Zod or
           | Pydantic type definition and a prompt. No renting GPUs,
           | labeling data and tuning hyperparameters necessary.
           | 
           | They often also improve LLM performance significantly.
            | Imagine you're trying to extract calories per 100g of
            | product, but some products give you calories per serving and
            | a serving size, calories per pound, etc. The naive way to do
           | this is a prompt like "give me calories per 100g", but that
           | forces the LLM to do arithmetic, and LLMs are bad at
           | arithmetic. With structured outputs, you just give it the
           | fifteen different formats that you expect to see as
           | alternatives, and use some simple Python to turn them all
           | into calories per 100g on the backend side.
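            |
            | To make that concrete, here's a minimal sketch of the idea
            | with Pydantic (the model and field names are invented for
            | illustration, not taken from any particular API): the LLM
            | only has to pick one of the variants and copy numbers out of
            | the text, and the unit conversion happens in ordinary Python
            | afterwards.
            |
            |     from typing import Literal, Union
            |     from pydantic import BaseModel
            |
            |     class Per100g(BaseModel):
            |         kind: Literal["per_100g"]
            |         calories_per_100g: float
            |
            |     class PerServing(BaseModel):
            |         kind: Literal["per_serving"]
            |         calories_per_serving: float
            |         serving_size_g: float
            |
            |     class PerPound(BaseModel):
            |         kind: Literal["per_pound"]
            |         calories_per_pound: float
            |
            |     class CalorieInfo(BaseModel):
            |         value: Union[Per100g, PerServing, PerPound]
            |
            |     def to_per_100g(info: CalorieInfo) -> float:
            |         v = info.value
            |         if isinstance(v, Per100g):
            |             return v.calories_per_100g
            |         if isinstance(v, PerServing):
            |             return v.calories_per_serving * 100 / v.serving_size_g
            |         return v.calories_per_pound * 100 / 453.6  # grams per pound
            |
            |     # e.g. a label that only lists per-serving information:
            |     info = CalorieInfo(value={"kind": "per_serving",
            |                               "calories_per_serving": 150,
            |                               "serving_size_g": 30})
            |     print(to_per_100g(info))  # 500.0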
        
             | abdullin wrote:
              | Even more than that. With Structured Outputs we essentially
              | control the layout of the response, so we can force the LLM
              | to go through different parts of the completion in a
              | predefined order.
              | 
              | One way teams exploit that: force the LLM to go through a
              | predefined task-specific checklist before answering. This
              | custom hard-coded chain of thought boosts accuracy and
              | makes the reasoning more auditable.
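              |
              | As a rough sketch of the shape (the fields are invented for
              | a made-up support-bot example, not from any real system),
              | the schema you hand to whatever structured-output API you
              | use can simply put the checklist fields before the answer:
              |
              |     from pydantic import BaseModel
              |
              |     class SupportReply(BaseModel):
              |         # Checklist first: generation is sequential, so the
              |         # model must produce these before the final answer.
              |         relevant_policy_quoted: str
              |         user_intent_summary: str
              |         refund_eligible: bool
              |         # Only then the user-facing answer.
              |         answer: str
              |
              | You get the "reasoning" back as ordinary fields, which is
              | what makes it auditable.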
        
             | solaire_oa wrote:
              | I also think that structured outputs are criminally
              | underused, but they aren't perfect... and for your example,
              | they might not even be good enough, because I've done
              | something similar.
             | 
             | I was trying to make a decent cocktail recipe database, and
             | scraped the text of cocktails from about 1400 webpages.
             | Note that this was just the text of the cocktail recipe,
             | and cocktail recipes are comparatively small. I sent the
             | text to an LLM for JSON structuring, and the LLM routinely
             | miscategorized liquor types. It also failed to normalize
             | measurements with explicit instructions and the temperature
             | set to zero. I gave up.
        
               | handfuloflight wrote:
               | Which LLM?
        
               | hellovai wrote:
               | have you tried schema-aligned parsing yet?
               | 
               | the idea is that instead of using JSON.parse, we create a
               | custom Type.parse for each type you define.
               | 
                | so if you want a:
                | 
                |     class Job { company: string[] }
                | 
                | And the LLM happens to output:
                | 
                |     { "company": "Amazon" }
                | 
                | We can upcast "Amazon" -> ["Amazon"] since you indicated
                | that in your schema.
                | 
                | https://www.boundaryml.com/blog/schema-aligned-parsing
                | 
                | and since it's only post-processing, the technique will
                | work on every model :)
                | 
                | for example, on BFCL benchmarks, we got SAP + GPT3.5 to
                | beat out GPT4o
                | (https://www.boundaryml.com/blog/sota-function-calling)
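                |
                | A toy illustration of the coercion idea in plain Python
                | (not the actual BAML implementation, just the flavor of
                | schema-aware parsing, for the case where the schema wants
                | a list):
                |
                |     from typing import get_args, get_origin
                |
                |     def coerce(value, target_type):
                |         # If the schema expects a list but the model
                |         # produced a scalar, wrap it instead of failing.
                |         if get_origin(target_type) is list:
                |             (item_type,) = get_args(target_type)
                |             if not isinstance(value, list):
                |                 value = [value]
                |             return [coerce(v, item_type) for v in value]
                |         return target_type(value)
                |
                |     print(coerce("Amazon", list[str]))  # ['Amazon']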
        
               | solaire_oa wrote:
               | Interesting! I was using function calling and JSON modes
               | with zod. I may revisit the project with SAP!
        
             | coderatlarge wrote:
              | Note that the per-100g prompt might lead the LLM to reach
              | for the part of its training distribution that is actually
              | written in terms of the 100g standard, and so just lead to
              | different recall rather than a suboptimal calculation based
              | on examples that aren't standardized to per 100g.
        
           | poorcedural wrote:
           | Andrej, maybe Software 3.0 is not written in spoken language
           | like code or prompts. Software 3.0 is recorded in behavior, a
           | behavior that today's software lacks. That behavior is
           | written and consumed by machine and annotated by human
           | interaction. Skipping to 3.0 is premature, but Software 2.0
           | is a ramp.
        
             | mclau157 wrote:
              | Would this also be more of a push towards robotics and
              | getting physical AI into our everyday lives?
        
               | poorcedural wrote:
               | Very insightful! How you would describe boiling an egg is
               | different than how a machine would describe it to another
               | machine.
        
           | BobbyJo wrote:
           | The versioning makes sense to me. Software has a cycle where
           | a new tool is created to solve a problem, and the problem
           | winds up being meaty enough, and the tool effective enough,
           | that the exploration of the problem space the tool unlocks is
           | essentially a new category/skill/whatever.
           | 
           | computers -> assembly -> HLL -> web -> cloud -> AI
           | 
           | Nothing on that list has disappeared, but the work has
           | changed enough to warrant a few major versions imo.
        
             | TeMPOraL wrote:
             | For me it's even simpler:
             | 
             | V1.0: describing solutions to specific problems directly,
             | precisely, for machines to execute.
             | 
              | V2.0: giving the machine examples of good and bad answers
              | to specific problems we don't know how to describe
              | precisely, for it to generalize from and solve such
              | indirectly specified problems.
              | 
              | V3.0: telling the machine what to do in plain language, for
              | it to figure out and solve.
              | 
              | V2 was coded in V1 style, as a solution to the problem of
              | "build a tool that can solve problems defined as examples".
              | V3 was created by feeding everything and the kitchen sink
              | into V2 at the same time, so that it learns to solve the
              | problem of being a general-purpose tool.
        
               | BobbyJo wrote:
               | That's less a versioning of software and more a
               | versioning of AI's role in software. None -> Partial ->
                | Total. It's a valid scale with regard to AI's role
               | specifically, but I think Karpathy was intending to make
               | a point about software as a whole, and even the details
               | of how that middle "Partial" era evolves.
        
           | swyx wrote:
           | no no, it actually is a good analogy in 2 ways:
           | 
           | 1) it is a breaking change from the prior version
           | 
           | 2) it is an improvement in that, in its ideal/ultimate form,
           | it is a full superset of capabilities of the previous version
        
           | gchamonlive wrote:
           | > versioning is a bit confusing analogy because it usually
           | additionally implies some kind of improvement
           | 
            | Exactly what I felt. Semver-like naming analogies bring their
            | own set of implicit meanings, like major versions necessarily
            | having to supersede or replace the previous version; that is,
            | they don't account for coexistence beyond planning migration
            | paths. This expectation, however, doesn't correspond with the
            | rest of the talk, so I thought I might point it out. Thanks
            | for taking the time to reply!
        
         | radicalbyte wrote:
         | Weights are code being replaced by data; something I've been
         | making heavy use of since the early 00s. After coding for 10
         | years you start to see the benefits of it and understand where
         | you should use it.
         | 
          | LLMs give us another tool, only this time it's far more
          | accessible and powerful.
        
         | dcsan wrote:
          | LLMs have already replaced some code directly for me, eg NLP
          | stuff. Previously I might write a bunch of code to do
          | clustering; now I just ask the LLM to group things. Obviously
          | this is a very basic feature native to LLMs, but there will be
          | more first-class LLM-callable functions over time.
        
       | nico wrote:
       | Thank you YC for posting this before the talk became
       | deprecated[1]
       | 
       | 1: https://x.com/karpathy/status/1935077692258558443
        
         | sandslash wrote:
         | We couldn't let that happen!
        
       | jppope wrote:
       | Well that showed up significantly faster than they said it would.
        
         | seneca wrote:
         | Classic under promise and over deliver.
         | 
         | I'm glad they got it out quickly.
        
           | dang wrote:
           | Me too. It was my favorite talk of the ones I saw.
        
         | dang wrote:
         | The team adapted quickly, which is a good sign. I believe
         | getting the videos out sooner (as in why-not-immediately) is
         | going to be a priority in the future.
        
       | anythingworks wrote:
       | loved the analogies! Karpathy is consistently one of the clearest
       | thinkers out there.
       | 
        | interesting that Waymo could do uninterrupted trips back in 2013,
        | wonder what took them so long to expand? regulation? the tail end
        | of driving optimization issues?
       | 
       | noticed one of the slides had a cross over 'AGI 2027'...
       | ai-2027.com :)
        
         | AlotOfReading wrote:
         | You don't "solve" autonomous driving as such. There's a long,
         | slow grind of gradually improving things until failures become
         | rare enough.
        
           | petesergeant wrote:
           | I wonder at what point all the self-driving code becomes
           | replaceable with a multimodal generalist model with the
           | prompt "drive safely"
        
             | AlotOfReading wrote:
             | One of the issues with deploying models like that is the
             | lack of clear, widely accepted ways to validate
             | comprehensive safety and absence of unreasonable risk. If
             | that can be solved, or regulators start accepting answers
             | like "our software doesn't speed in over 95% of
             | situations", then they'll become more common.
        
             | anon7000 wrote:
             | Very advanced machine learning models are used in current
             | self driving cars. It all depends what the model is trying
             | to accomplish. I have a hard time seeing a generalist
             | prompt-based generative model ever beating a model
             | specifically designed to drive cars. The models are just
             | designed for different, specific purposes
        
               | tshaddox wrote:
               | I could see it being the case that driving is a fairly
                | general problem, and thus models intentionally designed
               | to be general end up doing better than models designed
               | with the misconception that you need a very particular
               | set of driving-specific capabilities.
        
               | anythingworks wrote:
                | exactly! I think that was Tesla's vision with self-
                | driving to begin with... so they tried to frame it as a
                | problem general enough that trying to solve it would
               | also solve questions of more general intelligence ('agi')
               | i.e. cars should use vision just like humans would
               | 
               | but in hindsight looks like this slowed them down quite a
               | bit despite being early to the space...
        
               | shakna wrote:
                | Driving is not a general problem, though. It's a
                | contextual landscape of fast reactions and predictions.
                | Both are required, and done regularly by the human
                | element. The exact nature of every reaction, and every
                | prediction, changes vastly within the context window.
               | 
                | You need image processing just as much as you need
                | scenario management, and they're orthogonal to each
                | other, as one example.
               | 
               | If you want a general transport system... We do have
               | that. It's called rail. (And can and has been automated.)
        
               | melvinmelih wrote:
               | > Driving is not a general problem, though.
               | 
               | But what's driving a car? A generalist human brain that
               | has been trained for ~30 hours to drive a car.
        
               | shakna wrote:
                | Human brains aren't generalist!
               | 
               | We have multiple parts of the brain that interact in
               | vastly different ways! Your cerebellum won't be running
               | the role of the pons.
               | 
               | Most parts of the brain cannot take over for others.
               | Self-healing is the exception, not the rule. Yes, we have
               | a degree of neuroplasticity, but there are many limits.
               | 
               | (Sidenote: Driver's license here is 240 hours.)
        
               | Zanfa wrote:
                | > Human brains aren't generalist!
                | 
                | What? Human intelligence is literally how AGI is defined.
                | The brain's physical configuration is irrelevant.
        
               | shakna wrote:
               | A human brain is not a general model. We have multiple
               | overlapping systems. The physical configuration is
               | extremely relevant to that.
               | 
               | AGI is defined in terms of "General Intelligence", a
               | theory that general modelling is irrelevant to.
        
               | azan_ wrote:
               | > We have multiple parts of the brain that interact in
               | vastly different ways!
               | 
               | Yes, and thanks to that human brains are generalist
        
               | shakna wrote:
                | Only if that were a singular system, which it is not.
                | [0]
               | 
               | For example... The nerve cells in your gut may speak to
               | the brain, and interact with it in complex ways we are
               | only just beginning to understand, but they are separate
               | systems that both have control over the nervous system,
               | and other systems. [1]
               | 
               | General Intelligence, the psychological theory, and
               | General Modelling, whilst sharing words, share little
               | else.
               | 
               | [0] https://doi.org/10.1016/j.neuroimage.2022.119673
               | 
               | [1] https://doi.org/10.1126/science.aau9973
        
               | yusina wrote:
               | 240 hours sounds excessive. Where is "here"?
        
               | TeMPOraL wrote:
               | It partially is. You have the specialized part of
               | maneuvering a fast moving vehicle in physical world,
               | trying to keep it under control at all times and never
               | colliding with anything. Then you have the general part,
                | which is navigating the _human environment_. That's
               | lanes and traffic signs and road works and schoolbuses,
               | that's kids on the road and badly parked trailers.
               | 
               | Current breed of autonomous driving systems have problems
               | with exceptional situations - but based on all I've read
               | about so far, those are _exactly_ of the kind that would
               | benefit from a general system able to _understand_ the
                | situation it's in.
        
               | mannicken wrote:
               | Speed and Moore's law. You don't need to just make a
               | decision without hallucinations, you need to do it fast
               | enough for it to propagate to the power electronics and
               | hit the gas/brake/turn the wheel/whatever. Over and over
               | and over again on thousands of different tests.
               | 
                | A big problem I am noticing is that the IT culture over
                | the last 70 years has existed in a state of "hardware
                | gonna get faster soon". And over the last ten years we've
                | had a "hardware can't get faster bc physics, sorry"
                | problem.
               | 
               | The way we've been making software in the 90s and 00s
               | just isn't gonna be happening anymore. We are used to
               | throwing more abstraction layers (C->C++->Java->vibe
               | coding etc) at the problem and waiting for the guys in
               | the fab to hurry up and get their hardware faster so our
               | new abstraction layers can work.
               | 
                | Well, you can fire the guys in the fab all you want, but
                | no matter how much they yell at nature, it doesn't seem
                | to care. They told us, the embedded c++-monkeys, to
                | spread the message. Sorry, Moore's law is over, boys and
                | girls. I think we all need to take a second to take that
                | in and realize the significance of it.
               | 
               | [1] The "guys in the fab" are a fictional character and
               | any similarity to the real world is a coincidence.
               | 
               | [2] No c++-monkeys were harmed in the process of making
               | this comment.
        
             | yokto wrote:
             | This is (in part) what "world models" are about. While some
             | companies like Tesla bring together a fleet of small
             | specialised models, others like CommaAI and Wayve train
             | generalist models.
        
         | ActorNightly wrote:
         | > Karpathy is consistently one of the clearest thinkers out
         | there.
         | 
          | Eh, he ran Tesla's self-driving division and put them in a
          | direction that is never going to fully work.
          | 
          | What they should have done is a) trained a neural net to map a
          | sequence of frames into a representation of the physical
          | environment, and b) leveraged MuZero, so that the self-driving
          | system basically builds out parallel simulations into the
          | future and searches over them for the best course of action to
          | take.
          | 
          | Because that's pretty much what makes humans great drivers. We
          | don't need to know what a cone is - we internally compute that
          | an object on the road that we are driving towards is going to
          | result in a negative outcome when we collide with it.
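          |
          | Roughly the planning loop I mean, as a toy sketch (the world
          | model and value function are stand-ins I made up, and this is a
          | crude random-rollout planner rather than MuZero's actual MCTS):
          |
          |     import random
          |
          |     def plan(world_model, value_fn, state, actions,
          |              horizon=10, rollouts=100):
          |         """Pick the action whose simulated futures score best."""
          |         best_action, best_score = None, float("-inf")
          |         for action in actions:
          |             score = 0.0
          |             for _ in range(rollouts):
          |                 s = world_model.step(state, action)
          |                 for _ in range(horizon - 1):
          |                     s = world_model.step(s, random.choice(actions))
          |                 # e.g. value_fn returns a large penalty for any
          |                 # predicted collision, cone or otherwise
          |                 score += value_fn(s)
          |             if score > best_score:
          |                 best_action, best_score = action, score
          |         return best_action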
        
           | visarga wrote:
           | > We don't need to know what a cone is
           | 
           | The counter argument is that you can't zoom in and fix a
           | specific bug in this mode of operation. Everything is mashed
           | together in the same neural net process. They needed to
           | ensure safety, so testing was crucial. It is harder to test
           | an end-to-end system than its individual parts.
        
           | AlotOfReading wrote:
           | Aren't continuous, stochastic, partial knowledge environments
           | where you need long horizon planning with strict deadlines
           | and limited compute exactly the sort of environments muzero
           | variants struggle with? Because that's driving.
           | 
           | It's also worth mentioning that humans intentionally (and
           | safely) drive into "solid" objects all the time. Bags, steam,
           | shadows, small animals, etc. We also break rules (e.g. drive
           | on the wrong side of the road), and anticipate things we
           | can't even see based on a theory of mind of other agents.
           | Human driving is extremely sophisticated, not reducible to
           | rules that are easily expressed in "simple" language.
        
           | tayo42 wrote:
           | Is that the approach that waymo uses?
        
           | suddenlybananas wrote:
           | That's absolutely not what makes humans great drivers?
        
           | impossiblefork wrote:
           | I don't think that would have worked either.
           | 
           | But if they'd gone for radars and lidars and a bunch of
           | sensors and then enough processing hardware to actually fuse
           | that, then I think they could have built something that had a
           | chance of working.
        
       | AIorNot wrote:
        | Love his analogies and clear-eyed picture
        
         | pyman wrote:
         | "We're not building Iron Man robots. We're building Iron Man
         | suits"
        
           | reducesuffering wrote:
           | [flagged]
        
             | throwawayoldie wrote:
             | I'm old enough to remember when Twitter was new, and for a
             | moment it felt like the old utopian promise of the Internet
             | finally fulfilled: ordinary people would be able to talk,
             | one-on-one and unmediated, with other ordinary people
             | across the world, and in the process we'd find out that
             | we're all more similar than different and mainly want the
             | same things out of life, leading to a new era of peace and
             | empathy.
             | 
             | It was a nice feeling while it lasted.
        
               | _kb wrote:
               | Believe it or not, humans did in fact have forms of
               | written language and communication prior to twitter.
        
               | dang wrote:
               | Can you please make your substantive points without
               | snark? We're trying for something a bit different here.
               | 
               | https://news.ycombinator.com/newsguidelines.html
        
               | throwawayoldie wrote:
               | You missed the point, but that's fine, it happens.
        
               | tock wrote:
               | I believe the opposite happened. People found out that
               | there are huge groups of people with wildly differing
               | views on morality from them and that just encouraged more
               | hate. I genuinely think old school facebook where people
               | only interacted with their own private friend circles is
               | better.
        
               | prisenco wrote:
               | Broadcast networks like Twitter only make sense for
               | influencers, celebrities and people building a brand.
               | They're a net negative for literally anyone else.
               | 
               | | _old school facebook where people only interacted with
               | their own private friend circles is better._
               | 
               | 100% agree but crazy that option doesn't exist anymore.
        
           | pryelluw wrote:
            | Funny thing is that in more than one of the Iron Man movies
            | the suits end up being bad robots. Even the AI Iron Man made
            | shows up to ruin the day in the Avengers movie. So it's a
            | little on the nose that they'd try to pitch it this way.
        
             | wiseowise wrote:
              | That's reading too much into it. It's just an obvious plot
              | twist to justify making another movie, nothing else.
        
       | AdieuToLogic wrote:
       | It's an interesting presentation, no doubt. The analogies
       | eventually fail as analogies usually do.
       | 
       | A recurring theme presented, however, is that LLM's are somehow
       | not controlled by the corporations which expose them as a
       | service. The presenter made certain to identify three interested
       | actors (governments, corporations, "regular people") and how LLM
       | offerings are not controlled by governments. This is a bit
       | disingenuous.
       | 
        | Also, the OS analogy doesn't make sense to me. Perhaps this is
        | because I do not subscribe to LLM's having reasoning capabilities
        | nor being able to reliably provide services an OS-like system can
        | be shown to provide.
       | 
        | A minor critique regarding the analogy equating LLM's to
        | mainframes:
        | 
        |     Mainframes in the 1960's never "ran in the cloud" as it did
        |     not exist.  They still do not "run in the cloud" unless one
        |     includes simulators.
        | 
        |     Terminals in the 1960's - 1980's did not use networks.  They
        |     used dedicated serial cables or dial-up modems to connect
        |     either directly or through stat-mux concentrators.
        | 
        |     "Compute" was not "batched over users."  Mainframes either
        |     had jobs submitted and ran via operators (indirect execution)
        |     or supported multi-user time slicing (such as found in Unix).
        
         | furyofantares wrote:
         | > The presenter made certain to identify three interested
         | actors (governments, corporations, "regular people") and how
         | LLM offerings are not controlled by governments. This is a bit
         | disingenuous.
         | 
         | I don't think that's what he said, he was identifying the first
         | customers and uses.
        
           | AdieuToLogic wrote:
           | >> A recurring theme presented, however, is that LLM's are
           | somehow not controlled by the corporations which expose them
           | as a service. The presenter made certain to identify three
           | interested actors (governments, corporations, "regular
           | people") and how LLM offerings are not controlled by
           | governments. This is a bit disingenuous.
           | 
           | > I don't think that's what he said, he was identifying the
           | first customers and uses.
           | 
           | The portion of the presentation I am referencing starts at or
            | near 12:50[0]. Here is what was said:
            | 
            |     I wrote about this one particular property that strikes
            |     me as very different this time around.  It's that LLM's
            |     like flip they flip the direction of technology diffusion
            |     that is usually present in technology.
            | 
            |     So for example with electricity, cryptography, computing,
            |     flight, internet, GPS, lots of new transformative that
            |     have not been around.
            | 
            |     Typically it is the government and corporations that are
            |     the first users because it's new expensive etc. and it
            |     only later diffuses to consumer.  But I feel like LLM's
            |     are kind of like flipped around.
            | 
            |     So maybe with early computers it was all about ballistics
            |     and military use, but with LLM's it's all about how do
            |     you boil an egg or something like that.  This is
            |     certainly like a lot of my use.  And so it's really
            |     fascinating to me that we have a new magical computer
            |     it's like helping me boil an egg.
            | 
            |     It's not helping the government do something really crazy
            |     like some military ballistics or some special technology.
           | 
           | Note the identification of historic government interest in
           | computing along with a flippant "regular person" scenario in
           | the context of "technology diffusion."
           | 
           | You are right in that the presenter identified "first
           | customers", but this is mentioned in passing when viewed in
           | context. Perhaps I should not have characterized this as "a
           | recurring theme." Instead, a better categorization might be:
           | The presenter minimized the control corporations have by
           | keeping focus on governmental topics and trivial customer
           | use-cases.
           | 
           | 0 - https://youtu.be/LCEmiRjPEtQ?t=770
        
             | furyofantares wrote:
             | Yeah that's explicitly about first customers and first
             | uses, not about who controls it.
             | 
             | I don't see how it minimizes the control corporations have
             | to note this. Especially since he's quite clear about how
             | everything is currently centralized / time share model, and
             | obviously hopeful we can enter an era that's more analogous
             | to the PC era, even explicitly telling the audience maybe
             | some of them will work on making that happen.
        
         | distalx wrote:
         | Hang in there! Your comment makes some really good points about
         | the limits of analogies and the real control corporations have
         | over LLMs.
         | 
         | Plus, your historical corrections were spot on. Sometimes, good
         | criticisms just get lost in the noise online. Don't let it get
         | to you!
        
       | wjohn wrote:
       | The comparison of our current methods of interacting with LLMs
       | (back and forth text) to old-school terminals is pretty
        | interesting. I think there's still a lot of work to be done to
       | optimize how we interact with these models, especially for non-
       | dev consumers.
        
         | informal007 wrote:
          | Audio may be the better option.
        
           | recursive wrote:
           | Based on my experience with voicemail, I'd say that audio is
           | not always best, and is sometimes in the running for worst.
        
       | nodesocket wrote:
       | llms.txt makes a lot of sense, especially for LLMs to interact
       | with http APIs autonomously.
       | 
        | Seems like you could set an LLM loose and, like the Googlebot,
        | have it start converting all HTML pages into llms.txt. Man, the
        | future is crazy.
        
         | nothrabannosir wrote:
         | Couldn't believe my eyes. The www is truly bankrupt. If anyone
         | has a browser plugin which automatically redirects to llms.txt
         | sign me up.
         | 
         | Website too confusing for humans? Add more design, modals,
         | newsletter pop ups, cookie banners, ads, ...
         | 
         | Website too confusing for LLMs? Add an accessible, clean, ad-
         | free, concise, high entropy, plain text summary of your
         | website. Make sure to hide it from the humans!
         | 
         | PS: it should be /.well-known/llms.txt but that feels futile at
         | this point..
         | 
         | PPS: I enjoyed the talk, thanks.
        
           | andrethegiant wrote:
           | > If anyone has a browser plugin which automatically
           | redirects to llms.txt sign me up.
           | 
           | Not a browser plugin, but you can prefix URLs with `pure.md/`
           | to get the pure markdown of that page. It's not quite a 1:1
           | to llms.txt as it doesn't explain the entire domain, but
           | works well for one-off pages. [disclaimer: I'm the
           | maintainer]
        
           | jph00 wrote:
           | The next version of the llms.txt proposal will allow an
           | llms.txt file to be added at any level of a path, which isn't
           | compatible with /.well-known.
           | 
           | (I'm the creator of the llms.txt proposal.)
        
             | nothrabannosir wrote:
             | [flagged]
        
               | dang wrote:
               | " _Please don 't post shallow dismissals, especially of
               | other people's work. A good critical comment teaches us
               | something._"
               | 
               | https://news.ycombinator.com/newsguidelines.html
        
               | nothrabannosir wrote:
               | Fair
        
               | nothrabannosir wrote:
               | PS apologies to jph00. I still believe what I believe but
               | I should have phrased it differently or not at all. Good
               | luck on your endeavors either way.
        
             | achempion wrote:
             | Even with this future approach, it still can live under the
             | `/.well-known`, think of `/.well-known/llm/<mirrored path>`
             | or `/.well-known/llm.json` with key/value mappings.
        
             | andrethegiant wrote:
             | Doesn't this conflict with the original proposal of
             | appending .md to any resource, e.g. /foo/bar.html.md? Or
             | why not tell servers to respond to the Accept header when
             | it's set to text/markdown?
        
           | alightsoul wrote:
           | The web started dying with mobile social media apps, in which
           | hyperlinks are a poor UX choice. Then again with SEO banning
           | outlinks. Now this. The web of interconnected pages that was
           | the World Wide Web is dead. Not on social media? No one sees
           | you. Run a website? more bots than humans. Unless you sell
           | something on the side with the website it's not profitable.
           | Hyperlinking to other websites is dead.
           | 
           | Gen Alpha doesn't know what a web page is and if they do,
           | it's for stuff like neocities aka as a curiosity or art form
           | only. Not as a source of information anymore. I don't blame
            | them. Apps (social media apps) have less friction than web
            | sites but a higher barrier for people to create. We are going
            | back to pre-World Wide Web days in a way, kind of like
            | Bulletin Board Systems on dial-up without hyperlinking, and
            | centralized (social media). Some countries, mostly ones with
            | few technical people like the ones in Central America, have
            | moved away from the web almost entirely and into social media
            | like Instagram.
           | 
            | Due to the death of the web, Google search and friends now
            | rely mostly on matching queries with titles, so just like
            | before the internet you have to know people to learn new
            | stuff, or wait for an algorithm to show it to you or for
            | someone to mention it online, or forcefully enroll in a
            | university. Maybe that's why search results have declined and
            | people search using ChatGPT or maybe Perplexity. Scholarly
            | search engines are a bit better but frankly irrelevant for
            | most people.
           | 
           | Now I understand why Google established their own DNS server
           | at 8.8.8.8. If you have a directory of all domains on DNS,
           | you can still index sites without hyperlinks between them,
           | even if the web dies. They saw it coming.
        
         | practal wrote:
          | If you have different representations of the same thing
          | (llms.txt / HTML), how do you know they are actually equivalent
          | to each other? I am wondering if there are scenarios where
          | webpage publishers would be interested in gaming this.
        
           | andrethegiant wrote:
           | <link rel="alternate" /> is a standards-friendly way to
           | semantically represent the same content in a different format
        
           | jph00 wrote:
           | That's not what llms.txt is. You can just use a regular
           | markdown URL or similar for that.
           | 
           | llms.txt is a description for an LLM of how to find the
           | information on your site needed for an LLM to use your
           | product or service effectively.
        
       | dang wrote:
        | This was my favorite talk at AISUS because it was so full of
        | _concrete_ insights I hadn't heard before and (even better)
        | practical points about what to build _now_, in the immediate
        | future. (To mention just one example: the "autonomy slider".)
       | 
       | If it were up to me, which it very much is not, I would try to
       | optimize the next AISUS for more of this. I felt like I was
       | getting smarter as the talk went on.
        
       | sneak wrote:
       | Can we please stop standardizing on putting things in the root?
       | 
       | /.well-known/ exists for this purpose.
       | 
       | example.com/.well-known/llms.txt
       | 
       | https://en.m.wikipedia.org/wiki/Well-known_URI
        
         | andrethegiant wrote:
         | https://github.com/AnswerDotAI/llms-txt/issues/2
        
         | jph00 wrote:
         | You can't just put things there any time you want - the RFC
         | requires that they go through a registration process.
         | 
         | Having said that, this won't work for llms.txt, since in the
         | next version of the proposal they'll be allowed at any level of
         | the path, not only the root.
        
           | politelemon wrote:
           | > You can't just put things there any time you want - the RFC
           | requires that they go through a registration process.
           | 
           | Actually, I can for two reasons. First is of course the RFC
           | mentions that items can be registered after the fact, if it's
           | found that a particular well-known suffix is being widely
           | used. But the second is a bit more chaotic - website owners
           | are under no obligation to consult a registry, much like port
           | registrations; in many cases they won't even know it exists
           | and may think of it as a place that should reflect their
           | mental model.
           | 
           | It can make things awkward and difficult though, that is
           | true, but that comes with the free text nature of the well-
           | known space. That's made evident in the Github issue linked,
           | a large group of very smart people didn't know that there was
           | a registry for it.
           | 
            | https://github.com/AnswerDotAI/llms-txt/issues/2#issuecommen...
        
             | jph00 wrote:
             | There was no "large group of very smart people" behind
             | llms.txt. It was just me. And I'm very familiar with the
             | registry, and it doesn't work for this particular case IMO
             | (although other folks are welcome to register it if they
             | feel otherwise, of course).
        
           | dncornholio wrote:
           | > You can't just put things there any time you want - the RFC
           | requires that they go through a registration process.
           | 
           | Excuse me???
        
             | jph00 wrote:
             | From the RFC:
             | 
             | """ A well-known URI is a URI [RFC3986] whose path
             | component begins with the characters "/.well-known/", and
             | whose scheme is "HTTP", "HTTPS", or another scheme that has
              | explicitly been specified to use well-known URIs.
             | 
             | Applications that wish to mint new well-known URIs MUST
             | register them, following the procedures in Section 5.1. """
        
           | sneak wrote:
           | I put stuff in /.well-known/ all the time whenever I want.
           | They're my servers.
        
       | mikewarot wrote:
        | A few days ago, I was introduced to the idea that when you're
        | vibe coding, you're consulting a "genie": much like in the
        | fables, you almost never get what you asked for, but if your
        | wishes are small, you might just get what you want.
       | 
        | ThePrimeagen reviewed this article[1] a few days ago, and (I
       | think) that's where I heard about it. (Can't re-watch it now,
       | it's members only) 8(
       | 
        | [1] https://medium.com/@drewwww/the-gambler-and-the-genie-08491d...
        
         | fudged71 wrote:
         | "You are an expert 10x software developer. Make me a billion
         | dollar app." Yeah this checks out
        
         | anythingworks wrote:
          | that's a really good analogy! It feels like a wicked joke that
          | LLMs behave in such a way that they're both intelligent and
          | stupid at the same time.
        
       | fnord77 wrote:
       | Him claiming govts don't use AI or are behind the curve is not
       | accurate.
       | 
       | Modern military drones are very much AI agents
        
       | practal wrote:
       | Great talk, thanks for putting it online so quickly. I liked the
       | idea of making the generation / verification loop go brrr, and
       | one way to do this is to make verification not just a human task,
       | but a machine task, where possible.
       | 
       | Yes, I am talking about formal verification, of course!
       | 
       | That also goes nicely together with "keeping the AI on a tight
       | leash". It seems to clash though with "English is the new
       | programming language". So the question is, can you hide the
       | formal stuff under the hood, just like you can hide a calculator
       | tool for arithmetic? Use informal English on the surface, while
       | some of it is interpreted as a formal expression, put to work,
       | and then reflected back in English? I think that is possible, if
       | you have a formal language and logic that is flexible enough, and
       | close enough to informal English.
       | 
       | Yes, I am talking about abstraction logic [1], of course :-)
       | 
       | So the goal would be to have English (German, ...) as the ONLY
       | programming language, invisibly backed underneath by abstraction
       | logic.
       | 
       | [1] http://abstractionlogic.com
        
         | AdieuToLogic wrote:
         | > So the question is, can you hide the formal stuff under the
         | hood, just like you can hide a calculator tool for arithmetic?
         | Use informal English on the surface, while some of it is
         | interpreted as a formal expression, put to work, and then
         | reflected back in English?
         | 
         | The problem with trying to make "English -> formal language ->
         | (anything else)" work is that informality is, by definition,
         | not a formal specification and therefore subject to ambiguity.
         | The inverse is not nearly as difficult to support.
         | 
         | Much like how a property in an API initially defined as being
         | optional cannot be made mandatory without potentially breaking
         | clients, whereas making a mandatory property optional can be
         | backward compatible. IOW, the cardinality of "0 .. 1" is a
         | strict superset of "1".
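          |
          | A tiny illustration of that asymmetry (hypothetical Pydantic
          | models, not any real API): making a once-optional field
          | mandatory rejects payloads that older clients legitimately
          | send, while the reverse direction keeps accepting them.
          |
          |     from typing import Optional
          |     from pydantic import BaseModel, ValidationError
          |
          |     class UserV1(BaseModel):            # original contract
          |         name: str
          |         email: Optional[str] = None     # cardinality "0 .. 1"
          |
          |     class UserV2(BaseModel):            # email made mandatory: "1"
          |         name: str
          |         email: str
          |
          |     old_payload = {"name": "Ada"}       # valid under V1
          |     UserV1(**old_payload)               # fine
          |     try:
          |         UserV2(**old_payload)
          |     except ValidationError as e:
          |         print("old client broken:", e)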
        
           | practal wrote:
           | > The problem with trying to make "English -> formal language
           | -> (anything else)" work is that informality is, by
           | definition, not a formal specification and therefore subject
           | to ambiguity. The inverse is not nearly as difficult to
           | support.
           | 
           | Both directions are difficult and important. How do you
           | determine when going from formal to informal that you got the
           | right informal statement? If you can judge that, then you can
           | also judge if a formal statement properly represents an
           | informal one, or if there is a problem somewhere. If you
           | detect a discrepancy, tell the user that their English is
           | ambiguous and that they should be more specific.
        
             | amelius wrote:
             | LLMs are pretty good at writing small pieces of code, so I
             | suppose they can very well be used to compose some formal
             | logic statements.
        
         | singularity2001 wrote:
         | lean 4/5 will be a rising star!
        
           | practal wrote:
           | You would definitely think so, Lean is in a great position
           | here!
           | 
           | I am betting though that type theory is not the right logic
           | for this, and that Lean can be leapfrogged.
        
             | gylterud wrote:
             | I think type theory is exactly right for this! Being so
             | similar to programming languages, it can piggy back on the
             | huge amount of training the LLMs have on source code.
             | 
              | I am not sure Lean in particular is the right language;
              | there might be challengers rising (or old incumbents like
              | Agda or Rocq could find a boost). But type theory
              | definitely has the most robust formal systems at the
              | moment.
        
               | practal wrote:
               | > Being so similar to programming languages
               | 
               | I think it is more important to be close to English than
               | to programming languages, because that is the critical
               | part:
               | 
               |  _" As close to a programming language as necessary, as
               | close to English as possible"_
               | 
               | is the goal, in my opinion, without sacrificing
               | constraints such as simplicity.
        
               | gylterud wrote:
               | Why? Why would the language used to express proof of
               | correctness have anything to do with English?
               | 
                | English was not developed to facilitate exact and formal
                | reasoning. In natural language ambiguity is a feature; in
                | formal languages it is unwanted. Just look at maths. The
                | reason for all the symbols is not only brevity but also
                | precision. (I don't think the symbolism of mathematics is
                | something to strive for, though; we can use sensible
                | names in our languages, but the structure will need to be
                | formal and specialised to the domain.)
               | 
               | I think there could be meaningful work done to render the
               | statements of the results automatically into (a
               | restricted subset of) English for ease of human
               | verification that the results proven are actually the
               | results one wanted. I know there has been work in this
               | direction. This might be viable. But I think the actual
               | language of expressing results and proofs would have to
               | be specialised for precision. And there I think type
               | theory has the upper hand.
        
               | practal wrote:
               | My answer is already in my previous comment: if you have
               | two formal languages to choose from, you want the one
               | closer to natural language, because it will be easier to
               | see if informal and formal statements match. Once you are
               | in formal land, you can do transformations to other
               | formal systems as you like, as these can be machine-
               | verified. Does that make sense?
        
               | skydhash wrote:
                | Not really. You want the one more aligned to the domain.
                | Think music notation. Languages have evolved more to
                | match abstractions that help with software engineering
                | principles than to help with layman understanding. (Take
                | SQL and the relational model: they have more in common
                | with each other than the former has with natural
                | languages.)
        
               | polivier wrote:
               | > if you have two formal languages to choose from, you
               | want the one closer to natural language
               | 
               | Given the choice I'd rather use Python than COBOL even
               | though COBOL is closer to English than Python.
        
             | voidhorse wrote:
             | Why? By the completeness theorem, shouldn't first order
             | logic already be sufficient?
             | 
             | The calculus of constructions and other approaches are
             | already available and proven. I'm not sure why we'd need a
             | special logic for LLMs unless said logic somehow accounts
             | for their inherently stochastic tendencies.
        
               | practal wrote:
               | If first-order logic is already sufficient, why are most
               | mature systems using a type theory? Because type theory
               | is more ergonomic and practical than first-order logic. I
               | just don't think that type theory is ergonomic and
               | practical _enough_. That is not a special judgement with
               | respect to LLMs, I want a better logic for myself as
               | well. This has nothing to do with  "stochastic
               | tendencies". If it is easier to use for humans, it will
               | be easier for LLMs as well.
        
               | tylerhou wrote:
               | Completeness for FOL specifically says that semantic
               | implications (in the language of FOL) have syntactic
               | proofs. There are many concepts that are inexpressible in
               | FOL (for example, the class of all graphs which contain a
               | cycle).
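                |
                | For reference, the completeness theorem for first-order
                | logic says that semantic consequence and provability
                | coincide:
                |
                |     \Gamma \models \varphi \iff \Gamma \vdash \varphi
                |
                | i.e. every semantic implication has a syntactic proof. It
                | says nothing about which properties (like the cycle
                | example above) are expressible in FOL in the first place.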
        
         | kordlessagain wrote:
         | This thread perfectly captures what Karpathy was getting at.
         | We're witnessing a fundamental shift where the interface to
         | computing is changing from formal syntax to natural language.
         | But you can see people struggling to let go of the formal
         | foundations they've built their careers on.
        
           | skydhash wrote:
            | Not really. There's a problem to be solved, and the solution
            | is always best expressed in formal notation, because we can
            | then let computers do it and not worry about it.
            | 
            | We already have natural languages for human systems, and the
            | only way they work is because of shared metaphors and
            | punishments and rewards. Everyone is incentivized to do a
            | good job.
        
           | neuronic wrote:
           | It's called gatekeeping and the gatekeepers will be the ones
           | left in the dust. This has been proven time and time again.
           | Better learn to go with the flow - judging LLMs on linear
           | improvements or even worse on today's performance is a fool's
           | errand.
           | 
           | Even if improvements level off and start plateauing, things
           | will still get better and for careful guided, educated use
           | LLMs have already become a great accelerator in many ways.
           | StackOverflow is basically dead now which in itself is a
           | fundamental shift from just 3-4 years ago.
        
           | norir wrote:
           | Have you thought through the downsides of letting go of these
           | formal foundations that have nothing to do with job
           | preservation? This comes across as a rather cynical
           | interpretation of the motivations of those who have concerns.
        
           | mkleczek wrote:
           | This is why I call all this AI stuff BS.
           | 
           | Using a formal language is a feature, not a bug. It is a
           | cornerstone of all human engineering and scientific activity
           | and is the _reason_ why these disciplines are successful.
           | 
           | What you are describing (ie. ditching formal and using
           | natural language) is moving humanity back towards magical
           | thinking, shamanism and witchcraft.
        
             | diggan wrote:
             | > is the _reason_ why these disciplines
             | 
             | Would you say that ML isn't a successful discipline? ML is
             | basically balancing between "formal language"
             | (papers/algorithms) and "non-deterministic outcomes"
             | (weights/inference) yet it seems useful in a wide range of
             | applications, even if you don't think about LLMs at all.
             | 
             | > towards magical thinking, shamanism and witchcraft.
             | 
             | I kind of feel like if you want to make a point about how
             | something is bullshit, you probably don't want to call it
             | "magical thinking, shamanism and witchcraft" because no
             | matter how good your point is, if you end up basically re-
             | inventing the witch hunt, how is what you say not bullshit,
             | just in the other way?
        
               | mkleczek wrote:
               | > Would you say that ML isn't a successful discipline? ML
               | is basically balancing between "formal language"
               | (papers/algorithms) and "non-deterministic outcomes"
               | (weights/inference) yet it seems useful in a wide range
               | of applications
               | 
               | Usefulness of LLMs has yet to be proven. So far there is
               | more marketing in it than actual, real world results.
                | Especially compared to civil and mechanical engineering,
                | maths, electrical engineering and the plethora of
                | disciplines and methods that bring real-world results.
        
               | diggan wrote:
               | > Usefulness of LLMs has yet to be proven.
               | 
               | What about ML (Machine Learning) as a whole? I kind of
               | wrote ML instead of LLMs just to avoid this specific
               | tangent. Are your feelings about that field the same?
        
               | mkleczek wrote:
               | > What about ML (Machine Learning) as a whole? I kind of
               | wrote ML instead of LLMs just to avoid this specific
               | tangent. Are your feelings about that field the same?
               | 
               | No - I only expressed my thoughts about using natural
               | language for computing.
        
               | lelanthran wrote:
               | > Would you say that ML isn't a successful discipline?
               | 
               | Not yet it isn't; all I am seeing are tools to replace
               | programmers and artists :-/
               | 
               | Where are the tools to take in 400 recipes and spit out
               | all of them in a formal structure? (A poster upthread
               | literally gave up on trying to get an LLM to do this.)
               | Tools that can replace the 90% of office staff who _aren
               | 't_ programmers?
               | 
               | Maybe it's a successful low-code industry right now, it's
               | not really a successful AI industry.
        
               | diggan wrote:
               | > Not yet it isn't; all I am seeing are tools to replace
               | programmers and artists :-/
               | 
               | You're missing a huge part of the ecosystem, ML is so
               | much more than just "generative AI", which seems to be
               | the extent of your experience so far.
               | 
               | Weather predictions, computer vision, speech recognition,
               | medicine research and more are already improved by
               | various machine learning techniques, and already was
               | before the current LLM/generative AI. Wikipedia has a
               | list of ~50 topics where ML is already being used, in
               | production, today ( https://en.wikipedia.org/wiki/Machine
               | _learning#Applications ) if you're feeling curious about
               | exploring the ecosystem more.
        
               | lelanthran wrote:
               | > You're missing a huge part of the ecosystem, ML is so
               | much more than just "generative AI", which seems to be
               | the extent of your experience so far.
               | 
               | I'm not missing anything; I'm saying the current boom is
               | being fueled by claims of "replacing workers", but the
               | only class of AI being funded to do that are LLMs, and
               | the only class of worker that _might_ get replaced are
               | programmers and artists.
               | 
               | Karpathy's video, and this thread, are not about the un-
               | hyped ML stuff that has been employed in various
               | disciplines since 2010 and has not been proposed as a
               | replacement for workers.
        
               | skydhash wrote:
               | ML is basically greedy determinism. If we can't get the
               | correct answer, we settle for one that is most likely
               | wrong but still gives us enough information to make a
               | decision. So the answer itself is not useful, but its
               | nature is.
               | 
               | If we take object detection in computer vision, the
               | detection by itself is not accurate, but it helps with
               | resource management. Instead of expensive continuous
               | monitoring, we now have something cheaper that makes the
               | expensive part discrete.
               | 
               | But something deterministic would always be preferable,
               | because you only need to do verification once.
        
             | jason_oster wrote:
             | > What you are describing (ie. ditching formal and using
             | natural language) is moving humanity back towards magical
             | thinking ...
             | 
             | "Any sufficiently advanced technology is indistinguishable
             | from magic."
        
               | discreteevent wrote:
               | indistinguishable from magic != magic
        
             | bwfan123 wrote:
             | > Using a formal language is a feature, not a bug. It is a
             | cornerstone of all human engineering and scientific
             | activity and is the _reason_ why these disciplines are
             | successful
             | 
             | A similar argument was also made by Dijkstra in this brief
             | essay here [1] - which is timely to this debate of why
             | "english is the new programming language" is not well-
             | founded.
             | 
             | I quote a brief snippet here:
             | 
             | "The virtue of formal texts is that their manipulations, in
             | order to be legitimate, need to satisfy only a few simple
             | rules; they are, when you come to think of it, an amazingly
             | effective tool for ruling out all sorts of nonsense that,
             | when we use our native tongues, are almost impossible to
             | avoid."
             | 
             | [1] https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/E
             | WD667...
        
           | uncircle wrote:
           | > This thread perfectly captures what Karpathy was getting
           | at. We're witnessing a fundamental shift where the interface
           | to computing is changing from formal syntax to natural
           | language.
           | 
           | Yes, telling a subordinate with natural language what you
           | need is called being a product manager. Problem is, the
           | subordinate has encyclopedic knowledge but it's also
           | _extremely_ dumb in many aspects.
           | 
           | I guess this is good for people who got into CS and hate the
           | craft, so they prefer doing management, but in many cases you
           | still need someone on your team with an IQ higher than room
           | temperature to deliver a product. The only "fundamental"
           | shift here is killing the entry-level coder at the big corp
           | tasked with doing menial and boilerplate tasks, when instead
           | you can hire a mechanical replacement from an AI company for
           | a few hundred dollars a month.
        
             | sponnath wrote:
             | I think the only places where the entry-level coder is
             | being killed are corps that never cared about the junior to
             | senior pipeline. Some of them love off-shoring too so I'm
             | not sure much has changed.
        
           | otabdeveloper4 wrote:
           | > We're witnessing a fundamental shift where the interface to
           | computing is changing from formal syntax to natural language.
           | 
           | People have said this _every year_ since the 1950 's.
           | 
           | No, it is not happening. LLMs won't help.
           | 
           | Writing code is easy; it's understanding the problem domain
           | that is hard. LLMs won't help you understand the problem
           | domain in
           | a formal manner. (In fact they might make it even more
           | difficult.)
        
             | simplify wrote:
             | Let's be real, people have said similar things about AI
             | too. It was all fluff, until it wasn't.
        
           | megaman821 wrote:
           | Yep, that's why I never write anything out using mathematical
           | expressions. Natural language only, baby!
        
         | lelanthran wrote:
         | > Use informal English on the surface, while some of it is
         | interpreted as a formal expression, put to work, and then
         | reflected back in English? I think that is possible, if you
         | have a formal language and logic that is flexible enough, and
         | close enough to informal English.
         | 
         | That sounds like a paradox.
         | 
         | Formal verification can prove that constraints are held.
         | English cannot. Mapping between them necessarily requires
         | disambiguation. How would you construct such a disambiguation
         | algorithm which must, by its nature, be deterministic?
        
         | redbell wrote:
         | > "English is the new programming language."
         | 
         | For those who missed it, here's the viral tweet by Karpathy
         | himself: https://x.com/karpathy/status/1617979122625712128
        
           | throwaway314155 wrote:
           | Referenced in the video, of course. Not that everyone should
           | watch a 40-minute video before commenting, but his reaction
           | to the "meme" that vibe coding became, when his tweet was
           | intended as more of a shower thought, is worth checking out.
        
       | hgl wrote:
       | It's fascinating to think about what true GUI for LLM could be
       | like.
       | 
       | It immediately makes me think of an LLM that can generate a
       | customized GUI for the topic at hand, which you can interact
       | with in a non-linear way.
        
         | nbbaier wrote:
         | I love this concept and would love to know where to look for
         | people working on this type of thing!
        
         | dpkirchner wrote:
         | Like a HyperCard application?
        
           | necrodome wrote:
           | We (https://vibes.diy/) are betting on this
        
             | diggan wrote:
             | Border-line off-topic, but since you're flagrantly self-
             | promoting, might as well add some more rule breakage to it.
             | 
             | You know websites/apps that let you enter text/details and
             | then don't show the sign in/up screen until you submit,
             | so you feel like "Oh but I already filled it out, might as
             | well sign up"?
             | 
             | They really suck, big time! It's disingenuous, misleading
             | and wastes people's time. I had no interest in using your
             | thing for real, but thought I'd try it out, potentially
             | leave some feedback, but this bait-and-switch just made the
             | whole thing feel sour and I'll probably try to actively
             | avoid this and anything else I feel is related to it.
        
               | necrodome wrote:
               | Thanks for the benefit of the doubt. I typed that in a
               | hurry, and it didn't come out the way I intended.
               | 
               | We had the idea that there's a class of apps [1] that
               | could really benefit from our tooling - mainly Fireproof,
               | our local-first database, along with embedded LLM calling
               | and image generation support. The app itself is open
               | source, and the hosted version is free.
               | 
               | Initially, there was no login or signup - you could just
               | generate an app right away. We knew that came with risks,
               | but we wanted to explore what a truly frictionless
               | experience could look like. Unfortunately, it didn't take
               | long for our LLM keys to start getting scraped, so the
               | next best step was to implement rate limiting in the
               | hosted version.
               | 
               | [1] https://tools.simonwillison.net/
        
               | diggan wrote:
               | My complaint isn't that you need to protect it with
               | a login/signup, but where in the process you put that
               | login/signup.
               | 
               | Put it before letting people enter text, rather than once
               | they've entered text and pressed the button, and people
               | won't feel misled anymore.
        
         | karpathy wrote:
         | Fun demo of an early idea was posted by Oriol just yesterday :)
         | 
         | https://x.com/OriolVinyalsML/status/1935005985070084197
        
           | aprilthird2021 wrote:
           | This is crazy cool, even if not necessarily the best use case
           | for this idea
        
           | hackernewds wrote:
           | it's impressive, but it seems like a crappier UX, in that
           | none of the patterns can really be memorized.
        
           | suddenlybananas wrote:
           | Having different documents come up every time you go into the
           | documents directory seems hellishly terrible.
        
             | falcor84 wrote:
             | It's a brand of terribleness I've somewhat gotten used to,
             | opening Google Drive every time, when it takes me to the
             | "Suggested" tab. I can't recall a single time when it had
             | the document I care about anywhere close to the top.
             | 
             | There's still nothing that beats the UX of Norton
             | Commander.
        
           | sensanaty wrote:
           | [flagged]
        
             | danielbln wrote:
             | Maybe we can collect all of this salt and operate a Thorium
             | reactor with it, this in turn can then power AI.
        
               | sensanaty wrote:
               | We'll need to boil a few more lakes before we get to that
               | stage I'm afraid, who needs water when you can have your
               | AI hallucinate some for you after all?
        
               | TeMPOraL wrote:
               | Who needs water when all these hot takes come from
               | sources so dense, they're about to collapse into black
               | holes.
        
               | sensanaty wrote:
               | Is me not wanting the UI of my OS to shift with every
               | mouse click a hot take? If me wanting to have the
               | consistent "When I click here, X happens" behavior
               | instead of the "I click here and I'm Feeling Lucky
               | happens" behavior is equal to me being dense, so be it I
               | guess.
        
               | TeMPOraL wrote:
               | No. But you interpreting and evaluating the demo in
               | question as suggesting the things you described -
               | frankly, yes. It takes a deep gravity well to miss a
               | point this clear from this close.
               | 
               | It's a tech demo. It shows you it's _possible_ to do
               | these things live, in real time (and to back Karpathy 's
               | point about tech spread patterns, it's accessible to you
               | and me right now). It's not saying it's a good idea - but
               | there are obvious seeds of good ideas there. For one, it
               | shows you a vision of an OS or software you can trivially
               | extend yourself on the fly. "I wish it did X", bam, it
               | does. And no one says it has to be non-deterministic each
               | time you press some button. It can just fill what's
               | missing and make additions permanent, fully deterministic
               | after creation.
        
             | dang wrote:
             | " _Please don 't fulminate._"
             | 
             | " _Don 't be curmudgeonly. Thoughtful criticism is fine,
             | but please don't be rigidly or generically negative._"
             | 
             | " _Please don 't post shallow dismissals, especially of
             | other people's work. A good critical comment teaches us
             | something._"
             | 
             | " _Please respond to the strongest plausible interpretation
             | of what someone says, not a weaker one that 's easier to
             | criticize._"
             | 
             | https://news.ycombinator.com/newsguidelines.html
        
           | superfrank wrote:
           | On one hand, I'm incredibly impressed by the technology
           | behind that demo. On the other hand, I can't think of many
           | things that would piss me off more than a non-deterministic
           | operating system.
           | 
           | I like my tools to be predictable. Google search trying to
           | predict that I want the image or shopping tag based on my
           | query already drives me crazy. If my entire operating system
           | did that, I'm pretty sure I'd throw my computer out a window.
        
             | iLoveOncall wrote:
             | > incredibly impressed by the technology behind that demo
             | 
             | An LLM generating some HTML?
        
               | superfrank wrote:
               | At a speed that feels completely seamless to navigate
               | through. Yeah, I'm pretty impressed by that.
        
           | spamfilter247 wrote:
           | My takeaway from the demo is less "it's different each time"
           | and more "it can be different for different users and their
           | styles of operating" - a poweruser can now see a
           | different Settings UI than a basic user, and it can be
           | generated realtime based on the persona context of the user.
           | 
           | Example use case (chosen specifically for tech): An IDE UI
           | that starts basic, and exposes functionality over time as the
           | human developer's skills grow.
        
           | superconduct123 wrote:
           | That looks both cool and infuriating
        
           | throwaway314155 wrote:
           | I would bet good money that many of the functions they chose
           | not to drill down into (such as settings -> volume) do
           | nothing at all or cause an error.
           | 
           | It's a frontend generator. It's fast. That's cool. But it's
           | being pitched as a functioning OS generator, and I can't help
           | but think it isn't given the failure rates for those sorts of
           | tasks. Further, the success rates for HTML generation
           | probably _are_ good enough for a Holmes-esque (perhaps too
           | harsh) rugpull (again, too harsh) demo.
           | 
           | A cool glimpse into what the future might look like in any
           | case.
        
         | cjcenizal wrote:
         | My friend Eric Pelz started a company called Malleable to do
         | this very thing: https://www.linkedin.com/posts/epelz_every-
         | piece-of-software...
        
         | jonny_eh wrote:
         | An ever-shifting UI sounds unlearnable, and therefore unusable.
        
           | dang wrote:
           | It wouldn't be unlearnable if it fits the way the user is
           | already thinking.
        
             | guappa wrote:
             | AI is not mind reading.
        
               | NitpickLawyer wrote:
               | A sufficiently advanced prediction engine is
               | indistinguishable from mind reading :D
        
               | dang wrote:
               | Behavioral patterns are not unpredictable. Who knows how
               | far an LLM could get by pattern-matching what a user is
               | doing and generating a UI to make it easier. Since the
               | user could immediately say whether they liked it or not,
               | this could turn into a rapid and creative feedback loop.
        
           | OtherShrezzing wrote:
           | A mixed ever-shifting UI can be excellent though. So you've
           | got some tools which consistently interact with UI
           | components, but the UI itself is altered frequently.
           | 
           | Take for example world-building video games like Cities
           | Skylines / Sim City or procedural sandboxes like Minecraft.
           | There are 20-30 consistent buttons (tools) in the game's UX,
           | while the rest of the game is an unbounded ever-shifting UI.
        
             | skydhash wrote:
             | The rest of the game is very deterministic where its state
             | is controlled by the buttons. The slight variation is
             | caused by the simulation engine and follows consistent
             | patterns (you can't have building on fire if there's no
             | building yet).
        
           | sotix wrote:
           | Like Spotify ugh
        
           | 9rx wrote:
           | Tools like v0 are a primitive example of what the above is
           | talking about. The UI maintains familiar conventions, but is
           | laid out dynamically based on surrounding context. I'm sure
           | there are still weird edge cases, but for the most part
           | people have no trouble figuring out how to use the output of
           | such tools already.
        
         | semi-extrinsic wrote:
         | Humans are shit at interacting with systems in a non-linear
         | way. Just look at Jupyter notebooks and the absolute mess that
         | arises when you execute code blocks in arbitrary order.
        
         | stoisesky wrote:
         | This talk https://www.youtube.com/watch?v=MbWgRuM-7X8 explores
         | the idea of generative / malleable personal user interfaces
         | where LLMs can serve as the gateway to program how we want our
         | UI to be rendered.
        
         | stuartmemo wrote:
         | It's probably Jira. https://medium.com/question-park/all-
         | aboard-the-ai-train-b03...
        
       | bedit wrote:
       | I love the "people spirits" analogy. For casual tasks like
       | vibecoding or boiling an egg, LLM errors aren't a big deal. But
       | for critical work, we need rigorous checks--just like we do with
       | human reasoning. That's the core of empirical science: we expect
       | fallibility, so we verify. A great example is how early migration
       | theories based on pottery were revised with better data like
       | ancient DNA (see David Reich). Letting LLMs judge each other
       | without solid external checks misses the point--leaderboard-style
       | human rankings are often just as flawed.
        
       | nilirl wrote:
       | Where do these analogies break down?
       | 
       | 1. Similar cost structure to electricity, but non-essential
       | utility (currently)?
       | 
       | 2. Like an operating system, but with non-determinism?
       | 
       | 3. Like programming, but ...?
       | 
       | Where does the programming analogy break down?
        
         | rudedogg wrote:
         | > programming
         | 
         | The programming analogy is convenient but off. The joke has
         | always been "the computer only does exactly what you tell it to
         | do!" regarding logic bugs. Prompts and LLMs most certainly do
         | not work like that.
         | 
         | I loved the parallels with modern LLMs and time sharing he
         | presented though.
        
           | diggan wrote:
           | > Prompts and LLMs most certainly do not work like that.
           | 
           | It quite literally works like that. The computer is now OS +
           | user-land + LLM runner + ML architecture + weights + system
           | prompt + user prompt.
           | 
           | Taken together, and since you're adding in probabilities (by
           | using ML/LLMs), you're quite literally getting "the computer
           | only does exactly what you tell it to do!", it's just that we
           | have added "but make slight variations to what tokens you
           | select next" (temperature>0.0) sometimes, but it's still the
           | same thing.
           | 
           | Just like when you tell the computer to create encrypted
           | content by using some seed. You're getting exactly what you
           | asked for.
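           | 
           | As a minimal sketch of that "slight variation" knob (a toy
           | softmax sampler in Python, purely illustrative, not any
           | vendor's implementation):
           | 
           |   import math, random
           |   
           |   def sample(logits, temperature=1.0):
           |       # temperature == 0 -> always pick the argmax token
           |       if temperature == 0.0:
           |           return max(logits, key=logits.get)
           |       # otherwise soften/sharpen the distribution, then draw
           |       scaled = {t: v / temperature
           |                 for t, v in logits.items()}
           |       z = sum(math.exp(v) for v in scaled.values())
           |       r, acc = random.random(), 0.0
           |       for tok, v in scaled.items():
           |           acc += math.exp(v) / z
           |           if r <= acc:
           |               return tok
           |       return tok
           |   
           |   logits = {"Paris": 9.0, "Lyon": 5.0, "banana": 0.1}
           |   print(sample(logits, temperature=0.0))  # deterministic
           |   print(sample(logits, temperature=0.8))  # usually "Paris"
           | 
           | Same program, same input: at temperature 0 it is fully
           | deterministic, and the sampling step is the only place the
           | variation enters.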
        
         | politelemon wrote:
         | only in English, and also non-deterministic.
        
           | malux85 wrote:
           | Yeah, wherever possible I try to have the llm answer me in
           | Python rather than English (especially when explaining new
           | concepts)
           | 
           | English is soooooo ambiguous
        
             | falcor84 wrote:
             | For what it's worth, I've been using it to help me learn
             | math, and I added to my rules an instruction that it should
             | always give me an example in Python (preferably sympy)
             | whenever possible.
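             | 
             | For instance, for "differentiate x*sin(x)" the kind of
             | answer I'm after looks roughly like this (an illustrative
             | sketch, not an actual model response):
             | 
             |   import sympy as sp
             |   
             |   x = sp.symbols('x')
             |   expr = x * sp.sin(x)
             |   print(sp.diff(expr, x))       # x*cos(x) + sin(x)
             |   print(sp.integrate(expr, x))  # -x*cos(x) + sin(x)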
        
         | PeterStuer wrote:
         | Define non-essential.
         | 
         | The way I see dependency in office ("knowledge") work:
         | 
         | - pre-(computing) history. We are at the office, we work
         | 
         | - dawn of the pc: my computer is down, work halts
         | 
         | - dawn of the lan: the network is down, work halts
         | 
         | - dawn of the Internet: the Internet connection is down, work
         | halts (<- we are basically all here)
         | 
         | - dawn of the LLM: ChatGPT is down, work halts (<- for many, we
         | are here already)
        
           | nilirl wrote:
           | I see your point. It's nearing essential.
        
       | sothatsit wrote:
       | I find Karpathy's focus on tightening the feedback loop between
       | LLMs and humans interesting, because I've found I am the happiest
       | when I extend the loop instead.
       | 
       | When I have tried to "pair program" with an LLM, I have found it
       | incredibly tedious, and not that useful. The insights it gives me
       | are not that great if I'm optimising for response speed, and it
       | just frustrates me rather than letting me go faster. Worse, often
       | my brain just turns off while waiting for the LLM to respond.
       | 
       | OTOH, when I work in a more async fashion, it feels freeing to
       | just pass a problem to the AI. Then, I can stop thinking about it
       | and work on something else. Later, I can come back to find the AI
       | results, and I can proceed to adjust the prompt and re-generate,
       | to slightly modify what the LLM produced, or sometimes to just
       | accept its changes verbatim. I really like this process.
        
         | geeunits wrote:
         | I would venture that 'tightening the feedback loop' isn't
         | necessarily 'increasing the number of back and forth prompts'-
         | and what you're saying you want is ultimately his argument.
         | i.e. if integral enough it can almost guess what you're going
         | to say next...
        
           | sothatsit wrote:
           | I specifically do not want AI as an auto-correct, doing auto-
           | predictions while I am typing. I find this interrupts my
           | thinking process, and I've never been bottlenecked by typing
           | speed anyway.
           | 
           | I want AI as a "co-worker" providing an alternative
           | perspective or implementing my specific instructions, and
           | potentially filling in gaps I didn't think about in my
           | prompt.
        
         | jwblackwell wrote:
         | Yeah I am currently enjoying giving the LLM relatively small
         | chunks of code to write and then asking it to write
         | accompanying tests. While I focus on testing the product
         | myself. I then don't even bother to read the code it's written
         | most of the time
        
       | dmitrijbelikov wrote:
       | I think that Andrej presents "Software 3.0" as a revolution, but
       | in essence it is a natural evolution of abstractions.
       | 
       | Abstractions don't eliminate the need to understand the
       | underlying layers - they just hide them until something goes
       | wrong.
       | 
       | Software 3.0 is a step forward in convenience. But it is not a
       | replacement for developers with a foundation, but a tool for
       | acceleration, amplification and scaling.
       | 
       | If you know what is under the hood -- you are irreplaceable. If
       | you do not know -- you become dependent on a tool that you do not
       | always understand.
        
         | poorcedural wrote:
         | Foundational programmers form the base of where the seed can
         | grow.
         | 
         | In a way programmers found where our roots grow, they can not
         | find your limits.
         | 
         | Software 3.0 is a step into a different light, where software
         | finds its own limits.
         | 
         | If we know where they are rooted, we will merge their best
         | attempts. Only because we appreciate their resultant behavior.
        
       | ast0708 wrote:
       | Should we not treat LLMs more as a UX feature to interact with a
       | domain specific model (highly contextual), rather than expecting
       | LLMs to provide the intelligence needed for software to act as
       | a partner to humans?
        
         | guappa wrote:
         | He's selling something.
        
           | rvz wrote:
           | Someone is thinking.
        
       | alightsoul wrote:
       | why does vibe coding still involve any code at all? why can't an
       | AI directly control the registers of a computer processor and
       | graphics card, controlling a computer directly? why can't it draw
       | on the screen directly, connected directly to the rows and
       | columns of an LCD screen? what if an AI agent was implemented in
       | hardware, with a processor for AI, a normal computer processor
       | for logic, and a processor that correlates UI elements to touches
       | on the screen? and a network card, some RAM for temporary stuff
       | like UI elements and some persistent storage for vectors that
       | represent UI elements and past conversations.
        
         | flumpcakes wrote:
         | I'm not sure this makes sense as a question. Registers are
         | 'controlled' by running code for a given state. An AI can write
         | code that changes registers, as all code does in operation. An
         | AI can't directly 'control registers' in any other way, just as
         | you or I can't.
        
           | singularity2001 wrote:
           | what he means is why are the tokens not directly machine code
           | tokens
        
             | flumpcakes wrote:
             | What is meant by a 'machine code token'? Ultimately a
             | processor needs assembly code as input to do anything.
             | Registers are set by assembly. Data is read by assembly.
             | Hardware is managed through assembly (for example by
             | setting bits in memory). Either I have a complete
             | misunderstanding on what this thread is talking about, or
             | others are commenting with some fundamental assumptions
             | that aren't correct.
        
           | alightsoul wrote:
           | I would like to make an AI agent that directly interfaces
           | with a processor by setting bits in a processor register,
           | thus eliminating the need for even assembly code or any kind
           | of code. The only software you would ever need would be the
           | AI.
        
             | shakna wrote:
             | That's called a JIT compiler. And ignoring how bad an idea
             | blending those two... It wouldn't be that difficult a task.
             | 
             | The hardest parts of a jit is the safety aspect. And AI
             | already violates most of that.
        
               | alightsoul wrote:
               | The safety part will probably be either solved or a non-
               | issue or ignored. Similarly to how GPT3 was often seen as
               | dangerous before ChatGPT was released. Some people who
               | have only ever vibe coded are finding jobs today,
               | ignoring safety entirely and lacking a notion of it or
               | what it means. They just copy paste output from ChatGPT
               | or an agentic IDE. To me it's JIT already with extra
               | steps. Or they have pivoted their software engineers to
               | vibe coding most of the time and don't even touch code
               | anymore doing JIT with extra steps again.
        
               | shakna wrote:
               | As "jit" to you means running code, and not "building and
               | executing machine code", maybe you could vibe code this.
               | And enjoy the segfaults.
        
               | guappa wrote:
               | In a way he's making sense. If the "code" is the prompt,
               | the output of the llm is an intermediate artifact, like
               | the intermediate steps of gcc.
               | 
               | So why should we still need gcc?
               | 
               | The answer is of course, that we need it because llm's
               | output is shit 90% of the time and debugging assembly or
               | binary directly is even harder, so putting aside the
               | difficulties of training the model, the output would be
               | unusable.
        
               | shakna wrote:
               | Probably too much snark from me. But the gulf between
               | interpreter and compiler can be decades of work, often
               | discovering new mathematical principles along the way.
               | 
               | The idea that you're fine to risk everything, in the way
               | agentic things allow [0], and _want_ that messing around
               | with raw memory is... A return to DOS ' crashes, but with
               | HAL along for the ride.
               | 
               | [0] https://msrc.microsoft.com/update-
               | guide/vulnerability/CVE-20...
        
               | guappa wrote:
               | Ah don't worry, llms are a return to crashes as it is :)
               | 
               | The other day it managed to produce code that made python
               | segfault.
        
               | flumpcakes wrote:
               | It's not a JIT. A JIT produces assembly. You can't "set
               | registers" or do anything useful without assembly code
               | running on the processor.
        
             | flumpcakes wrote:
             | This makes no sense at all. You can't set registers without
             | assembly code. If you could set registers without assembly
             | code then it would be pointless as the registers wouldn't
             | be 'running' against anything.
        
         | birn559 wrote:
         | Because any precise description of what the computer is
         | supposed to do is already code as we know it. AI can fill in
         | the gaps between natural language and programming by guessing
         | and because you don't always care about the "how" only about
         | the "what". The more you care about the "how" you have to
         | become more precise in your language to reduce the guess work
         | of the AI to the point that your input to the AI is already
         | code.
         | 
         | The question is: how much do we really care about the "how",
         | even when we think we care about it? Modern programming
         | language don't do guessing work, but they already abstract away
         | quite a lot of the "how".
         | 
         | I believe that's the original argument in favor of coding in
         | assembler and that it will stay relevant.
         | 
         | Following this argument, what AI is really missing is
         | determinism, to a large extent. I can't just save the input I
         | have given to an AI and be sure that it will produce the exact
         | same output a year from now.
        
           | alightsoul wrote:
           | With vibe coding, I am under the impression that the only
           | thing that matters for vibe coders is whether the output is
           | good enough in the moment to fulfill a desire. For companies
           | going AI-first, that seems to be how it's done. I see people
           | in other places too, and those people have lost interest in
           | the "how".
        
         | therein wrote:
         | All you need is a framebuffer and AI.
        
         | abhaynayar wrote:
         | Nice try, AI.
        
       | belter wrote:
       | Painful to watch. The new tech generation deserves better than
       | hyped presentations from tech evangelists.
       | 
       | This reminds me of the Three Amigos and Grady Booch evangelizing
       | the future of software while ignoring the terrible output from
       | Rational Software and the Unified Process.
       | 
       | At least we got acknowledgment that self-driving remains
       | unsolved: https://youtu.be/LCEmiRjPEtQ?t=1622
       | 
       | And Waymo still requires extensive human intervention. Given
       | Tesla's robotaxi timeline, this should crash their stock
       | valuation...but likely won't.
       | 
       | You can't discuss "vibe coding" without addressing security
       | implications of the produced artifacts, or the fact that you're
       | building on potentially stolen code, books, and copyrighted
       | training data.
       | 
       | And what exactly is Software 3.0? It was mentioned early then
       | lost in discussions about making content "easier for agents."
        
         | digianarchist wrote:
         | In his defense he clearly articulated that meaningful change
         | has not yet been achieved and could be a decade away. Even
         | pointing to specific examples of LLMs failing to count letters
         | and do basic arithmetic.
         | 
         | What I find absent is where do we go from LLMs? More hardware,
         | more training. "This isn't the scientific breakthrough you're
         | looking for".
        
       | nottorp wrote:
       | In the era of AI and illiteracy...
        
       | abdullin wrote:
       | Tight feedback loops are the key to working productively with
       | software. I see that in codebases up to 700k lines of code
       | (legacy 30yo 4GL ERP systems).
       | 
       | The best part is that AI-driven systems are fine with running
       | even tighter loops than a sane human would tolerate.
       | 
       | Eg. running full linting, testing and E2E/simulation suite after
       | any minor change. Or generating 4 versions of PR for the same
       | task so that the human could just pick the best one.
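       | 
       | A rough sketch of that "pick the best of N" loop, assuming a
       | Python repo where ruff and pytest are the lint/test commands
       | and each candidate lives on its own branch (all names here are
       | made up for illustration):
       | 
       |   import subprocess
       |   
       |   def checks_pass(branch):
       |       # check out the candidate, then run lint + tests
       |       subprocess.run(["git", "checkout", branch], check=True)
       |       for cmd in (["ruff", "check", "."], ["pytest", "-q"]):
       |           if subprocess.run(cmd).returncode != 0:
       |               return False
       |       return True
       |   
       |   candidates = [f"ai/pr-candidate-{i}" for i in range(1, 5)]
       |   passing = [b for b in candidates if checks_pass(b)]
       |   print("review these by hand:", passing)
       | 
       | The automated part only filters out candidates that fail the
       | suite; the human still reviews and picks among the survivors.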
        
         | OvbiousError wrote:
         | I don't think the human is the problem here, but the time it
         | takes to run the full testing suite.
        
           | Byamarro wrote:
           | I work in web dev, so people sometimes hook code formatting
           | as a git commit hook or sometimes even upon file save. The
           | tests are problematic though. If you work on a huge project
           | it's a no-go at all. If you work on a medium one, the tests
           | are long enough to block you, but short enough that you can't
           | focus on anything else in the meantime.
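           | 
           | The formatting part is the easy half. A bare-bones version
           | of such a hook, assuming a Python project formatted with
           | black (drop it into .git/hooks/pre-commit and make it
           | executable; details will vary per team):
           | 
           |   #!/usr/bin/env python3
           |   # format staged Python files, then re-stage them
           |   import subprocess, sys
           |   
           |   files = subprocess.run(
           |       ["git", "diff", "--cached", "--name-only",
           |        "--diff-filter=ACM"],
           |       capture_output=True, text=True, check=True,
           |   ).stdout.split()
           |   py_files = [f for f in files if f.endswith(".py")]
           |   if py_files:
           |       subprocess.run(["black", *py_files], check=True)
           |       subprocess.run(["git", "add", *py_files], check=True)
           |   sys.exit(0)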
        
           | diggan wrote:
           | It is kind of a human problem too; the fact that the full
           | testing suite takes X hours to run is not fun either, and it
           | makes the human problem larger.
           | 
           | Say you're Human A, working on a feature. Running the full
           | testing suite takes 2 hours from start to finish. Every
           | change you do to existing code needs to be confirmed to not
           | break existing stuff with the full testing suite, so for some
           | changes it takes 2 hours before you know with 100% certainty
           | that they don't break other things. How quickly do you lose
           | interest, and at what point do you give up to either improve
           | the testing suite, or just skip that feature/implement it
           | some other way?
           | 
           | Now say you're Robot A working on the same task. The robot
           | doesn't care if each change takes 2 hours to appear on their
           | screen, the context is exactly the same, and they're still "a
           | helpful assistant" 48 hours later when they still try to get
           | the feature put together without breaking anything.
           | 
           | If you're feeling brave, you start Robot B and C at the same
           | time.
        
             | abdullin wrote:
             | This is the workflow that ChatGPT Codex demonstrates
             | nicely. Launch any number of <<robotic>> tasks in parallel,
             | then go on your own. Come back later to review the results
             | and pick good ones.
        
               | diggan wrote:
               | Well, they're demonstrating it _somewhat_, it's more of
               | a prototype today. First tell is the low limit; I think
               | the longest task for me has been 15 minutes before it
               | gives up. Second tell is still using a chat UI, which is
               | simple to implement and familiar, but also kind of lazy.
               | There should be a better UX, especially
               | with the new variations they just added. From the top of
               | my head, some graph-like UX might have been better.
        
               | abdullin wrote:
               | I guess, it depends on the case and the approach.
               | 
               | It works really nice with the following approach
               | (distilled from experiences reported by multiple
               | companies)
               | 
               | (1) Augment codebase with explanatory texts that describe
               | individual modules, interfaces and interactions
               | (something that is needed for the humans anyway)
               | 
               | (2) Provide Agent.MD that describes the
               | approach/style/process that the AI agent must take. It
               | should also describe how to run all tests (a minimal
               | sketch follows after this list).
               | 
               | (3) Break down the task into smaller features. For each
               | feature - ask first to write a detailed implementation
               | plan (because it is easier to review the plan than 1000
               | lines of changes spread across a dozen files).
               | 
               | (4) Review the plan and ask to improve it, if needed.
               | When ready - ask to draft an actual pull request
               | 
               | (5) The system will automatically use all available
               | tests/linting/rules before writing the final PR. Verify
               | and provide feedback, if some polish is needed.
               | 
               | (6) Launch multiple instances of "write me an
               | implementation plan" and "Implement this plan" task, to
               | pick the one that looks the best.
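               | 
               | A minimal sketch of the Agent.MD from step (2),
               | assuming a Python project checked with ruff and
               | pytest (contents are illustrative only):
               | 
               |   # Agent.MD
               |   
               |   ## Style
               |   - Python 3.11, type hints, black formatting
               |   - Small, focused commits; no drive-by refactors
               |   
               |   ## Process
               |   - Write an implementation plan before any code
               |   - Stay within the module you were asked to touch
               |   
               |   ## How to run all checks
               |   - Lint:  ruff check .
               |   - Tests: pytest -q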
               | 
               | This is very similar to git-driven development of large
               | codebases by distributed teams.
               | 
               | Edit: added newlines
        
               | diggan wrote:
               | > distilled from experiences reported by multiple
               | companies
               | 
               | Distilled from my experience, I'd still say that the UX
               | is lacking, as sequential chat just isn't the right
               | format. I agree with Karpathy that we haven't found the
               | right way of interacting with these OSes yet.
               | 
               | Even with what you say, variations were implemented in a
               | rush. Once you've iterated with one variation you can not
               | at the same time iterate on another variant, for example.
        
             | TeMPOraL wrote:
             | Worked in such a codebase for about 5 years.
             | 
             | No one really cares about improving test times. Everyone
             | either suffers in private or gets convinced it's all normal
             | and look at you weird when you suggest something needs to
             | be done.
        
               | diggan wrote:
               | There are a few of us around, but it's not a lot, agree.
               | It
               | really is an uphill battle trying to get development
               | teams to design and implement test suites the same way
               | they do with other "more important" code.
        
           | londons_explore wrote:
           | The full test suite is probably tens of thousands of tests.
           | 
           | But AI will do a pretty decent job of telling you which tests
           | are most likely to fail on a given PR. Just run those ones,
           | then commit. Cuts your test time from hours down to seconds.
           | 
           | Then run the full test suite only periodically and
           | automatically bisect to find out the cause of any
           | regressions.
           | 
           | Dramatically cuts the compute costs of tests too, which in
           | big codebase can easily become whole-engineers worth of
           | costs.
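           | 
           | One shape that selection step could take, with the "ask
           | the model" part stubbed out by a crude heuristic (the
           | function and the heuristic are placeholders, not an
           | existing tool):
           | 
           |   import subprocess
           |   
           |   def pick_risky_tests(diff, all_tests):
           |       # stand-in for the LLM step: keep tests whose
           |       # file is touched by the diff, else run everything
           |       changed = {l[6:] for l in diff.splitlines()
           |                  if l.startswith("+++ b/")}
           |       risky = [t for t in all_tests
           |                if t.split("::")[0] in changed]
           |       return risky or all_tests
           |   
           |   diff = subprocess.run(
           |       ["git", "diff", "main...HEAD"],
           |       capture_output=True, text=True, check=True,
           |   ).stdout
           |   collected = subprocess.run(
           |       ["pytest", "--collect-only", "-q"],
           |       capture_output=True, text=True,
           |   ).stdout.splitlines()
           |   all_tests = [l for l in collected if "::" in l]
           |   
           |   risky = pick_risky_tests(diff, all_tests)
           |   subprocess.run(["pytest", "-q", *risky])
           | 
           | The periodic full run and git bisect stay as the safety net
           | for anything the heuristic (or the model) misses.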
        
             | tele_ski wrote:
             | It's an interesting idea, but reactive, and could cause big
             | delays due to bisecting and testing on those regressions.
             | There's the 'old' saying that the sooner the bug is found
             | the cheaper it is to fix, seems weird to intentionally push
             | finding side-effect bugs later in the process because of
             | faster CI runs. Maybe AI will get there, but it seems too
             | aggressive right now to me. But yeah, put the automation
             | slider where you're comfortable.
        
           | tlb wrote:
           | Yes, and (some near-future) AI is also more patient and
           | better at multitasking than a reasonable human. It can make a
           | change, submit for full fuzzing, and if there's a problem it
           | can continue with the saved context it had when making the
           | change. It can work on 100s of such changes in parallel,
           | while a human trying to do this would mix up the reasons for
           | the change with all the other changes they'd done by the time
           | the fuzzing result came back.
           | 
           | LLMs are worse at many things than human programmers, so you
           | have to try to compensate by leveraging the things they're
           | better at. Don't give up with "they're bad at such and such"
           | until you've tried using their strengths.
        
             | HappMacDonald wrote:
             | You can't run N bots in parallel with testing between each
             | attempt unless you're also running N tests in parallel.
             | 
             | If you could run N tests in parallel, then you could
             | probably also run the components of one test in parallel
             | and keep it from taking 2 hours in the first place.
             | 
             | To me this all sounds like snake oil to convince people to
             | do something they were already doing, but by also spinning
             | up N times as many compute instances and burning endless
             | tokens along the way. And by the time it's demonstrated
             | that it _doesn 't_ really offer anything more than doing it
             | yourself, well you've already given them all of your money
             | so their job is done.
        
               | abdullin wrote:
               | Running tests is already an engineering problem.
               | 
               | In one of the systems (supply chain SaaS) we invested so
               | much effort in having good tests in a simulated
               | environment, that we could run full-stack tests at kHz.
               | Roughly ~5k tests per second or so on a laptop.
        
           | abdullin wrote:
           | Humans tend to lack inhumane patience.
        
           | 9rx wrote:
           | Unless you are doing something crazy like letting the fuzzer
           | run on every change (cache that shit), the full test suite
           | taking a long time suggests that either your isolation points
           | are _way_ too large or you are letting the LLM cross isolated
           | boundaries and  "full testing suite" here actually means
           | "multiple full testing suites". The latter is an easy fix:
           | Don't let it. Force it stay within a single isolation zone
           | just like you'd expect of a human. The former is a lot harder
           | to fix, but I suppose ending up there is a strong indicator
           | that you can't trust the human picking the best LLM result in
           | the first place and that maybe this whole thing isn't a good
           | idea for the people in your organization.
        
         | yahoozoo wrote:
         | The problem is that every time you run your full automation
         | with linting and tests, you're filling up the context window
         | more and more. I don't know how people using Claude do it with
         | its <300k context window. I get the "your message will exceed
         | the length of this chat" message so many times.
        
           | diggan wrote:
           | I don't know exactly how Claude works, but the way I work
           | around this with my own stuff is prompting it to not display
           | full outputs ever, and instead temporarily redirect the
           | output somewhere, then grep the log file for what it's
           | looking for.
           | So a test run outputting 10K lines of test output and one
           | failure is easily found without polluting the context with
           | 10K lines.
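           | 
           | Concretely, the shape of what I ask for looks something
           | like this (assuming pytest; the "FAILED" marker is just
           | pytest's default summary wording):
           | 
           |   import pathlib, subprocess
           |   
           |   log = pathlib.Path("/tmp/test-run.log")
           |   result = subprocess.run(
           |       ["pytest", "-q"], capture_output=True, text=True
           |   )
           |   log.write_text(result.stdout + result.stderr)
           |   
           |   # surface only the interesting lines, not all 10K
           |   interesting = [l for l in log.read_text().splitlines()
           |                  if "FAILED" in l or "Error" in l]
           |   print("\n".join(interesting) or "all green")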
        
           | the_mitsuhiko wrote:
           | I started to use sub agents for that. That does not pollute
           | the context as much
        
           | abdullin wrote:
           | Claude's approach is currently a bit dated.
           | 
           | Cursor.sh agents or especially OpenAI Codex illustrate that a
           | tool doesn't need to keep on stuffing the context window with
           | irrelevant information in order to make progress on a task.
           | 
           | And if really needed, engineers report that Gemini Pro 2.5
           | keeps on working fine within 200k-500k token context. Above
           | that - it is better to reset the context.
        
         | latexr wrote:
         | > Or generating 4 versions of PR for the same task so that the
         | human could just pick the best one.
         | 
         | That sounds awful. A truly terrible and demotivating way to
         | work and produce anything of real quality. Why are we doing
         | this to ourselves and embracing it?
         | 
         | A few years ago, it would have been seen as a joke to say "the
         | future of software development will be to have a million monkey
         | interns banging on one million keyboards and submit a million
         | PRs, then choose one". Today, it's lauded as a brilliant
         | business and cost-saving idea.
         | 
         | We're beyond doomed. The first major catastrophe caused by
         | sloppy AI code can't come soon enough. The sooner it happens,
         | the better chance we have to self-correct.
        
           | bonoboTP wrote:
           | If it's monkeylike quality and you need a million tries, it's
           | shit. If you need four tries and one of those is top-tier
           | professional programmer quality, then it's good.
        
             | agos wrote:
             | if the thing producing the four PRs can't distinguish the
             | top tier one, I have strong doubts that it can even produce
             | it
        
               | solaire_oa wrote:
               | Making 4 PRs for a well-known solution sounds insane,
               | yes, but to be the devil's advocate, you could plausibly
               | be working with an ambiguous task: "Create 4 PRs with 4
               | different dependency libraries, so that I can compare
               | their implementations." Technically it wouldn't need to
               | pick the best one.
               | 
               | I have apprehension about the future of software
               | engineering, but comparison does technically seem like a
               | valid use case.
        
             | layer8 wrote:
             | The problem is, for any change, you have to understand the
             | existing code base to assess the quality of the change in
             | the four tries. This means, you aren't relieved from being
             | familiar with the code and reviewing everything. For many
             | developers this review-only work style isn't an exciting
             | prospect.
             | 
             | And it will remain that way until you can delegate
             | development tasks to AI with a 99+% success rate so that
             | you don't have to review their output and understand the
             | code base anymore. At which point developers will become
             | truly obsolete.
        
             | solaire_oa wrote:
             | Top-tier professional programmer quality is exceedingly,
             | impractically optimistic, for a few reasons.
             | 
             | 1. There's a low probability of that in the first place.
             | 
             | 2. You need to be a top-tier professional programmer to
             | recognize that type of quality (i.e. a junior engineer
             | could select one of the 3 shit PRs)
             | 
             | 3. When it doesn't produce TTPPQ, you wasted tons of time
             | prompting and reviewing shit code and still need to
             | deliver, net negative.
             | 
             | I'm not doubting the utility of LLMs but the scattershot
             | approach just feels like gambling to me.
        
               | zelphirkalt wrote:
               | Also as a consequence of (1) the LLMs are trained on
               | mediocre code mostly, so they often output mediocre or
               | bad solutions.
        
           | diggan wrote:
           | > A truly terrible and demotivating way to work and produce
           | anything of real quality
           | 
           | You clearly have strong feelings about it, which is fine, but
           | it would be much more interesting to know exactly why it
           | would terrible and demotivating, and why it cannot produce
           | anything of quality? And what is "real quality" and does that
           | mean "fake quality" exists?
           | 
           | > million monkey interns banging on one million keyboards and
           | submit a million PRs
           | 
           | I'm not sure if you misunderstand LLMs, or the famous
           | "monkeys writing Shakespeare" part, but that example is more
           | about randomness and infinity than about probabilistic
           | machines somewhat working towards a goal with some non-
           | determinism.
           | 
           | > We're beyond doomed
           | 
           | The good news is that we've been doomed for a long time, yet
           | we persist. If you take a look at how the internet is
           | basically held up by duct-tape at this point, I think you'd
           | feel slightly more comfortable with how crap absolutely
           | everything is. Like 1% of software is actually Good Software
           | while the rest barely works on a good day.
        
             | 3dsnano wrote:
             | > And what is "real quality" and does that mean "fake
             | quality" exists?
             | 
             | I think there is no real quality or fake quality, just
             | quality. I am referencing the quality that Pirsig and C.
             | Alexander have written about.
             | 
             | It's... qualitative, so it's hard to measure but easy to
             | feel. Humans are really good at perceiving it then making
             | objective decisions. LLMs don't know what it is (they've
             | heard about it and think they know).
        
               | abdullin wrote:
               | It is actually funny that current AI+Coding tools benefit
               | a lot from domain context and other information along the
               | lines of Domain-Driven Design (which was inspired by the
               | pattern language of C. Alexander).
               | 
               | A few teams have started incorporating `CONTEXT.MD` into
               | module descriptions to leverage this.
        
               | diggan wrote:
               | > LLMs don't know what it is
               | 
               | Of course they don't, they're probability/prediction
               | machines, they don't "know" anything, not even that Paris
               | is the capital of France. What they do "know" is that
               | once someone writes "The capital of France is", the most
               | likely tokens to come after that, is "Paris". But they
               | don't understand the concept, nor anything else, just
               | that probably 54123 comes after 6723 (or whatever the
               | tokens are).
               | 
               | Once you understand this, I think it's easy to reason
                | about _why_ they don't understand code quality, why they
                | _couldn't_ ever understand it, and how you can make them
               | output quality code regardless.
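                | 
                | To make that concrete, here's a rough sketch (assuming
                | the Hugging Face transformers package and GPT-2 weights;
                | not something from the talk). ' Paris' is simply the
                | highest-probability continuation, not a fact the model
                | "knows":
                | 
                |     import torch
                |     from transformers import (AutoModelForCausalLM,
                |                               AutoTokenizer)
                | 
                |     tok = AutoTokenizer.from_pretrained("gpt2")
                |     lm = AutoModelForCausalLM.from_pretrained("gpt2")
                | 
                |     x = tok("The capital of France is",
                |             return_tensors="pt")
                |     with torch.no_grad():
                |         # scores for the next token only
                |         logits = lm(**x).logits[0, -1]
                |     probs = logits.softmax(dim=-1)
                |     top = torch.topk(probs, 5)
                |     for p, i in zip(top.values, top.indices):
                |         # ' Paris' should rank near the top
                |         print(repr(tok.decode(i)), float(p))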
        
             | bgwalter wrote:
             | If "AI" worked (which fortunately isn't the case), humans
             | would be degraded to passive consumers in the last domain
             | in which they were active creators: thinking.
             | 
             | Moreover, you would have to pay centralized corporations
             | that stole all of humanity's intellectual output for
             | engaging in your profession. That is terrifying.
             | 
             | The current reality is also terrifying: Mediocre developers
             | are enabled to have a 10x volume (not quality). Mediocre
             | execs like that and force everyone to use the "AI"
             | snakeoil. The profession becomes even more bureaucratic,
             | tool oriented and soulless.
             | 
             | People without a soul may not mind.
        
               | diggan wrote:
               | > If "AI" worked (which fortunately isn't the case),
               | humans would be degraded to passive consumers in the last
               | domain in which they were active creators: thinking.
               | 
               | "AI" (depending on what you understand that to be) is
               | already "working" for many, including myself. I've
               | basically stopped using Google because of it.
               | 
               | > humans would be degraded to passive consumers in the
               | last domain in which they were active creators: thinking
               | 
               | Why? I still think (I think at least), why would I stop
               | thinking just because I have yet another tool in my
               | toolbox?
               | 
               | > you would have to pay centralized corporations that
               | stole all of humanity's intellectual output for engaging
               | in your profession
               | 
               | Assuming we'll forever be stuck in the "mainframe" phase,
               | then yeah. I agree that local models aren't really close
               | to SOTA yet, but the ones you can run locally can already
               | be useful in a couple of focused use cases, and judging
               | by the speed of improvements, we won't always be stuck in
               | this mainframe-phase.
               | 
               | > Mediocre developers are enabled to have a 10x volume
               | (not quality).
               | 
                | In my experience, which admittedly has been mostly in
                | startups and smaller companies, this has always been the
               | case. Most developers seem to like to produce MORE code
               | over BETTER code, I'm not sure why that is, but I don't
               | think LLMs will change people's mind about this, in
               | either direction. Shitty developers will be shit, with or
               | without LLMs.
        
               | zelphirkalt wrote:
                | The AI, as it is currently, will not come up with that new
                | app idea or that clever, innovative way of implementing an
                | application. It will endlessly rehash the training data
                | it has ingested. Sure, you can tell an AI to spit out a
                | CRUD app, and maybe it will even eventually work in some
                | sane way, but that's not innovative and not necessarily
                | good software. It is blindly copying existing approaches to
               | implement something. That something is then maybe even
               | working, but lacks any special sauce to make it special.
               | 
               | Example: I am currently building a web app. My goal is to
               | keep it entirely static, traditional template rendering,
               | just using the web as a GUI framework. If I had just told
               | the AI to build this, it would have thrown tons of JS at
               | the problem, because that is what the mainstream does
               | these days, and what it mostly saw as training data. Then
               | my back button would most likely no longer work, I would
               | not be able to use bookmarks properly, it would not
               | automatically have an API as powerful as the web UI,
               | usable from any script, and the whole thing would have
               | gone to shit.
               | 
               | If the AI tools were as good as I am at what I am doing,
               | and I relied upon that, then I would not have spent time
               | trying to think of the principles of my app, as I did
                | when coming up with it myself. As it is now, the AI would
                | not even have managed to prevent duplicate results from
                | showing up in the UI: I had a GPT-4 session about how to
                | prevent that, none of the suggested answers worked, and
                | in the end I did what I had thought I might have to do
                | when I first discovered the issue.
        
               | diggan wrote:
               | > The AI as it is currently, will not come up with that
               | new app idea or that clever innovative way of
               | implementing an application
               | 
               | Who has claimed that they can do that sort of stuff? I
               | don't think my comment hints at that, nor does the talk
               | in the submission.
               | 
               | You're absolutely right with most of your comment, and
               | seem to just be rehashing what Karpathy talks about but
               | with different words. Of course it won't create good
               | software unless you specify exactly what "good software"
               | is for you, and tell it that. Of course it won't know you
               | want "traditional static template rendering" unless you
                | tell it to. Of course it won't create an API you can use
               | from anywhere unless you say so. Of course it'll follow
               | what's in the training data. Of course things won't
               | automatically implement whatever you imagine your project
               | should have, unless you tell it about those features.
               | 
               | I'm not sure if you're just expanding on the talk but
               | chose my previous comment to attach it to, or if you're
               | replying to something I said in my comment.
        
           | koakuma-chan wrote:
           | > That sounds awful. A truly terrible and demotivating way to
           | work and produce anything of real quality
           | 
           | This is the right way to work with generative AI, and it
           | already is an extremely common and established practice when
           | working with image generation.
        
             | notTooFarGone wrote:
             | I can recognize images in one look.
             | 
              | How about that 400-line change that touches 7 files?
        
               | koakuma-chan wrote:
               | In my prompt I ask the LLM to write a short summary of
               | how it solved the problem, run multiple instances of LLM
               | concurrently, compare their summaries, and use the output
               | of whichever LLM seems to have interpreted instructions
               | the best, or arrived at the best solution.
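                | 
                | Roughly like this, as a sketch (the openai package; the
                | model name, prompt and the manual pick at the end are
                | illustrative, not a specific product's API):
                | 
                |     from concurrent.futures import ThreadPoolExecutor
                |     from openai import OpenAI
                | 
                |     client = OpenAI()
                |     task = ("Fix the race condition in cache.py. "
                |             "End with a short summary of how you "
                |             "solved the problem.")
                | 
                |     def attempt(i):
                |         # one independent try at the same task
                |         r = client.chat.completions.create(
                |             model="gpt-4o",
                |             messages=[{"role": "user",
                |                        "content": task}],
                |             temperature=1.0,
                |         )
                |         return r.choices[0].message.content
                | 
                |     with ThreadPoolExecutor(max_workers=4) as ex:
                |         outs = list(ex.map(attempt, range(4)))
                | 
                |     # read the summaries, keep the best attempt
                |     for n, out in enumerate(outs):
                |         print(f"--- attempt {n} ---\n{out}\n")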
        
               | elt895 wrote:
               | And you trust that the summary matches what was actually
               | done? Your experience with the level of LLMs
               | understanding of code changes must significantly differ
               | from mine.
        
               | koakuma-chan wrote:
               | It matched every time so far.
        
               | abdullin wrote:
               | Exactly!
               | 
                | This is why there has to be a "write me a detailed
                | implementation plan" step in between. Which files is it
               | going to change, how, what are the gotchas, which tests
               | will be affected or added etc.
               | 
                | It is easier to review one document and point out missing
                | bits than to chase loose ends.
               | 
               | Once the plan is done and good, it is usually a smooth
               | path to the PR.
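                | 
                | A minimal sketch of that loop (openai package; the model
                | name and prompts are placeholders, not a particular
                | tool's workflow):
                | 
                |     from openai import OpenAI
                | 
                |     client = OpenAI()
                |     task = "Add rate limiting to /login"
                | 
                |     plan = client.chat.completions.create(
                |         model="gpt-4o",
                |         messages=[{"role": "user", "content":
                |             "Write a detailed implementation plan "
                |             f"for: {task}. List files to change, "
                |             "gotchas, and affected tests. No code "
                |             "yet."}],
                |     ).choices[0].message.content
                | 
                |     # a human reviews the plan here and points
                |     # out the missing bits before any code exists
                |     code = client.chat.completions.create(
                |         model="gpt-4o",
                |         messages=[
                |             {"role": "user", "content": task},
                |             {"role": "assistant", "content": plan},
                |             {"role": "user", "content":
                |              "Plan approved. Implement it."},
                |         ],
                |     ).choices[0].message.content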
        
               | bayindirh wrote:
                | So you can create buggier code, remixed from scraped
                | bits of the internet, which you don't understand but
                | somehow works, rather than creating higher quality,
                | tighter code which takes the same amount of time to type?
               | All the while offloading all the work to something else
               | so your skills can atrophy at the same time?
               | 
               | Sounds like progress to me.
        
               | abdullin wrote:
               | Here is another way to look at the problem.
               | 
               | There is a team of 5 people that are passionate about
               | their indigenous language and want to preserve it from
               | disappearing. They are using AI+Coding tools to:
               | 
               | (1) Process and prepare a ton of various datasets for
               | training custom text-to-speech, speech-to-text models and
               | wake word models (because foundational models don't know
               | this language), along with the pipelines and tooling for
               | the contributors.
               | 
               | (2) design and develop an embedded device (running
               | ESP32-S3) to act as a smart speaker running on the edge
               | 
               | (3) design and develop backend in golang to orchestrate
               | hundreds of these speakers
               | 
               | (4) a whole bunch of Python agents (essentially glorified
               | RAGs over folklore, stories)
               | 
               | (5) a set of websites for teachers to create course
               | content and exercises, making them available to these
               | edge devices
               | 
               | All that, just so that kids in a few hundred
               | kindergartens and schools would be able to practice their
               | own native language, listen to fairy tales, songs or ask
               | questions.
               | 
               | This project was acknowledged by the UN (AI for Good
               | programme). They are now extending their help to more
               | disappearing languages.
               | 
                | None of that was possible before. This sounds like good
                | progress to me.
               | 
               | Edit: added newlines.
        
               | mistersquid wrote:
               | > I can recognize images in one look.
               | 
                | > How about that 400-line change that touches 7 files?
               | 
                | Karpathy discusses this discrepancy. In his estimation,
                | LLMs are still at the 1970s-CLI stage and lack a GUI.
                | Today, LLMs output text, and text does not leverage the
                | human brain's ability to ingest visually coded
                | information, literally, at a glance.
               | 
               | Karpathy surmises UIs for LLMs are coming and I suspect
               | he's correct.
        
               | variadix wrote:
               | The thing required isn't a GUI for LLMs, it's a visual
               | model of code that captures all the behavior and is a
               | useful representation to a human. People have floated
               | this idea before LLMs, but as far as I know there isn't
               | any real progress, probably because it isn't feasible.
               | There's so much intricacy and detail in software (and
               | getting it even slightly wrong can be catastrophic), any
               | representation that can capture said detail isn't going
               | to be interpretable at a glance.
        
               | mistersquid wrote:
               | > The thing required isn't a GUI for LLMs, it's a visual
               | model of code that captures all the behavior and is a
               | useful representation to a human.
               | 
               | The visual representation that would be useful to humans
               | is what Karpathy means by "GUI for LLMs".
        
               | skydhash wrote:
                | There's no visual model for code, as code isn't 2D.
                | There are two mechanisms in the Turing machine model: a
                | state machine and a linear representation of code and
                | data. The 2D representation of the state machine has no
                | significance, and the linear aspect of code and data
                | hides more dimensions. We invented more abstractions,
                | but nothing that maps to a visual representation.
        
             | deadbabe wrote:
             | It is not. The _right_ way to work with generative AI is to
              | get the right answer in the first shot. But it's the AI
              | that is not living up to this promise.
             | 
             | Reviewing 4 different versions of AI code is grossly
             | unproductive. A human co-worker can submit one version of
             | code and usually have it accepted with a single review, no
             | other "versions" to verify. 4 versions means you're reading
             | 75% more code than is necessary. Multiply this across every
             | change ever made to a code base, and you're wasting a
             | shitload of time.
        
               | koakuma-chan wrote:
               | > Reviewing 4 different versions of AI code is grossly
               | unproductive.
               | 
               | You can have another AI do that for you. I review
               | manually for now though (summaries, not the code, as I
               | said in another message).
        
               | RHSeeger wrote:
               | That's not really comparing apples to apples though.
               | 
               | > A human co-worker can submit one version of code and
               | usually have it accepted with a single review, no other
               | "versions" to verify.
               | 
               | But that human co-worker spent a lot of time generating
               | what is being reviewed. You're trading "time saved
               | coding" for "more time reviewing". You can't complain
               | about the added time reviewing and then ignore all the
                | time saved coding. That's not to say it's necessarily a
               | win, but it _is_ a tradeoff.
               | 
               | Plus that co-worker may very well have spent some time
               | discussing various approaches to the problem (with you),
                | which is somewhat parallel to the idea of reviewing 4
               | different PRs.
        
             | xphos wrote:
             | "If the only tool you have is a hammer, you tend to see
             | every problem as a nail."
             | 
              | I think the world is leaning dangerously into LLMs,
              | expecting them to solve every problem under the sun. Sure,
              | AI can solve problems, but the body of new knowledge in
              | the world (domain 1 in Karpathy's framing) doesn't grow
              | with LLMs and agents. Maybe generation-and-selection is
              | the best method for working in domains 2/3, but something
              | is fundamentally lost in the rapid embrace of these AI
              | tools.
              | 
              | A true challenge question for people: would you give up
              | 10 points of IQ for access to the next-gen AI model? I
              | don't ask this in the sense that AI makes people stupid,
              | but because it frames intelligence as something valuable
              | because you have it, rather than as the ability to quickly
              | look up or generate an answer that may or may not be
              | correct. How we use our tools deeply shapes what we will
              | do in the future. A cautionary tale is US manufacturing of
              | precision tools, where we gave up on teaching people how
              | to use lathes because they could simply run CNC machines
              | instead. Now that industry has an extreme lack of CNC
              | machine programmers, making it impossible to keep up with
              | other precision-instrument-producing countries. This is of
              | course a normative statement and has more complex
              | variables, but I fear that in this dead-set charge for AI
              | we will lose sight of what makes programming languages,
              | and programming in general, valuable.
        
           | osigurdson wrote:
           | I'm not sure that AI code has to be sloppy. I've had some
           | success with hand coding some examples and then asking codex
           | to rigorously adhere to prior conventions. This can end up
            | with very self-consistent code.
           | 
           | Agree though on the "pick the best PR" workflow. This is pure
           | model training work and you should be compensated for it.
        
             | elif wrote:
             | Yep this is what Andrej talks about around 20 minutes into
             | this talk.
             | 
             | You have to be extremely verbose in describing all of your
             | requirements. There is seemingly no such thing as too much
             | detail. The second you start being vague, even if it WOULD
             | be clear to a person with common sense, the LLM views that
                | vagueness as a potential aspect of its own creative
                | liberty.
        
               | jebarker wrote:
               | > the LLM views that vagueness as a potential aspect of
                | its own creative liberty.
               | 
               | I think that anthropomorphism actually clouds what's
               | going on here. There's no creative choice inside an LLM.
               | More description in the prompt just means more
               | constraints on the latent space. You still have no
               | certainty whether the LLM models the particular part of
               | the world you're constraining it to in the way you hope
               | it does though.
        
               | 9rx wrote:
               | _> You have to be extremely verbose in describing all of
               | your requirements. There is seemingly no such thing as
               | too much detail._
               | 
                | If only there were a language one could use that enables
                | describing all of your requirements in an unambiguous
                | manner, ensuring that you have provided all the necessary
               | detail.
               | 
               | Oh wait.
        
               | joshuahedlund wrote:
               | > You have to be extremely verbose in describing all of
               | your requirements. There is seemingly no such thing as
               | too much detail
               | 
               | I understand YMMV, but I have yet to find a use case
               | where this takes me less time than writing the code
               | myself.
        
               | SirMaster wrote:
               | I'm really waiting for AI to get on par with the common
               | sense of most humans in their respective fields.
        
               | diggan wrote:
               | I think you'll be waiting for a very long time. Right now
               | we have programmable LLMs, so if you're not getting the
               | results, you need to reprogram it to give the results you
               | want.
        
               | pja wrote:
               | > You have to be extremely verbose in describing all of
               | your requirements. There is seemingly no such thing as
               | too much detail.
               | 
               | Sounds like ... programming.
               | 
               | Program specification is programming, ultimately. For any
               | given problem if you're lucky the specification is
               | concise & uniquely defines the required program. If
               | you're unlucky the spec ends up longer than the code
               | you'd write to implement it, because the language you're
               | writing it in is less suited to the problem domain than
               | the actual code.
        
           | ponector wrote:
           | >That sounds awful.
           | 
           | Not for the cloud provider. AWS bill to the moon!
        
           | chamomeal wrote:
           | I say this all the time!
           | 
           | Does anybody really want to be an assembly line QA reviewer
           | for an automated code factory? Sounds like shit.
           | 
           | Also I can't really imagine that in the first place. At my
           | current job, each task is like 95% understanding all the
           | little bits, and then 5% writing the code. If you're
           | reviewing PRs from a bot all day, you'll still need to
           | understand all the bits before you accept it. So how much
           | time is that really gonna save?
        
             | diggan wrote:
             | > Does anybody really want to be an assembly line QA
             | reviewer for an automated code factory? Sounds like shit.
             | 
             | On the other hand, does anyone really wanna be a code-
             | monkey implementing CRUD applications over and over by
             | following product specifications by "product managers" that
             | barely seem to understand the product they're "managing"?
             | 
             | See, we can make bad faith arguments both ways, but what's
             | the point?
        
               | nevertoolate wrote:
                | The issue is that if product people do the "coding" and
                | you have to fix it, it's miserable.
        
               | diggan wrote:
               | Even worse would be if we asked the accountants to do the
               | coding, then you'll learn what miserable means.
               | 
               | What was the point again?
        
               | nevertoolate wrote:
               | Yes
        
               | consumer451 wrote:
               | I hesitate to divide a group as diverse as software devs
               | into two categories, but here I go:
               | 
               | I have a feeling that devs who love LLM coding tools are
               | more product-driven than those who hate them.
               | 
               | Put another way, maybe devs with their own product ideas
               | love LLM coding tools, devs without them do not.
               | 
               | I am genuinely not trying to throw shade here in any way.
        
         | bandoti wrote:
         | Here's a few problems I foresee:
         | 
          | 1. People get lazy when presented with four choices they had no
          | hand in creating; they don't look over all four, they just
          | click one, ignoring the others. Why? Because they have ten more
          | of these on the go at once, diminishing their overall focus.
         | 
          | 2. Automated tests, end-to-end sim., linting, etc. -- tools
         | already exist and work at scale. They should be robust and
         | THOROUGHLY reviewed by both AI and humans ideally.
         | 
         | 3. AI is good for code reviews and "another set of eyes" but
         | man it makes serious mistakes sometimes.
         | 
          | An anecdote for (1): when ChatGPT tries to A/B test me with two
          | answers, it's incredibly burdensome for me to read virtually
          | the same thing twice with minimal differences.
         | 
         | Code reviewing four things that do almost the same thing is
         | more of a burden than writing the same thing once myself.
        
           | abdullin wrote:
           | A simple rule applies: "No matter what tool created the code,
           | you are still responsible for what you merge into main".
           | 
            | As such, the task of verification still falls on the
            | engineers.
            | 
            | Given that and proper processes, modern tooling works nicely
            | with codebases ranging from 10k LOC (mixed embedded device
            | code with golang backends and python DS/ML) to 700k LOC
            | (legacy enterprise applications from the mainframe era).
        
             | bandoti wrote:
              | Agreed. I think engineers following simple Test-Driven
              | Development procedures can write the code, unit tests,
              | integration tests, debugging, etc. for a small enough
              | unit, which by default forces tight feedback loops. AI may
              | assist in the particulars, not run the show.
             | 
              | I'm willing to bet, short of droid-speak or some AI output
              | we can't even understand, that when considering the system
              | as a whole, even with short-term gains in speed, the
              | longevity of any product will be better with real people
              | following current best practices, and perhaps a modest
              | sprinkle of AI.
             | 
             | Why? Because AI is trained on the results of human
             | endeavors and can only work within that framework.
        
               | abdullin wrote:
                | Agreed. AI is just a tool. Letting it run the show is
                | essentially what vibe-coding is. It is a fun activity
                | for prototyping, but tends to accumulate problems and
                | tech debt at an astonishing pace.
                | 
                | Code manually crafted by professionals will almost
                | always beat AI-driven code in quality. Yet one still has
                | to find such professionals and wait for them to get the
                | job done.
               | 
               | I think, the right balance is somewhere in between - let
               | tools handle the mundane parts (e.g. mechanically
               | rewriting that legacy Progress ABL/4GL code to Kotlin),
               | while human engineers will have fun with high-level tasks
               | and shaping the direction of the project.
        
             | ponector wrote:
             | > As such, task of verification, still falls on hands of
             | engineers.
             | 
              | Even before LLMs it was a common thing to merge changes
              | which completely break the test environment. Some people
              | really skip the verification phase of their work.
        
             | xpe wrote:
             | > A simple rule applies: "No matter what tool created the
             | code, you are still responsible for what you merge into
             | main".
             | 
             | Beware of claims of simple rules.
             | 
              | Take one subset of the problem: code reviews in an
              | organizational environment. How well does the simple rule
              | above work?
             | 
             | The idea of "Person P will take responsibility" is far from
             | clear and often not a good solution. (1) P is fallible. (2)
             | Some consequences are too great to allow one person to
             | trigger them, which is why we have systems and checks. (3)
             | P cannot necessarily right the wrong. (4) No-fault analyses
             | are often better when it comes to long-term solutions which
              | require a fear-free culture to reduce cover-ups.
             | 
             | But this is bigger than one organization. The effects of
             | software quickly escape organizational boundaries. So when
             | we think about giving more power to AI tooling, we have to
             | be really smart. This means understanding human nature,
             | decision theory, political economy [1], societal norms, and
             | law. And building smart systems (technical and
              | organizational).
             | 
              | Recommending good strategies for making AI-generated code
              | safe is a hard problem. I'd bet it is much harder than
              | even "elite" software developers have contemplated, much
              | less implemented.
             | insufficient. I personally have some optimism for formal
             | methods, defense in depth, and carefully implemented human-
             | in-the-loop systems.
             | 
             | [1] Political economy uses many of the tools of economics
             | to study the incentives of human decision making
        
           | eddd-ddde wrote:
            | With lazy people the same applies to everything: code they
            | write themselves, or code they review from peers. The issue
            | is not the tooling, but the hands.
        
             | freehorse wrote:
             | The more tedious the work is, the less motivation and
             | passion you get for doing it, and the more "lazy" you
             | become.
             | 
             | Laziness does not just come from within, there are
             | situations that promote behaving lazy, and others that
             | don't. Some people are just lazy most of the time, but most
             | people are "lazy" in some scenarios and not in others.
        
               | bandoti wrote:
               | Seurat created beautiful works of art composed of
               | thousands of tiny dots, painted by hand; one might find
               | it meditational with the right mindset.
               | 
                | Some might also find laziness itself dreadfully boring --
                | like all the Microsoft employees code-reviewing
                | AI-generated pull requests!
               | 
               | https://blog.stackademic.com/my-new-hobby-watching-
               | copilot-s...
        
             | chamomeal wrote:
             | I am not a lazy worker but I guarantee you I will not
             | thoroughly read through and review four PRs for the same
             | thing
        
         | elif wrote:
         | In my experience with Jules and (worse) Codex, juggling
         | multiple pull requests at once is not advised.
         | 
          | Even if you tell the git-aware Jules to handle a merge conflict
          | within the context window the patch was generated in, it is
          | like: sorry bro, I have no idea what's wrong, can you send me a
          | diff with the conflict?
          | 
          | I find I have to be in the iteration loop at every stage, or
          | else the agent rapidly forgets what it's doing or why. For
          | instance, don't trust Jules to run your full test suite after
          | every change without handholding and asking for specific run
          | results every time.
         | 
         | It feels like to an LLM, gaslighting you with code that
         | nominally addresses the core of what you just asked while
         | completely breaking unrelated code or disregarding previously
         | discussed parameters is an unmitigated success.
        
         | layer8 wrote:
         | > Tight feedback loops are the key in working productively with
         | software. [...] even more tight loops than what a sane human
         | would tolerate.
         | 
         | Why would a sane human be averse to things happening
         | instantaneously?
        
       | benob wrote:
       | You can generate 1.0 programs with 3.0 programs. But can you
       | generate 2.0 programs the same way?
        
         | olmo23 wrote:
         | 2.0 programs (model weights) are created by running 1.0
         | programs (training runs).
         | 
         | I don't think it's currently possible to ask a model to
         | generate the weights for a model.
        
           | movedx01 wrote:
           | But you can generate synthetic data using a 3.0 program to
           | train a smaller, faster, cheaper-to-run 2.0 program.
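            | 
            | A toy sketch of that pipeline (assuming the openai and
            | scikit-learn packages; the sentiment task, model name and
            | labels are purely illustrative):
            | 
            |     from openai import OpenAI
            |     from sklearn.feature_extraction.text import (
            |         TfidfVectorizer)
            |     from sklearn.linear_model import LogisticRegression
            | 
            |     client = OpenAI()
            |     texts = ["great product", "terrible support",
            |              "works fine", "broke after a day"]
            | 
            |     # 3.0 step: a prompted LLM produces the labels
            |     labels = []
            |     for t in texts:
            |         r = client.chat.completions.create(
            |             model="gpt-4o-mini",
            |             messages=[{"role": "user", "content":
            |                 "One word, pos or neg, for the "
            |                 f"sentiment of: {t}"}],
            |         )
            |         labels.append(r.choices[0].message.content.strip())
            | 
            |     # 2.0 step: train a small, cheap model on the
            |     # synthetic labels
            |     X = TfidfVectorizer().fit_transform(texts)
            |     clf = LogisticRegression().fit(X, labels)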
        
       | amai wrote:
        | The quite good blog post mentioned by Karpathy on working with
        | LLMs when building software:
       | 
       | - https://blog.nilenso.com/blog/2025/05/29/ai-assisted-coding/
       | 
       | See also:
       | 
       | - https://news.ycombinator.com/item?id=44242051
        
         | mkw5053 wrote:
         | I like the idea of having a single source of truth RULES.md,
         | however I'm wondering why you used symlinks as opposed to the
         | ability to link/reference other files in cursor rules,
         | CLAUDE.md, etc. I understand that functionality doesn't exist
         | for all coding agents, but I think it gives you more
         | flexibility when composing rules files (for example you can
         | have the standard cursor rules headers and then point to
         | @RULES.md lower in the file)
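          | 
          | For anyone following along, the symlink setup being discussed
          | is roughly this (file names are examples; adjust to whichever
          | agents you actually use):
          | 
          |     from pathlib import Path
          | 
          |     # one source of truth; per-tool names point at it
          |     for alias in ("CLAUDE.md", "AGENTS.md", ".cursorrules"):
          |         link = Path(alias)
          |         if not link.exists():
          |             link.symlink_to("RULES.md")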
        
       | blobbers wrote:
       | Software 3.0 is the code generated by the machine, not the
       | prompts that generated it. The prompts don't even yield the same
       | output; there is randomness.
       | 
       | The new software world is the massive amount of code that will be
       | burped out by these agents, and it should quickly dwarf the human
       | output.
        
         | pelagicAustral wrote:
         | I think that if you give the same task to three different
         | developers you'll get three different implementations. It's not
         | a random result if you do get the functionality that was
         | expected, and at that, I do think the prompt plays an important
         | role in offering a view of how the result was achieved.
        
           | klabb3 wrote:
           | > I think that if you give the same task to three different
           | developers you'll get three different implementations.
           | 
           | Yes, but if you want them to be compatible you need to define
           | a protocol and conformance test suite. This is way more work
           | than writing a single implementation.
           | 
           | The code is the real spec. Every piece of unintentional non-
           | determinism can be a hazard. That's why you want the code to
           | be the unit of maintenance, not a prompt.
        
             | imiric wrote:
             | I know! Let's encode the spec into a format that doesn't
             | have the ambiguities of natural language.
        
               | klabb3 wrote:
               | Right. Great idea. Maybe call it "formal execution spec
               | for LLM reference" or something. It could even be
               | versioned in some kind of distributed merkle tree.
        
         | tamersalama wrote:
         | How I understood it is that natural language will form
          | relatively large portions of stacks (endpoint descriptions,
          | instructions, prompts, documentation, etc.), in addition to
          | code generated by agents (which would fall under 1.0).
        
         | poorcedural wrote:
         | It is not the code, which just like prompts is a written
         | language. Software 3.0 will be branches of behaviors, by the
         | software and by the users all documented in a feedback loop.
         | The best behaviors will be merged by users and the best will
         | become the new HEAD. Underneath it all will be machine code for
         | the hardware, but it will be the results that dictate progress.
        
         | fritzo wrote:
         | Code is read much more often than it is written. Code generated
         | by the machine today will be prompt read by the machine going
         | forward. It's a closed loop.
         | 
         | Software is a world in motion. Software 1.0 was animated by
         | developers pushing it around. Software 3.0 is additionally
         | animated by AI agents.
        
       | politelemon wrote:
       | The beginning was painful to watch as is the cheering in this
       | comment section.
       | 
        | The 1.0, 2.0, and 3.0 simply aren't making sense. They imply a
        | kind of succession and replacement and demonstrate a lack of
        | understanding of how programming works. It sounds as
        | marketing-oriented as "Web 3.0", born inside an echo chamber.
        | And yet halfway through, the need for determinism/validation is
        | being reinvented.
       | 
       | The analogies make use of cherry picked properties, which could
       | apply to anything.
        
         | monsieurbanana wrote:
         | > "Because they all have slight pros and cons, and you may want
         | to program some functionality in 1.0 or 2.0, or 3.0, or you're
         | going to train in LLM, or you're going to just run from LLM"
         | 
         | He doesn't say they will fully replace each other (or had fully
         | replaced each other, since his definition of 2.0 is quite old
         | by now)
        
           | whiplash451 wrote:
           | I think Andrej is trying to elevate the conversation in an
           | interesting way.
           | 
            | That in and of itself makes it worth it.
           | 
           | No one has a crystal clear view of what is happening, but at
           | least he is bringing a novel and interesting perspective to
           | the field.
        
         | amelius wrote:
         | The version numbers mean abrupt changes.
         | 
         | Analogy: how we "moved" from using Google to ChatGPT is an
         | abrupt change, and we still use Google.
        
         | mentalgear wrote:
         | The whole AI scene is starting to feel a lot like the
         | cryptocurrency bubble before it burst. Don't get me wrong,
         | there's real value in the field, but the hype, the influencers,
         | and the flashy "salon tricks" are starting to drown out
         | meaningful ML research (like Apple's critical research that
         | actually improves AI robustness). It's frustrating to see solid
         | work being sidelined or even mocked in favor of vibe-coding.
         | 
          | Meanwhile, this morning I asked Claude 4 to write a simple
          | EXIF normalizer. After two rounds of prompting it to double-
          | check its code, I still had to point out that it makes no
          | sense to load the entire image for re-orienting if the EXIF
          | orientation is fine in the first place.
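          | 
          | Roughly what I was after, as a sketch with Pillow (not the
          | code Claude produced):
          | 
          |     from PIL import Image, ImageOps
          | 
          |     ORIENTATION = 274  # standard EXIF tag id
          | 
          |     def normalize(path):
          |         # Image.open is lazy: no pixels decoded yet
          |         with Image.open(path) as img:
          |             o = img.getexif().get(ORIENTATION, 1)
          |             if o == 1:
          |                 # already upright: skip the decode
          |                 # and re-encode entirely
          |                 return
          |             # decode + rotate/flip only when needed
          |             fixed = ImageOps.exif_transpose(img)
          |         fixed.save(path)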
         | 
         | Vibe vs reality, and anyone actually working in the space daily
         | can attest how brittle these systems are.
        
           | rxtexit wrote:
           | I think part of the problem is that people have the wrong
           | mental models currently.
           | 
           | I am a non-software engineer and I fully expect someday to be
           | a professional "vibe coder". It will be within a domain
           | though and not a generalist like a real software engineer.
           | 
           | I think "vibe coding" in this context will have a type of
           | relationship to software engineering the way excel has a
           | relationship to the professional mathematician.
           | 
           | The knocks on "vibe coding" by software engineers are like a
           | mathematician shitting on Excel for not being able to do
           | symbolic manipulation.
           | 
           | It is not wrong but missing the forest for the trees.
        
       | fergie wrote:
       | There were some cool ideas- I particularly liked "psychology of
       | AI"
       | 
       | Overall though I really feel like he is selling the idea that we
       | are going to have to pay large corporations to be able to write
       | code. Which is... terrifying.
       | 
       | Also, as a lazy developer who is always trying to make AI do my
       | job for me, it still kind of sucks, and its not clear that it
       | will make my life easier any time soon.
        
         | guappa wrote:
         | I think it used to be like that before the GNU people made gcc,
         | completely destroying the market of compilers.
         | 
         | > Also, as a lazy developer who is always trying to make AI do
         | my job for me, it still kind of sucks, and its not clear that
         | it will make my life easier any time soon.
         | 
          | Every time I have to write a simple, self-contained couple of
          | functions I try... and it gets it completely wrong.
         | 
         | It's easier to just write it myself rather than to iterate 50
         | times and hope it will work, considering iterations are also
         | very slow.
        
           | ykonstant wrote:
           | At least proprietary compilers were software you owned and
           | could be airgapped from any network. You didn't create
           | software by tediously negotiating with compilers running on
           | remote machines controlled by a tech corp that can undercut
           | you on whatever you are trying to build (but of course they
           | will not, it says so in the Agreement, and other tales of the
           | fantastic).
        
         | teekert wrote:
         | He says that now we are in the mainframe phase. We will hit the
         | personal computing phase hopefully soon. He says llama (and
         | DeepSeek?) are like Linux in a way, OpenAI and Claude are like
         | Windows and MacOS.
         | 
            | So, no, he's actually saying it may be everywhere for cheap
         | soon.
         | 
         | I find the talk to be refreshingly intellectually honest and
         | unbiased. Like the opposite of a cringey LinkedIn post on AI.
        
           | mirkodrummer wrote:
            | Being Linux is not a good thing imo. It took decades for tech
            | like Proton to run Windows games reliably, if not better than
            | Windows does now. Software is still mostly developed for
            | Windows and macOS. Not to mention the Linux desktop never
            | took off; one could mention Android, but there is a large
            | corporation behind it. Sure, Linux is successful in many
            | ways, it's embedded everywhere, but it is nowhere near being
            | the OS of everyday people -- the "traditional Linux desktop"
            | never took off.
        
         | geraneum wrote:
         | On a tangent, I find the analogies interesting as well.
          | However, while Karpathy is an expert in Computer Science, NLP
          | and machine vision, his understanding of how human psychology
          | and the brain work is as good as yours and mine (non-experts).
          | So I take some of those comparisons as a layperson's feelings
          | about the subject. Still, they are fun to listen to.
        
       | pera wrote:
       | Is it possible to vibe code NFT smart contracts with Software
       | 3.0?
        
       | romain_batlle wrote:
       | Can't believe they wanted to postpone this video by a few weeks
        
         | dang wrote:
         | No one wanted to! I think we might have bitten off more than we
         | could chew in terms of video production. There is a _lot_ of
         | content to publish.
         | 
         | Once it was clear how high the demand was for this talk, the
         | team adapted quickly.
         | 
         | That's how it goes sometimes! Future iterations will be
         | different.
        
       | William_BB wrote:
       | [flagged]
        
       | iLoveOncall wrote:
       | He sounds like Terrence Howard with his nonsense.
        
       | mentalgear wrote:
        | Meanwhile, this morning I asked Claude 4 to write a simple EXIF
        | normalizer. After two rounds of prompting it to double-check its
        | code, I still had to point out that it makes no sense to load the
        | entire image for re-orienting if the EXIF orientation is fine
        | in the first place.
       | 
       | Vibe vs reality, and anyone actually working in the space daily
       | can attest how brittle these systems are.
       | 
       | Maybe this changes in SWE with more automated tests in verifiable
        | simulators, but the real world is far too complex to simulate in
       | its vastness.
        
         | diggan wrote:
         | > Meanwhile
         | 
         | What do you mean "meanwhile", that's exactly (among other
         | things) the kind of stuff he's talking about? The various
         | frictions and how you need to approach it
         | 
         | > anyone actually working in the space
         | 
         | Is this trying to say that Karpathy doesn't "actually work"
         | with LLMs or in the ML space?
         | 
         | I feel like your whole comment is just reacting to the title of
         | the YouTube video, rather than actually thinking and reflecting
         | on the content itself.
        
           | demaga wrote:
           | I'm pretty sure "actually work" part refers to SWE space
           | rather than LLM/ML space
        
         | coreyh14444 wrote:
         | https://theeducationist.info/everything-amazing-nobody-happy...
        
           | belter wrote:
           | AI Snake Oil: https://press.princeton.edu/books/hardcover/978
           | 0691249131/ai...
        
         | ramon156 wrote:
         | The real question is how long it'll take until they're not
         | brittle
        
           | kubb wrote:
           | Or will they ever be reliable. Your question is already
           | making an assumption.
        
             | diggan wrote:
             | They're reliable already if you change the way you approach
             | them. These probabilistic token generators probably never
             | will be "reliable" if you expect them to 100% always output
             | exactly what you had in mind, without iterating in user-
             | space (the prompts).
        
               | kubb wrote:
               | I also think they might never become reliable.
        
               | diggan wrote:
               | But _what does that mean_? If you tell the LLM  "Say just
               | 'hi' without any extra words or explanations", do you not
               | get "hi" back from it?
        
               | TeMPOraL wrote:
               | That's literally _the_ wrong way to use LLMs though.
               | 
               | LLMs think in tokens, the less they emit the dumber they
               | are, so asking them to be concise, or to give the answer
               | before explanation, is _extremely_ counterproductive.
        
               | diggan wrote:
               | I was trying to make a point regarding "reliability", not
               | a point about how to prompt or how to use them for work.
        
               | TeMPOraL wrote:
               | This _is_ relevant. Your example may be simple enough,
               | but for anything more complex, letting the model have its
               | space to think /compute is critical to reliability - if
               | you starve it for compute, you'll get more
               | errors/hallucinations.
        
               | diggan wrote:
               | Yeah I mean I agree with you, but I'm still not sure how
               | it's relevant. I'd also urge people to have unit tests
               | they treat as production code, and proper system prompts,
               | and X and Y, but it's really beyond the original point of
               | "LLMs aren't reliable" which is the context in this sub-
               | tree.
        
               | kubb wrote:
               | Sometimes I get "Hi!", sometimes "Hey!".
        
               | diggan wrote:
               | Which model? Just tried a bunch of ChatGPT, OpenAI's API,
               | Claude, Anthropic's API and DeepSeek's API with both chat
                | and reasoner, every single one replied with a single
               | "hi".
        
               | throwdbaaway wrote:
               | o3-mini-2025-01-31 with high reasoning effort replied
               | with "Hi" after 448 reasoning tokens.
               | 
               | gpt-4.5-preview-2025-02-27 replied with "Hi!"
        
               | diggan wrote:
               | > o3-mini-2025-01-31 with high reasoning effort replied
               | with "Hi" after 448 reasoning tokens.
               | 
               | I got "hi", as expected. What is the full system prompt +
               | user message you're using?
               | 
               | https://i.imgur.com/Y923KXB.png
               | 
               | > gpt-4.5-preview-2025-02-27
               | 
               | Same "hi": https://i.imgur.com/VxiIrIy.png
        
               | flir wrote:
               | There is a bar below which they are reliable.
               | 
               | "Write a Python script that adds three numbers together".
               | 
               | Is that bar going up? I think it probably is, although
               | not as fast/far as some believe. I also think that
               | "unreliable" can still be "useful".
        
             | vFunct wrote:
              | It's perfectly reliable for the things you know it to be
              | reliable at, such as operations within its context window
              | size.
             | 
             | Don't ask LLMs to "Write me Microsoft Excel".
             | 
             | Instead, ask it to "Write a directory tree view for the
             | Open File dialog box in Excel".
             | 
             | Break your projects down into the smallest chunks you can
             | for the LLMs. The more specific you are, the more reliable
             | it's going to be.
             | 
             | The rest of this year is going to be companies figuring out
             | how to break down large tasks into smaller tasks for LLM
             | consumption.
        
             | dist-epoch wrote:
             | I remember when people were saying here on HN that AIs will
             | never be able to generate picture of hands with just 5
             | fingers because they just "don't have common sense"
        
           | guappa wrote:
           | [?]
        
           | yahoozoo wrote:
           | "Treat it like a junior developer" ... 5 years later ...
           | "Treat it like a junior developer"
        
             | agile-gift0262 wrote:
              |     import time
              |     from datetime import timedelta
              | 
              |     while True:
              |         print("This model that just came out changes "
              |               "everything. It's flawless. It doesn't have "
              |               "any of the issues the model from 6 months "
              |               "ago had. We are 1 year away from AGI and "
              |               "becoming jobless")
              |         time.sleep(timedelta(days=180).total_seconds())
        
             | TeMPOraL wrote:
             | Usable LLMs are 3 years old at this point. ChatGPT, not
             | Github Copilot, is the marker.
        
               | LtWorf wrote:
               | Usable for fun yes.
        
         | ApeWithCompiler wrote:
         | A manager in our company introduced Gemini as a chat bot
         | coupled to our documentation.
         | 
          | > It failed to write out our company name. The rest was flawed
          | with hallucinations too, hardly worth mentioning.
          | 
          | I wish this were rage bait aimed at others, but what should my
          | feelings be? After all, this is the tool that's sold to me,
          | that I am expected to work with.
        
           | gorbachev wrote:
           | We had exactly the opposite experience. CoPilot was able to
           | answer questions accurately and reformatted the existing
           | documentation to fit the context of users' questions, which
           | made the information much easier to understand.
           | 
            | Code examples, which we offer as a sort of reference
            | implementation, were also adapted to fit the specific
            | questions without much issue. Granted, these aren't whole
           | applications, but 10 - 25 line examples of doing API setup /
           | calls.
           | 
           | We didn't, of course, just send users' questions directly to
           | CoPilot. Instead there's a bit of prompt magic behind the
           | scenes that tweaks the context so that CoPilot can produce
           | better quality results.
        
         | sensanaty wrote:
          | There are also those instances where Microsoft unleashed Copilot
         | on the .NET repo, and it resulted in the most hilariously
         | terrible PRs that required the maintainers to basically tell
         | Copilot every single step it should take to fix the issue. They
         | were basically writing the PRs themselves at that point, except
         | doing it through an intermediary that was much dumber, slower
         | and less practical than them.
         | 
         | And don't get me started on my own experiences with these
         | things, and no, I'm not a luddite, I've tried my damndest and
         | have followed all the cutting-edge advice you see posted on HN
         | and elsewhere.
         | 
          | Time and time again, the reality of these tools falls flat on
          | its face while people like Andrej hype things up as if we're
         | 5 minutes away from having Claude become Skynet or whatever, or
         | as he puts it, before we enter the world of "Software 3.0"
         | (coincidentally totally unrelated to Web 3.0 and the grift we
         | had to endure there, I'm sure).
         | 
         | To intercept the common arguments,
         | 
         | - no I'm not saying LLMs are useless or have no usecases
         | 
         | - yes there's a possibility if you extrapolate by current
         | trends (https://xkcd.com/605/) that they indeed will be Skynet
         | 
         | - yes I've tried the latest and greatest model released 7
         | minutes ago to the best of my ability
         | 
         | - yes I've tried giving it prompts so detailed a literal infant
         | could follow along and accomplish the task
         | 
         | - yes I've fiddled with providing it more/less context
         | 
         | - yes I've tried keeping it to a single chat rather than
         | multiple chats, as well as vice versa
         | 
         | - yes I've tried Claude Code, Gemini Pro 2.5 With Deep
         | Research, Roocode, Cursor, Junie, etc.
         | 
         | - yes I've tried having 50 different "agents" running and only
         | choosing the best output from the lot.
         | 
         | I'm sure there's a new gotcha being written up as we speak,
         | probably something along the lines of "Well for me it doubled
         | my productivity!" and that's great, I'm genuinely happy for you
         | if that's the case, but for me and my team who have been trying
         | diligently to use these tools for anything that wasn't a
         | microscopic toy project, it has fallen apart time and time
         | again.
         | 
         | The idea of an application UI or god forbid an entire fucking
         | Operating System being run via these bullshit generators is
         | just laughable to me, it's like I'm living on a different
         | planet.
        
           | diggan wrote:
           | You're neither the first nor the last person to have a
           | seemingly vastly different experience from me and others.
           | 
           | So I'm curious, what am I doing differently from what you
           | did/do when you try them out?
           | 
           | This is maybe a bit out there, but would you be up for
           | sending me like a screen recording of exactly what you're
           | doing? Or maybe even a video call sharing your screen? I'm
           | not working in the space and have no products or services to
           | sell; I'm only curious why this gap seemingly exists between
           | you and me. My only motive is to understand whether I'm the
           | one who is missing something, or whether there are more
           | effective ways to help people understand how they can use
           | LLMs and what they can use them for.
           | 
           | My email is on my profile if you're up for it. Invitation
           | open for others in the same boat as parent too.
        
             | bsenftner wrote:
             | I'm a greybeard, 45+ years coding, including being active
             | in AI during the mid '80s, and I've used it where it
             | applied throughout my entire career. That career has been
             | media and animation production backends, where the work
             | sits at both the technical and creative edge.
             | 
             | I currently have an AI integrated office suite, which has
             | attorneys, professional writers, and political activists
             | using the system. It is office software, word processing,
             | spreadsheets, project management and about two dozen types
             | of AI agents that act as virtual co-workers.
             | 
             | No, my users are not programmers, but I do have interns;
             | college students with anything from 3 to 10 years
             | experience writing software.
             | 
             | I see the same AI usage problems with my users, and my
             | interns. My office system bends over backwards to address
             | this, but people are people: they do not realize that AI
             | does not know what they are talking about. They will
             | frequently ask questions with no preamble, no introduction
             | to the subject. They will change topics, not bothering to
             | start a new session or tell the AI the topic is now
             | different. There is a huge number of things they do, often
             | with escalating frustration evident in their prompts, that
             | all come down to the same basic issue: the LLM was not
             | given the context to understand the subject at hand, and
             | the user, like many people, keeps explaining past the
             | point of confusion, adding new confusion on top of it.
             | 
             | I see this over and over. It frustrates the users to the
             | point of anger, yet if they communicated with a human in
             | the same manner, they'd be in a verbal fight almost
             | instantly.
             | 
             | The problem is one of communication. ...and for a huge
             | number of you I just lost you. You've not been taught to
             | understand the power of communication, so you do not
             | respect the subject. How one communicates is practically
             | everything when it comes to human collaboration. It is how
             | one orders one's mind, how one collaborates with others,
             | AND how one gets AI to respond in the manner one desires.
             | 
             | But our current software development industry, and by
             | extension all of STEM, has been shortchanged by never
             | being taught how to communicate effectively, no, not at
             | all. Presentations and how to sell are not effective
             | communication; that's persuasion, about 5% of what it
             | takes to _convey understanding in others_, which then
             | _unblocks resistance to change_.
        
               | diggan wrote:
               | But parent explicitly mentioned:
               | 
               | > - yes I've tried giving it prompts so detailed a
               | literal infant could follow along and accomplish the task
               | 
                | Which, you're saying, might still have missed the mark
                | regardless?
        
               | bsenftner wrote:
               | I'd like to see the prompt. I suspect that "literal
               | infant" is expected to be a software developer without
                | preamble. The initial sentence to an LLM carries far more
                | weight than people realize; it sets the stage for
                | understanding what follows. If there is no introduction
                | to the subject at hand, the response will be just like
                | that of anyone fed a wall of words: confusion as to what
                | all this is about.
        
               | diggan wrote:
               | You and me both :) But I always try to read the comments
               | here with the most charitable interpretation I can come
               | up with.
        
               | sensanaty wrote:
               | So AI is simultaneously going to take over everyone's job
               | and do literally everything, including being used as
               | application UI somehow... But you have to talk to it like
               | a moody teenager at their first job lest you get nothing
               | but garbage? I have to put just as much (and usually,
               | more) effort talking to this non-deterministic black box
               | as I would to an intern who joined a week ago to get
               | anything usable out of it?
               | 
               | Yeah, I'd rather just type things out myself, and
               | continue communicating with my fellow humans rather than
               | expending my limited time on this earth appeasing a
               | bullshit generator that's apparently going to make us all
               | jobless Soon(tm)
        
               | bsenftner wrote:
                | Consider that these AIs are trained on human
                | communications, so they mirror that communication. They
                | are literally damaged-document repair models: they use
                | what they are given to generate a response,
                | statistically. The fact that a question generates text
                | that appears like an answer is an exploited coincidence.
               | 
               | It's a perspective shift few seem to have considered: if
               | one wants an expert software developer from their AI,
               | they need to create an expert software developer's
               | context by using expert developer terminology that is
               | present in the training data.
               | 
               | One can take this to an extreme, and it works: read the
                | source code of an open source project and get an idea of
               | both the developer and their coding style. Write prompts
               | that mimic both the developer and their project, and
               | you'll find that the AI's context now can discuss that
               | project with surprising detail. This is because that
                | project is in the training data; the project is also
               | popular, meaning it has additional sites of tutorials and
               | people discussing use of that project, so a foundational
               | model ends up knowing quite a bit, if one knows how to
               | construct the context with that information.
               | 
               | This is, of course, tricky with hallucination, but that
               | can be minimized. Which is also why we will all become
               | aware of AI context management if we continue writing
               | software that incorporates AIs. I expect context
               | management is what was meant by prompt engineering.
               | Communicating within engineering disciplines has always
               | been difficult.
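                | 
                | As a toy illustration of what I mean by constructing
                | that context (the project and the wording here are
                | made up; only the shape matters):
                | 
                |     # prime the context with the project's own
                |     # vocabulary before asking anything concrete
                |     system = (
                |         "You are a core maintainer of the FooDB "
                |         "storage engine. You think in terms of "
                |         "LSM trees, memtables, compaction and "
                |         "write amplification, and you follow "
                |         "the project's C style guide."
                |     )
                |     question = "How would you add bloom filters?"
                |     # send system + question to whatever chat API
                |     # you use; the terminology does the work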
        
               | TeMPOraL wrote:
               | > _But you have to talk to it like a moody teenager at
               | their first job lest you get nothing but garbage?_
               | 
               | No, you have to talk to it like to an adult human being.
               | 
               | If one's doing so and still gets garbage results from
               | SOTA LLMs, that to me is a strong indication one also
               | cannot communicate with other human beings effectively.
                | It's literally _the same skill_. Such an individual is
                | probably the kind of clueless person we all learn to
                | isolate and navigate around, because contrary to their
                | beliefs, they're not the center of the world, and we
                | cannot actually read their mind.
        
           | ffsm8 wrote:
           | Unironically, your comment mirrors my opinion as of last
           | month.
           | 
            | Since then I've given it another try last week and was quite
            | literally mind-blown by how much it improved in the context
            | of vibe coding (Claude Code). It actually improved so much
            | that I thought "I would like to try that on my production
            | codebase" (mostly because I _want_ it to fail, because
            | that's my job ffs), but alas - that's not allowed at my
            | dayjob.
           | 
           | From the limited experience I could gather over the last week
           | as a software dev with over 10 yrs of experience (along with
           | another 5-10 doing it as a hobby before employment) I can say
           | that I expect our industry to get absolutely destroyed within
           | the next 5 yrs.
           | 
           | The skill ceiling for devs is going to get mostly squashed
            | for 90% of devs; this will inevitably destroy our collective
           | bargaining positions. Including for the last 10%, because the
           | competition around these positions will be even more fierce.
           | 
           | It's already starting, even if it's _currently_ very
           | misguided and mostly down to short-sightedness.
           | 
           | But considering the trajectory and looking at how naive
           | current llms coding tools are... Once the industry adjusts
           | and better tooling is pioneered... it's gonna get brutal.
           | 
           | And most certainly not limited to software engineering.
            | Pretty much all desk jobs will get hemorrhaged as soon as an
            | LLM player basically replaces SAP with entirely new tooling.
           | 
           | Frankly, I expect this to go bad, very very quickly. But I'm
           | still hoping for a good ending.
        
           | kypro wrote:
           | I think part of the problem is that code quality is somewhat
           | subjective and developers are of different skill levels.
           | 
            | If you're fine with things that kinda work okay and you're
            | not the best developer yourself, then you probably think
            | coding agents work really, really well because the slop they
            | produce isn't that much worse than your own. In fact I know a
           | mid-level dev who believes agent AIs write better code than
           | himself.
           | 
           | If you're very critical of code quality then it's much
           | tougher... This is even more true in complex codebases where
           | simply following some existing pattern to add a new feature
           | isn't going to cut it.
           | 
           | The degree to which it helps any individual developer will
           | vary, and perhaps it's not that useful for yourself. For me
           | over the last few months the tech has got to the point where
           | I use it and trust it to write a fair percentage of my code.
           | Unit tests are an example where I find it does a really good
           | job.
        
             | diggan wrote:
             | > If you're very critical of code quality then it's much
             | tougher
             | 
              | I'm not sure. Among developers I know to be sloppy and to
              | produce shit code, some are having no luck with LLMs, and
              | some of them are having lots of luck with them.
             | 
             | On the other side, those who really think about the
             | design/architecture and are very strict (which is the group
             | I'd probably put myself into, but who wouldn't?) are split
             | in a similar way.
             | 
             | I don't have any concrete proof, but I'm guessing
             | "expectations + workflow" differences would explain the
             | vast difference in perception of usefulness.
        
             | sensanaty wrote:
             | Listen, I won't pretend to be The God Emperor Of Writing
             | Code or anything of the sort, I'm realistically quite
             | mediocre/dead average in the grand scheme of things.
             | 
              | But literally yesterday, Claude Code running Opus 4
              | (aka the latest and greatest, to intercept the "dId
              | YoU tRy X" comment) - which has full access to my
              | entire Vue codebase at work, has dedicated rules files
              | I pass to it, and can see the fucking `.vue` file
              | extension on every file in the codebase - after being
              | prompted to "generate this Vue component that does X,
              | Y and Z", spat out React code at me.
             | 
             | You don't have to be Bjarne Stroustrup to get annoyed at
             | this kinda stuff, and it happens _constantly_ for a billion
             | tiny things on the daily. The biggest pushers of AI have
              | finally started admitting that it's not literally perfect,
             | but am I really supposed to pretend that this workflow of
             | having AIs generate dozens of PRs where a single one is
             | somewhat acceptable is somehow efficient or good?
             | 
             | It's great for random one-offs, sure, but is that really
             | deserving of this much _insane_ , blind hype?
        
           | crmi wrote:
           | I've got a working theory that models perform differently
           | when used in different timezones... As in during US working
            | hours they don't work as well due to high load. When used at
            | 'offpeak' hours, not only are they (obviously) snappier, but
            | the outputs appear to be of a higher standard. I've thought
            | this for a while, but I'm now noticing it with Claude 4
            | [thinking] recently. Textbook case of anecdata, of course.
        
             | diggan wrote:
              | Interesting thought, if nothing else. Unless I
              | misunderstand, it would be easy to run a study to see if
              | this is true: use the API to send the same but slightly
              | varied prompt (so as to avoid the caches) which has a
              | definite answer, then run that once per hour for a week
              | and see if the accuracy oscillates or not.
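              | 
              | Something like this, as a rough sketch (the model name,
              | probe question and scoring are placeholders, and you'd
              | obviously want a bigger question set):
              | 
              |     import time
              |     from openai import OpenAI
              | 
              |     client = OpenAI()
              |     QUESTION = "What is 17 * 23? Digits only."
              |     ANSWER = "391"
              | 
              |     def probe(i):
              |         # small prefix change to sidestep caching
              |         msg = f"[probe {i}] {QUESTION}"
              |         r = client.chat.completions.create(
              |             model="gpt-4o-mini",
              |             messages=[{"role": "user", "content": msg}],
              |         )
              |         return ANSWER in r.choices[0].message.content
              | 
              |     results = []
              |     for hour in range(24 * 7):  # hourly, for a week
              |         results.append((hour, probe(hour)))
              |         time.sleep(3600)
              | 
              | Then plot accuracy (and maybe latency) by hour of day
              | and see whether US working hours actually dip.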
        
               | crmi wrote:
               | Yes good idea - although it appears we would also have to
               | account for the possibility of providers nerfing their
               | models. I've read others also think models are being
               | quantized after a while to cut costs.
        
             | jim180 wrote:
              | Same! I did notice, a couple of months ago, that the same
             | prompt in the morning failed and then, later that day, when
             | starting from scratch with identical prompts, the results
             | were much better.
        
           | crmi wrote:
            | To add to this, I ran into a lot of issues too. Similarly
            | when using Cursor... until I started creating a mega list of
           | rules for it to follow that attaches to the prompts. Then
           | outputs improved (but fell off after the context window got
           | too large). At that stage I then used a prompt to summarize,
           | to continue with a new context.
        
         | Seanambers wrote:
         | Seems to me that this is just another level of throwing compute
         | at the problem.
         | 
         | Same way programs were way more efficient before and are now
         | "bloated" with packages, abstractions, slow implementations of
         | algos and scaffolding.
         | 
         | The concept of what is good software development might be
         | changing as well.
         | 
         | LLMs might not write the best code, but they sure can write a
         | lot of it.
        
         | hombre_fatal wrote:
         | On the other hand, posts like this are like watching someone
         | writing ask jeeves search queries into google 20 years ago and
         | then gesturing at how google sucks while everyone else in the
         | room has figured out how to be productive with it and cringes
         | at his "boomer" queries.
         | 
         | If you're still struggling to make LLMs useful for you by now,
         | you should probably ask someone. Don't let other noobs on HN
         | +1'ing you hold you back.
        
           | mirrorlake wrote:
           | Perhaps consider making some tutorials, then, and share your
           | wealth of knowledge rather than calling people stupid.
        
       | imiric wrote:
       | The slide at 13m claims that LLMs flip the script on technology
       | diffusion and give power to the people. Nothing could be further
       | from the truth.
       | 
       | Large corporations, which have become governments in all but
       | name, are the only ones with the capability to create ML models
       | of any real value. They're the only ones with access to vast
       | amounts of information and resources to train the models. They
       | introduce biases into the models, whether deliberately or not,
        | that reinforce their own agenda. This means that the models will
       | either avoid or promote certain topics. It doesn't take a genius
       | to imagine what will happen when the advertising industry
       | inevitably extends its reach into AI companies, if it hasn't
       | already.
       | 
       | Even open weights models which technically users can self-host
       | are opaque blobs of data that only large companies can create,
       | and have the same biases. Even most truly open source models are
       | useless since no individual has access to the same large datasets
       | that corporations use for training.
       | 
       | So, no, LLMs are the same as any other technology, and actually
       | make governments and corporations even more powerful than
       | anything that came before. The users benefit tangentially, if at
       | all, but will mostly be exploited as usual. Though it's
       | unsurprising that someone deeply embedded in the AI industry
       | would claim otherwise.
        
         | moffkalast wrote:
         | Well there are cases like OLMo where the process, dataset, and
         | model are all open source. As expected though, it doesn't
         | really compare well to the worst closed model since the dataset
         | can't contain vast amounts of stolen copyrighted data that
         | noticeably improves the model. Llama is not good because Meta
         | knows what they're doing, it's good because it was pretrained
         | on the entirety of Anna's Archive and every pirated ebook they
         | could get their hands on. Same goes for Elevenlabs and pirated
         | audiobooks.
         | 
          | Lack of compute on Ai2's side also means the context OLMo is
          | trained for is minuscule - the other thing that you need to
          | throw bazillions of dollars at to make a model that's maybe
          | useful in the end, if you're very lucky. Training needs high
          | GPU interconnect bandwidth; it can't be done in a distributed
          | horde in any meaningful way even if people wanted to.
         | 
         | The only ones who have the power now are the Chinese, since
         | they can easily ignore copyright for datasets, patents for
         | compute, and have infinite state funding.
        
       | khalic wrote:
       | His dismissal of smaller and local models suggests he
       | underestimates their improvement potential. Give phi4 a run and
       | see what I mean.
        
         | TeMPOraL wrote:
         | He ain't dismissing them. Comparing local/"open" model to Linux
         | (and closed services to Windows and MacOS) is high praise. It's
         | also accurate.
        
           | khalic wrote:
           | This is a bad comparison
        
         | sriram_malhar wrote:
         | Of all the things you could suggest, a lack of understanding is
         | not one that can be pinned on Karpathy. He does know his
         | technical stuff.
        
           | khalic wrote:
           | We all have blind spots
        
             | diggan wrote:
             | Sure, but maybe suggesting that the person who literally
             | spent countless hours educating others on how to build
             | small models locally from scratch, is lacking knowledge
             | about local small models is going a bit beyond "people have
             | blind spots".
        
               | khalic wrote:
                | Their potential, not how they work. It was very badly
                | formulated; I just corrected it.
        
         | diggan wrote:
         | > suggests a lack of understanding of these smaller models
         | capabilities
         | 
         | If anything, you're showing a lack of understanding of what he
         | was talking about. The context is this specific time, where
          | we're early in an ecosystem and things are expensive and likely
         | centralized (ala mainframes) but if his analogy/prediction is
         | correct, we'll have a "Linux" moment in the future where that
         | equation changes (again) and local models are competitive.
         | 
          | And while I'm a huge fan of local models and run them for
          | maybe 60-70% of what I do with LLMs, they're nowhere near
          | proprietary ones today, sadly. I want them to be, really
          | badly, but it's important to be realistic here and recognize
          | the difference between what a normal consumer can run and
          | what the current mainframes can run.
        
           | khalic wrote:
            | He understands the technical part, of course; I was
            | referring to his prediction that large models will always
            | be necessary.
            | 
            | There is a point where an LLM is good enough for most
            | tasks - I don't need a megamind AI in order to greet
            | clients - and both large and small/medium models are
            | getting there, with the large models hitting a
            | computing/energy demand barrier. The small models won't
            | hit that barrier anytime soon.
        
             | vikramkr wrote:
             | Did he predict they'd always be necessary? He mostly seemed
             | to predict the opposite, that we're at the early stage of a
              | trajectory that has yet to have its Linux moment
        
               | khalic wrote:
               | I understand, thanks for pointing that out
        
           | khalic wrote:
           | I edited to make it clearer
        
         | mprovost wrote:
         | You can disagree with his conclusions but I don't think his
         | understanding of small models is up for debate. This is the
         | person who created micrograd/makemore/nanoGPT and who has
         | produced a ton of educational materials showing how to build
         | small and local models.
        
           | khalic wrote:
            | I'm going to edit it; it was badly formulated. That he
            | underestimates their potential for growth is what I meant.
        
             | diggan wrote:
             | > underestimates their potential for growth
             | 
             | As far as I understood the talk and the analogies, he's
             | saying that local models will eventually replace the
             | current popular "mainframe" architecture. How is that
             | underestimating them?
        
         | dist-epoch wrote:
         | I tried the local small models. They are slow, much less
         | capable, and ironically much more expensive to run than the
         | frontier cloud models.
        
           | khalic wrote:
           | Phi4-mini runs on a basic laptop CPU at 20T/s... how is that
           | slow? Without optimization...
        
             | dist-epoch wrote:
             | I was running Qwen3-32B locally even faster, 70T/s, still
              | way too slow for me. I'm generating thousands of tokens of
              | output per request (not coding); running locally I could
             | get 6 mil tokens per day and pay electricity, or I can get
             | more tokens per day from Google Gemini 2.5 Flash for free.
             | 
             | Running models locally is a privilege for the rich and
             | those with too much disposable time.
        
       | imiric wrote:
       | It's fascinating to see his gears grinding at 22:55 when
       | acknowledging that a human still has to review the thousand lines
       | of LLM-generated code for bugs and security issues if they're
       | "actually trying to get work done". Yet these are the tools that
       | are supposed to make us hyperproductive? This is "Software 3.0"?
       | Give me a break.
        
         | rwmj wrote:
          | Plus coding is the fun bit; reviewing code is the hard and not
          | fun bit; and arguing with an overconfident machine sounds like
          | it'll be even worse than that. Thankfully I'm going to retire
          | soon.
        
           | imiric wrote:
           | Agreed. Hell, even reviewing code can be fun and engaging,
           | especially if done in person. But it helps when the other
           | party can actually think, instead of automatically responding
           | with "You're right!", followed by changes that may or may not
           | make things worse.
           | 
           | It's as if software developers secretly hated their jobs and
           | found most tasks a chore, so they hired someone else to
           | poorly do the mechanical tasks for them, while ignoring the
           | tasks that actually matter. That's not software engineering,
           | programming, nor coding. It's some process of producing
            | shitty software for which we need some new terminology.
           | 
           | I envy you for retiring. Good luck!
        
         | poorcedural wrote:
         | Because we are still using code as a proof that needs to be
         | proven. Software 3.0 will not be about reviewing legible code,
         | with its edge-cases and exploits and trying to impersonate
         | hardware.
        
       | bgwalter wrote:
       | I'd like to hear from Linux kernel developers. There is no
       | significant software that has been written (plagiarized) by "AI".
       | Why not ask the actual experts who deliver instead of talk?
       | 
       | This whole thing is a religion.
        
         | diggan wrote:
         | What counts as "significant software"? Only kernels I guess?
        
           | xvilka wrote:
           | Office software, CAD systems, Web Browsers, the list is long.
        
             | diggan wrote:
             | Microsoft (famously developing somewhat popular office-like
             | software) seems to be going in the direction of almost
             | forcing developers to use LLMs to assist with coding, at
             | least going by what people are willing to admit publicly
             | and seeing some GitHub activity.
             | 
             | Google (made a small browser or something) also develops
             | their own models, I don't think it's far fetched to imagine
             | there is at least one developer on the Chrome/Chromium team
             | that is trying to dogfood that stuff.
             | 
             | As for Autodesk, I have no idea what they're up to, but
             | corporate IT seems hellbent on killing themselves, not sure
             | Autodesk would do anything differently so they're probably
              | also trying to jam LLMs down their employees' throats.
        
               | bgwalter wrote:
               | Microsoft is also selling "AI", so they want headlines
               | like "30% of our code is written by AI". So they force
               | open source developers to babysit the tools and suffer.
               | 
               | It's also an advertisement for potential "AI" military
               | applications that they undoubtedly propose after the
               | HoloLens failure:
               | 
               | https://www.theverge.com/2022/10/13/23402195/microsoft-
               | us-ar...
               | 
               | The HoloLens failure is a great example of overhyped
               | technology, just like the bunker busters that are now in
               | the headlines for overpromising.
        
               | e3bc54b2 wrote:
               | 'forcing' anybody to do anything means they don't like
               | doing it, usually because it causes them more work or
               | headache or discomfort.
               | 
               | You know, the exact opposite of what AI providers are
               | claiming it does.
        
               | sensanaty wrote:
               | > Microsoft
               | 
               | https://news.ycombinator.com/item?id=44050152
               | 
               | Very impressive indeed, not a single line of any quality
               | to be found despite them forcing it on people.
        
           | rwmj wrote:
           | Can you point to any significant open source software that
           | has any kind of significant AI contributions?
           | 
           | As an actual open source developer I'm not seeing anything. I
           | am getting bogus pull requests full of AI slop that are
           | causing problems though.
        
             | diggan wrote:
             | > Can you point to any significant open source software
             | that has any kind of significant AI contributions?
             | 
             | No, but I haven't looked. Can you?
             | 
             | As an actual open source developer too, I do get some value
             | from replacing search engine usage with LLMs that can do
             | the searching and collation for me, as long as they have
             | references I can use for diving deeper, they certainly
             | accelerate my own workflow. But I don't do "vibe-coding" or
             | use any LLM-connected editors, just my own written software
             | that is mostly various CLIs and chat-like UIs.
        
         | mellosouls wrote:
         | _There is no significant software that has been written
         | (plagiarized) by "AI"._
         | 
         | How do you know?
         | 
         | As you haven't evidenced your claim, you could start by
         | providing explicit examples of what is significant.
         | 
         | Even if you are correct, the amount of llm-assisted code is
         | increasing all the time, and we are still only a couple of
         | years in - give it time.
         | 
         |  _Why not ask the actual experts_
         | 
         | Many would regard Karpathy in the expert category I think?
        
           | rwmj wrote:
           | The AI people are the ones making the extraordinary claims
           | here.
        
           | bgwalter wrote:
           | I think you should not turn things around here. Up to 2021 we
           | had a vibrant software environment that obviously had zero
           | "AI" input. It has made companies and some developers filthy
           | rich.
           | 
           | Since "AI" became a religion, it is used as an excuse for
           | layoffs _while no serious software is written by "AI"_. The
           | "AI" people are making the claims. Since they invading a
           | functioning software environment, it is their responsibility
           | to back up their claims.
        
             | TeMPOraL wrote:
             | Still wonder what your definition of "serious software" is.
             | I kinda concur - I consider most of the webshit to be not
             | serious, but then, this is where software industry makes
             | bulk of its profits, and that space is _absolutely being
             | eaten by agentic coding_ , right now, today.
             | 
             | So if we s/serious/money-making/, you are wrong - or at
             | least about to be proven, as these things enter prod and
             | are talked about.
        
       | darqis wrote:
       | when I started coding at the age of 11 in machine code and
       | assembly on the C64, the dream was to create software that
       | creates software. Nowadays it's almost reality, almost because
        | the devil is always in the details. When you're used to writing
        | code, writing it is relatively fast. You need this knowledge to
        | debug issues with generated code. However, you're now telling AI
        | to fix the bugs in the generated code. I see it kind of like how
       | machine code becomes overlaid with asm which becomes overlaid
       | with C or whatever higher level language, which then uses
       | dogma/methodology like MVC and such and on top of that there's
       | now the AI input and generation layer. But it's not widely
       | available. Affording more than 1 computer is a luxury. Many
        | households are even struggling to get by. When you see those
        | setups of what, 5-7 Mac Minis - which average Joe can afford
        | that, or even has the knowledge to run an LLM at home? I don't.
        | This is a toy for rich people. I left out public clouds like
        | AWS and GCP because the cost is too high, running my own is
        | also too expensive, and there are cheaper alternatives that not
        | only cost less but also have way less overhead.
       | 
       | What would be interesting to see is what those kids produced with
       | their vibe coding.
        
         | diggan wrote:
         | > those kids produced with their vibe coding
         | 
         | No one, including Karpathy in this video, is advocating for
         | "vibe coding". If nothing more, LLMs paired with configurable
          | tool-usage are basically a highly advanced and contextual
         | search engine you can ask questions. Are you not using a search
         | engine today?
         | 
         | Even without LLMs being able to produce code or act as agents
         | they'd be useful, because of that.
         | 
         | But it sucks we cannot run competitive models locally, I agree,
         | it is somewhat of a "rich people" tool today. Going by the talk
         | and theme, I'd agree it's a phase, like computing itself had
          | phases. But you're gonna have to actually watch and listen to
          | the talk itself; right now you're basically agreeing with the
          | video yet wrote your comment as if you disagree.
        
         | dist-epoch wrote:
         | > This is a toy for rich people
         | 
         | GitHub copilot has a free tier.
         | 
         | Google gives you thousands of free LLM API calls per day.
         | 
         | There are other free providers too.
        
           | guappa wrote:
           | 1st dose is free
        
             | palmfacehn wrote:
             | Agreed. It is worth noting how search has evolved over the
             | years.
        
             | infecto wrote:
             | LLM APIs are pretty darn cheap for most of the developed
              | world's income levels.
        
               | guappa wrote:
               | Yeah, because they're bleeding money like crazy now.
               | 
               | You should consider how much it actually costs, not how
               | much they charge.
               | 
               | How do people fail to consider this?
        
               | bdangubic wrote:
               | how much does it cost?
        
               | infecto wrote:
               | >You should consider how much it actually costs, not how
               | much they charge. How do people fail to consider this?
               | 
               | Sure, nobody can predict the long-term economics with
               | certainty but companies like OpenAI already have
               | compelling business fundamentals today. This isn't some
               | scooter startup praying for margins to appear; it's a
               | platform with real, scaled revenue and enterprise
               | traction.
               | 
               | But yeah, tell me more about how my $200/mo plan is
               | bankrupting them.
        
               | NitpickLawyer wrote:
               | No, there are 3rd party providers that run open-weights
               | models and they are (most likely) not bleeding money.
               | Their prices are kind of similar, and make sense in a
               | napkin-math kind of way (we looked into this when
               | ordering hardware).
               | 
               | You are correct that some providers might reduce prices
               | for market capture, but the alternatives are still cheap,
               | and some are close to being competitive in quality to the
               | API providers.
        
               | Eggpants wrote:
               | Starts with "No" then follows that up with "most likely".
               | 
               | So in other words you don't know the real answer but
               | posted anyways.
        
               | NitpickLawyer wrote:
               | That most likely is for the case where they made their
               | investment calculations wrong and they won't be able to
               | recoup their hw costs. So I think it's safe to say there
               | may be the outlier 3rd party provider that may lose money
               | in the long run.
               | 
               | But the majority of them are serving at ~ the same price,
               | and that matches to the raw cost + some profit if you
               | actually look into serving those models. And those prices
               | are still cheap.
               | 
               | So yeah, I stand by what I wrote, "most likely" included.
               | 
                | My main answer was "no, ..." because the gp post was
                | only considering the closed providers (oai, anthropic,
                | goog, etc). But you can get open-weight models pretty
                | cheap, and they are pretty close to SotA, depending on
               | your needs.
        
               | Eggpants wrote:
                | Just wait for the enshittification of LLM services.
                | 
                | It's going to get wild when the tech bro investors
                | demand ads be included in responses.
                | 
                | It will be trivial to build a version of AdWords where
                | someone pays for response words to be replaced: "car"
                | replaced by "Honda", variable names like "index" by
                | "this_index_variable_is_sponsored_by_coinbase", etc.
                | 
                | I'm trying to be funny with the last one, but something
                | like this will be coming sooner rather than later.
                | Remember, google search used to be good and was ruined
                | by bonus-seeking executives.
        
               | NoOn3 wrote:
               | It's cheap now. But if you take into account all the
               | training costs, then at such prices they cannot make a
               | profit in any way. This is called dumping to capture the
               | market.
        
               | infecto wrote:
                | No doubt the complete cost of training and of getting
                | where we are today has been significant, and I don't know
               | how the accounting will look years from now but you are
               | just making up the rest based on feelings. We know
               | operationally OpenAI is profitable on purely the runtime
               | side, nobody knows how that will look when accounting for
               | R&D but you have no qualification to say they cannot make
               | a profit in any way.
        
               | NoOn3 wrote:
               | Yes, if you do not take into account the cost of
               | training, I think it is very likely profitable. The cost
               | of working models is not so high. This is just my opinion
               | based on open models and I admit that I have not carried
               | out accurate calculations.
        
               | guappa wrote:
               | Except they have to retrain constantly, so why would you
               | not consider the cost of training?
        
               | diggan wrote:
               | > But if you take into account all the training costs
               | 
                | Not everyone has to pay that cost, as some companies are
               | releasing weights for download and local use (like Llama)
               | and then some other companies are going even further and
               | releasing open source models+weights (like OLMo). If
               | you're a provider hosting those, I don't think it makes
               | sense to take the training cost into account when
               | planning your own infrastructure.
               | 
                | Although I don't think it makes much sense personally,
                | it seemingly makes sense for other companies.
        
               | dist-epoch wrote:
               | There is no "capture" here, it's trivial to switch
               | LLM/providers, they all use OpenAI API. It's literally a
               | URL change.
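                | 
                | E.g. with the Python client, where the base URL and
                | model name below are just illustrative:
                | 
                |     from openai import OpenAI
                | 
                |     # same calling code, different provider:
                |     # only the URL and the key change
                |     client = OpenAI(
                |         base_url="https://openrouter.ai/api/v1",
                |         api_key="...",
                |     )
                |     r = client.chat.completions.create(
                |         model="qwen/qwen3-32b",
                |         messages=[{"role": "user", "content": "hi"}],
                |     )
                |     print(r.choices[0].message.content)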
        
               | jamessinghal wrote:
               | This is changing; OpenAI's newer API (Responses) is
               | required to include reasoning tokens in the context while
               | using the API, to get the reasoning summaries, and to use
               | some of the OpenAI provided tools. Google's OpenAI
               | compatibility supports Chat Completions, not Responses.
               | 
               | As the LLM developers continue to add unique features to
               | their APIs, the shared API which is now OpenAI will only
               | support the minimal common subset and many will probably
               | deprecate the compatibility API. Devs will have to rely
                | on SDKs to offer compatibility.
        
               | dist-epoch wrote:
               | It's still trivial to map to a somewhat different API.
                | Google has its Vertex/GenAI API flavors.
               | 
               | At least for now, LLM APIs are just JSONs with a bunch of
               | prompts/responses in them and maybe some file URLs/IDs.
        
               | jamessinghal wrote:
               | It isn't necessarily difficult, but it's significantly
               | more effort than swapping a URL as I originally was
               | replying to.
        
               | lelanthran wrote:
               | > There is no "capture" here, it's trivial to switch
               | LLM/providers, they all use OpenAI API. It's literally a
               | URL change.
               | 
               | So? That's true for search as well, and yet Google has
               | been top-dog for decades _in spite of_ having worse
               | results and a poorer interface than almost all of the
               | competition.
        
         | infecto wrote:
          | This is most definitely not a toy for rich people. Perhaps
          | depending on your country it may be considered rich, but I
          | would comfortably say that for most of the developed world the
          | costs for these tools are absolutely attainable; there is a
          | reason ChatGPT has such a large subscriber base.
         | 
         | Also the disconnect for me here is I think back on the cost of
         | electronics, prices for the level of compute have generally
          | gone down significantly over time. The C64 launched around
          | the $500-600 price level, not adjusted for inflation. You can
          | go and buy a Mac mini for that price today.
        
           | bawana wrote:
           | I suspect that economies of scale are different for software
           | and hardware. With hardware, iteration results in
           | optimization of the supply chain, volume discount as the
           | marginal cost is so much less than the fixed cost, and lower
           | prices in time. The purpose of the device remains fixed. With
           | software, the software becomes ever more complex with
           | technical debt - featuritis, patches, bugs, vulnerabilities,
           | and evolution of purpose to try and capture more disparate
           | functions under one environment in an attempt to capture and
           | lock in users. Price tends to increase in time. (This
           | trajectory incidentally is the opposite of the unix
           | philosophy - having multiple small fast independent tools
           | than can be concatenated to achieve a purpose.) This results
           | in ever increasing profits for software and decreasing
           | profits for hardware at equilibrium. In the development of AI
            | we are already seeing this - first we had GPT, then chatbots,
           | then agents, now integration with existing software
            | architectures. Not only is each model ever larger and more
           | complex (RNN->transformer->multihead-> add fine tuning/LoRA->
           | add MCP), but the bean counters will find ways to make you
           | pay for each added feature. And bugs will multiply. Already
           | prompt injection attacks are a concern so now another layer
           | is needed to mitigate those.
           | 
            | For the general public, these increasing costs will be
            | subsidized by advertising. I can't wait for ads to start
            | appearing in ChatGPT - it will be very insidious, as the
            | advertising will be commingled with the output so there will
            | be no way to avoid it.
        
         | kordlessagain wrote:
         | Kids? Think about all the domain experts, entrepreneurs,
         | researchers, designers, and creative people who have incredible
         | ideas but have been locked out of software development because
         | they couldn't invest 5-10 years learning to code.
         | 
         | A 50-year-old doctor who wants to build a specialized medical
         | tool, a teacher who sees exactly what educational software
         | should look like, a small business owner who knows their
         | industry's pain points better than any developer. These people
         | have been sitting on the sidelines because the barrier to entry
         | was so high.
         | 
         | The "vibe coding" revolution isn't really about kids (though
         | that's cute) - it's about unleashing all the pent-up innovation
         | from people who understand problems deeply but couldn't
         | translate that understanding into software.
         | 
         | It's like the web democratized publishing, or smartphones
         | democratized photography. Suddenly expertise in the domain
         | matters more than expertise in the tools.
        
           | nevertoolate wrote:
            | It sounds too good to be true. Why do you think an LLM is
            | better at coding than at knowing how educational software
            | should be designed?
        
           | pphysch wrote:
           | > These people have been sitting on the sidelines because the
           | barrier to entry was so high.
           | 
           | This comment is wildly out of touch. The SMB owner can now
           | generate some Python code. Great. Where do they deploy it?
           | How do they deploy it? How do they update it? How do they
           | handle disaster recovery? And so on and so forth.
           | 
           | LLMs accelerate only the easiest part of software
           | engineering, writing greenfield code. The remaining 80% is
           | left as an exercise to the reader.
        
             | bongodongobob wrote:
             | All the devs I work with would have to go through me to
             | touch the infra anyway, so I'm not sure I see the issue
             | here. No one is saying they need to deploy fully through
             | the stack. It's a great start for them and I can help them
             | along the way just like I would with anyone else deploying
             | anything.
        
               | pphysch wrote:
               | In other words, most of the barriers to leveraging custom
               | software are still present.
        
               | bongodongobob wrote:
               | Yes, the parts we aren't talking about that have nothing
               | to do with LLMs, ie normal business processes.
        
           | pton_xd wrote:
           | > Think about all the domain experts, entrepreneurs,
           | researchers, designers, and creative people who have
           | incredible ideas but have been locked out of software
           | development because they couldn't invest 5-10 years learning
           | to code.
           | 
           | > it's about unleashing all the pent-up innovation from
           | people who understand problems deeply but couldn't translate
           | that understanding into software.
           | 
           | This is just a fantasy. People with "incredible ideas" and
           | "pent-up innovation" also need incredible determination and
           | motivation to make something happen. LLMs aren't going to
           | magically help these people gain the energy and focus needed
           | to pursue an idea to fruition. Coding is just a detail; it's
           | not the key ingredient all these "locked out" people were
           | missing.
        
             | agentultra wrote:
             | 100% this. There have been generations of tools built to
             | help realize this idea and there is... not a lot of demand
             | for it. COBOL, BASIC, Hypercard, the wasteland of no-code
             | and low-code tools. The audience for these is incredibly
             | small.
             | 
             | A doctor has an idea. Great. Takes a lot more than a eureka
             | moment to make it reality. Even if you had a magic machine
             | that could turn it into the application you thought of. All
             | of the iterations, testing with users, refining, telemetry,
             | managing data, policies and compliance... it's a lot of
             | work. Code is such a small part. Most doctors want to do
             | doctor stuff.
             | 
             | We've had mind-blowing music production software available
             | to the masses for decades now... not a significant shift in
             | people lining up to be the musicians they always wanted to
             | be but were held back by limited access to the tools to
             | record their ideas.
        
         | kapildev wrote:
         | > What would be interesting to see is what those kids
         | produced with their vibe coding.
         | 
         | I think you are referring to what those kids in the vibe coding
         | event produced. Wasn't their output available in the video
         | itself?
        
       | yahoozoo wrote:
       | I was trying to do some reverse engineering with Claude using an
       | MCP server I wrote for a game trainer program that supports
       | Python scripts. The context window gets filled up _so_ fast. I
       | think my server is returning too many addresses (hex) when Claude
       | searches for values in memory, but it's annoying. These things
       | are so flaky.
        
       | kaycey2022 wrote:
       | I hope this excellent talk brings some much needed sense into the
       | discourse around vibe coding.
        
         | diggan wrote:
          | If anything, I wish the conversation would turn away from
          | "vibe-coding", which was essentially coined as a "lol look at
          | this go" thing, but which media and corporations somehow
          | picked up as "This is the new workflow all developers are
          | adopting".
         | 
         | LLMs as another tool in your toolbox? Sure, use it where it
         | makes sense, don't try to make them do 100% of everything.
         | 
         | LLMs as a "English to E2E product I'm charging for"? Lets maybe
         | make sure the thing works well as a tool before letting it be
         | responsible for stuff.
        
       | huksley wrote:
        | Vibe coding is making LEGO furniture; getting it to run in the
        | cloud is assembling an IKEA table for a busy restaurant.
        
       | beacon294 wrote:
       | What is this "clerk" library he used at this timestamp to tell
       | him what to do? https://youtu.be/LCEmiRjPEtQ?si=XaC-
       | oOMUxXp0DRU0&t=1991
       | 
       | Gemini found it via screenshot or context: https://clerk.com/
       | 
       | This is what he used for login on MenuGen:
       | https://karpathy.bearblog.dev/vibe-coding-menugen/
        
         | xnx wrote:
         | That blog post is a great illustration that most of the
         | complexity/difficulty of a web app is in the hosting and not in
         | the useful code.
        
       | matiasmolinas wrote:
       | https://github.com/EvolvingAgentsLabs/llmunix
       | 
        | An experiment to explore Karpathy's ideas.
        
         | bawana wrote:
         | how do i install this thing?
        
           | maleldil wrote:
           | As far as I understand, you don't. You open Claude Code
           | inside the repo and prompt `boot llmunix` inside Claude Code.
           | The CLAUDE.md file tells Claude how to respond to that.
        
             | bawana wrote:
             | Thank you for the hint. I guess I need a claude API token.
             | From the images it seems he is opening it from his default
              | directory. I see the 'base env' so it is unclear if any
             | other packages were installed beyond the default linux. I
             | see he simply typed 'boot llmunix' so he must have
             | symlinked 'boot' to his PATH.
        
       | Aeroi wrote:
        | the fanboying over this dude's opinion is insane.
        
         | mrmansano wrote:
          | It's a pastor preaching to the already converted, nothing new
          | in the area. The only thing new is that they are selling the
          | kool-aid this time.
        
           | Aeroi wrote:
            | It's been a multi-day conversation where multiple people
            | are trying to obtain the transcripts, publish the text as
            | gospel, and now the video. Like, yes, thank you, but holy
            | shit.
        
         | mupuff1234 wrote:
         | Yeah, not sure I ever saw anything similar on HN before, feels
         | very odd.
         | 
         | I mean the talk is fine and all but that's about it?
        
         | dang wrote:
         | Maybe so, but please don't post unsubstantive comments to
         | Hacker News.
         | 
         | (Thoughtful criticism that we can learn from is welcome, of
         | course. This is in the site guidelines:
         | https://news.ycombinator.com/newsguidelines.html.)
        
       | eitally wrote:
       | It's going to be very interesting to see how things evolve in
       | enterprise IT, especially but not exclusively in regulated
       | industries. As more SaaS services are at least partly vibe coded,
       | how are CIOs going to understand and mitigate risk? As more
       | internal developers are using LLM-powered coding interfaces and
       | become less clear on exactly how their resulting code works, how
       | will that codebase be maintained and incrementally updated with
       | new features, especially in solo dev teams (which is common)?
       | 
       | I easily see a huge future for agentic assistance in the
       | enterprise, but I struggle mightily to see how many IT leaders
       | would accept the output code of something like a menugen app as
       | production-viable.
       | 
       | Additionally, if you're licensing code from external vendors
       | who've built their own products at least partly through LLM-
       | driven superpowers, how do you have faith that they know how
       | things work and won't inadvertently break something they don't
       | know how to fix? This goes for niche tools (like Clerk, or
       | Polar.sh or similar) as much as for big heavy things (like a CRM
       | or ERP).
       | 
       | I was on the CEO track about ten years ago and left it for a new
       | career in big tech, and I don't envy the folks currently trying
       | to figure out the future of safe, secure IT in the enterprise.
        
         | gosub100 wrote:
         | > how many IT leaders would accept the output code of something
         | like a menugen app as production-viable.
         | 
         | probably all of the ones at microsoft
        
         | dapperdrake wrote:
         | Just like when all regulated industries started only using
         | decision trees and ordinary least-squares regression instead of
         | any other models.
        
         | r2b2 wrote:
         | I've found that as LLMs improve, some of their bugs become
         | increasingly slippery - I think of it as the uncanny valley of
         | code.
         | 
         | Put another way, when I cause bugs, they are often glaring
         | (more typos, fewer logic mistakes). Plus, as the author it's
         | often straightforward to debug since you already have a deep
         | sense for how the code works - you lived through it.
         | 
         | So far, using LLMs has downgraded my productivity. The bugs
         | LLMs introduce are often subtle logical errors, yet "working"
         | code. These errors are especially hard to debug when you didn't
         | write the code yourself -- now you have to learn the code as if
         | you wrote it anyway.
         | 
         | I also find it more stressful deploying LLM code. I _know in my
         | bones_ how carefully I write code, due to a decade of roughly
         | "one non critical bug per 10k lines" that keeps me asleep at
         | night. The quality of LLM code can be quite chaotic.
         | 
         | That said, I'm not holding my breath. I expect this to all flip
         | someday, with an LLM becoming a better and more stable coder
         | than I am, so I guess I will keep working with them to make
         | sure I'm proficient when that day comes.
        
           | DanHulton wrote:
           | I'm curious where that expectation of the flip comes from?
           | Your experience (and mine, frankly) would seem to indicate
           | the opposite, so from whence comes this certainty that one
           | day it'll change entirely and become reliable instead?
           | 
           | I ask (and I'll keep asking) because it really seems like the
           | prevailing narrative is that these tools have improved
           | substantially in a short period of time, and that is
           | seemingly enough justification to claim that they will
           | continue to improve until perfection because...? _waves hands
           | vaguely_
           | 
           | Nobody ever seems to have any good justification for how
           | we're going to overcome the fundamental issues with this
           | tech, just a belief that comes from SOMEWHERE that it'll
           | happen anyway, and I'm very curious to drill down into that
           | belief and see if it comes from somewhere concrete or it's
           | just something that gets said enough that it "becomes true",
           | regardless of reality.
        
           | thegeomaster wrote:
           | I have been using LLMs for coding a lot during the past year,
           | and I've been writing down my observations by task. I have a
            | _lot_ of tasks where my first entry is me being thoroughly
            | impressed by how e.g. Claude helped me with a task, and then
            | the second entry, a few days later, is me thoroughly
            | irritated by chasing down subtle and just _strange_ bugs it introduced
           | along the way. As a rule, these are incredibly hard to find
           | and tedious to debug, because they lurk in the weirdest
           | places, and the root cause is usually some weird
           | confabulation that a human brain would never concoct.
        
         | charlie0 wrote:
         | It will succeed due to the same reason other sloppy strategies
         | succeed, it has large short term gains and moves risk into the
         | nebulous future. Management LOVES these types of things.
        
       | poorcedural wrote:
       | Software 3.0 is where Engineers only create the kernel or seed of
       | an idea. Then all users are developers creating their own branch
       | using the feedback loop of their own behavior.
        
       | greybox wrote:
       | He's talking about "LLM Utility companies going down and the
       | world becoming dumber" as a sign of humanity's progress.
       | 
        | This, if anything, should be a huge red flag.
        
         | bryanh wrote:
         | Replace with "Water Utility going down and the world becoming
         | less sanitary", etc. Still a red flag?
        
           | greybox wrote:
            | You're making a leap of logic.
           | 
           | Before water sanitization technology we had no way of
           | sanitizing water on a large scale.
           | 
           | Before LLMs, we could still write software. Arguably we were
           | collectively better at it.
        
             | TeMPOraL wrote:
             | LLMs are general-purpose tools used for great many tasks,
             | most of them not related to writing code.
        
         | iLoveOncall wrote:
         | He lives in a GenAI bubble where everyone is self-
         | congratulating about the usage of LLMs.
         | 
         | The reality is that there's not a single critical component
         | anywhere that is built on LLMs. There's absolutely no reliance
         | on models, and ChatGPT being down has absolutely no impact on
          | anything besides teenagers not being able to cheat on their
          | homework and LLM wrappers not being able to wrap.
        
           | ukprogrammer wrote:
           | Even an LLM could tell you that that's an unknowable thing,
           | perhaps you should rely on them more.
        
             | iLoveOncall wrote:
             | Has a critical service that you used meaningfully changed
             | to seemingly integrate non-deterministic "intelligence" in
             | the past 3 years in one of its critical paths? I'd bet good
             | money that the answer to literally everyone is no.
             | 
             | My company uses GenAI a lot in a lot of projects. Would it
             | have some impact if all models suddenly stopped working?
             | Sure. But the oncalls wouldn't even get paged.
        
               | jeffnappi wrote:
               | Tesla FSD, Waymo are good examples.
        
           | bwfan123 wrote:
           | > The reality is that there's not a single critical component
           | anywhere that is built on LLMs.
           | 
            | Remember that there are billion-dollar use cases where being
            | correct is not important. For example, shopping
            | recommendations, advertising, search results, image
           | captioning, etc. All of these usecases have humans consuming
           | the output, and LLMs can play a useful role as productivity
           | boosters.
        
             | iLoveOncall wrote:
             | And none of those are crucial.
             | 
             | His point is that the world is RELIANT on GenAI. This isn't
             | true.
        
           | nlawalker wrote:
           | Adults everywhere are using it to "cheat" at work, except
           | there it's not cheating, it's celebrated and welcomed as a
           | performance enhancement because results are the only thing
           | that matter, and over time that will result in new
           | expectations for productivity.
           | 
           | It's going to take a while for those new expectations to
           | develop, and they won't develop evenly, just like how even
           | today there's plenty of low-hanging fruit in the form of
           | roles or businesses that aren't using what anyone here would
           | identify as simple opportunities for automation, and the main
           | benefit that accrues to the one guy in the office who knows
           | how to cheat with Excel and VBA is that he gets to slack off
           | most of the time. But there certainly are places where the
           | people in charge expect more, and are quick to perceive when
           | and how much that bar can be raised. They don't care if
           | you're cheating, but you'll need to keep up with the people
           | who are.
        
       | tinyhouse wrote:
       | After Cursor is sold for $3B, they should transfer Karpathy 20%.
       | (it also went viral before thanks to him tweeting about it)
       | 
        | Great talk, as always. I actually disagree with him on a few
        | things. For example, he said "why would you go to ChatGPT and
        | copy/paste, it makes much more sense to use a GUI that is
        | integrated with your code, such as Cursor".
       | 
       | Cursor and the like take a lot of the control from the user. If
       | you optimize for speed then use Cursor. But if you optimize for
       | balance of speed, control, and correctness, then using Cursor
       | might not be the best solution, esp if you're not an expert of
       | how to use it.
       | 
        | It seems that Karpathy is mainly writing small apps these days;
        | he's not working on large production systems where you cannot
        | vibe code your way through (not yet, at least).
        
       | researchai wrote:
       | I can't believe I googled most of the dishes on the menu every
       | time I went to the Thai restaurant. I've just realised how
       | painful that was when I saw MenuGen!
        
       | ukprogrammer wrote:
        | Why do non-users of LLMs like to despise/belittle them so much?
       | 
       | Just don't use them, and, outcompete those who do. Or, use them
       | and outcompete those who don't.
       | 
       | Belittling/lamenting on any thread about them is not helpful and
       | akin to spam.
        
         | djeastm wrote:
         | Some people are annoyed at the hype, some are making good faith
         | arguments about the pros/cons, and some people are just cranky.
         | AI is a popular subject and we've all got our hot takes.
        
       | blixt wrote:
       | If we extrapolate these points about building tools for AI and
       | letting the AI turn prompts into code I can't help but reach the
       | conclusion that future programming languages and their runtimes
       | will be heavily influenced by the strengths and weaknesses of
       | LLMs.
       | 
       | What would the code of an application look like if it was
       | optimized to be efficiently used by LLMs and not humans?
       | 
        | * While LLMs do heavily tend towards expecting the same
        | inputs/outputs as humans because of the training data, I don't
        | think this would inhibit the co-evolution of novel
        | representations of software.
        
         | thierrydamiba wrote:
         | Is a world driven by the strengths and weaknesses of
         | programming languages better than the one driven by the
         | strengths and weaknesses of LLMs?
        
           | ivape wrote:
           | Better to think of it as a world driven by the strengths and
           | weaknesses of people. Is the world better if more people can
           | express themselves via software? Yes.
           | 
           | I don't believe in coincidences. I don't think the universe
           | provided AI by accident. I believe it showed up just at the
           | moment where the universe wants to make it clear - _your
           | little society of work and status and money can go straight
           | to living hell_. And that's where it's going, the developer
           | was never supposed to be a rockstar, they were always meant
           | to be creatives who do it because they like it. Fuck this job
           | bullshit, those days are over. You will program the same way
           | you play video games, it's never to be work again (it's
           | simply too creative).
           | 
           | Will the universe make it so a bunch of 12 year olds dictate
           | software in natural language in a Roblox like environment
            | that rivals the horseshit society sold for billions just a
           | decade ago? Yes, and thank god. It's been a wild ride, thank
           | you god for ending it (like he did with nuclear bombs after
           | ww2, our little universe of war _shrunk_ due to that).
           | 
           | Anyways, always pay attention to the little details, it's
           | never a coincidence. The universe doesn't just sit there and
           | watch our fiasco believe it or not, it gets involved.
        
         | mythrwy wrote:
          | It does seem a bit silly, long term, to have LLMs writing
          | something like Python, a language that was developed to be
          | human-friendly.
         | 
         | If AI is going to write all the code going forward, we can
         | probably dispense with the user friendly part and just make
         | everything efficient as possible for machines.
        
           | doug_durham wrote:
           | I don't agree. Important code will need to be audited. I
           | think the language of the future will be easy to read by
           | human reviewers but deterministic. It won't be a human
            | language. Instead it will be a computer language with horrible
           | ergonomics. I think Python or straight up Java would be a
           | good start. Things like templates wouldn't be necessary since
           | you could express that deterministically in a higher level
            | syntax (e.g. a list of elements that can accept any type). It
           | would be an interesting exercise.
        
           | mostlysimilar wrote:
           | If humans don't understand it to write the data the LLM is
           | trained on, how will the LLM be able to learn it?
        
         | s_ting765 wrote:
         | Given the plethora of programming languages that exist today,
         | I'm not worried at all about AI taking over SWE jobs.
        
       | tudorizer wrote:
       | 95% terrible expression of the landscape, 5% neatly dumbed down
       | analogies.
       | 
       | English is a terrible language for deterministic outcomes in
       | complex/complicated systems. Vibe coders won't understand this
       | until they are 2 years into building the thing.
       | 
        | LLMs have their merits and he sometimes alludes to them, although
       | it almost feels accidental.
       | 
       | Also, you don't spend years studying computer science to learn
       | the language/syntax, but rather the concepts and systems, which
       | don't magically disappear with vibe coding.
       | 
       | This whole direction is a cheeky Trojan horse. A dramatic
       | problem, hidden in a flashy solution, to which a fix will be
       | upsold 3 years from now.
       | 
       | I'm excited to come back to this comment in 3 years.
        
         | diggan wrote:
         | > English is a terrible language for deterministic outcomes in
         | complex/complicated systems
         | 
         | I think that you seem to be under the impression that Karpathy
         | somehow alluded to or hinted at that in his talk, which
         | indicates you haven't actually watched the talk, which makes
         | your first point kind of weird.
         | 
         | I feel like one of the stronger points he made, was that you
         | cannot treat the LLMs as something they're explicitly not, so
         | why would anyone expect deterministic outcomes from them?
         | 
         | He's making the case for coding with LLMs, not letting the LLMs
         | go by themselves writing code ("vibe coding"), and
         | understanding how they work before attempting to do so.
        
           | tudorizer wrote:
           | I watched the entire talk, quite carefully. He explicitly
           | states how excited he was about his tweet mentioning English.
           | 
            | The disclaimer you mention was indeed there, although
            | it's "in one ear, out the other" with most of his audience.
           | 
           | If I give you a glazed donut with a brief asterisk about how
           | sugar can cause diabetes will it stop you from eating the
           | donut?
           | 
           | You also expect deterministic outcomes when making analogies
           | with power plants and fabs.
        
             | fifilura wrote:
             | Either way, I am not sure it is a requirement on HN to
             | read/view the source.
             | 
             | Particularly not a 40min video.
             | 
             | Maybe it is tongue-in-cheek, maybe I am serious. I am not
              | sure myself. But sometimes the interesting discussions
              | come from what is at the top of the poster's mind when
              | viewing the title. Is that bad?
        
               | diggan wrote:
               | > Is that bad?
               | 
               | It doesn't have to be. But it does get somewhat boring
               | and trite after a while when you start noticing that
               | certain subjects on HN tend to attract general and/or
               | samey comments about $thing, rather than the submission
               | topic within $thing, and I do think that is against the
               | guidelines.
               | 
               | > Please don't post shallow dismissals [...] Avoid
               | generic tangents. Omit internet tropes. [...]
               | 
               | The specific part of:
               | 
               | > English is a terrible language for deterministic
               | outcomes
               | 
                | Strikes me both as a generic tangent about LLMs, and
               | the comment as a whole feels like a shallow dismissal of
               | the entire talk, as Karpathy never claims English is a
               | good language for deterministic outcomes, nor have I
               | heard anyone else make that claim.
        
               | tudorizer wrote:
                | Might sound like a generic tangent, but it's the
                | conclusion people will take away from the talk.
        
               | diggan wrote:
               | But is it _curious_? Is it thoughtful and substantive?
               | Maybe it could have been thoughtful, if it felt like it
               | was in response to what was mentioned in the submission.
        
               | karaterobot wrote:
               | It's odd! The guidelines don't say anything about having
                | to read or watch what the post links to; all they say
               | is it's inappropriate to accuse someone you're responding
               | to of not having done so.
               | 
               | There is a community expectation that people will know
               | what they're talking about before posting, and in most
               | cases that means having read the article. At the same
               | time, I suspect that in many cases a lot of people
               | commenting have not actually read the thing they're
               | nominally commenting on, and they get away with it
               | because the people upvoting them haven't either.
               | 
               | However, I think it's a good idea to do so, at least to
               | make a top-level comment on an article. If you're just
               | responding to someone else's comment, I don't think it's
               | as necessary. But to stand up and make a statement about
               | something you know nothing about seems buffoonish and
               | would not, in general, elevate the level of discussion.
        
               | tudorizer wrote:
               | I accept any equivalents of reading comprehension tests
                | to prove that I watched the video, as I have many of
               | Andrej's in the past. He's generally a good communicator,
               | defo easy to follow.
        
             | diggan wrote:
             | I think this is the moment you're referring to?
             | https://youtu.be/LCEmiRjPEtQ?si=QWkimLapX6oIqAjI&t=236
             | 
             | > maybe you've seen a lot of GitHub code is not just like
             | code anymore there's a bunch of like English interspersed
             | with code and so I think kind of there's a growing category
             | of new kind of code so not only is it a new programming
             | paradigm it's also remarkable to me that it's in our native
             | language of English and so when this blew my mind a few uh
             | I guess years ago now I tweeted this and um I think it
             | captured the attention of a lot of people and this is my
             | currently pinned tweet uh is that remarkably we're now
             | programming computers in English now
             | 
             | I agree that it's remarkable that you can tell a computer
             | "What is the biggest city in Maresme?" and it tries to
             | answer that question. I don't think he's saying "English is
             | the best language to make complicated systems uncomplicated
             | with", or anything to that effect. Just like I still think
             | "Wow, this thing is fucking flying" every time I sit
              | onboard an airplane, LLMs are kind of incredible in some
             | ways, yet so "dumb" in some other ways. It sounds to me
             | like he's sharing a similar sentiment but about LLMs.
             | 
             | > although it's "in one ear, out the other" with most of
             | his audience.
             | 
             | Did you talk with them? Otherwise this is just creating an
             | imaginary argument against some people you just assume they
             | didn't listen.
             | 
             | > If I give you a glazed donut with a brief asterisk about
             | how sugar can cause diabetes will it stop you from eating
             | the donut?
             | 
             | If I wanted to eat a donut at that point, I guess I'd eat
             | it anyways? But my aversion to risk (or rather the lack of
             | it) tend to be non-typical.
             | 
             | What does my answer mean in the context of LLMs and non-
             | determinism?
             | 
             | > You also expect deterministic outcomes when making
             | analogies with power plants and fabs.
             | 
             | Are you saying that the analogy should be deterministic or
              | that power plants and fabs are deterministic? Because I
              | don't understand if you mean the former, and the latter
              | really isn't deterministic by any definition of the word I
              | recognize.
        
               | tudorizer wrote:
               | > Did you talk with them? Otherwise this is just creating
               | an imaginary argument against some people you just assume
               | they didn't listen.
               | 
               | I have, unfortunately. Start-up founders, managers,
                | investors who mock the need for engineers because "AI
               | can fix it".
               | 
               | Don't get me wrong, there are plenty of "stochastic
               | parrot" engineers even without AI, but still, not enough
               | to make blanket statements.
        
               | diggan wrote:
                | That's a lot of people to talk to in the day or so
                | since the talk happened. Were they all there, and you
                | too, or did you all have a watch party or something?
               | 
               | Still, what's the outcome of our "glazed donut" argument,
               | you got me curious what that would lead to. Did I die of
               | diabetes?
        
               | jbeninger wrote:
               | I think the analogy is that vibe coding is bad for you
               | but feels good. Like a donut.
               | 
               | But I'd say the real situation is more akin to "if you
               | eat this donut quickly, you might get diabetes, but if
               | you eat it slowly, it's fine", which is a bad analogy,
               | but a bit more accurate.
        
               | tudorizer wrote:
               | > That's a lot of people to talk to in a day more or
               | less, since the talk happened. Were they all there and
               | you too, or you all had a watch party or something?
               | 
               | hehe, I wish.
               | 
               | The topics in the talk are not new. They have been
                | explored and pondered for quite a while now.
               | 
               | As for the outcome of the donut experiment, I don't know.
               | You tell me. Apply it repeatedly at a big scale and see
               | if you should alter the initial offer for best outcomes
               | (as relative as "best" might be).
        
               | diggan wrote:
               | > The topics in the talk are not new.
               | 
               | Sure, but your initial dismissal ("95% X, 5% Y") is
               | literally about this talk no? And when you say 'it's "in
               | one ear, out the other" with most of his audience' that's
               | based on some previous experience, rather than the talk
               | itself? I guess I got confused what applied to what
               | event.
               | 
               | > As for the outcome of the donut experiment, I don't
               | know. You tell me. Apply it repeatedly at a big scale and
               | see if you should alter the initial offer for best
               | outcomes (as relative as "best" might be).
               | 
               | Maybe I'm extra slow today, how does this tie into our
               | conversation so far? Does it have anything to do with
               | determinism or what was the idea behind bringing it up?
               | I'm afraid you're gonna have to spell it out for me,
               | sorry about that :)
        
             | pama wrote:
             | Your experience with fabs must be somewhat limited if you
             | think that the state of the art in fabs produces
             | deterministic results. Please lookup (or ask friends) for
             | the typical yields and error mitigation features of modern
             | chips and try to visualize if you think it is possible to
             | have determinism when the density of circuits starts to
              | approach levels that cannot be inspected with regular
             | optical microscopes anymore. Modern chip fabrication is
             | closer to LLM code in even more ways than what is presented
             | in the video.
        
               | whilenot-dev wrote:
               | > Modern chip fabrication is closer to LLM code
               | 
               | As is, I don't quite understand what you're getting at
               | here. Please just think that through and tell us what
               | happens to the yield ratio when the software running on
               | all those photolithography machines wouldn't be
               | deterministic.
        
               | kadushka wrote:
               | An output of a fab, just like an output of an LLM, is
               | non-deterministic, but is good enough, or is being
               | optimized to be good enough.
               | 
               | Non-determinism is not the problem, it's the quality of
               | the software that matters. You can repeatedly ask me to
               | solve a particular leetcode puzzle, and every time I
               | might output a slightly different version. That's fine as
               | long as the code solves the problem.
               | 
               | The software running on the machines (or anywhere) just
               | needs to be better (choose your metric here) than the
               | software written by humans. Software written by GPT-4 is
               | better than software written by GPT-3.5, and the software
               | written by o3 is better than software written by GPT-4.
               | That's just the improvement from the last 3 years, and
               | there's a massive, trillion-dollar effort worldwide to
               | continue the progress.
        
               | whilenot-dev wrote:
               | Hardware always involves some level of non-determinism,
               | because the physical world is messier than the virtual
               | software world. Every hardware engineer accepts that and
               | learns how to design solutions despite those constraints.
               | But you're right, non-determinism is not the current
               | problem in _some_ fabs, because the whole process has
                | been modeled with it in mind, and it's the yield ratio
                | that needs to be deterministic enough to offer a service.
                | Remember the struggles in Intel's fabs? Revenue at fabs
                | reflects that.
               | 
               | The software quality at companies like ASML seems to be
               | in a bad shape already, and I remember ex-employees
               | stating that there are some team leads higher up who can
               | at least reason about existing software procedures, their
               | implementation, side effects and their outcomes. Do you
               | think this software is as thoroughly documented as some
               | open source project? The purchase costs for those
               | machines are in the mid-3-digit million range (operating
               | costs excluded) and are expected to run 24/7 to be
               | somewhat worthwhile. Operators can handle hardware issues
               | on the spot and work around them, but what do you think
               | happens with downtime due to non-deterministic software
               | issues?
        
               | tudorizer wrote:
               | Fair. No process is 100% efficient and the depths of many
               | topics become ambiguous to the point where margins of
               | error need to be introduced.
               | 
               | Chip fabs are defo far into said depths.
               | 
               | Must we apply this at more shallow levels too?
        
         | m3kw9 wrote:
          | Like business-logic requirements, they need to be defined in a
          | fine-grained way.
        
         | oc1 wrote:
          | AI is all about the context window. If you figure out the
          | context problem, you will see that all these "AI is bullshit,
          | it doesn't work and can't produce working code" complaints go
          | away. Same for everything else.
        
           | tudorizer wrote:
            | Working code or not is irrelevant. Heck, even human-in-the-
            | loop (Tony-in-the-Iron-Man) is not really the point. If we're
            | going into "it's all about" territory then it's all about:
            | 
            | - training data
            | - approximation of the desired outcome
           | 
            | Neither supports a good direction for the complexity of some
            | of the systems around us, most of which require a dedicated
           | language. Imagine doing calculus or quantum physics in
           | English. Novels of words would barely suffice.
           | 
           | So a context window as big as the training data itself?
           | 
           | What if the training data is faulty?
           | 
           | I'm confident you understand that working code or not doesn't
           | matter in this analogy. Neither does LLMs reaching out for
           | the right tool.
           | 
            | LLMs have their merits. Replacing concrete systems that
            | require a formal language and grammar is not one of them.
           | 
           | `1 + 1 = 2` because that's how maths works, not because of
           | deja vu.
        
             | gardenhedge wrote:
             | Tony is iron man, not in him
        
               | tudorizer wrote:
                | Sure, I wasn't sure what to call the robot layer. Is it
                | the "Iron Man Suit"?
        
           | cobertos wrote:
           | Untrue. I find problems with niche knowledge, heavy math,
           | and/or lack of good online resources to be troublesome for
           | AI. Examples so far I've found of consistent struggle points
           | are shaders, parsers, and streams (in Nodejs at least)
           | 
           | Context window will solve a class of problems, but will not
           | solve all problems with AI.
        
         | belter wrote:
         | You just described Software 4.0...
        
           | tudorizer wrote:
           | Can we have it now and skip 3.0?
        
         | strangescript wrote:
          | Who said I wanted my outcomes to be deterministic? Why is it
          | that the only way we accept programming is for completely
          | deterministic outcomes, when in reality that is an
          | implementation detail?
         | 
         | I am a real user and I am on a general purpose e-commerce site
         | and my ask is "I want a TV that is not that expensive", then by
         | definition the user request is barely deterministic. User
         | requests are normally like this for any application. High level
         | and vague at best. Then developers spend all their time on edge
         | cases, user QA, in the weeds junk that the User does not care
          | about at all. People don't want to click filters and fill out
         | forms for your app. They want it to be easy.
        
           | tudorizer wrote:
           | Agreed. This e-commerce example is quite a good highlight for
           | LLMs.
           | 
            | The same can't be applied when your supplier needs 300
            | gaskets, 68 x 34 mm, to the BS10 standard, to give a random,
            | more precise example.
        
         | rudedogg wrote:
         | > English is a terrible language for deterministic outcomes in
         | complex/complicated systems.
         | 
         | Someone here shared this ancient article by Dijkstra about this
         | exact thing a few weeks ago:
         | https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD667...
        
           | tudorizer wrote:
           | TIL. Thanks for sharing
        
         | brainless wrote:
         | I am not sure I got your point about English. I thought
         | Karpathy was talking about English being the language of
         | prompts, not output. Outputs can be English but if the goal is
         | to compute using the output, then we need structured output
         | (JSON, snippets of code, etc.), not English.
        
           | tudorizer wrote:
           | Entertain me in an exercise:
           | 
            | First, instruct a friend/colleague how to multiply two
            | 2-digit numbers in plain English.
            | 
            | Second (ideally with a different friend, so as not to
            | contaminate the test), explain the same thing using only
            | maths formulas.
           | 
           | Where does the prompting process start and where does it end?
           | Is it a one-off? Is the prompt clear enough? Do all the
           | parties involved communicate within same domain objects?
           | 
           | Hopefully my example is not too contrived.
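            | 
            | To make the contrast concrete, the "formula" version of the
            | exercise compresses to something like this (a sketch I'm
            | adding purely for illustration, with each number split into
            | tens and units):
            | 
            |     # Multiply two 2-digit numbers via partial products:
            |     # (10a + b)(10c + d) = 100ac + 10(ad + bc) + bd
            |     def multiply_2digit(x, y):
            |         a, b = divmod(x, 10)
            |         c, d = divmod(y, 10)
            |         return 100 * a * c + 10 * (a * d + b * c) + b * d
            | 
            |     assert multiply_2digit(47, 23) == 47 * 23  # 1081
            | 
            | Try dictating those few lines unambiguously in plain English
            | and the point makes itself.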
        
             | barumrho wrote:
             | I agree with your point about English, but LLMs are not
             | limited to English. You can show them formulas, images,
             | code, etc.
        
         | poorcedural wrote:
         | Time is a funny calculator, measuring how an individual is
         | behind. And in the funny circumstance that an individual is
         | human, they look back on this comment in 3 years and wonder why
         | humans only see themselves.
        
         | qjack wrote:
         | While I agree with you broadly, remember that those that employ
         | you don't have those skills either. They accept that they are
         | ceding control of the details and trust us to make those
         | decisions or ask clarifying questions (LLMs are getting better
         | at those things too). Vibe coders are clients seeking an
         | alternative, not developers.
        
           | tudorizer wrote:
           | > Vibe coders are clients seeking an alternative, not
           | developers.
           | 
           | Agreed. That's genuinely a good framing for clients.
        
           | unshavedyak wrote:
           | Maybe i'm not "vibing" enough, but i've actually been testing
           | this recently. So far i think the thing "vibing" helps most
           | with for me personally is just making decisions which i'm
           | often too tired to do after work.
           | 
           | I've been coming to the realization that working with LLMs
            | offers a different set of considerations than working on your
           | own. Notably i find that i often obsess about design, code
           | location, etc because if i get it wrong then my precious
           | after-work time and energy are wasted on refactoring. The
           | larger the code base, the more crippling this becomes for me.
           | 
           | However refactoring is almost not an issue with LLMs. They do
           | it very quickly and aggressively. So the areas i'm not vibing
           | on is just reviewing, and ensuring it isn't committing any
           | insane sins. .. because it definitely will. But the structure
           | i'm accepting is far from what i'd make myself. We'll see how
           | this pans out long term for me, but it's a strategy that i'm
           | exploring.
           | 
           | On the downside, my biggest difficulty with LLMs is getting
           | them to just.. not. To produce less. Choosing too large of
           | tasks is very easy and the code can snowball before you have
            | a chance to pump the brakes and course correct.
           | 
           | Still, it's been a positive experience so far. I still
           | consider it vibing though because i'm accepting far less
           | quality work than what i'd normally produce. In areas where
           | it matters though, i enforce correctness, and have to review
           | everything as a result.
        
         | serjester wrote:
         | I think you're straw manning his argument.
         | 
         | He explicitly says that both LLMs and traditional software have
         | very important roles to play.
         | 
         | LLMs though are incredibly useful when encoding the behavior of
         | the system deterministically is impossible. Previously this
         | fell under the umbrella of problems solved with ML. This would
         | take a giant time investment and a highly competent team to
         | pull off.
         | 
         | Now anyone can solve many of these same problems with a single
          | API call. It's easy to wave this off, but this is a total paradigm
         | shift.
        
       | kypro wrote:
       | I know we've had thought leaders in tech before, but am I the
        | only one who is getting a bit fed up with practically anything a
       | handful of people in the AI space say being circulated everywhere
       | in tech spaces at the moment?
        
         | danny_codes wrote:
         | No it's incredibly annoying I agree.
         | 
         | The hype hysteria is ridiculous.
        
         | dang wrote:
         | If there are lesser-known voices who are as interesting as
         | karpathy or simonw (to mention one other example), I'd love to
         | know who they are so we can get them into circulation on HN.
        
       | jes5199 wrote:
       | okay I'm practicing my new spiel:
       | 
       | this focus on coding is the wrong level of abstraction
       | 
       | coding is no longer the problem. the problem is getting the right
       | context to the coding agent. this is much, much harder
       | 
       | "vibe coding" is the new "horseless carriage"
       | 
       | the job of the human engineer is "context wrangling"
        
         | diggan wrote:
         | > coding is no longer the problem.
         | 
         | "Coding" - The art of literally using your fingers to type
         | weird characters into a computer, was never a problem
         | developers had.
         | 
         | The problem has always been understanding and communication,
         | and neither of those have been solved at this moment. If
         | anything, they have gotten even more important, as usually
         | humans can infer things or pick up stuff by experience, but
         | LLMs cannot, and you have to be very precise and exact about
         | what you're telling them.
         | 
         | And so the problem remains the same. "How do I communicate what
         | I want to this person, while keeping the context as small as
         | possible as to not overflow, yet extensive enough to cover
         | everything?" except you're sending it to endpoint A instead of
         | endpoint B.
        
           | ofjcihen wrote:
           | I'd take it a step further honestly. You need to be precise
           | and exact but you also have to have enough domain knowledge
           | to know when the LLM is making a huge mistake.
        
             | diggan wrote:
             | > you also have to have enough domain knowledge
             | 
             | I'm a bit 50/50 on this. Generally I agree, how are you
             | supposed to review it otherwise? Blindly accepting whatever
             | the LLM tells you or gives you is bound to create trouble
             | in the future, you still need to understand and think about
             | what the thing you're building is, and how to
             | design/architect it.
             | 
             | I love making games, but I'm also terrible at math.
             | Sometimes, I end up out of my depth, and sometimes it could
             | take me maybe a couple of days to solve something that
             | probably would be trivial for a lot of people. I try my
             | best to understand the fundamentals and the theory behind
             | it, but also not get lost in rabbit holes, but it's still
             | hard, for whatever reason.
             | 
             | So I end up using LLMs sometimes to write small utility
             | functions used in my games for specific things. It takes a
             | couple of minutes. I know exactly what I want to pass into
             | it, and what I want to get back, but I don't necessarily
             | understand 100% of the math behind it. And I think I'm
             | mostly OK with this, as long as I can verify that the
             | expected inputs get the expected outputs, which I usually
             | do with unit or E2E tests.
             | 
             | Would I blindly accept information about nuclear reactors,
             | another topic I don't understand much about? No, I'd still
             | take everything a LLM outputs with a "grain of probability"
             | because that's how they work. Would I blindly accept it if
             | I can guarantee that for my particular use case, it gives
             | me what I expect from it? Begrudgingly, yeah, because I
             | just wanna create games and I'm terrible at math.
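              | 
              | As a made-up example of the kind of thing I mean (not from
              | a real project, just a sketch): a tiny game-math helper
              | plus the check that lets me trust it without fully
              | grokking the derivation.
              | 
              |     import math
              | 
              |     # Normalize a 2D vector; return the zero vector
              |     # unchanged instead of dividing by zero.
              |     def normalize(x, y):
              |         length = math.hypot(x, y)
              |         if length == 0:
              |             return (0.0, 0.0)
              |         return (x / length, y / length)
              | 
              |     # The part I actually verify: expected in, expected out.
              |     nx, ny = normalize(3, 4)
              |     assert abs(nx - 0.6) < 1e-9 and abs(ny - 0.8) < 1e-9
              |     assert normalize(0, 0) == (0.0, 0.0)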
        
               | ofjcihen wrote:
               | Oh yeah definitely. The context matters.
               | 
               | For making CRUD apps or anything that doesn't involve
               | security or stores sensitive information I 100 percent
               | agree it's fine.
               | 
               | The issue I see is that we get some people storing
               | extremely sensitive info in apps made with these and they
               | don't know enough to verify the security of it. They'll
               | ask the LLM "is it secure?" But it doesn't matter if they
               | don't know it's not BSing
        
       | ldenoue wrote:
       | Full playable transcript
       | https://www.appblit.com/scribe?v=LCEmiRjPEtQ
        
         | swyx wrote:
         | slides:
         | https://docs.google.com/presentation/d/1sZqMAoIJDxz79cbC5ap5...
        
       | alightsoul wrote:
        | It's interesting to see that people here and on Blind are more
        | wary(?) of AI than people in, say, Reddit or YouTube comments.
        
         | sponnath wrote:
         | Reddit and YouTube are such huge social media platforms that it
         | really depends on which bubble (read: subreddits/yt channels)
         | you're looking at. There's the "AGI is here" people over at
         | r/singularity and then the "AI is useless" people at
         | r/programming. I'm simplifying arguments from both sides here
         | but you get my point.
        
           | alightsoul wrote:
           | Even looking at r/programming I felt they were less wary of
           | AI, or even comparing the comments here vs those on YouTube
           | for this video
        
       | lubujackson wrote:
       | Generally, people behind big revolutionary tech are the worst
       | suited for understanding how it will do "in the wild". Forest for
       | the trees and all that.
       | 
       | Some good nuggets in this talk, specifically his concept that
       | Software 1.0, 2.0 and 3.0 will all persist and all have unique
       | use cases. I definitely agree with that. I disagree with his
       | belief that "anyone can vibe code" mindset - this works to a
       | certain level of fidelity ("make an asteroids clone") but what he
       | overlooks is his ability, honed over many years, to precisely
       | document requirements that will translate directly to code that
       | works in an expected way. If you can't write up a Jira epic that
       | covers all bases of a project, you probably can't vibe code
       | something beyond a toy project (or an obvious clone). LLM code
       | falls apart under its own weight without a solid structure, and I
       | don't think that will ever fundamentally change.
       | 
       | Where we are going next, and a lot of effort is being put behind,
       | is figuring out exactly how to "lengthen the leash" of AI through
       | smart framing, careful context manipulation and structured
       | requests. We obviously can have anyone vibe code a lot further if
       | we abstract different elements into known areas and simply allow
       | LLMs to stitch things together. This would allow much larger
       | projects with a much higher success rate. In other words, I
       | expect an AI Zapier/Yahoo Pipes evolution.
       | 
        | Lastly, I think his concept of only having AI push "under 1000
       | line PRs" that he carefully reviews is more short-sighted. We are
       | very, very early in learning how to control these big stupid
       | brains. Incrementally, we will define sub-tasks that the AI can
       | take over completely without anyone ever having to look at the
       | code, because the output will always be within an accepted and
       | tested range. The revolution will be at the middleware level.
        
         | AlexCoventry wrote:
         | I've seen evidence of "anyone can vibe code", but at this stage
         | the result tends to be a 5,000-line application intricately
         | entangled with 500,000 lines of irrelevant slop. Still, the
         | wonder is that the bear can dance at all. That's a new thing
         | under the sun.
        
           | nsagent wrote:
           | Having worked with game designers writing code for their
           | missions/levels in a scripting language, I'd say this has
           | been the case for quite a long while.
           | 
           | They start with the code from another level, then modify it
           | until it seems to do what they want. During the alpha testing
           | phase, we'd have a programmer read through the code and
           | remove all the useless cruft and fix any associated bugs.
           | 
           | In some sense that's what vibe coding with an AI is like if
           | you don't know how to code. You have the AI make some initial
           | set of code that you can't evaluate for correctness, then
           | slowly modify it until it seems to behave generally like you
           | want. You might even learn to recognize a few things in the
           | code over time, at which point you can directly change some
           | variables or structures in the code directly.
        
             | AlexCoventry wrote:
             | I'm not kidding about the orders of magnitude, though. It's
              | been literally roughly 100 lines per line required to
             | competently implement the app. It doesn't seem economically
             | feasible to me, at this stage. I would prefer to just
             | rewrite. (I know it's a common bias.)
        
         | jmsdnns wrote:
         | There is another angle to this too.
         | 
         | Prior to LLMs, it was amusing to consider how ML folks and
          | software folks would talk past each other. It was amusing
         | because both sides were great at what they do, neither side
         | understood the other side, and they had to work together
         | anyway.
         | 
         | After LLMs, we now have lots of ML folks talking about the
          | future of software, something previously established to be so
         | outside their expertise that communication with software
         | engineers was an amusing challenge.
         | 
         | So I must ask, are ML folks actually qualified to know the
          | future of software engineering? Shouldn't we be listening to
         | software engineers instead?
        
           | abeppu wrote:
           | This seems to be overstating the separation. For people doing
           | applied ML, there's often been a dual responsibility that
           | included a significant amount of software engineering. I
           | wouldn't necessarily listen to such declarations from an ML
           | researcher whose primary output is papers, but from ML
           | engineers who have built and shipped
           | products/services/libraries I think it's much more
           | reasonable.
        
           | tomrod wrote:
           | > So I must ask, are ML folks actually qualified to know the
           | future of software engineering?
           | 
           | Probably not CRUD apps typical to back office or website
           | software, but don't forget that ML folks come from the stock
           | of people that built Apollo, Mars Landers, etc. Scientific
           | computing shares some significant overlap with SWE, and ML is
           | a subset of that.
           | 
           | IMHO, the average SWE and ML person are different types when
            | it comes to how they cargo-cult develop, but the top 10% show
            | significant understanding and speed across domains.
        
         | superconduct123 wrote:
          | Where was he saying you could vibe code beyond a simple
         | app?
         | 
         | He even said it could be a gateway to actual programming
        
       | raffael_de wrote:
       | I'm a little surprised at how negative he is towards textual
       | interfaces and text for representing information.
        
       | j45 wrote:
        | It's interesting how researchers are ahead on some insights and
        | are introducing them, and it feels like some are new to them
        | even though they might already exist; either way, they're
        | helping present them to the world.
       | 
       | A positive video all around, have got to learn a lot from
       | Andrej's Youtube account.
       | 
        | LLMs are really strange; I don't know if I've seen a technology
        | where the technical class that applies it (or can verify
        | applicability) has been so separate from, or unengaged with, the
        | non-technical people looking to solve problems.
        
       | whilenot-dev wrote:
       | I watched Karpathy's _Intro to Large Language Models_ [0] not so
       | long ago and must say that I'm a bit confused by this
       | presentation, and it's a bit unclear to me what it adds.
       | 
        | 1.5 years ago he saw all the tool use in agent systems as the
       | future of LLMs, which seemed reasonable to me. There was (and
       | maybe still is) potential for a lot of business cases to be
       | explored, but every system is defined by its boundaries
        | nonetheless. We still don't know all the challenges we face at
        | those boundaries, whether these could be modelled into a virtual
        | space, handled by software, and therefore also potentially by AI
        | and businesses.
       | 
       | Now it all just seems to be analogies and what role LLMs could
       | play in our modern landscape. We should treat LLMs as
       | encapsulated systems of their own ...but sometimes an LLM becomes
       | the operating system, sometimes it's the CPU, sometimes it's the
       | mainframe from the 60s with time-sharing, a big fab complex, or
       | even outright electricity itself?
       | 
        | He's showing an iOS app, which seems to be, sorry for the
        | dismissive tone, an example of a better-looking counter. This
        | demo app was in a presentable state after a day, and
        | it took him a week to implement Google's OAuth2 stuff. Is that
       | somehow exciting? What was that?
       | 
       | The only way I can interpret this is that it just shows the big
       | divide we're currently in. LLMs are a final API product for
       | some, but an unoptimized generative software model with
       | sophisticated-but-opaque algorithms for others. Both are utterly
       | in need of real-world use cases - the product side for fresh
       | training data, and the business side for insights, integrations
       | and shareholder value.
       | 
       | Am I all of a sudden the one lacking imagination? Is he just
       | slurping the CEO Kool-Aid while still holding his investments in
       | OpenAI? Can we at least agree that we're still dealing with
       | software here?
       | 
       | [0]: https://www.youtube.com/watch?v=zjkBMFhNj_g
        
         | bwfan123 wrote:
         | > Am I all of a sudden the one lacking imagination?
         | 
         | No, the reality of what these tools can do is sinking in. The
         | rubber is meeting the road and I can hear some screeching.
         | 
         | The boosters are in the five stages of grief, coming to terms
         | with what was once AGI and is now a mere co-pilot, while the
         | haters are coming to terms with the fact that LLMs can
         | actually be useful in a variety of use cases.
        
           | acedTrex wrote:
           | I actually quite agree with this; there is some reckoning
           | happening on both sides. It's quite entertaining to watch,
           | and a bit painful as well, of course, as someone who is on
           | the "they are useless" side and is noticing some very clear
           | use cases where a value add is present.
        
             | natebc wrote:
             | I'm with you. I give several of 'em a shot a few times a
             | week (thanks Kagi for the fantastic menu of choices!). Over
             | the last quarter or so I've found that the bullshit:useful
             | ratio is creeping to the useful side. They still answer
             | like a high school junior writing a 5 paragraph essay but a
             | decade of sifting through blogspam has honed my own ability
             | to cut through that.
        
               | diggan wrote:
               | > but a decade of sifting through blogspam has honed my
               | own ability to cut through that.
               | 
               | Now, a different skill needs to be honed :) Add "Be
               | concise and succinct without removing any details" to
               | your system prompt and hopefully it will output its
               | text slightly better.
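               | 
               | For example, a minimal sketch with the OpenAI Python
               | client (model name and the user question are just
               | placeholders):
               | 
               |   from openai import OpenAI
               | 
               |   client = OpenAI()  # reads OPENAI_API_KEY from env
               | 
               |   style = ("Be concise and succinct without "
               |            "removing any details.")
               |   resp = client.chat.completions.create(
               |       model="gpt-4o-mini",
               |       messages=[
               |           # the system message carries the style rule
               |           {"role": "system", "content": style},
               |           {"role": "user",
               |            "content": "How does DNS resolution work?"},
               |       ],
               |   )
               |   print(resp.choices[0].message.content)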
        
             | Joel_Mckay wrote:
             | In general, the functional use cases traditionally covered
             | by basic heuristics are viable for a reasoning LLM. These
             | are useful for search, media processing, and language
             | translation.
             | 
             | LLM is not AI, and never was... and while the definition
             | has been twisted into marketing BS, that does not mean
             | either argument is 100% correct or in error.
             | 
             | LLM is now simply a cult, and a rather old one dating back
             | to the 1960s Lisp machines.
             | 
             | Have a great day =3
        
               | johnxie wrote:
               | LLMs aren't perfect, but calling them a "cult" misses the
               | point. They're not just fancy heuristics, they're
               | general-purpose function approximators that can reason,
               | plan, and adapt across a huge range of tasks with zero
               | task-specific code.
               | 
               | Sure, it's not AGI. But dismissing the progress as just
               | marketing ignores the fact that we're already seeing them
               | handle complex workflows, multi-step reasoning, and real-
               | time interaction better than any previous system.
               | 
               | This is more than just Lisp nostalgia. Something real is
               | happening.
        
               | Joel_Mckay wrote:
               | Sure, I have seen the detrimental impact on some teams,
               | and it does not play out as Marketers suggest.
               | 
               | The trick is in people seeing meaning in well-structured
               | nonsense, and not understanding that high-dimensional
               | vector spaces simply abstract associative false
               | equivalency with an inescapable base error rate.
               | 
               | I wager neuromorphic computing is likely more viable
               | than LLM cults. The LLM subject is incredibly boring
               | once you tear it apart, and less interesting than
               | watching Opuntia cactus grow. Have a wonderful day =3
        
           | anothermathbozo wrote:
           | > The reality of what these tools can do is sinking in
           | 
           | It feels premature to make determinations about how far this
           | emergent technology can be pushed.
        
             | Joel_Mckay wrote:
             | The cognitive dissonance is predictable.
             | 
             | Now hold my beer, as I cast a superfluous rank to this
             | trivial 2nd order Tensor, because it looks awesome wasting
             | enough energy to power 5000 homes. lol =3
        
           | pera wrote:
           | Exactly! What skeptics don't get is that AGI is already here
           | and we are now starting a new age of infinite prosperity,
           | it's just that exponential growth looks flat at first,
           | obviously...
           | 
           | Quantum computers and fusion energy are basically solved
           | problems now. Accelerate!
        
             | hn_throwaway_99 wrote:
             | This sounds like clear satire to me, but at this point I
             | really can't tell.
        
           | hn_throwaway_99 wrote:
           | > The boosters are in 5 stages of grief coming to terms with
           | what was once AGI and is now a mere co-pilot, while the
           | haters are coming to terms with the fact that LLMs can
           | actually be useful in a variety of usecases.
           | 
           | I couldn't agree with this more. I often get frustrated
           | because I feel like the loudest voices in the room are so
           | laughably extreme. On one side you have the "AGI cultists",
           | and on the other you have the "But the hallucinations!!!"
           | people. I've personally been pretty amazed by the state of AI
           | (nearly all of this stuff was the domain of Star Trek just a
           | few years ago), and I get tons of value out of many of these
           | tools, but at the same time I hit tons of limitations and I
           | worry about the long-term effect on society (basically, I
           | think this "ask AI first" approach, especially among young
           | people, will kinda turn us all into idiots, similar to the
           | way Google Maps made it hard for most of us to remember even
           | simple directions). I also can't help but roll my eyes when I
           | hear all the leaders of these AI companies going on about how
           | AI will cause a "white collar bloodbath" - there are some
           | nuggets of truth in that, but these folks are just using
           | scare tactics to hype their oversold products.
        
         | Workaccount2 wrote:
         | The fundamental mistake I see is people applying LLMs to the
         | current paradigm of software; enormous hulking codebases made
         | to have as many features as possible to appeal to as many users
         | as possible.
         | 
         | LLMs are excellent at helping non-programmers write narrow use
         | case, bespoke programs. LLMs don't need to be able to one-shot
         | excel.exe or Plantio.apk so that Christine can easily track
         | when she watered and fed her plants nutrients.
         | 
         | The change that LLMs will bring to computing is much deeper
         | than Garden Software trying to slot in some LLM workers to work
         | on their sprawling, feature-packed Plantio SaaS.
         | 
         | I can tell you first hand I have already done this numerous
         | times as a non-programmer working a non-tech job.
        
           | skydhash wrote:
           | The thing is that there's a need to integrate all these
           | little tools, because the problems they solve are part of
           | the same domain. And that's where the problems lie.
           | Something like Excel has an advantage as a common platform
           | for both data and procedures. Unix adopted text and pipes
           | for integration.
        
         | demosthanos wrote:
         | What you're missing is the audience.
         | 
         | This talk is different from his others because it's directed at
         | aspiring startup founders. It's about how we conceptualize the
         | place of an LLM in a new business. It's designed to provide a
         | series of analogies, any one of which may or may not help
         | a given startup founder to break out of the tired, binary
         | talking points they've absorbed from the internet ("AI all the
         | things" vs "AI is terrible") in favor of a more nuanced
         | perspective of the role of AI in their plans. It's soft and
         | squishy rhetoric because it's not about engineering, it's about
         | business and strategy.
         | 
         | I honestly left impressed that Karpathy has the dynamic range
         | necessary to speak to both engineers and business people, but
         | it also makes sense that a lot of engineers would come out of
         | this very confused at what he's on about.
        
           | whilenot-dev wrote:
           | I get that, motivating young founders is difficult, and I
           | think he has a charming geeky way of provoking some thoughts.
           | But on the other hand: Why mainframes with time-sharing from
           | the 60s? Why operating systems? LLMs to tell you how to boil
           | an egg, seriously?
           | 
           | Putting my engineering hat on, I understand his idea of the
           | "autonomy slider" as a lazy workaround for a software
           | implementation that deals with _one_ system boundary. He
           | should inspire people to seek out unknown boundaries, not
           | provide implementation details for existing boundaries. His
           | _MenuGen_ app would probably be better off using a web image
           | search instead of LLM image generation. Enhancing deployment
           | pipelines with LLM setups is something for the last
           | generation of DevOps companies, not the next one.
           | 
           | Please mention, just once, the value proposition and the
           | responsibilities that come with handling large quantities of
           | valuable data - LLMs wouldn't exist without it! What makes
           | for quality data for an LLM, and what about personal data?
        
         | westoncb wrote:
         | > and must say that I'm a bit confused by this presentation,
         | and it's a bit unclear to me what it adds.
         | 
         | I think the disconnect might come from the fact that Karpathy
         | is speaking as someone whose day-to-day computing work has
         | already been radically transformed by this technology (and he
         | interacts with a ton of other people for whom this is the
         | case), so he's not trying to sell the possibility of it: that
         | would be like trying to sell the possibility of an airplane to
         | someone who's already just cruising around in one every day.
         | Instead the mode of the presentation is more: well, here we are
         | at the dawn of a new era of computing, it really happened. Now
         | how can we relate this to the history of computing to
         | anticipate where we're headed next?
         | 
         | > ...but sometimes an LLM becomes the operating system,
         | sometimes it's the CPU, sometimes it's the mainframe from the
         | 60s with time-sharing, a big fab complex, or even outright
         | electricity itself?
         | 
         | He uses these analogies in clear and distinct ways to
         | characterize separate facets of the technology. If you were
         | unclear on the meanings of the separate analogies, it seems
         | like the talk may offer some value for you after all, but you
         | may be missing some prerequisites.
         | 
         | > This demo app was in a presentable state for a demo after a
         | day, and it took him a week to implement Googles OAuth2 stuff.
         | Is that somehow exciting? What was that?
         | 
         | The point here was that he'd built the core of the app within
         | a day, without knowing the Swift language or the iOS app dev
         | ecosystem, by leveraging LLMs--but that the OAuth part of the
         | process remains old-fashioned and blocks people from
         | leveraging LLMs the way they can when writing code--and he
         | goes on to show concretely how this could be improved.
        
       | wiremine wrote:
       | I've spent a lot of time thinking about this recently.
       | Ultimately, English is not a clean, deterministic abstraction
       | layer. This isn't to say that LLMs aren't useful or that they
       | can't create some great efficiencies.
        
         | npollock wrote:
         | no, but a subset of English could be
        
           | freehorse wrote:
           | Thought we already had that?
        
           | 4gotunameagain wrote:
           | Let me introduce to you.. python ;)
        
           | axxto wrote:
           | You just invented programming languages, halfway
        
       | mkw5053 wrote:
       | This DevOps friction is exactly why I'm building an open-source
       | "Firebase for LLMs." The moment you want to add AI to an app,
       | you're forced to build a backend just to securely proxy API calls
       | --you can't expose LLM API keys client-side. So developers who
       | could previously build entire apps backend-free suddenly need
       | servers, key management, rate limiting, logging, deployment...
       | all just to make a single OpenAI call. Anyone else hit this wall?
       | The gap between "AI-first" and "backend-free" development feels
       | very solvable.
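       | 
       | A minimal sketch of the kind of proxy I mean (the endpoint,
       | model and framework are illustrative; auth and rate limiting
       | are omitted):
       | 
       |   import os
       |   from flask import Flask, request, jsonify
       |   from openai import OpenAI
       | 
       |   app = Flask(__name__)
       |   # The key lives only on the server, never in the client bundle.
       |   client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
       | 
       |   @app.post("/api/chat")
       |   def chat():
       |       # The client sends just a prompt; the server calls the LLM.
       |       prompt = request.get_json().get("prompt", "")
       |       resp = client.chat.completions.create(
       |           model="gpt-4o-mini",
       |           messages=[{"role": "user", "content": prompt}],
       |       )
       |       return jsonify({"reply": resp.choices[0].message.content})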
        
         | smpretzer wrote:
         | I think this lines up with Apple's thesis of on-device models
         | being a useful feature for developers who don't want to deal
         | with calling out to OpenAI:
         | 
         | https://developer.apple.com/documentation/foundationmodels
        
         | sockboy wrote:
         | Yeah, hit this exact wall building a small AI tool. Ended up
         | spinning up a whole backend just to keep the keys safe. Feels
         | like there should be a simpler way, but haven't seen anything
         | that's truly plug-and-play yet. Curious to see what you're
         | working on.
        
           | dieortin wrote:
           | It's very obvious this account was just created to promote
           | your product...
        
         | jeremyjh wrote:
         | Do you think Firebase and Supabase are working on this? Good
         | luck, but to me it sounds like a platform feature, not a
         | standalone product.
        
       | magicloop wrote:
       | I think this is a brilliant talk and truly captures the
       | "zeitgeist" of our times. He sees the emergent patterns arising
       | as software creation is changing.
       | 
       | I am writing a hobby app at the moment and I am thinking about
       | its architecture in a new way now. I am making all my model
       | structures comprehensible so that LLMs can see the inside
       | semantics of my app. I merely provide a human-friendly GUI over
       | the top to avoid the linear wall-of-text problem you get when you
       | want to do something complex via a chat interface.
       | 
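       | As a rough illustration of what I mean by comprehensible model
       | structures (a toy sketch using Pydantic, not my actual app -
       | the names are made up):
       | 
       |   from pydantic import BaseModel, Field
       | 
       |   class Task(BaseModel):
       |       """One unit of work; docstrings double as LLM context."""
       |       title: str
       |       done: bool = False
       |       notes: str = Field("", description="Free-form details")
       | 
       |   class AppState(BaseModel):
       |       """The whole app model: the GUI renders it, the LLM
       |       reads and edits it."""
       |       tasks: list[Task] = []
       | 
       |   state = AppState(tasks=[Task(title="Write the demo")])
       |   # Hand the LLM both the schema and the current state as JSON.
       |   llm_context = {
       |       "schema": AppState.model_json_schema(),
       |       "state": state.model_dump(),
       |   }
       | 
       | The same JSON the LLM sees is what the GUI is a view over, so
       | there's no separate "AI integration" layer to keep in sync.
       | 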
       | We need to meet LLMs in the middle ground to leverage the best of
       | our contributions - traditional code, partially autonomous AI,
       | and crafted UI/UX.
       | 
       | Part of, but not all of, programming is "prompting well". It goes
       | along with understanding the imperative aspects, developing a
       | nose for code smells, and the judgement for good UI/UX.
       | 
       | I find our current times both scary and exciting.
        
       | sockboy wrote:
       | Definitely hit this wall too. A backend just for an API proxy
       | feels like a detour when all you want is to ship a quick
       | prototype. Would love to see more tools that make this seamless,
       | especially for solo builders.
        
       ___________________________________________________________________
       (page generated 2025-06-19 23:00 UTC)