[HN Gopher] Andrej Karpathy: Software in the era of AI [video]
___________________________________________________________________
Andrej Karpathy: Software in the era of AI [video]
Author : sandslash
Score : 1056 points
Date : 2025-06-19 00:33 UTC (22 hours ago)
(HTM) web link (www.youtube.com)
(TXT) w3m dump (www.youtube.com)
| gchamonlive wrote:
| I think it's interesting to juxtapose traditional coding, neural
| network weights and prompts because in many areas -- like the
| example of the self driving module having code being replaced by
| neural networks tuned to the target dataset representing the
| domain -- this will be quite useful.
|
| However I think it's important to make it clear that given the
| hardware constraints of many environments the applicability of
| what's being called software 2.0 and 3.0 will be severely
| limited.
|
| So instead of being replacements, these paradigms are more like
| extra tools in the tool belt. Code and prompts will live side by
| side, being used when convenient, but neither is a panacea.
| karpathy wrote:
| I kind of say it in words (agreeing with you) but I agree the
| versioning is a bit of a confusing analogy, because it usually
| additionally implies some kind of improvement, when I'm just
| trying to distinguish them as very different software
| categories.
| miki123211 wrote:
| What do you think about structured outputs / JSON mode /
| constrained decoding / whatever you wish to call it?
|
| To me, it's a criminally underused tool. While "raw" LLMs are
| cool, they're annoying to use as anything but chatbots, as
| their output is unpredictable and basically impossible to
| parse programmatically.
|
| Structured outputs solve that problem neatly. In a way,
| they're "neural networks without the training". They can be
| used to solve similar problems as traditional neural
| networks, things like image classification or extracting
| information from messy text, but all they require is a Zod or
| Pydantic type definition and a prompt. No renting GPUs,
| labeling data and tuning hyperparameters necessary.
|
| They often also improve LLM performance significantly.
| Imagine you're trying to extract calories per 100g of
| product, but some products give you calories per serving and a
| serving size, calories per pound etc. The naive way to do
| this is a prompt like "give me calories per 100g", but that
| forces the LLM to do arithmetic, and LLMs are bad at
| arithmetic. With structured outputs, you just give it the
| fifteen different formats that you expect to see as
| alternatives, and use some simple Python to turn them all
| into calories per 100g on the backend side.
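|
| A minimal sketch of what that can look like with Pydantic (the
| field names and the grams-per-pound constant here are mine,
| just for illustration): the LLM only picks a variant and copies
| the numbers, and plain Python does the arithmetic.
|
|     from typing import Literal, Union
|     from pydantic import BaseModel
|
|     class PerHundredGrams(BaseModel):
|         kind: Literal["per_100g"]
|         kcal: float
|
|     class PerServing(BaseModel):
|         kind: Literal["per_serving"]
|         kcal: float
|         serving_size_g: float
|
|     class PerPound(BaseModel):
|         kind: Literal["per_pound"]
|         kcal: float
|
|     CalorieInfo = Union[PerHundredGrams, PerServing, PerPound]
|
|     def to_per_100g(info: CalorieInfo) -> float:
|         # normalization happens here, not inside the LLM
|         if isinstance(info, PerHundredGrams):
|             return info.kcal
|         if isinstance(info, PerServing):
|             return info.kcal * 100.0 / info.serving_size_g
|         return info.kcal * 100.0 / 453.592  # grams per pound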
| abdullin wrote:
| Even more than that. With Structured Outputs we essentially
| control the layout of the response, so we can force the LLM to
| go through different parts of the completion in a predefined
| order.
|
| One way teams exploit that is to force the LLM to go through a
| predefined task-specific checklist before answering. This
| custom hard-coded chain of thought boosts accuracy and
| makes the reasoning more auditable.
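|
| A rough sketch of that pattern with Pydantic (the field names
| are made up): because fields are generated in declaration
| order, the model has to work through the checklist before it
| is allowed to emit the final answer.
|
|     from pydantic import BaseModel
|
|     class Reply(BaseModel):
|         # filled in first: a hard-coded chain of thought
|         relevant_policy_cited: bool
|         checked_for_pii: bool
|         reasoning_notes: str
|         # only after the checklist does the answer appear
|         answer: str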
| solaire_oa wrote:
| I also think that structured outputs are criminally
| underused, but it isn't perfect... and per your example, it
| might not even be good, because I've done something
| similar.
|
| I was trying to make a decent cocktail recipe database, and
| scraped the text of cocktails from about 1400 webpages.
| Note that this was just the text of the cocktail recipe,
| and cocktail recipes are comparatively small. I sent the
| text to an LLM for JSON structuring, and the LLM routinely
| miscategorized liquor types. It also failed to normalize
| measurements with explicit instructions and the temperature
| set to zero. I gave up.
| handfuloflight wrote:
| Which LLM?
| hellovai wrote:
| have you tried schema-aligned parsing yet?
|
| the idea is that instead of using JSON.parse, we create a
| custom Type.parse for each type you define.
|
| so if you want a:
|
|     class Job { company: string[] }
|
| And the LLM happens to output:
|
|     { "company": "Amazon" }
|
| We can upcast "Amazon" -> ["Amazon"] since you indicated
| that in your schema.
|
| https://www.boundaryml.com/blog/schema-aligned-parsing
|
| and since it's only post-processing, the technique will
| work on every model :)
|
| for example, on BFCL benchmarks, we got SAP + GPT3.5 to
| beat out GPT4o
| ( https://www.boundaryml.com/blog/sota-function-calling )
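|
| roughly, the post-processing step looks something like this (a
| toy Python version of the idea, not the actual BAML
| implementation):
|
|     import json
|
|     def coerce(value, schema):
|         # schema is a tiny subset: array, object, or scalar
|         if schema["type"] == "array":
|             if not isinstance(value, list):
|                 value = [value]   # upcast "Amazon" -> ["Amazon"]
|             return [coerce(v, schema["items"]) for v in value]
|         if schema["type"] == "object":
|             return {k: coerce(value[k], s)
|                     for k, s in schema["fields"].items()}
|         return value
|
|     job = {"type": "object",
|            "fields": {"company": {"type": "array",
|                                   "items": {"type": "string"}}}}
|     print(coerce(json.loads('{"company": "Amazon"}'), job))
|     # {'company': ['Amazon']}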
| solaire_oa wrote:
| Interesting! I was using function calling and JSON modes
| with zod. I may revisit the project with SAP!
| coderatlarge wrote:
| note the per-100g prompt might lead the LLM to reach for
| the part of its training distribution that is actually
| written in terms of the 100g standard, and just lead to
| different recall rather than a suboptimal calculation based
| on non-standardized per-100g training examples.
| poorcedural wrote:
| Andrej, maybe Software 3.0 is not written in spoken language
| like code or prompts. Software 3.0 is recorded in behavior, a
| behavior that today's software lacks. That behavior is
| written and consumed by machine and annotated by human
| interaction. Skipping to 3.0 is premature, but Software 2.0
| is a ramp.
| mclau157 wrote:
| Would this also be more of a push towards robotics and
| getting physical AI into our everyday lives?
| poorcedural wrote:
| Very insightful! How you would describe boiling an egg is
| different than how a machine would describe it to another
| machine.
| BobbyJo wrote:
| The versioning makes sense to me. Software has a cycle where
| a new tool is created to solve a problem, and the problem
| winds up being meaty enough, and the tool effective enough,
| that the exploration of the problem space the tool unlocks is
| essentially a new category/skill/whatever.
|
| computers -> assembly -> HLL -> web -> cloud -> AI
|
| Nothing on that list has disappeared, but the work has
| changed enough to warrant a few major versions imo.
| TeMPOraL wrote:
| For me it's even simpler:
|
| V1.0: describing solutions to specific problems directly,
| precisely, for machines to execute.
|
| V2.0: giving the machine examples of good and bad answers to
| specific problems we don't know how to describe precisely,
| for the machine to generalize from and solve such indirectly
| specified problems.
|
| V3.0: telling the machine what to do in plain language, for it
| to figure out and solve.
|
| V2 was coded in V1 style, as a solution to the problem of
| "build a tool that can solve problems defined as examples".
| V3 was created by feeding everything and the kitchen sink
| into V2 at the same time, so it learns to solve the problem
| of being a general-purpose tool.
| BobbyJo wrote:
| That's less a versioning of software and more a
| versioning of AI's role in software. None -> Partial ->
| Total. It's a valid scale with regard to AI's role
| specifically, but I think Karpathy was intending to make
| a point about software as a whole, and even the details
| of how that middle "Partial" era evolves.
| swyx wrote:
| no no, it actually is a good analogy in 2 ways:
|
| 1) it is a breaking change from the prior version
|
| 2) it is an improvement in that, in its ideal/ultimate form,
| it is a full superset of capabilities of the previous version
| gchamonlive wrote:
| > versioning is a bit confusing analogy because it usually
| additionally implies some kind of improvement
|
| Exactly what I felt. Semver-like naming analogies bring their
| own set of implicit meanings, like major versions necessarily
| superseding or replacing the previous version; that is, they
| don't account for coexistence beyond planning
| migration paths. This expectation, however, doesn't correspond
| with the rest of the talk, so I thought I might point it out.
| Thanks for taking the time to reply!
| radicalbyte wrote:
| Weights are code being replaced by data; something I've been
| making heavy use of since the early 00s. After coding for 10
| years you start to see the benefits of it and understand where
| you should use it.
|
| LLMs give us another tool, only this time it's far more
| accessible and powerful.
| dcsan wrote:
| LLMs have already replaced some code directly for me, e.g. NLP
| stuff. Previously I might have written a bunch of code to do
| clustering; now I just ask the LLM to group things. Obviously
| this is a very basic feature native to LLMs, but there will be
| more first-class LLM-callable functions over time.
| nico wrote:
| Thank you YC for posting this before the talk became
| deprecated[1]
|
| 1: https://x.com/karpathy/status/1935077692258558443
| sandslash wrote:
| We couldn't let that happen!
| jppope wrote:
| Well that showed up significantly faster than they said it would.
| seneca wrote:
| Classic under promise and over deliver.
|
| I'm glad they got it out quickly.
| dang wrote:
| Me too. It was my favorite talk of the ones I saw.
| dang wrote:
| The team adapted quickly, which is a good sign. I believe
| getting the videos out sooner (as in why-not-immediately) is
| going to be a priority in the future.
| anythingworks wrote:
| loved the analogies! Karpathy is consistently one of the clearest
| thinkers out there.
|
| interesting that Waymo could do uninterrupted trips back in 2013,
| wonder what took them so long to expand? regulation? tail end of
| driving optimization issues?
|
| noticed one of the slides had a cross over 'AGI 2027'...
| ai-2027.com :)
| AlotOfReading wrote:
| You don't "solve" autonomous driving as such. There's a long,
| slow grind of gradually improving things until failures become
| rare enough.
| petesergeant wrote:
| I wonder at what point all the self-driving code becomes
| replaceable with a multimodal generalist model with the
| prompt "drive safely"
| AlotOfReading wrote:
| One of the issues with deploying models like that is the
| lack of clear, widely accepted ways to validate
| comprehensive safety and absence of unreasonable risk. If
| that can be solved, or regulators start accepting answers
| like "our software doesn't speed in over 95% of
| situations", then they'll become more common.
| anon7000 wrote:
| Very advanced machine learning models are used in current
| self driving cars. It all depends what the model is trying
| to accomplish. I have a hard time seeing a generalist
| prompt-based generative model ever beating a model
| specifically designed to drive cars. The models are just
| designed for different, specific purposes
| tshaddox wrote:
| I could see it being the case that driving is a fairly
| general problem, and thus models intentionally designed
| to be general end up doing better than models designed
| with the misconception that you need a very particular
| set of driving-specific capabilities.
| anythingworks wrote:
| exactly! I think that was tesla's vision with self-
| driving to begin with... so they tried to frame it as a
| problem general enough that trying to solve it would
| also solve questions of more general intelligence ('agi'),
| i.e. cars should use vision just like humans would
|
| but in hindsight it looks like this slowed them down quite a
| bit despite being early to the space...
| shakna wrote:
| Driving is not a general problem, though. It's a
| contextual landscape of fast-paced reactions and
| predictions. Both are required, and done regularly by the
| human element. The exact nature of every reaction, and
| every prediction, changes vastly within the context
| window.
|
| You need image processing just as much as you need
| scenario management, and they're orthogonal to each
| other, as one example.
|
| If you want a general transport system... We do have
| that. It's called rail. (And can and has been automated.)
| melvinmelih wrote:
| > Driving is not a general problem, though.
|
| But what's driving a car? A generalist human brain that
| has been trained for ~30 hours to drive a car.
| shakna wrote:
| Human brains aren't generalist!
|
| We have multiple parts of the brain that interact in
| vastly different ways! Your cerebellum won't be running
| the role of the pons.
|
| Most parts of the brain cannot take over for others.
| Self-healing is the exception, not the rule. Yes, we have
| a degree of neuroplasticity, but there are many limits.
|
| (Sidenote: Driver's license here is 240 hours.)
| Zanfa wrote:
| > Human brains aren't generalist!
|
| What? Human intelligence is literally how AGI is defined.
| The brain's physical configuration is irrelevant.
| shakna wrote:
| A human brain is not a general model. We have multiple
| overlapping systems. The physical configuration is
| extremely relevant to that.
|
| AGI is defined in terms of "General Intelligence", a
| theory that general modelling is irrelevant to.
| azan_ wrote:
| > We have multiple parts of the brain that interact in
| vastly different ways!
|
| Yes, and thanks to that human brains are generalist
| shakna wrote:
| Only if that were a singular system; however, it is not.
| [0]
|
| For example... The nerve cells in your gut may speak to
| the brain, and interact with it in complex ways we are
| only just beginning to understand, but they are separate
| systems that both have control over the nervous system,
| and other systems. [1]
|
| General Intelligence, the psychological theory, and
| General Modelling, whilst sharing words, share little
| else.
|
| [0] https://doi.org/10.1016/j.neuroimage.2022.119673
|
| [1] https://doi.org/10.1126/science.aau9973
| yusina wrote:
| 240 hours sounds excessive. Where is "here"?
| TeMPOraL wrote:
| It partially is. You have the specialized part of
| maneuvering a fast moving vehicle in physical world,
| trying to keep it under control at all times and never
| colliding with anything. Then you have the general part,
| which is navigating the _human environment_. That 's
| lanes and traffic signs and road works and schoolbuses,
| that's kids on the road and badly parked trailers.
|
| Current breed of autonomous driving systems have problems
| with exceptional situations - but based on all I've read
| about so far, those are _exactly_ of the kind that would
| benefit from a general system able to _understand_ the
| situation it 's in.
| mannicken wrote:
| Speed and Moore's law. You don't need to just make a
| decision without hallucinations, you need to do it fast
| enough for it to propagate to the power electronics and
| hit the gas/brake/turn the wheel/whatever. Over and over
| and over again on thousands of different tests.
|
| A big problem I am noticing is that the IT culture over
| the last 70 years has existed in a state of "hardware gonna
| get faster soon". And over the last ten years we had a
| "hardware can't get faster bc physics sorry" problem.
|
| The way we've been making software in the 90s and 00s
| just isn't gonna be happening anymore. We are used to
| throwing more abstraction layers (C->C++->Java->vibe
| coding etc) at the problem and waiting for the guys in
| the fab to hurry up and get their hardware faster so our
| new abstraction layers can work.
|
| Well, you can fire the guys in the fab all you want, but
| no matter how much they try to yell at nature, it
| doesn't seem to care. They told us, the embedded
| c++-monkeys, to spread the message. Sorry, Moore's law
| is over, boys and girls. I think we all need to take a
| second to take that in and realize the significance of
| that.
|
| [1] The "guys in the fab" are a fictional character and
| any similarity to the real world is a coincidence.
|
| [2] No c++-monkeys were harmed in the process of making
| this comment.
| yokto wrote:
| This is (in part) what "world models" are about. While some
| companies like Tesla bring together a fleet of small
| specialised models, others like CommaAI and Wayve train
| generalist models.
| ActorNightly wrote:
| > Karpathy is consistently one of the clearest thinkers out
| there.
|
| Eh, he ran Tesla's self-driving division and put them in a
| direction that is never going to fully work.
|
| What they should have done is a) trained a neural net to
| map sequences of frames into a representation of the physical
| environment, and b) leveraged MuZero, so that the self-driving
| system basically builds out parallel simulations into the
| future and does a search for the best course of action to take.
|
| Because that's pretty much what makes humans great drivers. We
| don't need to know what a cone is - we internally compute that
| something that is an object on the road that we are driving
| towards is going to result in a negative outcome when we
| collide with it.
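|
| (A toy sketch of the planning loop being described - not MuZero
| itself, which adds MCTS and learned value/policy networks, and
| with `world_model`, `score` and the candidate actions as
| stand-ins of my own:)
|
|     def plan(state, world_model, candidate_actions, score,
|              horizon=30):
|         # roll each candidate forward with the learned dynamics
|         # model and keep whichever imagined future scores best
|         best_action, best_value = None, float("-inf")
|         for action in candidate_actions:
|             s, value = state, 0.0
|             for _ in range(horizon):
|                 s = world_model(s, action)  # imagined next state
|                 value += score(s)           # e.g. penalize collisions
|             if value > best_value:
|                 best_action, best_value = action, value
|         return best_action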
| visarga wrote:
| > We don't need to know what a cone is
|
| The counter argument is that you can't zoom in and fix a
| specific bug in this mode of operation. Everything is mashed
| together in the same neural net process. They needed to
| ensure safety, so testing was crucial. It is harder to test
| an end-to-end system than its individual parts.
| AlotOfReading wrote:
| Aren't continuous, stochastic, partial-knowledge environments
| where you need long-horizon planning with strict deadlines
| and limited compute exactly the sort of environments MuZero
| variants struggle with? Because that's driving.
|
| It's also worth mentioning that humans intentionally (and
| safely) drive into "solid" objects all the time. Bags, steam,
| shadows, small animals, etc. We also break rules (e.g. drive
| on the wrong side of the road), and anticipate things we
| can't even see based on a theory of mind of other agents.
| Human driving is extremely sophisticated, not reducible to
| rules that are easily expressed in "simple" language.
| tayo42 wrote:
| Is that the approach that waymo uses?
| suddenlybananas wrote:
| That's absolutely not what makes humans great drivers?
| impossiblefork wrote:
| I don't think that would have worked either.
|
| But if they'd gone for radars and lidars and a bunch of
| sensors and then enough processing hardware to actually fuse
| that, then I think they could have built something that had a
| chance of working.
| AIorNot wrote:
| Love his analogies and clear eyed picture
| pyman wrote:
| "We're not building Iron Man robots. We're building Iron Man
| suits"
| reducesuffering wrote:
| [flagged]
| throwawayoldie wrote:
| I'm old enough to remember when Twitter was new, and for a
| moment it felt like the old utopian promise of the Internet
| finally fulfilled: ordinary people would be able to talk,
| one-on-one and unmediated, with other ordinary people
| across the world, and in the process we'd find out that
| we're all more similar than different and mainly want the
| same things out of life, leading to a new era of peace and
| empathy.
|
| It was a nice feeling while it lasted.
| _kb wrote:
| Believe it or not, humans did in fact have forms of
| written language and communication prior to twitter.
| dang wrote:
| Can you please make your substantive points without
| snark? We're trying for something a bit different here.
|
| https://news.ycombinator.com/newsguidelines.html
| throwawayoldie wrote:
| You missed the point, but that's fine, it happens.
| tock wrote:
| I believe the opposite happened. People found out that
| there are huge groups of people with wildly differing
| views on morality from them and that just encouraged more
| hate. I genuinely think old school facebook where people
| only interacted with their own private friend circles is
| better.
| prisenco wrote:
| Broadcast networks like Twitter only make sense for
| influencers, celebrities and people building a brand.
| They're a net negative for literally anyone else.
|
| | _old school facebook where people only interacted with
| their own private friend circles is better._
|
| 100% agree but crazy that option doesn't exist anymore.
| pryelluw wrote:
| Funny thing is that in more than one of the Iron Man movies
| the suits end up being bad robots. Even the AI Iron Man made
| shows up to ruin the day in the Avengers movie. So it's a
| little on the nose that they'd try to pitch it this way.
| wiseowise wrote:
| That's reading too much into it. It's just an obvious
| plot twist to justify making another movie, nothing else.
| AdieuToLogic wrote:
| It's an interesting presentation, no doubt. The analogies
| eventually fail as analogies usually do.
|
| A recurring theme presented, however, is that LLM's are somehow
| not controlled by the corporations which expose them as a
| service. The presenter made certain to identify three interested
| actors (governments, corporations, "regular people") and how LLM
| offerings are not controlled by governments. This is a bit
| disingenuous.
|
| Also, the OS analogy doesn't make sense to me. Perhaps this is
| because I do not subscribe to LLM's having reasoning capabilities
| nor able to reliably provide services an OS-like system can be
| shown to provide.
|
| A minor critique regarding the analogy equating LLM's to
| mainframes:
|
|     Mainframes in the 1960's never "ran in the cloud" as it did
|     not exist. They still do not "run in the cloud" unless one
|     includes simulators.
|
|     Terminals in the 1960's - 1980's did not use networks. They
|     used dedicated serial cables or dial-up modems to connect
|     either directly or through stat-mux concentrators.
|
|     "Compute" was not "batched over users." Mainframes either
|     had jobs submitted and run via operators (indirect
|     execution) or supported multi-user time slicing (such as
|     found in Unix).
| furyofantares wrote:
| > The presenter made certain to identify three interested
| actors (governments, corporations, "regular people") and how
| LLM offerings are not controlled by governments. This is a bit
| disingenuous.
|
| I don't think that's what he said, he was identifying the first
| customers and uses.
| AdieuToLogic wrote:
| >> A recurring theme presented, however, is that LLM's are
| somehow not controlled by the corporations which expose them
| as a service. The presenter made certain to identify three
| interested actors (governments, corporations, "regular
| people") and how LLM offerings are not controlled by
| governments. This is a bit disingenuous.
|
| > I don't think that's what he said, he was identifying the
| first customers and uses.
|
| The portion of the presentation I am referencing starts at or
| near 12:50[0]. Here is what was said:
|
|     I wrote about this one particular property that strikes me
|     as very different this time around. It's that LLMs, like,
|     flip - they flip the direction of technology diffusion that
|     is usually present in technology. So for example with
|     electricity, cryptography, computing, flight, internet,
|     GPS - lots of new transformative technologies that have not
|     been around. Typically it is the government and
|     corporations that are the first users, because it's new,
|     expensive, etc., and it only later diffuses to the consumer.
|     But I feel like LLMs are kind of like flipped around. So
|     maybe with early computers it was all about ballistics and
|     military use, but with LLMs it's all about how do you boil
|     an egg or something like that. This is certainly like a lot
|     of my use. And so it's really fascinating to me that we
|     have a new magical computer and it's like helping me boil
|     an egg. It's not helping the government do something really
|     crazy like some military ballistics or some special
|     technology.
|
| Note the identification of historic government interest in
| computing along with a flippant "regular person" scenario in
| the context of "technology diffusion."
|
| You are right in that the presenter identified "first
| customers", but this is mentioned in passing when viewed in
| context. Perhaps I should not have characterized this as "a
| recurring theme." Instead, a better categorization might be:
| The presenter minimized the control corporations have by
| keeping focus on governmental topics and trivial customer
| use-cases.
|
| 0 - https://youtu.be/LCEmiRjPEtQ?t=770
| furyofantares wrote:
| Yeah that's explicitly about first customers and first
| uses, not about who controls it.
|
| I don't see how it minimizes the control corporations have
| to note this. Especially since he's quite clear about how
| everything is currently centralized / time share model, and
| obviously hopeful we can enter an era that's more analogous
| to the PC era, even explicitly telling the audience maybe
| some of them will work on making that happen.
| distalx wrote:
| Hang in there! Your comment makes some really good points about
| the limits of analogies and the real control corporations have
| over LLMs.
|
| Plus, your historical corrections were spot on. Sometimes, good
| criticisms just get lost in the noise online. Don't let it get
| to you!
| wjohn wrote:
| The comparison of our current methods of interacting with LLMs
| (back and forth text) to old-school terminals is pretty
| interesting. I think there's still a lot of work to be done to
| optimize how we interact with these models, especially for non-
| dev consumers.
| informal007 wrote:
| Audio may be the better option.
| recursive wrote:
| Based on my experience with voicemail, I'd say that audio is
| not always best, and is sometimes in the running for worst.
| nodesocket wrote:
| llms.txt makes a lot of sense, especially for LLMs interacting
| with HTTP APIs autonomously.
|
| Seems like you could set an LLM loose and, like the Googlebot,
| have it start converting all HTML pages into llms.txt. Man, the
| future is crazy.
| nothrabannosir wrote:
| Couldn't believe my eyes. The www is truly bankrupt. If anyone
| has a browser plugin which automatically redirects to llms.txt
| sign me up.
|
| Website too confusing for humans? Add more design, modals,
| newsletter pop ups, cookie banners, ads, ...
|
| Website too confusing for LLMs? Add an accessible, clean, ad-
| free, concise, high entropy, plain text summary of your
| website. Make sure to hide it from the humans!
|
| PS: it should be /.well-known/llms.txt but that feels futile at
| this point..
|
| PPS: I enjoyed the talk, thanks.
| andrethegiant wrote:
| > If anyone has a browser plugin which automatically
| redirects to llms.txt sign me up.
|
| Not a browser plugin, but you can prefix URLs with `pure.md/`
| to get the pure markdown of that page. It's not quite a 1:1
| to llms.txt as it doesn't explain the entire domain, but
| works well for one-off pages. [disclaimer: I'm the
| maintainer]
| jph00 wrote:
| The next version of the llms.txt proposal will allow an
| llms.txt file to be added at any level of a path, which isn't
| compatible with /.well-known.
|
| (I'm the creator of the llms.txt proposal.)
| nothrabannosir wrote:
| [flagged]
| dang wrote:
| " _Please don 't post shallow dismissals, especially of
| other people's work. A good critical comment teaches us
| something._"
|
| https://news.ycombinator.com/newsguidelines.html
| nothrabannosir wrote:
| Fair
| nothrabannosir wrote:
| PS apologies to jph00. I still believe what I believe but
| I should have phrased it differently or not at all. Good
| luck on your endeavors either way.
| achempion wrote:
| Even with this future approach, it still can live under the
| `/.well-known`, think of `/.well-known/llm/<mirrored path>`
| or `/.well-known/llm.json` with key/value mappings.
| andrethegiant wrote:
| Doesn't this conflict with the original proposal of
| appending .md to any resource, e.g. /foo/bar.html.md? Or
| why not tell servers to respond to the Accept header when
| it's set to text/markdown?
| alightsoul wrote:
| The web started dying with mobile social media apps, in which
| hyperlinks are a poor UX choice. Then again with SEO banning
| outlinks. Now this. The web of interconnected pages that was
| the World Wide Web is dead. Not on social media? No one sees
| you. Run a website? more bots than humans. Unless you sell
| something on the side with the website it's not profitable.
| Hyperlinking to other websites is dead.
|
| Gen Alpha doesn't know what a web page is and if they do,
| it's for stuff like neocities aka as a curiosity or art form
| only. Not as a source of information anymore. I don't blame
| them. Apps (social media apps) have less friction than web
| sites but have a higher barrier for people to create. We are
| going back to pre-World Wide Web days in a way, kind of like
| Bulletin Board Systems on dial-up without hyperlinking, and
| centralized (social media). Some countries, mostly ones with
| few technical people, like the ones in Central America, have
| moved away from the web almost entirely and into social media
| like Instagram.
|
| Due to the death of the web, Google Search and friends now
| rely mostly on matching queries with titles, so just like
| before the internet you have to know people to learn new
| stuff, or wait for an algorithm to show it to you, or for
| someone to mention it online, or forcefully enroll in a
| university. Maybe that's why search results have declined and
| people search using ChatGPT or maybe Perplexity. Scholarly
| search engines are a bit better but frankly irrelevant for most
| people.
|
| Now I understand why Google established their own DNS server
| at 8.8.8.8. If you have a directory of all domains on DNS,
| you can still index sites without hyperlinks between them,
| even if the web dies. They saw it coming.
| practal wrote:
| If you have different representations of the same thing
| (llms.txt / HTML), how do you know they are actually equivalent
| each other? I am wondering if there are scenarios where webpage
| publishers would be interested in gaming this.
| andrethegiant wrote:
| <link rel="alternate" /> is a standards-friendly way to
| semantically represent the same content in a different format
| jph00 wrote:
| That's not what llms.txt is. You can just use a regular
| markdown URL or similar for that.
|
| llms.txt is a description for an LLM of how to find the
| information on your site needed for an LLM to use your
| product or service effectively.
| dang wrote:
| This was my favorite talk at AISUS because it was so full of
| _concrete_ insights I hadn't heard before and (even better)
| practical points about what to build _now_, in the immediate
| future. (To mention just one example: the "autonomy slider".)
|
| If it were up to me, which it very much is not, I would try to
| optimize the next AISUS for more of this. I felt like I was
| getting smarter as the talk went on.
| sneak wrote:
| Can we please stop standardizing on putting things in the root?
|
| /.well-known/ exists for this purpose.
|
| example.com/.well-known/llms.txt
|
| https://en.m.wikipedia.org/wiki/Well-known_URI
| andrethegiant wrote:
| https://github.com/AnswerDotAI/llms-txt/issues/2
| jph00 wrote:
| You can't just put things there any time you want - the RFC
| requires that they go through a registration process.
|
| Having said that, this won't work for llms.txt, since in the
| next version of the proposal they'll be allowed at any level of
| the path, not only the root.
| politelemon wrote:
| > You can't just put things there any time you want - the RFC
| requires that they go through a registration process.
|
| Actually, I can, for two reasons. The first is of course that
| the RFC mentions items can be registered after the fact, if
| it's found that a particular well-known suffix is being widely
| used. But the second is a bit more chaotic - website owners
| are under no obligation to consult a registry, much like port
| registrations; in many cases they won't even know it exists
| and may think of it as a place that should reflect their
| mental model.
|
| It can make things awkward and difficult though, that is
| true, but that comes with the free text nature of the well-
| known space. That's made evident in the GitHub issue linked,
| where a large group of very smart people didn't know that there
| was a registry for it.
|
| https://github.com/AnswerDotAI/llms-
| txt/issues/2#issuecommen...
| jph00 wrote:
| There was no "large group of very smart people" behind
| llms.txt. It was just me. And I'm very familiar with the
| registry, and it doesn't work for this particular case IMO
| (although other folks are welcome to register it if they
| feel otherwise, of course).
| dncornholio wrote:
| > You can't just put things there any time you want - the RFC
| requires that they go through a registration process.
|
| Excuse me???
| jph00 wrote:
| From the RFC:
|
| """ A well-known URI is a URI [RFC3986] whose path
| component begins with the characters "/.well-known/", and
| whose scheme is "HTTP", "HTTPS", or another scheme that has
| explicitly been specified to use well-known URIs.
|
| Applications that wish to mint new well-known URIs MUST
| register them, following the procedures in Section 5.1. """
| sneak wrote:
| I put stuff in /.well-known/ all the time whenever I want.
| They're my servers.
| mikewarot wrote:
| A few days ago, I was introduced to the idea that when you're
| vibe coding, you're consulting a "genie": much like in the
| fables, you almost never get what you asked for, but if your
| wishes are small, you might just get what you want.
|
| ThePrimeagen reviewed this article[1] a few days ago, and (I
| think) that's where I heard about it. (Can't re-watch it now,
| it's members only) 8(
|
| [1] https://medium.com/@drewwww/the-gambler-and-the-
| genie-08491d...
| fudged71 wrote:
| "You are an expert 10x software developer. Make me a billion
| dollar app." Yeah this checks out
| anythingworks wrote:
| that's a really good analogy! It feels like a wicked joke that
| LLMs behave in such a way that they're both intelligent and
| stupid at the same time
| fnord77 wrote:
| His claim that governments don't use AI or are behind the
| curve is not accurate.
|
| Modern military drones are very much AI agents
| practal wrote:
| Great talk, thanks for putting it online so quickly. I liked the
| idea of making the generation / verification loop go brrr, and
| one way to do this is to make verification not just a human task,
| but a machine task, where possible.
|
| Yes, I am talking about formal verification, of course!
|
| That also goes nicely together with "keeping the AI on a tight
| leash". It seems to clash though with "English is the new
| programming language". So the question is, can you hide the
| formal stuff under the hood, just like you can hide a calculator
| tool for arithmetic? Use informal English on the surface, while
| some of it is interpreted as a formal expression, put to work,
| and then reflected back in English? I think that is possible, if
| you have a formal language and logic that is flexible enough, and
| close enough to informal English.
|
| Yes, I am talking about abstraction logic [1], of course :-)
|
| So the goal would be to have English (German, ...) as the ONLY
| programming language, invisibly backed underneath by abstraction
| logic.
|
| [1] http://abstractionlogic.com
| AdieuToLogic wrote:
| > So the question is, can you hide the formal stuff under the
| hood, just like you can hide a calculator tool for arithmetic?
| Use informal English on the surface, while some of it is
| interpreted as a formal expression, put to work, and then
| reflected back in English?
|
| The problem with trying to make "English -> formal language ->
| (anything else)" work is that informality is, by definition,
| not a formal specification and therefore subject to ambiguity.
| The inverse is not nearly as difficult to support.
|
| Much like how a property in an API initially defined as being
| optional cannot be made mandatory without potentially breaking
| clients, whereas making a mandatory property optional can be
| backward compatible. IOW, the cardinality of "0 .. 1" is a
| strict superset of "1".
| practal wrote:
| > The problem with trying to make "English -> formal language
| -> (anything else)" work is that informality is, by
| definition, not a formal specification and therefore subject
| to ambiguity. The inverse is not nearly as difficult to
| support.
|
| Both directions are difficult and important. How do you
| determine when going from formal to informal that you got the
| right informal statement? If you can judge that, then you can
| also judge if a formal statement properly represents an
| informal one, or if there is a problem somewhere. If you
| detect a discrepancy, tell the user that their English is
| ambiguous and that they should be more specific.
| amelius wrote:
| LLMs are pretty good at writing small pieces of code, so I
| suppose they can very well be used to compose some formal
| logic statements.
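|
| For instance, the kind of thing an LLM could emit and a proof
| checker could verify is tiny (a made-up Lean example, nothing
| to do with abstraction logic specifically):
|
|     -- "order of addition doesn't matter", stated formally and
|     -- checked by Lean rather than taken on trust
|     example (a b : Nat) : a + b = b + a := Nat.add_comm a b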
| singularity2001 wrote:
| lean 4/5 will be a rising star!
| practal wrote:
| You would definitely think so, Lean is in a great position
| here!
|
| I am betting though that type theory is not the right logic
| for this, and that Lean can be leapfrogged.
| gylterud wrote:
| I think type theory is exactly right for this! Being so
| similar to programming languages, it can piggy back on the
| huge amount of training the LLMs have on source code.
|
| I am not sure Lean in particular is the right language; there
| might be challengers rising (or old incumbents like Agda or
| Rocq could find a boost). But type theory definitely has the
| most robust formal systems at the moment.
| practal wrote:
| > Being so similar to programming languages
|
| I think it is more important to be close to English than
| to programming languages, because that is the critical
| part:
|
| _" As close to a programming language as necessary, as
| close to English as possible"_
|
| is the goal, in my opinion, without sacrificing
| constraints such as simplicity.
| gylterud wrote:
| Why? Why would the language used to express proof of
| correctness have anything to do with English?
|
| English was not developed to facilitate exact and formal
| reasoning. In natural language ambiguity is a feature, in
| formal languages it is unwanted. Just look at maths. The
| reason for all the symbols is not only brevity but also
| precision. (I don't think the symbolism of mathematics is
| something to strive for though; we can use sensible names
| in our languages, but the structure will need to be
| formal and specialised to the domain.)
|
| I think there could be meaningful work done to render the
| statements of the results automatically into (a
| restricted subset of) English for ease of human
| verification that the results proven are actually the
| results one wanted. I know there has been work in this
| direction. This might be viable. But I think the actual
| language of expressing results and proofs would have to
| be specialised for precision. And there I think type
| theory has the upper hand.
| practal wrote:
| My answer is already in my previous comment: if you have
| two formal languages to choose from, you want the one
| closer to natural language, because it will be easier to
| see if informal and formal statements match. Once you are
| in formal land, you can do transformations to other
| formal systems as you like, as these can be machine-
| verified. Does that make sense?
| skydhash wrote:
| Not really. You want the one more aligned to the domain.
| Think music notation. Programming languages have evolved more
| to match abstractions that help with software engineering
| principles than to help with layman understanding. (Take
| SQL and the relational model: they have more relation
| to each other than the former has to natural languages.)
| polivier wrote:
| > if you have two formal languages to choose from, you
| want the one closer to natural language
|
| Given the choice I'd rather use Python than COBOL even
| though COBOL is closer to English than Python.
| voidhorse wrote:
| Why? By the completeness theorem, shouldn't first order
| logic already be sufficient?
|
| The calculus of constructions and other approaches are
| already available and proven. I'm not sure why we'd need a
| special logic for LLMs unless said logic somehow accounts
| for their inherently stochastic tendencies.
| practal wrote:
| If first-order logic is already sufficient, why are most
| mature systems using a type theory? Because type theory
| is more ergonomic and practical than first-order logic. I
| just don't think that type theory is ergonomic and
| practical _enough_. That is not a special judgement with
| respect to LLMs, I want a better logic for myself as
| well. This has nothing to do with "stochastic
| tendencies". If it is easier to use for humans, it will
| be easier for LLMs as well.
| tylerhou wrote:
| Completeness for FOL specifically says that semantic
| implications (in the language of FOL) have syntactic
| proofs. There are many concepts that are inexpressible in
| FOL (for example, the class of all graphs which contain a
| cycle).
| kordlessagain wrote:
| This thread perfectly captures what Karpathy was getting at.
| We're witnessing a fundamental shift where the interface to
| computing is changing from formal syntax to natural language.
| But you can see people struggling to let go of the formal
| foundations they've built their careers on.
| skydhash wrote:
| Not really. There's a problem to be solved, and the solution
| is always best expressed in formal notation, because we can
| then let computers do it and not worry about it.
|
| We already have natural languages for human systems and the
| only way it works is because of shared metaphors and
| punishment and rewards. Everyone is incentivized to do a good
| job.
| neuronic wrote:
| It's called gatekeeping and the gatekeepers will be the ones
| left in the dust. This has been proven time and time again.
| Better learn to go with the flow - judging LLMs on linear
| improvements or even worse on today's performance is a fool's
| errand.
|
| Even if improvements level off and start plateauing, things
| will still get better and for careful guided, educated use
| LLMs have already become a great accelerator in many ways.
| StackOverflow is basically dead now which in itself is a
| fundamental shift from just 3-4 years ago.
| norir wrote:
| Have you thought through the downsides of letting go of these
| formal foundations that have nothing to do with job
| preservation? This comes across as a rather cynical
| interpretation of the motivations of those who have concerns.
| mkleczek wrote:
| This is why I call all this AI stuff BS.
|
| Using a formal language is a feature, not a bug. It is a
| cornerstone of all human engineering and scientific activity
| and is the _reason_ why these disciplines are successful.
|
| What you are describing (ie. ditching formal and using
| natural language) is moving humanity back towards magical
| thinking, shamanism and witchcraft.
| diggan wrote:
| > is the _reason_ why these disciplines
|
| Would you say that ML isn't a successful discipline? ML is
| basically balancing between "formal language"
| (papers/algorithms) and "non-deterministic outcomes"
| (weights/inference) yet it seems useful in a wide range of
| applications, even if you don't think about LLMs at all.
|
| > towards magical thinking, shamanism and witchcraft.
|
| I kind of feel like if you want to make a point about how
| something is bullshit, you probably don't want to call it
| "magical thinking, shamanism and witchcraft" because no
| matter how good your point is, if you end up basically re-
| inventing the witch hunt, how is what you say not bullshit,
| just in the other way?
| mkleczek wrote:
| > Would you say that ML isn't a successful discipline? ML
| is basically balancing between "formal language"
| (papers/algorithms) and "non-deterministic outcomes"
| (weights/inference) yet it seems useful in a wide range
| of applications
|
| Usefulness of LLMs has yet to be proven. So far there is
| more marketing in it than actual, real-world results,
| especially compared to civil and mechanical engineering,
| maths, electrical engineering and a plethora of disciplines
| and methods that bring real-world results.
| diggan wrote:
| > Usefulness of LLMs has yet to be proven.
|
| What about ML (Machine Learning) as a whole? I kind of
| wrote ML instead of LLMs just to avoid this specific
| tangent. Are you feelings about that field the same?
| mkleczek wrote:
| > What about ML (Machine Learning) as a whole? I kind of
| wrote ML instead of LLMs just to avoid this specific
| tangent. Are you feelings about that field the same?
|
| No - I only expressed my thoughts about using natural
| language for computing.
| lelanthran wrote:
| > Would you say that ML isn't a successful discipline?
|
| Not yet it isn't; all I am seeing are tools to replace
| programmers and artists :-/
|
| Where are the tools to take in 400 recipes and spit out
| all of them in a formal structure (a poster upthread
| literally gave up on trying to get an LLM to do this)?
| Tools that can replace the 90% of office staff who _aren't_
| programmers?
|
| Maybe it's a successful low-code industry right now, it's
| not really a successful AI industry.
| diggan wrote:
| > Not yet it isn't; all I am seeing are tools to replace
| programmers and artists :-/
|
| You're missing a huge part of the ecosystem, ML is so
| much more than just "generative AI", which seems to be
| the extent of your experience so far.
|
| Weather predictions, computer vision, speech recognition,
| medicine research and more are already improved by
| various machine learning techniques, and already was
| before the current LLM/generative AI. Wikipedia has a
| list of ~50 topics where ML is already being used, in
| production, today
| ( https://en.wikipedia.org/wiki/Machine_learning#Applications )
| if you're feeling curious about exploring the ecosystem more.
| lelanthran wrote:
| > You're missing a huge part of the ecosystem, ML is so
| much more than just "generative AI", which seems to be
| the extent of your experience so far.
|
| I'm not missing anything; I'm saying the current boom is
| being fueled by claims of "replacing workers", but the
| only class of AI being funded to do that are LLMs, and
| the only class of worker that _might_ get replaced are
| programmers and artists.
|
| Karpathy's video, and this thread, are not about the un-
| hyped ML stuff that has been employed in various
| disciplines since 2010 and has not been proposed as a
| replacement for workers.
| skydhash wrote:
| ML is basically greedy determinism. If we can't get the
| correct answer, we try to get one that is most likely
| wrong, but gives us enough information that we can make a
| decision. So the answer is not useful, but its nature is.
|
| If we take object detection in computer vision, the
| detection by itself is not accurate, but it helps with
| resource management. Instead of expensive continuous
| monitoring, we now have something cheaper which moves the
| expensive part to being discrete.
|
| But something deterministic would always be preferable
| because you only need to do verification once.
| jason_oster wrote:
| > What you are describing (ie. ditching formal and using
| natural language) is moving humanity back towards magical
| thinking ...
|
| "Any sufficiently advanced technology is indistinguishable
| from magic."
| discreteevent wrote:
| indistinguishable from magic != magic
| bwfan123 wrote:
| > Using a formal language is a feature, not a bug. It is a
| cornerstone of all human engineering and scientific
| activity and is the _reason_ why these disciplines are
| successful
|
| A similar argument was also made by Dijkstra in this brief
| essay [1] - which is timely for this debate on why
| "English is the new programming language" is not well-
| founded.
|
| I quote a brief snippet here:
|
| "The virtue of formal texts is that their manipulations, in
| order to be legitimate, need to satisfy only a few simple
| rules; they are, when you come to think of it, an amazingly
| effective tool for ruling out all sorts of nonsense that,
| when we use our native tongues, are almost impossible to
| avoid."
|
| [1] https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/E
| WD667...
| uncircle wrote:
| > This thread perfectly captures what Karpathy was getting
| at. We're witnessing a fundamental shift where the interface
| to computing is changing from formal syntax to natural
| language.
|
| Yes, telling a subordinate in natural language what you
| need is called being a product manager. Problem is, the
| subordinate has encyclopedic knowledge but is also
| _extremely_ dumb in many aspects.
|
| I guess this is good for people that got into CS and hate the
| craft so prefer doing management, but in many cases you still
| need someone on your team with an IQ higher than room
| temperature to deliver a product. The only "fundamental"
| shift here is killing the entry-level coder at the big corp
| tasked with doing menial and boilerplate tasks, when instead
| you can hire a mechanical replacement from an AI company for
| a few hundred dollars a month.
| sponnath wrote:
| I think the only places where the entry-level coder is
| being killed are corps that never cared about the junior to
| senior pipeline. Some of them love off-shoring too so I'm
| not sure much has changed.
| otabdeveloper4 wrote:
| > We're witnessing a fundamental shift where the interface to
| computing is changing from formal syntax to natural language.
|
| People have said this _every year_ since the 1950's.
|
| No, it is not happening. LLMs won't help.
|
| Writing code is easy; it's understanding the problem domain
| that's hard. LLMs won't help you understand the problem domain
| in a formal manner. (In fact they might make it even more
| difficult.)
| simplify wrote:
| Let's be real, people have said similar things about AI
| too. It was all fluff, until it wasn't.
| megaman821 wrote:
| Yep, that's why I never write anything out using mathematical
| expressions. Natural language only, baby!
| lelanthran wrote:
| > Use informal English on the surface, while some of it is
| interpreted as a formal expression, put to work, and then
| reflected back in English? I think that is possible, if you
| have a formal language and logic that is flexible enough, and
| close enough to informal English.
|
| That sounds like a paradox.
|
| Formal verification can prove that constraints are held.
| English cannot. Mapping between them necessarily requires
| disambiguation. How would you construct such a disambiguation
| algorithm, which must, by its nature, be deterministic?
| redbell wrote:
| > "English is the new programming language."
|
| For those who missed it, here's the viral tweet by Karpathy
| himself: https://x.com/karpathy/status/1617979122625712128
| throwaway314155 wrote:
| Referenced in the video, of course. Not that everyone should
| watch a 40-minute-long video before commenting, but his
| reaction to the "meme" that vibe coding became - when his
| tweet was intended as more of a shower thought - is worth
| checking out.
| hgl wrote:
| It's fascinating to think about what true GUI for LLM could be
| like.
|
| It immediately makes me think of an LLM that can generate a
| customized GUI for the topic at hand, which you can interact
| with in a non-linear way.
| nbbaier wrote:
| I love this concept and would love to know where to look for
| people working on this type of thing!
| dpkirchner wrote:
| Like a HyperCard application?
| necrodome wrote:
| We (https://vibes.diy/) are betting on this
| diggan wrote:
| Border-line off-topic, but since you're flagrantly self-
| promoting, might as well add some more rule breakage to it.
|
| You know websites/apps that let you enter text/details and
| then don't display the sign in/up screen until you submit,
| so you feel like "Oh, but I already filled it out, might as
| well sign up"?
|
| They really suck, big time! It's disingenuous, misleading
| and wastes people's time. I had no interest in using your
| thing for real, but thought I'd try it out, potentially
| leave some feedback, but this bait-and-switch just made the
| whole thing feel sour and I'll probably try to actively
| avoid this and anything else I feel is related to it.
| necrodome wrote:
| Thanks for the benefit of the doubt. I typed that in a
| hurry, and it didn't come out the way I intended.
|
| We had the idea that there's a class of apps [1] that
| could really benefit from our tooling - mainly Fireproof,
| our local-first database, along with embedded LLM calling
| and image generation support. The app itself is open
| source, and the hosted version is free.
|
| Initially, there was no login or signup - you could just
| generate an app right away. We knew that came with risks,
| but we wanted to explore what a truly frictionless
| experience could look like. Unfortunately, it didn't take
| long for our LLM keys to start getting scraped, so the
| next best step was to implement rate limiting in the
| hosted version.
|
| [1] https://tools.simonwillison.net/
| diggan wrote:
| My complaint isn't about that you need to protect it with
| a login/signup, but where in the process you put that
| login/signup.
|
| Put it before letting people enter text, rather than once
| they've entered text and pressed the button, and people
| won't feel misled anymore.
| karpathy wrote:
| Fun demo of an early idea was posted by Oriol just yesterday :)
|
| https://x.com/OriolVinyalsML/status/1935005985070084197
| aprilthird2021 wrote:
| This is crazy cool, even if not necessarily the best use case
| for this idea
| hackernewds wrote:
| it's impressive, but it seems like a crappier UX in that none
| of the patterns can really be memorized
| suddenlybananas wrote:
| Having different documents come up every time you go into the
| documents directory seems hellishly terrible.
| falcor84 wrote:
| It's a brand of terribleness I've somewhat gotten used to,
| opening Google Drive every time, when it takes me to the
| "Suggested" tab. I can't recall a single time when it had
| the document I care about anywhere close to the top.
|
| There's still nothing that beats the UX of Norton
| Commander.
| sensanaty wrote:
| [flagged]
| danielbln wrote:
| Maybe we can collect all of this salt and operate a Thorium
| reactor with it, this in turn can then power AI.
| sensanaty wrote:
| We'll need to boil a few more lakes before we get to that
| stage I'm afraid, who needs water when you can have your
| AI hallucinate some for you after all?
| TeMPOraL wrote:
| Who needs water when all these hot takes come from
| sources so dense, they're about to collapse into black
| holes.
| sensanaty wrote:
| Is me not wanting the UI of my OS to shift with every
| mouse click a hot take? If me wanting to have the
| consistent "When I click here, X happens" behavior
| instead of the "I click here and I'm Feeling Lucky
| happens" behavior is equal to me being dense, so be it I
| guess.
| TeMPOraL wrote:
| No. But you interpreting and evaluating the demo in
| question as suggesting the things you described -
| frankly, yes. It takes a deep gravity well to miss a
| point this clear from this close.
|
| It's a tech demo. It shows you it's _possible_ to do
| these things live, in real time (and to back Karpathy's
| point about tech spread patterns, it's accessible to you
| and me right now). It's not saying it's a good idea - but
| there are obvious seeds of good ideas there. For one, it
| shows you a vision of an OS or software you can trivially
| extend yourself on the fly. "I wish it did X", bam, it
| does. And no one says it has to be non-deterministic each
| time you press some button. It can just fill what's
| missing and make additions permanent, fully deterministic
| after creation.
| dang wrote:
| " _Please don 't fulminate._"
|
| " _Don 't be curmudgeonly. Thoughtful criticism is fine,
| but please don't be rigidly or generically negative._"
|
| " _Please don 't post shallow dismissals, especially of
| other people's work. A good critical comment teaches us
| something._"
|
| " _Please respond to the strongest plausible interpretation
| of what someone says, not a weaker one that 's easier to
| criticize._"
|
| https://news.ycombinator.com/newsguidelines.html
| superfrank wrote:
| On one hand, I'm incredibly impressed by the technology
| behind that demo. On the other hand, I can't think of many
| things that would piss me off more than a non-deterministic
| operating system.
|
| I like my tools to be predictable. Google search trying to
| predict that I want the image or shopping tag based on my
| query already drives me crazy. If my entire operating system
| did that, I'm pretty sure I'd throw my computer out a window.
| iLoveOncall wrote:
| > incredibly impressed by the technology behind that demo
|
| An LLM generating some HTML?
| superfrank wrote:
| At a speed that feels completely seamless to navigate
| through. Yeah, I'm pretty impressed by that.
| spamfilter247 wrote:
| My takeaway from the demo is less that "it's different each
| time" and more that "it can be different for different users
| and their styles of operating" - a power user can now see a
| different Settings UI than a basic user, and it can be
| generated in real time based on the persona context of the
| user.
|
| Example use case (chosen specifically for tech): An IDE UI
| that starts basic, and exposes functionality over time as the
| human developer's skills grow.
| superconduct123 wrote:
| That looks both cool and infuriating
| throwaway314155 wrote:
| I would bet good money that many of the functions they chose
| not to drill down into (such as settings -> volume) do
| nothing at all or cause an error.
|
| It's a frontend generator. It's fast. That's cool. But it's
| being pitched as a functioning OS generator, and I can't help
| but think it isn't one, given the failure rates for those
| sorts of tasks. Further, the success rates for HTML generation
| probably _are_ good enough for a Holmes-esque (perhaps too
| harsh) rugpull (again, too harsh) demo.
|
| A cool glimpse into what the future might look like in any
| case.
| cjcenizal wrote:
| My friend Eric Pelz started a company called Malleable to do
| this very thing: https://www.linkedin.com/posts/epelz_every-
| piece-of-software...
| jonny_eh wrote:
| An ever-shifting UI sounds unlearnable, and therefore unusable.
| dang wrote:
| It wouldn't be unlearnable if it fits the way the user is
| already thinking.
| guappa wrote:
| AI is not mind reading.
| NitpickLawyer wrote:
| A sufficiently advanced prediction engine is
| indistinguishable from mind reading :D
| dang wrote:
| Behavioral patterns are not unpredictable. Who knows how
| far an LLM could get by pattern-matching what a user is
| doing and generating a UI to make it easier. Since the
| user could immediately say whether they liked it or not,
| this could turn into a rapid and creative feedback loop.
| OtherShrezzing wrote:
| A mixed ever-shifting UI can be excellent though. So you've
| got some tools which consistently interact with UI
| components, but the UI itself is altered frequently.
|
| Take for example world-building video games like Cities
| Skylines / Sim City or procedural sandboxes like Minecraft.
| There are 20-30 consistent buttons (tools) in the game's UX,
| while the rest of the game is an unbounded ever-shifting UI.
| skydhash wrote:
| The rest of the game is very deterministic where its state
| is controlled by the buttons. The slight variation is
| caused by the simulation engine and follows consistent
| patterns (you can't have a building on fire if there's no
| building yet).
| sotix wrote:
| Like Spotify ugh
| 9rx wrote:
| Tools like v0 are a primitive example of what the above is
| talking about. The UI maintains familiar conventions, but is
| laid out dynamically based on surrounding context. I'm sure
| there are still weird edge cases, but for the most part
| people have no trouble figuring out how to use the output of
| such tools already.
| semi-extrinsic wrote:
| Humans are shit at interacting with systems in a non-linear
| way. Just look at Jupyter notebooks and the absolute mess that
| arises when you execute code blocks in arbitrary order.
| stoisesky wrote:
| This talk https://www.youtube.com/watch?v=MbWgRuM-7X8 explores
| the idea of generative / malleable personal user interfaces
| where LLMs can serve as the gateway to program how we want our
| UI to be rendered.
| stuartmemo wrote:
| It's probably Jira. https://medium.com/question-park/all-
| aboard-the-ai-train-b03...
| bedit wrote:
| I love the "people spirits" analogy. For casual tasks like
| vibecoding or boiling an egg, LLM errors aren't a big deal. But
| for critical work, we need rigorous checks--just like we do with
| human reasoning. That's the core of empirical science: we expect
| fallibility, so we verify. A great example is how early migration
| theories based on pottery were revised with better data like
| ancient DNA (see David Reich). Letting LLMs judge each other
| without solid external checks misses the point--leaderboard-style
| human rankings are often just as flawed.
| nilirl wrote:
| Where do these analogies break down?
|
| 1. Similar cost structure to electricity, but non-essential
| utility (currently)?
|
| 2. Like an operating system, but with non-determinism?
|
| 3. Like programming, but ...?
|
| Where does the programming analogy break down?
| rudedogg wrote:
| > programming
|
| The programming analogy is convenient but off. The joke has
| always been "the computer only does exactly what you tell it to
| do!" regarding logic bugs. Prompts and LLMs most certainly do
| not work like that.
|
| I loved the parallels with modern LLMs and time sharing he
| presented though.
| diggan wrote:
| > Prompts and LLMs most certainly do not work like that.
|
| It quite literally works like that. The computer is now OS +
| user-land + LLM runner + ML architecture + weights + system
| prompt + user prompt.
|
| Taken together, and since you're adding in probabilities (by
| using ML/LLMs), you're quite literally getting "the computer
| only does exactly what you tell it to do!", it's just that we
| have added "but make slight variations to what tokens you
| select next" (temperature>0.0) sometimes, but it's still the
| same thing.
|
| Just like when you tell the computer to create encrypted
| content by using some seed. You're getting exactly what you
| asked for.
| politelemon wrote:
| only in English, and also non-deterministic.
| malux85 wrote:
| Yeah, wherever possible I try to have the llm answer me in
| Python rather than English (especially when explaining new
| concepts)
|
| English is soooooo ambiguous
| falcor84 wrote:
| For what it's worth, I've been using it to help me learn
| math, and I added to my rules an instruction that it should
| always give me an example in Python (preferably sympy)
| whenever possible.
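|
| As a small illustration of what that rule yields (just an
| example of the kind of output I'd expect, not a quote from
| the model):
|
|     import sympy as sp
|
|     x = sp.symbols('x')
|     f = sp.sin(x) * sp.exp(x)
|     df = sp.diff(f, x)
|     print(df)                    # exp(x)*sin(x) + exp(x)*cos(x)
|     print(sp.integrate(df, x))   # recovers exp(x)*sin(x)
|
| Having a runnable check next to the prose explanation makes
| it much easier to verify the math instead of trusting it.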
| PeterStuer wrote:
| Define non-essential.
|
| The way I see dependency in office ("knowledge") work:
|
| - pre-(computing) history. We are at the office, we work
|
| - dawn of the pc: my computer is down, work halts
|
| - dawn of the lan: the network is down, work halts
|
| - dawn of the Internet: the Internet connection is down, work
| halts (<- we are basically all here)
|
| - dawn of the LLM: ChatGPT is down, work halts (<- for many, we
| are here already)
| nilirl wrote:
| I see your point. It's nearing essential.
| sothatsit wrote:
| I find Karpathy's focus on tightening the feedback loop between
| LLMs and humans interesting, because I've found I am the happiest
| when I extend the loop instead.
|
| When I have tried to "pair program" with an LLM, I have found it
| incredibly tedious, and not that useful. The insights it gives me
| are not that great if I'm optimising for response speed, and it
| just frustrates me rather than letting me go faster. Worse, often
| my brain just turns off while waiting for the LLM to respond.
|
| OTOH, when I work in a more async fashion, it feels freeing to
| just pass a problem to the AI. Then, I can stop thinking about it
| and work on something else. Later, I can come back to find the AI
| results, and I can proceed to adjust the prompt and re-generate,
| to slightly modify what the LLM produced, or sometimes to just
| accept its changes verbatim. I really like this process.
| geeunits wrote:
| I would venture that 'tightening the feedback loop' isn't
| necessarily 'increasing the number of back and forth prompts'-
| and what you're saying you want is ultimately his argument.
| i.e. if integral enough it can almost guess what you're going
| to say next...
| sothatsit wrote:
| I specifically do not want AI as an auto-correct, doing auto-
| predictions while I am typing. I find this interrupts my
| thinking process, and I've never been bottlenecked by typing
| speed anyway.
|
| I want AI as a "co-worker" providing an alternative
| perspective or implementing my specific instructions, and
| potentially filling in gaps I didn't think about in my
| prompt.
| jwblackwell wrote:
| Yeah I am currently enjoying giving the LLM relatively small
| chunks of code to write and then asking it to write
| accompanying tests. While I focus on testing the product
| myself. I then don't even bother to read the code it's written
| most of the time
| dmitrijbelikov wrote:
| I think that Andrej presents "Software 3.0" as a revolution, but
| in essence it is a natural evolution of abstractions.
|
| Abstractions don't eliminate the need to understand the
| underlying layers - they just hide them until something goes
| wrong.
|
| Software 3.0 is a step forward in convenience. But it is not a
| replacement for developers with a solid foundation; it is a
| tool for acceleration, amplification and scaling.
|
| If you know what is under the hood -- you are irreplaceable. If
| you do not know -- you become dependent on a tool that you do not
| always understand.
| poorcedural wrote:
| Foundational programmers form the base of where the seed can
| grow.
|
| In a way programmers found where our roots grow, they can not
| find your limits.
|
| Software 3.0 is a step into a different light, where software
| finds its own limits.
|
| If we know where they are rooted, we will merge their best
| attempts. Only because we appreciate their resultant behavior.
| ast0708 wrote:
| Should we not treat LLMs more as a UX feature to interact with a
| domain specific model (highly contextual), rather than expecting
| LLMs to provide the intelligence needed for software to act as
| a partner to humans?
| guappa wrote:
| He's selling something.
| rvz wrote:
| Someone is thinking.
| alightsoul wrote:
| why does vibe coding still involve any code at all? why can't an
| AI directly control the registers of a computer processor and
| graphics card, controlling a computer directly? why can't it draw
| on the screen directly, connected directly to the rows and
| columns of an LCD screen? what if an AI agent was implemented in
| hardware, with a processor for AI, a normal computer processor
| for logic, and a processor that correlates UI elements to touches
| on the screen? and a network card, some RAM for temporary stuff
| like UI elements and some persistent storage for vectors that
| represent UI elements and past conversations.
| flumpcakes wrote:
| I'm not sure this makes sense as a question. Registers are
| 'controlled' by running code for a given state. An AI can write
| code that changes registers, as all code does in operation. An
| AI can't directly 'control registers' in any other way, just as
| you or I can't.
| singularity2001 wrote:
| what he means is why are the tokens not directly machine code
| tokens
| flumpcakes wrote:
| What is meant by a 'machine code token'? Ultimately a
| processor needs assembly code as input to do anything.
| Registers are set by assembly. Data is read by assembly.
| Hardware is managed through assembly (for example by
| setting bits in memory). Either I have a complete
| misunderstanding on what this thread is talking about, or
| others are commenting with some fundamental assumptions
| that aren't correct.
| alightsoul wrote:
| I would like to make an AI agent that directly interfaces
| with a processor by setting bits in a processor register,
| thus eliminating the need for even assembly code or any kind
| of code. The only software you would ever need would be the
| AI.
| shakna wrote:
| That's called a JIT compiler. And ignoring how bad an idea
| blending those two... It wouldn't be that difficult a task.
|
| The hardest part of a JIT is the safety aspect. And AI
| already violates most of that.
| alightsoul wrote:
| The safety part will probably be either solved, a non-issue,
| or ignored, similarly to how GPT-3 was often seen as
| dangerous before ChatGPT was released. Some people who
| have only ever vibe coded are finding jobs today,
| ignoring safety entirely and lacking any notion of it or
| what it means. They just copy-paste output from ChatGPT
| or an agentic IDE. To me it's JIT already, with extra
| steps. Or companies have pivoted their software engineers
| to vibe coding most of the time, so they don't even touch
| code anymore, which is again JIT with extra steps.
| shakna wrote:
| As "jit" to you means running code, and not "building and
| executing machine code", maybe you could vibe code this.
| And enjoy the segfaults.
| guappa wrote:
| In a way he's making sense. If the "code" is the prompt,
| the output of the llm is an intermediate artifact, like
| the intermediate steps of gcc.
|
| So why should we still need gcc?
|
| The answer, of course, is that we need it because the llm's
| output is shit 90% of the time and debugging assembly or
| binary directly is even harder, so putting aside the
| difficulties of training the model, the output would be
| unusable.
| shakna wrote:
| Probably too much snark from me. But the gulf between
| interpreter and compiler can be decades of work, often
| discovering new mathematical principles along the way.
|
| The idea that you're fine to risk everything, in the way
| agentic things allow [0], and _want_ that messing around
| with raw memory is... A return to DOS' crashes, but with
| HAL along for the ride.
|
| [0] https://msrc.microsoft.com/update-
| guide/vulnerability/CVE-20...
| guappa wrote:
| Ah don't worry, llms are a return to crashes as it is :)
|
| The other day it managed to produce code that made python
| segfault.
| flumpcakes wrote:
| It's not a JIT. A JIT produces assembly. You can't "set
| registers" or do anything useful without assembly code
| running on the processor.
| flumpcakes wrote:
| This makes no sense at all. You can't set registers without
| assembly code. If you could set registers without assembly
| code then it would be pointless as the registers wouldn't
| be 'running' against anything.
| birn559 wrote:
| Because any precise description of what the computer is
| supposed to do is already code as we know it. AI can fill in
| the gaps between natural language and programming by guessing
| and because you don't always care about the "how" only about
| the "what". The more you care about the "how" you have to
| become more precise in your language to reduce the guess work
| of the AI to the point that your input to the AI is already
| code.
|
| The question is: how much do we really care about the "how",
| even when we think we care about it? Modern programming
| language don't do guessing work, but they already abstract away
| quite a lot of the "how".
|
| I believe that's the original argument in favor of coding in
| assembler and that it will stay relevant.
|
| Following this argument, what AI is really missing is
| determinism, to a large extent. I can't just save the input I
| have given to an AI and be sure that it will produce the exact
| same output a year from now.
| alightsoul wrote:
| With vibe coding, I am under the impression that the only
| thing that matters for vibe coders is whether the output is
| good enough in the moment to fulfill a desire. For companies
| going AI first that's how it seems to be done. I see people
| in other places and those people have lost interest in the
| "how"
| therein wrote:
| All you need is a framebuffer and AI.
| abhaynayar wrote:
| Nice try, AI.
| belter wrote:
| Painful to watch. The new tech generation deserves better than
| hyped presentations from tech evangelists.
|
| This reminds me of the Three Amigos and Grady Booch evangelizing
| the future of software while ignoring the terrible output from
| Rational Software and the Unified Process.
|
| At least we got acknowledgment that self-driving remains
| unsolved: https://youtu.be/LCEmiRjPEtQ?t=1622
|
| And Waymo still requires extensive human intervention. Given
| Tesla's robotaxi timeline, this should crash their stock
| valuation...but likely won't.
|
| You can't discuss "vibe coding" without addressing security
| implications of the produced artifacts, or the fact that you're
| building on potentially stolen code, books, and copyrighted
| training data.
|
| And what exactly is Software 3.0? It was mentioned early then
| lost in discussions about making content "easier for agents."
| digianarchist wrote:
| In his defense he clearly articulated that meaningful change
| has not yet been achieved and could be a decade away. Even
| pointing to specific examples of LLMs failing to count letters
| and do basic arithmetic.
|
| What I find absent is where do we go from LLMs? More hardware,
| more training. "This isn't the scientific breakthrough you're
| looking for".
| nottorp wrote:
| In the era of AI and illiteracy...
| abdullin wrote:
| Tight feedback loops are the key in working productively with
| software. I see that in codebases up to 700k lines of code
| (legacy 30yo 4GL ERP systems).
|
| The best part is that AI-driven systems are fine with running
| even more tight loops than what a sane human would tolerate.
|
| Eg. running full linting, testing and E2E/simulation suite after
| any minor change. Or generating 4 versions of PR for the same
| task so that the human could just pick the best one.
| OvbiousError wrote:
| I don't think the human is the problem here, but the time it
| takes to run the full testing suite.
| Byamarro wrote:
| I work in web dev, so people sometimes hook code formatting
| into a git commit hook, or sometimes even run it on file save.
| The tests are problematic though. If you work on a huge project
| it's a no-go idea at all. If you work on a medium one, the
| tests are long enough to block you, but short enough that you
| can't focus on anything else in the meantime.
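|
| For the formatting part, a minimal pre-commit hook is enough;
| here is a sketch in Python, with black standing in for
| whatever formatter the project actually uses:
|
|     #!/usr/bin/env python3
|     # .git/hooks/pre-commit: format staged files, then re-stage
|     import subprocess, sys
|
|     staged = subprocess.run(
|         ["git", "diff", "--cached", "--name-only",
|          "--diff-filter=ACM"],
|         capture_output=True, text=True, check=True,
|     ).stdout.split()
|     py_files = [f for f in staged if f.endswith(".py")]
|     if py_files:
|         subprocess.run(["black", "-q", *py_files], check=True)
|         subprocess.run(["git", "add", *py_files], check=True)
|     sys.exit(0)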
| diggan wrote:
| It is kind of a human problem too, although that the full
| testing suite takes X hours to run is also not fun, but it
| makes the human problem larger.
|
| Say you're Human A, working on a feature. Running the full
| testing suite takes 2 hours from start to finish. Every
| change you do to existing code needs to be confirmed to not
| break existing stuff with the full testing suite, so some
| changes it takes 2 hours before you have 100% understanding
| that it doesn't break other things. How quickly do you lose
| interest, and at what point do you give up to either improve
| the testing suite, or just skip that feature/implement it
| some other way?
|
| Now say you're Robot A working on the same task. The robot
| doesn't care if each change takes 2 hours to appear on their
| screen, the context is exactly the same, and they're still "a
| helpful assistant" 48 hours later when they still try to get
| the feature put together without breaking anything.
|
| If you're feeling brave, you start Robot B and C at the same
| time.
| abdullin wrote:
| This is the workflow that ChatGPT Codex demonstrates
| nicely. Launch any number of <<robotic>> tasks in parallel,
| then go on your own. Come back later to review the results
| and pick good ones.
| diggan wrote:
| Well, they're demonstrating it _somewhat_, it's more of
| a prototype today. The first tell is the low limit; I think
| the longest task for me has been 15 minutes before it gives
| up. The second tell is still using a chat UI, which is
| simple to implement and familiar, but also kind of lazy.
| There should be a better UX, especially with the new
| variations they just added. Off the top of my head, some
| graph-like UX might have been better.
| abdullin wrote:
| I guess, it depends on the case and the approach.
|
| It works really nice with the following approach
| (distilled from experiences reported by multiple
| companies)
|
| (1) Augment codebase with explanatory texts that describe
| individual modules, interfaces and interactions
| (something that is needed for the humans anyway)
|
| (2) Provide Agent.MD that describes the
| approach/style/process that the AI agent must take. It
| should also describe how to run all tests.
|
| (3) Break down the task into smaller features. For each
| feature - ask first to write a detailed implementation
| plan (because it is easier to review the plan than 1000
| lines of changes spread across a dozen files)
|
| (4) Review the plan and ask to improve it, if needed.
| When ready - ask to draft an actual pull request
|
| (5) The system will automatically use all available
| tests/linting/rules before writing the final PR. Verify
| and provide feedback, if some polish is needed.
|
| (6) Launch multiple instances of "write me an
| implementation plan" and "Implement this plan" task, to
| pick the one that looks the best.
|
| This is very similar to git-driven development of large
| codebases by distributed teams.
|
| Edit: added newlines
| diggan wrote:
| > distilled from experiences reported by multiple
| companies
|
| Distilled from my experience, I'd still say that the UX
| is lacking, as sequential chat just isn't the right
| format. I agree with Karpathy that we haven't found the
| right way of interacting with these OSes yet.
|
| Even with what you say, variations were implemented in a
| rush. Once you've iterated with one variation you can not
| at the same time iterate on another variant, for example.
| TeMPOraL wrote:
| Worked in such a codebase for about 5 years.
|
| No one really cares about improving test times. Everyone
| either suffers in private or gets convinced it's all normal
| and look at you weird when you suggest something needs to
| be done.
| diggan wrote:
| There are a few of us around, but it's not a lot, agree. It
| really is an uphill battle trying to get development
| teams to design and implement test suites the same way
| they do with other "more important" code.
| londons_explore wrote:
| The full test suite is probably tens of thousands of tests.
|
| But AI will do a pretty decent job of telling you which tests
| are most likely to fail on a given PR. Just run those ones,
| then commit. Cuts your test time from hours down to seconds.
|
| Then run the full test suite only periodically and
| automatically bisect to find out the cause of any
| regressions.
|
| Dramatically cuts the compute costs of tests too, which in a
| big codebase can easily add up to whole engineers' worth of
| costs.
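|
| A sketch of what that selection step could look like, where
| ask_llm is a placeholder for whatever model call is available
| (not a real API):
|
|     import subprocess
|
|     def likely_failing_tests(ask_llm, base="origin/main"):
|         """Ask a model which tests to run for the current diff."""
|         diff = subprocess.run(
|             ["git", "diff", base, "--stat"],
|             capture_output=True, text=True, check=True,
|         ).stdout
|         answer = ask_llm(
|             "Given this diff, list the test files most likely "
|             "to fail, one per line and nothing else:\n" + diff
|         )
|         lines = answer.splitlines()
|         return [t.strip() for t in lines if t.strip()]
|
| The returned list goes to the test runner instead of the full
| suite; the periodic full run plus bisect catches whatever the
| model missed.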
| tele_ski wrote:
| It's an interesting idea, but reactive, and could cause big
| delays due to bisecting and testing on those regressions.
| There's the 'old' saying that the sooner a bug is found,
| the cheaper it is to fix; it seems weird to intentionally push
| finding side-effect bugs later in the process for the sake of
| faster CI runs. Maybe AI will get there, but it seems too
| aggressive right now to me. But yeah, put the automation
| slider where you're comfortable.
| tlb wrote:
| Yes, and (some near-future) AI is also more patient and
| better at multitasking than a reasonable human. It can make a
| change, submit for full fuzzing, and if there's a problem it
| can continue with the saved context it had when making the
| change. It can work on 100s of such changes in parallel,
| while a human trying to do this would mix up the reasons for
| the change with all the other changes they'd done by the time
| the fuzzing result came back.
|
| LLMs are worse at many things than human programmers, so you
| have to try to compensate by leveraging the things they're
| better at. Don't give up with "they're bad at such and such"
| until you've tried using their strengths.
| HappMacDonald wrote:
| You can't run N bots in parallel with testing between each
| attempt unless you're also running N tests in parallel.
|
| If you could run N tests in parallel, then you could
| probably also run the components of one test in parallel
| and keep it from taking 2 hours in the first place.
|
| To me this all sounds like snake oil to convince people to
| do something they were already doing, but by also spinning
| up N times as many compute instances and burning endless
| tokens along the way. And by the time it's demonstrated
| that it _doesn't_ really offer anything more than doing it
| yourself, well, you've already given them all of your money
| so their job is done.
| abdullin wrote:
| Running tests is already an engineering problem.
|
| In one of the systems (supply chain SaaS) we invested so
| much effort in having good tests in a simulated
| environment, that we could run full-stack tests at kHz.
| Roughly ~5k tests per second or so on a laptop.
| abdullin wrote:
| Humans tend to lack inhumane patience.
| 9rx wrote:
| Unless you are doing something crazy like letting the fuzzer
| run on every change (cache that shit), the full test suite
| taking a long time suggests that either your isolation points
| are _way_ too large or you are letting the LLM cross isolated
| boundaries and "full testing suite" here actually means
| "multiple full testing suites". The latter is an easy fix:
| Don't let it. Force it to stay within a single isolation zone
| just like you'd expect of a human. The former is a lot harder
| to fix, but I suppose ending up there is a strong indicator
| that you can't trust the human picking the best LLM result in
| the first place and that maybe this whole thing isn't a good
| idea for the people in your organization.
| yahoozoo wrote:
| The problem is that every time you run your full automation
| with linting and tests, you're filling up the context window
| more and more. I don't know how people using Claude do it with
| its <300k context window. I get the "your message will exceed
| the length of this chat" message so many times.
| diggan wrote:
| I don't know exactly how Claude works, but the way I work
| around this with my own stuff is prompting it to not display
| full outputs ever, and instead temporary redirect the output
| somewhere then grep from the log-file what it's looking for.
| So a test run outputting 10K lines of test output and one
| failure is easily found without polluting the context with
| 10K lines.
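|
| Roughly the pattern being described, sketched out (pytest and
| the paths are just examples of what the agent might run):
|
|     import re, subprocess
|
|     LOG = "/tmp/test_run.log"
|
|     # run the whole suite, but keep the noisy output on disk
|     with open(LOG, "w") as log:
|         subprocess.run(["pytest", "-q"], stdout=log,
|                        stderr=subprocess.STDOUT)
|
|     # surface only the lines that matter for the model to read
|     hits = [l for l in open(LOG) if re.search(r"FAILED|Error", l)]
|     print("".join(hits[:50]))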
| the_mitsuhiko wrote:
| I started to use sub agents for that. That does not pollute
| the context as much
| abdullin wrote:
| Claude's approach is currently a bit dated.
|
| Cursor.sh agents or especially OpenAI Codex illustrate that a
| tool doesn't need to keep on stuffing the context window with
| irrelevant information in order to make progress on a task.
|
| And if really needed, engineers report that Gemini Pro 2.5
| keeps on working fine within 200k-500k token context. Above
| that - it is better to reset the context.
| latexr wrote:
| > Or generating 4 versions of PR for the same task so that the
| human could just pick the best one.
|
| That sounds awful. A truly terrible and demotivating way to
| work and produce anything of real quality. Why are we doing
| this to ourselves and embracing it?
|
| A few years ago, it would have been seen as a joke to say "the
| future of software development will be to have a million monkey
| interns banging on one million keyboards and submit a million
| PRs, then choose one". Today, it's lauded as a brilliant
| business and cost-saving idea.
|
| We're beyond doomed. The first major catastrophe caused by
| sloppy AI code can't come soon enough. The sooner it happens,
| the better chance we have to self-correct.
| bonoboTP wrote:
| If it's monkeylike quality and you need a million tries, it's
| shit. If you need four tries and one of those is top-tier
| professional programmer quality, then it's good.
| agos wrote:
| if the thing producing the four PRs can't distinguish the
| top tier one, I have strong doubts that it can even produce
| it
| solaire_oa wrote:
| Making 4 PRs for a well-known solution sounds insane,
| yes, but to be the devil's advocate, you could plausibly
| be working with an ambiguous task: "Create 4 PRs with 4
| different dependency libraries, so that I can compare
| their implementations." Technically it wouldn't need to
| pick the best one.
|
| I have apprehension about the future of software
| engineering, but comparison does technically seem like a
| valid use case.
| layer8 wrote:
| The problem is, for any change, you have to understand the
| existing code base to assess the quality of the change in
| the four tries. This means, you aren't relieved from being
| familiar with the code and reviewing everything. For many
| developers this review-only work style isn't an exciting
| prospect.
|
| And it will remain that way until you can delegate
| development tasks to AI with a 99+% success rate so that
| you don't have to review their output and understand the
| code base anymore. At which point developers will become
| truly obsolete.
| solaire_oa wrote:
| Top-tier professional programmer quality is exceedingly,
| impractically optimistic, for a few reasons.
|
| 1. There's a low probability of that in the first place.
|
| 2. You need to be a top-tier professional programmer to
| recognize that type of quality (i.e. a junior engineer
| could select one of the 3 shit PRs)
|
| 3. When it doesn't produce TTPPQ, you wasted tons of time
| prompting and reviewing shit code and still need to
| deliver, net negative.
|
| I'm not doubting the utility of LLMs but the scattershot
| approach just feels like gambling to me.
| zelphirkalt wrote:
| Also as a consequence of (1) the LLMs are trained on
| mediocre code mostly, so they often output mediocre or
| bad solutions.
| diggan wrote:
| > A truly terrible and demotivating way to work and produce
| anything of real quality
|
| You clearly have strong feelings about it, which is fine, but
| it would be much more interesting to know exactly why it
| would terrible and demotivating, and why it cannot produce
| anything of quality? And what is "real quality" and does that
| mean "fake quality" exists?
|
| > million monkey interns banging on one million keyboards and
| submit a million PRs
|
| I'm not sure if you misunderstand LLMs, or the famous
| "monkeys writing Shakespeare" part, but that example is more
| about randomness and infinity than about probabilistic
| machines somewhat working towards a goal with some non-
| determinism.
|
| > We're beyond doomed
|
| The good news is that we've been doomed for a long time, yet
| we persist. If you take a look at how the internet is
| basically held up by duct-tape at this point, I think you'd
| feel slightly more comfortable with how crap absolutely
| everything is. Like 1% of software is actually Good Software
| while the rest barely works on a good day.
| 3dsnano wrote:
| > And what is "real quality" and does that mean "fake
| quality" exists?
|
| I think there is no real quality or fake quality, just
| quality. I am referencing the quality that Pirsig and C.
| Alexander have written about.
|
| It's... qualitative, so it's hard to measure but easy to
| feel. Humans are really good at perceiving it then making
| objective decisions. LLMs don't know what it is (they've
| heard about it and think they know).
| abdullin wrote:
| It is actually funny that current AI+Coding tools benefit
| a lot from domain context and other information along the
| lines of Domain-Driven Design (which was inspired by the
| pattern language of C. Alexander).
|
| A few teams have started incorporating `CONTEXT.MD` into
| module descriptions to leverage this.
| diggan wrote:
| > LLMs don't know what it is
|
| Of course they don't, they're probability/prediction
| machines, they don't "know" anything, not even that Paris
| is the capital of France. What they do "know" is that
| once someone writes "The capital of France is", the most
| likely tokens to come after that, is "Paris". But they
| don't understand the concept, nor anything else, just
| that probably 54123 comes after 6723 (or whatever the
| tokens are).
|
| Once you understand this, I think it's easy to reason
| about _why_ they don't understand code quality, why they
| _couldn't_ ever understand it, and how you can make them
| output quality code regardless.
| bgwalter wrote:
| If "AI" worked (which fortunately isn't the case), humans
| would be degraded to passive consumers in the last domain
| in which they were active creators: thinking.
|
| Moreover, you would have to pay centralized corporations
| that stole all of humanity's intellectual output for
| engaging in your profession. That is terrifying.
|
| The current reality is also terrifying: Mediocre developers
| are enabled to have a 10x volume (not quality). Mediocre
| execs like that and force everyone to use the "AI"
| snakeoil. The profession becomes even more bureaucratic,
| tool oriented and soulless.
|
| People without a soul may not mind.
| diggan wrote:
| > If "AI" worked (which fortunately isn't the case),
| humans would be degraded to passive consumers in the last
| domain in which they were active creators: thinking.
|
| "AI" (depending on what you understand that to be) is
| already "working" for many, including myself. I've
| basically stopped using Google because of it.
|
| > humans would be degraded to passive consumers in the
| last domain in which they were active creators: thinking
|
| Why? I still think (I think at least), why would I stop
| thinking just because I have yet another tool in my
| toolbox?
|
| > you would have to pay centralized corporations that
| stole all of humanity's intellectual output for engaging
| in your profession
|
| Assuming we'll forever be stuck in the "mainframe" phase,
| then yeah. I agree that local models aren't really close
| to SOTA yet, but the ones you can run locally can already
| be useful in a couple of focused use cases, and judging
| by the speed of improvements, we won't always be stuck in
| this mainframe-phase.
|
| > Mediocre developers are enabled to have a 10x volume
| (not quality).
|
| In my experience, which admittedly been mostly in
| startups and smaller companies, this has always been the
| case. Most developers seem to like to produce MORE code
| over BETTER code, I'm not sure why that is, but I don't
| think LLMs will change people's mind about this, in
| either direction. Shitty developers will be shit, with or
| without LLMs.
| zelphirkalt wrote:
| The AI as it is currently, will not come up with that new
| app idea or that clever innovative way of implementing an
| application. It will endlessly rehash the training data
| it has ingested. Sure, you can tell an AI to spit out a
| CRUD, and maybe it will even eventually work in some sane
| way, but that's not innovative and not necessarily a good
| software. It is blindly copying existing approaches to
| implement something. That something is then maybe even
| working, but lacks any special sauce to make it special.
|
| Example: I am currently building a web app. My goal is to
| keep it entirely static, traditional template rendering,
| just using the web as a GUI framework. If I had just told
| the AI to build this, it would have thrown tons of JS at
| the problem, because that is what the mainstream does
| these days, and what it mostly saw as training data. Then
| my back button would most likely no longer work, I would
| not be able to use bookmarks properly, it would not
| automatically have an API as powerful as the web UI,
| usable from any script, and the whole thing would have
| gone to shit.
|
| If the AI tools were as good as I am at what I am doing,
| and I relied upon that, then I would not have spent time
| trying to think of the principles of my app, as I did
| when coming up with it myself. As it is now, the AI would
| not even have managed to prevent duplicate results from
| showing up in the UI, because I had a GPT4 session about
| how to prevent that, and none of the suggested AI answers
| worked and in the end I did what I thought I might have
| to do when I first discovered the issue.
| diggan wrote:
| > The AI as it is currently, will not come up with that
| new app idea or that clever innovative way of
| implementing an application
|
| Who has claimed that they can do that sort of stuff? I
| don't think my comment hints at that, nor does the talk
| in the submission.
|
| You're absolutely right with most of your comment, and
| seem to just be rehashing what Karpathy talks about but
| with different words. Of course it won't create good
| software unless you specify exactly what "good software"
| is for you, and tell it that. Of course it won't know you
| want "traditional static template rendering" unless you
| tell it to. Of course it won't create a API you can use
| from anywhere unless you say so. Of course it'll follow
| what's in the training data. Of course things won't
| automatically implement whatever you imagine your project
| should have, unless you tell it about those features.
|
| I'm not sure if you're just expanding on the talk but
| chose my previous comment to attach it to, or if you're
| replying to something I said in my comment.
| koakuma-chan wrote:
| > That sounds awful. A truly terrible and demotivating way to
| work and produce anything of real quality
|
| This is the right way to work with generative AI, and it
| already is an extremely common and established practice when
| working with image generation.
| notTooFarGone wrote:
| I can recognize images in one look.
|
| How about that 400 Line change that touches 7 files?
| koakuma-chan wrote:
| In my prompt I ask the LLM to write a short summary of
| how it solved the problem, run multiple instances of LLM
| concurrently, compare their summaries, and use the output
| of whichever LLM seems to have interpreted instructions
| the best, or arrived at the best solution.
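|
| A rough sketch of that fan-out, where run_agent stands in for
| whatever agent or CLI gets invoked (not a real API):
|
|     from concurrent.futures import ThreadPoolExecutor
|
|     NOTE = ("\n\nWhen done, write a short summary of how you "
|             "solved the problem.")
|
|     def fan_out(run_agent, task, n=3):
|         """Run n agents on the same task, collect their outputs."""
|         with ThreadPoolExecutor(max_workers=n) as pool:
|             return list(pool.map(run_agent, [task + NOTE] * n))
|
| Then the summaries get compared and the run whose
| interpretation of the instructions looks best is kept.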
| elt895 wrote:
| And you trust that the summary matches what was actually
| done? Your experience with the level of LLMs
| understanding of code changes must significantly differ
| from mine.
| koakuma-chan wrote:
| It matched every time so far.
| abdullin wrote:
| Exactly!
|
| This is why there has to be "write me a detailed
| implementation plan" step in between. Which files is it
| going to change, how, what are the gotchas, which tests
| will be affected or added etc.
|
| It is easier to review one document and point out missing
| bits, than chase the loose ends.
|
| Once the plan is done and good, it is usually a smooth
| path to the PR.
| bayindirh wrote:
| So you can create more buggy code remixed from scraped
| bits from the internet which you don't understand, but which
| somehow works, rather than creating higher quality,
| tighter code which takes the same amount of time to type?
| All the while offloading all the work to something else
| so your skills can atrophy at the same time?
|
| Sounds like progress to me.
| abdullin wrote:
| Here is another way to look at the problem.
|
| There is a team of 5 people that are passionate about
| their indigenous language and want to preserve it from
| disappearing. They are using AI+Coding tools to:
|
| (1) Process and prepare a ton of various datasets for
| training custom text-to-speech, speech-to-text models and
| wake word models (because foundational models don't know
| this language), along with the pipelines and tooling for
| the contributors.
|
| (2) design and develop an embedded device (running
| ESP32-S3) to act as a smart speaker running on the edge
|
| (3) design and develop backend in golang to orchestrate
| hundreds of these speakers
|
| (4) a whole bunch of Python agents (essentially glorified
| RAGs over folklore, stories)
|
| (5) a set of websites for teachers to create course
| content and exercises, making them available to these
| edge devices
|
| All that, just so that kids in a few hundred
| kindergartens and schools would be able to practice their
| own native language, listen to fairy tales, songs or ask
| questions.
|
| This project was acknowledged by the UN (AI for Good
| programme). They are now extending their help to more
| disappearing languages.
|
| None of that was possible before. This sounds like a good
| progress to me.
|
| Edit: added newlines.
| mistersquid wrote:
| > I can recognize images in one look.
|
| > How about that 400 Line change that touches 7 files?
|
| Karpathy discusses this discrepancy. In his estimation
| LLMs currently do not have a UI comparable to 1970s CLI.
| Today, LLMs output text and text does not leverage the
| human brain's ability to ingest visually coded
| information, literally, at a glance.
|
| Karpathy surmises UIs for LLMs are coming and I suspect
| he's correct.
| variadix wrote:
| The thing required isn't a GUI for LLMs, it's a visual
| model of code that captures all the behavior and is a
| useful representation to a human. People have floated
| this idea before LLMs, but as far as I know there isn't
| any real progress, probably because it isn't feasible.
| There's so much intricacy and detail in software (and
| getting it even slightly wrong can be catastrophic), any
| representation that can capture said detail isn't going
| to be interpretable at a glance.
| mistersquid wrote:
| > The thing required isn't a GUI for LLMs, it's a visual
| model of code that captures all the behavior and is a
| useful representation to a human.
|
| The visual representation that would be useful to humans
| is what Karpathy means by "GUI for LLMs".
| skydhash wrote:
| There's no visual model for code, as code isn't 2D.
| There are 2 mechanisms in the Turing machine model: a state
| machine and a linear representation of code and data. The
| 2D representation of a state machine has no significance,
| and the linear aspect of code and data is hiding more
| dimensions. We invented more abstractions, but nothing
| that maps to a visual representation.
| deadbabe wrote:
| It is not. The _right_ way to work with generative AI is to
| get the right answer in the first shot. But it 's the AI
| that is not living up to this promise.
|
| Reviewing 4 different versions of AI code is grossly
| unproductive. A human co-worker can submit one version of
| code and usually have it accepted with a single review, no
| other "versions" to verify. 4 versions means you're reading
| 75% more code than is necessary. Multiply this across every
| change ever made to a code base, and you're wasting a
| shitload of time.
| koakuma-chan wrote:
| > Reviewing 4 different versions of AI code is grossly
| unproductive.
|
| You can have another AI do that for you. I review
| manually for now though (summaries, not the code, as I
| said in another message).
| RHSeeger wrote:
| That's not really comparing apples to apples though.
|
| > A human co-worker can submit one version of code and
| usually have it accepted with a single review, no other
| "versions" to verify.
|
| But that human co-worker spent a lot of time generating
| what is being reviewed. You're trading "time saved
| coding" for "more time reviewing". You can't complain
| about the added time reviewing and then ignore all the
| time saved coding. That's not to say it's necessarily a
| win, but it _is_ a tradeoff.
|
| Plus that co-worker may very well have spent some time
| discussing various approaches to the problem (with you),
| which is somewhat parallel to the idea of reviewing 4
| different PRs.
| xphos wrote:
| "If the only tool you have is a hammer, you tend to see
| every problem as a nail."
|
| I think the world's leaning dangerously into LLMs, expecting
| them to solve every problem under the sun. Sure, AI can
| solve problems, but I think that domain 1 that Karpathy
| shows, if it is the body of new knowledge in the world,
| doesn't grow with LLMs and agents. Maybe generation and
| selection is the best method for working with domains 2/3,
| but there is something fundamentally lost in the rapid
| embrace of these AI tools.
|
| A true challenge question for people is: would you give up
| 10 points of IQ for access to the next-gen AI model? I
| don't ask this in the sense that AI makes people stupid,
| but because it frames the value of intelligence as something
| you have, rather than as how quickly you can look up or
| generate an answer that may or may not be correct. How we
| use our tools deeply shapes what we will do in the future. A
| cautionary tale is US manufacturing of precision tools,
| where we gave up on teaching people how to use lathes
| because they could simply run CNC machines instead. Now
| that industry has an extreme lack of programmers for CNC
| machines, making it impossible to keep up with other
| precision-instrument-producing countries. This of course is
| a normative statement and has more complex variables, but I
| fear that in this dead-set charge for AI we will lose sight
| of what makes programming languages and programming in
| general valuable.
| osigurdson wrote:
| I'm not sure that AI code has to be sloppy. I've had some
| success with hand coding some examples and then asking codex
| to rigorously adhere to prior conventions. This can end up
| with very self consistent code.
|
| Agree though on the "pick the best PR" workflow. This is pure
| model training work and you should be compensated for it.
| elif wrote:
| Yep this is what Andrej talks about around 20 minutes into
| this talk.
|
| You have to be extremely verbose in describing all of your
| requirements. There is seemingly no such thing as too much
| detail. The second you start being vague, even if it WOULD
| be clear to a person with common sense, the LLM views that
| vagueness as a potential aspect of its own creative
| liberty.
| jebarker wrote:
| > the LLM views that vagueness as a potential aspect of
| its own creative liberty.
|
| I think that anthropomorphism actually clouds what's
| going on here. There's no creative choice inside an LLM.
| More description in the prompt just means more
| constraints on the latent space. You still have no
| certainty whether the LLM models the particular part of
| the world you're constraining it to in the way you hope
| it does though.
| 9rx wrote:
| _> You have to be extremely verbose in describing all of
| your requirements. There is seemingly no such thing as
| too much detail._
|
| If only there was a language one could use that enables
| describing all of your requirements in a unambiguous
| manner, ensuring that you have provided all the necessary
| detail.
|
| Oh wait.
| joshuahedlund wrote:
| > You have to be extremely verbose in describing all of
| your requirements. There is seemingly no such thing as
| too much detail
|
| I understand YMMV, but I have yet to find a use case
| where this takes me less time than writing the code
| myself.
| SirMaster wrote:
| I'm really waiting for AI to get on par with the common
| sense of most humans in their respective fields.
| diggan wrote:
| I think you'll be waiting for a very long time. Right now
| we have programmable LLMs, so if you're not getting the
| results, you need to reprogram it to give the results you
| want.
| pja wrote:
| > You have to be extremely verbose in describing all of
| your requirements. There is seemingly no such thing as
| too much detail.
|
| Sounds like ... programming.
|
| Program specification is programming, ultimately. For any
| given problem if you're lucky the specification is
| concise & uniquely defines the required program. If
| you're unlucky the spec ends up longer than the code
| you'd write to implement it, because the language you're
| writing it in is less suited to the problem domain than
| the actual code.
| ponector wrote:
| >That sounds awful.
|
| Not for the cloud provider. AWS bill to the moon!
| chamomeal wrote:
| I say this all the time!
|
| Does anybody really want to be an assembly line QA reviewer
| for an automated code factory? Sounds like shit.
|
| Also I can't really imagine that in the first place. At my
| current job, each task is like 95% understanding all the
| little bits, and then 5% writing the code. If you're
| reviewing PRs from a bot all day, you'll still need to
| understand all the bits before you accept it. So how much
| time is that really gonna save?
| diggan wrote:
| > Does anybody really want to be an assembly line QA
| reviewer for an automated code factory? Sounds like shit.
|
| On the other hand, does anyone really wanna be a code-
| monkey implementing CRUD applications over and over by
| following product specifications by "product managers" that
| barely seem to understand the product they're "managing"?
|
| See, we can make bad faith arguments both ways, but what's
| the point?
| nevertoolate wrote:
| The issue is that if product people do the "coding" and you
| have to fix it, it's miserable.
| diggan wrote:
| Even worse would be if we asked the accountants to do the
| coding, then you'll learn what miserable means.
|
| What was the point again?
| nevertoolate wrote:
| Yes
| consumer451 wrote:
| I hesitate to divide a group as diverse as software devs
| into two categories, but here I go:
|
| I have a feeling that devs who love LLM coding tools are
| more product-driven than those who hate them.
|
| Put another way, maybe devs with their own product ideas
| love LLM coding tools, devs without them do not.
|
| I am genuinely not trying to throw shade here in any way.
| bandoti wrote:
| Here's a few problems I foresee:
|
| 1. People get lazy when presented with four choices they had no
| hand in creating, and they don't look over the four and just
| click one, ignoring the others. Why? Because they have ten more
| of these on the go at once, diminishing their overall focus.
|
| 2. Automated tests, end-to-end sim., linting, etc--tools
| already exist and work at scale. They should be robust and
| THOROUGHLY reviewed by both AI and humans ideally.
|
| 3. AI is good for code reviews and "another set of eyes" but
| man it makes serious mistakes sometimes.
|
| An anecdote for (1), when ChatGPT tries to A/B test me with two
| answers, it's incredibly burdensome for me to read twice
| virtually the same thing with minimal differences.
|
| Code reviewing four things that do almost the same thing is
| more of a burden than writing the same thing once myself.
| abdullin wrote:
| A simple rule applies: "No matter what tool created the code,
| you are still responsible for what you merge into main".
|
| As such, the task of verification still falls on the
| engineers.
|
| Given that and proper processes, modern tooling works nicely
| with codebases ranging from 10k LOC (mixed embedded device
| code with golang backends and python DS/ML) to 700k LOC
| (legacy enterprise applications from the mainframe era)
| bandoti wrote:
| Agreed. I think engineers following simple Test-Driven
| Development procedures can write the code, unit tests,
| integration tests, debug, etc. for a small enough unit,
| which by default forces tight feedback loops. AI may assist
| in the particulars, not run the show.
|
| I'm willing to bet, short of droid-speak or some AI output
| we can't even understand, that when considering "the system
| as a whole", even with short-term gains in speed, the
| longevity of any product will be better with real people
| following current best practices, and perhaps a modest
| sprinkle of AI.
|
| Why? Because AI is trained on the results of human
| endeavors and can only work within that framework.
| abdullin wrote:
| Agreed. AI is just a tool. Letting it run the show is
| essentially what the vibe-coding is. It is a fun activity
| for prototyping, but tends to accumulate problems and
| tech debt at an astonishing pace.
|
| Code, manually crafted by professionals, will almost
| always beat AI-driven code in quality. Yet, one has still
| to find such professionals and wait for them to get the
| job done.
|
| I think, the right balance is somewhere in between - let
| tools handle the mundane parts (e.g. mechanically
| rewriting that legacy Progress ABL/4GL code to Kotlin),
| while human engineers will have fun with high-level tasks
| and shaping the direction of the project.
| ponector wrote:
| > As such, task of verification, still falls on hands of
| engineers.
|
| Even before LLMs it was a common thing to merge changes
| which completely break the test environment. Some people
| really skip the verification phase of their work.
| xpe wrote:
| > A simple rule applies: "No matter what tool created the
| code, you are still responsible for what you merge into
| main".
|
| Beware of claims of simple rules.
|
| Take one subset of the problem: code reviews in an
| organizational environment. How well does they simple rule
| above work?
|
| The idea of "Person P will take responsibility" is far from
| clear and often not a good solution. (1) P is fallible. (2)
| Some consequences are too great to allow one person to
| trigger them, which is why we have systems and checks. (3)
| P cannot necessarily right the wrong. (4) No-fault analyses
| are often better when it comes to long-term solutions which
| require a fear free culture to reduce cover-ups.
|
| But this is bigger than one organization. The effects of
| software quickly escape organizational boundaries. So when
| we think about giving more power to AI tooling, we have to
| be really smart. This means understanding human nature,
| decision theory, political economy [1], societal norms, and
| law. And building smart systems (technical and
| organizational)
|
| Recommending good strategies for making AI-generated code
| safe is a hard problem. I'd bet it is much harder than even
| "elite" software developers have contemplated, much
| less implemented. Training in software helps but is
| insufficient. I personally have some optimism for formal
| methods, defense in depth, and carefully implemented human-
| in-the-loop systems.
|
| [1] Political economy uses many of the tools of economics
| to study the incentives of human decision making
| eddd-ddde wrote:
| With lazy people the same applies for everything, code they
| do write, or code they review from peers. The issue is not
| the tooling, but the hands.
| freehorse wrote:
| The more tedious the work is, the less motivation and
| passion you get for doing it, and the more "lazy" you
| become.
|
| Laziness does not just come from within, there are
| situations that promote behaving lazy, and others that
| don't. Some people are just lazy most of the time, but most
| people are "lazy" in some scenarios and not in others.
| bandoti wrote:
| Seurat created beautiful works of art composed of
| thousands of tiny dots, painted by hand; one might find
| it meditational with the right mindset.
|
| Some might also find laziness itself dreadfully boring--
| like all the Microsoft employees code-reviewing AI-
| Generated pull requests!
|
| https://blog.stackademic.com/my-new-hobby-watching-
| copilot-s...
| chamomeal wrote:
| I am not a lazy worker but I guarantee you I will not
| thoroughly read through and review four PRs for the same
| thing
| elif wrote:
| In my experience with Jules and (worse) Codex, juggling
| multiple pull requests at once is not advised.
|
| Even if you tell the git-aware Jules to handle a merge conflict
| within the same context window the patch was generated in, it's
| like: sorry bro, I have no idea what's wrong, can you send me a
| diff with the conflict?
|
| I find I have to be in the iteration loop at every stage or
| else the agent will rapidly forget what it's doing or why. For
| instance, don't trust Jules to run your full test suite after
| every change without handholding and asking for specific run
| results every time.
|
| It feels like to an LLM, gaslighting you with code that
| nominally addresses the core of what you just asked while
| completely breaking unrelated code or disregarding previously
| discussed parameters is an unmitigated success.
| layer8 wrote:
| > Tight feedback loops are the key in working productively with
| software. [...] even more tight loops than what a sane human
| would tolerate.
|
| Why would a sane human be averse to things happening
| instantaneously?
| benob wrote:
| You can generate 1.0 programs with 3.0 programs. But can you
| generate 2.0 programs the same way?
| olmo23 wrote:
| 2.0 programs (model weights) are created by running 1.0
| programs (training runs).
|
| I don't think it's currently possible to ask a model to
| generate the weights for a model.
| movedx01 wrote:
| But you can generate synthetic data using a 3.0 program to
| train a smaller, faster, cheaper-to-run 2.0 program.
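|
| A toy version of that distillation loop, where label_with_llm
| stands in for whatever 3.0 program produces the labels (the
| rest is just scikit-learn):
|
|     from sklearn.feature_extraction.text import TfidfVectorizer
|     from sklearn.linear_model import LogisticRegression
|
|     def distill(label_with_llm, texts):
|         """Train a small 2.0 classifier on LLM-produced labels."""
|         labels = [label_with_llm(t) for t in texts]
|         vec = TfidfVectorizer().fit(texts)
|         X = vec.transform(texts)
|         clf = LogisticRegression(max_iter=1000).fit(X, labels)
|         # cheap to run, no LLM needed at inference time
|         return vec, clf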
| amai wrote:
| The quite good blog post mentioned by Karpathy for working with
| LLMs when building software:
|
| - https://blog.nilenso.com/blog/2025/05/29/ai-assisted-coding/
|
| See also:
|
| - https://news.ycombinator.com/item?id=44242051
| mkw5053 wrote:
| I like the idea of having a single source of truth RULES.md;
| however, I'm wondering why you used symlinks as opposed to the
| ability to link/reference other files in cursor rules,
| CLAUDE.md, etc. I understand that functionality doesn't exist
| for all coding agents, but I think it gives you more
| flexibility when composing rules files (for example you can
| have the standard cursor rules headers and then point to
| @RULES.md lower in the file)
| blobbers wrote:
| Software 3.0 is the code generated by the machine, not the
| prompts that generated it. The prompts don't even yield the same
| output; there is randomness.
|
| The new software world is the massive amount of code that will be
| burped out by these agents, and it should quickly dwarf the human
| output.
| pelagicAustral wrote:
| I think that if you give the same task to three different
| developers you'll get three different implementations. It's not
| a random result if you do get the functionality that was
| expected, and at that, I do think the prompt plays an important
| role in offering a view of how the result was achieved.
| klabb3 wrote:
| > I think that if you give the same task to three different
| developers you'll get three different implementations.
|
| Yes, but if you want them to be compatible you need to define
| a protocol and conformance test suite. This is way more work
| than writing a single implementation.
|
| The code is the real spec. Every piece of unintentional non-
| determinism can be a hazard. That's why you want the code to
| be the unit of maintenance, not a prompt.
| imiric wrote:
| I know! Let's encode the spec into a format that doesn't
| have the ambiguities of natural language.
| klabb3 wrote:
| Right. Great idea. Maybe call it "formal execution spec
| for LLM reference" or something. It could even be
| versioned in some kind of distributed merkle tree.
| tamersalama wrote:
| How I understood it is that natural language will form
| relatively large portions of stacks (endpoint descriptions,
| instructions, prompts, documentation, etc.), in addition to
| code generated by agents (which would fall under 1.0).
| poorcedural wrote:
| It is not the code, which just like prompts is a written
| language. Software 3.0 will be branches of behaviors, by the
| software and by the users all documented in a feedback loop.
| The best behaviors will be merged by users and the best will
| become the new HEAD. Underneath it all will be machine code for
| the hardware, but it will be the results that dictate progress.
| fritzo wrote:
| Code is read much more often than it is written. Code generated
| by the machine today will be prompt read by the machine going
| forward. It's a closed loop.
|
| Software is a world in motion. Software 1.0 was animated by
| developers pushing it around. Software 3.0 is additionally
| animated by AI agents.
| politelemon wrote:
| The beginning was painful to watch as is the cheering in this
| comment section.
|
| The 1.0, 2.0, and 3.0 simply aren't making sense. They imply a
| kind of succession and replacement and demonstrate a lack of
| understanding of how programming works. It sounds as
| marketing-oriented as "Web 3.0", born inside an echo chamber.
| And yet halfway through, the need for determinism/validation is
| being reinvented.
|
| The analogies make use of cherry picked properties, which could
| apply to anything.
| monsieurbanana wrote:
| > "Because they all have slight pros and cons, and you may want
| to program some functionality in 1.0 or 2.0, or 3.0, or you're
| going to train in LLM, or you're going to just run from LLM"
|
| He doesn't say they will fully replace each other (or have
| fully replaced each other, since his definition of 2.0 is quite
| old by now).
| whiplash451 wrote:
| I think Andrej is trying to elevate the conversation in an
| interesting way.
|
| That in and of itself makes it worth it.
|
| No one has a crystal clear view of what is happening, but at
| least he is bringing a novel and interesting perspective to
| the field.
| amelius wrote:
| The version numbers mean abrupt changes.
|
| Analogy: how we "moved" from using Google to ChatGPT is an
| abrupt change, and we still use Google.
| mentalgear wrote:
| The whole AI scene is starting to feel a lot like the
| cryptocurrency bubble before it burst. Don't get me wrong,
| there's real value in the field, but the hype, the influencers,
| and the flashy "salon tricks" are starting to drown out
| meaningful ML research (like Apple's critical research that
| actually improves AI robustness). It's frustrating to see solid
| work being sidelined or even mocked in favor of vibe-coding.
|
| Meanwhile, this morning I asked Claude 4 to write a simple EXIF
| normalizer. After two rounds of prompting it to double-check
| its code, I still had to point out that it makes no sense to
| load the entire image for re-orienting if the EXIF orientation
| is fine in the first place.
|
| Vibe vs reality, and anyone actually working in the space daily
| can attest how brittle these systems are.
| rxtexit wrote:
| I think part of the problem is that people have the wrong
| mental models currently.
|
| I am a non-software engineer and I fully expect someday to be
| a professional "vibe coder". It will be within a domain
| though and not a generalist like a real software engineer.
|
| I think "vibe coding" in this context will have a type of
| relationship to software engineering the way excel has a
| relationship to the professional mathematician.
|
| The knocks on "vibe coding" by software engineers are like a
| mathematician shitting on Excel for not being able to do
| symbolic manipulation.
|
| It is not wrong but missing the forest for the trees.
| fergie wrote:
| There were some cool ideas- I particularly liked "psychology of
| AI"
|
| Overall though I really feel like he is selling the idea that we
| are going to have to pay large corporations to be able to write
| code. Which is... terrifying.
|
| Also, as a lazy developer who is always trying to make AI do my
| job for me, it still kind of sucks, and it's not clear that it
| will make my life easier any time soon.
| guappa wrote:
| I think it used to be like that before the GNU people made gcc,
| completely destroying the market of compilers.
|
| > Also, as a lazy developer who is always trying to make AI do
| my job for me, it still kind of sucks, and it's not clear that
| it will make my life easier any time soon.
|
| Every time I have to write a simple self-contained couple of
| functions I try... and it gets it completely wrong.
|
| It's easier to just write it myself rather than to iterate 50
| times and hope it will work, considering iterations are also
| very slow.
| ykonstant wrote:
| At least proprietary compilers were software you owned and
| could be airgapped from any network. You didn't create
| software by tediously negotiating with compilers running on
| remote machines controlled by a tech corp that can undercut
| you on whatever you are trying to build (but of course they
| will not, it says so in the Agreement, and other tales of the
| fantastic).
| teekert wrote:
| He says that now we are in the mainframe phase. We will hit the
| personal computing phase hopefully soon. He says llama (and
| DeepSeek?) are like Linux in a way, OpenAI and Claude are like
| Windows and MacOS.
|
| So, No, he's actually saying it may be everywhere for cheap
| soon.
|
| I find the talk to be refreshingly intellectually honest and
| unbiased. Like the opposite of a cringey LinkedIn post on AI.
| mirkodrummer wrote:
| Being Linux is not a good thing imo; it took decades for tech
| like Proton to run Windows games reliably, if not better now
| than Windows does. Software is still mostly developed for
| Windows and macOS. Not to mention the Linux desktop never took
| off -- one could mention Android, but there is a large
| corporation behind it. Sure, Linux is successful in many ways,
| it's embedded everywhere, but it's nowhere near being the OS of
| everyday people; the "traditional Linux desktop" never took
| off.
| geraneum wrote:
| On a tangent, I find the analogies interesting as well.
| However, while Karpathy is an expert in Computer Science, NLP
| and machine vision, his understanding of how human psychology
| and the brain work is about as good as yours and mine
| (non-experts). So I take some of those comparisons as a
| layperson's feelings about the subject. Still, they are fun to
| listen to.
| pera wrote:
| Is it possible to vibe code NFT smart contracts with Software
| 3.0?
| romain_batlle wrote:
| Can't believe they wanted to postpone this video by a few weeks
| dang wrote:
| No one wanted to! I think we might have bitten off more than we
| could chew in terms of video production. There is a _lot_ of
| content to publish.
|
| Once it was clear how high the demand was for this talk, the
| team adapted quickly.
|
| That's how it goes sometimes! Future iterations will be
| different.
| William_BB wrote:
| [flagged]
| iLoveOncall wrote:
| He sounds like Terrence Howard with his nonsense.
| mentalgear wrote:
| Meanwhile, this morning I asked Claude 4 to write a simple EXIF
| normalizer. After two rounds of prompting it to double-check its
| code, I still had to point out that it makes no sense to load the
| entire image for re-orienting if the EXIF orientation is fine
| in the first place.
|
| Vibe vs reality, and anyone actually working in the space daily
| can attest how brittle these systems are.
|
| Maybe this changes in SWE with more automated tests in verifiable
| simulators, but the real world is far too complex to simulate in
| its vastness.
| diggan wrote:
| > Meanwhile
|
| What do you mean "meanwhile"? That's exactly (among other
| things) the kind of stuff he's talking about: the various
| frictions and how you need to approach it.
|
| > anyone actually working in the space
|
| Is this trying to say that Karpathy doesn't "actually work"
| with LLMs or in the ML space?
|
| I feel like your whole comment is just reacting to the title of
| the YouTube video, rather than actually thinking and reflecting
| on the content itself.
| demaga wrote:
| I'm pretty sure "actually work" part refers to SWE space
| rather than LLM/ML space
| coreyh14444 wrote:
| https://theeducationist.info/everything-amazing-nobody-happy...
| belter wrote:
| AI Snake Oil:
| https://press.princeton.edu/books/hardcover/9780691249131/ai...
| ramon156 wrote:
| The real question is how long it'll take until they're not
| brittle
| kubb wrote:
| Or will they ever be reliable. Your question is already
| making an assumption.
| diggan wrote:
| They're reliable already if you change the way you approach
| them. These probabilistic token generators probably never
| will be "reliable" if you expect them to 100% always output
| exactly what you had in mind, without iterating in user-
| space (the prompts).
| kubb wrote:
| I also think they might never become reliable.
| diggan wrote:
| But _what does that mean_? If you tell the LLM "Say just
| 'hi' without any extra words or explanations", do you not
| get "hi" back from it?
| TeMPOraL wrote:
| That's literally _the_ wrong way to use LLMs though.
|
| LLMs think in tokens; the less they emit, the dumber they
| are, so asking them to be concise, or to give the answer
| before explanation, is _extremely_ counterproductive.
| diggan wrote:
| I was trying to make a point regarding "reliability", not
| a point about how to prompt or how to use them for work.
| TeMPOraL wrote:
| This _is_ relevant. Your example may be simple enough,
| but for anything more complex, letting the model have its
| space to think /compute is critical to reliability - if
| you starve it for compute, you'll get more
| errors/hallucinations.
| diggan wrote:
| Yeah I mean I agree with you, but I'm still not sure how
| it's relevant. I'd also urge people to have unit tests
| they treat as production code, and proper system prompts,
| and X and Y, but it's really beyond the original point of
| "LLMs aren't reliable" which is the context in this sub-
| tree.
| kubb wrote:
| Sometimes I get "Hi!", sometimes "Hey!".
| diggan wrote:
| Which model? Just tried a bunch: ChatGPT, OpenAI's API,
| Claude, Anthropic's API and DeepSeek's API with both chat and
| reasoner models; every single one replied with a single
| "hi".
| throwdbaaway wrote:
| o3-mini-2025-01-31 with high reasoning effort replied
| with "Hi" after 448 reasoning tokens.
|
| gpt-4.5-preview-2025-02-27 replied with "Hi!"
| diggan wrote:
| > o3-mini-2025-01-31 with high reasoning effort replied
| with "Hi" after 448 reasoning tokens.
|
| I got "hi", as expected. What is the full system prompt +
| user message you're using?
|
| https://i.imgur.com/Y923KXB.png
|
| > gpt-4.5-preview-2025-02-27
|
| Same "hi": https://i.imgur.com/VxiIrIy.png
| flir wrote:
| There is a bar below which they are reliable.
|
| "Write a Python script that adds three numbers together".
|
| Is that bar going up? I think it probably is, although
| not as fast/far as some believe. I also think that
| "unreliable" can still be "useful".
| vFunct wrote:
| It's perfectly reliable for the things you know it to be,
| such as operations within its context window size.
|
| Don't ask LLMs to "Write me Microsoft Excel".
|
| Instead, ask it to "Write a directory tree view for the
| Open File dialog box in Excel".
|
| Break your projects down into the smallest chunks you can
| for the LLMs. The more specific you are, the more reliable
| it's going to be.
|
| The rest of this year is going to be companies figuring out
| how to break down large tasks into smaller tasks for LLM
| consumption.
| dist-epoch wrote:
| I remember when people were saying here on HN that AIs will
| never be able to generate picture of hands with just 5
| fingers because they just "don't have common sense"
| guappa wrote:
| [?]
| yahoozoo wrote:
| "Treat it like a junior developer" ... 5 years later ...
| "Treat it like a junior developer"
| agile-gift0262 wrote:
| import time
| from datetime import timedelta
| while True:
|     print("This model that just came out changes everything. "
|           "It's flawless. It doesn't have any of the issues the "
|           "model from 6 months ago had. We are 1 year away from "
|           "AGI and becoming jobless")
|     time.sleep(timedelta(days=180).total_seconds())
| TeMPOraL wrote:
| Usable LLMs are 3 years old at this point. ChatGPT, not
| Github Copilot, is the marker.
| LtWorf wrote:
| Usable for fun yes.
| ApeWithCompiler wrote:
| A manager in our company introduced Gemini as a chat bot
| coupled to our documentation.
|
| > It failed to write out our company name. The rest was flawed
| with hallucinations too, hardly worth mentioning.
|
| I wish this were rage bait towards others, but what should my
| feelings be? After all, this is the tool that's sold to me and
| that I am expected to work with.
| gorbachev wrote:
| We had exactly the opposite experience. CoPilot was able to
| answer questions accurately and reformatted the existing
| documentation to fit the context of users' questions, which
| made the information much easier to understand.
|
| Code examples, which we offer as a sort of reference
| implementation, were also adapted to fit the specific
| questions without much issue. Granted, these aren't whole
| applications, but 10-25 line examples of doing API setup /
| calls.
|
| We didn't, of course, just send users' questions directly to
| CoPilot. Instead there's a bit of prompt magic behind the
| scenes that tweaks the context so that CoPilot can produce
| better quality results.
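|
| For context, that "prompt magic" is essentially context
| augmentation; a minimal sketch of the general pattern (the
| lookup function, model name, and wording are illustrative
| assumptions, not our actual pipeline):
|
| # Wrap the user's question with documentation excerpts before
| # sending it to a chat model.
| from openai import OpenAI
|
| client = OpenAI()
|
| def find_relevant_docs(question: str) -> str:
|     # Placeholder: a real system would use search or embeddings.
|     return "Excerpt: to authenticate, call POST /v1/token ..."
|
| def answer(question: str) -> str:
|     context = find_relevant_docs(question)
|     resp = client.chat.completions.create(
|         model="gpt-4o-mini",
|         messages=[
|             {"role": "system",
|              "content": "Answer using only the documentation "
|                         "below. If it is not covered, say so.\n\n"
|                         + context},
|             {"role": "user", "content": question},
|         ],
|     )
|     return resp.choices[0].message.content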
| sensanaty wrote:
| There's also those instances where Microsoft unleashed Copilot
| on the .NET repo, and it resulted in the most hilariously
| terrible PRs that required the maintainers to basically tell
| Copilot every single step it should take to fix the issue. They
| were basically writing the PRs themselves at that point, except
| doing it through an intermediary that was much dumber, slower
| and less practical than them.
|
| And don't get me started on my own experiences with these
| things, and no, I'm not a luddite, I've tried my damndest and
| have followed all the cutting-edge advice you see posted on HN
| and elsewhere.
|
| Time and time again, the reality of these tools falls flat on
| their face while people like Andrej hype things up as if we're
| 5 minutes away from having Claude become Skynet or whatever, or
| as he puts it, before we enter the world of "Software 3.0"
| (coincidentally totally unrelated to Web 3.0 and the grift we
| had to endure there, I'm sure).
|
| To intercept the common arguments,
|
| - no I'm not saying LLMs are useless or have no usecases
|
| - yes there's a possibility if you extrapolate by current
| trends (https://xkcd.com/605/) that they indeed will be Skynet
|
| - yes I've tried the latest and greatest model released 7
| minutes ago to the best of my ability
|
| - yes I've tried giving it prompts so detailed a literal infant
| could follow along and accomplish the task
|
| - yes I've fiddled with providing it more/less context
|
| - yes I've tried keeping it to a single chat rather than
| multiple chats, as well as vice versa
|
| - yes I've tried Claude Code, Gemini Pro 2.5 With Deep
| Research, Roocode, Cursor, Junie, etc.
|
| - yes I've tried having 50 different "agents" running and only
| choosing the best output from the lot.
|
| I'm sure there's a new gotcha being written up as we speak,
| probably something along the lines of "Well for me it doubled
| my productivity!" and that's great, I'm genuinely happy for you
| if that's the case, but for me and my team who have been trying
| diligently to use these tools for anything that wasn't a
| microscopic toy project, it has fallen apart time and time
| again.
|
| The idea of an application UI or god forbid an entire fucking
| Operating System being run via these bullshit generators is
| just laughable to me, it's like I'm living on a different
| planet.
| diggan wrote:
| You're not the first, nor the last person, to have a
| seemingly vastly different experience than me and others.
|
| So I'm curious, what am I doing differently from what you
| did/do when you try them out?
|
| This is maybe a bit out there, but would you be up for
| sending me like a screen recording of exactly what you're
| doing? Or maybe even a video call sharing your screen? I'm
| not working in the space, have no products or services to
| sell, and am only curious why this gap seemingly exists between
| you and me; my only motive is to understand whether I'm the one
| who is missing something, or whether there are more effective
| ways to help people understand how they can use LLMs and what
| they can use them for.
|
| My email is on my profile if you're up for it. Invitation
| open for others in the same boat as parent too.
| bsenftner wrote:
| I'm a greybeard, 45+ years coding, including active in AI
| during the mid 80's and used it when it applied throughout
| my entire career. That career being media and animation
| production backends, where the work is both at the
| technical and creative edge.
|
| I currently have an AI integrated office suite, which has
| attorneys, professional writers, and political activists
| using the system. It is office software, word processing,
| spreadsheets, project management and about two dozen types
| of AI agents that act as virtual co-workers.
|
| No, my users are not programmers, but I do have interns;
| college students with anything from 3 to 10 years
| experience writing software.
|
| I see the same AI use problem issues with my users, and my
| interns. My office system bends over backwards to address
| this, but people are people: they do not realize that AI
| does not know what they are talking about. They will
| frequently ask questions with no preamble, no introduction
| to the subject. They will change topics, not bothering to
| start a new session or tell the AI the topic is now
| different. There is a huge number of things they do, often
| with escalating frustration evident in their prompts, that
| all violate the same basic issue: the LLM was not given a
| context to understand the subject at hand, and the user is
| acting like many people and when explaining they go
| further, past the point of confusion, now adding new
| confusion.
|
| I see this over and over. It frustrates the users to anger,
| yet at the same time if they acted, communicated to a
| human, in the same manner they'd have a verbal fight almost
| instantly.
|
| The problem is one of communication. ...and for a huge
| number of you I just lost you. You've not been taught to
| understand the power of communication, so you do not
| respect the subject. Knowing how to communicate is
| practically everything when it comes to human collaboration.
| It is how one orders one's mind, how one collaborates with
| others, AND how one gets AI to respond in the manner they
| desire.
|
| But our current software development industry, and by
| extension all of STEM, has been shortchanged by never being
| taught how to communicate effectively, no, not at all.
| Presentations and knowing how to sell are not effective
| communication; that's persuasion, about 5% of what it
| takes to _convey understanding in others_ which then
| _unblocks resistance to change_.
| diggan wrote:
| But parent explicitly mentioned:
|
| > - yes I've tried giving it prompts so detailed a
| literal infant could follow along and accomplish the task
|
| Which you are saying that might have missed in the end
| regardless?
| bsenftner wrote:
| I'd like to see the prompt. I suspect that "literal
| infant" is expected to be a software developer without
| preamble. The initial sentence to an LLM carries far more
| relevance, it sets the context stage to understand what
| follows. If there is no introduction to the subject at
| hand, the response will be just like anyone fed a wall of
| words: confusion as to what all this is about.
| diggan wrote:
| You and me both :) But I always try to read the comments
| here with the most charitable interpretation I can come
| up with.
| sensanaty wrote:
| So AI is simultaneously going to take over everyone's job
| and do literally everything, including being used as
| application UI somehow... But you have to talk to it like
| a moody teenager at their first job lest you get nothing
| but garbage? I have to put just as much (and usually,
| more) effort talking to this non-deterministic black box
| as I would to an intern who joined a week ago to get
| anything usable out of it?
|
| Yeah, I'd rather just type things out myself, and
| continue communicating with my fellow humans rather than
| expending my limited time on this earth appeasing a
| bullshit generator that's apparently going to make us all
| jobless Soon(tm)
| bsenftner wrote:
| Consider that these AIs are trained on human
| communications, they mirror that communication. They are
| literally damaged document repair models, they use what
| they are given to generate a response - statistically.
| The fact that a question generates text that appears like
| an answer is an exploited coincidence.
|
| It's a perspective shift few seem to have considered: if
| one wants an expert software developer from their AI,
| they need to create an expert software developer's
| context by using expert developer terminology that is
| present in the training data.
|
| One can take this to an extreme, and it works: read the
| source code of an open source project and get an idea of
| both the developer and their coding style. Write prompts
| that mimic both the developer and their project, and
| you'll find that the AI's context now can discuss that
| project with surprising detail. This is because that
| project is in the training data, the project is also
| popular, meaning it has additional sites of tutorials and
| people discussing use of that project, so a foundational
| model ends up knowing quite a bit, if one knows how to
| construct the context with that information.
|
| This is, of course, tricky with hallucination, but that
| can be minimized. Which is also why we will all become
| aware of AI context management if we continue writing
| software that incorporates AIs. I expect context
| management is what was meant by prompt engineering.
| Communicating within engineering disciplines has always
| been difficult.
| TeMPOraL wrote:
| > _But you have to talk to it like a moody teenager at
| their first job lest you get nothing but garbage?_
|
| No, you have to talk to it like to an adult human being.
|
| If one's doing so and still gets garbage results from
| SOTA LLMs, that to me is a strong indication one also
| cannot communicate with other human beings effectively.
| It's literally _the same skill_. Such individual is
| probably the kind of clueless person we all learn to
| isolate and navigate around, because contrary to their
| beliefs, they 're not the center of the world, and we
| cannot actually read their mind.
| ffsm8 wrote:
| Unironically, your comment mirrors my opinion as of last
| month.
|
| Since then I've given it another try last week and was quite
| literally mind blown how much it improved in the context of
| Vibe coding (Claude code). It actually improved so much that
| I thought "I would like to try that on my production
| codebase", (mostly because I _want_ if to fail, because that
| 's my job ffs) but alas - that's not allowed at my dayjob.
|
| From the limited experience I could gather over the last week
| as a software dev with over 10 yrs of experience (along with
| another 5-10 doing it as a hobby before employment) I can say
| that I expect our industry to get absolutely destroyed within
| the next 5 yrs.
|
| The skill ceiling for devs is going to get mostly squashed
| for 90% of devs, this will inevitably destroy our collective
| bargaining positions. Including for the last 10%, because the
| competition around these positions will be even more fierce.
|
| It's already starting, even if it's _currently_ very
| misguided and mostly down to short-sightedness.
|
| But considering the trajectory and looking at how naive
| current llms coding tools are... Once the industry adjusts
| and better tooling is pioneered... it's gonna get brutal.
|
| And most certainly not limited to software engineering.
| Pretty much all desk jobs will get hemorrhaged as soon as a
| llm-player basically replaces SAP with entirely new tooling.
|
| Frankly, I expect this to go bad, very very quickly. But I'm
| still hoping for a good ending.
| kypro wrote:
| I think part of the problem is that code quality is somewhat
| subjective and developers are of different skill levels.
|
| If you're fine with things that kinda work okay and you're
| not the best developer yourself then you probably think
| coding agents work really really well because the slop they
| produce isn't that much worse than yourself. In fact I know a
| mid-level dev who believes agent AIs write better code than
| himself.
|
| If you're very critical of code quality then it's much
| tougher... This is even more true in complex codebases where
| simply following some existing pattern to add a new feature
| isn't going to cut it.
|
| The degree to which it helps any individual developer will
| vary, and perhaps it's not that useful for yourself. For me
| over the last few months the tech has got to the point where
| I use it and trust it to write a fair percentage of my code.
| Unit tests are an example where I find it does a really good
| job.
| diggan wrote:
| > If you're very critical of code quality then it's much
| tougher
|
| I'm not sure. Among developers I know who are sloppy and
| produce shit code, I'm hearing of some having no luck with
| LLMs, and some of them having lots of luck with them.
|
| On the other side, those who really think about the
| design/architecture and are very strict (which is the group
| I'd probably put myself into, but who wouldn't?) are split
| in a similar way.
|
| I don't have any concrete proof, but I'm guessing
| "expectations + workflow" differences would explain the
| vast difference in perception of usefulness.
| sensanaty wrote:
| Listen, I won't pretend to be The God Emperor Of Writing
| Code or anything of the sort, I'm realistically quite
| mediocre/dead average in the grand scheme of things.
|
| But literally yesterday, with Claude Code running 4 opus
| (aka: The latest and greatest, to intercept the "dId YoU
| tRy X" comment) which has full access to my entire Vue
| codebase at work, that has dedicated rules files I pass to
| it, that can see the fucking `.vue` file extension on every
| file in the codebase, after prompting it to "generate this
| vue component that does X, Y and Z" spat out React code at
| me.
|
| You don't have to be Bjarne Stroustrup to get annoyed at
| this kinda stuff, and it happens _constantly_ for a billion
| tiny things on the daily. The biggest pushers of AI have
| finally started admitting that it 's not literally perfect,
| but am I really supposed to pretend that this workflow of
| having AIs generate dozens of PRs where a single one is
| somewhat acceptable is somehow efficient or good?
|
| It's great for random one-offs, sure, but is that really
| deserving of this much _insane_ , blind hype?
| crmi wrote:
| I've got a working theory that models perform differently
| when used in different timezones... As in during US working
| hours they dont work as well due to high load. When used at
| 'offpeak' hours not only are they (obviously) snappier but
| the outputs appear to be a higher standard. Thought this for
| a while but now noticing with Claude4 [thinking] recently.
| Textbook case of anecdata of course though.
| diggan wrote:
| Interesting thought, if nothing else. Unless I
| misunderstand, it would be easy to run a study to see if
| this is true; use the API to send the same but slightly
| different prompt (as to avoid the caches) which has a
| definite answer, then run that once per hour for a week and
| see if the accuracy oscillates or not.
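|
| A rough sketch of that probe, assuming an OpenAI-compatible
| API; the question, model, schedule, and nonce trick are
| arbitrary choices:
|
| # Ask a question with a known answer once per hour and log
| # whether the reply is correct, to test the time-of-day theory.
| import csv
| import time
| import uuid
| from datetime import datetime, timezone
| from openai import OpenAI
|
| client = OpenAI()
|
| def probe() -> bool:
|     nonce = uuid.uuid4().hex[:8]  # vary the prompt to dodge caches
|     resp = client.chat.completions.create(
|         model="gpt-4o-mini",
|         messages=[{"role": "user",
|                    "content": f"(run {nonce}) What is 17 * 23? "
|                               "Reply with just the number."}],
|     )
|     return "391" in resp.choices[0].message.content
|
| with open("hourly_probe.csv", "a", newline="") as f:
|     writer = csv.writer(f)
|     for _ in range(24 * 7):  # one week of hourly samples
|         writer.writerow([datetime.now(timezone.utc).isoformat(),
|                          probe()])
|         f.flush()
|         time.sleep(3600)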
| crmi wrote:
| Yes good idea - although it appears we would also have to
| account for the possibility of providers nerfing their
| models. I've read others also think models are being
| quantized after a while to cut costs.
| jim180 wrote:
| Same! I did notice, a couple of months ago, that the same
| prompt in the morning failed and then, later that day, when
| starting from scratch with identical prompts, the results
| were much better.
| crmi wrote:
| To add to this, I ran into a lot of issues too. And similar
| when using cursor... Until I started creating a mega list of
| rules for it to follow that attaches to the prompts. Then
| outputs improved (but fell off after the context window got
| too large). At that stage I then used a prompt to summarize,
| to continue with a new context.
| Seanambers wrote:
| Seems to me that this is just another level of throwing compute
| at the problem.
|
| Same way programs were way more efficient before and now they
| are "bloated" with packages, abstractions, slow implementations
| of algos and scaffolding.
|
| The concept of what is good software development might be
| changing as well.
|
| LLMs might not write the best code, but they sure can write a
| lot of it.
| hombre_fatal wrote:
| On the other hand, posts like this are like watching someone
| writing ask jeeves search queries into google 20 years ago and
| then gesturing how google sucks while everyone else in the room
| has figured out how to be productive with it and cringes at his
| "boomer" queries.
|
| If you're still struggling to make LLMs useful for you by now,
| you should probably ask someone. Don't let other noobs on HN
| +1'ing you hold you back.
| mirrorlake wrote:
| Perhaps consider making some tutorials, then, and share your
| wealth of knowledge rather than calling people stupid.
| imiric wrote:
| The slide at 13m claims that LLMs flip the script on technology
| diffusion and give power to the people. Nothing could be further
| from the truth.
|
| Large corporations, which have become governments in all but
| name, are the only ones with the capability to create ML models
| of any real value. They're the only ones with access to vast
| amounts of information and resources to train the models. They
| introduce biases into the models, whether deliberately or not,
| that reinforce their own agenda. This means that the models will
| either avoid or promote certain topics. It doesn't take a genius
| to imagine what will happen when the advertising industry
| inevitably extends its reach into AI companies, if it hasn't
| already.
|
| Even open weights models which technically users can self-host
| are opaque blobs of data that only large companies can create,
| and have the same biases. Even most truly open source models are
| useless since no individual has access to the same large datasets
| that corporations use for training.
|
| So, no, LLMs are the same as any other technology, and actually
| make governments and corporations even more powerful than
| anything that came before. The users benefit tangentially, if at
| all, but will mostly be exploited as usual. Though it's
| unsurprising that someone deeply embedded in the AI industry
| would claim otherwise.
| moffkalast wrote:
| Well there are cases like OLMo where the process, dataset, and
| model are all open source. As expected though, it doesn't
| really compare well to the worst closed model since the dataset
| can't contain vast amounts of stolen copyrighted data that
| noticeably improves the model. Llama is not good because Meta
| knows what they're doing, it's good because it was pretrained
| on the entirety of Anna's Archive and every pirated ebook they
| could get their hands on. Same goes for Elevenlabs and pirated
| audiobooks.
|
| Lack of compute on Ai2's side also means the context OLMo
| is trained for is minuscule -- the other thing you need to
| throw brazillions of dollars at to make a model that's maybe
| useful in the end, if you're very lucky. Training needs high
| GPU interconnect bandwidth; it can't be done in a distributed
| horde in any meaningful way even if people wanted to.
|
| The only ones who have the power now are the Chinese, since
| they can easily ignore copyright for datasets, patents for
| compute, and have infinite state funding.
| khalic wrote:
| His dismissal of smaller and local models suggests he
| underestimates their improvement potential. Give phi4 a run and
| see what I mean.
| TeMPOraL wrote:
| He ain't dismissing them. Comparing local/"open" model to Linux
| (and closed services to Windows and MacOS) is high praise. It's
| also accurate.
| khalic wrote:
| This is a bad comparison
| sriram_malhar wrote:
| Of all the things you could suggest, a lack of understanding is
| not one that can be pinned on Karpathy. He does know his
| technical stuff.
| khalic wrote:
| We all have blind spots
| diggan wrote:
| Sure, but maybe suggesting that the person who literally
| spent countless hours educating others on how to build
| small models locally from scratch, is lacking knowledge
| about local small models is going a bit beyond "people have
| blind spots".
| khalic wrote:
| Their potential, not how they work, it was very badly
| formulated, just corrected it
| diggan wrote:
| > suggests a lack of understanding of these smaller models
| capabilities
|
| If anything, you're showing a lack of understanding of what he
| was talking about. The context is this specific time, where
| we're early in a ecosystem and things are expensive and likely
| centralized (ala mainframes) but if his analogy/prediction is
| correct, we'll have a "Linux" moment in the future where that
| equation changes (again) and local models are competitive.
|
| And while I'm a huge fan of local models run them for maybe
| 60-70% of what I do with LLMs, they're nowhere near proprietary
| ones today, sadly. I want them to, really badly, but it's
| important to be realistic here and realize the differences of
| what a normal consumer can run, and what the current mainframes
| can run.
| khalic wrote:
| He understands the technical part, of course, I was referring
| to his prediction that large models will always be
| necessary.
|
| There is a point where an LLM is good enough for most tasks;
| I don't need a megamind AI in order to greet clients, and
| both large and small/medium model sizes are getting there,
| with the large models hitting a computing/energy demand
| barrier. The small models won't hit that barrier anytime
| soon.
| vikramkr wrote:
| Did he predict they'd always be necessary? He mostly seemed
| to predict the opposite, that we're at the early stage of a
| trajectory that has yet to have it's Linux moment
| khalic wrote:
| I understand, thanks for pointing that out
| khalic wrote:
| I edited to make it clearer
| mprovost wrote:
| You can disagree with his conclusions but I don't think his
| understanding of small models is up for debate. This is the
| person who created micrograd/makemore/nanoGPT and who has
| produced a ton of educational materials showing how to build
| small and local models.
| khalic wrote:
| I'm going to edit, it was badly formulated, he underestimates
| their potential for growth is what I meant by that
| diggan wrote:
| > underestimates their potential for growth
|
| As far as I understood the talk and the analogies, he's
| saying that local models will eventually replace the
| current popular "mainframe" architecture. How is that
| underestimating them?
| dist-epoch wrote:
| I tried the local small models. They are slow, much less
| capable, and ironically much more expensive to run than the
| frontier cloud models.
| khalic wrote:
| Phi4-mini runs on a basic laptop CPU at 20T/s... how is that
| slow? Without optimization...
| dist-epoch wrote:
| I was running Qwen3-32B locally even faster, 70T/s, still
| way too slow for me. I'm generating thousands of tokens of
| output per request (not coding), running locally I could
| get 6 mil tokens per day and pay electricity, or I can get
| more tokens per day from Google Gemini 2.5 Flash for free.
|
| Running models locally is a privilege for the rich and
| those with too much disposable time.
| imiric wrote:
| It's fascinating to see his gears grinding at 22:55 when
| acknowledging that a human still has to review the thousand lines
| of LLM-generated code for bugs and security issues if they're
| "actually trying to get work done". Yet these are the tools that
| are supposed to make us hyperproductive? This is "Software 3.0"?
| Give me a break.
| rwmj wrote:
| Plus coding is the fun bit, reviewing code is the hard and not
| fun bit, and arguing with an overconfident machine sounds like
| it'll be even worse than that. Thankfully I'm going to retire
| soon.
| imiric wrote:
| Agreed. Hell, even reviewing code can be fun and engaging,
| especially if done in person. But it helps when the other
| party can actually think, instead of automatically responding
| with "You're right!", followed by changes that may or may not
| make things worse.
|
| It's as if software developers secretly hated their jobs and
| found most tasks a chore, so they hired someone else to
| poorly do the mechanical tasks for them, while ignoring the
| tasks that actually matter. That's not software engineering,
| programming, nor coding. It's some process of producing
| shitty software for which we need new terminology to
| describe.
|
| I envy you for retiring. Good luck!
| poorcedural wrote:
| Because we are still using code as a proof that needs to be
| proven. Software 3.0 will not be about reviewing legible code,
| with its edge-cases and exploits and trying to impersonate
| hardware.
| bgwalter wrote:
| I'd like to hear from Linux kernel developers. There is no
| significant software that has been written (plagiarized) by "AI".
| Why not ask the actual experts who deliver instead of talk?
|
| This whole thing is a religion.
| diggan wrote:
| What counts as "significant software"? Only kernels I guess?
| xvilka wrote:
| Office software, CAD systems, Web Browsers, the list is long.
| diggan wrote:
| Microsoft (famously developing somewhat popular office-like
| software) seems to be going in the direction of almost
| forcing developers to use LLMs to assist with coding, at
| least going by what people are willing to admit publicly
| and seeing some GitHub activity.
|
| Google (made a small browser or something) also develops
| their own models, I don't think it's far fetched to imagine
| there is at least one developer on the Chrome/Chromium team
| that is trying to dogfood that stuff.
|
| As for Autodesk, I have no idea what they're up to, but
| corporate IT seems hellbent on killing themselves, not sure
| Autodesk would do anything differently so they're probably
| also trying to jam LLMs down their employees' throats.
| bgwalter wrote:
| Microsoft is also selling "AI", so they want headlines
| like "30% of our code is written by AI". So they force
| open source developers to babysit the tools and suffer.
|
| It's also an advertisement for potential "AI" military
| applications that they undoubtedly propose after the
| HoloLens failure:
|
| https://www.theverge.com/2022/10/13/23402195/microsoft-us-ar...
|
| The HoloLens failure is a great example of overhyped
| technology, just like the bunker busters that are now in
| the headlines for overpromising.
| e3bc54b2 wrote:
| 'forcing' anybody to do anything means they don't like
| doing it, usually because it causes them more work or
| headache or discomfort.
|
| You know, the exact opposite of what AI providers are
| claiming it does.
| sensanaty wrote:
| > Microsoft
|
| https://news.ycombinator.com/item?id=44050152
|
| Very impressive indeed, not a single line of any quality
| to be found despite them forcing it on people.
| rwmj wrote:
| Can you point to any significant open source software that
| has any kind of significant AI contributions?
|
| As an actual open source developer I'm not seeing anything. I
| am getting bogus pull requests full of AI slop that are
| causing problems though.
| diggan wrote:
| > Can you point to any significant open source software
| that has any kind of significant AI contributions?
|
| No, but I haven't looked. Can you?
|
| As an actual open source developer too, I do get some value
| from replacing search engine usage with LLMs that can do
| the searching and collation for me, as long as they have
| references I can use for diving deeper, they certainly
| accelerate my own workflow. But I don't do "vibe-coding" or
| use any LLM-connected editors, just my own written software
| that is mostly various CLIs and chat-like UIs.
| mellosouls wrote:
| _There is no significant software that has been written
| (plagiarized) by "AI"._
|
| How do you know?
|
| As you haven't evidenced your claim, you could start by
| providing explicit examples of what is significant.
|
| Even if you are correct, the amount of llm-assisted code is
| increasing all the time, and we are still only a couple of
| years in - give it time.
|
| _Why not ask the actual experts_
|
| Many would regard Karpathy in the expert category I think?
| rwmj wrote:
| The AI people are the ones making the extraordinary claims
| here.
| bgwalter wrote:
| I think you should not turn things around here. Up to 2021 we
| had a vibrant software environment that obviously had zero
| "AI" input. It has made companies and some developers filthy
| rich.
|
| Since "AI" became a religion, it is used as an excuse for
| layoffs _while no serious software is written by "AI"_. The
| "AI" people are making the claims. Since they invading a
| functioning software environment, it is their responsibility
| to back up their claims.
| TeMPOraL wrote:
| Still wonder what your definition of "serious software" is.
| I kinda concur - I consider most of the webshit to be not
| serious, but then, this is where software industry makes
| bulk of its profits, and that space is _absolutely being
| eaten by agentic coding_ , right now, today.
|
| So if we s/serious/money-making/, you are wrong - or at
| least about to be proven, as these things enter prod and
| are talked about.
| darqis wrote:
| when I started coding at the age of 11 in machine code and
| assembly on the C64, the dream was to create software that
| creates software. Nowadays it's almost reality, almost because
| the devil is always in the details. When you're used to write
| code, writing code is relatively fast. You need this knowledge to
| debug issues with generated code. However you're now telling AI
| to fix the bugs in the generated code. I see it kind of like
| machine code becomes overlaid with asm which becomes overlaid
| with C or whatever higher level language, which then uses
| dogma/methodology like MVC and such and on top of that there's
| now the AI input and generation layer. But it's not widely
| available. Affording more than 1 computer is a luxury. Many
| households are even struggling to get by. When you see those what
| 5 7 Mac Minis, which normal average Joe can afford that or does
| even have to knowledge to construct an LLM at home? I don't. This
| is a toy for rich people. Just like with public clouds like AWS,
| GCP I left out, because the cost is too high and running my own
| is also too expensive and there are cheaper alternatives that not
| only cost less but also have way less overhead.
|
| What would be interesting to see is what those kids produced with
| their vibe coding.
| diggan wrote:
| > those kids produced with their vibe coding
|
| No one, including Karpathy in this video, is advocating for
| "vibe coding". If nothing more, LLMs paired with configurable
| tool-usage are basically a highly advanced and contextual
| search engine you can ask questions. Are you not using a search
| engine today?
|
| Even without LLMs being able to produce code or act as agents
| they'd be useful, because of that.
|
| But it sucks we cannot run competitive models locally, I agree,
| it is somewhat of a "rich people" tool today. Going by the talk
| and theme, I'd agree it's a phase, like computing itself had
| phases. But you're gonna have to actually watch and listen to
| the talk itself, right now you're basically agreeing with the
| video yet wrote your comment like you disagree.
| dist-epoch wrote:
| > This is a toy for rich people
|
| GitHub copilot has a free tier.
|
| Google gives you thousands of free LLM API calls per day.
|
| There are other free providers too.
| guappa wrote:
| 1st dose is free
| palmfacehn wrote:
| Agreed. It is worth noting how search has evolved over the
| years.
| infecto wrote:
| LLM APIs are pretty darn cheap for most of the developed
| worlds income levels.
| guappa wrote:
| Yeah, because they're bleeding money like crazy now.
|
| You should consider how much it actually costs, not how
| much they charge.
|
| How do people fail to consider this?
| bdangubic wrote:
| how much does it cost?
| infecto wrote:
| >You should consider how much it actually costs, not how
| much they charge. How do people fail to consider this?
|
| Sure, nobody can predict the long-term economics with
| certainty but companies like OpenAI already have
| compelling business fundamentals today. This isn't some
| scooter startup praying for margins to appear; it's a
| platform with real, scaled revenue and enterprise
| traction.
|
| But yeah, tell me more about how my $200/mo plan is
| bankrupting them.
| NitpickLawyer wrote:
| No, there are 3rd party providers that run open-weights
| models and they are (most likely) not bleeding money.
| Their prices are kind of similar, and make sense in a
| napkin-math kind of way (we looked into this when
| ordering hardware).
|
| You are correct that some providers might reduce prices
| for market capture, but the alternatives are still cheap,
| and some are close to being competitive in quality to the
| API providers.
| Eggpants wrote:
| Starts with "No" then follows that up with "most likely".
|
| So in other words you don't know the real answer but
| posted anyways.
| NitpickLawyer wrote:
| That most likely is for the case where they made their
| investment calculations wrong and they won't be able to
| recoup their hw costs. So I think it's safe to say there
| may be the outlier 3rd party provider that may lose money
| in the long run.
|
| But the majority of them are serving at ~ the same price,
| and that matches to the raw cost + some profit if you
| actually look into serving those models. And those prices
| are still cheap.
|
| So yeah, I stand by what I wrote, "most likely" included.
|
| My main answer was "no, ..." because the gp post was only
| considering the closed providers only (oai, anthropic,
| goog, etc). But you can get open-weight models pretty
| cheap, and they are pretty close to SotA, depending on
| your needs.
| Eggpants wrote:
| Just wait for the enshittification of LLM services.
|
| It's going to get wild when the tech bro investors demand
| ads be included in responses.
|
| It will be trivial for a version of AdWords where someone
| pays for response words be replaced. "Car" replaced by
| "Honda", variable names like "index" by
| "this_index_variable_is_sponsered_by_coinbase" etc.
|
| I'm trying to be funny with the last one but something
| like this will be coming sooner than later. Remember,
| google search used to be good and was ruined by bonus
| seeking executives.
| NoOn3 wrote:
| It's cheap now. But if you take into account all the
| training costs, then at such prices they cannot make a
| profit in any way. This is called dumping to capture the
| market.
| infecto wrote:
| No doubt the complete cost of training and to getting
| where we are today has been significant and I don't know
| how the accounting will look years from now but you are
| just making up the rest based on feelings. We know
| operationally OpenAI is profitable on purely the runtime
| side, nobody knows how that will look when accounting for
| R&D but you have no qualification to say they cannot make
| a profit in any way.
| NoOn3 wrote:
| Yes, if you do not take into account the cost of
| training, I think it is very likely profitable. The cost
| of working models is not so high. This is just my opinion
| based on open models and I admit that I have not carried
| out accurate calculations.
| guappa wrote:
| Except they have to retrain constantly, so why would you
| not consider the cost of training?
| diggan wrote:
| > But if you take into account all the training costs
|
| Not everyone has to paid that cost, as some companies are
| releasing weights for download and local use (like Llama)
| and then some other companies are going even further and
| releasing open source models+weights (like OLMo). If
| you're a provider hosting those, I don't think it makes
| sense to take the training cost into account when
| planning your own infrastructure.
|
| Although I don't think it makes much sense personally, it
| seemingly makes sense for other companies.
| dist-epoch wrote:
| There is no "capture" here, it's trivial to switch
| LLM/providers, they all use OpenAI API. It's literally a
| URL change.
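|
| Roughly like this, assuming the OpenAI Python client; the base
| URLs and model names are illustrative and worth checking
| against each provider's docs:
|
| # The same client code pointed at different OpenAI-compatible
| # endpoints; switching provider is a base_url/api_key change.
| import os
| from openai import OpenAI
|
| providers = {
|     "openai": {
|         "base_url": "https://api.openai.com/v1",
|         "api_key": os.environ.get("OPENAI_API_KEY", ""),
|         "model": "gpt-4o-mini",
|     },
|     "deepseek": {
|         "base_url": "https://api.deepseek.com",
|         "api_key": os.environ.get("DEEPSEEK_API_KEY", ""),
|         "model": "deepseek-chat",
|     },
| }
|
| def ask(provider: str, prompt: str) -> str:
|     cfg = providers[provider]
|     client = OpenAI(base_url=cfg["base_url"], api_key=cfg["api_key"])
|     resp = client.chat.completions.create(
|         model=cfg["model"],
|         messages=[{"role": "user", "content": prompt}],
|     )
|     return resp.choices[0].message.content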
| jamessinghal wrote:
| This is changing; OpenAI's newer API (Responses) is
| required to include reasoning tokens in the context while
| using the API, to get the reasoning summaries, and to use
| some of the OpenAI provided tools. Google's OpenAI
| compatibility supports Chat Completions, not Responses.
|
| As the LLM developers continue to add unique features to
| their APIs, the shared API which is now OpenAI will only
| support the minimal common subset and many will probably
| deprecate the compatibility API. Devs will have to rely
| on SDKs to offer comptibility.
| dist-epoch wrote:
| It's still trivial to map to a somewhat different API.
| Google has its Vertex/GenAI API flavors.
|
| At least for now, LLM APIs are just JSONs with a bunch of
| prompts/responses in them and maybe some file URLs/IDs.
| jamessinghal wrote:
| It isn't necessarily difficult, but it's significantly
| more effort than swapping a URL as I originally was
| replying to.
| lelanthran wrote:
| > There is no "capture" here, it's trivial to switch
| LLM/providers, they all use OpenAI API. It's literally a
| URL change.
|
| So? That's true for search as well, and yet Google has
| been top-dog for decades _in spite of_ having worse
| results and a poorer interface than almost all of the
| competition.
| infecto wrote:
| This is most definitely not a toy for rich people. Now,
| perhaps depending on your country it may be considered rich,
| but I would comfortably say that for most of the developed
| world the costs for these tools are absolutely attainable;
| there is a reason
| ChatGPT has such a large subscriber base.
|
| Also the disconnect for me here is I think back on the cost of
| electronics, prices for the level of compute have generally
| gone down significantly over time. The C64 launched at around
| the $500-600 price level, not adjusted for inflation. You can
| go and buy a Mac mini for that price today.
| bawana wrote:
| I suspect that economies of scale are different for software
| and hardware. With hardware, iteration results in
| optimization of the supply chain, volume discount as the
| marginal cost is so much less than the fixed cost, and lower
| prices in time. The purpose of the device remains fixed. With
| software, the software becomes ever more complex with
| technical debt - featuritis, patches, bugs, vulnerabilities,
| and evolution of purpose to try and capture more disparate
| functions under one environment in an attempt to capture and
| lock in users. Price tends to increase in time. (This
| trajectory incidentally is the opposite of the unix
| philosophy - having multiple small fast independent tools
| than can be concatenated to achieve a purpose.) This results
| in ever increasing profits for software and decreasing
| profits for hardware at equilibrium. In the development of AI
| we are already seeing this-first we had gpt, then chatbots,
| then agents, now integration with existing software
| architectures.Not only is each model ever larger and more
| complex (RNN->transformer->multihead-> add fine tuning/LoRA->
| add MCP), but the bean counters will find ways to make you
| pay for each added feature. And bugs will multiply. Already
| prompt injection attacks are a concern so now another layer
| is needed to mitigate those.
|
| For the general public, these increasing costs will be
| subsidized by advertising. I can't wait for ads to start
| appearing in ChatGPT -- it will be very insidious, as the
| advertising will be commingled with the output, so there will
| be no way to avoid it.
| kordlessagain wrote:
| Kids? Think about all the domain experts, entrepreneurs,
| researchers, designers, and creative people who have incredible
| ideas but have been locked out of software development because
| they couldn't invest 5-10 years learning to code.
|
| A 50-year-old doctor who wants to build a specialized medical
| tool, a teacher who sees exactly what educational software
| should look like, a small business owner who knows their
| industry's pain points better than any developer. These people
| have been sitting on the sidelines because the barrier to entry
| was so high.
|
| The "vibe coding" revolution isn't really about kids (though
| that's cute) - it's about unleashing all the pent-up innovation
| from people who understand problems deeply but couldn't
| translate that understanding into software.
|
| It's like the web democratized publishing, or smartphones
| democratized photography. Suddenly expertise in the domain
| matters more than expertise in the tools.
| nevertoolate wrote:
| It sounds too good to be true. Why do you think an LLM is
| better at coding than at knowing how educational software
| should be designed?
| pphysch wrote:
| > These people have been sitting on the sidelines because the
| barrier to entry was so high.
|
| This comment is wildly out of touch. The SMB owner can now
| generate some Python code. Great. Where do they deploy it?
| How do they deploy it? How do they update it? How do they
| handle disaster recovery? And so on and so forth.
|
| LLMs accelerate only the easiest part of software
| engineering, writing greenfield code. The remaining 80% is
| left as an exercise to the reader.
| bongodongobob wrote:
| All the devs I work with would have to go through me to
| touch the infra anyway, so I'm not sure I see the issue
| here. No one is saying they need to deploy fully through
| the stack. It's a great start for them and I can help them
| along the way just like I would with anyone else deploying
| anything.
| pphysch wrote:
| In other words, most of the barriers to leveraging custom
| software are still present.
| bongodongobob wrote:
| Yes, the parts we aren't talking about that have nothing
| to do with LLMs, ie normal business processes.
| pton_xd wrote:
| > Think about all the domain experts, entrepreneurs,
| researchers, designers, and creative people who have
| incredible ideas but have been locked out of software
| development because they couldn't invest 5-10 years learning
| to code.
|
| > it's about unleashing all the pent-up innovation from
| people who understand problems deeply but couldn't translate
| that understanding into software.
|
| This is just a fantasy. People with "incredible ideas" and
| "pent-up innovation" also need incredible determination and
| motivation to make something happen. LLMs aren't going to
| magically help these people gain the energy and focus needed
| to pursue an idea to fruition. Coding is just a detail; it's
| not the key ingredient all these "locked out" people were
| missing.
| agentultra wrote:
| 100% this. There have been generations of tools built to
| help realize this idea and there is... not a lot of demand
| for it. COBOL, BASIC, Hypercard, the wasteland of no-code
| and low-code tools. The audience for these is incredibly
| small.
|
| A doctor has an idea. Great. Takes a lot more than a eureka
| moment to make it reality. Even if you had a magic machine
| that could turn it into the application you thought of. All
| of the iterations, testing with users, refining, telemetry,
| managing data, policies and compliance... it's a lot of
| work. Code is such a small part. Most doctors want to do
| doctor stuff.
|
| We've had mind-blowing music production software available
| to the masses for decades now... not a significant shift in
| people lining up to be the musicians they always wanted to
| be but were held back by limited access to the tools to
| record their ideas.
| kapildev wrote:
| >What would be interesting to see is what those kids produced
| with their vibe coding.
|
| I think you are referring to what those kids in the vibe coding
| event produced. Wasn't their output available in the video
| itself?
| yahoozoo wrote:
| I was trying to do some reverse engineering with Claude using an
| MCP server I wrote for a game trainer program that supports
| Python scripts. The context window gets filled up _so_ fast. I
| think my server is returning too many addresses (hex) when Claude
| searches for values in memory, but it's annoying. These things
| are so flaky.
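|
| One mitigation would be to cap and paginate the address list
| before it ever reaches the model; a rough sketch (the handler
| name and cap are made up, not my actual server code):
|
|   # Hypothetical MCP-style tool handler: cap how many matched
|   # addresses go back to the model and summarize the rest.
|   MAX_RESULTS = 50
|
|   def search_memory_tool(matches: list[int], page: int = 0) -> dict:
|       """Return one page of hex addresses plus a count of the rest."""
|       start = page * MAX_RESULTS
|       chunk = matches[start:start + MAX_RESULTS]
|       return {
|           "total_matches": len(matches),
|           "page": page,
|           "addresses": [hex(a) for a in chunk],
|           "truncated": len(matches) > start + MAX_RESULTS,
|       }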
| kaycey2022 wrote:
| I hope this excellent talk brings some much needed sense into the
| discourse around vibe coding.
| diggan wrote:
| If anything, I wish the conversation turned away from
| "vibe-coding", which was essentially coined as a "lol look at
| this go" thing but which media and corporations somehow picked
| up as "This is the new workflow all developers are adopting".
|
| LLMs as another tool in your toolbox? Sure, use it where it
| makes sense, don't try to make them do 100% of everything.
|
| LLMs as a "English to E2E product I'm charging for"? Lets maybe
| make sure the thing works well as a tool before letting it be
| responsible for stuff.
| huksley wrote:
| Vibe coding is making LEGO furniture; getting it running in the
| cloud is assembling the IKEA table for a busy restaurant.
| beacon294 wrote:
| What is this "clerk" library he used at this timestamp to tell
| him what to do? https://youtu.be/LCEmiRjPEtQ?si=XaC-
| oOMUxXp0DRU0&t=1991
|
| Gemini found it via screenshot or context: https://clerk.com/
|
| This is what he used for login on MenuGen:
| https://karpathy.bearblog.dev/vibe-coding-menugen/
| xnx wrote:
| That blog post is a great illustration that most of the
| complexity/difficulty of a web app is in the hosting and not in
| the useful code.
| matiasmolinas wrote:
| https://github.com/EvolvingAgentsLabs/llmunix
|
| An experiment to explore Karpathy's ideas
| bawana wrote:
| how do i install this thing?
| maleldil wrote:
| As far as I understand, you don't. You open Claude Code
| inside the repo and prompt `boot llmunix` inside Claude Code.
| The CLAUDE.md file tells Claude how to respond to that.
| bawana wrote:
| Thank you for the hint. I guess I need a Claude API token.
| From the images it seems he is opening it from his default
| directory. I see the 'base env', so it is unclear if any
| other packages were installed beyond the default Linux ones.
| I see he simply typed 'boot llmunix', so he must have
| symlinked 'boot' into his PATH.
| Aeroi wrote:
| the fanboying for this dude's opinion is insane.
| mrmansano wrote:
| It's a pastor preaching to the already converted, nothing new
| in this area. The only thing new is that they are selling the
| kool-aid this time.
| Aeroi wrote:
| It's been a multi-day conversation where multiple people are
| trying to obtain the transcripts, publish the text as gospel,
| and now the video. Like, yes, thank you, but holy shit.
| mupuff1234 wrote:
| Yeah, not sure I ever saw anything similar on HN before, feels
| very odd.
|
| I mean the talk is fine and all but that's about it?
| dang wrote:
| Maybe so, but please don't post unsubstantive comments to
| Hacker News.
|
| (Thoughtful criticism that we can learn from is welcome, of
| course. This is in the site guidelines:
| https://news.ycombinator.com/newsguidelines.html.)
| eitally wrote:
| It's going to be very interesting to see how things evolve in
| enterprise IT, especially but not exclusively in regulated
| industries. As more SaaS services are at least partly vibe coded,
| how are CIOs going to understand and mitigate risk? As more
| internal developers are using LLM-powered coding interfaces and
| become less clear on exactly how their resulting code works, how
| will that codebase be maintained and incrementally updated with
| new features, especially in solo dev teams (which is common)?
|
| I easily see a huge future for agentic assistance in the
| enterprise, but I struggle mightily to see how many IT leaders
| would accept the output code of something like a menugen app as
| production-viable.
|
| Additionally, if you're licensing code from external vendors
| who've built their own products at least partly through LLM-
| driven superpowers, how do you have faith that they know how
| things work and won't inadvertently break something they don't
| know how to fix? This goes for niche tools (like Clerk, or
| Polar.sh or similar) as much as for big heavy things (like a CRM
| or ERP).
|
| I was on the CEO track about ten years ago and left it for a new
| career in big tech, and I don't envy the folks currently trying
| to figure out the future of safe, secure IT in the enterprise.
| gosub100 wrote:
| > how many IT leaders would accept the output code of something
| like a menugen app as production-viable.
|
| probably all of the ones at microsoft
| dapperdrake wrote:
| Just like when all regulated industries started only using
| decision trees and ordinary least-squares regression instead of
| any other models.
| r2b2 wrote:
| I've found that as LLMs improve, some of their bugs become
| increasingly slippery - I think of it as the uncanny valley of
| code.
|
| Put another way, when I cause bugs, they are often glaring
| (more typos, fewer logic mistakes). Plus, as the author it's
| often straightforward to debug since you already have a deep
| sense for how the code works - you lived through it.
|
| So far, using LLMs has downgraded my productivity. The bugs
| LLMs introduce are often subtle logical errors, yet "working"
| code. These errors are especially hard to debug when you didn't
| write the code yourself -- now you have to learn the code as if
| you wrote it anyway.
|
| I also find it more stressful deploying LLM code. I _know in my
| bones_ how carefully I write code, due to a decade of roughly
| "one non critical bug per 10k lines" that keeps me asleep at
| night. The quality of LLM code can be quite chaotic.
|
| That said, I'm not holding my breath. I expect this to all flip
| someday, with an LLM becoming a better and more stable coder
| than I am, so I guess I will keep working with them to make
| sure I'm proficient when that day comes.
| DanHulton wrote:
| I'm curious where that expectation of the flip comes from?
| Your experience (and mine, frankly) would seem to indicate
| the opposite, so from whence comes this certainty that one
| day it'll change entirely and become reliable instead?
|
| I ask (and I'll keep asking) because it really seems like the
| prevailing narrative is that these tools have improved
| substantially in a short period of time, and that is
| seemingly enough justification to claim that they will
| continue to improve until perfection because...? _waves hands
| vaguely_
|
| Nobody ever seems to have any good justification for how
| we're going to overcome the fundamental issues with this
| tech, just a belief that comes from SOMEWHERE that it'll
| happen anyway, and I'm very curious to drill down into that
| belief and see if it comes from somewhere concrete or it's
| just something that gets said enough that it "becomes true",
| regardless of reality.
| thegeomaster wrote:
| I have been using LLMs for coding a lot during the past year,
| and I've been writing down my observations by task. I have a
| _lot_ of tasks where my first entry is thoroughly impressed
| by how e.g. Claude helped me with a task, and then the second
| entry is a few days after when I'm thoroughly irritated by
| chasing down subtle and just _strange_ bugs it introduced
| along the way. As a rule, these are incredibly hard to find
| and tedious to debug, because they lurk in the weirdest
| places, and the root cause is usually some weird
| confabulation that a human brain would never concoct.
| charlie0 wrote:
| It will succeed due to the same reason other sloppy strategies
| succeed, it has large short term gains and moves risk into the
| nebulous future. Management LOVES these types of things.
| poorcedural wrote:
| Software 3.0 is where Engineers only create the kernel or seed of
| an idea. Then all users are developers creating their own branch
| using the feedback loop of their own behavior.
| greybox wrote:
| He's talking about "LLM Utility companies going down and the
| world becoming dumber" as a sign of humanity's progress.
|
| This if anything should be a huge red flag
| bryanh wrote:
| Replace with "Water Utility going down and the world becoming
| less sanitary", etc. Still a red flag?
| greybox wrote:
| You're making a leap of logic.
|
| Before water sanitization technology we had no way of
| sanitizing water on a large scale.
|
| Before LLMs, we could still write software. Arguably we were
| collectively better at it.
| TeMPOraL wrote:
| LLMs are general-purpose tools used for great many tasks,
| most of them not related to writing code.
| iLoveOncall wrote:
| He lives in a GenAI bubble where everyone is self-
| congratulating about the usage of LLMs.
|
| The reality is that there's not a single critical component
| anywhere that is built on LLMs. There's absolutely no reliance
| on models, and ChatGPT being down has absolutely no impact on
| anything besides teenagers not being able to cheat on their
| homework and LLM wrappers not being able to wrap.
| ukprogrammer wrote:
| Even an LLM could tell you that that's an unknowable thing,
| perhaps you should rely on them more.
| iLoveOncall wrote:
| Has a critical service that you used meaningfully changed
| to seemingly integrate non-deterministic "intelligence" in
| the past 3 years in one of its critical paths? I'd bet good
| money that the answer to literally everyone is no.
|
| My company uses GenAI a lot in a lot of projects. Would it
| have some impact if all models suddenly stopped working?
| Sure. But the oncalls wouldn't even get paged.
| jeffnappi wrote:
| Tesla FSD, Waymo are good examples.
| bwfan123 wrote:
| > The reality is that there's not a single critical component
| anywhere that is built on LLMs.
|
| Remember that there are billion dollar usecases where being
| correct is not important. For example, shopping
| recommendations, advertising, search results, image
| captioning, etc. All of these usecases have humans consuming
| the output, and LLMs can play a useful role as productivity
| boosters.
| iLoveOncall wrote:
| And none of those are crucial.
|
| His point is that the world is RELIANT on GenAI. This isn't
| true.
| nlawalker wrote:
| Adults everywhere are using it to "cheat" at work, except
| there it's not cheating, it's celebrated and welcomed as a
| performance enhancement because results are the only thing
| that matters, and over time that will result in new
| expectations for productivity.
|
| It's going to take a while for those new expectations to
| develop, and they won't develop evenly, just like how even
| today there's plenty of low-hanging fruit in the form of
| roles or businesses that aren't using what anyone here would
| identify as simple opportunities for automation, and the main
| benefit that accrues to the one guy in the office who knows
| how to cheat with Excel and VBA is that he gets to slack off
| most of the time. But there certainly are places where the
| people in charge expect more, and are quick to perceive when
| and how much that bar can be raised. They don't care if
| you're cheating, but you'll need to keep up with the people
| who are.
| tinyhouse wrote:
| After Cursor is sold for $3B, they should transfer Karpathy 20%.
| (it also went viral before thanks to him tweeting about it)
|
| Great talk like always. I actually disagree on a few things with
| him. When he said "why would you go to ChatGPT and copy / paste,
| it makes much more sense to use a GUI that is integrated to your
| code such as Cursor".
|
| Cursor and the like take a lot of the control from the user. If
| you optimize for speed then use Cursor. But if you optimize for
| balance of speed, control, and correctness, then using Cursor
| might not be the best solution, esp if you're not an expert of
| how to use it.
|
| It seems that Karpathy is mainly writing small apps these days,
| he's not working on large production systems where you cannot
| vibe code your way through (not yet at least)
| researchai wrote:
| I can't believe I googled most of the dishes on the menu every
| time I went to the Thai restaurant. I've just realised how
| painful that was when I saw MenuGen!
| ukprogrammer wrote:
| Why do non-users of LLMs like to despise/belittle them so much?
|
| Just don't use them, and, outcompete those who do. Or, use them
| and outcompete those who don't.
|
| Belittling/lamenting on any thread about them is not helpful and
| akin to spam.
| djeastm wrote:
| Some people are annoyed at the hype, some are making good faith
| arguments about the pros/cons, and some people are just cranky.
| AI is a popular subject and we've all got our hot takes.
| blixt wrote:
| If we extrapolate these points about building tools for AI and
| letting the AI turn prompts into code I can't help but reach the
| conclusion that future programming languages and their runtimes
| will be heavily influenced by the strengths and weaknesses of
| LLMs.
|
| What would the code of an application look like if it was
| optimized to be efficiently used by LLMs and not humans?
|
| * While LLMs do heavily tend towards expecting the same
| inputs/outputs as humans because of the training data, I don't
| think this would inhibit co-evolution of novel representations of
| software.
| thierrydamiba wrote:
| Is a world driven by the strengths and weaknesses of
| programming languages better than the one driven by the
| strengths and weaknesses of LLMs?
| ivape wrote:
| Better to think of it as a world driven by the strengths and
| weaknesses of people. Is the world better if more people can
| express themselves via software? Yes.
|
| I don't believe in coincidences. I don't think the universe
| provided AI by accident. I believe it showed up just at the
| moment where the universe wants to make it clear - _your
| little society of work and status and money can go straight
| to living hell_. And that's where it's going, the developer
| was never supposed to be a rockstar, they were always meant
| to be creatives who do it because they like it. Fuck this job
| bullshit, those days are over. You will program the same way
| you play video games, it's never to be work again (it's
| simply too creative).
|
| Will the universe make it so a bunch of 12 year olds dictate
| software in natural language in a Roblox like environment
| that rivals the horseshit society sold for billions just a
| decade ago? Yes, and thank god. It's been a wild ride, thank
| you god for ending it (like he did with nuclear bombs after
| ww2, our little universe of war _shrunk_ due to that).
|
| Anyways, always pay attention to the little details, it's
| never a coincidence. The universe doesn't just sit there and
| watch our fiasco believe it or not, it gets involved.
| mythrwy wrote:
| It does seem a bit silly long term to have something like
| Python, which was developed as a human-friendly language,
| written by LLMs.
|
| If AI is going to write all the code going forward, we can
| probably dispense with the user friendly part and just make
| everything efficient as possible for machines.
| doug_durham wrote:
| I don't agree. Important code will need to be audited. I
| think the language of the future will be easy to read by
| human reviewers but deterministic. It won't be a human
| language. Instead it will be computer language with horrible
| ergonomics. I think Python or straight up Java would be a
| good start. Things like templates wouldn't be necessary since
| you could express that deterministically in a higher level
| syntax (e.g. A list of elements that can accept any type). It
| would be an interesting exercise.
| mostlysimilar wrote:
| If humans don't understand it to write the data the LLM is
| trained on, how will the LLM be able to learn it?
| s_ting765 wrote:
| Given the plethora of programming languages that exist today,
| I'm not worried at all about AI taking over SWE jobs.
| tudorizer wrote:
| 95% terrible expression of the landscape, 5% neatly dumbed down
| analogies.
|
| English is a terrible language for deterministic outcomes in
| complex/complicated systems. Vibe coders won't understand this
| until they are 2 years into building the thing.
|
| LLMs have their merits and he sometimes alludes to them, although
| it almost feels accidental.
|
| Also, you don't spend years studying computer science to learn
| the language/syntax, but rather the concepts and systems, which
| don't magically disappear with vibe coding.
|
| This whole direction is a cheeky Trojan horse. A dramatic
| problem, hidden in a flashy solution, to which a fix will be
| upsold 3 years from now.
|
| I'm excited to come back to this comment in 3 years.
| diggan wrote:
| > English is a terrible language for deterministic outcomes in
| complex/complicated systems
|
| I think that you seem to be under the impression that Karpathy
| somehow alluded to or hinted at that in his talk, which
| indicates you haven't actually watched the talk, which makes
| your first point kind of weird.
|
| I feel like one of the stronger points he made, was that you
| cannot treat the LLMs as something they're explicitly not, so
| why would anyone expect deterministic outcomes from them?
|
| He's making the case for coding with LLMs, not letting the LLMs
| go by themselves writing code ("vibe coding"), and
| understanding how they work before attempting to do so.
| tudorizer wrote:
| I watched the entire talk, quite carefully. He explicitly
| states how excited he was about his tweet mentioning English.
|
| The disclaimer you mention was indeed mentioned, although
| it's "in one ear, out the other" with most of his audience.
|
| If I give you a glazed donut with a brief asterisk about how
| sugar can cause diabetes will it stop you from eating the
| donut?
|
| You also expect deterministic outcomes when making analogies
| with power plants and fabs.
| fifilura wrote:
| Either way, I am not sure it is a requirement on HN to
| read/view the source.
|
| Particularly not a 40min video.
|
| Maybe it is tongue-in-cheek, maybe I am serious. I am not
| sure myself. But sometimes the interesting discussions
| come from what is on top of the poster's mind when viewing
| the title. Is that bad?
| diggan wrote:
| > Is that bad?
|
| It doesn't have to be. But it does get somewhat boring
| and trite after a while when you start noticing that
| certain subjects on HN tend to attract general and/or
| samey comments about $thing, rather than the submission
| topic within $thing, and I do think that is against the
| guidelines.
|
| > Please don't post shallow dismissals [...] Avoid
| generic tangents. Omit internet tropes. [...]
|
| The specific part of:
|
| > English is a terrible language for deterministic
| outcomes
|
| Strikes me as both as a generic tangent about LLMs, and
| the comment as a whole feels like a shallow dismissal of
| the entire talk, as Karpathy never claims English is a
| good language for deterministic outcomes, nor have I
| heard anyone else make that claim.
| tudorizer wrote:
| Might sound like a generic tangent, but it's the
| conclusion people will take away from the talk.
| diggan wrote:
| But is it _curious_? Is it thoughtful and substantive?
| Maybe it could have been thoughtful, if it felt like it
| was in response to what was mentioned in the submission.
| karaterobot wrote:
| It's odd! The guidelines don't say anything about having
| to read or watch what the posts linked to, all they say
| is it's inappropriate to accuse someone you're responding
| to of not having done so.
|
| There is a community expectation that people will know
| what they're talking about before posting, and in most
| cases that means having read the article. At the same
| time, I suspect that in many cases a lot of people
| commenting have not actually read the thing they're
| nominally commenting on, and they get away with it
| because the people upvoting them haven't either.
|
| However, I think it's a good idea to do so, at least to
| make a top-level comment on an article. If you're just
| responding to someone else's comment, I don't think it's
| as necessary. But to stand up and make a statement about
| something you know nothing about seems buffoonish and
| would not, in general, elevate the level of discussion.
| tudorizer wrote:
| I accept any equivalents of reading comprehension tests
| to prove that I watched the video, as I have many of
| Andrej's in the past. He's generally a good communicator,
| defo easy to follow.
| diggan wrote:
| I think this is the moment you're referring to?
| https://youtu.be/LCEmiRjPEtQ?si=QWkimLapX6oIqAjI&t=236
|
| > maybe you've seen a lot of GitHub code is not just like
| code anymore there's a bunch of like English interspersed
| with code and so I think kind of there's a growing category
| of new kind of code so not only is it a new programming
| paradigm it's also remarkable to me that it's in our native
| language of English and so when this blew my mind a few uh
| I guess years ago now I tweeted this and um I think it
| captured the attention of a lot of people and this is my
| currently pinned tweet uh is that remarkably we're now
| programming computers in English now
|
| I agree that it's remarkable that you can tell a computer
| "What is the biggest city in Maresme?" and it tries to
| answer that question. I don't think he's saying "English is
| the best language to make complicated systems uncomplicated
| with", or anything to that effect. Just like I still think
| "Wow, this thing is fucking flying" every time I sit
| onboard an airplane, LLMs are kind of incredible in some
| ways, yet so "dumb" in some other ways. It sounds to me
| like he's sharing a similar sentiment but about LLMs.
|
| > although it's "in one ear, out the other" with most of
| his audience.
|
| Did you talk with them? Otherwise this is just creating an
| imaginary argument against some people you just assume they
| didn't listen.
|
| > If I give you a glazed donut with a brief asterisk about
| how sugar can cause diabetes will it stop you from eating
| the donut?
|
| If I wanted to eat a donut at that point, I guess I'd eat
| it anyways? But my aversion to risk (or rather the lack of
| it) tend to be non-typical.
|
| What does my answer mean in the context of LLMs and non-
| determinism?
|
| > You also expect deterministic outcomes when making
| analogies with power plants and fabs.
|
| Are you saying that the analogy should be deterministic or
| that power plants and fabs are deterministic? Because I
| don't understand if the former, and the latter really isn't
| deterministic by any definition I recognize that word by.
| tudorizer wrote:
| > Did you talk with them? Otherwise this is just creating
| an imaginary argument against some people you just assume
| they didn't listen.
|
| I have, unfortunately. Start-up founders, managers, and
| investors who mock the need for engineers because "AI
| can fix it".
|
| Don't get me wrong, there are plenty of "stochastic
| parrot" engineers even without AI, but still, not enough
| to make blanket statements.
| diggan wrote:
| That's a lot of people to talk to in a day more or less,
| since the talk happened. Were they all there and you too,
| or did you all have a watch party or something?
|
| Still, what's the outcome of our "glazed donut" argument,
| you got me curious what that would lead to. Did I die of
| diabetes?
| jbeninger wrote:
| I think the analogy is that vibe coding is bad for you
| but feels good. Like a donut.
|
| But I'd say the real situation is more akin to "if you
| eat this donut quickly, you might get diabetes, but if
| you eat it slowly, it's fine", which is a bad analogy,
| but a bit more accurate.
| tudorizer wrote:
| > That's a lot of people to talk to in a day more or
| less, since the talk happened. Were they all there and
| you too, or you all had a watch party or something?
|
| hehe, I wish.
|
| The topics in the talk are not new. They have been
| explored and pondered for quite a while now.
|
| As for the outcome of the donut experiment, I don't know.
| You tell me. Apply it repeatedly at a big scale and see
| if you should alter the initial offer for best outcomes
| (as relative as "best" might be).
| diggan wrote:
| > The topics in the talk are not new.
|
| Sure, but your initial dismissal ("95% X, 5% Y") is
| literally about this talk no? And when you say 'it's "in
| one ear, out the other" with most of his audience' that's
| based on some previous experience, rather than the talk
| itself? I guess I got confused what applied to what
| event.
|
| > As for the outcome of the donut experiment, I don't
| know. You tell me. Apply it repeatedly at a big scale and
| see if you should alter the initial offer for best
| outcomes (as relative as "best" might be).
|
| Maybe I'm extra slow today, how does this tie into our
| conversation so far? Does it have anything to do with
| determinism or what was the idea behind bringing it up?
| I'm afraid you're gonna have to spell it out for me,
| sorry about that :)
| pama wrote:
| Your experience with fabs must be somewhat limited if you
| think that the state of the art in fabs produces
| deterministic results. Please lookup (or ask friends) for
| the typical yields and error mitigation features of modern
| chips and try to visualize if you think it is possible to
| have determinism when the density of circuits starts to
| approach levels that cannot be imspected with regular
| optical microscopes anymore. Modern chip fabrication is
| closer to LLM code in even more ways than what is presented
| in the video.
| whilenot-dev wrote:
| > Modern chip fabrication is closer to LLM code
|
| As is, I don't quite understand what you're getting at
| here. Please just think that through and tell us what
| happens to the yield ratio when the software running on
| all those photolithography machines weren't
| deterministic.
| kadushka wrote:
| An output of a fab, just like an output of an LLM, is
| non-deterministic, but is good enough, or is being
| optimized to be good enough.
|
| Non-determinism is not the problem, it's the quality of
| the software that matters. You can repeatedly ask me to
| solve a particular leetcode puzzle, and every time I
| might output a slightly different version. That's fine as
| long as the code solves the problem.
|
| The software running on the machines (or anywhere) just
| needs to be better (choose your metric here) than the
| software written by humans. Software written by GPT-4 is
| better than software written by GPT-3.5, and the software
| written by o3 is better than software written by GPT-4.
| That's just the improvement from the last 3 years, and
| there's a massive, trillion-dollar effort worldwide to
| continue the progress.
| whilenot-dev wrote:
| Hardware always involves some level of non-determinism,
| because the physical world is messier than the virtual
| software world. Every hardware engineer accepts that and
| learns how to design solutions despite those constraints.
| But you're right, non-determinism is not the current
| problem in _some_ fabs, because the whole process has
| been modeled with it in mind, and it 's the yield ratio
| that needs to be deterministic enough to offer a service.
| Remember the struggles in Intel's fabs? Revenue reflects
| that.
|
| The software quality at companies like ASML seems to be
| in bad shape already, and I remember ex-employees
| stating that there are some team leads higher up who can
| at least reason about existing software procedures, their
| implementation, side effects and their outcomes. Do you
| think this software is as thoroughly documented as some
| open source project? The purchase costs for those
| machines are in the mid-3-digit million range (operating
| costs excluded) and are expected to run 24/7 to be
| somewhat worthwhile. Operators can handle hardware issues
| on the spot and work around them, but what do you think
| happens with downtime due to non-deterministic software
| issues?
| tudorizer wrote:
| Fair. No process is 100% efficient and the depths of many
| topics become ambiguous to the point where margins of
| error need to be introduced.
|
| Chip fabs are defo far into said depths.
|
| Must we apply this at more shallow levels too?
| m3kw9 wrote:
| Like biz logic requirements, they need to be defined in a
| fine-grained way.
| oc1 wrote:
| AI is all about the context window. If you figure out the
| context problem, you will see that all these "AI is bullshit,
| it doesn't work and can't produce working code" complaints go
| away. Same for everything else.
| tudorizer wrote:
| Working code or not is irrelevant. Heck, even human-in-the-loop
| (Tony-in-the-Iron-Man) is not actively the point. If we're
| going into "it's all about" territory then it's all about:
|
| - training data
| - approximation of the desired outcome
|
| Neither supports a good direction for the complexity of some
| of the systems around us, most of which require a dedicated
| language. Imagine doing calculus or quantum physics in
| English. Novels of words would barely suffice.
|
| So a context window as big as the training data itself?
|
| What if the training data is faulty?
|
| I'm confident you understand that working code or not doesn't
| matter in this analogy. Neither does LLMs reaching out for
| the right tool.
|
| LLMs have their merits. Replacing concrete systems that require
| a formal language and grammar is not one of them.
|
| `1 + 1 = 2` because that's how maths works, not because of
| deja vu.
| gardenhedge wrote:
| Tony is iron man, not in him
| tudorizer wrote:
| Sure, I wasn't sure what to call the robot layer. Is it the
| "Iron Man Suit"?
| cobertos wrote:
| Untrue. I find problems with niche knowledge, heavy math,
| and/or lack of good online resources to be troublesome for
| AI. Examples so far I've found of consistent struggle points
| are shaders, parsers, and streams (in Nodejs at least)
|
| Context window will solve a class of problems, but will not
| solve all problems with AI.
| belter wrote:
| You just described Software 4.0...
| tudorizer wrote:
| Can we have it now and skip 3.0?
| strangescript wrote:
| Who said I wanted my outcomes to be deterministic? Why is it
| that the only way we accept programming is for completely
| deterministic outcomes, when the reality is that that's an
| implementation detail?
|
| I am a real user and I am on a general purpose e-commerce site
| and my ask is "I want a TV that is not that expensive", then by
| definition the user request is barely deterministic. User
| requests are normally like this for any application. High level
| and vague at best. Then developers spend all their time on edge
| cases, user QA, in the weeds junk that the User does not care
| about at all. People don't want to click filters and fill out
| forms for your app. They want it to be easy.
| tudorizer wrote:
| Agreed. This e-commerce example is quite a good highlight for
| LLMs.
|
| The same can't be applied when your supplier needs 300 gaskets
| of 68 x 34 mm to the BS10 standard, to give a random, more
| precise example.
| rudedogg wrote:
| > English is a terrible language for deterministic outcomes in
| complex/complicated systems.
|
| Someone here shared this ancient article by Dijkstra about this
| exact thing a few weeks ago:
| https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD667...
| tudorizer wrote:
| TIL. Thanks for sharing
| brainless wrote:
| I am not sure I got your point about English. I thought
| Karpathy was talking about English being the language of
| prompts, not output. Outputs can be English but if the goal is
| to compute using the output, then we need structured output
| (JSON, snippets of code, etc.), not English.
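|
| For example, a minimal sketch (the schema and field names are
| made up for illustration): ask the model to emit JSON matching
| a schema, then validate it before computing with it.
|
|   from pydantic import BaseModel
|
|   # Hypothetical schema; field names are only illustrative.
|   class Invoice(BaseModel):
|       vendor: str
|       total_eur: float
|       line_items: list[str]
|
|   # Stand-in for JSON the LLM was asked to emit for this schema.
|   raw_output = ('{"vendor": "ACME", "total_eur": 120.5, '
|                 '"line_items": ["hosting", "support"]}')
|   invoice = Invoice.model_validate_json(raw_output)  # raises on bad JSON
|   print(invoice.total_eur * 1.2)  # a plain float we can compute with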
| tudorizer wrote:
| Entertain me in an exercise:
|
| First, instruct a friend/colleague on how to multiply two
| 2-digit numbers in plain English.
|
| Secondly (ideally with a different friend, to not contaminate
| tests), explain the same but using only maths formulas.
|
| Where does the prompting process start and where does it end?
| Is it a one-off? Is the prompt clear enough? Do all the
| parties involved communicate within same domain objects?
|
| Hopefully my example is not too contrived.
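|
| (For the second version, the whole instruction can be a single
| identity - a sketch of what I mean: with two-digit numbers
| 10a + b and 10c + d, the product is
| (10a + b)(10c + d) = 100ac + 10(ad + bc) + bd - whereas the
| plain-English version takes a paragraph of steps.)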
| barumrho wrote:
| I agree with your point about English, but LLMs are not
| limited to English. You can show them formulas, images,
| code, etc.
| poorcedural wrote:
| Time is a funny calculator, measuring how an individual is
| behind. And in the funny circumstance that an individual is
| human, they look back on this comment in 3 years and wonder why
| humans only see themselves.
| qjack wrote:
| While I agree with you broadly, remember that those that employ
| you don't have those skills either. They accept that they are
| ceding control of the details and trust us to make those
| decisions or ask clarifying questions (LLMs are getting better
| at those things too). Vibe coders are clients seeking an
| alternative, not developers.
| tudorizer wrote:
| > Vibe coders are clients seeking an alternative, not
| developers.
|
| Agreed. That's genuinely a good framing for clients.
| unshavedyak wrote:
| Maybe i'm not "vibing" enough, but i've actually been testing
| this recently. So far i think the thing "vibing" helps most
| with for me personally is just making decisions which i'm
| often too tired to do after work.
|
| I've been coming to the realization that working with LLMs
| offer a different set of considerations than working on your
| own. Notably i find that i often obsess about design, code
| location, etc because if i get it wrong then my precious
| after-work time and energy are wasted on refactoring. The
| larger the code base, the more crippling this becomes for me.
|
| However refactoring is almost not an issue with LLMs. They do
| it very quickly and aggressively. So the areas i'm not vibing
| on are just reviewing, and ensuring it isn't committing any
| insane sins... because it definitely will. But the structure
| i'm accepting is far from what i'd make myself. We'll see how
| this pans out long term for me, but it's a strategy that i'm
| exploring.
|
| On the downside, my biggest difficulty with LLMs is getting
| them to just.. not. To produce less. Choosing too large of
| tasks is very easy and the code can snowball before you have
| a chance to pump the brakes and course correct.
|
| Still, it's been a positive experience so far. I still
| consider it vibing though because i'm accepting far less
| quality work than what i'd normally produce. In areas where
| it matters though, i enforce correctness, and have to review
| everything as a result.
| serjester wrote:
| I think you're straw manning his argument.
|
| He explicitly says that both LLMs and traditional software have
| very important roles to play.
|
| LLMs though are incredibly useful when encoding the behavior of
| the system deterministically is impossible. Previously this
| fell under the umbrella of problems solved with ML. This would
| take a giant time investment and a highly competent team to
| pull off.
|
| Now anyone can solve many of these same problems with a single
| API call. It's easy to wave this off, but this a total paradigm
| shift.
| kypro wrote:
| I know we've had thought leaders in tech before, but am I the
| only one who is getting a bit fed up by practically anything a
| handful of people in the AI space say being circulated everywhere
| in tech spaces at the moment?
| danny_codes wrote:
| No it's incredibly annoying I agree.
|
| The hype hysteria is ridiculous.
| dang wrote:
| If there are lesser-known voices who are as interesting as
| karpathy or simonw (to mention one other example), I'd love to
| know who they are so we can get them into circulation on HN.
| jes5199 wrote:
| okay I'm practicing my new spiel:
|
| this focus on coding is the wrong level of abstraction
|
| coding is no longer the problem. the problem is getting the right
| context to the coding agent. this is much, much harder
|
| "vibe coding" is the new "horseless carriage"
|
| the job of the human engineer is "context wrangling"
| diggan wrote:
| > coding is no longer the problem.
|
| "Coding" - The art of literally using your fingers to type
| weird characters into a computer, was never a problem
| developers had.
|
| The problem has always been understanding and communication,
| and neither of those have been solved at this moment. If
| anything, they have gotten even more important, as usually
| humans can infer things or pick up stuff by experience, but
| LLMs cannot, and you have to be very precise and exact about
| what you're telling them.
|
| And so the problem remains the same. "How do I communicate what
| I want to this person, while keeping the context as small as
| possible as to not overflow, yet extensive enough to cover
| everything?" except you're sending it to endpoint A instead of
| endpoint B.
| ofjcihen wrote:
| I'd take it a step further honestly. You need to be precise
| and exact but you also have to have enough domain knowledge
| to know when the LLM is making a huge mistake.
| diggan wrote:
| > you also have to have enough domain knowledge
|
| I'm a bit 50/50 on this. Generally I agree, how are you
| supposed to review it otherwise? Blindly accepting whatever
| the LLM tells you or gives you is bound to create trouble
| in the future, you still need to understand and think about
| what the thing you're building is, and how to
| design/architect it.
|
| I love making games, but I'm also terrible at math.
| Sometimes, I end up out of my depth, and sometimes it could
| take me maybe a couple of days to solve something that
| probably would be trivial for a lot of people. I try my
| best to understand the fundamentals and the theory behind
| it, but also not get lost in rabbit holes, but it's still
| hard, for whatever reason.
|
| So I end up using LLMs sometimes to write small utility
| functions used in my games for specific things. It takes a
| couple of minutes. I know exactly what I want to pass into
| it, and what I want to get back, but I don't necessarily
| understand 100% of the math behind it. And I think I'm
| mostly OK with this, as long as I can verify that the
| expected inputs get the expected outputs, which I usually
| do with unit or E2E tests.
|
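| A concrete example of what I mean, as a sketch (the helper is a
| stand-in for the kind of function I'd ask for, not real project
| code):
|
|   import math
|
|   # Hypothetical LLM-written helper: shortest signed difference
|   # between two angles, in radians.
|   def angle_diff(a: float, b: float) -> float:
|       return math.atan2(math.sin(a - b), math.cos(a - b))
|
|   # The check I actually trust: known inputs -> expected outputs.
|   def test_angle_diff():
|       assert math.isclose(angle_diff(0.0, 0.0), 0.0, abs_tol=1e-9)
|       assert math.isclose(angle_diff(3.0, -3.0), 6.0 - 2 * math.pi,
|                           abs_tol=1e-9)
|
|   test_angle_diff()
|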
| Would I blindly accept information about nuclear reactors,
| another topic I don't understand much about? No, I'd still
| take everything a LLM outputs with a "grain of probability"
| because that's how they work. Would I blindly accept it if
| I can guarantee that for my particular use case, it gives
| me what I expect from it? Begrudgingly, yeah, because I
| just wanna create games and I'm terrible at math.
| ofjcihen wrote:
| Oh yeah definitely. The context matters.
|
| For making CRUD apps or anything that doesn't involve
| security or stores sensitive information I 100 percent
| agree it's fine.
|
| The issue I see is that we get some people storing
| extremely sensitive info in apps made with these and they
| don't know enough to verify the security of it. They'll
| ask the LLM "is it secure?" But it doesn't matter if they
| don't know it's not BSing
| ldenoue wrote:
| Full playable transcript
| https://www.appblit.com/scribe?v=LCEmiRjPEtQ
| swyx wrote:
| slides:
| https://docs.google.com/presentation/d/1sZqMAoIJDxz79cbC5ap5...
| alightsoul wrote:
| It's interesting to see that people here and on Blind are more
| wary(?) of AI than people in, say, Reddit or YouTube comments
| sponnath wrote:
| Reddit and YouTube are such huge social media platforms that it
| really depends on which bubble (read: subreddits/yt channels)
| you're looking at. There's the "AGI is here" people over at
| r/singularity and then the "AI is useless" people at
| r/programming. I'm simplifying arguments from both sides here
| but you get my point.
| alightsoul wrote:
| Even looking at r/programming I felt they were less wary of
| AI, or even comparing the comments here vs those on YouTube
| for this video
| lubujackson wrote:
| Generally, people behind big revolutionary tech are the worst
| suited for understanding how it will do "in the wild". Forest for
| the trees and all that.
|
| Some good nuggets in this talk, specifically his concept that
| Software 1.0, 2.0 and 3.0 will all persist and all have unique
| use cases. I definitely agree with that. I disagree with his
| belief that "anyone can vibe code" mindset - this works to a
| certain level of fidelity ("make an asteroids clone") but what he
| overlooks is his ability, honed over many years, to precisely
| document requirements that will translate directly to code that
| works in an expected way. If you can't write up a Jira epic that
| covers all bases of a project, you probably can't vibe code
| something beyond a toy project (or an obvious clone). LLM code
| falls apart under its own weight without a solid structure, and I
| don't think that will ever fundamentally change.
|
| Where we are going next, and a lot of effort is being put behind,
| is figuring out exactly how to "lengthen the leash" of AI through
| smart framing, careful context manipulation and structured
| requests. We obviously can have anyone vibe code a lot further if
| we abstract different elements into known areas and simply allow
| LLMs to stitch things together. This would allow much larger
| projects with a much higher success rate. In other words, I
| expect an AI Zapier/Yahoo Pipes evolution.
|
| Lastly, I think his concept of only having AI push "under 1000
| line PRs" that he carefully reviews is more short-sighted. We are
| very, very early in learning how to control these big stupid
| brains. Incrementally, we will define sub-tasks that the AI can
| take over completely without anyone ever having to look at the
| code, because the output will always be within an accepted and
| tested range. The revolution will be at the middleware level.
| AlexCoventry wrote:
| I've seen evidence of "anyone can vibe code", but at this stage
| the result tends to be a 5,000-line application intricately
| entangled with 500,000 lines of irrelevant slop. Still, the
| wonder is that the bear can dance at all. That's a new thing
| under the sun.
| nsagent wrote:
| Having worked with game designers writing code for their
| missions/levels in a scripting language, I'd say this has
| been the case for quite a long while.
|
| They start with the code from another level, then modify it
| until it seems to do what they want. During the alpha testing
| phase, we'd have a programmer read through the code and
| remove all the useless cruft and fix any associated bugs.
|
| In some sense that's what vibe coding with an AI is like if
| you don't know how to code. You have the AI make some initial
| set of code that you can't evaluate for correctness, then
| slowly modify it until it seems to behave generally like you
| want. You might even learn to recognize a few things in the
| code over time, at which point you can directly change some
| variables or structures in the code directly.
| AlexCoventry wrote:
| I'm not kidding about the orders of magnitude, though. It's
| been literally roughly 100 lines per line required to
| competently implement the app. It doesn't seem economically
| feasible to me, at this stage. I would prefer to just
| rewrite. (I know it's a common bias.)
| jmsdnns wrote:
| There is another angle to this too.
|
| Prior to LLMs, it was amusing to consider how ML folks and
| software folks would talk passed each other. It was amusing
| because both sides were great at what they do, neither side
| understood the other side, and they had to work together
| anyway.
|
| After LLMs, we now have lots of ML folks talking about the
| future of software, something previously established to be so
| outside their expertise that communication with software
| engineers was an amusing challenge.
|
| So I must ask, are ML folks actually qualified to know the
| future of software engineering? Shouldn't we be listening to
| software engineers instead?
| abeppu wrote:
| This seems to be overstating the separation. For people doing
| applied ML, there's often been a dual responsibility that
| included a significant amount of software engineering. I
| wouldn't necessarily listen to such declarations from an ML
| researcher whose primary output is papers, but from ML
| engineers who have built and shipped
| products/services/libraries I think it's much more
| reasonable.
| tomrod wrote:
| > So I must ask, are ML folks actually qualified to know the
| future of software engineering?
|
| Probably not CRUD apps typical to back office or website
| software, but don't forget that ML folks come from the stock
| of people that built Apollo, Mars Landers, etc. Scientific
| computing shares some significant overlap with SWE, and ML is
| a subset of that.
|
| IMHO, the average SWE and ML person are different types when
| it comes to how they cargo-cult their development, but the top
| 10% show significant understanding and speed across domains.
| superconduct123 wrote:
| Where was he saying you could vibe code beyond a simple
| app?
|
| He even said it could be a gateway to actual programming
| raffael_de wrote:
| I'm a little surprised at how negative he is towards textual
| interfaces and text for representing information.
| j45 wrote:
| It's interesting how researchers are ahead on some insights and
| introduce them, and it feels like some ideas are new to them even
| though they might already exist, and they're helping present them
| to the world.
|
| A positive video all around; I've learned a lot from Andrej's
| YouTube account.
|
| LLMs are really strange, I don't know if I've seen a technology
| where the technology class that applies it (or can verify
| applicability) has been so separate or unengaged compared to the
| non-technical people looking to solve problems.
| whilenot-dev wrote:
| I watched Karpathy's _Intro to Large Language Models_ [0] not so
| long ago and must say that I'm a bit confused by this
| presentation, and it's a bit unclear to me what it adds.
|
| 1.5 years ago he saw all the tool use in agent systems as the
| future of LLMs, which seemed reasonable to me. There was (and
| maybe still is) potential for a lot of business cases to be
| explored, but every system is defined by its boundaries
| nonetheless. We still don't know all the challenges we face at
| those boundaries, whether these could be modelled into a
| virtual space, handled by software, and therefore also
| potentially by AI and businesses.
|
| Now it all just seems to be analogies and what role LLMs could
| play in our modern landscape. We should treat LLMs as
| encapsulated systems of their own ...but sometimes an LLM becomes
| the operating system, sometimes it's the CPU, sometimes it's the
| mainframe from the 60s with time-sharing, a big fab complex, or
| even outright electricity itself?
|
| He's showing an iOS app, which seems to be, sorry for the
| dismissive tone, an example of a better-looking counter. This
| demo app was in a presentable state for a demo after a day, and
| it took him a week to implement Google's OAuth2 stuff. Is that
| somehow exciting? What was that?
|
| The only way I could interpret this is that it just shows a big
| divide we're currently in. LLMs are a final API product for some,
| but an unoptimized generative software-model with sophisticated-
| but-opaque algorithms for others. Both are utterly in need of
| real world use cases - the product side for the fresh training
| data, and the business side for insights, integrations and
| shareholder value.
|
| Am I all of a sudden the one lacking imagination? Is he just
| slurping the CEO Kool-Aid and still has his investments in
| OpenAI? Can we at least agree that we're still dealing with
| software here?
|
| [0]: https://www.youtube.com/watch?v=zjkBMFhNj_g
| bwfan123 wrote:
| > Am I all of a sudden the one lacking imagination?
|
| No. The reality of what these tools can do is sinking in. The
| rubber is meeting the road and I can hear some screeching.
|
| The boosters are in 5 stages of grief coming to terms with what
| was once AGI and is now a mere co-pilot, while the haters are
| coming to terms with the fact that LLMs can actually be useful
| in a variety of usecases.
| acedTrex wrote:
| I actually quite agree with this, there is some reckoning on
| both sides happening. It's quite entertaining to watch, a bit
| painful as well of course as someone who is on the "they are
| useless" side and is noticing some very clear usecases where
| a value add is present.
| natebc wrote:
| I'm with you. I give several of 'em a shot a few times a
| week (thanks Kagi for the fantastic menu of choices!). Over
| the last quarter or so I've found that the bullshit:useful
| ratio is creeping to the useful side. They still answer
| like a high school junior writing a 5 paragraph essay but a
| decade of sifting through blogspam has honed my own ability
| to cut through that.
| diggan wrote:
| > but a decade of sifting through blogspam has honed my
| own ability to cut through that.
|
| Now, a different skill need to be honed :) Add "Be
| concise and succinct without removing any details" to
| your system prompt and hopefully it can output its text
| slightly better.
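|
| If you're calling an API directly, that's just the system
| message; a rough sketch with the OpenAI Python client (model
| name and prompt wording are placeholders):
|
|   from openai import OpenAI
|
|   client = OpenAI()  # reads OPENAI_API_KEY from the environment
|   resp = client.chat.completions.create(
|       model="gpt-4o-mini",  # placeholder model name
|       messages=[
|           {"role": "system",
|            "content": "Be concise and succinct without removing "
|                       "any details."},
|           {"role": "user",
|            "content": "Summarize the talk's main points."},
|       ],
|   )
|   print(resp.choices[0].message.content)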
| Joel_Mckay wrote:
| In general, the functional use-case traditionally covered
| by basic heuristics is viable for a reasoning LLM. These
| are useful for search, media processing, and language
| translation.
|
| LLM is not AI, and never was... and while the definition
| has been twisted in marketing BS it does not mean either
| argument is 100% correct or in err.
|
| LLM is now simply a cult, and a rather old one dating back
| to the 1960s Lisp machines.
|
| Have a great day =3
| johnxie wrote:
| LLMs aren't perfect, but calling them a "cult" misses the
| point. They're not just fancy heuristics, they're
| general-purpose function approximators that can reason,
| plan, and adapt across a huge range of tasks with zero
| task-specific code.
|
| Sure, it's not AGI. But dismissing the progress as just
| marketing ignores the fact that we're already seeing them
| handle complex workflows, multi-step reasoning, and real-
| time interaction better than any previous system.
|
| This is more than just Lisp nostalgia. Something real is
| happening.
| Joel_Mckay wrote:
| Sure, I have seen the detrimental impact on some teams,
| and it does not play out as Marketers suggest.
|
| The trick is in people seeing meaning in well structured
| nonsense, and not understanding high dimension vector
| spaces simply abstracting associative false equivalency
| with an inescapable base error rate.
|
| I wager Neuromorphic computing is likely more viable than
| LLM cults. The LLM subject is incredibly boring once you
| tear it apart, and less interesting than watching Opuntia
| cactus grow. Have a wonderful day =3
| anothermathbozo wrote:
| > The reality of what these tools can do is sinking in
|
| It feels premature to make determinations about how far this
| emergent technology can be pushed.
| Joel_Mckay wrote:
| The cognitive dissonance is predictable.
|
| Now hold my beer, as I cast a superfluous rank to this
| trivial 2nd order Tensor, because it looks awesome wasting
| enough energy to power 5000 homes. lol =3
| pera wrote:
| Exactly! What skeptics don't get is that AGI is already here
| and we are now starting a new age of infinite prosperity,
| it's just that exponential growth looks flat at first,
| obviously...
|
| Quantum computers and fusion energy are basically solved
| problems now. Accelerate!
| hn_throwaway_99 wrote:
| This sounds like clear satire to me, but at this point I
| really can't tell.
| hn_throwaway_99 wrote:
| > The boosters are in 5 stages of grief coming to terms with
| what was once AGI and is now a mere co-pilot, while the
| haters are coming to terms with the fact that LLMs can
| actually be useful in a variety of usecases.
|
| I couldn't agree with this more. I often get frustrated
| because I feel like the loudest voices in the room are so
| laughably extreme. On one side you have the "AGI cultists",
| and on the other you have the "But the hallucinations!!!"
| people. I've personally been pretty amazed by the state of AI
| (nearly all of this stuff was the domain of Star Trek just a
| few years ago), and I get tons of value out of many of these
| tools, but at the same time I hit tons of limitations and I
| worry about the long-term effect on society (basically, I
| think this "ask AI first" approach, especially among young
| people, will kinda turn us all into idiots, similar to the
| way Google Maps made it hard for most of us to remember the
| simple directions). I also can't help but roll my eyes when I
| hear all the leaders of these AI companies going on about how
| AI will make a "white collar bloodbath" - there is some
| nuggets of truth in that, but these folks are just using
| scare tactics to hype their oversold products.
| Workaccount2 wrote:
| The fundamental mistake I see is people applying LLMs to the
| current paradigm of software; enormous hulking codebases made
| to have as many features as possible to appeal to as many users
| as possible.
|
| LLMs are excellent at helping non-programmers write narrow use
| case, bespoke programs. LLMs don't need to be able to one-shot
| excel.exe or Plantio.apk so that Christine can easily track
| when she watered and fed her plants nutrients.
|
| The change that LLMs will bring to computing is much deeper
| than Garden Software trying to slot in some LLM workers to work
| on their sprawling feature-packed Plantio SaaS.
|
| I can tell you first hand I have already done this numerous
| times as a non-programmer working a non-tech job.
| skydhash wrote:
| The thing is that there's a need to integrate all these
| little tools because the problems they solve are part of the
| same domain. And that's where the problems lie. Something like
| Excel has an advantage in being a common platform for both
| data and procedures. Unix adopted text and pipes for
| integration.
| demosthanos wrote:
| What you're missing is the audience.
|
| This talk is different from his others because it's directed at
| aspiring startup founders. It's about how we conceptualize the
| place of an LLM in a new business. It's designed to provide a
| series of analogies, any one of which may or may not help
| a given startup founder to break out of the tired, binary
| talking points they've absorbed from the internet ("AI all the
| things" vs "AI is terrible") in favor of a more nuanced
| perspective of the role of AI in their plans. It's soft and
| squishy rhetoric because it's not about engineering, it's about
| business and strategy.
|
| I honestly left impressed that Karpathy has the dynamic range
| necessary to speak to both engineers and business people, but
| it also makes sense that a lot of engineers would come out of
| this very confused at what he's on about.
| whilenot-dev wrote:
| I get that, motivating young founders is difficult, and I
| think he has a charming geeky way of provoking some thoughts.
| But on the other hand: Why mainframes with time-sharing from
| the 60s? Why operating systems? LLMs to tell you how to boil
| an egg, seriously?
|
| Putting my engineering hat on, I understand his idea of the
| "autonomy slider" as lazy workaround for a software
| implementation that deals with _one_ system boundary. He
| should aspire people there to seek out for unknown
| boundaries, not provide implementation details to existing
| boundaries. His _MenuGen_ app would probably be better off
| using a web image search instead of LLM image generation.
| Enhancing deployment pipelines with LLM setups is something
| for the last generation of DevOps companies, not the next
| one.
|
| Please mention, just once, the value proposition and the
| responsibilities that come with handling large quantities of
| valuable data - LLMs wouldn't exist without that data! What
| makes for quality data for an LLM, and what counts as personal
| data?
| westoncb wrote:
| > and must say that I'm a bit confused by this presentation,
| and it's a bit unclear to me what it adds.
|
| I think the disconnect might come from the fact that Karpathy
| is speaking as someone whose day-to-day computing work has
| already been radically transformed by this technology (and he
| interacts with a ton of other people for whom this is the
| case), so he's not trying to sell the possibility of it: that
| would be like trying to sell the possibility of an airplane to
| someone who's already just cruising around in one every day.
| Instead the mode of the presentation is more: well, here we are
| at the dawn of a new era of computing, it really happened. Now
| how can we relate this to the history of computing to
| anticipate where we're headed next?
|
| > ...but sometimes an LLM becomes the operating system,
| sometimes it's the CPU, sometimes it's the mainframe from the
| 60s with time-sharing, a big fab complex, or even outright
| electricity itself?
|
| He uses these analogies in clear and distinct ways to
| characterize separate facets of the technology. If you were
| unclear on the meanings of the separate analogies, it seems like
| the talk may offer some value for you after all, but you may be
| missing some prerequisites.
|
| > This demo app was in a presentable state for a demo after a
| day, and it took him a week to implement Google's OAuth2 stuff.
| Is that somehow exciting? What was that?
|
| The point here was that he'd built the core of the app within a
| day, without knowing Swift or the iOS app dev ecosystem, by
| leveraging LLMs, but that the OAuth part of the process remains
| old-fashioned and blocks people from leveraging LLMs as they can
| when writing code--and he goes on to show concretely how this
| could be improved.
| wiremine wrote:
| I spent a lot of time thinking about this recently. Ultimately,
| English is not a clean, deterministic abstraction layer. This
| isn't to say that LLMs aren't useful or can't create some great
| efficiencies.
| npollock wrote:
| no, but a subset of English could be
| freehorse wrote:
| Thought we already had that?
| 4gotunameagain wrote:
| Let me introduce to you.. python ;)
| axxto wrote:
| You just invented programming languages, halfway
| mkw5053 wrote:
| This DevOps friction is exactly why I'm building an open-source
| "Firebase for LLMs." The moment you want to add AI to an app,
| you're forced to build a backend just to securely proxy API calls
| --you can't expose LLM API keys client-side. So developers who
| could previously build entire apps backend-free suddenly need
| servers, key management, rate limiting, logging, deployment...
| all just to make a single OpenAI call. Anyone else hit this wall?
| The gap between "AI-first" and "backend-free" development feels
| very solvable.
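|
| To make the friction concrete, here is roughly the minimal
| backend you end up writing anyway - just a sketch, assuming
| FastAPI + httpx and OpenAI's chat completions endpoint; the
| route and default model name are placeholders:
|
|     # llm_proxy.py - its only job is to keep the API key off
|     # the client.
|     import os
|
|     import httpx
|     from fastapi import FastAPI
|     from pydantic import BaseModel
|
|     app = FastAPI()
|     OPENAI_KEY = os.environ["OPENAI_API_KEY"]  # stays server-side
|
|     class ChatRequest(BaseModel):
|         messages: list[dict]  # [{"role": "user", "content": ...}]
|         model: str = "gpt-4o-mini"  # placeholder default
|
|     @app.post("/chat")
|     async def chat(req: ChatRequest):
|         # Forward the call server-side; auth, rate limiting and
|         # logging would also have to live here.
|         async with httpx.AsyncClient(timeout=60) as client:
|             r = await client.post(
|                 "https://api.openai.com/v1/chat/completions",
|                 headers={"Authorization": f"Bearer {OPENAI_KEY}"},
|                 json={"model": req.model, "messages": req.messages},
|             )
|         return r.json()
|
| Everything in the list above (key management, rate limiting,
| logging, deployment) hangs off this one small file, which is why
| it feels like it should be a managed service.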
| smpretzer wrote:
| I think this lines up with Apple's thesis of on-device models
| being a useful feature for developers who don't want to deal
| with calling out to the OpenAI API:
|
| https://developer.apple.com/documentation/foundationmodels
| sockboy wrote:
| Yeah, hit this exact wall building a small AI tool. Ended up
| spinning up a whole backend just to keep the keys safe. Feels
| like there should be a simpler way, but haven't seen anything
| that's truly plug-and-play yet. Curious to see what you're
| working on.
| dieortin wrote:
| It's very obvious this account was just created to promote
| your product...
| jeremyjh wrote:
| Do you think Firebase and Supabase are working on this? Good
| luck but to me it sounds like a platform feature, not a
| standalone product.
| magicloop wrote:
| I think this is a brilliant talk and truly captures the
| "zeitgeist" of our times. He sees the emergent patterns arising
| as software creation is changing.
|
| I am writing a hobby app at the moment and I am thinking about
| its architecture in a new way now. I am making all my model
| structures comprehensible so that LLMs can see the inside
| semantics of my app. I merely provide a human-friendly GUI over
| the top to avoid the linear wall-of-text problem you get when you
| want to do something complex via a chat interface.
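|
| Concretely, it looks something like this - a minimal sketch of
| the idea, assuming Pydantic v2, with made-up names just for
| illustration: the same typed structures drive the GUI and get
| exported as a schema the LLM can read and act on.
|
|     from pydantic import BaseModel, Field
|
|     class Note(BaseModel):
|         """One record in the app; the field descriptions double
|         as semantics the LLM can read."""
|         title: str = Field(description="Human-readable title")
|         body: str = Field(default="", description="Free text")
|         done: bool = Field(default=False, description="Finished?")
|
|     def llm_view() -> dict:
|         # What the model actually "sees": a machine-checkable
|         # schema of the app's internals, independent of the GUI
|         # drawn on top of the same structures.
|         return Note.model_json_schema()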
|
| We need to meet LLMs in the middle ground to leverage the best of
| our contributions - traditional code, partially autonomous AI,
| and crafted UI/UX.
|
| Part of, but not all of, programming is "prompting well". It goes
| along with understanding the imperative aspects, developing a
| nose for code smells, and the judgement for good UI/UX.
|
| I find our current times both scary and exciting.
| sockboy wrote:
| Definitely hit this wall too. A backend just for an API proxy
| feels like a detour when all you want is to ship a quick
| prototype. Would love to see more tools that make this seamless,
| especially for solo builders.
___________________________________________________________________
(page generated 2025-06-19 23:00 UTC)