[HN Gopher] My 2.5 year old laptop can write Space Invaders in J...
       ___________________________________________________________________
        
       My 2.5 year old laptop can write Space Invaders in JavaScript now
       (GLM-4.5 Air)
        
       Author : simonw
       Score  : 417 points
       Date   : 2025-07-29 13:45 UTC (9 hours ago)
        
 (HTM) web link (simonwillison.net)
 (TXT) w3m dump (simonwillison.net)
        
       | croes wrote:
        | I bet the training data included enough Space Invaders clones in
        | JS
        
         | jplrssn wrote:
         | I also wouldn't be surprised if labs were starting to mix in a
         | few pelican SVGs into their training data.
        
           | quantumHazer wrote:
            | SVG benchmarking has been a thing since GPT-4, so probably all
            | major labs are overfitting on some dataset of SVG images for
            | sure
        
           | diggan wrote:
           | Even "accidentally" it makes sense that "SVGs of pelicans
           | riding bikes" are now included into datasets used for
           | training as it has spread as a wildfire on the internet,
           | making it less useful as a simple benchmark.
           | 
            | This is why I keep all my benchmarks private and don't share
            | anything about them publicly; as soon as you write about them
            | anywhere public, they'll stop being useful within a few months.
        
             | toyg wrote:
             | _> This is why I keep all my benchmarks private_
             | 
             | This is also why, if I were an artist or anyone
             | commercially relying on creative output of any kind, I
             | wouldn't be posting _anything_ on the internet anymore,
             | ever. The minute you make anything public, the engines will
             | clone it to death and turn it into a commodity.
        
               | __mharrison__ wrote:
               | Somewhat defeats the purpose of being an artist, doesn't
               | it?
        
               | toyg wrote:
               | Defeating the purpose of creating almost anything,
               | really.
               | 
               | AI is definitely breaking the whole "labor for money"
               | architecture of our world.
        
               | zhengyi13 wrote:
               | Eeeehhhh.
               | 
               | Maybe the thing to do is provide public, physical
               | exhibits of your art in search of patronage.
        
               | debugnik wrote:
               | That makes it so much harder to show art to people and
               | market yourself though.
               | 
               | I considered experimenting with web DRM for art
                | sites/portfolios, on the assumption that scrapers won't
               | bother with the analog loophole (and dedicated art-style
               | cloners would hopefully be disappointed by the quality),
               | but gave up because of limited compatible devices for the
               | strongest DRM levels, and HDCP being broken on those
               | levels anyway. If the DRM technique caught on it would
                | take attackers, at most, a few bucks and a few hours, once,
                | to bypass it, and I don't think users would truly understand
               | that upfront.
        
           | simonw wrote:
           | I'll believe they are doing that when one of the models draws
           | me an SVG that actually looks like a pelican.
        
             | __mharrison__ wrote:
              | Someone needs to craft a beautiful bike ridden by a
              | pelican, throw in some SEO, and see how long it takes a
              | model to replicate it.
             | 
             | Simon probably wouldn't be happy about killing his multi-
             | year evaluation metric though...
        
               | simonw wrote:
               | I would be delighted.
               | 
               | My pelican on a bicycle benchmark is a long con. The goal
               | is to finally get a good SVG of a pelican riding a
               | bicycle, and if I can trick AI labs into investing
               | significant effort in cheating on my benchmark then fine,
               | that gets me my pelican!
        
         | gchamonlive wrote:
          | Which would only make this disappointing if cloning Space
          | Invaders were all it was good at. Even if all it can do is
          | reproduce the clones it has ever seen, that would still be an
          | impressive feat.
          | 
          | I just think we should stop to appreciate exactly how awesome
          | language models are. It's compressing and correctly reproducing
          | a lot of data with meaningful context between each token and
          | the rest of the context window. It's still amazing, especially
          | with smaller models like this, because even if it's reproducing
          | a clone, you can still ask questions about it and it should
          | perform reasonably well explaining what it does and how you
          | can take it over to further develop that clone.
        
           | croes wrote:
           | But that would still be copy and paste with extra steps.
           | 
            | Like all these vibe coded to-do apps, one of the most common
            | starter problems in programming courses.
           | 
           | It's great that an AI can do that but it could stall progress
           | if we get limited to existing tools and programs.
        
         | shermantanktop wrote:
         | How about an SVG of 9.11 pelicans riding bicycles and counting
         | the three Rs in "strawberry"?
        
       | chickenzzzzu wrote:
       | "2.5 year old laptop" is potentially the most useless way of
       | describing a 64GB M2, as it could be confused with virtually any
       | other configuration of laptop.
        
         | OJFord wrote:
         | I think the point is just that it doesn't require absolute
         | cutting edge nor server hardware.
        
           | jphoward wrote:
           | No but 64 GB of unified memory provides almost as much GPU
           | RAM capacity as two RTX 5090s (only less due to the unified
           | nature) - top of the range GPUs - so it's a truly exceptional
           | laptop in this regard.
        
             | turnsout wrote:
             | Except that it is not exceptional at all; it's an older-
             | generation MacBook Pro with 64GB of RAM. There's nothing
             | particularly unusual about it.
        
               | jphoward wrote:
               | 64 GB of RAM which is addressable by a GPU is exceptional
               | for a laptop - this is not just system RAM.
        
               | chickenzzzzu wrote:
               | To emphasize this point further, at least with my
               | efforts, it is not even possible to buy a 64GB M4 Pro
               | right now. 32GB, 64GB, and 128GB are all sold out.
               | 
               | We can say that 64GB addressable by a GPU is not
               | exceptional when compared to 128GB and it still costs
               | less than a month's pay for a FAANG engineer, but the
               | fact that they aren't actually purchasable right now
               | shows that it's not as easy as driving to Best Buy and
               | grabbing one off the shelf.
        
               | turnsout wrote:
               | They're not sold out--Apple's configurator (and chip
               | naming) is just confusing. The MacBook Pro with M4 Pro is
               | only available in 24 or 48 GB configurations. To get 64
               | or 128 GB, you need to upgrade to the M4 Max.
               | 
                | If you're looking for the cheapest way into 64GB of unified
               | memory, the Mac mini is available with an M4 Pro and 64GB
               | at $1999.
               | 
               | So, truly, not "exceptional" unless you consider the
               | price to be exorbitant (it's not, as evidenced by the
               | long useful life of an M-series Mac).
        
               | chickenzzzzu wrote:
                | Thank you for providing that extra info! I agree that
                | $2000-4000 is not an absolutely earth-shattering price,
                | but I still wonder what benefit one receives when
                | they say "2.5 year old laptop" instead of "64GB M2
                | laptop"
        
               | turnsout wrote:
               | I understand, but _that is not exceptional for a Mac
               | laptop._ You could say all Apple Silicon Macs are
               | exceptional, and I guess I agree in the context of the
               | broader PC community. But I would not point at an
               | individual MacBook Pro with 64 GB of RAM and say  "whoa,
               | that's exceptional." It's literally just a standard
               | option when you buy the computer. It does bump the price
               | pretty high, but the point of the MBP is to cater to
               | higher-end workflows.
        
           | tantalor wrote:
           | It was also something he already had lying around. Did not
           | need to buy something new to get new functionality.
        
         | simonw wrote:
         | The thing I find most notable here is that this is the same
         | laptop I've used to run every open weights model since the
         | original LLaMA.
         | 
         | The models have got _so much better_ without me needing to
         | upgrade my hardware.
        
           | chickenzzzzu wrote:
           | That's great! Why can't we say that instead?
           | 
           | No need to overly quantize our headlines.
           | 
           | "64GB M2 makes Space Invaders-- can be bought for under
           | $xxxx"
        
       | AlexeyBrin wrote:
        | Most likely its training data included countless Space Invaders
        | clones in various programming languages.
        
         | quantumHazer wrote:
          | and probably some of the synthetic data are generated copies of
          | the games already in the dataset?
          | 
          | I have this feeling with LLM-generated React frontends: they
          | all look the same
        
           | bayindirh wrote:
           | Last time somebody asked for a "premium camera app for iOS",
           | and the model (re)generated Halide.
           | 
           | Models don't emit something they don't know. They remix and
           | rewrite what they know. There's no invention, just recall...
        
             | FeepingCreature wrote:
             | True where trivial; where nontrivial, false.
             | 
             | Trivially, humans don't emit something they don't know
             | either. You don't spontaneously figure out Javascript from
             | first principles, you put together your existing knowledge
             | into new shapes.
             | 
             | Nontrivially, LLMs can _absolutely_ produce code for
                | entirely new requirements. I've seen them do it many
             | times. Will it be put together from smaller fragments? Yes,
             | this is called "experience" or if the fragments are small
             | enough, "understanding".
        
               | bayindirh wrote:
                | Humans can observe ants and invent ant colony
                | optimization. AIs can't.
               | 
               | Humans can explore what they don't know. AIs can't.
        
               | falcor84 wrote:
               | What makes you categorically say that "AIs can't"?
               | 
               | Based on my experience with present day AIs, I personally
               | wouldn't be surprised at all that if you showed Gemini
               | 2.5 Pro a video of an insect colony and asked it "Take a
               | look at the way they organize and see if that gives you
               | inspiration for an optimization algorithm", it will spit
               | something interesting out.
        
               | sarchertech wrote:
               | It will 100% have something in its training set
               | discussing a human doing this and will almost definitely
               | spit out something similar.
        
               | FeepingCreature wrote:
               | What makes you categorically say that "humans can"?
               | 
               | I couldn't do that with an ant colony. I would have to
               | train on ant research first.
               | 
               | (Oh, and AIs can absolutely explore what they don't know.
               | Watch a Claude Code instance look at a new repository.
               | Exploration is a convergent skill in long-horizon RL.)
        
               | CamperBob2 wrote:
               | That's what benchmarks like ARC-AGI are designed to test.
               | The models are getting better at it, and you aren't.
               | 
               | Nothing ultimately matters in this business except the
               | first couple of time derivatives.
        
               | ben_w wrote:
                | > Humans can observe ants and invent ant colony
                | optimization. AIs can't.
               | 
                | Surely this is _exactly_ what current AIs do? Observe
                | stuff and apply that observation? Isn't this the exact
                | criticism, that they aren't inventing ant colonies from
                | first principles without ever seeing one?
               | 
               | > Humans can explore what they don't know. AIs can't.
               | 
               | We only learned to decode Egyptian hieroglyphs because of
               | the Rosetta Stone. There's no translation for North
               | Sentinelese, the Voynich manuscript, or Linear A.
               | 
               | We're not magic.
        
               | phkahler wrote:
               | >> Nontrivially, LLMs can absolutely produce code for
               | entirely new requirements. I've seen them do it many
               | times.
               | 
               | I think most people writing software today are
               | reinventing a wheel, even in corporate environments for
               | internal tools. Everyone wants their own tweak or thinks
               | their idea is unique and nobody wants to share code
               | publicly, so everyone pays programmers to develop buggy
               | bespoke custom versions of the same stuff that's been
               | done 100 times before.
               | 
               | I guess what I'm saying is that your requirements are
                | probably not new, and to the extent they are, yes, an LLM
               | can fill in the blanks due to its fluency in languages.
        
             | satvikpendem wrote:
             | This doesn't make sense thermodynamically because models
             | are far smaller than the training data they purport to hold
             | and recall, so there must be some level of "understanding"
             | going on. Whether that's the same as human understanding is
             | a different matter.
        
               | Eggpants wrote:
               | It's a lossy text compression technique. It's clever
               | applied statistics. Basically an advanced association
               | rules algorithm which has been around for decades but
               | modified to consider order and relative positions.
               | 
               | There is no understanding, regardless of the wants of all
               | the capital investors in this domain.
        
               | simonw wrote:
               | I don't care if it can "understand" anything, as long as
               | I can use it to achieve useful things.
        
               | Eggpants wrote:
               | "useful things" like poorly drawing birds on bikes? ;)
               | 
               | (I have much respect for what you have done and are
               | currently doing, but you did walk right into that one)
        
               | msephton wrote:
               | The pelican on a bicycle is a very useful test.
        
               | CamperBob2 wrote:
               | _It's a lossy text compression technique._
               | 
               | That is a much, much bigger deal than you make it sound
               | like.
               | 
               | Compression may, in fact, be all we need. For that
               | matter, it may be all there _is_.
        
             | Uehreka wrote:
             | > Models don't emit something they don't know. They remix
             | and rewrite what they know. There's no invention, just
             | recall...
             | 
             | People really need to stop saying this. I get that it was
             | the Smart Guy Thing To Say in 2023, but by this point it's
              | pretty clear that it's not true in any way that
             | matters for most practical purposes.
             | 
             | Coding LLMs have clearly been trained on conversations
             | where a piece of code is shown, a transformation is
             | requested (rewrite this from Python to Go), and then the
             | transformed code is shown. It's not that they're just
             | learning codebases, they're learning what working with code
             | looks like.
             | 
             | Thus you can ask an LLM to refactor a program in a language
             | it has never seen, and it will "know" what refactoring
             | means, because it has seen it done many times, and it will
             | stand a good chance of doing the right thing.
             | 
             | That's why they're useful. They're doing something way more
             | sophisticated than just "recombining codebases from their
             | training data", and anyone chirping 2023 sound bites is
             | going to miss that.
        
             | mr_toad wrote:
             | > They remix and rewrite what they know. There's no
             | invention, just recall...
             | 
             | If they only recalled they wouldn't "hallucinate". What's a
             | lie if not an invention? So clearly they can come up with
             | data that they weren't trained on, for better or worse.
        
               | 0x457 wrote:
                | Because internally, there isn't a difference between a
                | correctly "recalled" token and an incorrectly recalled
                | (hallucinated) one.
        
           | tshaddox wrote:
           | To be fair, the human-generated user interfaces all look the
           | same too.
        
           | cchance wrote:
            | Have you used the internet? That's how the internet looks,
            | they're all fuckin' React with the same layouts and styles, 90%
            | shadcn lol
        
         | NitpickLawyer wrote:
         | This comment is ~3 years late. Every model since gpt3 has had
         | the entirety of available code in their training data. That's
         | not a gotcha anymore.
         | 
         | We went from chatgpt's "oh, look, it looks like python code but
         | everything is wrong" to "here's a full stack boilerplate app
         | that does what you asked and works in 0-shot" inside 2 years.
         | That's the kicker. And the sauce isn't just in the training
         | set, models now do post-training and RL and a bunch of other
         | stuff to get to where we are. Not to mention the insane
         | abilities with extended context (first models were 2/4k max),
         | agentic stuff, and so on.
         | 
         | These kinds of comments are really missing the point.
        
           | haar wrote:
           | I've had little success with Agentic coding, and what success
           | I have had has been paired with hours of frustration, where
           | I'd have been better off doing it myself for anything but the
           | most basic tasks.
           | 
           | Even then, when you start to build up complexity within a
           | codebase - the results have often been worse than "I'll start
           | generating it all from scratch again, and include this as an
           | addition to the initial longtail specification prompt as
           | well", and even then... it's been a crapshoot.
           | 
           | I _want_ to like it. The times where it initially "just
           | worked" felt magical and inspired me with the possibilities.
           | That's what prompted me to get more engaged and use it more.
           | The reality of doing so is just frustrating and wishing
           | things _actually worked_ anywhere close to expectations.
        
             | aschobel wrote:
             | Bingo, it's magical but the learning curve is very very
             | steep. The METR study on open-source productivity alluded
             | to this a bit.
             | 
             | I am definitely at a point where I am more productive with
             | it, but it took a bunch of effort.
        
               | devmor wrote:
               | The subjects in the study you are referencing also
               | believed that they were more productive with it. What
               | metrics do you have to convince yourself you aren't under
                | the same illusory bias they were?
        
               | simonw wrote:
               | Yesterday I used ffmpeg to extract the frame at the 13
               | second mark of a video out as a JPEG.
               | 
               | If I didn't have an LLM to figure that out for me I
               | wouldn't have done it at all.
        
               | devmor wrote:
               | You wouldn't have just typed "extract frame at timestamp
               | as jpeg ffmpeg" into Google and used the StackExchange
               | result that comes up first that gives you a command to do
               | exactly that?
        
               | simonw wrote:
               | Before LLMs made ffmpeg no-longer-frustrating-to-use I
               | genuinely didn't know that ffmpeg COULD do things like
               | that.
        
               | devmor wrote:
               | I'm not really sure what you're saying an LLM did in this
               | case. Inspired a lost sense of curiosity?
        
               | Philpax wrote:
               | Translated a vague natural language query ("cli, extract
               | frame 13s into video") into something immediately
               | actionable with specific examples and explanations,
               | surfacing information that I would otherwise not know how
               | to search for.
               | 
               | That's what I've done with my ffmpeg LLM queries, anyway
               | - can't speak for simonw!
        
               | wizzwizz4 wrote:
               | DuckDuckGo search results for "cli, extract frame 13s
               | into video" (no quotes):
               | 
               | * https://stackoverflow.com/questions/10957412/fastest-
               | way-to-...
               | 
               | * https://superuser.com/questions/984850/linux-how-to-
               | extract-...
               | 
               | * https://www.aleksandrhovhannisyan.com/notes/video-cli-
               | cheat-...
               | 
               | * https://www.baeldung.com/linux/ffmpeg-extract-video-
               | frames
               | 
               | * https://ottverse.com/extract-frames-using-ffmpeg-a-
               | comprehen...
               | 
                | Search engines have been able to translate "vague natural
                | language queries" into search results for a decade now.
                | This pre-existing infrastructure accounts for the _vast_
                | majority of ChatGPT's apparent ability to find answers.
        
               | 0x457 wrote:
               | LLM somewhat understood ffmpeg documentation? Not sure
               | what is not clear here.
        
               | simonw wrote:
               | My general point is that people say things like "yeah,
               | but this one study showed that programmers over-estimate
               | the productivity gain they get from LLMs so how can you
               | really be sure?"
               | 
               | Meanwhile I've spent the past two years constantly
               | building and implementing things I _never would have
               | done_ because of the reduction in friction LLM assistance
               | gives me.
               | 
               | I wrote about this first two years ago - AI-enhanced
               | development makes me more ambitious with my projects -
               | https://simonwillison.net/2023/Mar/27/ai-enhanced-
               | developmen... - when I realized I was hacking on things
               | with tech like AppleScript and jq that I'd previously
               | avoided.
               | 
               | It's hard to measure the productivity boost you get from
               | "wouldn't have built that thing" to "actually built that
               | thing".
        
               | dingnuts wrote:
               | It is nice to use LLMs to generate ffmpeg commands,
               | because those can be pretty tricky, but really, you
               | wouldn't have just used the man page before?
               | 
               | That explains a lot about Django that the author is
               | allergic to man pages lol
        
               | simonw wrote:
               | I just took a look, and the man page DOES explain how to
               | do that!
               | 
               | ... on line 3,218: https://gist.github.com/simonw/6fc05ea
               | 7392c5fb8a5621d65e0ed0...
               | 
               | (I am very confident I am not the only person who has
               | been deterred by ffmpeg's legendarily complex command-
               | line interface. I feel no shame about this at all.)
        
               | quesera wrote:
               | Ffmpeg is genuinely complicated! And the CLI is
               | convoluted (in justifiable, and unfortunate ways).
               | 
               | But if you approach ffmpeg from the perspective of "I
               | know this is possible", you are always correct, and can
               | almost always reach the "how" in a handful of minutes.
               | 
               | Whether that's worth it or not, will vary. :)
        
               | ben_w wrote:
               | I remember when I was a kid, people asking a teacher how
               | to spell a word, and the answer was generally "look it up
               | in a dictionary"... which you can only do if you already
               | have shortlist of possible spellings.
               | 
               | *nix man pages are the same: if you already know which
               | tool can solve your problem, they're easy to use. But you
               | have to already have a shortlist of tools that can solve
               | your problem, before you even know which man pages to
               | read.
        
               | throwworhtthrow wrote:
                | LLMs still give subpar results with ffmpeg. For example
               | when I asked Sonnet to trim a long video with ffmpeg, it
               | put the input file parameter before the start time
               | parameter, which triggers an unnecessary decode of the
               | video file. [1]
               | 
               | Sure, use the LLM to get over the initial hump. But
                | ffmpeg's no exception to the rule that LLMs produce
               | subpar code. It's worth spending a couple minutes reading
               | the docs to understand what it did so you can do it
               | better, and unassisted, next time.
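                | 
                | For reference, the fast-seek form puts -ss before -i, so a
                | one-frame grab looks something like this (illustrative
                | filenames, not a command quoted from anyone above):
                | 
                |     ffmpeg -ss 13 -i input.mp4 -frames:v 1 frame.jpg
                | 
                | With -ss placed after -i instead, ffmpeg decodes and
                | discards everything up to the 13 second mark before it
                | writes the frame out.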
               | 
               | [1] https://ffmpeg.org/ffmpeg.html#:~:text=ss%20position
        
               | CamperBob2 wrote:
               | That says more about suboptimal design on ffmpeg's part
               | than it does about the LLM. Most humans can't deal with
               | ffmpeg command lines, so it's not surprising that the LLM
               | misses a few tricks.
        
               | nottorp wrote:
                | Had an LLM generate 3 lines of working C++ code that was
                | "only" one order of magnitude slower than the version I
                | edited it into within 10 minutes.
               | 
               | If you're happy with results like that, sure, LLMs miss
               | "a few tricks"...
        
               | ben_w wrote:
               | You don't have to leave LLM code alone, it's fine to
               | change it -- unless, I guess, you're doing some kind of
               | LLM vibe-code-golfing?
               | 
               | But this does remind me of a previous co-worker. Wrote
               | something to convert from a custom data store to a
               | database, his version took 20 minutes on some inputs.
               | Swore it couldn't possibly be improved. Obviously
               | ridiculous because it didn't take 20 minutes to load from
               | the old data store, nor to load from the new database.
               | Over the next few hours of looking at very mediocre code,
               | I realised it was doing an unnecessary O(n^2) check,
               | confirmed with the CTO it wasn't business-critical, got
               | rid of it, and the same conversion on the same data ran
               | in something like 200ms.
               | 
               | Over a decade before LLMs.
        
               | nottorp wrote:
                | We all do that, sometimes when it's time critical,
                | sometimes when it isn't.
               | 
               | But I keep being told "AI" is the second coming of Ahura
               | Mazda so it shouldn't do stuff like that right?
        
               | CamperBob2 wrote:
               | "I'm taking this talking dog right back to the pound. It
               | told me to short NVDA, and you should see the buffer
               | overflow bugs in the C++ code it wrote. Totally
               | overhyped. I don't get it."
        
               | nottorp wrote:
               | "We hear you have been calling our deity a talking dog.
               | Please enter the red door for reeducation."
        
               | ben_w wrote:
               | > Ahura Mazda
               | 
               | Niche reference, I like it.
               | 
               | But... I only hear of scammers who say, and psychosis
               | sufferers who think, LLMs are *already* that competent.
               | 
               | Future AI? Sure, lots of sane-seeming people also think
               | it could go far beyond us. Special purpose ones have in
               | very narrow domains. But current LLMs are only good
               | enough to be useful and potentially economically
               | disruptive, they're not even close to wildly superhuman
               | like Stockfish is.
        
               | CamperBob2 wrote:
               | Sure. If you ask ChatGPT to play chess, it will put up an
               | amateur-level effort at best. Stockfish will indeed wipe
               | the floor with it. But what happens when you ask
               | Stockfish to write a Space Invaders game?
               | 
               | ChatGPT will get better at chess over time. Stockfish
                | will not get better at anything _except_ chess. That's
               | kind of a big difference.
        
               | ben_w wrote:
               | > ChatGPT will get better at chess over time
               | 
               | Oddly, LLMs got _worse_ at specifically chess:
               | https://dynomight.net/chess/
               | 
               | But even to the general point, there's absolutely no
               | agreement how much better the current architectures can
               | ultimately get, nor how quickly they can get there.
               | 
               | Do they have potential for unbounded improvements, albeit
               | at exponential cost for each linear incremental
                | improvement? Or will they asymptotically approach
               | someone with 5 years experience, 10 years experience, a
               | lifetime of experience, or a higher level than any human?
               | 
               | If I had to bet, I'd say current models have an
                | asymptotic growth converging to a merely "ok"
               | performance; and separately claim that even if they're
               | actually unbounded with exponential cost for linear
               | returns, we can't afford the training cost needed to make
               | them act like someone with even just 6 years professional
               | experience in any given subject.
               | 
               | Which is still a lot. Especially as it would be acting
               | like it had about as much experience in _every other
               | subject at the same time_. Just... not a literal Ahura
               | Mazda.
        
               | CamperBob2 wrote:
                | _If I had to bet, I'd say current models have an
                | asymptotic growth converging to a merely "ok"
                | performance_
               | 
               | (Shrug) People with actual money to spend are betting
               | twelve figures that you're wrong.
               | 
               | Should be fun to watch it shake out from up here in the
               | cheap seats.
        
               | ben_w wrote:
               | Nah, trillion dollars is about right for "ok". Percentage
               | point of the global economy in cost, automate 2 percent
               | and get a huge margin. We literally set more than that on
               | actual fire each year.
               | 
               | For "pretty good", it would be worth 14 figures, over two
               | years. The global GDP is 14 figures. Even if this only
               | automated 10% of the economy, it pays for itself after a
               | decade.
               | 
               | For "Ahura Mazda", it would easily be worth 16 figures,
               | what with that being the principal God and god of the sky
               | in Zoroastrianism, and the only reason it stops at 16 is
               | the implausibility of people staying organised for longer
               | to get it done.
        
               | haar wrote:
               | Apologies if I was unclear.
               | 
               | The more I've used it, the more I've disliked how poor
                | the results it produces are, and the more I've realised I
               | would have been better served by doing it myself and
               | following a methodical path for things that I didn't have
               | experience with.
               | 
               | It's easier to step through a problem as I'm learning and
               | making small changes than an LLM going "It's done, and
               | production ready!" where it just straight up doesn't work
               | for 101 different tiny reasons.
        
           | MyOutfitIsVague wrote:
           | I don't think they are missing the point, because they're
           | pointing out that the tools are still the most useful for
           | patterns that are extremely widely known and repeated. I use
           | Gemini 2.5 Pro every day for coding, and even that one still
           | falls over on tasks that aren't well known to it (which is
           | why I break the problem down into small parts that I know
           | it'll be able to handle properly).
           | 
           | It's kind of funny, because sometimes these tools are magical
           | and incredible, and sometimes they are extremely stupid in
           | obvious ways.
           | 
           | Yes, these are impressive, and especially so for local models
           | that you can run yourself, but there is a gap between
           | "absolutely magical" and "pretty cool, but needs heavy
           | guiding" depending on how heavily the ground you're treading
           | has been walked upon.
           | 
           | For a heavily explored space, it's like being impressed that
            | your 2.5 year old M2 with 64 GB RAM can extract some source
           | code from a zip file. It's worth being impressed and excited
           | about the space and the pace of improvement, but it's also
           | worth stepping back and thinking rationally about the
           | specific benchmark at hand.
        
             | NitpickLawyer wrote:
             | > because they're pointing out that the tools are still the
             | most useful for patterns that are extremely widely known
             | and repeated
             | 
             | I agree with you, but your take is _much_ more nuanced than
              | what the GP comment said! These models don't simply
             | regurgitate the training set. That was my point with gpt3.
             | The models have advanced from that, and can now
             | "generalise" over the context in ways they could not do ~3
             | years ago. We are now at a point where you can write a
             | detailed spec (10-20k tokens) for an unseen scripting
             | language, and have SotA models a) write a parser and b)
             | start writing scripts for you in that language, even though
             | it never saw that particular scripting language anywhere in
              | its training set. Try it. You'll be surprised.
        
           | jayd16 wrote:
           | I think you're missing the point.
           | 
           | Showing off moderately complicated results that are actually
           | not indicative of performance because they are sniped by the
           | training data turns this from a cool demo to a parlor trick.
           | 
           | Stating that, aha, jokes on you, that's the status quo, is an
           | even bigger indictment.
        
           | jan_Sate wrote:
            | Not exactly. The real utility value of an LLM for programming
            | is to come up with something new. For Space Invaders, instead
            | of using an LLM for that, I might as well just manually search
            | for the code online and use that.
            | 
            | To show that an LLM actually can provide value for one-shot
            | programming, you need to find a problem for which there's no
            | fully working sample code available online. I'm not trying to
            | say that an LLM couldn't do that. But just because an LLM can
            | come up with a perfectly-working Space Invaders doesn't mean
            | that it could.
        
             | devmor wrote:
             | > The real utility value of LLM for programming is to come
             | up with something new.
             | 
             | That's the goal for these projects anyways. I don't know
              | that it's true or feasible. I find the RAG models much more
             | interesting myself, I see the technology as having far more
             | value in search than generation.
             | 
             | Rather than write some markov-chain reminiscent
             | frankenstein function when I ask it how to solve a problem,
             | I would like to see it direct me to the original sources it
             | would use to build those tokens, so that I can see their
             | implementations in context and use my judgement.
        
               | simonw wrote:
               | "I would like to see it direct me to the original sources
               | it would use to build those tokens"
               | 
               | Sadly that's not feasible with transformer-based LLMs:
               | those original sources are _long gone_ by the time you
               | actually get to use the model, scrambled a billion times
               | into a trained set of weights.
               | 
               | One thing that helped me understand this is understanding
               | that every single token output by an LLM is the result of
               | a calculation that considers _all X billion parameters_
               | that are baked into that model (or a subset of that in
                | the case of MoE models, but it's still billions of
               | floating point calculations for every token.)
               | 
               | You can get an imitation of that if you tell the model
               | "use your search tool and find example code for this
               | problem and build new code based on that", but that's a
               | pretty unconventional way to use a model. A key component
               | of the value of these things is that they can spit out
               | completely new code based on the statistical patterns
               | they learned through training.
        
               | devmor wrote:
               | I am aware, and that's exactly why I don't think they're
               | anywhere near as useful for this type of work as the
               | people pushing them want them to be.
               | 
               | I tried to push for this type of model when an org I
               | worked with over a decade ago was first exploring using
               | the first generation of Tensorflow to drive customer
               | service chatbots and was sadly ignored.
        
               | simonw wrote:
               | I don't understand. For code, why would I want to remix
               | existing code snippets?
               | 
               | I totally get the value of RAG style patterns for
               | information retrieval against factual information - for
               | those I don't want the LLM to answer my question
               | directly, I want it to run a search and show me a
               | citation and directly quote a credible source as part of
               | answering.
               | 
               | For code I just want code that works - I can test it
               | myself to make sure it does what it's supposed to.
        
               | devmor wrote:
               | > I don't understand. For code, why would I want to remix
               | existing code snippets?
               | 
               | That is what you're doing already. You're just relying on
               | a vector compression and search engine to hide it from
               | you and hoping the output is what you expect, instead of
               | having it direct you to where it remixed those snippets
               | from so you can see how they work to start with and make
                | sure it's properly implemented from the get-go.
               | 
               | We all want code that works, but understanding that code
               | is a critical part of that for anything but a throw-away
               | one time use script.
               | 
               | I don't really get this desire to replace critical
               | thought with hoping and testing. It sounds like the pipe
               | dream of a middle manager, not a tool for a programmer.
        
               | stavros wrote:
               | I don't understand your point. You seem to be saying that
               | we should be getting code from the source, then adapting
               | it to our project ourselves, instead of getting adapted
               | code to begin with.
               | 
               | I'm going to review the code anyway, why would I not want
               | to save myself some of the work? I can "see how they
               | work" after the LLM gives them to me just fine.
        
               | devmor wrote:
               | The work that you are "saving" is the work of using your
               | brain to determine the solution to the problem. Whatever
               | the LLM gives you doesn't have a context it is used in
               | other than your prompt - you don't even know what it does
               | until after you evaluate it.
               | 
               | If you instead have a set of sources related to your
               | problem, they immediately come with context, usage and in
               | many cases, developer notes and even change history to
               | show you mistakes and adaptations.
               | 
               | You're ultimately creating more work for yourself* by
               | trying to avoid work, and possibly ending up with an
               | inferior solution in the process. Where is your sense of
                | efficiency? Where is your pride as an intellectual?
               | 
               | * Yes, you are most likely creating more work for
               | yourself even if you think you are capable of telling
               | otherwise. [1]
               | 
               | 1. https://metr.org/blog/2025-07-10-early-2025-ai-
               | experienced-o...
        
               | stavros wrote:
               | Thanks for the concern, but I'm perfectly able to judge
               | for myself whether I'm creating more work or delivering
               | an inferior product.
        
               | simonw wrote:
               | It sounds like you care deeply about learning as much as
               | you can. I care about that too.
               | 
               | I would encourage you to consider that even LLM-generated
               | code can teach you a ton of useful new things.
               | 
               | Go read the source code for my dumb, zero-effort space
               | invaders clone:
               | https://github.com/simonw/tools/blob/main/space-invaders-
               | GLM...
               | 
               | There's a bunch of useful lessons to be picked up even
               | from that!
               | 
               | - Examples of CSS gradients, box shadows and flexbox
               | layout
               | 
               | - CSS keyframe animation
               | 
               | - How to implement keyboard events in JavaScript
               | 
               | - A simple but effective pattern for game loops against a
               | Canvas element, using requestAnimationFrame
               | 
               | - How to implement basic collision detection
               | 
               | If you've written games like this before these may not be
               | new to you, but I found them pretty interesting.
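                | 
                | To make the game loop point concrete, here's a minimal
                | sketch of the requestAnimationFrame-against-a-Canvas
                | pattern with axis-aligned collision detection. It's
                | illustrative code (assuming the page has a <canvas>
                | element), not the actual file linked above:
                | 
                |     const canvas = document.querySelector('canvas');
                |     const ctx = canvas.getContext('2d');
                |     const player = { x: 180, y: 280, w: 40, h: 10 };
                |     const bullets = [{ x: 200, y: 200, w: 2, h: 6 }];
                |     const invaders = [
                |       { x: 30, y: 20, w: 30, h: 20, alive: true },
                |     ];
                | 
                |     // Axis-aligned bounding-box overlap test
                |     function hit(a, b) {
                |       return a.x < b.x + b.w && a.x + a.w > b.x &&
                |              a.y < b.y + b.h && a.y + a.h > b.y;
                |     }
                | 
                |     function frame() {
                |       // Update: move bullets upwards, check collisions
                |       for (const bullet of bullets) {
                |         bullet.y -= 4;
                |         for (const inv of invaders) {
                |           if (inv.alive && hit(bullet, inv)) {
                |             inv.alive = false;
                |           }
                |         }
                |       }
                |       // Draw: clear the canvas and repaint everything
                |       ctx.clearRect(0, 0, canvas.width, canvas.height);
                |       ctx.fillRect(player.x, player.y, player.w, player.h);
                |       for (const inv of invaders) {
                |         if (inv.alive) ctx.fillRect(inv.x, inv.y, inv.w, inv.h);
                |       }
                |       requestAnimationFrame(frame); // schedule next frame
                |     }
                |     requestAnimationFrame(frame);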
        
             | tracker1 wrote:
             | I have a friend who has been doing just that... usually
             | with his company he manages a handful of projects where a
             | bulk of the development is outsourced overseas. This past
             | year, he's outpaced the 6 devs he's had working on misc
             | projects just with his own efforts and AI. Most of this
             | being a relatively unique combination of UX with features
             | that are less common.
             | 
             | He's using AI with note taking apps for meetings to enhance
              | notes and flesh out technology ideas at a higher level,
             | then refining those ideas into working experiments.
             | 
             | It's actually impressive to see. My personal experience has
             | been far more disappointing to say the least. I can't speak
             | to the code quality, consistency or even structure in terms
             | of most people being able to maintain such applications
             | though. I've asked to shadow him through a few of his vibe
             | coding sessions to see his workflow. It feels rather alien
             | to me, again my experience is much more disappointing in
             | having to correct AI errors.
        
               | nottorp wrote:
               | Is this the same person who posted about launching 17
               | "products" in one year a few days ago on HN? :)
        
               | tracker1 wrote:
               | No, he's been working on building a larger eLearning
               | solution with some interesting workflow analytics around
               | courseware evaluation and grading. He's been involved in
               | some of the newer LRS specifications and some
               | implementation details to bridge training as well as real
               | world exposure scenarios. Working a lot with first
               | responders, incident response training etc.
               | 
               | I've worked with him off and on for years from simulating
               | aircraft diagnostics hardware to incident command
               | simulation and setting up core infrastructure for F100
               | learning management backends.
        
           | Aurornis wrote:
           | > These kinds of comments are really missing the point.
           | 
           | I disagree. In my experience, asking coding tools to produce
           | something similar to all of the tutorials and example code
           | out there works amazingly well.
           | 
           | Asking them to produce novel output that doesn't match the
           | training set produces very different results.
           | 
           | When I tried multiple coding agents for a somewhat unique
           | task recently they all struggled, continuously trying to pull
           | the solution back to the standard examples. It felt like an
           | endless loop of the models grinding through a solution and
           | then spitting out something that matched common examples,
           | after which I had to remind them of the unique properties of
           | the task and they started all over again, eventually arriving
           | back in the same spot.
           | 
           | It shows the reality of working with LLMs and it's an
           | important consideration.
        
           | AlexeyBrin wrote:
            | You are reading too much into my comment. My point was that
            | the test (a Space Invaders clone) used to assess the model has
            | been irrelevant for some time now. I could have gotten a similar
            | result with Mistral Small a few months ago.
        
           | stolencode wrote:
            | It's amazing that none of you even try to falsify your claims
           | anymore. You can literally just put some of the code in a
           | search engine and find the prior art example:
           | 
           | https://www.web-leb.com/en/code/2108
           | 
           | Your "AI tools" are just "copyright whitewashing machines."
           | 
           | These kinds of comments are really ignoring reality.
        
         | elif wrote:
         | Most likely this comment included countless similar comments in
         | its training data, likely all synthetic without any actual
         | tether to real analysis.
        
         | Conflonto wrote:
         | That sounds so dismissive.
         | 
          | I was not able to just download an 8-16GB file and have it
          | generate A LOT of different tools, games etc. for me
          | in multiple programming languages while in parallel ELI5-ing
          | research papers, generating SVGs and a lot, lot more.
         | 
         | But hey.
        
         | phkahler wrote:
          | I find the visual similarity to Breakout kind of interesting.
        
         | gblargg wrote:
         | The real test is if you can have it tweak things. Have the ship
         | shoot down. Have the space invaders come from the left and
         | right. Add two player simultaneous mode with two ships.
        
           | wizzwizz4 wrote:
           | It can _usually_ tweak things, if given specific instruction,
            | but it doesn't know when to refactor (and can't reliably
           | preserve functionality when it does), so the program gets
           | further and further away from something sensible until it
           | can't make edits any more.
        
             | simonw wrote:
             | For serious projects you can address that by writing (or
             | having it write) unit tests along the way, that way it can
             | run in a loop and avoid breaking existing functionality
             | when it adds new changes.
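                | 
                | A minimal sketch of the kind of test I mean, in Jest style,
                | assuming the game exposes an AABB collision helper named
                | hit() (a hypothetical name):
                | 
                |     test('bullet overlapping an invader hits', () => {
                |       const bullet = { x: 35, y: 25, w: 2, h: 6 };
                |       const invader = { x: 30, y: 20, w: 30, h: 20 };
                |       expect(hit(bullet, invader)).toBe(true);
                |     });
                | 
                |     test('distant bullet does not hit', () => {
                |       const bullet = { x: 300, y: 300, w: 2, h: 6 };
                |       const invader = { x: 30, y: 20, w: 30, h: 20 };
                |       expect(hit(bullet, invader)).toBe(false);
                |     });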
        
               | greesil wrote:
               | Okay ask it to write unit tests for space invaders next
               | time :)
        
       | NitpickLawyer wrote:
       | > Two years ago when I first tried LLaMA I never dreamed that the
       | same laptop I was using then would one day be able to run models
       | with capabilities as strong as what I'm seeing from GLM 4.5 Air--
       | and Mistral 3.2 Small, and Gemma 3, and Qwen 3, and a host of
       | other high quality models that have emerged over the past six
       | months.
       | 
       | Yes, the open-models have surpassed my expectations in both
       | quality and speed of release. For a bit of context, when chatgpt
       | launched in Dec22, the "best" open models were GPT-J(~6-7B) and
       | GPT-neoX (~22B?). I actually had an app running live, with users,
       | using gpt-j for ~1 month. It was a pain. The quality was abysmal,
       | there was no instruction following (you had to start your prompt
       | like a story, or come up with a bunch of examples and hope the
       | model will follow along) and so on.
       | 
        | And then something happened: LLaMA models got "leaked" (I still
        | think it was an on-purpose leak - don't sue us, we never meant to
       | release, etc), and the rest is history. With L1 we got lots of
       | optimisations like quantised models, fine-tuning and so on, L2
       | really saw fine-tuning go off (most of the fine-tunes were better
       | than what meta released), we got alpaca showing off LoRA, and
       | then a bunch of really strong models came out (mistrals,
       | mixtrals, L3, gemmas, qwens, deepseeks, glms, granites, etc.)
       | 
       | By some estimations the open models are ~6mo behind what SotA
       | labs have released. (note that doesn't mean the labs are
       | releasing their best models, it's likely they keep those in house
       | to use on next runs data curation, synthetic datasets, for
       | distilling, etc). Being 6mo behind is NUTS! I never in my wildest
        | dreams believed we'd be here. In fact I thought it would take
       | ~2years to reach gpt3.5 levels. It's really something insane that
       | we get to play with these models "locally", fine-tune them and so
       | on.
        
         | tonyhart7 wrote:
         | is GLM 4.5 better than Qwen3 coder??
        
           | diggan wrote:
           | For what? It's really hard to say what model is "generally"
            | better than another, as they're all better/worse at specific
           | things.
           | 
            | My own benchmark has a bunch of different tasks I use
           | various local models for, and I run it when I wanna see if a
           | new model is better than the existing ones I use. The output
           | is basically a markdown table with a description of which
           | model is best for what task.
           | 
           | They're being sold as general purpose things that are
            | better/worse at _everything_ but reality doesn't reflect
           | this, they all have very specific tasks they're worse/better
           | at, and the only way to find that out is by having a private
           | benchmark you run yourself.
        
             | kelvinjps10 wrote:
              | Coding? They are coding models? What specific tasks is one
              | performing better at than the other?
        
               | diggan wrote:
               | They may be, but there are lots of languages, lots of
               | approaches, lots of methodologies and just a ton of
               | different ways to "code", coding isn't one homogeneous
               | activity that one model beats all the other models at.
               | 
               | > what specific tasks is one performing better than the
               | other?
               | 
               | That's exactly why you create your own benchmark, so you
               | can figure that out by just having a list of models,
               | instead of testing each individually and basing it on
               | "feels better".
        
               | whimsicalism wrote:
               | glm 4.5 is not a coding model
        
               | simonw wrote:
               | It may not be code-only, but it was trained extensively
               | for coding:
               | 
               | > Our base model undergoes several training stages.
               | During pre-training, the model is first trained on 15T
               | tokens of a general pre-training corpus, followed by 7T
               | tokens of a code & reasoning corpus. After pre-training,
               | we introduce additional stages to further enhance the
               | model's performance on key downstream domains.
               | 
               | From my notes here:
               | https://simonwillison.net/2025/Jul/28/glm-45/
        
               | whimsicalism wrote:
               | yes, all reasoning models currently are, but it's not
               | like ds coder or qwen coder
        
               | simonw wrote:
               | I don't see how the training process for GLM-4.5 is
               | materially different from that used for
               | Qwen3-235B-A22B-Instruct-2507 - they both did a ton of
               | extra reinforcement learning training related to code.
               | 
               | Am I missing something?
        
               | whimsicalism wrote:
               | I think the primary thing you're missing is that
               | Qwen3-235B-A22B-Instruct-2507 !=
               | Qwen3-Coder-480B-A35B-Instruct. And the difference there
               | is that while both do tons of code RL, in one they do not
               | monitor performance on anything else for
               | forgetting/regression and focus fully on code post-
               | training pipelines and it is not meant for other tasks.
        
           | NitpickLawyer wrote:
           | I haven't tried them (released yesterday I think?). The
           | benchmarks look good (similar I'd say) but that's not saying
           | much these days. The best test you can do is have a couple of
           | cases that match your needs, and run them yourself w/ the
           | cradle that you are using (aider, cline, roo, any of the CLI
           | tools, etc). Openrouter usually has them up soon after
           | launch, and you can run a quick test really cheap (and only
           | deal with one provider for billing & stuff).
        
         | genewitch wrote:
         | I'll bite. How do I train/make and/or use LoRA, or, separately,
         | how do I fine-tune? I've been asking this for months, and no
         | one has a decent answer. Web search on my end is SEO/GEO spam,
         | with no real instructions.
         | 
         | I know how to make an SD LoRA, and use it. I've known how to do
         | that for 2 years. So what's the big secret about LLM LoRA?
        
           | minimaxir wrote:
           | If you're using Hugging Face transformers, the library you
           | want to use is peft:
           | https://huggingface.co/docs/peft/en/quicktour
           | 
           | There are Colab Notebook tutorials around training models
           | with it as well.
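           | 
           | A minimal sketch of attaching a LoRA adapter with peft (the
           | model name and hyperparameters here are placeholders, not
           | recommendations):
           | 
           |     from transformers import AutoModelForCausalLM
           |     from peft import LoraConfig, get_peft_model, TaskType
           | 
           |     # Placeholder base model; any causal LM works.
           |     base = AutoModelForCausalLM.from_pretrained(
           |         "Qwen/Qwen2.5-0.5B")
           | 
           |     # Low-rank adapters on the attention projections;
           |     # r/alpha/dropout are typical starting values.
           |     cfg = LoraConfig(
           |         task_type=TaskType.CAUSAL_LM,
           |         r=8, lora_alpha=16, lora_dropout=0.05,
           |         target_modules=["q_proj", "v_proj"])
           | 
           |     model = get_peft_model(base, cfg)
           |     model.print_trainable_parameters()
           |     # Train with the usual transformers Trainer, then
           |     # model.save_pretrained("my-adapter")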
        
           | notpublic wrote:
           | https://github.com/unslothai/unsloth
           | 
           | I'm not sure if it contains exactly what you're looking for,
           | but it includes several resources and notebooks related to
           | fine-tuning LLMs (including LoRA) that I found useful.
        
           | techwizrd wrote:
           | We have been fine-tuning models using Axolotl and Unsloth,
           | with a slight preference for Axolotl. Check out the docs [0]
           | and fine-tune or quantize your first model. There is a lot to
           | be learned in this space, but it's exciting.
           | 
           | 0: https://axolotl.ai/ and https://docs.axolotl.ai/
        
             | syntaxing wrote:
             | What hardware do you train on using axolotl? I use unsloth
             | with Google colab pro
        
             | arkmm wrote:
             | When do you think fine tuning is worth it over prompt
             | engineering a base model?
             | 
             | I imagine with the finetunes you have to worry about self-
             | hosting, model utilization, and then also retraining the
             | model as new base models come out. I'm curious under what
             | circumstances you've found that the benefits outweigh the
             | downsides.
        
               | whimsicalism wrote:
               | Finetuning rarely makes sense unless you are an
               | enterprise, and even then it generally doesn't in most
               | cases either.
        
               | tough wrote:
               | Mostly only for narrow applications where your fine-tune
               | can let you use a smaller model locally, specialised and
               | trained for your specific use-case.
        
               | reissbaker wrote:
               | For self-hosting, there are a few companies that offer
               | per-token pricing for LoRA finetunes (LoRAs are basically
               | efficient-to-train, efficient-to-host finetunes) of
               | certain base models:
               | 
               | - (shameless plug) My company, Synthetic, supports LoRAs
               | for Llama 3.1 8b and 70b: https://synthetic.new All you
               | need to do is give us the Hugging Face repo and we take
               | care of the rest. If you want other people to try your
               | model, we charge usage to them rather than to you. (We
               | can also host full finetunes of anything vLLM supports,
               | although we charge by GPU-minute for full finetunes
               | rather than the cheaper per-token pricing for supported
               | base model LoRAs.)
               | 
               | - Together.ai supports a slightly wider number of base
               | models than we do, with a bit more config required, and
               | any usage is charged to you.
               | 
               | - Fireworks does the same as Together, although they
               | quantize the models more heavily (FP4 for the higher-end
               | models). However, they support Llama 4, which is pretty
               | nice although fairly resource-intensive to train.
               | 
               | If you have reasonably good data for your task, and your
               | task is relatively "narrow" (i.e. find a specific kind of
               | bug, rather than general-purpose coding; extract a
               | specific kind of data from legal documents rather than
               | general-purpose reasoning about social and legal matters;
               | etc), finetunes of even a very small model like an 8b
               | will typically outperform -- by a pretty wide margin --
               | even very large SOTA models while being a lot cheaper to
               | run. For example, if you find yourself hand-coding
               | heuristics to fix some problem you're seeing with an
               | LLM's responses, it's probably more robust to just train
               | a small model finetune on the data and have the finetuned
               | model fix the issues rather than writing hardcoded
               | heuristics. On the other hand, no amount of finetuning
               | will make an 8b model a better general-purpose coding
               | agent than Claude 4 Sonnet.
        
               | delijati wrote:
               | Do you maybe know if there is a company in the EU that
               | hosts models (DeepSeek, Qwen3, Kimi)?
        
           | svachalek wrote:
           | For completeness, for Apple hardware MLX is the way to go.
        
             | w10-1 wrote:
             | MLX github: https://github.com/ml-explore/mlx
             | 
             | get started:
             | https://developer.apple.com/videos/play/wwdc2025/315/
             | 
             | details:
             | https://developer.apple.com/videos/play/wwdc2025/298/
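             | 
             | For LLMs specifically, mlx-lm sits on top of MLX; a
             | minimal generation sketch (the model name is a
             | placeholder, and the generate() keyword arguments may
             | differ slightly between releases):
             | 
             |     from mlx_lm import load, generate
             | 
             |     # Any mlx-community 4-bit conversion should work.
             |     model, tokenizer = load(
             |         "mlx-community/Mistral-7B-Instruct-v0.3-4bit")
             | 
             |     text = generate(model, tokenizer,
             |                     prompt="Write a haiku about pelicans.",
             |                     max_tokens=100)
             |     print(text)
             | 
             | mlx-lm also ships LoRA fine-tuning tooling (an mlx_lm.lora
             | entry point), which is probably the closest answer to the
             | fine-tuning question upthread.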
        
           | qcnguy wrote:
           | LLM fine tuning tends to destroy the model's capabilities if
           | you aren't very careful. It's not as easy or effective as
           | with image generation.
        
           | electroglyph wrote:
           | unsloth is the easiest way to finetune due to the lower
           | memory requirements
        
           | pdntspa wrote:
           | Have you tried asking an LLM?
        
           | jasonjmcghee wrote:
           | brev.dev made an easy to follow guide a while ago but
           | apparently Nvidia took it down or something when they bought
           | them?
           | 
           | So here's the original
           | 
           | https://web.archive.org/web/20231127123701/https://brev.dev/.
           | ..
        
         | Nesco wrote:
         | Zuck wouldn't have leaked it on 4chan of all places.
        
           | tough wrote:
           | prob just told an employee to get it done no?
        
           | vaenaes wrote:
           | Why not?
        
       | pulkitsh1234 wrote:
       | Is there any website to see the minimum/recommended hardware
       | required for running local LLMs? Much like 'system requirements'
       | mentioned for games.
        
         | GaggiX wrote:
         | https://apxml.com/tools/vram-calculator
         | 
         | This one is very good in my opinion.
        
           | jxf wrote:
           | Don't think it has the GLM series on there yet.
        
         | knowaveragejoe wrote:
         | If you have a HuggingFace account, you can specify the hardware
         | you have and it will show on any given model's page what you
         | can run.
        
         | CharlesW wrote:
         | > _Is there any website to see the minimum /recommended
         | hardware required for running local LLMs?_
         | 
         | LM Studio (not exclusively, I'm sure) makes it a no-brainer to
         | pick models that'll work on your hardware.
        
         | qingcharles wrote:
         | This can be a useful resource too:
         | 
         | https://www.reddit.com/r/LocalLLaMA/
        
         | svachalek wrote:
         | In addition to the tools other people responded with, a good
         | rule of thumb is that most local models work best* at q4
         | quants, meaning the memory for the model in GB is a little
         | over half the number of parameters in billions, e.g. a 14B
         | model may be ~8GB. Add some more for context and maybe you
         | want 10GB of VRAM for a 14B model.
         | That will at least put you in the right ballpark for what
         | models to consider for your hardware.
         | 
         | (*best performance/size ratio, generally if the model easily
         | fits at q4 you're better off going to a higher parameter count
         | than going for a larger quant, and vice versa)
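         | 
         | To make that concrete, a back-of-the-envelope calculator using
         | the same ballpark constants (rough estimates, not exact
         | numbers):
         | 
         |     def estimate_vram_gb(params_b, bits_per_weight=4.5,
         |                          overhead_gb=2.0):
         |         # ~4.5 bits/weight approximates a q4 quant plus
         |         # metadata; overhead covers context/KV cache.
         |         return params_b * bits_per_weight / 8 + overhead_gb
         | 
         |     print(round(estimate_vram_gb(14), 1))  # ~9.9, i.e. "10GB"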
        
           | nottorp wrote:
           | > maybe you want 10GB of VRAM for a 14B model
           | 
           | ... or if you have Apple hardware with their unified memory,
           | whatever the assholes soldered in is your limit.
        
       | bradly wrote:
       | I appreciate you sharing both the chat log and the full source
       | code. I would be interested to see a follow-up post on how
       | adding moderately-sized features like a high score table goes.
       | 
       | Also, IANAL but Space Invaders is owned IP. I have no idea about
       | the legality of a blog post describing the steps to create and
       | release an existing game, but I've seen headlines on HN of engs
       | in trouble for things I would not expect to be problematic.
       | Maybe Space Invaders is in q-tip/band-aid territory at this
       | point, but if this were Zelda instead of Space Invaders, I could
       | see things being more dicey.
        
         | Joker_vD wrote:
         | > Space Invaders is owned IP
         | 
         | So is Tetris. And I believe that Snake is also an owned IP
         | although I could be wrong on this one.
        
         | sowbug wrote:
         | It doesn't infringe any kind of intellectual property.
         | 
         | This isn't copyright infringement; it isn't based on the
         | original assembly code or artwork. A game concept can't be
         | copyrighted. Even if one of SI's game mechanics were patented,
         | it would have long expired. Trade secret doesn't apply in this
         | situation.
         | 
         | That leaves trademark. No reasonable person would be confused
         | whether Simon is trying to pass this creation off as a genuine
         | Space Invaders product.
        
           | 9rx wrote:
           | _> No reasonable person would be confused whether Simon is
           | trying to pass this creation off as a genuine Space Invaders
           | product._
           | 
           | There may be no reasonable confusion, but trademark holders
           | also have to protect against dilution of their brand, if they
           | want to retain their trademark. With use like this, people
           | might come to think of Space Invaders as a generic term for
           | all games of this type, not the brand of a specific game.
           | 
           | (there is a strong case to be made that they already do,
           | granted)
        
       | pamelafox wrote:
       | Alas, my 3 year old Mac has only 16 GB RAM, and can barely run a
       | browser without running out of memory. It's a work-issued Mac,
       | and we only get upgrades every 4/5 years. I must be content with
       | 8B-parameter models from Ollama (some of which are quite good,
       | like llama3.1:8b).
        
         | e1gen-v wrote:
         | Just download more ram!
        
         | GaggiX wrote:
         | Reasoning models like Qwen3 are even better, and they have
         | more options; for example you can choose the 14B model (at the
         | usual Q4_K_M quantization) instead of the 8B model.
        
           | pamelafox wrote:
           | Are they quantized more effectively than the non-reasoning
           | models for some reason?
        
             | GaggiX wrote:
             | There is no difference; you can choose a 6-bit
             | quantization if you prefer, at that point it's essentially
             | lossless.
        
         | dreamer7 wrote:
         | I am able to run Gemma 3 12B on my M1 MBP 16GB. It is pretty
         | good at logic and reasoning!
        
         | __mharrison__ wrote:
         | Odd. My MBP has 16 GB and I routinely have 5 browser windows
         | open. Most of them have 5-20 tabs. I'm also routinely running
         | vi and VS Code and editing videos with DaVinci Resolve without
         | issue.
         | 
         | My only memory issue that I can remember is an OBS memory
         | leak; otherwise these MBPs are incredible hardware. I wish any
         | other company could actually deliver a comparable machine.
        
           | pamelafox wrote:
           | I was exaggerating slightly - I think it's some combo of the
           | apps I use: Edge, Teams, Discord, VS Code, Docker. When I get
           | the RAM popup once a week, I typically have to close a few of
           | those, whichever is using the most memory according to
           | Activity Monitor. I've also got very little hard drive space
           | on my machine, about 15 GB free, so that makes it harder for
           | me to download the larger models. I keep trying to clear
           | space, even using CleanMyMac, but I somehow keep filling it
           | up.
        
       | neutronicus wrote:
       | If I understand correctly, the author is managing to run this
       | model on a laptop with 64GB of RAM?
       | 
       | So a home workstation with 64GB+ of RAM could get similar
       | results?
        
         | lynndotpy wrote:
         | The laptop has "unified RAM", so that's like 64GB of VRAM.
        
         | simonw wrote:
         | Only if that RAM is available to a GPU, or you're willing to
         | tolerate extremely slow responses.
         | 
         | The neat thing about Apple Silicon is the system RAM is
         | available to the GPU. On most other systems you would need
         | ~48GB of VRAM.
        
           | xrd wrote:
           | Aren't there non-macOS laptops which also support sharing
           | VRAM and regular RAM, i.e. an iGPU?
           | 
           | https://www.reddit.com/r/GamingLaptops/comments/1akj5aw/what.
           | ..
           | 
           | I personally want to run linux and feel like I'll get a
           | better price/GB offering that way. But, it is confusing to
           | know how local models will actually work on those and the
           | drawbacks of iGPU.
        
             | mft_ wrote:
             | iGPUs are typically weak, and/or aren't capable of running
             | the LLM so the CPU is used instead. You _can_ run things
             | this way, but it's not fast, and it gets slower as the
             | models go up in size.
             | 
             | If you want things to run quickly, then aside from Macs,
             | there's the 2025 ASUS Flow z13 which (afaik) is the only
             | laptop with AMD's new Ryzen AI Max+ 395 processor. This is
             | powerful _and_ has up to 128GB of RAM that can be shared
             | with the GPU, but they're very rare (and Mac-expensive) at
             | the moment.
             | 
             | The other variable for running LLMs quickly is memory
             | bandwidth; the Max+ 395 has 256GB/s, which is similar to
             | the M4 Pro; the M4 Max chips are considerably higher. Apple
             | fell on their feet on this one.
        
         | simlevesque wrote:
         | Not so sure. The MBP uses unified memory; the RAM is shared
         | with the CPU and GPU.
         | 
         | Your 64GB workstation doesn't share its RAM with your GPU.
        
         | NitpickLawyer wrote:
         | > So a home workstation with 64GB+ of RAM could get similar
         | results?
         | 
         | Similar in quality, but CPU generation will be slower than what
         | macs can do.
         | 
         | What you can do with MoEs (GLMs and Qwens) is to run _some_
         | experts (the shared ones usually) on a GPU (even a 12GB/16GB
         | card will do) and the rest from RAM on the CPU. That will
         | speed things
         | up considerably (especially prompt processing). If you're
         | interested in this, look up llama.cpp and especially ik_llama,
         | which is a fork dedicated to this kind of selective offloading
         | of experts.
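         | 
         | A much coarser version of that split is available from the
         | llama-cpp-python bindings too; this sketch only offloads whole
         | layers via n_gpu_layers (the per-expert tensor overrides
         | described above are llama.cpp/ik_llama features), and the GGUF
         | path is a placeholder:
         | 
         |     from llama_cpp import Llama
         | 
         |     # Put as many layers as fit on the GPU; the rest run on
         |     # the CPU from system RAM.
         |     llm = Llama(
         |         model_path="GLM-4.5-Air-Q4_K_M.gguf",  # placeholder
         |         n_gpu_layers=24,  # tune to your VRAM
         |         n_ctx=8192)
         | 
         |     out = llm("Write a haiku about pelicans.", max_tokens=64)
         |     print(out["choices"][0]["text"])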
        
         | 0x457 wrote:
         | You can run it, it will just run on the CPU and will be pretty
         | slow. Macs, like everyone in this thread said, use unified
         | memory, so it's 64GB shared between CPU and GPU, while for you
         | it's just 64GB for the CPU.
        
       | larodi wrote:
       | It's probably more correct to say: my 2.5 year old laptop can
       | RETELL Space Invaders. Pretty sure it cannot write a game it
       | has never seen, so you could even say: my old laptop can now do
       | this fancy
       | extraction of data from a smart probabilistic blob, where the
       | original things are retold in new colours and forms :)
        
         | simonw wrote:
         | I know these models can build games and apps they've never seen
         | before because I've already observed them doing exactly that
         | time and time again.
         | 
         | If you haven't seen that yourself yet I suggest firing up the
         | free, no registration required GLM-4.5 Air on
         | https://chat.z.ai/ and seeing if you can prove yourself wrong.
        
         | oceanplexian wrote:
         | So you're saying it works exactly the same way as humans, who
         | copied Space Invaders from Breakout which came out in 1976.
        
         | uludag wrote:
         | It's unfortunate that the ideas of things to test first are
         | exactly the things more likely to be contained in training
         | data. Hence why the pelican on a bicycle was such a good test,
         | until it became viral.
        
         | MattRix wrote:
         | No, that would be incorrect, nobody uses "retell" like that.
         | 
         | The impressive thing about these models is their ability to
         | write working code, not their ability to come up with unique
         | ideas. These LLMs actually can come up with unique ideas as
         | well, though I think it's more exciting that they can help
         | people execute human ideas instead.
        
       | anthk wrote:
       | Writing a Z80 emulator to run the original Space Invaders ROM
       | will make you more fulfilled.
       | 
       | Either with SDL2+C, or even Tcl/Tk, or Python with Tkinter.
        
       | vFunct wrote:
       | please please apple give us a M5 MacBook Pro laptop with 2TB of
       | unified memory please please
        
       | stpedgwdgfhgdd wrote:
       | Setting aside that Space Invaders from scratch is not
       | representative of real engineering, it will be interesting to
       | see what the business
       | model for Anthropic will be if I can run a solid code generation
       | model on my local machine (no usage tier per hour or week), let's
       | say, one year from now. At $200 per month for 2 years I can buy a
       | decent Mx with 64GB (or perhaps even 128GB taking residual value
       | into account)
        
         | falcor84 wrote:
         | How come it's "not representative of real engineering"? Other
         | than copy-pasting existing code (which is not what an LLM
         | does), I don't see how you can create a space invaders game
         | without applying "engineering".
        
           | phkahler wrote:
           | >> Other than copy-pasting existing code (which is not what
           | an LLM does)
           | 
           | I'd like to see someone try to prove this. How many space
           | invaders projects exist on the internet? It'd be hard to
           | compare model "generated" code to everything out there
           | looking for plagiarism, but I bet there are lots of snippets
           | pulled in. These things are NOT smart, they are huge and
           | articulate information repositories.
        
             | simonw wrote:
             | Go for it. https://www.google.com/search?client=firefox-b-1
             | -d&q=github+... has a bunch of results. Here's the source
             | code GLM-4.5 Air spat out for me on my laptop:
             | https://github.com/simonw/tools/blob/main/space-invaders-
             | GLM...
             | 
             | Based on my mental model of how these things work I'll be
             | genuinely surprised if you can find even a few lines of
             | code duplicated from one of those projects into the code
             | that GLM-4.5 wrote for me.
        
               | phkahler wrote:
               | So I scanned the beginning of the generated code, picked
               | line 83:                 animation: glow 2s ease-in-out
               | infinite;
               | 
               | stuffed it verbatim into google and found a stack
               | overflow discussion that contained this:
               | animation: glow .5s infinite alternate;
               | 
               | in under one minute. Then I found this page of CSS
               | effects:
               | 
               | https://alvarotrigo.com/blog/animated-backgrounds-css/
               | 
               | Another page has examples and contains:
               | animation: float 15s infinite ease-in-out;
               | 
               | There is just too much internet to scan for an exact
               | match or a match of larger size.
        
               | simonw wrote:
               | That's not an example of copying from an existing Space
               | Invaders implementation. That's an LLM using a CSS
               | animation pattern - one that it's seen thousands
               | (probably millions) of times in the training data.
               | 
               | That's what I expect these things to do: they break down
               | Space Invaders into the components they need to build,
               | then mix and match thousands of different coding patterns
               | (like "animation: glow 2s ease-in-out infinite;") to
               | implement different aspects of that game.
               | 
               | You can see that in the "reasoning" trace here: https://g
               | ist.github.com/simonw/9f515c8e32fb791549aeb88304550... -
               | "I'll use a modern design with smooth animations,
               | particle effects, and a retro-futuristic aesthetic."
        
               | threeducks wrote:
               | I think LLMs are adapting higher level concepts. For
               | example, the following JavaScript code generated by GLM (
               | https://github.com/simonw/tools/blob/9e04fd9895fae1aa9ac7
               | 8b8...) is clearly inspired by this C++ code
               | (https://github.com/portapack-mayhem/mayhem-
               | firmware/blob/28e...), but it is not an exact copy.
        
               | simonw wrote:
               | This is a really good spot.
               | 
               | That code certainly looks similar, but I have trouble
               | imagining how else you would implement very basic
               | collision detection between a projectile and a player
               | object in a game of this nature.
        
               | threeducks wrote:
               | A human would likely have refactored the two collision
               | checks between bullet/enemy and enemyBullet/player in the
               | JavaScript code into its own function, perhaps something
               | like "areRectanglesOverlapping". The C++ code only does
               | one collision check like that, so it has not been
               | refactored there, but as a human, I certainly would not
               | want to write that twice.
               | 
               | More importantly, it is not just the collision check that
               | is similar. Almost the entire sequence of operations is
               | identical on a higher level:
               | 
               |     1. enemyBullet/player collision check
               |     2. same comment "// Player hit!" (this is how I
               |        found the code)
               |     3. remove enemy bullet from array
               |     4. decrement lives
               |     5. update lives UI
               |     6. (createParticle only exists in JS code)
               |     7. if lives are <= 0, gameOver
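               | 
               | For illustration, the shared helper being described is
               | just an axis-aligned rectangle overlap test; sketched
               | here in Python rather than the JavaScript under
               | discussion:
               | 
               |     def rects_overlap(ax, ay, aw, ah, bx, by, bw, bh):
               |         # True if two axis-aligned rectangles overlap.
               |         return (ax < bx + bw and ax + aw > bx and
               |                 ay < by + bh and ay + ah > by)
               | 
               | Both the bullet-vs-enemy and enemy-bullet-vs-player
               | checks could call this instead of repeating the four
               | comparisons inline.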
        
               | falcor84 wrote:
               | The parent said
               | 
               | > find even a few lines of code duplicated from one of
               | those projects
               | 
               | I'm pretty sure they meant multiple lines copied verbatim
               | from a single project implementing space invaders, rather
               | than individual lines copied (or likely just accidentally
               | identical) across different unrelated projects.
        
               | ben_w wrote:
               | So, your example of it copying snippets is... using the
               | same API with fairly different parameters in a different
               | order?
        
               | sejje wrote:
               | Is this some kind of joke?
               | 
               | That's how you write css. The examples aren't the same at
               | all, they just use the same css feature.
               | 
               | It feels like you aren't a coder--you've sabotaged your
               | own point.
        
             | ben_w wrote:
             | Sorites paradox. Where's the distinction between "snippet"
             | and "a design pattern"?
             | 
             | Compressing a few petabytes into a few gigabytes _requires_
             | that they can't be like this about all of the things
             | they're accused of simply copy-pasting, from code to
             | newspaper articles to novels. There's not enough space.
        
           | hbn wrote:
           | The prompt was
           | 
           | > Write an HTML and JavaScript page implementing space
           | invaders
           | 
           | It may not be "copy pasting", but it's generating output
           | recreated as best it can from its training on Space Invaders
           | source code.
           | 
           | The engineers at Taito that originally developed Space
           | Invaders were not told "make Space Invaders" and then did
           | their best to recall all the source code they've looked at in
           | their life to re-type the source code to an existing game.
           | From a logistics standpoint, where the source code already
           | exists and is accessible, you may as well have copy-pasted it
           | and fudged a few things around.
        
             | simonw wrote:
             | The source code for original Space Invaders from 1978 has
             | never been published. The closest to that is disassembled
             | ROMs.
             | 
             | I used that prompt because it's the shortest possible
             | prompt that tells the model to build a game with a specific
             | set of features. If I wanted to build a custom game I would
             | have had to write a prompt that was many paragraphs longer
             | than that.
             | 
             | The aim of this piece isn't "OMG look, LLMs can build space
             | invaders" - at this point that shouldn't be a surprise to
             | anyone. What's interesting is that _my laptop_ can run a
             | model that is capable of that now.
        
               | sarchertech wrote:
               | > The source code for original Space Invaders from 1978
               | has never been published. The closest to that is
               | disassembled ROMs.
               | 
               | Sure, but that doesn't impact the OP's point at all because
               | there are numerous copies of reverse engineered source
               | code available.
               | 
               | There are numerous copies of the reverse engineered
               | source code already translated to JavaScript in your
               | models training set.
        
               | nottorp wrote:
               | > What's interesting is that my laptop can run a model
               | that is capable of that now.
               | 
               | I'm afraid no one cared much about your point :)
               | 
               | You'll only get "OMG look how good LLMs are they'll get
               | us all fired!" comments and "LLMs suck" comments.
               | 
               | This is how it goes with religion...
        
               | hbn wrote:
               | The discussion I replied to was just regarding whether or
               | not what the LLM did should be considered "engineering"
               | 
               | It doesn't really matter whether or not the original code
               | was published. In fact that original source code on its
               | own probably wouldn't be that useful, since I imagine it
               | wouldn't have tipped the weights enough to be
               | "recallable" from the model, not to mention it was tasked
               | with implementing it in web technologies.
        
           | sharkjacobs wrote:
           | Making a space invaders game is not representative of normal
           | engineering because you're reproducing an existing game with
           | well known specs and requirements. There are probably
           | hundreds of thousands of words describing and discussing
           | Space Invaders in GLM-4.5's training data
           | 
           | It's like using an LLM to implement a red black tree. Red
           | black trees are in the training data, so you don't need to
           | explain or describe what you mean beyond naming it.
           | 
           | "Real engineering" with LLMs usually requires a bunch of up
           | front work creating specs and outlines and unit tests.
           | "Context engineering"
        
             | jasonvorhe wrote:
             | Smells like moving the goalposts. What's real engineering
             | going to be in 2028? Implementing Google's infra stack in
             | your homelab?
        
         | rafaelmn wrote:
         | What about power use and supporting hardware? Also, the card
         | going down means you are down until you get warranty service.
        
           | skeezyboy wrote:
           | why are you doing anything locally then?
        
         | tptacek wrote:
         | OK, go write Space Invaders by hand.
        
           | LandR wrote:
           | I'd hope most professional software engineers could do this
           | in an afternoon or so?
        
             | sejje wrote:
             | Most professional software engineers have never written a
             | game and don't do web work, so I somehow doubt that.
        
               | anthk wrote:
               | With Tcl/Tk it's a matter of less than 2 hours.
        
         | dmortin wrote:
         | " it will be interesting to see what the business model for
         | Anthropic will be if I can run a solid code generation model on
         | my local machine "
         | 
         | Most people won't bother with buying powerful hardware for
         | this; they will keep using SaaS solutions, so Anthropic could
         | be in trouble if cheaper SaaS solutions come out.
        
         | qingcharles wrote:
         | The frontier models are always going to tempt you with their
         | higher quality and quicker generation, IMO.
        
           | kasey_junk wrote:
           | I've been mentally mapping the models to the history of
           | databases.
           | 
           | Most databases in the early days you had to pay for. There
           | are still for-pay databases that are just better than the
           | ones you don't pay for.
           | Some teams think that the cost is worth the improvements and
           | there is a (tough) business there. Fortunes were made in the
           | early days.
           | 
           | But eventually open source databases became good enough for
           | many use cases and they have their own advantages. So lots
           | of teams use them.
           | 
           | I think coding models might have a similar trajectory.
        
             | qingcharles wrote:
             | You make a good point -- a majority of applications are now
             | using open source or free versions[1] of DBs.
             | 
             | My only feedback is: are these the same animal? Can we
             | compare an O/S DB vs. paid/closed DB to me running an LLM
             | locally? The biggest issue right now with LLMs is simply
             | the cost of the _hardware_ to run one locally, not the
             | quality of the actual software (the model).
             | 
             | [1] e.g. SQL Server Express is good enough for a lot of
             | tasks, and I guess would be roughly equivalent to the
             | upcoming open versions of GPT vs. the frontier version.
        
               | qcnguy wrote:
               | A majority of apps nowadays are using proprietary forks
               | of open source DBs running in the cloud, where their
               | feature set is (slightly) rounded out and smoothed off by
               | the cloud vendors.
               | 
               | Not that many projects are doing fully self-hosted RDBMS
               | at this point. So ultimately proprietary databases still
               | win out, they just (ab)use the Postgresql trademark to
               | make people think they're using open source.
               | 
               | LLMs might go the same way. The big clouds offering
               | proprietary fine tunes of models given away by AI labs
               | using investor money?
        
               | qingcharles wrote:
               | That's definitely true. I could see more of the "running
               | open source models on other people's hardware" model.
               | 
               | I dislike running local LLMs right now because I find the
               | software kinda janky still, you often have to tweak
               | settings, find the right model files. Basically have a
               | bunch of domain knowledge I don't have space for in my
               | head. On top of maintaining a high-spec piece of hardware
               | and paying for the power costs.
        
           | zarzavat wrote:
           | Closed doesn't always win over open. People said the same
           | thing about Windows vs Linux, but even Microsoft was forced
           | to admit defeat and support Linux.
           | 
           | All it takes is some large companies commoditizing their
           | complements. For Linux it was Google, etc. For AI it's Meta
           | and China.
           | 
           | The only thing keeping Anthropic in business is geopolitics.
           | If China were allowed full access to GPUs, they would
           | probably die.
        
       | amelius wrote:
       | Wake me up when I can apt-get install the llm.
        
         | Kurtz79 wrote:
         | You can install Ollama with a script fetched with curl and run
         | an LLM with a grand total of two bash commands (including the
         | curl).
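         | 
         | Once it's installed, hitting the local model from code is
         | similarly short; a sketch using the ollama Python client (the
         | model tag is a placeholder for whatever you've pulled):
         | 
         |     import ollama  # pip install ollama
         | 
         |     resp = ollama.chat(
         |         model="llama3.1:8b",  # placeholder model tag
         |         messages=[{"role": "user",
         |                    "content": "Summarise unified memory."}])
         |     print(resp["message"]["content"])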
        
       | jus3sixty wrote:
       | I recently let go of my 2.5 year old vacuum. It was just
       | collecting dust.
        
         | falcor84 wrote:
         | Thinking about it, the measure of whether a vacuum is being
         | sufficiently used is probably that the circulation of dust
         | within it over the last year is greater than the circulation of
         | dust on its external boundary over that time period.
        
       | alankarmisra wrote:
       | I see the value in showcasing that LLMs can run locally on
       | laptops -- it's an important milestone, especially given how
       | difficult that was before smaller models became viable.
       | 
       | That said, for something like this, I'd probably get more out of
       | simply finding an existing implementation on github or the like
       | and downloading that.
       | 
       | When it comes to specialized and narrow domains like Space
       | Invaders, the training set is likely to be extremely small and
       | the model's vector space will have limited room to generalize.
       | You'll get code that is more or less identical to the original
       | source and you also have to wait for it to 'type' the code and
       | the value add seems very low. I would rather ask it to point me
       | to known Space Invaders implementations in language X on github
       | (or search there).
       | 
       | Note that ChatGPT gets very nervous if I put this into GPT to
       | clean up the grammar. It wants very badly for me to stress that
       | LLMs don't memorize and overfitting is very unlikely (I believe
       | neither).
        
         | tossandthrow wrote:
         | Interesting, I cannot reproduce these warnings in ChatGPT -
         | though this is something that really interests me, as it
         | represents immense political power to be able to interject
         | such warnings (explicitly, or implicitly via slight
         | reformulations).
        
       | efitz wrote:
       | I missed the word "laptop" in the title at first glance and
       | thought this was a "I taught my toddler to code" article.
        
         | juliangoetze wrote:
         | I thought I was the only one.
        
       | joelthelion wrote:
       | Apart from using a Mac, what can you use for inference with
       | reasonable performance? Is a Mac the only realistic option at the
       | moment?
        
         | AlexeyBrin wrote:
         | A gaming PC with an NVIDIA 4090/5090 will be more than adequate
         | for running local models.
         | 
         | Where a Mac may beat the above is on the memory side, if a
         | model requires more than 24/32 GB of GPU memory you are usually
         | better off with a Mac with 64/128 GB of RAM. On a Mac the
         | memory is shared between CPU and GPU, so the GPU can load
         | larger models.
        
         | reilly3000 wrote:
         | The top 3 approaches I see a lot on r/localllama are:
         | 
         | 1. 2-4x 3090+ nvidia cards. Some are getting Chinese 48GB
         | cards. There is a ceiling to vRAM that prevents the biggest
         | models from being able to load, most can run most quants at
         | great speeds
         | 
         | 2. Epyc servers running CPU inference with lots of RAM at as
         | high a memory bandwidth as is available. With these setups
         | people are getting like 5-10 t/s but are able to run 450B
         | parameter models.
         | 
         | 3. High RAM Macs with as much memory bandwidth as possible.
         | They are the best balanced approach and surprisingly reasonable
         | relative to other options.
        
         | thenaturalist wrote:
         | This guy [0] does a ton of in-depth HW comparison/
         | benchmarking, including against Mac mini clusters and an M3
         | ultra.
         | 
         | 0: https://www.youtube.com/@AZisk
        
         | regularfry wrote:
         | This one should just about fit on a box with an RTX 4090 and
         | 64GB RAM (which is what I've got) at q4. Don't know what the
         | performance will be yet. I'm hoping for an unsloth dynamic
         | quant to get the most out of it.
        
           | weberer wrote:
           | What's important is VRAM, not system RAM. The 4090 has 24GB
           | of VRAM, so you'll be limited to smaller models at decent
           | speeds.
           | Of course, you can run models from system memory, but your
           | tokens/second will be orders of magnitude slower. ARM Macs
           | are the exception since they have unified memory, allowing
           | high bandwidth between the GPU and the system's RAM.
        
         | whimsicalism wrote:
         | you are almost certainly better off renting GPUs, but i
         | understand self-hosting is an HN touchstone
        
           | qingcharles wrote:
           | This. Especially if you just want to try a bunch of different
           | things out. Renting is insanely cheap -- to the point where I
           | don't understand how the people renting the GPUs out are
           | making their money back
           | unless they stole the hardware and power.
           | 
           | It can really help you figure a ton of things out before you
           | blow the cash on your own hardware.
        
             | 4b11b4 wrote:
             | Any recommended sites to rent from?
        
               | doormatt wrote:
               | runpod.io
        
               | whimsicalism wrote:
               | runpod, vast, hyperbolic, prime intellect. if all you're
               | doing is going to be running LLMs, you can pay per token
               | on openrouter or some of the providers listed there
        
           | mrinterweb wrote:
           | I don't know about that. I've had my RTX 4090 for nearly 3
           | years now. Say I had a script that provisioned and
           | deprovisioned a rented 4090 at $0.70/hr for an 8-hour work
           | day, 20 work days per month, assuming 2 paid weeks off per
           | year plus normal holidays, over 3 years:
           | 
           | 0.7 * 8 * ((20 * 12) - 8 - 14) * 3 = $3662
           | 
           | I bought my RTX 4090 for about $2200. I also had the pleasure
           | of being able to use it for gaming when I wasn't working. To
           | be fair, the VRAM requirements for local models keep
           | climbing and my 4090 isn't able to run many of the latest
           | LLMs. Also, I omitted cost of electricity for my local LLM
           | server cost. I have not been measuring total watts consumed
           | by just that machine.
           | 
           | One nice thing about renting is that it give you flexibility
           | in terms of what you want to try.
           | 
           | If you're really looking for the best deals, look at 3rd
           | party hosts serving open models with API-based pricing, or
           | honestly a Claude subscription can easily be worth it if you
           | use LLMs a fair bit.
        
             | whimsicalism wrote:
             | 1. I agree - there are absolutely scenarios in which it can
             | make sense to buy a GPU and run it yourself. If you are
             | managing a software firm with multiple employees, you very
             | well might break even in less than a few years. But I would
             | gander this is not the case for 90%+ of people self-hosting
             | these models, unless they have some other good reason (like
             | gaming) to buy a GPU.
             | 
             | 2. I basically agree with your caveats - excluding
             | electricity is a pretty big exclusion and I don't think
             | that you've had 3 years of really high-value self-hostable
             | models; I would really only say the last year, and I'm
             | somewhat skeptical of how good the ones that can be hosted
             | in 24GB of VRAM are. 4x4090 is a different story.
        
         | badsectoracula wrote:
         | An Nvidia GPU is the most common answer, but personally i've
         | done all my LLM use locally using mainly Mistral Small
         | 3.1/3.2-based models and llama.cpp with an AMD RX 7900 XTX GPU.
         | It only gives you ~4.71 tokens per second, but that is fast
         | enough for a lot of uses. For example last month or so i wrote
         | a raytracer[0][1] in C with Devstral Small 1.0 (based on
         | Mistral Small 3.1). It wasn't "vibe coding" as much as a "co-
         | op" where i'd go back and forth a chat interface (koboldcpp)
         | and i'd, e.g. ask the LLM to implement some feature, then i'd
         | switch to the editor and start writing code using that feature
         | while the LLM was generating it in the background. Or, more
         | often, i'd fix bugs in the LLM's code :-P.
         | 
         | FWIW GPU aside, my PC isn't particularly new - it is a 5-6 year
         | old PC that was the cheapest money could buy originally and
         | became "decent" at the time i upgraded it ~5 years ago and i
         | only added the GPU around Christmas as prices were dropping
         | since AMD was about to release the new GPUs.
         | 
         | [0] https://i.imgur.com/FevOm0o.png
         | 
         | [1]
         | https://app.filen.io/#/d/e05ae468-6741-453c-a18d-e83dcc3de92...
        
       | joshstrange wrote:
       | My next MBP is going to need the next size up SSD (RIP bank
       | account) so it can hold all the models I want to play with
       | locally and my data. Thankfully I already have been maxing out
       | the RAM so that isn't something new I also have to do.
        
       | __mharrison__ wrote:
       | Time to get a new laptop. My MBP only has 16 gigs.
       | 
       | Looking forward to trying this with Aider.
        
       | sneak wrote:
       | What is the SOTA for benchmarking all of the models you can run
       | on your local machine vs a test suite?
       | 
       | Surely this must exist, no? I want to generate a local
       | leaderboard and perhaps write new test cases.
        
       | petercooper wrote:
       | I ran the same experiment on the full size model. It used a
       | custom 80s style font (from Google Fonts) and gave 'eyes' and
       | more differences to the enemies but otherwise had a similar vibe
       | to Simon's. An interesting visual demonstration of what
       | quantization does though! Screenshot:
       | https://peterc.org/img/aliens.png
        
       | deadbabe wrote:
       | You can overtrain a neural network to write a space invaders
       | clone. The final weights might take up less disk space than the
       | output code.
        
       | indigodaddy wrote:
       | Did pretty well with a boggle clone. I like that it tries to do a
       | single html file (I didn't ask for that but was pleasantly
       | surprised). It didn't include dictionary validation so needed a
       | couple of prompts. Touch selection on mobile isn't the greatest
       | but I've seen plenty worse
       | 
       | https://chat.z.ai/space/z0gcn6qtu8s1-art
       | 
       | https://chat.z.ai/s/74fe4ddc-f528-4d21-9405-0a8b15a96520
        
         | JKCalhoun wrote:
         | Cool -- if only diagonals were easier. ;-) (Hopefully I'm being
         | constructive here.)
        
           | indigodaddy wrote:
           | Yep I tried to have it improve that but actually didn't use
           | the word 'diagonal' in the prompt. I bet it would have done
           | better if I had..
        
             | indigodaddy wrote:
             | Had it try to improve Diagonal selection but didn't seem to
             | help much
             | 
             | https://chat.z.ai/space/b01dc65rg2p0-art
        
         | Keyframe wrote:
         | I went the other route with a Tetris clone the other day. It's
         | definitely not a single prompt. It took me a solid 15 hours to
         | get to this stage, and most of that was me thinking. BUT,
         | except for one small trivial thing (a space invader logo in a
         | pre tag) I haven't touched the code - just looked at it. I
         | made it mandatory for myself to see if I could first
         | greenfield myself into this project and then brownfield
         | features and fixes. It's
         | definitely a ton of work on my end, but it's also not something
         | I'd be able to do in ~2 working days or less. As a cherry on
         | top, even though it's still not done yet, I put in AI-generated
         | music singing about the project itself.
         | https://www.susmel.com/stacky/
         | 
         | Definitely a ton of things I learned about how to "develop"
         | "with" AI along the way.
        
       | lifestyleguru wrote:
       | > my 2.5 year old laptop (a 64GB MacBook Pro M2) i
       | 
       | My MacBook has 16GB of RAM and it is from a period when everyone
       | was fiercely insisting that the 8GB base model was all I'd ever
       | need.
        
         | tracker1 wrote:
         | I'm kind of with you... while I've run 128GB on my desktop,
         | and am currently at 96GB with DDR5 prices being what they are,
         | it's far less common for typical laptops. I'm a bit curious
         | how the Ryzen 395+ with 128GB will handle some of these
         | models. The 200GB options feel completely out of reach.
        
       | Aurornis wrote:
       | This is very cool. The blog post author had to run it from the
       | main branch of the mlx-lm library with a custom script. Can
       | someone up to date on
       | the local LLM tools let us know which mainstream tools we should
       | be watching for an easier way to run this on MLX? The space moves
       | so fast that it's hard to keep up.
        
         | simonw wrote:
         | I expect LM Studio will have this pretty soon - I imagine they
         | are waiting on the next stable release of mlx-lm which will
         | include the change I needed to get this to work.
        
       | righthand wrote:
       | Did you understand the implementation or just that it produced a
       | result?
       | 
       | I would hope an LLM could spit out a cobbled form of answer to a
       | common interview question.
       | 
       | Today a colleague presented data changes and used an LLM to build
       | a display app for the JSON for presentation. Why did they not
       | just pipe the JSON into our already working app that displays
       | this data?
       | 
       | People around me for the most part are using LLMs to enhance
       | their presentations, not to actually implement anything useful. I
       | have been watching my coworkers use it that way for months.
       | 
       | Another example? A different coworker wanted to build a document
       | macro to perform bulk updates on courseware content. Swapping old
       | words for new words. To build the macro they first wrote a
       | rubric to prompt an LLM correctly inside of a Word doc.
       | 
       | That filled rubric is then used to generate a program template
       | for the macro. To define the requirements for the macro the
       | coworker then used a slideshow slide to list bullet points of
       | functionality, in this case to Find+Replace words in courseware
       | slides/documents using a list of words from another text
       | document. Due to the complexity of the system, I can't believe my
       | colleague saved any time. The presentation was interesting though
       | and that is what they got compliments on.
       | 
       | However the solutions are absolutely useless for anyone else but
       | the implementer.
        
         | simonw wrote:
         | I scanned the code and understood what it was doing, but I
         | didn't spend much time on it once I'd seen that it worked.
         | 
         | If I'm writing code for production systems using LLMs I still
         | review every single line - my personal rule is I need to be
         | able to explain how it works to someone else before I'm willing
         | to commit it.
         | 
         | I wrote a whole lot more about my approach to using LLMs to
         | help write "real" code here:
         | https://simonwillison.net/2025/Mar/11/using-llms-for-code/
        
           | th0ma5 wrote:
           | [flagged]
        
             | CamperBob2 wrote:
             | I missed the part where he said he was going to put the
             | Space Invaders game into production. Link?
        
             | bnchrch wrote:
             | You do realize your talking to the creator of Django,
             | Datassette, and Lanyrd right?
        
               | tough wrote:
               | that made me chuckle
        
             | ajcp wrote:
             | They said "production systems", not "critical production
             | applications".
             | 
             | Also the 'if' doesn't negate anything as they say "I
             | still", meaning the behavior is actively happening or
             | ongoing; they don't use a hypothetical or conditional after
             | "still", as in "I still _would_ ".
        
             | dang wrote:
             | Please don't cross into personal attack in HN comments.
             | 
             | https://news.ycombinator.com/newsguidelines.html
             | 
             | Edit: twice is already a pattern -
             | https://news.ycombinator.com/item?id=44110785. No more of
             | this, please.
             | 
             | Edit 2: I only just realized that you've been frequently
             | posting abusive replies in a way that crosses into harangue
             | if not harassment:
             | 
             | https://news.ycombinator.com/item?id=44725284 (July 2025)
             | 
             | https://news.ycombinator.com/item?id=44725227 (July 2025)
             | 
             | https://news.ycombinator.com/item?id=44725190 (July 2025)
             | 
             | https://news.ycombinator.com/item?id=44525830 (July 2025)
             | 
             | https://news.ycombinator.com/item?id=44441154 (July 2025)
             | 
             | https://news.ycombinator.com/item?id=44110817 (May 2025)
             | 
             | https://news.ycombinator.com/item?id=44110785 (May 2025)
             | 
             | https://news.ycombinator.com/item?id=44018000 (May 2025)
             | 
             | https://news.ycombinator.com/item?id=44008533 (May 2025)
             | 
             | https://news.ycombinator.com/item?id=43779758 (April 2025)
             | 
             | https://news.ycombinator.com/item?id=43474204 (March 2025)
             | 
             | https://news.ycombinator.com/item?id=43465383 (March 2025)
             | 
             | https://news.ycombinator.com/item?id=42960299 (Feb 2025)
             | 
             | https://news.ycombinator.com/item?id=42942818 (Feb 2025)
             | 
             | https://news.ycombinator.com/item?id=42706415 (Jan 2025)
             | 
             | https://news.ycombinator.com/item?id=42562036 (Dec 2024)
             | 
             | https://news.ycombinator.com/item?id=42483664 (Dec 2024)
             | 
             | https://news.ycombinator.com/item?id=42021665 (Nov 2024)
             | 
             | https://news.ycombinator.com/item?id=41992383 (Oct 2024)
             | 
             | That's abusive, unacceptable, and not even a complete list!
             | 
             | You can't go after another user like this on HN, regardless
             | of how right you are or feel you are or who you have a
             | problem with. If you keep doing this, we're going to end up
             | banning you, so please stop now.
        
           | photon_lines wrote:
            | This is why I love using the DeepSeek chain-of-reasoning
            | output... I can actually go through and read what it's
            | 'thinking' to validate whether it's basing its solution on
            | valid facts / assumptions. Either way, thanks for all of
            | your valuable write-ups on these models - I really
            | appreciate them, Simon!
        
             | vessenes wrote:
              | Nota bene - there is a fair amount of research indicating
              | that models' outputs and 'thoughts' do not necessarily
              | align with their chain-of-reasoning output.
              | 
              | You can validate this pretty easily by asking some logic
              | or coding questions: you will likely note that the final
              | output is not necessarily the logical conclusion of the
              | thinking; sometimes it is significantly orthogonal to it,
              | or the model returns to reasoning in the middle.
             | 
             | All that to say - good idea to read it, but stay vigilant
             | on outputs.
        
           | shortrounddev2 wrote:
           | Serious question: if you have to read every line of code in
           | order to validate it in production, why not just _write_
           | every line of code instead?
        
             | simonw wrote:
             | Because it's much, much faster to review a hundred lines of
             | code than it is to write a hundred lines of code.
             | 
             | (I'm experienced at reading and reviewing code.)
        
               | paufernandez wrote:
               | Simon, don't you fear "atrophy" in your writing ability?
        
         | bsder wrote:
         | > However the solutions are absolutely useless for anyone else
         | but the implementer.
         | 
         | Disposable code is where AI _shines_.
         | 
          | AI generating the boilerplate code for an obtuse build
          | system? Yes, please. AI generating an animation? Go for it.
          | (Look at how much work 3Blue1Brown had to put into that--if
          | AI can help that kind of thing, it has my blessings.) AI
          | enabling someone who doesn't program to generate _some
          | prototype_ that they can then point at an actual programmer?
          | Excellent.
          | 
          | This is fine because you _don't need to understand the
          | result_. You have a concrete pass/fail gate and don't care
          | about what's underneath. This is real value. The problem is
          | that it isn't _gigabuck_ value.
         | 
         | The stuff that would be gigabuck value is unfortunately where
         | AI falls down. Fix this bug in a product. Add this feature to
         | an existing codebase. etc.
         | 
         | AI is also a problem because disposable code is what you would
         | assign to junior programmers in order for them to learn.
        
         | magic_hamster wrote:
         | The LLM is the solution.
        
       | aplzr wrote:
       | I really like talking to Claude (free tier) instead of using a
       | search engine when I stumble upon a random topic that
       | interests me. For example, this morning I had it explain the
       | differences between pass by value, pass by reference, and pass by
       | sharing, the last of which I wasn't aware of until then.
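       | 
       | For reference, the distinction it walked me through boils down
       | to something like this (my own minimal JavaScript sketch, not
       | Claude's output; JS passes primitives by value and objects by
       | sharing, and has no true pass by reference):
       | 
       |     // Pass by value: the callee gets a copy of the primitive.
       |     function bumpNumber(n) { n += 1; }
       |     let x = 1;
       |     bumpNumber(x);   // x is still 1
       | 
       |     // Pass by sharing: the callee gets a copy of the reference,
       |     // so it can mutate the object but cannot rebind the
       |     // caller's variable.
       |     function mutate(obj) { obj.value = 2; }
       |     function rebind(obj) { obj = { value: 3 }; }
       |     let o = { value: 1 };
       |     mutate(o);       // o.value is now 2
       |     rebind(o);       // o still points at { value: 2 }
       | 
       |     // True pass by reference (rebinding the caller's variable)
       |     // isn't expressible in JavaScript; C++'s & does that.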
       | 
       | Is this kind of thing also possible with one of these self-hosted
       | models in a comparable way, or are they mostly good for coding?
        
       | dcchambers wrote:
       | Amazing. There really is no secret sauce that the frontier models
       | have.
        
       | accrual wrote:
       | Very impressive model! The SVG pelican designed by GLM 4.5 in
       | Simon's adjacent article is the most accurate I've seen yet.
        
         | 4b11b4 wrote:
         | Quick, someone knit a quilt with all the different SVG pelicans
        
       | bgwalter wrote:
       | The GLM-4.5 model utterly fails at creating ASCII art or
       | factorizing numbers. It can "write" Space Invaders because there
       | are literally thousands of open source projects out there.
       | 
       | This is another example of LLMs being dumb copiers that do
       | understand human prompts.
       | 
       | But there is one positive side to this: if this photocopying
       | business can be run locally, the valuations of OpenAI etc.
       | should go to zero.
        
         | simonw wrote:
         | Why would you use an LLM to factorize numbers?
        
           | bgwalter wrote:
            | Because we are told that they can solve IMO problems. Yet
            | they fail at basic math problems, not only at factorization
            | but also when probed with relatively basic symbolic math
            | that would not require the invocation of an external
            | program.
            | 
            | Also, you know, if they fail they could say so instead of
            | giving a hallucinated answer. First the models lie and say
            | that factoring a 20-digit number takes vast amounts of
            | computing. Then, if pointed to a factorization program,
            | they pretend to execute it and lie about the output.
            | 
            | There is no intelligence or flexibility apart from stealing
            | other people's open source code.
        
             | simonw wrote:
             | That's why the IMO results were so notable: that was one of
             | those moments where new models were demonstrated doing
             | something that they had previously been unable to do.
        
               | ducktective wrote:
                | I can't fathom why more people aren't talking about the
                | IMO story. Apparently the model they used is not just
                | an LLM but some RL is involved too. If a model wins
                | gold at the IMO, is it still merely a "statistical
                | parrot"?
        
               | sejje wrote:
               | Stochastic parrot is the term.
               | 
               | I don't think it's ever been accurate.
        
               | bgwalter wrote:
               | The results were private and the methodology was not
               | revealed. Even Tao, who was bullish on "AI", is starting
               | to question the process.
        
               | simonw wrote:
               | The same thing has also been achieved by a Google
               | DeepMind team and at least one group of independent
               | researchers using publicly available models and careful
                | prompting tricks.
        
       | lherron wrote:
       | With the Anthropic rug pull on quotas for Max, I feel the short-
       | to-mid-term value sweet spot will be a Frankensteined-together
       | "Claude as orchestrator/coder, falling back to local models as
       | quota limits approach" tool suite.
        
         | 4b11b4 wrote:
         | Was thinking this one might backfire on Anthropic in the end...
         | 
         | People are going to explore and get comfortable with
         | alternatives.
         | 
         | There may have been other ways to deal with the cases they were
         | worried about.
        
       | h-bradio wrote:
       | Thanks so much for this! I updated LM Studio, and it picked up
       | the required mlx-lm update. After a small tweak to tool-calling
       | in the prompt, it works great with Zed!
        
         | torarnv wrote:
         | Could you describe the tweak you did, and possibly the general
         | setup you have with zed working with LM Studio? Do you use a
         | custom system prompt? What context size do you use?
         | Temperature? Thanks!
        
       | ddtaylor wrote:
       | My brain is running legacy COBOL and first read this as
       | 
       | > My 2.5 year old with their laptop can write Space Invaders
       | 
       | For a few hundred milliseconds there I was thinking "these damn
       | kids are getting good with tablets"
        
         | Imustaskforhelp wrote:
          | Don't worry, I guess my brain is running bleeding-edge
          | TypeScript with React (I am in high school, for context) and
          | at first I also read it this way...
          | 
          | But I am without my glasses, and I have Hacker News zoomed to
          | 250%, so I think I am a little cooked lol.
        
           | OldfieldFund wrote:
           | We are all cooked at this point :)
        
       | skeezyboy wrote:
       | But aren't we still decades away from running our own video-
       | creating AIs locally? Have we plateaued with this current
       | generation of techniques?
        
         | svachalek wrote:
         | It's more a question of, how long do you want it to take to
         | create a video locally?
        
           | skeezyboy wrote:
           | nah, i definitely want to know what i asked
        
             | sejje wrote:
             | His answer implies you can run them locally now, just not
             | in a useful timeframe.
        
       | polynomial wrote:
       | At first I read this as "My 2.5 year old can write Space Invaders
       | in JavaScript now"
        
       | maksimur wrote:
       | A $xxxx 2.5 year old laptop, one that's probably much more
       | powerful than an average laptop bought today and probably next
       | year as well. I don't think it's a fair reference point.
        
         | parsimo2010 wrote:
          | The article is pretty good overall, but the title did irk me
          | a little. I assumed from "2.5 year old" that it was fairly
          | low-spec, only to find out it was an M2 MacBook Pro with 64
          | GB of unified memory, so it can run models bigger than what
          | an Nvidia 5090 can handle.
         | 
         | I suppose that it could be intended to be read as "my laptop is
         | only 2.5 years old, and therefore fairly modern/powerful" but I
         | doubt that was the intention.
        
           | simonw wrote:
           | The reason I emphasize the laptop's age is that it is the
           | same laptop I have been using ever since the first LLaMA
           | release.
           | 
           | This makes it a great way to illustrate how much better the
           | models have got without requiring new hardware to unlock
           | those improved abilities.
        
         | bprew wrote:
         | His point isn't that you can run a model on an average laptop,
         | but that the same laptop can still run frontier models.
         | 
         | It speaks to the advancements in models that aren't just
         | throwing more compute/ram at it.
         | 
         | Also, his laptop isn't that fancy.
         | 
         | > It claims to be small enough to run on consumer hardware. I
         | just ran the 7B and 13B models on my 64GB M2 MacBook Pro!
         | 
         | From: https://simonwillison.net/2023/Mar/11/llama/
        
         | nh43215rgb wrote:
         | About $3700 laptop...
        
       | asadm wrote:
       | How good is this model with tool calling?
        
       | bob1029 wrote:
       | > still think it's noteworthy that a model running on my 2.5 year
       | old laptop (a 64GB MacBook Pro M2) is able to produce code like
       | this--especially code that worked first time with no further
       | edits needed.
       | 
       | I believe we are vastly underestimating what our existing
       | hardware is capable of in this space. I worry that narratives
       | like the bitter lesson and the efficient compute frontier are
       | pushing a lot of brilliant minds away from investigating
       | revolutionary approaches.
       | 
       | It is obvious that the current models are deeply inefficient when
       | you consider how much you can decimate the precision of the
       | weights post-training and still have pelicans on bicycles, etc.
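       | 
       | As a toy illustration of how much of that precision is padding,
       | here's a minimal post-training quantization sketch (my own
       | JavaScript, per-tensor symmetric int8; real quantizers like the
       | GGUF or MLX formats work per-block with smarter schemes, but the
       | storage math is the same: 32-bit floats down to 8 or 4 bits):
       | 
       |     // Quantize one weight tensor (flat array) to int8.
       |     function quantize(weights) {
       |       const absMax = Math.max(...weights.map(Math.abs));
       |       const scale = absMax / 127;
       |       const q = Int8Array.from(weights,
       |                                w => Math.round(w / scale));
       |       return { q, scale };
       |     }
       | 
       |     function dequantize({ q, scale }) {
       |       return Array.from(q, v => v * scale);
       |     }
       | 
       |     const w = [0.12, -0.53, 0.99, -0.04];
       |     // Round-trips close to the original at 1/4 the storage.
       |     console.log(dequantize(quantize(w)));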
        
         | jonas21 wrote:
         | Wasn't the bitter lesson about training on large amounts of
         | data? The model that he's using was still trained on a massive
         | corpus (22T tokens).
        
           | yahoozoo wrote:
           | What does that have to do with quantizing?
        
           | itsalotoffun wrote:
           | I think GP means that if you internalize the bitter lesson
           | (more data more compute wins), you stop imagining how to
           | squeeze SOTA minus 1 performance out of constrained compute
           | environments.
        
       | lxgr wrote:
       | This raises an interesting question I've seen occasionally
       | addressed in science fiction before:
       | 
       | Could today's consumer hardware run a future superintelligence
       | (or, as a weaker hypothesis, at least contain some lower-level
       | agent that can bootstrap something on other hardware via
       | networking or hyperpersuasion) if the binary dropped out of a
       | wormhole?
        
         | switchbak wrote:
         | This is what I find fascinating. What hidden capabilities
         | exist, and how far could it be exploited? Especially on exotic
         | or novel hardware.
         | 
         | I think much of our progress is limited by the capacity of the
         | human brain, and we mostly proceed via abstraction which allows
         | people to focus on narrow slices. That abstraction has a cost,
         | sometimes a high one, and it's interesting to think about what
         | the full potential could be without those limitations.
        
           | lxgr wrote:
           | Abstraction, or efficient modeling of a given system, is
           | probably a feature, not a bug, given the strong similarity
           | between intelligence and compression and all that.
           | 
           | A concise description of the _right_ abstractions for our
           | universe is probably not too far removed from the weights of
           | a superintelligence, modulo a few transformations :)
        
         | bob1029 wrote:
         | This is the premise of all of the ML research I've been into.
         | The only difference is to replace the wormhole with linear
          | genetic programming, neuroevolution, et al. The size of
         | programs in the demoscene is what originally sent me down this
         | path.
         | 
         | The biggest question I keep asking myself - What is the
         | Kolmogorov complexity of a binary image that provides the exact
         | same capabilities as the current generation LLMs? What are the
         | chances this could run on the machine under my desk right now?
         | 
         | I know how many AAA frames per second my machine is capable of
         | rendering. I refuse to believe the gap between running CS2 at
          | 400fps and getting ~100b/s of UTF8 text out of an NLP black
          | box is this big.
         | is this big.
        
           | bgirard wrote:
            | > ~100b/s of UTF8 text out of an NLP black box is this big
           | 
           | That's not a good measure. NP problem solutions are only a
           | single bit, but they are much harder to solve than CS2 frames
           | for large N. If it could solve any problem perfectly, I would
           | pay you billions for just 1b/s of UTF8 text.
        
             | bob1029 wrote:
             | > If it could solve any problem perfectly, I would pay you
             | billions for just 1b/s of UTF8 text.
             | 
             | Exactly. This is what compels me to try.
        
       | wslh wrote:
       | Here's a sci-fi twist: suppose Space Invaders and similar early
       | games were seeded by a future intelligence. (*_*)
        
       | another_one_112 wrote:
       | Crazy to think that you can have a mostly-competent oracle even
       | when disconnected from the grid.
        
       | msikora wrote:
       | With a 48GB MacBook Pro M3 I'm probably out of luck, right?
        
         | simonw wrote:
         | For this particular model, yes.
         | 
         | This new one from Qwen should fit though - it looks like that
         | only needs ~30GB of RAM: https://huggingface.co/lmstudio-
         | community/Qwen3-30B-A3B-Inst...
        
           | omneity wrote:
            | It takes ~17-20GB on Q4 depending on context length &
            | settings (running it as we speak).
            | 
            | ~30GB in Q8, sure, but it's a minimal gain for double the
            | VRAM usage.
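            | 
            | The back-of-the-envelope math for the weights alone is
            | simple (a rough sketch that ignores KV cache and runtime
            | overhead; in practice Q4 formats also spend a bit more
            | than 4 bits per weight):
            | 
            |     // weight memory in GB = params * bits-per-weight / 8
            |     const gb = (params, bitsPerWeight) =>
            |       (params * bitsPerWeight / 8) / 1e9;
            | 
            |     console.log(gb(30e9, 4).toFixed(1)); // ~15 GB at Q4
            |     console.log(gb(30e9, 8).toFixed(1)); // ~30 GB at Q8
            | 
            | The gap between that and the ~17-20GB I'm seeing is the
            | context, overhead, and the extra fraction of a bit per
            | weight.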
        
       | andai wrote:
       | I got almost the same result with a 4B model (Qwen3-4B), about
       | 25x smaller than OP's ~106B model.
       | 
       | https://jsbin.com/lejunenezu/edit?html,output
       | 
       | Its pelican was a total fail though.
        
         | andai wrote:
         | Update: It failed to make Flappy Bird though (several
         | attempts).
         | 
         | This surprises me, I thought it would be simpler than Space
         | Invaders.
        
       | simonw wrote:
       | There's a new model from Qwen today - Qwen3-30B-A3B-Instruct-2507
       | - that also runs comfortably on my Mac (using about 30GB of RAM
       | with an 8bit quantization).
       | 
       | I tried the "Write an HTML and JavaScript page implementing space
       | invaders" prompt against it and didn't quite get a working game
       | with a single shot, but it was still an interesting result:
       | https://simonwillison.net/2025/Jul/29/qwen3-30b-a3b-instruct...
        
       | xianshou wrote:
       | I initially read the title as "My 2.5 year old can write Space
       | Invaders in JavaScript now (GLM-4.5 Air)."
       | 
       | Though I suppose, given a few years, that may also be true!
        
       | dust42 wrote:
       | I tried with Claude Sonnet 4 and it does *not* work. So looks
       | like GLM-4.5 Air in 3bit quant is ahead.
       | 
       | Chat is here:
       | https://claude.ai/share/dc9eccbf-b34a-4e2b-af86-ec2dd83687ea
       | 
       | Claude Opus 4 does work but is far behind Simon's GLM-4.5:
       | https://claude.ai/share/5ddc0e94-3429-4c35-ad3f-2c9a2499fb5d
        
       ___________________________________________________________________
       (page generated 2025-07-29 23:00 UTC)