[HN Gopher] I failed to recreate the 1996 Space Jam Website with...
       ___________________________________________________________________
        
       I failed to recreate the 1996 Space Jam Website with Claude
        
       Author : thecr0w
       Score  : 212 points
       Date   : 2025-12-07 17:18 UTC (5 hours ago)
        
 (HTM) web link (j0nah.com)
 (TXT) w3m dump (j0nah.com)
        
       | stonecharioteer wrote:
       | I'm curious. Did you ask it to use tables and no CSS?
       | 
        | In 1996, we had only CSS1. Ask it to use tables to do this,
        | perhaps.
        
         | lagniappe wrote:
          | We actually had a myriad of competing specs. CSS proper wasn't
          | released until December 1996.
        
           | thecr0w wrote:
            | Any names for the competing specs? Maybe I could try
            | re-prompting with that direction.
        
             | lagniappe wrote:
             | Models are trained with content scraped from the net, for
             | the most part. The availability of content pertaining to
             | those specs is almost nil, and of no SEO value. Ergo,
             | models for the most part will only have a cursory knowledge
             | of a spec that your browser will never be able to parse
             | because that isn't the spec that won.
             | 
             | Nonetheless, here is a link to a list of the specs you
             | asked for:
             | https://www.w3.org/Style/History/Overview.en.html
        
               | boie0025 wrote:
               | Thanks for sharing that. I read through a lot of this.
               | Interesting to read those perspectives in the context of
               | today.
        
               | lagniappe wrote:
               | Much obliged. Have a good weekend. Your new gray hairs
               | are en route :)
        
             | wanderingstan wrote:
             | There were specs competing for adoption, but only tables
             | (the old way) and CSS were actually adopted by browsers. So
             | no point trying to use some other positioning technique.
        
         | thecr0w wrote:
         | Yes yes great question!
         | 
         | I tried your suggestion and also tried giving it various more
         | general versions of the limitations presented by earlier
         | generations.
         | 
          | Claude's initial instinct was actually to limit itself to
          | older web standards.
         | 
         | Unfortunately, nothing got those planets to be in the right
         | place.
        
       | Wowfunhappy wrote:
       | Claude is not very good at using screenshots. The model may
       | _technically_ be multi-modal, but its strength is clearly in
          | reading text. I'm not surprised it failed here.
        
         | fnordpiglet wrote:
         | Especially since it decomposes the image into a semantic vector
         | space rather than the actual grid of pixels. Once the image is
         | transformed into patch embeddings all sense of pixels is
          | entirely destroyed. The author demonstrates a profound lack of
          | understanding of how multimodal LLMs function, one that a
          | simple query would elucidate immediately.
         | 
          | The right way to handle this is not to build grids for it and
          | whatnot, which all get blown away by the embedding encoding,
          | but to instruct it to build image-processing tools of its own
          | and to mandate their use in constructing the required
          | coordinates and computing the eccentricity of the pattern in
          | code and language space. Done this way, you can even get it to
          | write assertive tests comparing the original layout to the
          | final one across various image-processing metrics. This would
          | assuredly work better, take far less time, be more stable under
          | iteration, and fit neatly into how a multimodal agentic
          | programming tool actually functions.
        
           | mcbuilder wrote:
           | Yeah, this is exactly what I was thinking. LLMs don't have
           | precise geometrical reasoning from images. Having an
            | intuition of how the models work is actually a defining skill
            | in "prompt engineering".
        
         | dcanelhas wrote:
         | Even with text, parsing content in 2D seems to be a challenge
         | for every LLM I have interacted with. Try getting a chatbot to
         | make an ascii-art circle with a specific radius and you'll see
         | what I mean.
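The ASCII-circle task above has a simple deterministic solution, which makes the models' failures easy to check against. A Python sketch (the 0.5 band threshold is an arbitrary but reasonable choice):

```python
# Deterministic ASCII circle: mark every cell whose distance from the
# center falls within half a cell of the requested radius.
def ascii_circle(radius: int) -> str:
    size = 2 * radius + 1
    lines = []
    for y in range(size):
        row = ""
        for x in range(size):
            d = ((x - radius) ** 2 + (y - radius) ** 2) ** 0.5
            row += "#" if abs(d - radius) < 0.5 else " "
        lines.append(row)
    return "\n".join(lines)

print(ascii_circle(3))
```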
        
           | Wowfunhappy wrote:
            | I don't really consider ASCII art to _be_ text. It requires a
            | completely different type of reasoning. A blind person can
            | understand text if it's read out loud. A blind person really
            | can't understand ASCII art if it's read out loud.
        
       | throwaway314155 wrote:
       | Somehow I suspect Claude Code (in an interactive session with
       | trial, error, probing, critiquing, perusing, and all the other
       | benefits you get) would do better. This example seems to assume
       | Claude can do things in "one shot" (even the later attempts all
       | seem to conceal information like it's a homework assignment).
       | 
        | That's not how to successfully use LLMs for coding in my
       | experience. It is however perhaps a good demonstration of
       | Claude's poor spatial reasoning skills. Another good
       | demonstration of this is the twitch.tv/ClaudePlaysPokemon where
       | Claude has been failing to beat pokemon for months now.
        
         | thecr0w wrote:
         | Not a homework assignment, and no deliberate attempt to conceal
         | information, just very long and repetitive logs. A lot of the
         | same "insights" so I just didn't provide them here.
         | 
          | > That's not how to successfully use LLMs for coding in my
         | experience.
         | 
         | Yeah agree. I think I was just a little surprised it couldn't
         | one-shot given the simplicity.
        
         | CharlesW wrote:
         | Using https://github.com/anthropics/claude-
         | code/tree/main/plugins/... with style-supporting instructions
         | and context would've improved the outcome as well.
        
           | thecr0w wrote:
           | thank you! I'll try this
        
           | aidos wrote:
           | Is the skill effectively just adding a little extra context
           | here though? Doesn't strike me as the sort of context that
           | would improve the outcome.
           | 
           | https://github.com/anthropics/claude-
           | code/blob/main/plugins/...
        
       | 999900000999 wrote:
       | Space Jam website design as an LLM benchmark.
       | 
        | This article is a bit negative. Claude gets close; it just can't
        | get the order right, which is something OP can manually fix.
       | 
       | I prefer GitHub Copilot because it's cheaper and integrates with
       | GitHub directly. I'll have times where it'll get it right, and
       | times when I have to try 3 or 4 times.
        
         | thecr0w wrote:
          | Ya, this is true. Another commenter also pointed out that my
          | intention was to one-shot. I didn't really go very deep into
          | trying multiple iterations.
         | 
         | This is also fairly contrived, you know? It's not a realistic
         | limitation to rebuild HTML from a screenshot because of course
         | if I have the website loaded I can just download the HTML.
        
           | Retric wrote:
           | It's not unrealistic to want to revert to an early version of
           | something you only have a screenshot of.
        
           | swatcoder wrote:
           | > rebuild HTML from a screenshot
           | 
           | ???
           | 
           | This is precisely the workflow when a traditional graphic
           | designer mocks up a web/app design, which still happens _all
           | the time_.
           | 
           | They sketch a design in something like Photoshop or
           | Illustrator, because they're fluent in these tools and many
           | have been using them for decades, and somebody else is tasked
           | with figuring out how to slice and encode that design in the
           | target interactive tech (HTML+CSS, SwiftUI, QT, etc).
           | 
           | Large companies, design agencies, and consultancies with
           | tech-first design teams have a different workflow, because
           | they intentionally staff graphic designers with a tighter
           | specialization/preparedness, but that's a much smaller share
           | of the web and software development space than you may think.
           | 
           | There's nothing contrived at all about this test and it's a
           | really great demonstration of how tools like Claude don't
           | take naturally to this important task yet.
        
             | thecr0w wrote:
             | You know, you're totally right and I didn't even think
             | about that.
        
         | smallnix wrote:
          | That's not the point of the article. It's about Claude/LLMs
          | being overconfident about recreating it pixel-perfect.
        
           | jacquesm wrote:
            | All AIs are overconfident. It's impressive what they can do,
           | but it is at the same time extremely unimpressive what they
           | can't do while passing it off as the best thing since sliced
            | bread. 'Perfect! Now I see the problem.' 'Thank you for
            | correcting that, here is a perfect recreation of problem 'x'
            | that will work with your hardware.' (Never mind the 10
            | glaring mistakes.)
           | 
           | I've tried these tools a number of times and spent a good bit
           | of effort on learning to maximize the return. By the time you
           | know what prompt to write you've solved the problem yourself.
        
         | GeoAtreides wrote:
         | >which is something OP can manually fix
         | 
          | what if the LLM gets something wrong that the operator (a
          | junior dev, perhaps) doesn't even know is wrong? That's the
         | main issue: if it fails here, it will fail with other things,
         | in not such obvious ways.
        
           | godelski wrote:
           | I think that's the main problem with them. It is hard to
           | figure out when they're wrong.
           | 
           | As the post shows, you can't trust them when they think they
           | solved something but you also can't trust them when they
           | think they haven't[0]. The things are optimized for human
           | preference, which ultimately results in this being optimized
           | to hide mistakes. After all, we can't penalize mistakes in
            | training when we don't know the mistakes are mistakes. The
            | de facto bias is that we prefer mistakes we don't know are
            | mistakes over mistakes that we do[1].
           | 
           | Personally I think a well designed tool makes errors obvious.
           | As a tool user that's what I want and makes tool use
            | effective. But LLMs flip this on its head, making errors
           | difficult to detect. Which is incredibly problematic.
           | 
            | [0] I frequently see this when it flags something as a
            | problem that actually isn't, which makes steering more
            | difficult.
           | 
           | [1] Yes, conceptually unknown unknowns are worse. But you
            | can't measure unknown unknowns; they are indistinguishable
           | from knowns. So you always optimize deception (along with
           | other things) when you don't have clear objective truths
           | (most situations).
        
         | bigstrat2003 wrote:
         | > it just can't get the order right which is something OP can
         | manually fix.
         | 
         | If the tool needs you to check up on it and fix its work, it's
         | a bad tool.
        
           | markbao wrote:
           | "Bad" seems extreme. The only way to pass the litmus test
           | you've described is for a tool to be 100% perfect, so then
           | the graph looks like 99.99% "bad tool" until it reaches 100%
           | perfection.
           | 
           | It's not that binary imo. It can still be extremely useful
           | and save a ton of time if it does 90% of the work and you fix
           | the last 10%. Hardly a bad tool.
           | 
            | It's only a bad tool if you spend more time fixing the
            | results than building it yourself, which sometimes used to be
           | the case for LLMs but is happening less and less as they get
           | more capable.
        
             | a4isms wrote:
             | If you show me a tool that does a thing perfectly 99% of
             | the time, I will stop checking it eventually. Now let me
             | ask you: How do you feel about the people who manage the
             | security for your bank using that tool? And eventually
             | overlooking a security exploit?
             | 
             | I agree that there are domains for which 90% good is very,
             | very useful. But 99% isn't always better. In some limited
             | domains, it's actually worse.
        
               | 999900000999 wrote:
               | Counterpoint.
               | 
                | Humans don't get it right 100% of the time.
        
           | wvenable wrote:
           | Perfection is the enemy of good.
        
           | mrweasel wrote:
           | That's my thinking. If I need to check up on the work, then
           | I'm equally capable of writing the code myself. It might go
           | faster with an LLM assisting me, and that feels perfectly
           | fine. My issue is when people use the AI tools to generate
           | something far beyond their own capabilities. In those cases,
           | who checks the result?
        
       | supern0va wrote:
        | Honestly, if you had shown this article to me even eighteen
       | months ago, I would have been blown away at how good of a job
       | Claude did.
       | 
        | It's remarkable how steadily our expectations have been
        | creeping up.
        
         | WhyOhWhyQ wrote:
         | This comment is missing the point. The real goal of all this is
         | not to amaze. It's to create better software. Let's graduate
         | past the amazement phase into the realism phase as soon as
         | possible. What parts of my project is the LLM for? That is the
         | real question worth asking.
        
           | supern0va wrote:
           | Oh, to be clear, this isn't a criticism. I think it's super
           | cool that we're moving onto the nitpick/refinement phase of
           | this tech. :)
        
       | dreadnip wrote:
       | Why involve an LLM in this? Just download the site?
        
         | ChrisArchitect wrote:
         | Yeah, Internet Archive has lots of copies
         | https://web.archive.org/web/20250000000000*/https://www.spac...
         | also
         | 
          | What's with the panicked pleas and the need to preserve the
          | site, presumably locally...?
        
           | GeoAtreides wrote:
           | The post is clearly about something else than preserving
           | https://www.spacejam.com/1996/
           | 
           | It seems to me the post is about how Claude fails to recreate
           | a very simple website from 1996.
        
       | sigseg1v wrote:
       | Curious if you've tested something such as:
       | 
       | - "First, calculate the orbital radius. To do this accurately,
       | measure the average diameter of each planet, p, and the average
       | distance from the center of the image to the outer edge of the
       | planets, x, and calculate the orbital radius r = x - p"
       | 
       | - "Next, write a unit test script that we will run that reads the
       | rendered page and confirms that each planet is on the orbital
       | radius. If a planet is not, output the difference you must shift
       | it by to make the test pass. Use this feedback until all planets
       | are perfectly aligned."
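The measurement half of that prompt can be made concrete. A minimal Python sketch of the unit-test idea, assuming planet bounding boxes have already been extracted by some image tool (the center and box coordinates below are invented for illustration):

```python
import math

# Hypothetical layout: the clock-face center and one invented bounding
# box (x, y, w, h) per planet, e.g. as reported by an image tool.
center = (400, 300)
planets = [(520, 280, 40, 40), (380, 140, 40, 40), (240, 280, 40, 40)]

def orbital_radius(box):
    """Distance from the layout center to the planet's center."""
    x, y, w, h = box
    return math.hypot(x + w / 2 - center[0], y + h / 2 - center[1])

radii = [orbital_radius(b) for b in planets]
mean_r = sum(radii) / len(radii)

# Unit-test-style feedback: flag any planet more than a few pixels off
# the shared orbit, and report how far to shift it radially.
for box, r in zip(planets, radii):
    if abs(r - mean_r) > 3:
        print(f"planet at {box}: shift radially by {mean_r - r:+.1f}px")
```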
        
         | turnsout wrote:
         | Yes, this is a key step when working with an agent--if they're
         | able to check their work, they can iterate pretty quickly. If
         | you're in the loop, something is wrong.
         | 
         | That said, I love this project. haha
        
           | monsieurbanana wrote:
           | I'm trying to understand why this comment got downvoted. My
           | best guess is that "if you're in the loop, something is
           | wrong" is interpreted as there should be no human involvement
           | at all.
           | 
           | The loop here, imo, refers to the feedback loop. And it's
           | true that ideally there should be no human involvement there.
           | A tight feedback loop is as important for llms as it is for
           | humans. The more automated you make it, the better.
        
             | turnsout wrote:
             | Yes, maybe I goofed on the phrasing. If you're in the
             | _feedback loop,_ something is wrong. Obviously a human
             | should be  "in the loop" in the sense that they're aware of
             | and reviewing what the agent is doing.
        
         | Aurornis wrote:
         | This is my experience with using LLMs for complex tasks: If
         | you're lucky they'll figure it out from a simple description,
         | but to get most things done the way you expect requires a lot
         | of explicit direction, test creation, iteration, and tokens.
         | 
          | One of the keys to being productive with LLMs is learning how
          | to recognize when babysitting the LLM into the right result
          | will take much more effort than simply doing the work yourself.
        
           | jacquesm wrote:
           | It would not be the first time that an IT services provider
           | makes more money the worse their products perform.
        
         | thecr0w wrote:
         | Hm, I didn't try exactly this, but I probably should!
         | 
         | Wrt unit test script, let's take Claude out of the equation,
         | how would you design the unit test? I kept running into either
         | Claude or some library not being capable of consistently
         | identifying planet vs non planet which was hindering Claude's
         | ability to make decisions based on fine detail or "pixel
         | coordinates" if that makes sense.
        
           | cfbradford wrote:
           | Do you give Claude the screenshot as a file? If so I'd just
           | ask it to write a tool to diff each asset to every possible
           | location in the source image to find the most likely position
           | of each asset. You don't really need recognition if you can
           | brute force the search. As a human this is roughly what I
           | would do if you told me I needed to recreate something like
            | that with pixel-perfect precision.
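The brute-force search described above is easy to sketch. A minimal NumPy version using sum-of-squared-differences over grayscale arrays (real assets would also need color channels and transparency handled):

```python
import numpy as np

def locate(screenshot: np.ndarray, asset: np.ndarray) -> tuple[int, int]:
    """Slide `asset` over every position in `screenshot` and return the
    (row, col) offset with the smallest sum-of-squared-differences."""
    sh, sw = screenshot.shape
    ah, aw = asset.shape
    best, best_pos = float("inf"), (0, 0)
    for r in range(sh - ah + 1):
        for c in range(sw - aw + 1):
            window = screenshot[r:r + ah, c:c + aw]
            score = float(np.sum((window - asset) ** 2))
            if score < best:
                best, best_pos = score, (r, c)
    return best_pos

# Synthetic check: a 3x3 "planet" pasted at row 5, col 7 of a blank image.
planet = np.arange(1.0, 10.0).reshape(3, 3)
img = np.zeros((20, 20))
img[5:8, 7:10] = planet
print(locate(img, planet))  # (5, 7)
```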
        
         | bluedino wrote:
         | Congratulations, we finally created 'plain English' programming
          | languages. It only took 1/10th of the world's electricity and
         | 40% of the semiconductor production.
        
       | zitterbewegung wrote:
       | In actual workflows someone would accept a very close
        | reproduction and fix the small issues. Generally I use these
        | systems to get close enough to a scaffolding and/or to make
        | small incremental improvements and direct the design.
        
       | micromacrofoot wrote:
       | I wouldn't call it entirely defeated, it got maybe 90% of the way
       | there. Before LLMs you couldn't get 50% of the way there in an
       | automated way.
       | 
       | > What he produces
       | 
        | I feel like personifying LLMs more than they currently are is a
        | mistake people make (though humans always do this); they're not
        | entities, and they don't know anything. If you treat them as too
        | human, you might eventually fool yourself a little too much.
        
         | thecr0w wrote:
          | As a couple other comments pointed out, it's also not fair to
          | judge Claude based on a one-shot like this. I sort of assume
         | these limitations will remain even if we went back and forth
         | but to be fair, I didn't try that more than a few times in this
         | investigation. Maybe on try three it totally nails it.
        
       | bdcravens wrote:
        | A comparison with Codex would be good. I haven't done it with
       | Codex, but when working through problems using ChatGPT, it does a
       | great job when given screenshots.
        
       | daemonologist wrote:
       | Interesting - these models are all trained to do pixel-level(ish)
       | measurement now, for bounding boxes and such. I wonder if you
       | could railroad it into being accurate with the right prompt.
        
         | Lerc wrote:
         | What models are good at this? I have tried passing images to
         | models and asking them for coordinates for specific features,
         | then overlaid dots on those points and passed that image back
         | to the model so it has a perception of how far out it was. It
         | had a tendency to be consistently off by a fixed amount without
         | getting closer.
         | 
         | I don't doubt that it is possible eventually, but I haven't had
         | much luck.
         | 
         | Something that seemed to assist was drawing a multi coloured
         | transparent chequerboard, if the AI knows the position of the
         | grid colours it can pick out some relative information from the
         | grid.
        
           | daemonologist wrote:
           | I've found Qwen3-VL to be fairly accurate at detection
           | (though it doesn't always catch every instance). Note that it
           | gives answers as per-mille-ages, as if the image was
           | 1000x1000 regardless of actual resolution or aspect ratio.
           | 
           | I have also not had luck with any kind of iterative/guess-
           | and-check approach. I assume the models are all trained to
           | one-shot this kind of thing and struggle to generalize to
           | what are effectively relative measurements.
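Assuming the per-mille convention described above, mapping a model-reported box back to pixels is trivial; a sketch (the [x1, y1, x2, y2] box format is an assumption):

```python
def permille_to_pixels(box, width, height):
    """Map an [x1, y1, x2, y2] box on the virtual 1000x1000 canvas
    described above onto real pixel coordinates."""
    x1, y1, x2, y2 = box
    return (round(x1 * width / 1000), round(y1 * height / 1000),
            round(x2 * width / 1000), round(y2 * height / 1000))

# On a 1920x1080 screenshot, model-space (500, 500) is the image center:
print(permille_to_pixels([250, 250, 750, 750], 1920, 1080))  # (480, 270, 1440, 810)
```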
        
         | sdenton4 wrote:
         | Feels like the "right" approach would be to have it write some
         | code to measure how far off the elements are in the original vs
         | recreated image, and then iterate using the numerical output of
         | the program...
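One minimal way to produce that numerical output, sketched in Python with synthetic arrays standing in for the original and recreated renders: compare the centroids of the non-background pixels.

```python
import numpy as np

# Synthetic stand-ins for the two renders: a 2x2 "element" that sits at
# columns 2-3 in the original but drifted to columns 5-6 in the recreation.
original = np.zeros((10, 10)); original[2:4, 2:4] = 1
rendered = np.zeros((10, 10)); rendered[2:4, 5:7] = 1

def centroid(img: np.ndarray) -> tuple[float, float]:
    """Mean (row, col) of all non-background (non-zero) pixels."""
    rows, cols = np.nonzero(img)
    return float(rows.mean()), float(cols.mean())

# The numerical feedback the model can iterate against:
dr = centroid(rendered)[0] - centroid(original)[0]
dc = centroid(rendered)[1] - centroid(original)[1]
print(f"rendered element is off by ({dr:.0f}, {dc:.0f}) pixels")  # (0, 3)
```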
        
       | johncoatesdev wrote:
       | You last-minute cancelled coffee with your friends to work on
       | this? I'm not sure how I would feel if a friend did that to me.
        
         | fishtoaster wrote:
         | Based on the later life updates, I suspect this was being
         | humorous.
         | 
         | > After these zoom attempts, I didn't have any new moves left.
         | I was being evicted. The bank repo'd my car. So I wrapped it
         | there.
        
       | syassami wrote:
       | We've lost the capability to build such marvels.
       | 
       | https://knowyourmeme.com/memes/my-father-in-law-is-a-builder...
        
         | barfoure wrote:
         | Lost it at wooden structures at the playground.
        
       | a-dub wrote:
       | maybe ask it to use 1990s table based layout approaches?
        
         | al_borland wrote:
         | Interesting. I just looked at the page source and it is in fact
         | using a table layout. I always assumed it was an image map,
         | which I assume would be even more obscure for the LLM.
        
           | thecr0w wrote:
           | We should check the Wayback Machine, but in my memory this
           | was built with an image map. Maybe like, 10 years ago or
           | something. I was googling around when writing this post and
           | saw that there are folks still tasked with making sure it's
           | up and running. I wonder if they migrated it to tables at
           | some point in the last decade.
        
       | smoghat wrote:
       | Ok, so here is an interesting case where Claude was almost good
       | enough, but not quite. But I've been amusing myself by taking
       | abandoned Mac OS programs from 20 years ago that I find on GitHub
       | and bringing them up to date to work on Apple silicon. For
       | example, jpegview, which was a very fast and simple slideshow
       | viewer. It took about three iterations with Claude code before I
       | had it working. Then it was time to fix some problems, add some
       | features like playing videos, a new layout, and so on. I may be
       | the only person in the world left who wants this app, but well,
       | that was fine for a day long project that cooked in a window with
       | some prompts from me while I did other stuff. I'll probably
       | tackle scantailor advanced next to clean up some terrible book
       | scans. Again, I have real things to do with my time, but each of
       | these mini projects just requires me to have a browser window
       | open to a Claude code instance while I work on more attention
       | demanding tasks.
        
         | egeozcan wrote:
          | Side note: As a person who started using a Mac in March, I
          | found Phoenix Slides really good.
        
           | smoghat wrote:
           | It is! I was really just curious if I could update this old
           | codebase without getting my hands dirty.
        
         | skrebbel wrote:
         | > Ok, so here is an interesting case where Claude was almost
         | good enough, but not quite.
         | 
         | You say that as if that's uncommon.
        
           | jonplackett wrote:
            | This should be the strapline for all AI (so far).
        
             | smoghat wrote:
              | That's fair. But I always think of it as an intern I'm
              | paying $20 or $200 a month for. I would be kind of shocked
              | if they could do everything as well as I'd hoped at that
              | price point. It's fascinating for me and worth the
             | money.
             | 
             | I am lucky that I don't depend on this for work at a
             | corporation. I'd be pulling my hair out if some boss said
             | "You are going to be doing 8 times as much work using our
             | corporate AI from now on."
        
               | jonplackett wrote:
                | Don't get me wrong, doing 80% of my work for me is still
               | great. And I'm actually quite glad I'm still needed for
               | the other 20%
        
       | pluc wrote:
       | I like how the author calls a script on the internet "him".
        
         | thecr0w wrote:
         | lol
        
         | NooneAtAll3 wrote:
         | better than using a plural for a single entity
        
           | shwaj wrote:
           | "it"
        
       | docheinestages wrote:
       | > Note: please help, because I'd like to preserve this website
       | forever and there's no other way to do it besides getting Claude
       | to recreate it from a screenshot.
       | 
       | Why not use wget to mirror the website? Unless you're being
       | sarcastic.
       | 
        $ wget --mirror --convert-links --adjust-extension \
        --page-requisites --no-parent http://example.org
       | 
       | Source: https://superuser.com/questions/970323/using-wget-to-
       | copy-we...
        
         | thecr0w wrote:
         | The stuff about not being able to download it is a bit of a
         | joke and I don't think the tone landed with everybody haha.
         | This was just an experiment to see if Claude could recreate a
         | simple website from a screenshot, of course to your point you
         | could download it if you wanted.
        
         | malfist wrote:
         | Because that wasn't the goal of this exercise
        
       | hestefisk wrote:
       | Would be interesting to see whether Gemini could crack this
       | problem.
        
       | th0ma5 wrote:
        | I personally don't understand why asking these things to do
        | things we know they can't do is supposed to be productive. Maybe
        | for getting around restrictions or fuzzing... I don't see it as
        | an effective benchmark unless it links directly to the ways the
        | models are being improved. To look at random results that are
        | sometimes valid and think more iterations of randomness will
        | eventually give way to _control_ is a maddening perspective to
        | me, but perhaps I need better language to describe this.
        
         | thecr0w wrote:
         | I think this is a reasonable take. I think for me, I like to
         | investigate limitations like this in order to understand where
         | the boundaries are. Claude isn't impossibly bad at analyzing
         | images. It's just pixel perfect corrections that seem to be a
         | limitation. Maybe for some folks it's enough to just read that
         | but for me, I like to feel like I have some good experiential
         | knowledge about the limitations that I can keep in my brain and
         | apply appropriately in the future.
        
       | iwontberude wrote:
        | Apropos, given Warner Bros. Discovery just sold to Netflix.
        
       | soared wrote:
       | I got quite close with Gemini 3 pro in AI studio. I uploaded a
       | screenshot (no assets) and the results were similar to OP. It
       | failed to follow my fix initially but I told it to follow my
        | directions (lol) and it came quite close (though portrait mode
        | distorted it; landscape was close to perfect).
       | 
       | "Reference the original uploaded image. Between each image in the
       | clock face, create lines to each other image. Measure each line.
       | Now follow that same process on the app we've created, and adjust
       | the locations of each image until all measurements align
       | exactly."
       | 
       | https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...
        
       | jacobsenscott wrote:
        | > there's no other way to do it besides getting Claude to recreate
       | it from a screenshot
       | 
       | And
       | 
       | > I'm an engineering manager
       | 
       | I can't tell if this is an intentional or unintentional satire of
       | the current state of AI mandates from management.
        
         | thecr0w wrote:
         | lololol
        
         | dmd wrote:
         | i can't tell if your comment is satire or not
        
         | chilmers wrote:
         | You _really_ can't tell? Perhaps the bar for AGI is lower than
         | I thought.
        
         | master_crab wrote:
         | Honest question: does he know about F5? Or was it intentional
         | to use screenshots when source is available?
        
       | thuttinger wrote:
       | Claude/LLMs in general are still pretty bad at the intricate
       | details of layouts and visual things. There are a lot of problems
       | that are easy to get right for a junior web dev but impossible
       | for an LLM. On the other hand, I was able to write a C program
       | that added gamma color profile support to linux compositors that
       | don't support it (in my case Hyprland) within a few minutes! A -
       | for me - seemingly hard task, which would have taken me at least
       | a day or more if I didn't let Claude write the code. With one
       | prompt Claude generated C code that compiled on first try that:
       | 
       | - Read an .icc file from disk
       | 
       | - parsed the file and extracted the VCGT (video card gamma table)
       | 
       | - wrote the VCGT to the video card for a specified display via
       | amdgpu driver APIs
       | 
       | The only thing I had to fix was the ICC parsing, where it would
       | parse header strings in the wrong byte-order (they are big-
       | endian).
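       | 
       | The byte-order pitfall is easy to reproduce: every integer in
       | an ICC header is big-endian, so a parser has to unpack with
       | explicit endianness. A minimal Python sketch of the idea (not
       | the C program described above; it checks only two header
       | fields):

```python
import struct

def parse_icc_header(data: bytes) -> dict:
    """Read two ICC profile header fields; all ICC integers are big-endian."""
    # Unpacking with "<I" (little-endian) here is exactly the kind of
    # wrong-byte-order bug described above.
    size, = struct.unpack_from(">I", data, 0)  # profile size, bytes 0-3
    magic = data[36:40]                        # signature, must be b"acsp"
    if magic != b"acsp":
        raise ValueError("not an ICC profile")
    return {"size": size, "magic": magic}

# Build a fake 128-byte ICC header and parse it back.
header = struct.pack(">I", 1024) + bytes(32) + b"acsp" + bytes(88)
print(parse_icc_header(header))  # → {'size': 1024, 'magic': b'acsp'}
```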
        
         | littlecranky67 wrote:
         | > Claude/LLMs in general are still pretty bad at the intricate
         | details of layouts and visual things
         | 
         | Because the rendered output (pixels, not HTML/CSS) is not fed
         | into the training data. You will find tons of UI snippets and
         | questions, but they rarely include screenshots. And if they
         | do, they are not scraped.
        
           | Wowfunhappy wrote:
           | Interesting thought. I wonder if Anthropic et al could
           | include some sort of render-html-to-screenshot as part of the
           | training routine, such that the rendered output would get
           | included as training data.
        
             | KaiserPro wrote:
             | That's basically a VLM, but the problem is that describing
             | the world requires a better understanding of the world.
             | Hence why LeCun is talking about world models. (It's also
             | cutting edge for teaching robots to manipulate and plan
             | manipulations.)
        
         | jacquesm wrote:
         | Claude didn't write that code. Someone else did and Claude took
         | that code without credit to the original author, adapted it to
         | your use case and then presented it as its own creation to you
         | and you accepted this. If a human did this we probably would
         | have a word for them.
        
           | Mtinie wrote:
           | > If a human did this we probably would have a word for them.
           | 
           | I don't think it's fair to call someone who used Stack
           | Overflow to find a similar answer with samples of code to
           | copy to their project an asshole.
        
             | jacquesm wrote:
             | Who brought Stack Overflow up? Stack Overflow does not
             | magically generate code, someone has to actually provide it
             | first.
        
               | Mtinie wrote:
               | I generally agree with your underlying point concerning
               | attribution and intellectual property ownership but your
               | follow-up comment reframes your initial statement: LLMs
               | generate recombinations of code from code created by
               | humans, without giving credit.
               | 
               | Stack Overflow offers access to other peoples' work, and
               | developers combined those snippets and patterns into
               | their own projects. I suspect attribution is low.
        
               | jacquesm wrote:
               | Stack Overflow deals with that issue by having a license
               | agreement.
        
             | sublinear wrote:
             | Using stack overflow recklessly is definitely asshole
             | behavior.
        
             | bluedino wrote:
             | It has been for the last 15 years.
        
           | giancarlostoro wrote:
           | You mean like copying and pasting code from Stack Overflow?
        
           | idiotsecant wrote:
           | Yes, the word for that is software developer.
        
           | FanaHOVA wrote:
           | Are you saying that every piece of code you have ever written
           | contains a full source list of every piece of code you
           | previously read to learn specific languages, patterns, etc?
           | 
           | Or are you saying that every piece of code you ever wrote was
           | 100% original and not adapted from any previous codebase you
           | ever worked in or any book / reference you ever read?
        
             | jacquesm wrote:
             | What's with the bad takes in this thread. That's two
             | strawmen in one comment, it's getting a bit crowded.
        
               | DangitBobby wrote:
               | Or the original point doesn't actually hold up to basic
               | scrutiny and is indistinguishable from straw itself.
        
               | jacquesm wrote:
               | HN has guidelines for a reason.
        
           | bsaul wrote:
           | That's an interesting hypothesis : that LLM are fundamentally
           | unable to produce original code.
           | 
           | Do you have papers to back this up ? That was also my
           | reaction when i saw some really crazy accurate comments on
           | some vibe coded piece of code, but i couldn't prove it, and
           | thinking about it now i think my intuition was wrong (ie :
           | LLMs do produce original complex code).
        
             | jacquesm wrote:
             | We can solve that question in an intuitive way: if human
             | input is not what is driving the output then it would be
             | sufficient to present it with a fraction of the current
             | inputs, say everything up to 1970 and have it generate all
             | of the input data from 1970 onwards as output.
             | 
             | If that does not work then the moment you introduce AI you
             | cap their capabilities unless humans continue to create
             | original works to feed the AI. The conclusion - to me, at
             | least - is that these pieces of software regurgitate their
             | inputs, they are effectively whitewashing plagiarism, or,
             | alternatively, their ability to generate new content is
             | capped by some arbitrary limit relative to the inputs.
        
               | andrepd wrote:
               | Excellent observation.
        
               | bfffbgfdcb wrote:
               | I guess _you_ can't create original work either. Given
               | you can't reproduce the sum total of human output since
               | 1970 by next Tuesday.
               | 
               | There's _something_ in what you're saying, but until you
               | refine it to something actually _true_ , it's just more
               | slop.
        
               | jacquesm wrote:
               | I think my track record belies your very low value and
               | frankly cowardly comment. If you have something to say at
               | least do it under your real username instead of a
               | throwaway.
        
               | andsoitis wrote:
               | I like your test. Should we also apply to specific
               | humans?
               | 
               | We all stand on the shoulders of giants and learn by
               | looking at others' solutions.
        
               | jacquesm wrote:
               | That's true. But if we take your implied rebuttal then
               | current level AI would be able to learn from current AI
               | as well as it would learn from humans. But so far that
               | does not seem to be the case, in fact, AI companies do
               | everything they can to avoid eating their own tail.
               | They'd love eating their own tail if it was worth it.
        
             | fpoling wrote:
             | Pick up a book about programming from seventies or eighties
             | that was unlikely to be scanned and feed into LLM. Take a
             | task from it and ask LLM to write a program from it that
             | even a student can solve within 10 minutes. If the problem
             | was not really published before, LLM fails spectacularly.
        
               | crawshaw wrote:
               | This does not appear to be true. Six months ago I created
               | a small programming language. I had LLMs write hundreds
               | of small programs in the language, using the parser,
               | interpreter, and my spec as a guide for the language. The
               | vast majority of these programs were either very close or
               | exactly what I wanted. No prior source existed for the
               | programming language because I created it whole cloth
               | days earlier.
        
               | anjel wrote:
               | Sometimes it's generated, and many times it's not. Trivial
               | to denote, but it's been deemed none of your business.
        
           | mlinsey wrote:
           | Certainly if a human wrote code that solved this problem, and
           | a second human copied and tweaked it slightly for their use
           | case, we would have a word for them.
           | 
           | Would we use the same word if two different humans wrote code
           | that solved two different problems, but one part of each
           | problem was somewhat analogous to a different aspect of a
           | third human's problem, and the third human took inspiration
           | from those parts of both solutions to create code that solved
           | a third problem?
           | 
           | What if it were ten different humans writing ten different-
           | but-related pieces of code, and an eleventh human piecing
           | them together? What if it were 1,000 different humans?
           | 
           | I think "plagiarism", "inspiration", and just "learning from"
           | fall on some continuous spectrum. There are clear differences
           | when you zoom out, but they are in degree, and it's hard to
           | set a hard boundary. The key is just to make sure we have
           | laws and norms that provide sufficient incentive for new
           | ideas to continue to be created.
        
             | whatshisface wrote:
             | The key difference between plagiarism and building on
             | someone's work is whether you say, "this is based on code
             | by linsey at github.com/socialnorms" or "here, let me
             | write that for you."
        
           | ekropotin wrote:
           | > If a human did this we probably would have a word for them.
           | 
           | What do you mean? The programmer's work is literally
           | combining existing patterns into solutions for problems.
        
           | Aeolun wrote:
           | Software engineer? You think I cite all the code I've ever
           | seen before when I reproduce it? That I even remember where
           | it comes from?
        
           | fooker wrote:
           | > If a human did this we probably would have a word for them.
           | 
           | Humans do this all the time.
        
         | chongli wrote:
         | Why is this something a Wayland compositor (a glorified window
         | manager) needs to worry about? Apple figured this out back in
         | the 1990s with ColorSync and they did it once for the Mac OS
         | and any application that wanted colour management could use the
         | ColorSync APIs.
        
           | hedgehog wrote:
           | Color management infrastructure is intricate. To grossly
           | simplify: somehow you need to connect together the profile
           | and LUT for each display, upload the LUTs to the display
           | controller, and provide appropriate profile data for each
           | window to their respective processes. During compositing then
           | convert buffers that don't already match the output
           | (unmanaged applications will probably be treated as sRGB,
           | color managed graphics apps will opt out of conversion and do
           | whatever is correct for their purpose).
        
       | pfix wrote:
       | I checked the source of the original (like maybe many of you) to
       | check how they actually did it and it was... simpler than
       | expected. I drilled myself so hard to forget tables as layout...
       | And here it is. So simple it's a marvel.
        
         | COAGULOPATH wrote:
         | And they do hacky things like space elements vertically using
         | <br> tags.
        
       | fortyseven wrote:
       | Look at that stupid dog. It's reading a book, but it's really
       | trashy YA. It's not even Shakespeare. Dogs are stupid.
        
         | thecr0w wrote:
         | lol, that is fair criticism
        
         | satisfice wrote:
         | But the dog's owner keeps saying that it ONLY reads
         | Shakespeare. The dog's owner wants millions of dollars for the
         | dog on that basis.
         | 
         | I'm not mad at the dogs, I'm mad at the stupid investors and
         | the lying owners.
        
       | vmg12 wrote:
       | We don't know how to build it anymore
        
       | stwsk wrote:
       | >Look, I still need this Space Jam website recreated.
       | 
       | Now that's a novel sentence if I've ever read one.
        
       | nickdothutton wrote:
       | I have recently been working on something "fun" in the terminal
       | that mingles plain ASCII, ANSI "graphics", actual bitmaps
       | (Sixel), and Nerdfonts in a TUI framework (Charm etc). After a
       | week of smashing Claude's head against a wall, which is better
       | than smashing my own, I've had to significantly alter my hopes
       | and expectations.
        
       | isoprophlex wrote:
       | I have a very weird tangential nit to pick: gendering LLMs. I
       | swear I'm not pushing any sort of gender agenda/discussion that
       | can be had anytime anywhere else in the current age, but to me
       | there is something quintessentially a-gendered about the output
       | of a computer program.
       | 
       | Calling Claude (or GPT-5 or Gemini or my bash terminal for that
       | matter) a "he" seems absurd to the point of hilarity.
       | 
       | In my mind, they've always firmly been "it"s.
        
         | bojan wrote:
         | This felt quirky to me as well, possibly because my native
         | language is strictly gendered.
        
         | DocTomoe wrote:
         | Hm, Claude is a common male given name, especially in Europe. That
         | plays into it. Also many people - including me - have
         | personalised their AI chats, have given it names, even
         | something resembling a personality (it's easy with prefix
         | prompts). Why others do it, who knows; I do it because I find
         | it a lot less frustrating when ChatGPT fucks up while it's
         | pretending to be a young adult female klutz.
        
       | johnfn wrote:
       | Context is king. The problem is that you are the one currently
       | telling Claude how close it is and what to do next. But if you
       | give it the tools to do that itself, it will make a world of
       | difference.
       | 
       | Give Claude a way to iteratively poke at what it created (such as
       | a Playwright harness), a screenshot of what you want, and a way to
       | take its own screenshots in Playwright, and I think you will get
       | much closer. You might even be able to one-shot it.
       | 
       | I've always wondered what would happen if I gave it a screenshot
       | and told it to iterate until the Playwright screenshot matched
       | the mock screenshot, pixel perfect. I imagine it would go nuts,
       | but after a few hours I think it would likely get it. (Either
       | that or minor font discrepancies and rounding errors would cause
       | it to give up...)
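       | 
       | The error signal for that loop is cheap to compute: diff the
       | mock against the rendered screenshot and report the mean
       | per-pixel difference, which Claude can try to drive to zero. A
       | sketch with NumPy (synthetic arrays stand in for the real
       | Playwright screenshots):

```python
import numpy as np

def screenshot_error(mock: np.ndarray, rendered: np.ndarray) -> float:
    """Mean absolute per-channel difference of two same-size RGB images.
    0.0 means a pixel-perfect match."""
    if mock.shape != rendered.shape:
        raise ValueError("screenshots must be the same size")
    return float(np.mean(np.abs(mock.astype(int) - rendered.astype(int))))

# In a real harness `rendered` would come from page.screenshot();
# here both images are synthetic 4x4 RGB arrays.
mock = np.zeros((4, 4, 3), dtype=np.uint8)
rendered = mock.copy()
rendered[0, 0] = (255, 0, 0)  # one wrong red pixel
print(screenshot_error(mock, rendered))  # → 5.3125
```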
        
       | sqircles wrote:
       | > The Space Jam website is simple: a single HTML page, absolute
       | positioning for every element...
       | 
       | Absolute positioning wasn't available until CSS2 in 1998. This is
       | just a table with crafty use of align, valign, colspan, and
       | rowspan.
        
         | DocTomoe wrote:
         | Which would also render differently on every machine, based on
         | browser settings, screen sizes, and available fonts.
         | 
         | Like the web was meant to be. An _interpreted_ hypertext
         | format, not a pixel-perfect brochure for marketing execs.
        
           | jeanlucas wrote:
           | >Like the web was meant to be.
           | 
           | what?
        
           | masswerk wrote:
            | Hum, table cells provide the max-width and images a min-width,
            | heights are absolute (with table cells spilling over, as with
            | CSS "overflow-y: visible"), aligns and maybe HSPACE and
           | VSPACE attributes do the rest. As long as images heights
           | exceed the effective line-height and there's no visible text,
           | this should render pixel perfect on any browser then in use.
           | In this case, there's also an absolute width set for the
           | entire table, adding further constraints. Table layouts can
           | be elastic, with constraints or without, but this one should
           | be pretty stable.
           | 
           | (Fun fact, the most amazing layout foot-guns, then: Effective
           | font sizes and line-heights are subject to platform and
           | configuration (e.g., Win vs Mac); Netscape does paragraph
           | spacing at 1.2em, IE at 1em (if this matters, prefer `<br>`
            | over paragraphs); frame dimensions in Netscape are always
           | calculated as integer percentages of window dimensions, even
           | if you provide absolute dimensions in pixels, while IE does
           | what it says on the tin (a rare example), so they will be the
           | same only by chance and effective rounding errors. And, of
           | course, screen gamma is different on Win and Mac, so your
           | colors will always be messed up - aim for a happy medium.)
        
       | shortformblog wrote:
       | Claude can't properly count the number of characters in a
       | sentence. It's asking a lot to assume it can get pixel perfect.
        
       | simonw wrote:
       | I wonder if Gemini 3 Pro would do better at this particular test?
       | They're very proud of its spatial awareness and vision abilities.
        
       | thenumpaduser wrote:
       | We are actually spoiled at this point.
        
         | echelon wrote:
         | [flagged]
        
           | dang wrote:
           | Could you please stop posting this sort of indignant-
           | sensational comment? It's not what this site is for, as you
           | know (or should know).
           | 
           | https://news.ycombinator.com/newsguidelines.html
        
             | echelon wrote:
             | Dang, can you explain how this is indignant or sensational?
             | 
             | Anthropic's leadership and researchers continue to this day
             | to post messages saying engineering will be fully
             | automated. I can go find recent messages on X if you'd
             | like.
             | 
              | This forum is composed mostly of engineers, who will be
             | the most impacted if their vision of the world pans out.
             | 
             | YC depends on innovation capital to make money. If the
             | means of production are centralized, how does YC make any
             | money at all from engineers? Such a world will be
             | vertically and horizontally integrated, not democratically
             | spread for others to take advantage of.
             | 
             | Now I don't think that's what's going to happen, but that's
             | what the messaging has been and continues to be from
             | Anthropic's leadership, researchers, and ICs.
             | 
             | Why should we support companies like this?
             | 
              | Shouldn't we advocate for open models where any market
              | participant can fully utilize and explore the competitive
              | gradients?
             | 
             | I don't think I'm saying anything controversial here.
             | 
             | Furthermore, if this pans out like it seems it will - a set
             | of three or four AI hyperscalers - we'll also be in the
             | same situation we have today with the big tech
             | hyperscalers.
             | 
             | Due to a lax regulatory environment, these companies put a
             | ceiling on startup exits by funding internal competition,
             | buying competitors, etc. I don't see how the situation will
             | improve in an AI world.
             | 
             | If you're a capitalist, you want competition to be fierce
             | and fair. You don't want concentration of power.
             | 
             | I can see how an Anthropic IC might not like this post, but
             | this should be fairly reasonable for everyone else who
             | would like to see more distribution of power.
        
       | DocTomoe wrote:
       | I keep wondering ... is this a good benchmark? What is a
       | practical use-case for the skills Claude is supposed to present
       | here? And if the author needs that particular website re-created
       | with pixel-perfect accuracy, wouldn't it be simpler to just do it
       | yourself?
       | 
       | Sure, you can argue this is some sort of modern ACID-Test - but
       | the ACID tests checked for real-world use-cases. This feels more
       | like 'I have this one very specific request, the machine doesn't
       | perfectly fulfill it, so the machine is at fault.' Complaining
       | from a high pedestal.
       | 
       | I'm more surprised at how close Claude got in its reimagined
       | SpaceJam-site.
        
       | sema4hacker wrote:
       | > The total payload is under 200KB.
       | 
       | Just out of curiosity, how big was what you considered Claude's
       | best attempt to be?
        
       | tehjoker wrote:
       | Hmm, you note that the problem is the LLM doesn't have enough
       | image context, but then you zoom the image in more?
       | 
       | Why not downscale the image and feed it as a second input so that
       | entire planets fit into a patch and instruct it to use the
       | downsampled image for coarse coordinate estimation?
        
       | 960design wrote:
       | Claude argued with me about the quadratic equation the other day.
       | It vehemently felt a -c was required whereas a c was the correct
       | answer. I pointed this out showing step by step and it finally
       | agreed. I tried Grok to see if it could get it right. Nope, the
       | exact same response as Claude, but Grok never backed down; even
       | after the step by step explanation of the maths.
        
         | kristofferc wrote:
         | Can you link to the conversation log?
        
         | pbhjpbhj wrote:
         | In the generalised solution there is a '-c' term with
         | coefficient '4a'...?
         | 
         | I'm not well at the moment, perhaps your ML model has flu?!!
        
       | rickcarlino wrote:
       | I look forward to an alternative reality where AI vendors race to
       | have the model with the best Space Jam Bench scores.
        
         | jacquesm wrote:
         | It beats pelicans by a considerable margin.
        
       | computersuck wrote:
       | Why not just host a copy from waybackmachine?
        
       | Madmallard wrote:
       | I wrote a 20,000 line multiplayer battle-arena game in XNA back
       | in 2015 with manually coded physics (so everything is there in
       | the code) and have tried several times with Claude, Gemini, Grok,
       | DeepSeek, and GPT to translate it to JavaScript.
       | 
       | They all fail massively 100% of the time. Even if I break it down
       | into chunks once they get to the chunks that matter the most
       | (i.e. physics, collision detection and resolution, event handling
       | and game logic) they all break down horribly and no amount of
       | prompting back and forth will fix it.
        
       | RagnarD wrote:
       | Why not just feed it the actual instructions that create the site
       | - the page source code, the HTML, CSS, JS if any?
        
         | masswerk wrote:
         | This may be an even greater challenge: analysing a table layout
         | and recreating it in CSS with absolute positioning.
        
       | computersuck wrote:
       | this is just AI brainrot disease
       | 
       | Help, I can't recreate a website with AI! There's no other way,
       | no way I could fix up some HTML code! Believe me, I'm an
       | engineering manager with a computer science degree!
       | 
       | Absolutely disgusting.
        
         | derektank wrote:
         | I'm pretty confident the author was being sarcastic
        
       | victorbuilds wrote:
       | Building something similar - using Claude API to generate mini
       | games from text descriptions (https://codorex.com, still pretty
       | rough).
       | 
       | Can confirm: Claude is weirdly good at generating functional game
       | logic from vague prompts, but spatial precision is a constant
       | battle. Anything involving exact pixel positions needs
       | validation/correction layers on top.
       | 
       | The suggestion upthread about having it write its own measurement
       | tools seems promising - haven't tried that approach yet.
        
         | victorbuilds wrote:
         | Actually, here's a relevant example - had Claude generate an
         | interactive solar system with circular orbits and moons:
         | https://codorex.com/shared/yeABdJWvRHAKqHs2kxpRnZNZPWmqWSu9
         | 
         | Interesting that it handles this fine (functional orbital
         | mechanics, animation) but would probably struggle to recreate
         | the exact pixel positions of the Space Jam layout. Confirms the
         | pattern: good at "make something like X" but bad at "recreate X
         | exactly."
        
       | anorwell wrote:
       | The article does not say at any point which model was used. This
       | is the most basic important information when talking about the
       | capabilities of a model, and probably belongs in the title.
        
       | ErrantX wrote:
       | I just feel this is a great example of someone falling into the
       | common trap of treating an LLM like a human.
       | 
       | They are vastly less intelligent than a human and logical leaps
       | that make sense to you make no sense to Claude. It has no concept
       | of aesthetics or of course any vision.
       | 
       | All that said; it got pretty close even with those impediments!
       | (It got worse because the writer tried to force it to act more
       | like a human would)
       | 
       | I think a better approach would be to write a tool to compare
       | screenshots, identify misplaced items, and output that as a text
       | finding/failure state. Claude will work much better because you're
       | dodging the bits that are too interpretive (which humans rock at
       | and LLMs don't).
        
         | sallveburrpi wrote:
         | > vastly less intelligent than a human
         | 
          | I would rather phrase it as: they are a completely alien
          | "intelligence" that can't really be compared to human
          | intelligence.
        
           | oncallthrow wrote:
           | Yes, currently they are both smarter and dumber than humans.
        
       | manmal wrote:
       | I would put Claude into a loop and let it make screenshots
       | itself, diffing them against the original screenshot, until it
       | has found the right arrangement at the planets' starting position
       | (pixel perfect match).
        
         | epgui wrote:
         | At that point you may as well just do the work yourself.
        
           | hooo wrote:
           | Why? Just give it access to the playwright mcp server.
        
         | fluidcruft wrote:
         | I would try giving it a tool to work with projections and
         | process axis by axis to see if it works better in 1D than 2D. I
         | dunno how clever claude is about signal processing though.
         | There's no noise here so... I mean really it's just template
         | matching without rotation and without noise so... But I doubt
         | claude can do or reason about basic image processing.
        
       | neuroelectron wrote:
       | My web-dev friend saw the original Space Jam site. I asked him
       | what it would cost to build something like that today. He paused
       | and said:
       | 
       | We can't. We don't know how to do it.
        
       | Aeolun wrote:
       | I think claude could have easily used a script to calculate the
       | positions of the planets exactly here, instead of trying to use
       | the frankly horrible image recognition.
        
         | fluidcruft wrote:
         | Claude could probably have written a script to do it. I'm
         | hunting assets to confirm.
         | 
         | Edit: yup. Most of the wall time was spent installing cv2 in a
         | venv.
         | 
         | Prompt:
         | 
         | ME> In this directory there are two images: @p-pressbox.gif and
         | @screenshot.png. @p-pressbox.gif is an image that exists
         | somewhere inside @screenshot.png Write a script that finds the
         | precise location
         | 
         | CLAUDE: Looking at the images, I can see that p-pressbox.gif
         | (the spaceship with "PRESS BOX SHUTTLE" text) appears in the
         | upper-left area of screenshot.png. I'll write a Python script
         | using OpenCV to find the precise location.
         | 
         | [ some venv stuff/telling claude to use uv and python 3.13 ]
         | 
         | CLAUDE: The script found the precise location of
         | p-pressbox.gif within screenshot.png:
         | 
         |     Measurement          Value
         |     -------------------  ---------------
         |     Top-left corner      (87, 110)
         |     Bottom-right corner  (218, 166)
         |     Center               (152, 138)
         |     Template size        131 x 56 pixels
         |     Match confidence     97.2%
         | 
         | The script is at /tmp/search/find_image.py and uses OpenCV's
         | template matching with normalized cross-correlation.
         | 
         | ME> As part of the output create a visualization of the
         | @p-pressbox.gif positioned on top of the original screenshot
         | 
         | CLAUDE: * Done! The visualization shows a green rectangle
         | highlighting the exact location where p-pressbox.gif was found
         | within the screenshot. The match is at coordinates (87, 110) to
         | (218, 166).
         | 
         | Result visually confirmed.
        
       ___________________________________________________________________
       (page generated 2025-12-07 23:00 UTC)