[HN Gopher] I failed to recreate the 1996 Space Jam Website with...
___________________________________________________________________
I failed to recreate the 1996 Space Jam Website with Claude
Author : thecr0w
Score : 212 points
Date : 2025-12-07 17:18 UTC (5 hours ago)
(HTM) web link (j0nah.com)
(TXT) w3m dump (j0nah.com)
| stonecharioteer wrote:
| I'm curious. Did you ask it to use tables and no CSS?
|
| In 1996, we had only CSS1. Ask it to use tables to do this,
| perhaps.
| lagniappe wrote:
| We actually had a myriad of competing specs. CSS proper wasn't
| released until December 1996.
| thecr0w wrote:
| Any names for the competing specs? Maybe I could try re-
| prompting with that direction.
| lagniappe wrote:
| Models are trained with content scraped from the net, for
| the most part. The availability of content pertaining to
| those specs is almost nil, and of no SEO value. Ergo,
| models for the most part will only have a cursory knowledge
| of a spec that your browser will never be able to parse
| because that isn't the spec that won.
|
| Nonetheless, here is a link to a list of the specs you
| asked for:
| https://www.w3.org/Style/History/Overview.en.html
| boie0025 wrote:
| Thanks for sharing that. I read through a lot of this.
| Interesting to read those perspectives in the context of
| today.
| lagniappe wrote:
| Much obliged. Have a good weekend. Your new gray hairs
| are en route :)
| wanderingstan wrote:
| There were specs competing for adoption, but only tables
| (the old way) and CSS were actually adopted by browsers. So
| no point trying to use some other positioning technique.
| thecr0w wrote:
| Yes yes great question!
|
| I tried your suggestion and also tried giving it various more
| general versions of the limitations presented by earlier
| generations.
|
| Claude's instinct initially was actually to limit itself to
| less modern web standards.
|
| Unfortunately, nothing got those planets to be in the right
| place.
| Wowfunhappy wrote:
| Claude is not very good at using screenshots. The model may
| _technically_ be multi-modal, but its strength is clearly in
| reading text. I'm not surprised it failed here.
| fnordpiglet wrote:
| Especially since it decomposes the image into a semantic vector
| space rather than the actual grid of pixels. Once the image is
| transformed into patch embeddings all sense of pixels is
| entirely destroyed. The author demonstrates a profound lack of
| understanding of how multimodal LLMs function that a simple
| query of one would elucidate immediately.
|
| The right way to handle this is not to feed it grids and
| whatnot, which all get blown away by the embedding encoding, but
| to instruct it to build image processing tools of its own and
| to mandate their use in constructing the coordinates required
| and computing the eccentricity of the pattern etc in code and
| language space. Doing it this way you can even get it to write
| assertive tests comparing the original layout to the final
| among various image processing metrics. This would assuredly
| work better, take far less time, be more stable on iteration,
| and fits neatly into how a multimodal agentic programming tool
| actually functions.
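The tool-building approach this commenter describes could be sketched roughly as follows, a minimal illustration assuming numpy and screenshots with a flat background color; all names are hypothetical, not anything Claude actually produced: find the centroid of every non-background blob in both screenshots and assert the layouts line up within a pixel tolerance.

```python
import numpy as np

def blob_centroids(img, bg=(0, 0, 0)):
    """Centroids (row, col) of 4-connected non-background blobs."""
    mask = np.any(img != bg, axis=-1)
    labels = np.zeros(mask.shape, dtype=int)
    n = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue
        n += 1
        stack = [seed]
        labels[seed] = n
        while stack:  # simple flood fill
            r, c = stack.pop()
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if (0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
                        and mask[nr, nc] and not labels[nr, nc]):
                    labels[nr, nc] = n
                    stack.append((nr, nc))
    return [tuple(np.mean(np.nonzero(labels == i), axis=1))
            for i in range(1, n + 1)]

def layout_matches(original, recreated, tol=2.0):
    """Pass iff both screenshots have the same number of elements and
    each centroid lines up within `tol` pixels."""
    a, b = sorted(blob_centroids(original)), sorted(blob_centroids(recreated))
    return len(a) == len(b) and all(
        abs(pa[0] - pb[0]) <= tol and abs(pa[1] - pb[1]) <= tol
        for pa, pb in zip(a, b))
```

A check like this lives entirely in code and language space, so the model can iterate against a pass/fail signal instead of eyeballing pixels.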
| mcbuilder wrote:
| Yeah, this is exactly what I was thinking. LLMs don't have
| precise geometrical reasoning from images. Having an
| intuition of how the models work is actually a defining skill
| in "prompt engineering".
| dcanelhas wrote:
| Even with text, parsing content in 2D seems to be a challenge
| for every LLM I have interacted with. Try getting a chatbot to
| make an ascii-art circle with a specific radius and you'll see
| what I mean.
| Wowfunhappy wrote:
| I don't really consider ASCII art to _be_ text. It requires a
| completely different type of reasoning. A blind person can
| understand text if it's read out loud. A blind person really
| can't understand ASCII art if it's read out loud.
| throwaway314155 wrote:
| Somehow I suspect Claude Code (in an interactive session with
| trial, error, probing, critiquing, perusing, and all the other
| benefits you get) would do better. This example seems to assume
| Claude can do things in "one shot" (even the later attempts all
| seem to conceal information like it's a homework assignment).
|
| That's not how to successfully use LLMs for coding in my
| experience. It is however perhaps a good demonstration of
| Claude's poor spatial reasoning skills. Another good
| demonstration of this is the twitch.tv/ClaudePlaysPokemon where
| Claude has been failing to beat pokemon for months now.
| thecr0w wrote:
| Not a homework assignment, and no deliberate attempt to conceal
| information, just very long and repetitive logs. A lot of the
| same "insights" so I just didn't provide them here.
|
| > That's not how to successfully use LLMs for coding in my
| experience.
|
| Yeah agree. I think I was just a little surprised it couldn't
| one-shot given the simplicity.
| CharlesW wrote:
| Using https://github.com/anthropics/claude-code/tree/main/plugins/...
| with style-supporting instructions and context would've
| improved the outcome as well.
| thecr0w wrote:
| thank you! I'll try this
| aidos wrote:
| Is the skill effectively just adding a little extra context
| here though? Doesn't strike me as the sort of context that
| would improve the outcome.
|
| https://github.com/anthropics/claude-code/blob/main/plugins/...
| 999900000999 wrote:
| Space Jam website design as an LLM benchmark.
|
| This article is a bit negative. Claude gets close; it just can't
| get the order right which is something OP can manually fix.
|
| I prefer GitHub Copilot because it's cheaper and integrates with
| GitHub directly. I'll have times where it'll get it right, and
| times when I have to try 3 or 4 times.
| thecr0w wrote:
| Ya, this is true. Another commenter also pointed out that my
| intention was to one-shot. I didn't really go too deeply into
| trying multiple iterations.
|
| This is also fairly contrived, you know? It's not a realistic
| limitation to rebuild HTML from a screenshot because of course
| if I have the website loaded I can just download the HTML.
| Retric wrote:
| It's not unrealistic to want to revert to an early version of
| something you only have a screenshot of.
| swatcoder wrote:
| > rebuild HTML from a screenshot
|
| ???
|
| This is precisely the workflow when a traditional graphic
| designer mocks up a web/app design, which still happens _all
| the time_.
|
| They sketch a design in something like Photoshop or
| Illustrator, because they're fluent in these tools and many
| have been using them for decades, and somebody else is tasked
| with figuring out how to slice and encode that design in the
| target interactive tech (HTML+CSS, SwiftUI, QT, etc).
|
| Large companies, design agencies, and consultancies with
| tech-first design teams have a different workflow, because
| they intentionally staff graphic designers with a tighter
| specialization/preparedness, but that's a much smaller share
| of the web and software development space than you may think.
|
| There's nothing contrived at all about this test and it's a
| really great demonstration of how tools like Claude don't
| take naturally to this important task yet.
| thecr0w wrote:
| You know, you're totally right and I didn't even think
| about that.
| smallnix wrote:
| That's not the point of the article. It's about Claude/LLMs
| being overconfident about recreating it pixel-perfectly.
| jacquesm wrote:
| All AI's are overconfident. It's impressive what they can do,
| but it is at the same time extremely unimpressive what they
| can't do while passing it off as the best thing since sliced
| bread. 'Perfect! Now I see the problem.'. 'Thank you for
| correcting that, here is a perfect recreation of problem 'x'
| that will work with your hardware.' (never mind the 10
| glaring mistakes).
|
| I've tried these tools a number of times and spent a good bit
| of effort on learning to maximize the return. By the time you
| know what prompt to write you've solved the problem yourself.
| GeoAtreides wrote:
| >which is something OP can manually fix
|
| what if the LLM gets something wrong that the operator (a
| junior dev perhaps) doesn't even know is wrong? that's the
| main issue: if it fails here, it will fail with other things,
| in not such obvious ways.
| godelski wrote:
| I think that's the main problem with them. It is hard to
| figure out when they're wrong.
|
| As the post shows, you can't trust them when they think they
| solved something but you also can't trust them when they
| think they haven't[0]. The things are optimized for human
| preference, which ultimately results in this being optimized
| to hide mistakes. After all, we can't penalize mistakes in
| training when we don't know the mistakes are mistakes. The de
| facto bias is that we prefer mistakes that we don't know are
| mistakes than mistakes that we do[1].
|
| Personally I think a well designed tool makes errors obvious.
| As a tool user that's what I want and makes tool use
| effective. But LLMs flip this on the head, making errors
| difficult to detect. Which is incredibly problematic.
|
| [0] I frequently see this in a thing it thinks is a problem
| but actually isn't, which makes steering more difficult.
|
| [1] Yes, conceptually unknown unknowns are worse. But you
| can't measure unknown unknowns, they are indistinguishable
| from knowns. So you always optimize deception (along with
| other things) when you don't have clear objective truths
| (most situations).
| bigstrat2003 wrote:
| > it just can't get the order right which is something OP can
| manually fix.
|
| If the tool needs you to check up on it and fix its work, it's
| a bad tool.
| markbao wrote:
| "Bad" seems extreme. The only way to pass the litmus test
| you've described is for a tool to be 100% perfect, so then
| the graph looks like 99.99% "bad tool" until it reaches 100%
| perfection.
|
| It's not that binary imo. It can still be extremely useful
| and save a ton of time if it does 90% of the work and you fix
| the last 10%. Hardly a bad tool.
|
| It's only a bad tool if you spent more time fixing the
| results than building it yourself, which sometimes used to be
| the case for LLMs but is happening less and less as they get
| more capable.
| a4isms wrote:
| If you show me a tool that does a thing perfectly 99% of
| the time, I will stop checking it eventually. Now let me
| ask you: How do you feel about the people who manage the
| security for your bank using that tool? And eventually
| overlooking a security exploit?
|
| I agree that there are domains for which 90% good is very,
| very useful. But 99% isn't always better. In some limited
| domains, it's actually worse.
| 999900000999 wrote:
| Counterpoint.
|
| Humans don't get it right 100% of the time.
| wvenable wrote:
| Perfection is the enemy of good.
| mrweasel wrote:
| That's my thinking. If I need to check up on the work, then
| I'm equally capable of writing the code myself. It might go
| faster with an LLM assisting me, and that feels perfectly
| fine. My issue is when people use the AI tools to generate
| something far beyond their own capabilities. In those cases,
| who checks the result?
| supern0va wrote:
| Honestly, if you had showed this article to me even eighteen
| months ago, I would have been blown away at how good of a job
| Claude did.
|
| It's remarkable how steadily our expectations have been
| creeping upward.
| WhyOhWhyQ wrote:
| This comment is missing the point. The real goal of all this is
| not to amaze. It's to create better software. Let's graduate
| past the amazement phase into the realism phase as soon as
| possible. What parts of my project is the LLM for? That is the
| real question worth asking.
| supern0va wrote:
| Oh, to be clear, this isn't a criticism. I think it's super
| cool that we're moving onto the nitpick/refinement phase of
| this tech. :)
| dreadnip wrote:
| Why involve an LLM in this? Just download the site?
| ChrisArchitect wrote:
| Yeah, Internet Archive has lots of copies
| https://web.archive.org/web/20250000000000*/https://www.spac...
| also
|
| What's with the panicked pleas and need to preserve the site,
| assuming locally...?
| GeoAtreides wrote:
| The post is clearly about something else than preserving
| https://www.spacejam.com/1996/
|
| It seems to me the post is about how Claude fails to recreate
| a very simple website from 1996.
| sigseg1v wrote:
| Curious if you've tested something such as:
|
| - "First, calculate the orbital radius. To do this accurately,
| measure the average diameter of each planet, p, and the average
| distance from the center of the image to the outer edge of the
| planets, x, and calculate the orbital radius r = x - p"
|
| - "Next, write a unit test script that we will run that reads the
| rendered page and confirms that each planet is on the orbital
| radius. If a planet is not, output the difference you must shift
| it by to make the test pass. Use this feedback until all planets
| are perfectly aligned."
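The second prompt's unit test could be sketched like this, assuming planet centers have already been extracted from the rendered page (all names hypothetical): report the signed per-planet error so the model gets exact shift distances to iterate on.

```python
import math

def orbit_errors(centers, image_center, radius):
    """Signed distance of each planet center from the ideal orbit:
    positive = outside the circle, negative = inside."""
    cx, cy = image_center
    return [math.hypot(x - cx, y - cy) - radius for x, y in centers]

def assert_on_orbit(centers, image_center, radius, tol=1.5):
    """Fail with the per-planet pixel error so the agent knows how far
    to shift each element on the next iteration."""
    bad = [(c, round(e, 2)) for c, e in
           zip(centers, orbit_errors(centers, image_center, radius))
           if abs(e) > tol]
    assert not bad, f"planets off orbit (center, px error): {bad}"
```

The assertion message is the feedback channel here: a numeric error per planet is something the model can act on, unlike a screenshot.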
| turnsout wrote:
| Yes, this is a key step when working with an agent--if they're
| able to check their work, they can iterate pretty quickly. If
| you're in the loop, something is wrong.
|
| That said, I love this project. haha
| monsieurbanana wrote:
| I'm trying to understand why this comment got downvoted. My
| best guess is that "if you're in the loop, something is
| wrong" is interpreted as there should be no human involvement
| at all.
|
| The loop here, imo, refers to the feedback loop. And it's
| true that ideally there should be no human involvement there.
| A tight feedback loop is as important for llms as it is for
| humans. The more automated you make it, the better.
| turnsout wrote:
| Yes, maybe I goofed on the phrasing. If you're in the
| _feedback loop,_ something is wrong. Obviously a human
| should be "in the loop" in the sense that they're aware of
| and reviewing what the agent is doing.
| Aurornis wrote:
| This is my experience with using LLMs for complex tasks: If
| you're lucky they'll figure it out from a simple description,
| but to get most things done the way you expect requires a lot
| of explicit direction, test creation, iteration, and tokens.
|
| One of the keys to being productive with LLMs is learning how
| to recognize when it's going to take much more effort to
| babysit the LLM into getting the right result as opposed to
| simply doing the work yourself.
| jacquesm wrote:
| It would not be the first time that an IT services provider
| makes more money the worse their products perform.
| thecr0w wrote:
| Hm, I didn't try exactly this, but I probably should!
|
| Wrt unit test script, let's take Claude out of the equation,
| how would you design the unit test? I kept running into either
| Claude or some library not being capable of consistently
| identifying planet vs. non-planet, which was hindering Claude's
| ability to make decisions based on fine detail or "pixel
| coordinates" if that makes sense.
| cfbradford wrote:
| Do you give Claude the screenshot as a file? If so I'd just
| ask it to write a tool to diff each asset to every possible
| location in the source image to find the most likely position
| of each asset. You don't really need recognition if you can
| brute force the search. As a human this is roughly what I
| would do if you told me I needed to recreate something like
| that with pixel perfect precision.
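The brute-force search this commenter describes is cheap to sketch. A plain sum-of-absolute-differences scan (numpy; names hypothetical) is enough for flat-color sprites like these, though a real tool would more likely reach for something like OpenCV's `matchTemplate`:

```python
import numpy as np

def best_match(screenshot, asset):
    """Slide the asset over every possible placement and score it with
    the sum of absolute differences; the minimum is the most likely
    position of that asset in the screenshot."""
    H, W = screenshot.shape[:2]
    h, w = asset.shape[:2]
    s = screenshot.astype(np.int32)
    a = asset.astype(np.int32)
    best, best_pos = None, None
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            sad = int(np.abs(s[r:r + h, c:c + w] - a).sum())
            if best is None or sad < best:
                best, best_pos = sad, (r, c)
    return best_pos
```

No recognition model needed: given the asset files, the search alone recovers where each one belongs.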
| bluedino wrote:
| Congratulations, we finally created 'plain English' programming
| languages. It only took 1/10th of the world's electricity and
| 40% of the semiconductor production.
| zitterbewegung wrote:
| In actual workflows someone would accept a very close
| reproduction and fix the small issues. Generally I use systems to
| get close enough to a scaffolding and / or make small incremental
| improvements and direct its design
| micromacrofoot wrote:
| I wouldn't call it entirely defeated, it got maybe 90% of the way
| there. Before LLMs you couldn't get 50% of the way there in an
| automated way.
|
| > What he produces
|
| I feel like personifying LLMs more than they currently are is a
| mistake people make (though humans always do this), they're not
| entities, they don't know anything. If you treat them too human
| you might eventually fool yourself a little too much.
| thecr0w wrote:
| As a couple other comments pointed out, it's also not fair to
| judge Claude based on a one shot like this. I sort of assume
| these limitations will remain even if we went back and forth
| but to be fair, I didn't try that more than a few times in this
| investigation. Maybe on try three it totally nails it.
| bdcravens wrote:
| A comparison with Codex would be good. I haven't done it with
| Codex, but when working through problems using ChatGPT, it does a
| great job when given screenshots.
| daemonologist wrote:
| Interesting - these models are all trained to do pixel-level(ish)
| measurement now, for bounding boxes and such. I wonder if you
| could railroad it into being accurate with the right prompt.
| Lerc wrote:
| What models are good at this? I have tried passing images to
| models and asking them for coordinates for specific features,
| then overlaid dots on those points and passed that image back
| to the model so it has a perception of how far out it was. It
| had a tendency to be consistently off by a fixed amount without
| getting closer.
|
| I don't doubt that it is possible eventually, but I haven't had
| much luck.
|
| Something that seemed to assist was drawing a multi coloured
| transparent chequerboard, if the AI knows the position of the
| grid colours it can pick out some relative information from the
| grid.
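The dot-overlay feedback loop this commenter describes is easy to automate. A sketch (numpy; names hypothetical) that stamps a marker at each model-predicted coordinate before sending the image back:

```python
import numpy as np

def overlay_markers(img, points, color=(255, 0, 0), r=3):
    """Stamp a filled square at each predicted (row, col) so the model
    can see, on the next turn, how far off its coordinates were."""
    out = img.copy()
    H, W = out.shape[:2]
    for row, col in points:
        out[max(0, row - r):min(H, row + r + 1),
            max(0, col - r):min(W, col + r + 1)] = color
    return out
```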
| daemonologist wrote:
| I've found Qwen3-VL to be fairly accurate at detection
| (though it doesn't always catch every instance). Note that it
| gives answers as per-mille-ages, as if the image was
| 1000x1000 regardless of actual resolution or aspect ratio.
|
| I have also not had luck with any kind of iterative/guess-
| and-check approach. I assume the models are all trained to
| one-shot this kind of thing and struggle to generalize to
| what are effectively relative measurements.
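If that per-mille convention holds, rescaling the model's output is a one-liner; this sketch takes the convention as the commenter describes it rather than from any verified spec:

```python
def permille_to_pixels(box, width, height):
    """Rescale an (x0, y0, x1, y1) box reported on a virtual
    1000x1000 canvas to the image's real resolution."""
    x0, y0, x1, y1 = box
    return (x0 * width / 1000, y0 * height / 1000,
            x1 * width / 1000, y1 * height / 1000)
```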
| sdenton4 wrote:
| Feels like the "right" approach would be to have it write some
| code to measure how far off the elements are in the original vs
| recreated image, and then iterate using the numerical output of
| the program...
| johncoatesdev wrote:
| You last-minute cancelled coffee with your friends to work on
| this? I'm not sure how I would feel if a friend did that to me.
| fishtoaster wrote:
| Based on the later life updates, I suspect this was being
| humorous.
|
| > After these zoom attempts, I didn't have any new moves left.
| I was being evicted. The bank repo'd my car. So I wrapped it
| there.
| syassami wrote:
| We've lost the capability to build such marvels.
|
| https://knowyourmeme.com/memes/my-father-in-law-is-a-builder...
| barfoure wrote:
| Lost it at wooden structures at the playground.
| a-dub wrote:
| maybe ask it to use 1990s table based layout approaches?
| al_borland wrote:
| Interesting. I just looked at the page source and it is in fact
| using a table layout. I always assumed it was an image map,
| which I assume would be even more obscure for the LLM.
| thecr0w wrote:
| We should check the Wayback Machine, but in my memory this
| was built with an image map. Maybe like, 10 years ago or
| something. I was googling around when writing this post and
| saw that there are folks still tasked with making sure it's
| up and running. I wonder if they migrated it to tables at
| some point in the last decade.
| smoghat wrote:
| Ok, so here is an interesting case where Claude was almost good
| enough, but not quite. But I've been amusing myself by taking
| abandoned Mac OS programs from 20 years ago that I find on GitHub
| and bringing them up to date to work on Apple silicon. For
| example, jpegview, which was a very fast and simple slideshow
| viewer. It took about three iterations with Claude code before I
| had it working. Then it was time to fix some problems, add some
| features like playing videos, a new layout, and so on. I may be
| the only person in the world left who wants this app, but well,
| that was fine for a day long project that cooked in a window with
| some prompts from me while I did other stuff. I'll probably
| tackle scantailor advanced next to clean up some terrible book
| scans. Again, I have real things to do with my time, but each of
| these mini projects just requires me to have a browser window
| open to a Claude code instance while I work on more attention
| demanding tasks.
| egeozcan wrote:
| Side note: As a person who started using a mac since march, I
| found phoenix slides really good.
| smoghat wrote:
| It is! I was really just curious if I could update this old
| codebase without getting my hands dirty.
| skrebbel wrote:
| > Ok, so here is an interesting case where Claude was almost
| good enough, but not quite.
|
| You say that as if that's uncommon.
| jonplackett wrote:
| This should be the strap line for all AI (so far)
| smoghat wrote:
| That's fair. But I always think of it as an intern I am
| paying $20 a month for or $200 a month. I would be kind of
| shocked if they could do everything as well as I'd hoped
| for that price point. It's fascinating for me and worth the
| money.
|
| I am lucky that I don't depend on this for work at a
| corporation. I'd be pulling my hair out if some boss said
| "You are going to be doing 8 times as much work using our
| corporate AI from now on."
| jonplackett wrote:
| Don't get me wrong, doing 80% of my work for me is still
| great. And I'm actually quite glad I'm still needed for
| the other 20%
| pluc wrote:
| I like how the author calls a script on the internet "him".
| thecr0w wrote:
| lol
| NooneAtAll3 wrote:
| better than using a plural for a single entity
| shwaj wrote:
| "it"
| docheinestages wrote:
| > Note: please help, because I'd like to preserve this website
| forever and there's no other way to do it besides getting Claude
| to recreate it from a screenshot.
|
| Why not use wget to mirror the website? Unless you're being
| sarcastic.
|
| $ wget --mirror --convert-links --adjust-extension \
|     --page-requisites --no-parent http://example.org
|
| Source: https://superuser.com/questions/970323/using-wget-to-
| copy-we...
| thecr0w wrote:
| The stuff about not being able to download it is a bit of a
| joke and I don't think the tone landed with everybody haha.
| This was just an experiment to see if Claude could recreate a
| simple website from a screenshot, of course to your point you
| could download it if you wanted.
| malfist wrote:
| Because that wasn't the goal of this exercise
| hestefisk wrote:
| Would be interesting to see whether Gemini could crack this
| problem.
| th0ma5 wrote:
| I personally don't understand why asking these things to do
| things we know they can't do is supposed to be productive. Maybe
| for getting around restrictions or fuzzing... I don't see it as
| an effective benchmark unless it can link directly to the ways
| the models are being improved, but, to look at random results
| that sometimes are valid and think more iterations of randomness
| will eventually give way to _control_ is a maddening perspective
| to me, but perhaps I need better language to describe this.
| thecr0w wrote:
| I think this is a reasonable take. I think for me, I like to
| investigate limitations like this in order to understand where
| the boundaries are. Claude isn't impossibly bad at analyzing
| images. It's just pixel perfect corrections that seem to be a
| limitation. Maybe for some folks it's enough to just read that
| but for me, I like to feel like I have some good experiential
| knowledge about the limitations that I can keep in my brain and
| apply appropriately in the future.
| iwontberude wrote:
| Apropos given Warner Brothers Discovery just sold to Netflix
| soared wrote:
| I got quite close with Gemini 3 pro in AI studio. I uploaded a
| screenshot (no assets) and the results were similar to OP. It
| failed to follow my fix initially but I told it to follow my
| directions (lol) and it came quite close (though portrait mode
| distorted it; landscape was close to perfect).
|
| "Reference the original uploaded image. Between each image in the
| clock face, create lines to each other image. Measure each line.
| Now follow that same process on the app we've created, and adjust
| the locations of each image until all measurements align
| exactly."
|
| https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...
| jacobsenscott wrote:
| > there's no other way to do it besides getting Claude to
| recreate it from a screenshot
|
| And
|
| > I'm an engineering manager
|
| I can't tell if this is an intentional or unintentional satire of
| the current state of AI mandates from management.
| thecr0w wrote:
| lololol
| dmd wrote:
| i can't tell if your comment is satire or not
| chilmers wrote:
| You _really_ can't tell? Perhaps the bar for AGI is lower than
| I thought.
| master_crab wrote:
| Honest question: does he know about F5? Or was it intentional
| to use screenshots when source is available?
| thuttinger wrote:
| Claude/LLMs in general are still pretty bad at the intricate
| details of layouts and visual things. There are a lot of problems
| that are easy to get right for a junior web dev but impossible
| for an LLM. On the other hand, I was able to write a C program
| that added gamma color profile support to Linux compositors that
| don't support it (in my case Hyprland) within a few minutes! A -
| for me - seemingly hard task, which would have taken me at least
| a day or more if I didn't let Claude write the code. With one
| prompt Claude generated C code that compiled on first try that:
|
| - Read an .icc file from disk
|
| - parsed the file and extracted the VCGT (video card gamma table)
|
| - wrote the VCGT to the video card for a specified display via
| amdgpu driver APIs
|
| The only thing I had to fix was the ICC parsing, where it would
| parse header strings in the wrong byte-order (they are big-
| endian).
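The byte-order bug described above is the classic pitfall with ICC files: the header is defined as big-endian, so every multi-byte field needs an explicit big-endian unpack. A minimal Python sketch of the first three header fields per the standard ICC layout (the function name is illustrative, not from the commenter's C code):

```python
import struct

def icc_header_fields(header_bytes):
    """Unpack the first three ICC header fields, explicitly big-endian:
    profile size (uInt32), preferred CMM signature (4 bytes), version.
    Unpacking with native order on x86 (little-endian) produces the
    kind of garbled values described above."""
    return struct.unpack_from(">I4sI", header_bytes, 0)
```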
| littlecranky67 wrote:
| > Claude/LLMs in general are still pretty bad at the intricate
| details of layouts and visual things
|
| Because the rendered output (pixels, not HTML/CSS) is not fed
| as data in the training. You will find tons of UI snippets and
| questions, but they rarely include screenshots. And if they
| do, they are not scraped.
| Wowfunhappy wrote:
| Interesting thought. I wonder if Anthropic et al could
| include some sort of render-html-to-screenshot as part of the
| training routine, such that the rendered output would get
| included as training data.
| KaiserPro wrote:
| That's basically a VLM, but the problem is that describing
| the world requires a better understanding of the world.
| Hence why LeCun is talking about world models (it's also
| cutting edge for teaching robots to manipulate and plan
| manipulations).
| jacquesm wrote:
| Claude didn't write that code. Someone else did and Claude took
| that code without credit to the original author, adapted it to
| your use case and then presented it as its own creation to you
| and you accepted this. If a human did this we probably would
| have a word for them.
| Mtinie wrote:
| > If a human did this we probably would have a word for them.
|
| I don't think it's fair to call someone who used Stack
| Overflow to find a similar answer with samples of code to
| copy to their project an asshole.
| jacquesm wrote:
| Who brought Stack Overflow up? Stack Overflow does not
| magically generate code, someone has to actually provide it
| first.
| Mtinie wrote:
| I generally agree with your underlying point concerning
| attribution and intellectual property ownership but your
| follow-up comment reframes your initial statement: LLMs
| generate recombinations of code from code created by
| humans, without giving credit.
|
| Stack Overflow offers access to other peoples' work, and
| developers combined those snippets and patterns into
| their own projects. I suspect attribution is low.
| jacquesm wrote:
| Stack Overflow deals with that issue by having a license
| agreement.
| sublinear wrote:
| Using stack overflow recklessly is definitely asshole
| behavior.
| bluedino wrote:
| It has been for the last 15 years.
| giancarlostoro wrote:
| You mean like copying and pasting code from Stack Overflow?
| idiotsecant wrote:
| Yes, the word for that is software developer.
| FanaHOVA wrote:
| Are you saying that every piece of code you have ever written
| contains a full source list of every piece of code you
| previously read to learn specific languages, patterns, etc?
|
| Or are you saying that every piece of code you ever wrote was
| 100% original and not adapted from any previous codebase you
| ever worked in or any book / reference you ever read?
| jacquesm wrote:
| What's with the bad takes in this thread. That's two
| strawmen in one comment, it's getting a bit crowded.
| DangitBobby wrote:
| Or the original point doesn't actually hold up to basic
| scrutiny and is indistinguishable from straw itself.
| jacquesm wrote:
| HN has guidelines for a reason.
| bsaul wrote:
| That's an interesting hypothesis : that LLM are fundamentally
| unable to produce original code.
|
| Do you have papers to back this up ? That was also my
| reaction when i saw some really crazy accurate comments on
| some vibe coded piece of code, but i couldn't prove it, and
| thinking about it now i think my intuition was wrong (ie :
| LLMs do produce original complex code).
| jacquesm wrote:
| We can solve that question in an intuitive way: if human
| input is not what is driving the output then it would be
| sufficient to present it with a fraction of the current
| inputs, say everything up to 1970 and have it generate all
| of the input data from 1970 onwards as output.
|
| If that does not work then the moment you introduce AI you
| cap their capabilities unless humans continue to create
| original works to feed the AI. The conclusion - to me, at
| least - is that these pieces of software regurgitate their
| inputs, they are effectively whitewashing plagiarism, or,
| alternatively, their ability to generate new content is
| capped by some arbitrary limit relative to the inputs.
| andrepd wrote:
| Excellent observation.
| bfffbgfdcb wrote:
| I guess _you_ can't create original work either. Given
| you can't reproduce the sum total of human output since
| 1970 by next Tuesday.
|
| There's _something_ in what you're saying, but until you
| refine it to something actually _true_ , it's just more
| slop.
| jacquesm wrote:
| I think my track record belies your very low value and
| frankly cowardly comment. If you have something to say at
| least do it under your real username instead of a
| throwaway.
| andsoitis wrote:
| I like your test. Should we also apply to specific
| humans?
|
| We all stand on the shoulders of giants and learn by
| looking at others' solutions.
| jacquesm wrote:
| That's true. But if we take your implied rebuttal then
| current level AI would be able to learn from current AI
| as well as it would learn from humans. But so far that
| does not seem to be the case, in fact, AI companies do
| everything they can to avoid eating their own tail.
| They'd love eating their own tail if it was worth it.
| fpoling wrote:
| Pick up a book about programming from the seventies or eighties
| that was unlikely to be scanned and fed into an LLM. Take a
| task from it and ask the LLM to write a program from it that
| even a student can solve within 10 minutes. If the problem
| was not really published before, LLM fails spectacularly.
| crawshaw wrote:
| This does not appear to be true. Six months ago I created
| a small programming language. I had LLMs write hundreds
| of small programs in the language, using the parser,
| interpreter, and my spec as a guide for the language. The
| vast majority of these programs were either very close or
| exactly what I wanted. No prior source existed for the
| programming language because I created it whole cloth
| days earlier.
| anjel wrote:
| Sometimes it's generated, and many times it's not. Trivial
| to detect, but it's been deemed none of your business.
| mlinsey wrote:
| Certainly if a human wrote code that solved this problem, and
| a second human copied and tweaked it slightly for their use
| case, we would have a word for them.
|
| Would we use the same word if two different humans wrote code
| that solved two different problems, but one part of each
| problem was somewhat analogous to a different aspect of a
| third human's problem, and the third human took inspiration
| from those parts of both solutions to create code that solved
| a third problem?
|
| What if it were ten different humans writing ten different-
| but-related pieces of code, and an eleventh human piecing
| them together? What if it were 1,000 different humans?
|
| I think "plagiarism", "inspiration", and just "learning from"
| fall on some continuous spectrum. There are clear differences
| when you zoom out, but they are in degree, and it's hard to
| set a hard boundary. The key is just to make sure we have
| laws and norms that provide sufficient incentive for new
| ideas to continue to be created.
| whatshisface wrote:
| The key difference between plagiarism and building on
| someone's work is whether you say, "this is based on code by
| linsey at github.com/socialnorms" or "here, let me write
| that for you."
| ekropotin wrote:
| > If a human did this we probably would have a word for them.
|
| What do you mean? The programmers work is literally combining
| the existing patterns into solutions for problems.
| Aeolun wrote:
| Software engineer? You think I cite all the code I've ever
| seen before when I reproduce it? That I even remember where
| it comes from?
| fooker wrote:
| > If a human did this we probably would have a word for them.
|
| Humans do this all the time.
| chongli wrote:
| Why is this something a Wayland compositor (a glorified window
| manager) needs to worry about? Apple figured this out back in
| the 1990s with ColorSync and they did it once for the Mac OS
| and any application that wanted colour management could use the
| ColorSync APIs.
| hedgehog wrote:
| Color management infrastructure is intricate. To grossly
| simplify: somehow you need to connect together the profile
| and LUT for each display, upload the LUTs to the display
| controller, and provide appropriate profile data for each
| window to their respective processes. During compositing,
| convert any buffers that don't already match the output
| (unmanaged applications will probably be treated as sRGB,
| color managed graphics apps will opt out of conversion and do
| whatever is correct for their purpose).
| pfix wrote:
| I checked the source of the original (like maybe many of you) to
| see how they actually did it, and it was... simpler than
| expected. I drilled myself so hard to forget tables as layout...
| And here it is. So simple it's a marvel.
| COAGULOPATH wrote:
| And they do hacky things like space elements vertically using
| <br> tags.
| fortyseven wrote:
| Look at that stupid dog. It's reading a book, but it's really
| trashy YA. It's not even Shakespeare. Dogs are stupid.
| thecr0w wrote:
| lol, that is fair criticism
| satisfice wrote:
| But the dog's owner keeps saying that it ONLY reads
| Shakespeare. The dog's owner wants millions of dollars for the
| dog on that basis.
|
| I'm not mad at the dogs, I'm mad at the stupid investors and
| the lying owners.
| vmg12 wrote:
| We don't know how to build it anymore
| stwsk wrote:
| >Look, I still need this Space Jam website recreated.
|
| Now that's a novel sentence if I've ever read one.
| nickdothutton wrote:
| I have recently been working on something "fun" in the terminal
| that mingles plain ASCII, ANSI "graphics", actual bitmaps
| (Sixel), and Nerdfonts in a TUI framework (Charm etc). After a
| week of smashing Claude's head against a wall, which is better
| than smashing my own, I've had to significantly alter my hopes
| and expectations.
| isoprophlex wrote:
| I have a very weird tangential nit to pick: gendering LLMs. I
| swear I'm not pushing any sort of gender agenda/discussion that
| can be had anytime anywhere else in the current age, but to me
| there is something quintessentially a-gendered about the output
| of a computer program.
|
| Calling Claude (or GPT-5 or Gemini or my bash terminal for that
| matter) a "he" seems absurd to the point of hilarity.
|
| In my mind, they've always firmly been "it"s.
| bojan wrote:
| This felt quirky to me as well, possibly because my native
| language is strictly gendered.
| DocTomoe wrote:
| Hm, Claude is a common male given name, especially in Europe.
| That plays into it. Also many people - including me - have
| personalised their AI chats, have given it names, even
| something resembling a personality (it's easy with prefix
| prompts). Why others do it, who knows, I do it because I find
| it a lot less frustrating when ChatGPT fucks up when it
| pretends to be a young adult female klutz.
| johnfn wrote:
| Context is king. The problem is that you are the one currently
| telling Claude how close it is and what to do next. But if you
| give it the tools to do that itself, it will make a world of
| difference.
|
| Give Claude a way to iteratively poke at what it created (such as
| a Playwright harness), a screenshot of what you want, and a way
| to take its own screenshots in Playwright, and I think you will
| get much closer. You might even be able to one-shot it.
|
| I've always wondered what would happen if I gave it a screenshot
| and told it to iterate until the Playwright screenshot matched
| the mock screenshot, pixel perfect. I imagine it would go nuts,
| but after a few hours I think it would likely get it. (Either
| that or minor font discrepancies and rounding errors would cause
| it to give up...)
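The feedback signal such a loop would need can be sketched minimally: a function that scores how far the current render is from the mock (hypothetical sketch; capturing the actual screenshots via Playwright is assumed and omitted, and the grids here stand in for grayscale pixel data):

```python
def pixel_diff(mock, attempt, tolerance=8):
    """Fraction of pixels whose grayscale values differ by more than
    `tolerance` - a progress signal the loop can try to drive to 0."""
    assert len(mock) == len(attempt) and len(mock[0]) == len(attempt[0])
    mismatched = sum(
        1
        for row_m, row_a in zip(mock, attempt)
        for m, a in zip(row_m, row_a)
        if abs(m - a) > tolerance
    )
    return mismatched / (len(mock) * len(mock[0]))

# Identical grids score 0.0; a misplaced element shows up as a score > 0.
mock = [[0, 0, 255], [0, 0, 255]]
attempt = [[0, 255, 0], [0, 0, 255]]
score = pixel_diff(mock, attempt)  # 2 of 6 pixels differ
```

The `tolerance` parameter is one way to absorb the minor font and rounding discrepancies mentioned above rather than letting them stall the loop.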
| sqircles wrote:
| > The Space Jam website is simple: a single HTML page, absolute
| positioning for every element...
|
| Absolute positioning wasn't available until CSS2 in 1998. This is
| just a table with crafty use of align, valign, colspan, and
| rowspan.
| DocTomoe wrote:
| Which would also render differently on every machine, based on
| browser settings, screen sizes, and available fonts.
|
| Like the web was meant to be. An _interpreted_ hypertext
| format, not a pixel-perfect brochure for marketing execs.
| jeanlucas wrote:
| >Like the web was meant to be.
|
| what?
| masswerk wrote:
| Hum, table cells provide the max-width and images a min-width;
| heights are absolute (with table cells spilling over, as with
| CSS "overflow-y: visible"); aligns and maybe HSPACE and
| VSPACE attributes do the rest. As long as image heights
| exceed the effective line-height and there's no visible text,
| this should render pixel perfect on any browser then in use.
| In this case, there's also an absolute width set for the
| entire table, adding further constraints. Table layouts can
| be elastic, with constraints or without, but this one should
| be pretty stable.
|
| (Fun fact, the most amazing layout foot-guns, then: Effective
| font sizes and line-heights are subject to platform and
| configuration (e.g., Win vs Mac); Netscape does paragraph
| spacing at 1.2em, IE at 1em (if this matters, prefer `<br>`
| over paragraphs); frame dimensions in Netscape are always
| calculated as integer percentages of window dimensions, even
| if you provide absolute dimensions in pixels, while IE does
| what it says on the tin (a rare example), so they will be the
| same only by chance and effective rounding errors. And, of
| course, screen gamma is different on Win and Mac, so your
| colors will always be messed up - aim for a happy medium.)
| shortformblog wrote:
| Claude can't properly count the number of characters in a
| sentence. It's asking a lot to assume it can get pixel perfect.
| simonw wrote:
| I wonder if Gemini 3 Pro would do better at this particular test?
| They're very proud of its spatial awareness and vision abilities.
| thenumpaduser wrote:
| We are actually spoiled at this point.
| echelon wrote:
| [flagged]
| dang wrote:
| Could you please stop posting this sort of indignant-
| sensational comment? It's not what this site is for, as you
| know (or should know).
|
| https://news.ycombinator.com/newsguidelines.html
| echelon wrote:
| Dang, can you explain how this is indignant or sensational?
|
| Anthropic's leadership and researchers continue to this day
| to post messages saying engineering will be fully
| automated. I can go find recent messages on X if you'd
| like.
|
| This forum is composed mostly of engineers, who will be
| the most impacted if their vision of the world pans out.
|
| YC depends on innovation capital to make money. If the
| means of production are centralized, how does YC make any
| money at all from engineers? Such a world will be
| vertically and horizontally integrated, not democratically
| spread for others to take advantage of.
|
| Now I don't think that's what's going to happen, but that's
| what the messaging has been and continues to be from
| Anthropic's leadership, researchers, and ICs.
|
| Why should we support companies like this?
|
| Shouldn't we advocate for open models where any market
| participant can fully utilize and explore the competitive
| gradients?
|
| I don't think I'm saying anything controversial here.
|
| Furthermore, if this pans out like it seems it will - a set
| of three or four AI hyperscalers - we'll also be in the
| same situation we have today with the big tech
| hyperscalers.
|
| Due to a lax regulatory environment, these companies put a
| ceiling on startup exits by funding internal competition,
| buying competitors, etc. I don't see how the situation will
| improve in an AI world.
|
| If you're a capitalist, you want competition to be fierce
| and fair. You don't want concentration of power.
|
| I can see how an Anthropic IC might not like this post, but
| this should be fairly reasonable for everyone else who
| would like to see more distribution of power.
| DocTomoe wrote:
| I keep wondering ... is this a good benchmark? What is a
| practical use-case for the skills Claude is supposed to present
| here? And if the author needs that particular website re-created
| with pixel-perfect accuracy, wouldn't it be simpler to just do
| it yourself?
|
| Sure, you can argue this is some sort of modern ACID-Test - but
| the ACID tests checked for real-world use-cases. This feels more
| like 'I have this one very specific request, the machine doesn't
| perfectly fulfill it, so the machine is at fault.' Complaining
| from a high pedestal.
|
| I'm more surprised at how close Claude got in its reimagined
| SpaceJam-site.
| sema4hacker wrote:
| > The total payload is under 200KB.
|
| Just out of curiosity, how big was what you considered Claude's
| best attempt to be?
| tehjoker wrote:
| Hmm you note that the problem is the LLM doesn't have enough
| image context, but then zoom the image more?
|
| Why not downscale the image and feed it as a second input so that
| entire planets fit into a patch, and instruct it to use the
| downsampled image for coarse coordinate estimation?
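The coarse-to-fine idea above can be illustrated on plain pixel grids (a hypothetical sketch; in practice you would downscale the actual screenshot and hand both versions to the model):

```python
def downsample2x(img):
    """Average each 2x2 block of a grayscale grid into one pixel
    (box filter), halving both dimensions."""
    h, w = len(img), len(img[0])
    return [
        [
            (img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4
            for x in range(0, w - 1, 2)
        ]
        for y in range(0, h - 1, 2)
    ]

def to_full_res(coarse_xy, factor=2):
    """Map a coordinate estimated on the downsampled image back to
    full-resolution pixel coordinates."""
    cx, cy = coarse_xy
    return (cx * factor, cy * factor)

img = [[0, 0, 10, 10],
       [0, 0, 10, 10],
       [20, 20, 30, 30],
       [20, 20, 30, 30]]
small = downsample2x(img)  # [[0.0, 10.0], [20.0, 30.0]]
```

A whole planet that spans many vision patches at full resolution fits into one patch after a few halvings, which is the point of the suggestion.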
| 960design wrote:
| Claude argued with me about the quadratic equation the other day.
| It vehemently felt a -c was required whereas a c was the correct
| answer. I pointed this out showing step by step and it finally
| agreed. I tried Grok to see if it could get it right. Nope, the
| exact same response as Claude, but Grok never backed down; even
| after the step by step explanation of the maths.
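For reference, the only place a sign on c enters the standard formula is through the discriminant term -4ac; a quick sanity check of the formula on a worked example (a sketch assuming real roots):

```python
import math

def quadratic_roots(a, b, c):
    """Roots of a*x**2 + b*x + c = 0 via the standard formula
    x = (-b +/- sqrt(b**2 - 4*a*c)) / (2*a)."""
    disc = b * b - 4 * a * c
    r = math.sqrt(disc)  # assumes disc >= 0 for this sketch
    return ((-b + r) / (2 * a), (-b - r) / (2 * a))

# x**2 - 5x + 6 = 0 factors as (x - 2)(x - 3), so the roots are 3 and 2.
roots = quadratic_roots(1, -5, 6)
```

Substituting each root back into the polynomial is the step-by-step check that finally convinced Claude (and failed to convince Grok).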
| kristofferc wrote:
| Can you link to the conversation log?
| pbhjpbhj wrote:
| In the generalised solution there is a '-c' term with
| coefficient '4a'...?
|
| I'm not well at the moment, perhaps your ML model has flu?!!
| rickcarlino wrote:
| I look forward to an alternative reality where AI vendors race to
| have the model with the best Space Jam Bench scores.
| jacquesm wrote:
| It beats pelicans by a considerable margin.
| computersuck wrote:
| Why not just host a copy from waybackmachine?
| Madmallard wrote:
| I wrote a 20,000 line multiplayer battle-arena game in XNA back
| in 2015 with manually coded physics (so everything is there in
| the code) and have tried several times with Claude, Gemini, Grok,
| DeepSeek, and GPT to translate it to JavaScript.
|
| They all fail massively 100% of the time. Even if I break it down
| into chunks once they get to the chunks that matter the most
| (i.e. physics, collision detection and resolution, event handling
| and game logic) they all break down horribly and no amount of
| prompting back and forth will fix it.
| RagnarD wrote:
| Why not just feed it the actual instructions that create the site
| - the page source code, the HTML, CSS, JS if any?
| masswerk wrote:
| This may be an even greater challenge: analysing a table layout
| and recreating it in CSS with absolute positioning.
| computersuck wrote:
| this is just AI brainrot disease
|
| Help, I can't recreate a website with AI! There's no other way,
| no way I could fix up some HTML code! Believe me, I'm an
| engineering manager with a computer science degree!
|
| Absolutely disgusting.
| derektank wrote:
| I'm pretty confident the author was being sarcastic
| victorbuilds wrote:
| Building something similar - using Claude API to generate mini
| games from text descriptions (https://codorex.com, still pretty
| rough).
|
| Can confirm: Claude is weirdly good at generating functional game
| logic from vague prompts, but spatial precision is a constant
| battle. Anything involving exact pixel positions needs
| validation/correction layers on top.
|
| The suggestion upthread about having it write its own measurement
| tools seems promising - haven't tried that approach yet.
| victorbuilds wrote:
| Actually, here's a relevant example - had Claude generate an
| interactive solar system with circular orbits and moons:
| https://codorex.com/shared/yeABdJWvRHAKqHs2kxpRnZNZPWmqWSu9
|
| Interesting that it handles this fine (functional orbital
| mechanics, animation) but would probably struggle to recreate
| the exact pixel positions of the Space Jam layout. Confirms the
| pattern: good at "make something like X" but bad at "recreate X
| exactly."
| anorwell wrote:
| The article does not say at any point which model was used. This
| is the most basic important information when talking about the
| capabilities of a model, and probably belongs in the title.
| ErrantX wrote:
| I just feel this is a great example of someone falling into the
| common trap of treating an LLM like a human.
|
| They are vastly less intelligent than a human and logical leaps
| that make sense to you make no sense to Claude. It has no concept
| of aesthetics and, of course, no vision.
|
| All that said; it got pretty close even with those impediments!
| (It got worse because the writer tried to force it to act more
| like a human would)
|
| I think a better approach would be to write a tool to compare
| screenshots, identify misplaced items, and output that as a text
| finding/failure state. Claude will work much better because
| you're dodging the bits that are too interpretive (which humans
| rock at and LLMs don't).
| sallveburrpi wrote:
| > vastly less intelligent than a human
|
| I would phrase it more as: they are a completely alien
| "intelligence" that can't really be compared to human
| intelligence.
| oncallthrow wrote:
| Yes, currently they are both smarter and dumber than humans.
| manmal wrote:
| I would put Claude into a loop and let it make screenshots
| itself, diffing them against the original screenshot, until it
| has found the right arrangement at the planets' starting position
| (pixel perfect match).
| epgui wrote:
| At that point you may as well just do the work yourself.
| hooo wrote:
| Why? Just give it access to the playwright mcp server.
| fluidcruft wrote:
| I would try giving it a tool to work with projections and
| process axis by axis to see if it works better in 1D than 2D. I
| dunno how clever claude is about signal processing though.
| There's no noise here so... I mean really it's just template
| matching without rotation and without noise so... But I doubt
| claude can do or reason about basic image processing.
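The projection idea can be sketched simply (hypothetical): collapse both images to 1D column sums, slide the template's profile along the screenshot's to find the best horizontal offset, then repeat with row sums for the vertical axis.

```python
def column_profile(img):
    """Sum each column of a grayscale grid into a 1D profile."""
    return [sum(row[x] for row in img) for x in range(len(img[0]))]

def best_offset(profile, template_profile):
    """Offset minimizing the sum of absolute differences
    (exact match expected, since there is no noise here)."""
    n, m = len(profile), len(template_profile)
    scores = [
        sum(abs(profile[o + i] - template_profile[i]) for i in range(m))
        for o in range(n - m + 1)
    ]
    return scores.index(min(scores))

screenshot = [[0, 0, 5, 9, 5, 0],
              [0, 0, 5, 9, 5, 0]]
template = [[5, 9, 5],
            [5, 9, 5]]
x_offset = best_offset(column_profile(screenshot), column_profile(template))
```

Two 1D searches over width + height positions replace one 2D search over width x height positions, which is why the axis-by-axis framing can be attractive.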
| neuroelectron wrote:
| My web-dev friend saw the original Space Jam site. I asked him
| what it would cost to build something like that today. He paused
| and said:
|
| We can't. We don't know how to do it.
| Aeolun wrote:
| I think claude could have easily used a script to calculate the
| positions of the planets exactly here, instead of trying to use
| the frankly horrible image recognition.
| fluidcruft wrote:
| Claude could probably have written a script to do it. I'm
| hunting assets to confirm.
|
| Edit: yup. Most of the wall time was spent installing cv2 in a
| venv.
|
| Prompt:
|
| ME> In this directory there are two images: @p-pressbox.gif and
| @screenshot.png. @p-pressbox.gif is an image that exists
| somewhere inside @screenshot.png Write a script that finds the
| precise location
|
| CLAUDE: Looking at the images, I can see that p-pressbox.gif
| (the spaceship with "PRESS BOX SHUTTLE" text) appears in the
| upper-left area of screenshot.png. I'll write a Python script
| using OpenCV to find the precise location.
|
| [ some venv stuff/telling claude to use uv and python 3.13 ]
|
| CLAUDE: The script found the precise location of p-pressbox.gif
| within screenshot.png:
|
|     Measurement          Value
|     -------------------  ---------------
|     Top-left corner      (87, 110)
|     Bottom-right corner  (218, 166)
|     Center               (152, 138)
|     Template size        131 x 56 pixels
|     Match confidence     97.2%
|
| The script is at /tmp/search/find_image.py and uses OpenCV's
| template matching with normalized cross-correlation.
|
| ME> As part of the output create a visualization of the
| @p-pressbox.gif positioned on top of the original screenshot
|
| CLAUDE: * Done! The visualization shows a green rectangle
| highlighting the exact location where p-pressbox.gif was found
| within the screenshot. The match is at coordinates (87, 110) to
| (218, 166).
|
| Result visually confirmed.
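The technique behind the script (OpenCV template matching) boils down to an exhaustive sliding-window comparison; a dependency-free sketch on tiny grayscale grids, using sum of squared differences instead of OpenCV's normalized cross-correlation (the script and coordinates above are Claude's, not this):

```python
def find_template(image, template):
    """Exhaustive template match on grayscale grids: return the (x, y)
    top-left corner with the smallest sum of squared differences."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best, best_xy = None, None
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            ssd = sum(
                (image[y + j][x + i] - template[j][i]) ** 2
                for j in range(th)
                for i in range(tw)
            )
            if best is None or ssd < best:
                best, best_xy = ssd, (x, y)
    return best_xy

image = [[0, 0, 0, 0],
         [0, 7, 3, 0],
         [0, 2, 9, 0],
         [0, 0, 0, 0]]
template = [[7, 3],
            [2, 9]]
location = find_template(image, template)  # (1, 1)
```

Since the GIF appears verbatim in the screenshot, an exact (zero-difference) match exists, which is why the approach works without any tolerance for noise.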
___________________________________________________________________
(page generated 2025-12-07 23:00 UTC)