[HN Gopher] WebSim, WorldSim and the Summer of Simulative AI
___________________________________________________________________
WebSim, WorldSim and the Summer of Simulative AI
Author : swyx
Score : 58 points
Date : 2024-04-27 12:11 UTC (10 hours ago)
(HTM) web link (www.latent.space)
| swyx wrote:
| author here! I absolutely enjoyed interviewing Joscha Bach, who
| was gracious enough to give 30 minutes of his time with zero
| prep and no idea who I was. I am also in a unique position to
| report on the rise of both WorldSim and WebSim, as I literally
| saw them both happen up close. questions welcome!
|
| if you liked the ChatGPT Virtual Machine story from 2022:
| https://news.ycombinator.com/item?id=33847479
|
| you will like this.
|
| if you enjoy behind-the-scenes content, I livestreamed the
| making of the video, audio, and essay last night with a few
| people on Twitter/YouTube: https://x.com/swyx/status/1784110650777854148
|
| comments and tough love welcome!
| fjkdlsjflkds wrote:
| A quick comment: the idea seems interesting/entertaining, but
| the requirement to log in with a Google account will make some
| people (like me) not even try it.
| ClassicRob wrote:
| Login with Google was just the quickest way for us to get
| auth working; we'll roll out more ways to sign in soon. Thanks
| for the feedback!
| mlb_hn wrote:
| nice overview of progress over time. are there quantitative
| metrics for the sim capabilities, or is it mostly vibes?
| ClassicRob wrote:
| Cofounder of Websim here. Right now it's not clear that there's
| any eval for a language model's simulation capabilities.
| Internally, we've (vibe) tested Llama 3, Command R+, WizardLM
| 8x22B, Mistral Large (the first version of Websim came out of a
| Mistral hackathon), and GPT-4 Turbo, and found them all lacking,
| due either to meh website outputs or to mode collapse from
| reinforcement learning (a lack of creativity and flexibility).
| That may also be a "skill issue," because our system prompt is
| very much optimized for Claude 3's "mind." In the next week or
| two we'll release functionality that lets users update the
| system prompt, at which point this may be less of an issue.
|
| Claude 3 has a much broader latent space and seems to "enjoy"
| imagining things. It hasn't been banged into too specific an
| assistant shape, and it doesn't suffer the same degree of "mode
| collapse":
| https://lesswrong.com/posts/t9svvNPNmFf5Qa3TA/mysteries-of-m...
|
| Even Sonnet produces mind-blowingly good outputs
| (https://x.com/RobertHaisfield/status/1774579381132050696).
| Haiku is capable of producing full websites with insightful and
| creative content, even if it isn't as capable as Sonnet/Opus.
| For example, I found Curio, an esolang where every line of code
| is a living, sentient being with its own unique personality,
| memories, and goals, mostly by browsing around with Haiku
| (https://x.com/RobertHaisfield/status/1782586807261233620).
| That said, Haiku tends to perform better when it is few-shot
| prompted with outputs from Sonnet or Opus earlier in the
| "browser history."
| smusamashah wrote:
| https://websim.ai/ is the website of the project discussed in
| the article.
___________________________________________________________________
(page generated 2024-04-27 23:01 UTC)