hngopher.com

       [HN Gopher] WebSim, WorldSim and the Summer of Simulative AI
       ___________________________________________________________________
        
       WebSim, WorldSim and the Summer of Simulative AI
        
       Author : swyx
       Score  : 58 points
       Date   : 2024-04-27 12:11 UTC (10 hours ago)
        
 (HTM) web link (www.latent.space)
        
       | swyx wrote:
       | author here! I absolutely enjoyed interviewing Joscha Bach, who
       | was gracious enough to give 30 minutes of his time with zero prep
       | and no idea who I was. I am also in a unique position to report on
       | the rise of both WorldSim and WebSim as I literally saw them both
       | happen up close. questions welcome!
       | 
       | if you liked the ChatGPT Virtual Machine story from 2022:
       | https://news.ycombinator.com/item?id=33847479
       | 
       | you will like this.
       | 
       | if you enjoy behind the scenes, i live streamed the making of the
       | video, audio, and essay last night with a few people on
       | twitter/youtube https://x.com/swyx/status/1784110650777854148
       | 
       | comments and tough love welcome!
        
         | fjkdlsjflkds wrote:
         | A quick comment: The idea seems interesting/entertaining, but
         | the requirement to login with a Google account will make some
         | people (like me) simply not even try it.
        
           | ClassicRob wrote:
           | Login with google was just the quickest thing we could do to
           | get auth, we'll roll out more ways to sign in soon. Thanks
           | for the feedback!
        
       | mlb_hn wrote:
       | nice overview of progress over time. are there quant metrics for
       | the sim capabilities or is it mostly vibes?
        
         | ClassicRob wrote:
         | Cofounder of Websim here. Right now it's not clear that there's
         | any eval for a language model's simulation capabilities.
         | Internally, we've (vibe) tested Llama 3, Command R+, WizardLM
         | 8x22b, Mistral Large (first version of Websim came out of a
         | Mistral hackathon) and GPT-4 Turbo and found them all lacking,
         | due to either meh website outputs or mode collapse from
         | reinforcement learning (lack of creativity and flexibility).
         | That may also be a "skill issue" thing, because our system
         | prompt is very much optimized for Claude 3's "mind." We'll
         | release functionality in the next week or two that lets users
         | update the system prompt, in which case this may be less of an
         | issue.
         | 
         | Claude 3 has a much broader latent space, and seems to "enjoy"
         | imagining things. It hasn't been banged into too specific an
         | assistant shape, and doesn't suffer the same degree of "mode
         | collapse":
         | https://lesswrong.com/posts/t9svvNPNmFf5Qa3TA/mysteries-of-m...
         | 
         | Even Sonnet produces mindblowingly good outputs
         | (https://x.com/RobertHaisfield/status/1774579381132050696).
         | Haiku is capable of producing full websites with insightful and
         | creative content, even if it isn't as capable as Sonnet/Opus.
         | For example, I found Curio, an esolang where every line of code
         | is a living, sentient being with its own unique personality,
         | memories, and goals, mostly by browsing around with Haiku
         | (https://x.com/RobertHaisfield/status/1782586807261233620).
         | That said, Haiku tends to perform better when it is few-shot
         | prompted with outputs from Sonnet or Opus earlier in the
         | "browser history."
        
       | smusamashah wrote:
       | https://websim.ai/ is the website of the project being discussed
       | in the article
        
       ___________________________________________________________________
       (page generated 2024-04-27 23:01 UTC)