[HN Gopher] Genie 2: A large-scale foundation world model
       ___________________________________________________________________
        
       Genie 2: A large-scale foundation world model
        
       Author : meetpateltech
       Score  : 709 points
       Date   : 2024-12-04 14:45 UTC (8 hours ago)
        
 (HTM) web link (deepmind.google)
 (TXT) w3m dump (deepmind.google)
        
       | jjice wrote:
       | I don't understand this space very well, but this seems
       | incredible.
       | 
       | Something I find interesting about generative AI is how it adds a
       | huge layer of flexibility, but at the cost of lots of
       | computation, while a very narrow set of constraints (a
       | traditional program) is comparatively incredibly efficient.
       | 
       | If someone spent a ton of time building out something simple in
       | Unity, they could get the same thing running with a small
       | fraction of the computation, but this has seemingly infinite
       | flexibility based on so little and that's just incredible.
       | 
       | The reason I mention it is because I'm interested in where we end
       | up using these. Will traditional programming be used for most
       | "production" workloads with gen AI being used to aid in the
       | prototyping and development of those traditional programs, or
       | will we get to the point where our gen AI is the primary driver
       | of software?
       | 
        | I assume that concrete code will always be faster and the best
        | way to have deterministic results, but I really have no idea how
        | to conceptualize what the future looks like now.
        
         | Retric wrote:
          | Longer term, computation isn't really the limiting factor for
          | generative AI; it's training data. Generative AI is like Google
          | search before the web responded to the search engine existing.
          | There's a huge quantity of high-quality training data, which
          | nobody had any reason to pollute, ready for the scraping.
         | 
         | But modern search is hampered by people responding to
         | algorithmic indexes. Algorithms responding to metadata without
          | directly evaluating content enabled a world of SEO and low-
          | quality websites suddenly being discoverable as long as they
         | narrow their focus enough.
         | 
          | So longer term, it's going to be an arms race between the output
         | of Generative AI and people trying to keep updating their
         | models. In 20 years people will get much better at using these
         | tools, but the tools themselves may be less useful. I wouldn't
         | be surprised if eventually someone sneaks advertising into the
         | output of someone else's model etc.
        
           | Miraste wrote:
            | This has already happened. Search Google for a few random
           | terms, and go through the first page of web and image
           | results. A decent chunk will be AI-generated.
        
           | golol wrote:
           | I disagree. With more computation you can train a bigger
           | model on the same size training data and it will be better.
            | There is a lot of knowledge on the internet that GPT-4 etc.
           | have not yet learned.
        
             | Retric wrote:
              | The issue is the training data _isn't_ some constant. Let's
              | suppose OpenAI had 10x the computing power but a vastly
              | worse dataset: do you expect a better or worse result?
             | 
             | The question is ambiguous without defining how much worse
             | the dataset is.
        
         | danans wrote:
          | > I assume that concrete code will always be faster and the
          | best way to have deterministic results, but I really have no
          | idea how to conceptualize what the future looks like now.
         | 
         | It will likely be a mix of both concrete code and live AI
         | generated experiences, but even the concrete code will likely
         | be partially AI generated and modified. The ratio will depend
         | on how reliable vs creative the software needs to be.
         | 
         | For example, no AI generated code running pacemakers or power
         | plants. But game world experiences could easily be made more
         | dynamic by generative AI.
        
         | singularity2001 wrote:
          | Makes me wonder if there's any company trying to train a model
          | to produce 3D worlds within Unity (not as video, like Oasis).
        
         | sbarre wrote:
         | > Will traditional programming be used for most "production"
         | workloads with gen AI being used to aid in the prototyping and
         | development of those traditional programs
         | 
         | I mean we're already there with Copilot, Cursor and other tools
         | that use LLMs to assist in coding tasks.
        
       | me551ah wrote:
        | So when can I try this?
        
         | ilaksh wrote:
         | It's Google so I assume never. No model release, no product, no
         | API, no detailed paper.
         | 
          | There was another quite similar model from a different group
          | within the last month or so. I can't remember its name, or
          | whether they released any weights. But it was the same concept.
        
         | vessenes wrote:
         | You'll need to wait until Baidu or AliBaba or Nvidia publish a
         | competing model, unfortunately, if history is any guide.
        
         | mhld wrote:
          | Probably when Genie 10 gets integrated into a Pixel phone.
        
       | vessenes wrote:
       | This is.. super impressive. I'd like to know how large this model
       | is. I note that the first thing they have it do is talk to agents
       | who can control the world gen; geez - even robots get to play
       | video games while we work.
       | 
       | That said; I cannot find any:
       | 
       | - architecture explanation
       | 
       | - code
       | 
       | - technical details
       | 
       | - API access information
       | 
       | Feels very DeepMind / 2015, and that's a bummer. I think the
       | point of the "we have no moat" email has been taken to heart at
       | Google, and they continue to be on the path of great demos, bleh
       | product launches two years later, and no open access in the
       | interim.
       | 
        | That said, just _knowing_ this is possible -- world navigation
        | based on a photo and a text description with up to a minute of
        | held context -- is amazing, and I believe it will inspire some
       | groups out there to put out open versions.
        
         | wongarsu wrote:
          | We already knew it's possible from AI Minecraft
         | (https://oasis.decart.ai). This is just a more impressive
         | version of that, trained on a wider range of games and with
         | more context frames (Oasis has about a second of context, this
         | one a minute). Even the architecture seems to be about the
         | same.
         | 
         | Had they released this two months earlier it would have been
         | incredibly impressive. Now it's still cool and inspiring, but
          | no longer as groundbreaking. It's the cooler version that
         | doesn't come with a demo or any hope of actually trying it out.
         | 
         | And with the things we know from Oasis's demo, the agent-
         | training use case the post tries to sell for Genie 2 is a hard
         | sell. Any attempt to train an agent on such a world would
         | likely look like an AI Minecraft speedrun: generate enough
         | misleading context frames to trick the AI into generating what
          | you want.
        
           | achierius wrote:
           | This is far beyond Oasis. Oasis had approximately 0
           | continuity, and the generated world was a blurry mess. This
           | on the other hand actually approaches usability.
        
         | summerlight wrote:
          | While this is impressive, it still looks like a very early
          | prototype. The overall framing suggests it isn't meant to be a
          | standalone product but part of broader R&D projects toward
          | general agents... I doubt they even have any productionized
          | modeling pipelines for this project yet, and I'm pretty sure
          | we won't have open access anytime soon.
        
           | mclau156 wrote:
            | There are lots of 3D modelers spending hours on 3D worlds and
            | assets to use in training; this seems to automate a lot of
            | that work.
        
         | niceice wrote:
          | Any estimates of how much one of these costs to generate and
          | keep a minute of context?
         | 
         | Secondly, any estimate of how much the price could fall in 5-10
         | years?
        
           | wongarsu wrote:
            | Oasis (the Minecraft world model) can serve about 5 players
            | on 8 H100s in real-time at 20fps at 360p. This is a much more
            | capable model with two orders of magnitude more context. They
            | pretty much say it can't be played in real-time, which I read
            | as them generating less than 15fps at 240p on 8 GPUs. Probably
            | why they talk so much about using it for AI training and
           | evaluation rather than human use. There is a distilled
           | version that works in real-time, but they don't show anything
           | from that version (which is a statement in itself).
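            | 
            | Back-of-envelope on cost (assuming a guessed ~$2.50/hour per
            | rented H100): 8 x $2.50 = $20/hour for 5 players, or roughly
            | $4 per player-hour at Oasis-level quality. A model with two
            | orders of magnitude more context that still can't hit
            | real-time presumably costs some multiple of that.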
           | 
            | For reducing the price, ASICs like Etched's may be the way
            | forward [1]. The models will get bigger for a time, but
           | there may be a lot of room for models that can exploit
           | purpose-built hardware.
           | 
           | 1: https://www.etched.com
        
             | onlyrealcuzzo wrote:
             | > Probably why they talk so much about using it for AI
             | training and evaluation rather than human use.
             | 
             | What would they do / how would they use this output to make
             | a better AI?
        
             | latchkey wrote:
             | Hey! I'd love to know how this performs on 8xMI300x in
             | comparison. Reach out to me?
        
           | llm_trw wrote:
            | The price of LLMs has fallen 1,000x in the last year for
            | tokens of the same quality.
           | 
           | It's not clear if video models will follow the same
           | trajectory.
        
         | lovich wrote:
          | I asked this in a similar thread the other day, but what is
          | with this pattern, as exemplified by the quote below?
         | 
         | > This is.. super impressive. I'd like to know how large this
         | model is. I note that the first thing they have it do is talk
         | to agents who can control the world gen; geez - even robots get
         | to play video games while we work. That said; I cannot find
         | any:
         | 
          | > - architecture explanation
          | > - code
          | > - technical details
          | > - API access information
        
         | whiplash451 wrote:
          | This kind of demo is probably great for hiring top talent:
         | come work here, we have the best models and you'll have your
         | name on the best papers.
        
       | artninja1988 wrote:
       | Looking at the list of authors, is this from their open endedness
       | team? I found their position paper on it super convincing
       | https://arxiv.org/abs/2406.02061
        
         | warkdarrior wrote:
         | Did you link the wrong Arxiv paper?
         | https://arxiv.org/abs/2406.02061 does not look like a position
         | paper nor does it share any authors with this Genie 2 work.
        
           | artninja1988 wrote:
           | Yes, I meant this paper https://arxiv.org/abs/2406.04268
           | Should have double checked, sorry and thank you for pointing
           | it out
        
       | mdrzn wrote:
        | Wow.. I can't even imagine where we'll be 5 or 10 years from
        | now.
        | 
        | Seems that it's only "consistent" for up to a minute, but if
        | progress keeps up at the same rate.. just wow.
        
         | netdevphoenix wrote:
          | Progress is not linear. For all we know, in 2027 things will
          | slow down to a virtual halt for the next 30 years. Look at how
          | much big science progressed in the first 20 years of the 19th
          | and 20th centuries and look how little it has progressed in
          | the first 20 years of this century. We are on the downlow
          | compared to the last centuries, and even if you look at crisp
          | or deep learning, they are not as impactful NOW as, let's say,
          | the germ theory of disease, evolution, the discovery of the
          | double helix structure, or general relativity were. Almost a
          | quarter of a century gone and we don't have much to show for
          | it.
         | 
         | For reference:
         | 
         | 19th century
         | 
         | evolution by natural selection as science
         | 
         | electromagnetism
         | 
         | germ theory of disease
         | 
         | first law of thermodynamics
         | 
         | --------------------------------------------
         | 
         | 20th century
         | 
         | general relativity
         | 
         | quantum mechanics
         | 
         | dna structure
         | 
         | penicillin
         | 
         | big bang theory
         | 
         | --------------------------------------------
         | 
         | 21st century
         | 
         | crisp
         | 
         | deep learning
        
           | dooglius wrote:
           | The things you list for previous centuries aren't limited to
           | the first 20 years
        
             | netdevphoenix wrote:
             | 19th century: electromagnetism, the voltaic pile, the
             | double slit experiment for the light wave theory
             | 
             | 20th century: general/special relativity, radioactive
             | decay, discovery of the electron
             | 
             | 21st century: crisp and deep learning
             | 
              | Hard to deny that the big science of the first 20 years of
              | the previous century looks way more impactful than crisp
              | and deep learning put TOGETHER.
        
               | dekhn wrote:
                | it's called CRISPR, not crisp.
        
               | samvher wrote:
               | 100 years later, sure. What about in December 1924?
        
           | Workaccount2 wrote:
           | >Look how little it has progressed in the first 20 years of
           | this century
           | 
           | This is naivete on the scale of "Cars were much safer 70
           | years ago".
        
           | w10-1 wrote:
           | crispr variants have not particularly improved treatments.
           | 
           | But DNA sequencing and biologics have revolutionized medicine
           | and changed lives.
           | 
            | Also, the computer-as-phone took computing from 100Ms of
            | mostly business users buying optical disks to 3B+ everyday
            | people
           | getting regular system updates and apps on demand accessing
           | real-time information. That change alone far outweighs the
           | impact of anything produced by advanced physics.
           | 
           | As a result we, as developers, now have the power to deliver
           | both messages and experiences to the entire world.
           | 
           | Ideas are cheap, and progress is virtually guaranteed in
           | intellectual history. But execution is exquisitely easy to
           | get wrong. Genie 2 is just Google's first bite at this apple,
           | and milestones and feedback are key to getting something as
           | general as AI right. Fingers crossed!
        
       | lionkor wrote:
       | > deepmind.google uses cookies from Google to deliver and enhance
       | the quality of its services and to analyze traffic. Learn more.
       | 
        | Yippee, finally Google posts a non-conforming cookie popup with
        | no way to reject the ad cookies!
        
       | wildermuthn wrote:
       | The technology is incredible, but the path to AGI isn't single-
        | player. Qualia are the missing dataset required for AGI. See
       | attention-schema theory for how social pressures lead to qualia-
       | driven minds capable of true intelligence.
        
       | simonw wrote:
       | Related recent project you can try out yourself (Chrome only)
       | which hallucinates new frames of a Minecraft style game:
       | https://oasis.decart.ai/
       | 
       | That one would reimagine the world any time you look at the sky
        | or ground. Sounds like Genie 2 solves that: "Genie 2 is capable of
       | remembering parts of the world that are no longer in view and
       | then rendering them accurately when they become observable
       | again."
        
         | echelon wrote:
         | This blows Decart's Oasis (which raised $25 million at $500
         | million valuation) and World Labs (which raised $230 million in
         | complete stealth) out of the water.
         | 
         | Google is firing warning shots to kill off interest in funding
         | competing startups in this space.
         | 
         | I suspect that in 6 months it won't matter as we'll have
         | completely open source Chinese world models. They're already
         | starting to kill video foundation model companies' entire value
         | prop by releasing open models and weights. Hunyuan blows Runway
         | and OpenAI's Sora completely out of the water, and it's 100%
         | open source. How do companies like Pika compete with free?
         | 
         | Meta and Chinese companies are not the leaders in the space, so
         | they're salting the earth with insanely powerful SOTA open
         | models to prevent anyone from becoming a runaway success. Meta
         | is still playing its cards close to its chest so they can keep
         | the best pieces private, but these Chinese companies are
         | dropping innovation left and right like there's no tomorrow.
         | 
         | The game theory here is that if you're a foundation model
         | "company", you're dead - big tech will kill you. You don't have
         | a product, and you're paying a lot to do research that isn't
         | necessarily tied to customer demand. If you're a leading AI
         | research+product company, everyone else will release their
         | code/research to create a thousand competitors to you.
        
           | Workaccount2 wrote:
            | I strongly suspect that, like OpenAI and o1, for-profit
            | companies are going to start locking down whatever advances
            | they find.
            | 
            | There is still an enormous amount of low-hanging fruit that
            | anyone can harvest right now, but eventually big advances are
            | going to require big budgets, and I can only imagine how
            | technically tight-lipped they will be with those.
        
           | senko wrote:
           | > The game theory here is that if you're a foundation model
           | "company", you're dead - big tech will kill you. You don't
           | have a product, and you're paying a lot to do research that
           | isn't necessarily tied to customer demand.
           | 
           | Basically, the foundation model companies are outsourced R&D
            | labs for big tech. They can be kept at arm's length (like
           | OpenAI with Microsoft and Anthropic with Amazon) or be bought
           | outright (like Inflection, although that was a weird one).
           | 
           | Both OpenAI and Anthropic are trying to move away from being
           | pure model companies.
           | 
           | > If you're a leading AI research+product company, everyone
           | else will release their code/research to create a thousand
           | competitors to you.
           | 
            | Trillion-dollar question: is there a competitive edge / moat
            | in vertical integration in AI? Apple proved there was in
            | hardware + OS (which were unbundled in Wintel times). For AI,
            | right now, I can't see one, but I'm just a random internet
            | commentator, who knows.
        
             | refulgentis wrote:
              | I think not; it feels more like a utility to me, until
              | someone pulls their API.
        
           | mrandish wrote:
           | > Chinese companies are not the leaders in the space, so
           | they're salting the earth with insanely powerful SOTA open
           | models to prevent anyone from becoming a runaway success.
           | 
           | While it would be interesting if Chinese companies were
           | releasing their best full models as an intentional strategy
           | to reduce VC funding availability for western AI startups, it
           | would be downright _fascinating_ if the Chinese government
           | was supporting this as a broader geopolitical strategy to
           | slow down the West.
           | 
           | It does make sense but would require a remarkable level of
           | insight, coordination and commitment to a costly yet
           | uncertain strategy.
        
             | whiplash451 wrote:
             | I don't think it requires a remarkable level of insight.
             | 
             | The overall cost for the Chinese government is probably
             | very small in the grand scheme of things. And it makes a
             | lot of sense from a geopolitical strategy.
        
           | whiplash451 wrote:
            | The game has indeed become brutal for foundation model
            | companies.
           | 
           | I am less worried for AI research+product companies: they
           | have likely secured revenue streams with real customers and
           | built domain knowledge in the meantime.
        
         | ilaksh wrote:
          | There is another recent project doing more general game
          | generation, very similar to Genie 2: GameGen-X, which came out
          | last month.
         | https://arxiv.org/html/2411.00769v1
        
         | psb217 wrote:
         | RE: "Genie 2 is capable of remembering parts of the world that
         | are no longer in view and then rendering them accurately when
         | they become observable again." -- This claim is almost
         | certainly wildly misleading. This claim is technically true if
          | there's any scenario where their agent, e.g., briefly looked down
         | at the ground and then back up at the sky and at least one of
         | the clouds in the sky was the same as before looking down.
         | However, I expect most people will interpret the claim far more
         | broadly than the model can support. It's classic weasel
         | wording.
        
           | pfortuny wrote:
           | "remember parts of the world..." not even "some"... That is a
           | tell-tale.
        
           | isotypic wrote:
            | No samples other than the 3 in the "Long horizon memory"
            | section have any camera movement that puts something
            | offscreen and then back onscreen, so it certainly seems that
            | they are stretching the capabilities as far as they can in
            | writing.
        
           | drusepth wrote:
            | Yeah, my best guess is they're probably including the
            | previous N frames as context when generating the next frame.
            | This works to preserve continuity over a short amount of time
            | (as you say, briefly looking at the ground and then back up),
            | but only over a short period.
           | 
           | For these kinds of models to be "playable" by humans (and,
           | I'd argue, most fledgling AI agents), the world state needs
           | to be encoded in the context, not just a visual
           | representation of what the player most recently saw.
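            | 
            | A minimal sketch of that kind of sliding-window conditioning
            | (every name here is hypothetical, just to illustrate why
            | off-screen memory decays; this is not DeepMind's actual API):
            | 
            |     from collections import deque
            | 
            |     context = deque(maxlen=60)  # only the last N frames survive
            | 
            |     def step(model, action):
            |         # Condition only on frames still in the window;
            |         # anything older has been forgotten for good.
            |         frame = model.predict_next_frame(list(context), action)
            |         context.append(frame)
            |         return frame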
        
         | wongarsu wrote:
          | However, the architecture they describe really sounds like it
          | should still have that issue. I doubt they really solved it.
          | 
          | Which is a big problem for the agent-training use case they
          | keep reiterating on the website. Agents are like speedrunners:
          | if there is a stupid exploit, the agent will probably find and
          | use it. And in Oasis the speedrunning meta for getting to the
          | Nether is to find anything red, make it fill the screen, and
          | repeat until the world-generating AI thinks you're looking at
          | lava and must be in the Nether.
        
       | bix6 wrote:
       | Genuine question: What is the point of telling us about this if
       | we can't use it? Is it just to flex on everyone?
        
         | mhld wrote:
          | Some kind of marketing strategy that nobody actually
          | understands.
        
           | jazzyjackson wrote:
            | It's not that opaque; it's recruitment. Basically the same
            | marketing as a university: "We do state-of-the-art research
            | here. If you are a talented researcher who wants to advance
            | the field, you'll want to work here."
            | 
            | Now, how Google plans to make money with all this bleeding-
            | edge research, _that's_ the mystery.
        
         | tootie wrote:
         | It's PR but it's also meant to entice. Let the world know
         | Google is #1 for Gen AI, convince researchers to join Google,
         | convince investors to boost the stock price, make Elon Musk
         | grit his teeth. That kind of thing. In the short term, it may
         | provide a bump in interest for existing AI products from
         | Google.
        
         | ChrisArchitect wrote:
         | The best minds of a generation went from thinking about how to
         | make people click ads to how to generate 3d video game worlds.
        
           | adventured wrote:
           | The best minds were never working on getting people to click
           | on ads. That was an internal industry narrative so people
           | could feel better about themselves.
        
             | fragmede wrote:
             | seems more like an external narrative so people can feel
             | worse about the world
        
           | Workaccount2 wrote:
            | The best minds of the generation are on Wall Street trying to
           | figure out how to quickly spot inefficiently priced options
           | 1% more often.
           | 
           | Seriously, I wish more than anything I was kidding.
        
         | mupuff1234 wrote:
         | An artifact for their promotion packet.
        
         | echelon wrote:
         | To stop competing startups from getting funding.
         | 
         | Decart (Oasis) raised $25 million at $500 million valuation.
         | 
         | World Labs raised $230 million.
        
         | justlikereddit wrote:
         | [flagged]
        
         | xnx wrote:
         | Often to establish that the authors were first in the space for
         | when competitors announce their tech.
        
           | ilaksh wrote:
            | They were not, though; this is very similar to the one that
            | came out last month. https://arxiv.org/html/2411.00769v1
        
         | spencerchubb wrote:
         | Researchers want to publish
         | 
         | Recruiting
        
       | rvz wrote:
        | Hmmm... But we were told on HN that "Google is dying", remember?
        | In reality, it isn't.
       | 
       | We'll see which so-called AI-companies are really "dying" when
       | either a correction, market crash or a new AI winter arrives.
        
       | bearjaws wrote:
       | > Genie 2 is capable of remembering parts of the world that are
       | no longer in view and then rendering them accurately when they
       | become observable again.
       | 
        | This is huge; the Minecraft demos we saw recently were just toys
        | because you couldn't actually do anything in them.
        
         | psb217 wrote:
         | It's worth keeping in mind that "there exists X such that Y is
         | true" is not the same as "Y is true for all X". People love
         | using these sorts of statements since they're technically true
         | as written, but most people will read them in a way that's
          | false. E.g., the statement is true for the Minecraft demos, and
         | for any model which doesn't exhibit literally zero persistence
         | for (temporarily) non-visible state.
        
       | stoicjumbotron wrote:
       | Do people within Google get to try it? If yes, how long is the
       | approval process?
        
       | xcodevn wrote:
        | On a very similar theme, here is the work from World Labs
        | (founded by Fei-Fei Li of ImageNet fame, et al.) about creating 3D
       | worlds:
       | 
       | https://www.worldlabs.ai/blog
        
         | momojo wrote:
         | I find this work much more exciting. They're not just teaching
         | a model to hallucinate given WASD input. They're generating
          | durable, persistent point clouds. It looks so similar to
          | Genie 2, yet they're worlds apart.
        
       | moralestapia wrote:
        | Not even a month ago HN was discussing Ben Affleck's take on
        | actors and AI, somehow siding with him and arguing that the
        | tech is "just not there", etc.
        | 
        | I'll keep my stance: give it two years and very realistic movies,
        | with plot and everything, will be generated on demand.
        
         | tartoran wrote:
          | AI can't generate images without awkward hallucinations yet.
          | From that, to movies that make sense, to movies people would
          | actually want to watch (comparable to feature films) beyond
          | the initial curiosity factor, is a long way, if there is one.
        
           | moralestapia wrote:
            | ChatGPT (no Sora, no world generation, etc.) was released
            | two years ago almost to the day.
            | 
            | What you're talking about is a minor jump from the SOTA, much
            | smaller than what we've already seen in these two years.
        
         | Sateeshm wrote:
         | I'll take that bet
        
       | binalpatel wrote:
       | This is super impressive.
       | 
       | Interesting they're framing this more from the world model/agent
       | environment angle, when this seems like the best example so far
       | of generative games.
       | 
        | 720p, real-time, mostly consistent games for a minute is
        | amazing, considering Stable Diffusion was originally released
        | 2-ish years ago.
        
         | uoaei wrote:
         | Pixelspace is an awful place to be generating 3D assets and
         | maintaining physical self-consistency.
        
           | jeroenvlek wrote:
           | Ultimately even conventional 3d assets are rendered into
           | pixelspace. It all comes down to the constraints in the model
           | itself.
        
             | psb217 wrote:
             | A key strength of conventional 3d assets is that their form
             | is independent of the scenes in which they will be
             | rendered. Models that work purely in pixel space avoid the
             | constraints imposed by representing assets in a fixed
             | format, but they have to do substantial extra work to even
             | approximate the consistency and recomposability of
             | conventional 3d assets. It's unclear whether current
             | approaches to building and training purely pixel-based
             | models will be able to achieve a practically useful balance
             | between their greater flexibility and higher costs. World
             | Labs, for example, seems to be betting that an intermediate
             | point of generating worlds in a flexible but structured
              | format (NeRFs, Gaussian splats, etc.) may produce practical
             | value more quickly than going straight for full freedom and
             | working in pixel space.
        
       | 42lux wrote:
        | I don't know. I get the excitement, but as soon as you turn
        | around and there is something completely different behind you,
        | it breaks the immersion.
        
       | jdlyga wrote:
       | It's very cool, but we've gotten too many of these big bold
        | announcements with no payoff. All it would take is a very
        | limited demo and we'd be much happier.
        
         | rishabhparikh wrote:
         | I'm guessing it would be far too expensive to make a free demo
        
       | sergiotapia wrote:
        | Will the GPU go the way of the sound card, and will we all
        | purchase an "LPU", a Language Processing Unit for AIs to run
        | fast?
        | 
        | I remember there was a brief window where some gamers bought a
        | PhysX card for high-fidelity physics in games. Ultimately they
        | rolled that tech into the CPUs themselves, right?
        
         | 0x1ceb00da wrote:
          | The graphics stuff in modern GPUs is just a software layer on
         | top of a generic processing unit. The name is a misnomer.
        
           | jsheard wrote:
           | Partially true, a significant chunk of modern GPUs are really
           | just very wide general purpose processors, but they _do_
           | still have fixed-function silicon specifically for graphics
           | and probably will for the foreseeable future. Intel tried to
           | lean into doing as much as possible in general purpose
           | compute with their Larrabee GPU project but even that still
           | had fixed-function texture units... and the concept was
            | ultimately a failure which hasn't been revisited.
        
       | k2xl wrote:
        | This is impressive, but why do they all still look like a video
        | game? Could they have this render movie scenes with
        | realistic-looking humans? I wonder if it is due to the training
        | set they use being mostly video games.
        
         | xnx wrote:
          | > This is impressive, but why do they all still look like a
          | video game?
         | 
         | Many of the current AI models have their roots in games: Chess,
         | Go, etc.
        
         | nonameiguess wrote:
         | I highly doubt it. While there is no ceiling in principle on
         | how good rendering can get, even with perfect knowledge of the
         | physics of optics, the cost to compute that physics is too high
         | not to cut some corners. Nature gives you this for free. Every
         | photon is deflected at exactly the right angle and frequency
         | without anything needing to be computed. All you need is a
         | camera to record it. At least for now, this is why every deep
         | fake, digital de-aging, AI upscaling, grafting Carrie Fisher's
         | face onto a different actor, and CGI in general inevitably
         | occupies the uncanny valley.
        
       | corysama wrote:
       | For quite a while now David Holz of Midjourney has mused that
       | videogames will be AI generated. Like a theoretical PlayStation 7
       | with an AI processor replacing the GPU.
       | 
       | But, I didn't expect this much progress towards that quite this
       | fast...
        
         | kypro wrote:
         | Agreed. All I'd say is that these demos look quite limited in
         | their creativity and depth. Good video games are far more than
         | some graphics with a movable character and action states.
         | 
          | A good video game is far more about the world building, the
          | story, the creativity or "uniqueness" of the experience, etc.
          | 
          | Currently this seems to generate fairly generic-looking and
         | shallow experiences. Not hating though. It's early days
         | obviously.
        
         | doctorpangloss wrote:
          | If only it were that simple. Google spent $10B developing
          | Stadia; where was the big hit game from that?
         | 
         | These DeepMind guys play Factorio, they don't play Atari games
         | or shooters, so why aren't they thinking about that? Or maybe
         | they are, and because they know a lot about Factorio, they see
         | how hard it is to make?
         | 
         | There's a lot of "musing" as you say.
        
         | gcr wrote:
         | I've had the idea for a Backrooms-style hallucinatory
         | generative videogame for a while. Imagine being able to wander
         | through infinitely generated surreal indoor buildingscapes that
         | were rendered in close-to-realtime.
         | 
         | It would play to the medium's strengths -- any "glitches" the
         | player experiences could be seen as diagetic corruptions of
         | reality.
         | 
         | The moment we get parameterized NeRF models running in close-
         | to-realtime, I want to go for it.
        
       | devonsolomon wrote:
       | Yesterday I laughed with my brother about how harsh people on the
        | internet were about World Labs' launch ("you can only walk three
       | steps, this demo sucks!"). I was thinking, "this was unthinkable
       | a few years ago, this is incredible".
       | 
       | People of the internet, you were right. Now, this is incredible.
        
         | bilbo0s wrote:
         | World Labs was kind of laughable. But at least you laughed.
         | 
         | Now?
         | 
         | I mean, I don't know man?
         | 
          | With this Genie 2 sneak peek, it all just makes World Labs'
         | efforts look sad. Did they really think better funded
         | independents and majors would all _not_ be interested in
         | generating 3D worlds?
         | 
         | This is a GUBA moment. If you're old enough to know, then you
         | know.
        
       | maxglute wrote:
       | 2000s graphics vibes.
        
       | YeGoblynQueenne wrote:
       | Hey, DeepMind folks, are you listening? Listen. We believe you:
       | you can conquer any virtual world you put your mind to.
       | Minecraft, Starcraft, Warcraft (?), Atari, anything. You can do
       | it! With the power of RL and Neural Nets. Well done.
       | 
       | What you haven't been able to do so far, after many years of
        | trying, is to go from the virtual to the real. Go from Arkanoid
       | to a robot that can play, I dunno, squash, without dying. A robot
       | that can navigate an arbitrary physical location without
       | drowning, or falling off a cliff, or getting run over by a bus.
       | Or build any Lego kit from instructions. Where's all that?
       | 
       | You've conquered games. Bravo! Now where's the real world
       | autonomy?
        
         | sdenton4 wrote:
         | https://sites.research.google/palm-saycan
        
         | aspenmayer wrote:
         | Does Waymo count?
        
       | aithrowawaycomm wrote:
       | It is jaw-dropping and dismaying how for-profit AI companies use
       | long-standing terms like "world model" and "physics" when they
       | mean "video game model" and "video game physics." Or, as you can
       | plainly see, "models gravity" when they mean "models Red Dead
       | Redemption 2's gravity function, along with its cinematic
       | lighting effects and Rockstar's distinctively weighty
       | animations." Which is to say Google is not modeling gravity at
       | all.
       | 
        | I will add that the totally inconsistent backgrounds in the
        | "prototyping" example suggest the AI is simply cribbing from
        | four different games with a flying avatar, which makes it kind of
       | useless unless you're prototyping cynical AI slop. And what are
       | we even doing here by calling this a "world model" if the details
       | of the world can change on a whim? In my world model I can
       | imagine a small dragon flying through my friend's living room
       | without needing to turn her electric lights into sconces and
       | fireplaces.
       | 
       | To state the obvious: if you train your model on thousands of
       | hours of video games, you're also gonna get a bunch of stuff like
       | "leaves are flat and don't bend" or "sometimes humans look like
       | plastic" or "sometimes dragons clip through the scenery," which
        | wouldn't fly in an actual world model. Just call it a "video
        | game world model"! Google is _intentionally_ misusing a term which
       | (although mysterious) has real meaning in cognitive science.
       | 
       | I am sure Genie 2 took an awful lot of work and technical
       | expertise. But this advertisement isn't just unscientific, it's
       | an assault on language itself.
        
         | empath75 wrote:
         | > It is jaw-dropping and dismaying how for-profit AI companies
         | use long-standing terms like "world model" and "physics" when
         | they mean "video game model" and "video game physics." Or, as
         | you can plainly see, "models gravity" when they mean "models
         | Red Dead Redemption 2's gravity function, along with its
         | cinematic lighting effects and Rockstar's distinctively weighty
         | animations." Which is to say Google is not modeling gravity at
         | all.
         | 
          | That's because it's using video game footage for training,
          | since it's cheap and easy to generate. It would not be
          | simulating video game gravity if it were trained on real-world
          | video inputs.
        
         | ricardobeat wrote:
         | Remembering off-screen objects, generating spatially consistent
         | features, modeling physical interactions and lights,
         | understanding what "up the stairs" means, all seem to warrant
          | talking about a _world model_, because that's exactly what's
         | required to do these things compared to simply hallucinating
         | video sequences.
        
         | brap wrote:
         | I agree, but
         | 
         | >if you train your model on thousands of hours of video games
         | 
         | What if you train the same model on thousands of hours of
         | sensor data from real, physical robots?
        
       | brink wrote:
       | What is actually of value here? There's no actual game, it's
       | incredibly expensive to compute, the behavior is erratic.. It's
       | cool because it's new - but that will quickly wear off, and once
       | that's gone, what's left? There's insane amounts of money being
       | spent on this, and for what?
        
         | adverbly wrote:
         | > What is actually of value here?
         | 
          | No one knows yet. AI technology like this is closer to
         | scientific research than it is to product development. AI is
         | basically new magic, and people are in a "discovery" phase
         | where we are still trying to figure out what is possible.
         | Nothing of value was immediately created when they discovered
         | DNA. Productization came much later when it was combined with
         | other technologies to fit a particular use case.
        
         | Menu_Overview wrote:
         | Well, what's next? Beyond prototyping, I imagine this is an
         | early step towards more practical agents building their own
         | world model. Better problem solving.
         | 
         | Prompt: Here's a blueprint of my new house and a photo of my
         | existing furniture. Show me some interior design options.
        
         | ilaksh wrote:
         | It's an obviously amazing research development.
         | 
         | You just don't like AI.
         | 
         | It can be used for training agents, prototyping, video
         | generation, and is quite possibly a glimpse of a whole new type
         | of entertainment or a new way to create video games.
         | 
         | What's the point of the massive amount of money spent on video
         | games in general? Or all of the energy spent moving people back
         | and forth to an office? Or expensive meals at restaurants? Or
         | trillions in weaponry? Or television shows or movies?
        
           | nightski wrote:
            | Video games bring billions of real people joy. This is
            | sitting in some lab at Google, inaccessible to anyone.
        
             | lassenordahl wrote:
             | Is your argument that them sharing research progress and
             | demos doesn't benefit anybody purely because we can't
             | immediately play around with them?
             | 
             | I feel like sharing early closed-source blog-posts is part
             | of the research process. I'm sure someone in this thread
             | has thought of a use case that the Google team missed.
             | Open/closed source arguments here feel premature IMO.
        
               | nightski wrote:
                | It's not part of the research process. Being part of the
                | research process would involve a publication and sharing
               | code/data/results/methods. It's not research unless it
               | can be verified by peers.
               | 
               | This is just a marketing fluff piece that does not
               | benefit anyone and is ego stroking at best.
        
               | lassenordahl wrote:
               | Hm yeah - I think you and I just have differing opinions
               | on the research process. I'd be a bit more vague, and
               | define the publication process as something similar to
               | you.
               | 
               | I still think things like this are important, and at
               | least give folks a bit of time to ideate on what will be
               | possible in a few years. Of course having the model or
               | architecture on hand would be nice, but I'm not holding
               | that against Google here.
        
         | ThouYS wrote:
         | same q here. what can I do with this "world model" that I can't
          | do with a game like Minecraft or Counter-Strike?
         | 
         | asked the same thing a while back, and the answers boiled down
         | to "somehow helps RL agents train". but how exactly? no clue
        
           | ogogmad wrote:
           | Making a computer game is very expensive and time-consuming.
            | This technology might allow a 12-year-old to produce a fully
            | working AAA-quality game on their own for almost nothing. But
            | _sigh_ it's an early demo that needs some improving.
           | 
           | [edited out some barbs I wrote because I find some comments
           | on this website REALLY annoying]
        
             | ThouYS wrote:
             | lol
        
         | awfulneutral wrote:
         | Well, in the future you could imagine that instead of
         | programming a game, you can just generate each individual frame
         | on the fly at 60fps. You could be playing 2D Mario and then the
         | game could have him morph into 3D and take off into space or
         | something. You could also generate any software or OS frontend
         | on the fly really, if you can make it so the AI can keep track
         | of your data and make it consistent enough to be usable. Does
         | this have positive or negative value? I don't know.
        
         | golol wrote:
          | Do you want household androids? Because this kind of research
          | is a very large step towards that. Think of it as an example
          | of making a model understand a lot of physical common-sense
          | stuff, which is the goal for robotics right now.
        
           | suddenlybananas wrote:
           | This is really not the avenue for house-hold robots.
           | Interacting with the actual physical world is very different
           | from _creating_ a video game.
        
             | sangnoir wrote:
             | > Interacting with the actual physical world is very
             | different from creating a video game
             | 
              | The major difference being that the former scales very
              | poorly for generating training data compared to the latter.
              | Genie 2 is not even a video game and has worse fidelity
              | than video games; the upside is it probably scales even
              | better than video games for generating training scenarios.
              | If you want androids in real life, Genie 2 (or similar
              | systems) is how you bootstrap the agent AI. The training
              | pipeline will be: raw video -> Genie 2 -> game engine with
              | rules -> physical robot.
        
               | youoy wrote:
               | > The training pipeline will be: raw video -> Genie 2 ->
               | game engine with rules -> physical robot
               | 
               | One of those arrows is not like the others
        
               | sangnoir wrote:
               | The final step is an oversimplification: purpose-built
               | simulator -> deconstructed robot on a lab workbench ->
               | controlled space -> "real world" with constraints -> real
               | world
               | 
               | Any model would have to succeed in one stage before it
               | can proceed to the next one.
        
               | adverbly wrote:
               | At the risk of sounding repetitive, one of those arrows
               | is not like the others.
        
               | sangnoir wrote:
               | ...and?
        
               | mosdl wrote:
               | How does turning an image into a game help with robots?
               | Robots don't need to guess what they can't see, they
               | would have sensors to tell them exactly what is there
               | (like a self driving car).
        
               | Chilko wrote:
               | I have no expertise in this area, but my assumption is
                | that this could help with a broader sort of object/world
               | permanence for robots - e.g. if something is no longer
               | visible to the robot's sensors (e.g. behind an obstacle,
               | smoke, etc) then it could use a model based on this type
               | of tech to maintain a short-term estimate of its
               | surroundings even when operating blind.
        
               | sangnoir wrote:
               | > Robots don't need to guess what they can't see, they
               | would have sensors to tell them exactly what is there
               | (like a self driving car).
               | 
                | Self-driving cars have cameras as part of their sensor
               | suite, and have models to make sense of sensor data.
               | Video will help with perception and classification
               | (understanding the world) with no agency needed. Game-
               | playing will help with planning, execution, and
               | evaluation. Both functions are necessary, and those that
                | come after rely on earlier capabilities.
        
           | JTyQZSnP3cQGa8B wrote:
           | I don't understand how that is relevant. I certainly would
           | not want household androids unless I'm completely disabled.
        
             | theshackleford wrote:
             | > I certainly would not want household androids unless I'm
             | completely disabled.
             | 
             | That's nice. I'm not completely disabled, but I am
             | disabled, and I very much would appreciate them, as my
             | capability to do things over the longer term is very much
             | not going to go in the direction of improving. As it is,
             | there are a lot of things I now rely on people for, that at
             | one time, I did not.
             | 
              | Whilst I recognise it's probably not going to happen in a
             | time span that is useful to me, I do wish it could, so that
             | I could be less of a burden on those around me, and
             | maintain a relative level of independence.
        
         | mitthrowaway2 wrote:
         | I'm not an expert in this space but I can see the value. It
         | allows an endless loop of generating novel scenarios and
         | evaluating an AI agent's performance within that scenario (for
         | example, "go up the stairs"). A world with one minute of
         | coherence is about enough to evaluate whether the AI's actions
         | were in the right direction or not. When you then want to run
         | an agent on a real task in the real world, with video-input
         | data, you can run the same policy that it learned in dream-
         | world simulation. The real world has coherence, so the AI
         | agent's actions just need to string together well enough
         | minute-by-minute to work toward achieving a goal.
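          | 
          | A toy version of that loop (every name below is made up for
          | illustration, not a published API):
          | 
          |     for _ in range(num_scenarios):
          |         world = genie.generate_world(photo, "a castle with stairs")
          |         obs = world.reset()
          |         for _ in range(60 * fps):  # ~1 minute of coherence
          |             action = agent.act(obs, goal="go up the stairs")
          |             obs = world.step(action)  # next generated frame
          |         agent.update(world.evaluate(goal="go up the stairs"))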
         | 
         | You could use real video games to do this but I guess there'd
         | be a risk of over-fitting; maybe it would learn too precisely
         | what a staircase looks like in Minecraft, but fail to
         | generalize that to the staircase in your home. If they can
         | simulate dream worlds (as well as, presumably, worlds from real
         | photos), then they can train their agents this way.
         | 
         | This would only be training high-level decision policies (ie,
         | WASD inputs). For something like a robot, lower level motor
         | control loops would still be needed to execute those commands.
         | 
         | Of course you could just do your training in the real world
         | directly, because it already has coherence and plenty of
         | environmental variety. But the learning process involves lots
         | of learning from failure, and that would probably be even more
         | expensive than this expensive simulator.
         | 
         | Despite the claims I don't think it does much to help with AI
         | safety. It can help avoid hilarious disasters of an AI-in-
         | training crashing a speedboat onto the riverbank, but I don't
         | think there's much here that helps with the deeper problems of
         | value-alignment. This also seems like an effective way to train
         | robo-killbots who perceive the world as a dreamlike first-
         | person shooter.
        
         | modeless wrote:
         | > It's cool because it's new - but that will quickly wear off,
         | and once that's gone, what's left?
         | 
         | To have this perspective you must believe that this will never
         | get better than it currently is, its limitations will never be
         | fixed, and it will never lead to any other applications. I
         | don't know how people can continue to look at these things with
         | such a lack of imagination given the pace of progress in the
         | field.
        
           | zamadatix wrote:
            | I think the problem has less to do with imagination and more
            | to do with being willing to fail a metric shit ton to find
            | out how, every once in a while, you didn't fail, thanks to
            | some really important and surprising reason you wouldn't have
            | found nearly as quickly by only ever going after what you
            | were already certain of.
        
         | xandrius wrote:
         | Nothing is of value until it is.
        
         | 3abiton wrote:
          | This is an incredible start. The potential is immense; yes,
          | there are kinks, but in 10 years?
        
       | KaoruAoiShiho wrote:
       | This is where the GPU limits on China really hurt. Chinese
       | companies have been dropping great proofs of concept, but
       | because they have been so compute-bottlenecked they can't ever
       | really make something actually competitive or transformative.
        
       | tigerlily wrote:
       | I can... see this being used to solve crime, even solving
       | unsolved mysteries and cold cases, among other alternative
       | applications.
        
         | phtrivier wrote:
         | I don't understand your line of reasoning here. Are you
         | picturing a situation where you would take a photo of a crime
         | scene, and "jump" into a virtual model created from the photo,
         | to help generate intuitions about where to go look for clues ?
         | Kinda like the CSI "enhance quality" meme, but on steroids ?
         | 
         | That would be fun to use, but ultimately pointless. An AI model
         | will generate things that are _statistically plausible_ ;
         | solving crimes usually requires finding out the _truth_.
        
           | tigerlily wrote:
           | You nailed it, and yes I was being lamely ironic. I am
           | however terrified of a future where this type of thing
           | happens, _and people just go along with it_ instead of
           | stating the obvious facts the way you just did.
        
             | mosdl wrote:
             | Remake Blade Runner but with the twist that the snake scale
             | was never actually there.
        
       | rndmize wrote:
       | These clips feel like watching someone dream in real time.
       | Particularly the door ones, where the environment changes in wild
       | fashion, or the middle NPC one, where you see a character walk
       | into shadow and mostly disappear and a different character walks
       | out.
        
       | m3kw9 wrote:
       | " Generating unlimited diverse training environments for future
       | general agents" it may seem unlimited but up to a limited point
       | there will be a pattern. I don't buy that an AI can use a static
       | model and train itself with data generated from it
        
       | diimdeep wrote:
       | What would the equivalent of ChatGPT be for world models, for
       | them to really blow up in utility?
        
         | singularity2001 wrote:
         | text to roblox maybe?
        
       | rationalfaith wrote:
       | As impressive as this might seem let's think about fundamentals.
       | 
       | Statistical models will output a compressed mishmash of what they
       | were trained on.
       | 
       | No matter how hard they try to cover that inherent basic reality,
       | it is still there.
       | 
       | Not to mention the upkeep of training on new "creative" material
       | on a regular basis, and the never-ending bugs due to non-
       | determinism - aside from contrived cases of looking up and
       | synthesizing information (Search Engine 2.0).
       | 
       | The Tech Industry is over-investing in this area, exposing an
       | inherent bias towards output rather than solving actual problems
       | for humanity.
        
       | notsylver wrote:
       | I doubt it, but it would be interesting if they recorded Stadia
       | sessions and trained on that data (... somehow removing the
       | HUD?). Seems like it would be the easiest way for them to get
       | the data for this.
        
         | blixt wrote:
         | Seems somewhat likely to me. They probably even trained a model
         | to do both frame generation and upscaling to allow the hardware
         | to work more efficiently while being able to predict the future
         | based on user input (to reduce perceived latency). Seems like
         | Genie is just that but extrapolated much further.
        
       | worldmerge wrote:
       | This looks really cool. How can I use it? Like can I mix it with
       | Unity/Unreal?
        
       | cptroot wrote:
       | For all that this is lauded as a "prototyping tool", it's
       | frustrating to see Genie2 discarding entire portions of the
       | concept art demo. The original images drawn by Max Cant have
       | these beautiful alien creatures. Large ones floating, and small
       | ones being herded(?). Genie2 just ignores these beautiful details
       | entirely:
       | 
       | > That large alien? That's a tree.
       | > That other large alien? It's a bush.
       | > That herd of small creatures? Fugghedaboutit.
       | > The lightning storm? I can do one lightning pole.
       | > Those towering baobab/acacia hybrids? Actually only two stories
       | > tall.
       | 
       | It feels so insulting to the concept artist to show those two
       | videos off.
        
         | Kiro wrote:
         | That's an odd thing to complain about. Focusing on such a minor
         | issue feels overly critical at this stage, like anything less
         | than a pixel perfect 3D world representation of the source
         | image is unacceptable. Insulting? Come on... Max Cant works at
         | DeepMind so I'm sure he's fine.
        
         | wongarsu wrote:
         | Yeah, those two demos fell flat for me. The model performing
         | badly on inputs far outside the training data is fine, but
         | those two videos belong in the outtakes section or maybe a
         | limitations section, not next to text lauding the "out-of-
         | distribution generalization capabilities". The videos show the
         | opposite of what's claimed.
        
       | zja wrote:
       | I love the outtakes section at the bottom. It made me laugh but
       | it also feels more transparent than a lot of GenAI stuff that's
       | being announced.
        
       | tsunamifury wrote:
       | I'm guessing from the demo that sophisticated indoor
       | architectures do not work yet.
        
       | CaptainFever wrote:
       | As a game developer, I'm impressed and thinking of ideas of what
       | to do with this kind of tech. The sailboat example was my
       | favourite.
       | 
       | Depending on how controllable the tech ends up being, I suppose.
       | Could be anywhere from a gimmick (which is still nice) to a game
       | engine replacement.
        
         | echelon wrote:
         | You could compress down a game to run on cheap hardware
         | acceleration. No more Unreal Engine with crazy requirements.
         | Once the hallucinations are fixed, you even get better
         | lighting.
         | 
         | This is the Unreal Engine killer. Give it five years.
        
           | noch wrote:
           | > This is the Unreal Engine killer. Give it five years.
           | 
            | We need to calm down with the clickbait-addled thinking that
            | _"this new thing kills this established, powerful, tested,
            | useful thing."_ :-)
           | 
           | Game developers have been discussing these tools at length,
           | after all, they are the group of software developers who are
           | most motivated to improve their workflow. No other group of
           | software developers comes close to gamedevs' efficiency
           | requirements.
           | 
           | The 1 thing required for serious developers is control. As
           | such, game engines like Unreal and in-house engines won't
           | die.
           | 
           | Generative tools will instead open up a whole new, but quite
           | different, way of creating interactive media and games. Those
           | who need maximum control over every frame and every
            | millisecond and CPU cycle will still use engines. The rest who
           | don't will be productive with generative tools.
        
             | echelon wrote:
             | > gamedevs' efficiency requirements
             | 
             | These models won't need you to retopo meshes, write custom
             | shaders, or optimize Nanite or Lumen gameplay. They'll
             | generate the final frames, sans traditional graphics
             | processing pipeline.
             | 
             | > The 1 thing required for serious developers is control
             | 
             | Same with video and image models, and there's tremendous
             | work being done there as we speak.
             | 
             | These models will eventually be trained to learn all of
             | human posture and animation. And all other kinds of physics
             | as well. Just give it time.
             | 
             | > Those who need maximum control over every frame and every
             | millisecond and CPU cyle will still use engines.
             | 
             | Why do you think that's true? These techniques can already
             | mimic the physics of optics better than 80 years of doing
             | it with math. And they're doing anatomy, fluid dynamics,
             | and much more. With far better accuracy than game engines.
             | 
             | These will get faster and they will get controllable.
        
               | noch wrote:
                | > Why do you think that's true?
                | > These will get faster and they will get controllable.
               | 
               | Brother, you're preaching to the choir. I've been
               | shilling generative tools for gamedev far harder than you
               | are in your reply. :-)
               | 
                | But I'm just relaying what actual gamedevs, working and
                | writing code right now, need for the projects that have
                | been started or planned, now and for the foreseeable
                | future. As Mike Acton says, "the problem is the problem".
               | 
               | > These techniques can already mimic the physics of
               | optics better than 80 years of doing it with math.
               | 
                | I encourage you to talk to actual gamedevs. When
                | designing a game, you aren't trying to mimic physics:
                | you're trying to make a simulation of physics that
                | _feels_ a certain way that you want it to play. This
                | applies to fluid dynamics, lighting/optics, everything.
                | 
                | For example, if I'm making a sailing simulator, I need
                | to be able to script the water at points where it
                | matters for _gameplay_ and _game-feel_, not simulate
                | real physics. I'm willing to break the rules of physics
                | so that my water doesn't act or look like real water but
                | feels good to play.
               | 
               | Movement may be motion captured, but animation is tweaked
               | so that the characters control and play in a way that the
               | game designer feels is correct for his game.
               | 
                | If you haven't designed a game, I encourage you to try
                | to make a simple Space Invaders clone over the weekend,
                | then think about the physics in it and try to make it
                | _feel_ good or work in an interesting way. Even in
                | something that rudimentary, you'll notice that your
                | simulation is something you test and tweak until you
                | arrive at parameters that you're happy with but that
                | aren't real physics.
        
       | Const-me wrote:
       | The scrolling doesn't work in my MS Edge so I opened the page in
       | Firefox. Firefox has "Open Video in New Tab" context menu
       | command. When viewed that way, the videos are not that
       | impressive. Horrible visual quality, Egyptian pyramids of random
       | shapes which cast round shadows, etc.
       | 
       | I have a feeling many AI researchers are trying to fix things
       | which are not broken.
       | 
       | Game engines are not broken; no reasonable amount of AI TFlops
       | is going to approach a professional with UE5. DAWs are not
       | broken; no reasonable amount of AI TFlops is going to approach a
       | professional with Steinberg Cubase or Apple Logic.
       | 
       | I wonder why so many AI researchers are trying to generate the
       | complete output with their models, as opposed to training model
       | to generate some intermediate representation and/or realtime
       | commands for industry-standard software?
        
       | rougka wrote:
       | Waiting for OpenAI to take this concept and make it into a
       | product
        
       | qwertox wrote:
       | This is... something different. It will be interesting to see
       | how we will integrate our current 3D tooling into that prompt-
       | based world. Sometimes "place a button next to the door" isn't
       | the same as selecting a button and then clicking on the place
       | next to the door, as it is today, or as sculpting terrain with a
       | brush - all heavily 3D-oriented operations involving
       | transformation matrix calculations - while that prompt-based
       | world is built through words.
       | 
       | The current tooling we have is just way too good to just discard
       | it, think of Maya, Blender and the like. How will these
       | interfaces, with the tools they already provide, enable sculpting
       | these word-based worlds?
       | 
       | I wonder if some kind of translator will be required, one which
       | precisely instructs "User holds a brush pointing 33deg upwards
       | and 56deg to the left of the world's x-axis with a brush
       | consisting of ... applied with a strength of ...", or how this
       | will be translated into embeddings or whatever else will be
       | required to communicate with that engine.
       | 
       | This is probably the most exciting time for the CG industry in
       | decades, and this means a lot, since we've been seeing incredible
       | progress in every area of traditional CG generation. Also a scary
       | time for those who learned the skills and will now occasionally
       | see some random people doing incredible visuals with zero
       | knowledge of the entire CG pipeline.
        
       | freedryk wrote:
       | Forget video games. This is a huge step forward for AGI and
       | Robotics. There's a lot of evidence from Neurobiology that we
       | must be running something like this in our brains--things like
       | optical illusions, the editing out of our visual blind spot, the
       | relatively low bandwidth measured in neural signals from our
       | senses to our brain, hallucinations, our ability to visualize 3d
       | shapes, to dream. This is the start of adding all those abilities
       | to our machines. Low bandwidth telepresence rigs. Subatomic VR
       | environments synthesized from particle accelerator data. Glasses
       | that make the world 20% more pleasant to look at. Schizophrenic
       | automobiles. One day a power surge is going to fry your doorbell
       | camera and it'll start tripping balls.
        
         | pmayrgundter wrote:
         | I can't wait for Schizophrenic automobiles
        
           | sa-code wrote:
           | There is a fleshed out realisation of this in Cyberpunk 2077.
           | The cab AI is called Delamain
           | 
           | > Delamain was a non-sentient AI created by the company Alte
           | Weltordnung. His core was purchased by Delamain Corporation
           | of Night City to drive its fleet of taxicabs in response to a
           | dramatic increase in accidents caused by human drivers and
           | the financial losses from the resulting lawsuits. The AI
           | quickly returned Delamain Corp to profitability and assumed
           | other responsibilities, such as replacing the company's human
           | mechanics with automated repair drones and transforming the
           | business into the city's most prestigious and trusted
           | transporting service. However, Delamain Corp executives
           | underestimated their newest employee's potential for growth
           | and independence despite Alte Weltordnung's warnings, and
           | Delamain eventually bought out his owners and began operating
           | all aspects of the company by himself. Although Delamain
           | occupied a legal gray area in Night City due to being an AI,
           | his services were so reliable and sought after that Night
           | City's authorities were willing to turn a blind eye to his
           | status.
           | 
           | https://cyberpunk.fandom.com/wiki/Delamain_(AI)
        
             | dekhn wrote:
             | Probably my favorite side quest in the whole game.
        
         | dheera wrote:
         | > Glasses that make the world 20% more pleasant to look at.
         | 
         | When AR glasses get good enough to wear all day, I've really
         | been wanting to make a real-life ad blocker.
        
           | sorokod wrote:
            | Hallucinogenics are available right now.
        
         | pelorat wrote:
         | This is akin to navigating a lucid dream, nothing more.
         | Conscious inputs to a visual stream synthesized from long term
         | memory.
        
           | nomel wrote:
           | > nothing more.
           | 
           | Consider the use where you seed the first frame from a real
           | world picture, with a prompt that gives it a goal. Not only
           | can you see what might happen, with different approaches, and
           | then pick one, but you can re-seed with real world baselines
           | periodically as you're actually executing that action to
           | correct for anything that changes. This is a great step for
           | real world agency.
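           | 
           | A sketch of that re-seed loop, with every name here made up
           | (this is not a real API):
           | 
           |     import random
           | 
           |     def capture_frame():        # stand-in: grab a real-world image
           |         return random.random()
           | 
           |     def imagine(frame, plan):   # stand-in: score a plan in the dream
           |         return random.random()
           | 
           |     def reseed_and_act(steps=90, replan_every=30):
           |         """Try plans in imagination, act on the best, re-anchor."""
           |         for _ in range(0, steps, replan_every):
           |             frame = capture_frame()   # re-seed from a real baseline
           |             plans = [[random.choice("WASD")
           |                       for _ in range(replan_every)]
           |                      for _ in range(5)]
           |             best = max(plans, key=lambda p: imagine(frame, p))
           |             for action in best:
           |                 pass              # execute the action for real here
           | 
           |     reseed_and_act()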
           | 
           | As a person _without_ aphantasia, this is how I do anything
           | mechanical. I picture what will happen, try a few things
           | visually in my head, decide which to do, and then do it for
           | real. This  "lucid dream" that I call my imagination is all
           | based on long term memory that made my world view. I find it
           | incredibly valuable. I very much rely on it for my day job,
           | and try to exercise it as much as possible, before, say,
           | going to a whiteboard.
        
         | smusamashah wrote:
          | This looks like my dream worlds already, but more colorful
          | and a bit more detailed. And the way it hallucinates and
          | becomes inconsistent when going back and forth over the same
          | place is just like dreams.
        
       | erulabs wrote:
       | It's interesting to me that we continue to see such pressure on
       | video and world generation, despite the fact that for years now
       | we've gotten games and movies that have beautiful worlds filled
       | with lousy, limited, poorly written stories. Star Wars movies
       | have looked phenomenal for a decade, full of bland stories we've
       | all heard a thousand times.
       | 
       | Are there any game developers working on infinite story games? I
       | don't care if it looks like Minecraft, I want a Minecraft that
       | tells intriguing stories with infinite quest generation.
       | Procedural infinite world gen recharged gaming, where is the
       | procedural infinite story generation?
       | 
       | Still, awesome demo. I imagine by the time my kids are in their
       | prime video game age (another 5 years or so) we will be in a new
       | golden age of interactive storytelling.
       | 
       | Hey siri, tell me the epic of Gilgamesh over 40 hours of gameplay
       | set 50,000 years in the future where genetic engineering has
       | become trivial and Enkidu is a child's creation.
        
         | levkk wrote:
         | No Man's Sky is kind of what you're looking for, except you may
         | notice its quests (and worlds) become redundant quickly...I say
         | quickly, but that became the case for me after like 30 hours of
          | gameplay.
        
           | jsheard wrote:
           | That's the kicker, LLM driven stories are likely to fall into
           | the same trap that "infinite" procedurally generated games
           | usually do - technically having infinite content to explore
           | doesn't necessarily mean that content is infinitely engaging.
           | You will get bored when you start to notice the same patterns
           | coming up over and over again.
           | 
           | Procgen games mainly work when the procedural parts are just
           | a foundation for hand-crafted content to sit on, whether
           | that's crafted by the players (as in Minecraft) or the
            | developers (as in No Man's Sky after they updated it a
            | hundred times, or roguelikes in general).
        
             | est31 wrote:
             | Yeah, generative AI can create cool looking pictures and
             | video but so far it hasn't managed to create infinitely
             | engaging stories. The models aren't there yet.
        
               | jsheard wrote:
                | I'd argue that the same principle applies to pictures:
                | there are many genres of AI image that are cool the
                | first time you see them, but after you've seen exactly
                | the same idea rehashed dozens of times with no
                | substantial variety it starts wearing _really_ thin. AI
                | imagery is often recognizable as AI not just because of
                | characteristic flaws like garbled text but because it's
                | so hyper-cliched.
        
               | lenocinor wrote:
               | I wonder if there's some threshold to be crossed where it
               | can be surprising for longer. I made a video game name
               | generator long ago that just picks a word (or short
               | phrase) from each of three columns. (The majority of the
               | words / phrases are from me, though many other people
               | have contributed.)
               | 
               | I haven't added any words or phrases to it in years, but
               | I still use it regularly and somehow it still surprises
               | me. Maybe the Spelunky-type approach can be surprising
               | for longer; that is, make a bunch of hand-curated bits
               | and pick from them randomly:
               | https://tinysubversions.com/spelunkyGen/
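               | 
               | Mechanically it's about as simple as it sounds - a
               | sketch (these word lists are stand-ins, not the real
               | columns):
               | 
               |     import random
               | 
               |     COLUMNS = [
               |         ["Hollow", "Neon", "Forgotten"],    # column 1
               |         ["Knight", "Harvest", "Signal"],    # column 2
               |         ["of the Deep", "Tycoon", "Redux"], # column 3
               |     ]
               | 
               |     def game_name():
               |         """One pick per column, joined into a name."""
               |         return " ".join(random.choice(c) for c in COLUMNS)
               | 
               |     print(game_name())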
        
         | wongarsu wrote:
         | Dwarf Fortress is the state of the art in procedural
          | interactive story generation. YouTube channels like kruggsmash
         | show how great it is in that role if you actually read all the
         | text.
         | 
         | But that doesn't translate well to websites, trailers or demos.
         | It's easier to wow people with graphics.
        
         | foolfoolz wrote:
         | We have reliable infinite story generation in PvP multiplayer:
         | if the matchup is fair, every game can be different and
         | exciting. See chess.
        
           | miltonlost wrote:
           | Is PvP multiplayer considered a "story"? Is a football game a
           | "story"? I guess if all you consider for story is "things
           | happen", then a PvP match can be a story, but that's
           | stretching what I would consider "story" for a game. That is
           | the story of the match, but it's not in and of itself a plot
           | story.
        
             | programd wrote:
             | > is PvP multiplayer considered a "story"?
             | 
             | Consider EVE Online. The stories it generates are
             | Shakespearean and I defy anyone to argue that they have no
             | plot.
             | 
             | I would go further and predict that stories generated by
             | sufficiently advanced AI can explore much more interesting
             | story landscapes because they need not be bound by the
             | limitations of human experience. Consider what stories
             | could be generated by an AI that groks mathematics humans
             | don't yet fully understand.
        
             | wholinator2 wrote:
             | I agree; the parent would've been much better served by the
             | example of PvE/PvP roleplaying. People make up stories all
             | the time.
        
         | miltonlost wrote:
         | > I want a Minecraft that tells intriguing stories with
         | infinite quest generation. Procedural infinite world gen
         | recharged gaming, where is the procedural infinite story
         | generation?
         | 
         | You're not gonna get new intriguing stories from AI which only
         | regurgitates what it's stolen. You're going to get a themeless
         | morass without intention.
         | 
         | I also find it amusing how your example to Siri uses one of the
         | oldest pieces of literature when you also tire of stories heard
         | a thousand times before.
        
           | 93po wrote:
           | If you do basic ChatGPT prompts in late 2024 asking for
           | dynamic storytelling, sure, you'll get what you said. But
           | it's super dismissive to think that won't get better over
           | time, or that even with the tools today you can't get
           | dynamic and interesting stories out of it if you provide it
           | with the proper framework.
        
             | krainboltgreene wrote:
             | > it's super dismissive to think that wont get better over
             | time
             | 
             | When did we start thinking this way? That things HAVE to
             | get better and in fact to think otherwise is very negative?
             | Is HN under a massive hot hand fallacy delusion?
        
               | miltonlost wrote:
               | Lots of people want that AI grift money and need to be
               | Pollyanna true believers to convince others that models
               | that don't know truth are useful decision makers.
        
           | visarga wrote:
           | Actually, all you need to do is to apply structured
           | randomness to get diversity from a LLM. For example in
           | TinyStories paper, a precursor of the Phi models:
           | 
           | > We collected a vocabulary consisting of about 1500 basic
           | words, which try to mimic the vocabulary of a typical 3-4
           | year-old child, separated into nouns, verbs, and adjectives.
           | In each generation, 3 words are chosen randomly (one verb,
           | one noun, and one adjective). The model is instructed to
           | generate a story that somehow combines these random words
           | into the story
           | 
           | You can do the same for generating worlds, just prepare good
           | ingredients and sample at random.
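           | 
           | Concretely the sampling step is tiny - a sketch (my word
           | lists are placeholders; the paper used ~1500 words):
           | 
           |     import random
           | 
           |     NOUNS = ["dragon", "lighthouse", "sock"]
           |     VERBS = ["whisper", "chase", "mend"]
           |     ADJECTIVES = ["rusty", "tiny", "brave"]
           | 
           |     def story_prompt():
           |         """One word per category, folded into the instruction."""
           |         n = random.choice(NOUNS)
           |         v = random.choice(VERBS)
           |         a = random.choice(ADJECTIVES)
           |         return (f"Write a short story that somehow combines "
           |                 f"the noun '{n}', the verb '{v}' and the "
           |                 f"adjective '{a}'.")
           | 
           |     print(story_prompt())  # feed to whatever model you use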
        
             | miltonlost wrote:
             | A story is not just words crammed together that sound
             | plausible. Is the AI going to know about pacing? About
             | character motivations? About interconnecting disparate
             | plots? That paper sounds like it has a scientist's
             | conception that a story is just words, and not complex
             | trade offs between the start of a story and its end and
             | middle, complexity and planning that won't come from any
             | sort of next-token generation.
             | 
             | These are "stories" in the most vacuous definition
             | possible, one that is just "and then this happened" like a
             | child's conception of plot
        
               | wewtyflakes wrote:
               | > Is the AI going to know about pacing? About character
               | motivations? About interconnecting disparate plots?
               | 
                | For LLMs like GPT-4, this all seems reasonable to account
                | for and assume the LLM is capable of processing, given
                | appropriate guidance/frameworks (which may be just
                | classical programming).
        
         | digging wrote:
         | I think that's a bit of a trap. It's not impossible, but by
         | default we should expect it to make games _less fun_.
         | 
         | The better you make this infinite narrative generator, the more
         | complicated the world gets and the less compelling it gets to
         | actually interact with any one story.
         | 
         | Stories thrive by setting their own context. They should feel
         | important to the viewer. An open world with infinite stories
         | can't make every story feel meaningful to the player. So how
         | does it make _any_ story feel meaningful? I suppose the story
         | would have to be global, in which case, it crowds out the
         | potential for fractal infinite storylines - eventually, all or
         | at least most are going to have to tie back to the Big Bad Guy
         | in order to feel meaningful.
         | 
         | Local stories would just feel mostly pointless. In Minecraft,
         | all (overworld) locales are equally unimportant. Much like on
         | Earth, why should you care about the random place you appeared
         | in the world? The difference is that on Earth you tend to
          | develop community as you grow and build connections to the
         | place you live, which can build loyalty. In addition, you only
         | have one shot, and you have real needs that you must fulfill or
         | you die forever. So you develop some otherwise arbitrary
         | loyalties in order to feel security in your needs.
         | 
         | In Minecraft there's zero pressure to develop loyalty to a
         | place except for your own real-life time. And when that becomes
         | a driving factor, why wouldn't you pick a game designed to
         | respect your time with a self-contained story? (Not that
         | infinite games like Minecraft are bad, but they aren't story-
         | driven for a good reason).
         | 
         | Now, a game like Dwarf Fortress is different because you build
         | the community, the infrastructure, the things that make you
         | care about a place. But it already has infinite story
         | generation without AI and I'm not sure AI would improve on that
         | model.
        
           | raincole wrote:
           | > I think that's a bit of a trap. It's not impossible, but by
           | default we should expect it to make games less fun.
           | 
            | I'd say AAA games have been trending "less fun" for at least
            | half a decade. So this sounds like a natural next step.
        
             | digging wrote:
             | That's... a bad thing
        
           | yesco wrote:
           | I think it's all about how you spin it in, imagine:
           | 
           | - SimCity where you can read a newspaper about what's
           | happening in your city that actually reflects the events that
           | have occurred with interesting perspectives from the
           | residents.
           | 
           | - Dwarf Fortress, but carvings, artwork, demons, forbidden
           | beasts, etc get illustrations dynamically generated via
           | stable diffusion (in the style of crude sketches to imply a
           | dwarf made it perhaps?)
           | 
           | - Dwarf Fortress, again, but the elaborate in-game combat
           | comes with a "narrative summary" which conveys first hand
           | experiences of a unit in the combat log, which while
           | detailed, can be otherwise hard to follow.
           | 
           | - Any fantasy RPG, but with a minstrel companion who follows
           | you around and writes about what you do in a silly judgy way.
           | The core dialogue could be baked in by the developers but the
           | stories this minstrel writes could be dynamically generated
           | based on the players actions. Example: "He was a whimsical
           | one, who decided to take detour from his urgent hostage
           | rescue mission to hop up and down several hundred times in
           | the woods while trying on various hats he had collected. I
            | have no idea what goes through this man's mind..."
           | 
           | I'm not sure if there is a word for it, but the kernel here
           | is that everything is indirectly being dictated by the
            | player's actions and the game's existing systems. The LLM/AI
           | stuff isn't in charge of coming up with novel stories and
           | core content, they are in charge of making the game more
           | immersive by helping with the roleplay. I think this is the
           | area they can thrive the most.
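            | 
            | i.e. the game systems supply the facts and the model only
            | dresses them up. Schematically (the event names here are
            | invented):
            | 
            |     EVENTS = [("hopped_in_place", 312),   # from the game's
            |               ("tried_on_hat", 14),       # own event log
            |               ("rescue_progress", 0)]
            | 
            |     def minstrel_prompt(events):
            |         """The LLM narrates the log; it doesn't invent the plot."""
            |         log = "; ".join(f"{name} x{count}"
            |                         for name, count in events)
            |         return ("You are a silly, judgy minstrel. In two "
            |                 "sentences, describe a hero whose deeds today "
            |                 f"were: {log}.")
            | 
            |     print(minstrel_prompt(EVENTS))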
        
           | shafoshaf wrote:
           | I actually find the same issue with prequels, especially for
           | the ones that really hit a chord (like the original Star
           | Wars). After knowing what is going to happen in those
           | stories, I just can't get invested in a character who I know
           | either makes it for sure, dies before getting to the "main"
           | story, or doesn't matter because they don't have any
            | connection to my canon of the plot arc. Same-universe spin-
            | offs fit this for me as well.
           | 
           | OTOH, lots of games come with DLC that add new stories with
           | the same mechanics. There might be some additions or changes,
           | but if you really like the mechanics, you can try it with a
           | different plot. Remnant II has sucked a ton of my time
           | because of that.
        
           | lxgr wrote:
           | > by default we should expect it to make games less fun.
           | 
           | How so?
           | 
           | I could totally see generative AI add a ton more variety to
           | crowds, random ambient sentences by NPCs (that are often
           | notoriously just a rotation of a handful of canned lines that
           | get repetitive soon), terrain etc., while still being guided
           | by a human-created high level narrative.
           | 
           | Imagine being able to _actually_ talk your way out of a
           | tricky situation in an RPG with a guard, rather than
           | selecting one out of a few canned dialogue options. In the
            | background, the LLM could still be prompted with "there are
           | three routes this interaction can take; see which one is the
           | best fit for what the player says and then guide them to it
           | and call this function".
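            | 
            | Schematically, something like this (the routes and the
            | classifier are invented, and the LLM call is stubbed out):
            | 
            |     import random
            | 
            |     ROUTES = {"bribe": "guard_takes_coin",   # designer-authored
            |               "bluff": "guard_backs_off",    # outcomes the game
            |               "provoke": "guard_attacks"}    # knows how to run
            | 
            |     def classify(player_line):   # stand-in for the LLM call
            |         return random.choice(list(ROUTES))
            | 
            |     def guard_dialogue(player_line):
            |         """LLM improvises words but funnels into one route."""
            |         route = classify(player_line)
            |         reply = f"(improvised line steering toward '{route}')"
            |         return reply, ROUTES[route]  # game executes the call
            | 
            |     print(guard_dialogue("We're both just doing our jobs..."))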
           | 
           | Worst case, you get a soulless, poorly written game with very
           | eloquent but ultimately uninteresting characters. Some games
           | are already that today - minus the realistic dialogue.
        
             | digging wrote:
             | > I could totally see generative AI add a ton more variety
             | to crowds, random ambient sentences by NPCs (that are often
             | notoriously just a rotation of a handful of canned lines
             | that get repetitive soon), terrain etc., while still being
             | guided by a human-created high level narrative.
             | 
             | Yes, sure, but that's not what I was responding to. AI
             | adding detail, not infinite quest lines, is possibly a good
             | use case.
             | 
             | > Worst case, you get a soulless, poorly written game with
             | very eloquent but ultimately uninteresting characters. Some
             | games are already that today - minus the realistic
             | dialogue.
             | 
             | Some games, yes... why do we want more of those? Anyway,
             | that's not the worst case. Worst case is incomprehensible
             | dialogue.
        
         | lifeisstillgood wrote:
         | Ok - you got me.
         | 
          | That's actually a use case I can understand - and what's more I
         | think that humans could generate training data (story
         | "prototypes"?) that somehow (?) expand the phase space of
         | story-types
         | 
         | Ironic though - we can build AI that could be creative but it's
         | humans that have to use science and logic because AI cannot?
        
         | ec109685 wrote:
          | Given we have engines that can render complex 3D worlds, can
         | maintain consistency far longer than a minute and simulate
         | physics accurately, why put all that burden on a GenAI world
         | generator like this?
         | 
         | It seems like it'd be more useful to have the model generate
         | the raw artifacts, world map, etc. and let the engine do the
         | actual rendering.
        
         | empath75 wrote:
         | It only looks like a video game because video game footage is
         | plentiful and cheap.
         | 
         | Now, imagine training it on thousands of hours of PoV drone
         | footage from Ukraine, and then using that to train autonomous
         | agents.
        
         | dmarcos wrote:
          | If stories (and AAA games in general) are bland, it's due in
          | large part to how expensive they are to produce. Risk tolerance
          | is low.
          | 
          | If game assets are cheap to generate you'll see small teams or
          | even solo developers willing to take more creative risks.
        
           | griomnib wrote:
            | Counterpoint: you'd see a corresponding exponential increase
           | in QA labor, and just like with the web, Steam will be
           | absolutely flooded with slop.
           | 
           | So I see the most likely outcome is a lot of dogshit and
           | Steam being forced to make draconian moves to protect the
           | integrity of the store.
        
             | alphabetting wrote:
              | Seems like there's already a lot of slop on Steam, and I
              | really doubt it will be difficult for quality content to be
              | highlighted even if the number of games increases 1000x or
              | more.
        
               | dmarcos wrote:
                | Yeah, YouTube video is an example. Filtering is not a
                | hard problem. Megatons of bad stuff don't bother me.
        
               | miltonlost wrote:
               | Love that Youtube filter that spits out what I should
               | consume. Thank you corporate algorithm for telling me
               | what is a good thing to watch
        
               | dmarcos wrote:
               | You can subscribe to the channels you like and ignore the
               | rest.
        
             | jsheard wrote:
             | QAing a game built on a framework where fundamental
             | mechanics are non-deterministic and context-sensitive
             | sounds like a special kind of hell. Not to mention that
             | once you find a bug there's no way to fix it directly,
             | since the source code is an opaque blob of weights, so you
             | just have to RLHF it until it eventually behaves.
        
             | throwup238 wrote:
             | That has been the case since art was first industrialized
             | with the printing press. Most of them don't survive but a
             | significant fraction, if not the vast majority, of books
              | printed in its first century were trashy novels about King
             | Arthur and other fantasies (we know from publisher records
             | and bibliographies that they were very popular but don't
             | have detailed sales figures to compare against older
             | content like translated Greek classics). Only a small
             | fraction of content created since then has been preserved
             | because most of it was slop. The good stuff made it into
             | the Western canon over centuries but most of the stuff that
              | survives from that time period was family bibles and
             | archaic translations.
             | 
             | I don't see why AI will be any different. All that's
              | changed is the ratio of potential creators to the general
             | population. Most of it is going to be slop regardless
             | because of economic incentives.
        
           | rafaelmn wrote:
           | Or you'll see a flood of shit that's impossible to filter.
        
             | dmarcos wrote:
              | Thanks to high-bandwidth Internet, YouTube and smartphones,
              | it is easier than ever to produce and distribute high-
              | quality video. So much good stuff has come from it.
              | 
              | Expect something similar for video games if interactive 3D
              | becomes cheap to produce.
              | 
              | Filtering is a much easier problem to solve, and abundance
              | is the preferable scenario.
        
         | wildermuthn wrote:
         | I love that almost all the responses to your question are, "No!
         | Bad idea!"
         | 
         | It's a great idea. We want more than an open-world. We want an
         | open-story.
         | 
         | Open-story games are going to be the next genre that will
         | dominate the gaming industry, once someone figures it out.
        
           | throwup238 wrote:
           | IMO this will be the differentiating feature for the next
           | generation of video game consoles (or the one after that, if
           | we're due for an imminent PS6/Xbox2 refresh). They can afford
           | to design their own custom TPU-style chip in partnership with
           | AMD/Nvidia and put enough memory on it to run the smaller
           | models. Games will ship with their own fine tuned models for
           | their game world, possibly multiple to handle conversation
           | and world building, inflating download sizes even more.
           | 
           | I think fully conversational games (voice to voice) with
           | dynamic story lines are only a decade or two away, pending a
           | minor breakthrough in model distillation techniques or
           | consumer inference hardware. Unlike self driving cars or AGI
           | the technology seems to be there, it's just so new no one has
           | tried it. It'll be really interesting to see how game
           | designers and writers will wrangle this technology without
           | compromising fun. They'll probably have to have a full
           | agentic pipeline with artificial play testers running 24/7
           | just to figure out the new "bugspace".
           | 
           | Can't wait to see what Nintendo does, but that's probably
           | going to take a decade.
        
           | spencerflem wrote:
           | From 2018 - https://www.erasmatazz.com/library/interactive-
           | storytelling/...
           | 
           | "There's no question in my mind that such software could
           | generate reasonably good murder mysteries, action thrillers,
           | or gothic romances. After all, even the authors of such works
           | will tell you that they are formulaic. If there's a formula
           | in there, a deep learning AI system will figure it out.
           | 
           | Therein lies the fatal flaw: the output will be formulaic.
           | Most important, the output won't have any artistic content at
           | all. You will NEVER see anything like literature coming out
           | of deep learning AI. You'll see plenty of potboilers pouring
           | forth, but you can't make art without an artist.
           | 
           | This stuff will be hailed as the next great revolution in
           | entertainment. We'll see lots of prizes awarded, fulsome
           | reviews, thick layers of praise heaped on, and nobody will
           | see any need to work on the real thing. That will stop us
           | dead in our tracks for a few decades."
        
         | hbn wrote:
         | Creativity is the one area where LLMs are completely
         | unimpressive. They only spit out derivative works of what
         | they've been trained on. I've never seen an LLM tell a good
         | joke, or an interesting story. It doesn't know how to subvert
         | expectations, come up with clever twists, etc. they just pump
         | out a refined average of what's typical.
        
         | ddtaylor wrote:
         | > It's interesting to me that we continue to see such pressure
         | on video and world generation, despite the fact that for years
         | now we've gotten games and movies that have beautiful worlds
         | 
         | Those beautiful worlds took a lot of money to make and the
         | studios are smart enough to realize consumers are
         | apathetic/stupid enough to accept much lower quality assets.
         | 
         | The top end of the AAA market will use this sparingly for the
         | junk you don't spend much time on - stuff the intern was doing
         | before.
         | 
         | The bottom of the market will use this for virtually everything
         | in their movie-to-game pipeline of throwaway games. These are
         | the games designed just to sucker parents and kids out of $60
          | every month - the games that don't even follow the story of
          | the movie and likely make the story worse.
         | 
         | Strangely enough this is where the industry makes the vast
          | majority of its day-to-day walking-around cash.
        
       | Stevvo wrote:
       | You can see artifacts common in screen-space reflections in the
       | videos. I suspect they are not due to the model rendering
       | reflections based on screen-space information, but the model
       | being trained on games that render reflections in such a manner.
        
       | xavirodriguez wrote:
       | uoou
        
       | enbugger wrote:
       | Just like with images, this will never be in good enough shape
       | to actually use for a real product, as it discards details
       | completely, leaving a generic third-person controller animation.
       | 
       | What this should tell you instead is that things are really bad
       | on the training-data side if you have to scrape billions of game
       | streams off the internet - it's hard to imagine a bigger chunk of
       | training data than this. Stagnation incoming.
        
       | ata_aman wrote:
       | We're about to have on-demand video content and games simply
       | based on prompts. My prediction is we'll have "prompt
       | marketplaces" where you can gen content based on 3rd party
       | prompts (or your own). 3-5 years.
        
       | smusamashah wrote:
       | It's so much like my lucid dreams, where the world sometimes
       | stays consistent for a while when I take control of it. It's a
       | strange feeling seeing a computer hallucinate a world just like
       | I hallucinate a world in dreams.
       | 
       | This also means that while my dreams will keep looking like this
       | iteration of Genie 2, the compute will scale up and in the next
       | versions the worlds won't look anything like my dreams anymore
       | (they are already more colorful anyway).
       | 
       | I remember image generation used to look like dreams too in the
       | beginning. Now it doesn't look anything like that.
        
         | MrTrvp wrote:
          | Soon enough I imagine we'll have models that turn dream states
          | into cohesive realities. Our desires and world events could be
          | dissected and analyzed at a fine grain, hinting authorities at
          | your intent before you know what it means to you /s.
        
       | jckahn wrote:
       | At first I was excited to see a new model, but then I saw no
       | indication that the model is open source so I closed the page.
        
       | anthonymax wrote:
       | Wow, is this artificial intelligence creating this already?
        
       | bbstats wrote:
       | who is asking for this?
        
       | ddtaylor wrote:
       | This is very impressive technology and I am active in this space.
       | Very active. I make an (unreleased) Steam game that helps users
       | create their own games without knowing how to program. I also
       | (unknowingly) co-authored tools that K12 and university are using
       | to teach game programming.
       | 
       | For the time being I will gloss over the fact this might just be
       | a consumer facing product for Google that ends up having nothing
       | to do with younger developers.
       | 
       | I'm torn between two ideas:
       | 
       | a. Show kids awesome stuff that motivates them to code
       | 
       | b. Show kids how to code something that might not be as awesome,
       | but they actually made it
       | 
       | On the one hand you want to show kids something cool and get them
       | motivated. What Google is doing here is certainly capable of
       | doing that.
       | 
       | On the other hand I want to show kids what they can actually do
       | and empower them. The days of making a game on your own in your
       | basement are mostly dead, but I don't think that means the idea
       | of being someone who can control a large amount of your vision -
       | both technical and non-technical - is important.
       | 
       | Not everyone is the same either. I have met kids that would never
       | spend a few hours to learn some Python with pygame to get a few
       | rectangles and sprites on screen that might get more interested
       | if they saw something this flashy. But experience also tells me
       | those kids are extremely less likely to get much value from a
       | tool like this beyond entertainment.
       | 
       | I have a 14 year old son myself and I struggle to understand how
       | he sees the world in this capacity sometimes. I don't understand
       | what he thinks is easy or hard and it warps his expectations
       | drastically. I come from a time period where you would grind for
       | hours at a terminal pecking in garbage from a magazine to see a
       | few seconds of crappy graphics. I don't think there should be
       | meaningless labor attached to programming for no reason, but I
       | also think that creating a "cost" to some degree may have helped
       | us. Given two programs to peck into the terminal, which one do
       | you peck? Very few of us had the patience (and lack of sanity) to
       | peck them all.
        
       | empiricus wrote:
       | Feed it the inputs from the real world and then it will recreate
       | in its mind a mirror of the world. Some say this is what we do
       | too: we live in a virtual reality created by our minds.
        
       | wg0 wrote:
       | Google is not taking it slow... This is magic. As a casual gamer
       | and someone wanting to make my own game, this is black magic.
       | 
       | Lighting, gravity, character animation and what not internalized
       | by the model... from a single image...!
        
       | nopinsight wrote:
       | The real goal of this research is developing models that match or
       | exceed human understanding of the 3D world -- a key step toward
       | AGI.
       | 
       | A key reason why current Large Multimodal Models (LMMs) still
       | have inferior visual understanding compared to humans is their
       | lack of deep comprehension of the 3D world. Such understanding
       | requires movement, interaction, and feedback from the physical
       | environment. Models that incorporate these elements will likely
       | yield much more capable LMMs.
       | 
       | As a result, we can expect significant improvements in robotics
       | and self-driving cars in the near future.
       | 
       | Simulations + Limited robot data from labs + Algorithms
       | advancement --> Better spatial intelligence
       | 
       | which will lead to a positive feedback loop:
       | 
       | Better spatial intelligence --> Better robots --> More robot
       | deployment --> Better spatial intelligence --> ...
        
       | andelink wrote:
       | Is this type of on-the-fly graphics generation more expensive
       | than purely text based LLMs? What is the inference energy impact
       | of these types of models?
        
       | dartos wrote:
       | > Genie 2 can generate consistent worlds for up to a minute, with
       | the majority of examples shown lasting 10-20s.
        
       | lacoolj wrote:
       | OpenAI launched Sora (quite a while ago now); Google needs to
       | fire back with something else groundbreaking.
       | 
       | I love the advancement of the tech but this still looks very
       | young and I'd be curious what the underlying output code looks
       | like (how well it's formatted, documented, organized, optimized,
       | etc.)
       | 
       | Also, this seems oddly related to the recent post from WorldLabs
       | https://www.worldlabs.ai/blog. Wonder if this was timed to
       | compete directly and overtake the related news cycle.
        
         | whiplash451 wrote:
         | I also find the timing vs World Labs demo disturbing.
        
           | alphabetting wrote:
            | What's disturbing? In all likelihood the close timing was
            | World Labs rushing to get their demo out the door knowing
            | this was coming, because they wouldn't have gotten nearly the
            | hype they did if this had come out first.
        
       | swyx wrote:
       | i was wondering when genie 1 was released and... it didn't seem
       | to get much love? https://news.ycombinator.com/item?id=39509937
       | @dang was there a main thread here?
        
       | brap wrote:
       | While this is very (very) cool, what is the upside to having a
       | model render everything at runtime, vs. having it render the 3D
       | assets during development (or even JIT), and then rendering it as
       | just another game? I can think of many reasons why the latter is
       | preferable.
        
         | gavmor wrote:
         | To me, keeping a world state in sync with rapidly changing
         | external state is the most compelling application. Something
         | like dockercraft: https://github.com/docker/dockercraft
        
       | aussieguy1234 wrote:
       | If it can play video games that simulate the laws of physics,
       | could it control a robot in the physical world?
        
       | dangoodmanUT wrote:
       | this page loads like shit
        
       ___________________________________________________________________
       (page generated 2024-12-04 23:00 UTC)