[HN Gopher] "Self-reflecting" AI agents explore like animals
       ___________________________________________________________________
        
       "Self-reflecting" AI agents explore like animals
        
       Author : chdoyle
       Score  : 58 points
       Date   : 2023-07-06 21:00 UTC (1 hour ago)
        
 (HTM) web link (hai.stanford.edu)
 (TXT) w3m dump (hai.stanford.edu)
        
       | ftxbro wrote:
        | From the Hacker News title I assumed it was saying that if you
        | give AI agents some form of self-reflection, maybe an internal
        | monologue loop, they unlock emergent animal-like exploration
        | behavior.
        | 
        | But that's not what happened. Instead, the researchers told AI
        | agents to explore in the way they think animals explore:
        | "Stanford researchers invented the "curious replay" training
        | method based on studying mice to help AI agents"
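        | 
        | For what it's worth, skimming the paper (arXiv:2306.15934),
        | "curious replay" seems to just mean prioritizing which past
        | experiences the world model retrains on by a curiosity score:
        | roughly, how badly the model predicts a transition plus a
        | bonus for how rarely it has been replayed. A toy sketch of
        | that idea, not their actual code:
        | 
        |   import random
        | 
        |   class CuriousReplayBuffer:
        |       # Toy prioritized replay: sample transitions with
        |       # probability proportional to model error plus a
        |       # count-based novelty bonus.
        |       def __init__(self, count_bonus=1.0):
        |           self.count_bonus = count_bonus
        |           self.items = []  # [transition, model_loss, replays]
        | 
        |       def add(self, transition, model_loss):
        |           self.items.append([transition, model_loss, 0])
        | 
        |       def sample(self):
        |           weights = [loss + self.count_bonus / (1 + n)
        |                      for _, loss, n in self.items]
        |           i = random.choices(range(len(self.items)),
        |                              weights=weights)[0]
        |           self.items[i][2] += 1  # replaying lowers novelty
        |           return self.items[i][0]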
        
         | onetokeoverthe wrote:
         | [dead]
        
       | piyh wrote:
       | Direct arxiv link: https://arxiv.org/pdf/2306.15934.pdf
        
       | FrustratedMonky wrote:
        | Exactly. We keep leaving out 'motivation' in these models,
        | since they are only reacting to prompts. Put them in a loop
        | with goals and see what happens.
        | 
        | And things like GPT are not 'embodied': since they don't live
        | in the 'world', they can't associate language with physical
        | reality. Put them in a simulated environment like a game and
        | they look a lot more 'conscious'.
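        | 
        | By "loop with goals" I mean something as simple as the sketch
        | below: feed the model its goal plus recent history, let it
        | pick an action, repeat. llm() and world.step() are stand-ins
        | for whatever chat API and simulated environment you use, not
        | anything from the article.
        | 
        |   def run_agent(llm, world, goal, max_steps=20):
        |       # Toy goal loop: keep prompting with the goal and the
        |       # recent history until the model says it is done.
        |       memory = []
        |       for _ in range(max_steps):
        |           prompt = (f"Goal: {goal}\n"
        |                     f"Recent history: {memory[-5:]}\n"
        |                     "Reply with the single next action, "
        |                     "or DONE.")
        |           action = llm(prompt).strip()
        |           if action == "DONE":
        |               break
        |           observation = world.step(action)  # act in the sim
        |           memory.append((action, observation))
        |       return memory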
        
       | jjtheblunt wrote:
        | It's kind of interesting how frequently "stanford.edu" has
        | been finding its way into HN submissions lately. Did the
        | increase start with the GPT-4 enthusiasm?
        | 
        | Or is that just a coincidence?
        
       | xianshou wrote:
        | The result is mildly interesting - improvement on an isolated
        | task but none on the full benchmark - but what would be much
        | more compelling is curiosity-driven replay in an LLM context,
        | combined with chain- or tree-of-thought techniques. That would
        | be the machine analogue of noticing your own confusion, a sort
        | of "what do I need to know?" or "what am I overlooking?"
        | Anecdotally, language models perform better when you prompt
        | them to ask their own questions in the process of answering
        | yours, so I would expect curiosity to have a meaningful impact.
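        | 
        | Concretely, "ask their own questions" can be as simple as a
        | two-pass prompt like the sketch below. llm() is a stand-in
        | for any chat-completion call, and the prompt wording is made
        | up, not taken from the paper.
        | 
        |   def answer_with_self_questions(llm, user_question):
        |       # Pass 1: have the model surface what it thinks it is
        |       # missing ("what am I overlooking?").
        |       questions = llm(
        |           "Before answering, list three short questions you "
        |           "would need to resolve to answer this well:\n"
        |           + user_question)
        |       # Pass 2: answer with those self-generated questions
        |       # in view.
        |       return llm(
        |           f"Question: {user_question}\n"
        |           f"Points to check first:\n{questions}\n"
        |           "Now answer, addressing those points.")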
        
       ___________________________________________________________________
       (page generated 2023-07-06 23:00 UTC)