[HN Gopher] Q-Transformer: Scalable Reinforcement Learning via A...
       ___________________________________________________________________
        
       Q-Transformer: Scalable Reinforcement Learning via Autoregressive
       Q-Functions
        
       Author : GaggiX
       Score  : 83 points
       Date   : 2023-09-20 04:00 UTC (19 hours ago)
        
 (HTM) web link (q-transformer.github.io)
 (TXT) w3m dump (q-transformer.github.io)
        
       | greesil wrote:
        | Does this lead to one-shot learning for robots?
        
       | tysam_and wrote:
       | This is what RWKV (https://github.com/BlinkDL/RWKV-LM) was made
       | for, and what it will be good at.
       | 
       | Wow. Pretty darn cool! <3 :'))))
        
         | evolvingstuff wrote:
         | Why would RWKV have a particular advantage in this context? (I
         | may be missing some key intuitions)
        
           | tysam_and wrote:
            | RNN inference on a smaller edge controller (all history
            | is cached in a single state vector for each layer, so
            | much lower memory and compute requirements, IIRC) :')
            | 
            | Very friendly to mobile devices and battery-powered
            | systems. :'))))
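            | 
            | To make the "single state per layer" point concrete,
            | here's a toy numpy sketch of an RWKV-style time-mixing
            | step (heavily simplified from the real WKV kernel, and
            | ignoring its numerical-stability tricks; w, u, k, v
            | stand in for the learned quantities):
            | 
            |   import numpy as np
            | 
            |   def wkv_step(state, k, v, w, u):
            |       # state = (a, b): running weighted sums of values
            |       # and weights -- fixed size per layer, no matter
            |       # how long the history is
            |       a, b = state
            |       out = (a + np.exp(u + k) * v) \
            |             / (b + np.exp(u + k))
            |       a = np.exp(-w) * a + np.exp(k) * v
            |       b = np.exp(-w) * b + np.exp(k)
            |       return out, (a, b)
            | 
            | Each step reads and writes the same O(d) state, so
            | memory doesn't grow with context length the way a
            | transformer's KV cache does.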
        
             | algo_trader wrote:
              | I haven't yet fully grokked RWKV...
             | 
             | Just how much compute/memory are we saving here?
             | 
              | My understanding is that a 1B-parameter transformer
              | costs about 2B FLOPs per token of inference, so about
              | 1 TFLOP for a 500-token sequence (plus several GB of
              | memory).
              | 
              | What would the equivalent be for RWKV? (Let's ignore
              | the inevitable loss penalty, which could be
              | significant...)
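              | 
              | For concreteness, my own rough arithmetic for the
              | transformer side (nothing authoritative, and
              | ignoring the attention term, which adds a cost that
              | grows with sequence length on top):
              | 
              |   params = 1e9                  # 1B-parameter model
              |   flops_per_token = 2 * params  # ~2 FLOPs per param
              |   tokens = 500
              |   total_flops = flops_per_token * tokens  # 1e12
              |   weights_gb = params * 2 / 1e9  # ~2 GB at fp16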
        
               | tysam_and wrote:
                | It's an RNN, so there's no N^2 attention cost over
                | the sequence length.
                | 
                | It only requires the previous state.
                | 
                | (There's a Discord; you should join it with further
                | questions! I'm unfortunately not as informed as I
                | should be on this one, other than that it's _very_
                | mobile friendly.) The performance difference is
                | slight but not too bad, all things considered. And
                | I think it comes out on top for raw efficiency per
                | parameter/FLOP, IIRC.
                | 
                | An interesting concept, for sure! :'DDDD :'))))
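                | 
                | To put rough numbers on the "no N^2" point
                | (hypothetical figures, ignoring constants and
                | the per-token matmuls both architectures share):
                | 
                |   d, T = 4096, 500  # width, sequence length
                |   # attention: token t attends to all t
                |   # previous positions
                |   attn = sum(t * d for t in range(1, T + 1))
                |   # recurrence: flat cost per step
                |   rnn = T * d
                | 
                | The first grows quadratically in T, the second
                | linearly.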
        
       | radarsat1 wrote:
        | Cool to see an approach to using transformers that sticks
        | closer to traditional RL than the decision transformer
        | does. The action-dimension trick here (treating each
        | action dimension as a token and maximizing over them one
        | at a time) is clever. Curious to see where this can be
        | taken: game playing, multi-agent settings, etc.
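        | 
        | If I'm reading the paper right, the trick is roughly:
        | discretize each action dimension into bins and treat the
        | dimensions as an autoregressive sequence, so the max over
        | the joint action space becomes a chain of per-dimension
        | argmaxes. A sketch (names and shapes are mine, not the
        | authors' code):
        | 
        |   import numpy as np
        | 
        |   def greedy_action(q_fn, obs, num_dims):
        |       # q_fn(obs, prefix) -> Q-values over the bins of
        |       # the next action dimension, conditioned on the
        |       # dimensions already chosen (the "prefix")
        |       prefix = []
        |       for _ in range(num_dims):
        |           q_vals = q_fn(obs, prefix)  # shape: (num_bins,)
        |           prefix.append(int(np.argmax(q_vals)))
        |       return prefix  # one bin index per action dimension
        | 
        | So instead of maximizing over bins**dims joint actions,
        | you do dims sequential argmaxes over bins choices each.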
        
         | ashupadhi01 wrote:
         | I want to know how you build up intuition and knowledge in the
         | space of RL.
        
           | The_Amp_Walrus wrote:
           | I enjoyed this course
           | https://youtu.be/2pWv7GOvuf0?si=DKkhPXQmVjA3ySIn
        
             | gtoubassi wrote:
          | +1, you beat me to the punch! I think it's helpful to
          | start with simple RL and ignore the "deep" part to get
          | the basics. The first several lectures in this series do
          | that well. It helped me build a simple "cat and mouse"
          | RL simulation
          | https://github.com/gtoubassi/SimpleReinforcementLearning
          | and ultimately a reproduction of the DQN Atari game-
          | playing agent: https://github.com/gtoubassi/dqn-atari.
        
             | PartiallyTyped wrote:
             | Whenever somebody recommends a course, you can be pretty
             | certain that it's that one :)
        
           | radarsat1 wrote:
            | Honestly, the best way is to start by implementing a Q
            | table for some small grid-world problem; you get a lot
            | of knowledge from doing that (a minimal sketch follows
            | below). Then a bit more work on understanding various
            | approaches, e.g. policy learning and world models.
            | Then reading textbooks, blogs, tutorials, etc.
            | 
            | But "getting" the idea of Q-learning for a small state
            | space is fundamental and surprisingly approachable.
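            | 
            | A minimal sketch of the kind of thing I mean (a toy
            | one-dimensional corridor with reward only at the right
            | end; all details are arbitrary):
            | 
            |   import numpy as np
            | 
            |   rng = np.random.default_rng(0)
            |   n_states, n_actions = 5, 2  # 0=left, 1=right
            |   Q = np.zeros((n_states, n_actions))
            |   alpha, gamma = 0.1, 0.9
            | 
            |   for _ in range(500):  # episodes
            |       s = 0
            |       while s != n_states - 1:
            |           # random behavior policy: Q-learning is
            |           # off-policy, so it learns the greedy
            |           # policy even from random exploration
            |           a = rng.integers(n_actions)
            |           s2 = max(0, s - 1) if a == 0 else s + 1
            |           r = 1.0 if s2 == n_states - 1 else 0.0
            |           # the Q-learning update
            |           Q[s, a] += alpha * (r + gamma * Q[s2].max()
            |                               - Q[s, a])
            |           s = s2
            | 
            |   print(Q.argmax(axis=1))  # learned greedy policy
            | 
            | Watching the reward propagate backwards through the
            | table as it trains is where the intuition comes from.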
        
             | wegfawefgawefg wrote:
             | https://learndrl.com
             | 
              | I wrote this extensive tutorial for teaching deep
              | reinforcement learning, with a focus on building
              | intuition from code. You'll find that RL theory is
              | heavy on math despite needing it for little more
              | than abstractly stating a machine's goal; for that,
              | code already serves a native programmer very well.
              | 
              | I spent years failing to learn machine learning and
              | RL until I just started reading source code. Books
              | full of integrals I never ended up needing.
              | 
              | Don't be turned away by the joking nature of my
              | tutorials; there is real depth in there.
        
       ___________________________________________________________________
       (page generated 2023-09-20 23:02 UTC)