[HN Gopher] Q-Transformer: Scalable Reinforcement Learning via A...
___________________________________________________________________
Q-Transformer: Scalable Reinforcement Learning via Autoregressive
Q-Functions
Author : GaggiX
Score : 83 points
Date : 2023-09-20 04:00 UTC (19 hours ago)
(HTM) web link (q-transformer.github.io)
(TXT) w3m dump (q-transformer.github.io)
| greesil wrote:
| Does this lead to one-shot learning for robots?
| tysam_and wrote:
| This is what RWKV (https://github.com/BlinkDL/RWKV-LM) was made
| for, and what it will be good at.
|
| Wow. Pretty darn cool! <3 :'))))
| evolvingstuff wrote:
| Why would RWKV have a particular advantage in this context? (I
| may be missing some key intuitions)
| tysam_and wrote:
    | RNN inference on a smaller edge controller (all history is
    | cached in a single state per layer, so memory and compute
    | requirements are much lower, IIRC) :')
    | 
    | Very friendly to mobile devices and battery-powered systems.
    | :')))) ;'DDDD
| algo_trader wrote:
      | I haven't yet fully grokked RWKV..
      | 
      | Just how much compute/memory are we saving here?
      | 
      | My understanding is that a 1BN-parameter transformer is about
      | 2BN FLOPs per token of inference, so about 1 TFLOP for a
      | 500-token sequence of inferences (and also several GB of
      | memory).
      | 
      | What would the equivalent be for RWKV (let's ignore the
      | inevitable loss penalty, which could be significant..)?
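      | 
      | For reference, here's that back-of-the-envelope arithmetic
      | spelled out (a rough sketch assuming ~2 FLOPs per parameter
      | per generated token and fp16 weights, ignoring attention and
      | KV-cache terms):
      | 
      |   # Rough transformer inference cost, per the estimate above.
      |   params = 1e9                       # 1BN-parameter model
      |   flops_per_token = 2 * params       # ~2 GFLOPs per token
      |   tokens = 500
      |   total_flops = flops_per_token * tokens   # ~1 TFLOP total
      |   weight_bytes = 2 * params                # ~2 GB in fp16
      |   print(f"{total_flops / 1e12:.1f} TFLOPs, "
      |         f"{weight_bytes / 1e9:.0f} GB of weights")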
| tysam_and wrote:
        | It's an RNN, so there is no N^2 attention cost over the
        | sequence length.
        | 
        | It only requires the previous state.
        | 
        | (There's a Discord; you should join it with further
        | questions! I'm unfortunately not as informed as I should
        | be on this one, other than that it is _very_ mobile
        | friendly.) The performance gap is small, all things
        | considered, and I think it comes out on top for raw
        | efficiency per parameter/FLOP, IIRC.
        | 
        | An interesting concept, for sure! :'DDDD :'))))
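        | 
        | A toy sketch of why the state stays constant (this is not
        | RWKV's actual time-mixing rule, just a generic recurrent
        | update to illustrate the point):
        | 
        |   import numpy as np
        | 
        |   d = 64
        |   W_in = np.random.randn(d, d) * 0.01
        |   W_state = np.random.randn(d, d) * 0.01
        | 
        |   def rnn_step(state, x):
        |       # O(d^2) work and O(d) memory per token, independent
        |       # of how long the sequence already is.
        |       return np.tanh(x @ W_in + state @ W_state)
        | 
        |   state = np.zeros(d)
        |   for x in np.random.randn(500, d):   # 500-token rollout
        |       state = rnn_step(state, x)      # only `state` carries over
        | 
        | A transformer, by contrast, attends over the whole cached
        | history, so per-token cost and memory grow with the context
        | length.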
| radarsat1 wrote:
| Cool to see an approach to using transformers that sticks closer
| to traditional RL than the decision transformer. The action-
| dimension trick here is clever. Curious to see where this can be
| taken: game playing, multi-agent settings, etc.
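| 
| A hedged sketch of that trick as I understand it from the project
| page: each continuous action dimension is discretized into bins
| and treated as its own autoregressive step, so the max over
| actions becomes a per-dimension argmax instead of a search over
| the joint action space. (`q_model`, the bin count, and the
| dimension count below are stand-ins, not the paper's actual
| network or numbers.)
| 
|   import torch
| 
|   NUM_BINS, ACTION_DIMS = 256, 7
| 
|   def greedy_action(q_model, obs):
|       chosen_bins = []
|       for dim in range(ACTION_DIMS):
|           # Q-values over bins for this dimension, conditioned on
|           # the observation and the dimensions chosen so far.
|           q_values = q_model(obs, chosen_bins)   # shape: (NUM_BINS,)
|           chosen_bins.append(int(torch.argmax(q_values)))
|       # Map bin indices back to continuous values in [-1, 1].
|       return [2 * b / (NUM_BINS - 1) - 1 for b in chosen_bins]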
| ashupadhi01 wrote:
| I want to know how you build up intuition and knowledge in the
| space of RL.
| The_Amp_Walrus wrote:
| I enjoyed this course
| https://youtu.be/2pWv7GOvuf0?si=DKkhPXQmVjA3ySIn
| gtoubassi wrote:
    | +1, you beat me to the punch! I think it's helpful to start
    | with simple RL and ignore the "deep" part to get the
    | basics. The first several lectures in this series do that
    | well. It helped me build a simple "cat and mouse" RL
    | simulation
    | https://github.com/gtoubassi/SimpleReinforcementLearning
    | and ultimately a reproduction of the DQN Atari game-playing
    | agent: https://github.com/gtoubassi/dqn-atari.
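    | 
    | For anyone curious what that builds up to, the core DQN update
    | is just a TD target from a frozen target network plus a
    | regression loss on the online network (a minimal sketch, with
    | network definitions and the replay buffer omitted):
    | 
    |   import torch
    |   import torch.nn.functional as F
    | 
    |   def dqn_loss(online_net, target_net, batch, gamma=0.99):
    |       obs, actions, rewards, next_obs, dones = batch
    |       # Q-value of the action actually taken in each transition.
    |       q_taken = online_net(obs).gather(
    |           1, actions.unsqueeze(1)).squeeze(1)
    |       with torch.no_grad():
    |           next_q = target_net(next_obs).max(dim=1).values
    |           target = rewards + gamma * (1 - dones) * next_q
    |       return F.mse_loss(q_taken, target)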
| PartiallyTyped wrote:
| Whenever somebody recommends a course, you can be pretty
| certain that it's that one :)
| radarsat1 wrote:
  | Honestly, the best way is to start by implementing a Q-table
  | for some small grid-world problem. You get a lot of knowledge
  | from doing that. Then a bit more work on understanding
  | various approaches, e.g. policy learning, world models. Then,
  | reading textbooks, blogs, tutorials, etc.
  | 
  | But "getting" the idea of Q-learning for a small state space
  | is fundamental and surprisingly approachable.
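  | 
  | A minimal version of that exercise (tabular Q-learning on a 4x4
  | grid, start at cell 0, goal at cell 15; the details are just
  | illustrative):
  | 
  |   import numpy as np
  | 
  |   N, GOAL = 4, 15
  |   ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]   # R, L, D, U
  |   Q = np.zeros((N * N, len(ACTIONS)))
  |   alpha, gamma, eps = 0.1, 0.95, 0.1
  | 
  |   def step(s, a):
  |       r, c = divmod(s, N)
  |       dr, dc = ACTIONS[a]
  |       r = min(max(r + dr, 0), N - 1)
  |       c = min(max(c + dc, 0), N - 1)
  |       s2 = r * N + c
  |       return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL
  | 
  |   for _ in range(2000):                          # episodes
  |       s, done = 0, False
  |       while not done:
  |           a = (np.random.randint(4) if np.random.rand() < eps
  |                else int(Q[s].argmax()))
  |           s2, r, done = step(s, a)
  |           # One-step Q-learning update toward r + gamma * max Q(s').
  |           Q[s, a] += alpha * (r + gamma * (1 - done) * Q[s2].max()
  |                               - Q[s, a])
  |           s = s2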
| wegfawefgawefg wrote:
| https://learndrl.com
|
    | I wrote this extensive tutorial for teaching deep
    | reinforcement learning, with a focus on building intuition
    | from code. You'll find RL theory is heavy on math, even
    | though the math is needed for little more than abstractly
    | stating the machine's goal; for intuition, code already
    | serves a native programmer very well.
    | 
    | I spent years failing to learn machine learning and RL
    | until I just started reading source code. The books full of
    | integrals I never ended up needing.
    | 
    | Don't be turned away by the joking tone of my tutorials.
    | There is real depth in there.
___________________________________________________________________
(page generated 2023-09-20 23:02 UTC)