[HN Gopher] DrEureka: Language Model Guided Sim-to-Real Transfer
___________________________________________________________________
DrEureka: Language Model Guided Sim-to-Real Transfer
Author : jasondavies
Score : 41 points
Date : 2024-05-03 16:48 UTC (6 hours ago)
(HTM) web link (eureka-research.github.io)
(TXT) w3m dump (eureka-research.github.io)
| throwup238 wrote:
| This research is all beyond me so maybe someone can explain: How
| does this compare to the state of the art in using simulators to
| train physical robots? Does using transformers help in any way or
| can this just as easily be done with other architectures?
|
| To the uninitiated this looks cool as all heck and yet another
| step towards the Star Trek future where we do everything in a
| simulator first and it always kinda just works in the real world
| (plot requirements notwithstanding).
|
| Although I can also hear the distant sounds of a hundred military
| R&D labs booting up Metalhead [1] simulators.
|
 | Edit: Looks like the previous SOTA was still a manual process
 | where the user had to come up with a reward function that
 | actually rewards the actions they wanted the algorithm to
 | learn. This research uses language models to do that tedious
 | step instead (rough sketch of the loop below).
|
| [1] https://en.wikipedia.org/wiki/Metalhead_(Black_Mirror)
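 |
 | Rough sketch of that loop as I understand it. Every name here
 | is a hypothetical stand-in for illustration, not the paper's
 | actual code:
 |
 |     import random
 |
 |     def ask_llm_for_reward(task, feedback):
 |         # Stand-in for a chat-completion call that returns
 |         # reward-function source code; faked here with two
 |         # canned variants.
 |         variants = [
 |             "def reward(s): return -abs(s['tilt'])",
 |             "def reward(s): return 1.0 if s['upright'] else -1.0",
 |         ]
 |         return random.choice(variants)
 |
 |     def train_and_score(reward_src):
 |         # Stand-in for RL training in simulation; returns a
 |         # fitness score for the candidate reward.
 |         namespace = {}
 |         exec(reward_src, namespace)  # compile the LLM's code
 |         reward_fn = namespace["reward"]
 |         assert callable(reward_fn)   # would drive sim rollouts
 |         return random.random()       # fake fitness for the sketch
 |
 |     best_src, best_score, feedback = None, float("-inf"), ""
 |     for generation in range(3):      # refinement rounds
 |         for _ in range(4):           # candidates per round
 |             src = ask_llm_for_reward("balance on a ball", feedback)
 |             score = train_and_score(src)
 |             if score > best_score:
 |                 best_src, best_score = src, score
 |         feedback = f"best score so far: {best_score:.2f}"
 |     print(best_src)
 |
 | The key trick is that training statistics flow back into the
 | next prompt, so the LLM can iterate on its own reward code.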
| claytonwramsey wrote:
| Your edit is roughly correct. It's a bit odd that they're
| comparing a single human-made policy with the very best of the
| DrEureka outputs; in practice, I would expect to make multiple
| different reward functions and then do validation on the
| functions and hyperparameters based on the resulting trained
| models. However, their comparison isn't necessarily wrong,
| since it seems (I could be wrong) that they used the reward
| function from prior papers in the field of policy learning.
|
 | If you're interested in methods for actually learning policies
 | for these sorts of dynamic motions, note that this paper is
 | simply applying proximal policy optimization (PPO). They're
 | pulling in the training and implementation methods from
 | Margolis's [1] and Shan's [2] work.
|
 | So, in sum, the contribution of this paper is exclusively the
 | method for generating reward functions (which is still pretty
 | cool!), not all the learning-based policy stuff. (Toy sketch of
 | the PPO objective below the links.)
|
| [1]:
| https://web.archive.org/web/20220703005502id_/http://www.rob...
| [2]: https://arxiv.org/pdf/2309.06440
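 |
 | For concreteness, here's a toy sketch of the clipped PPO
 | surrogate objective (illustrative names and shapes only, not
 | the authors' implementation):
 |
 |     import torch
 |
 |     def ppo_loss(logp_new, logp_old, advantages, clip_eps=0.2):
 |         # Probability ratio between the updated policy and the
 |         # behavior policy that collected the data.
 |         ratio = torch.exp(logp_new - logp_old)
 |         unclipped = ratio * advantages
 |         clipped = torch.clamp(ratio, 1 - clip_eps,
 |                               1 + clip_eps) * advantages
 |         # Pessimistic bound: maximize the minimum of the two,
 |         # i.e. minimize its negative.
 |         return -torch.min(unclipped, clipped).mean()
 |
 |     # e.g. log-probs from the policy net, advantages from GAE:
 |     loss = ppo_loss(torch.randn(64), torch.randn(64),
 |                     torch.randn(64))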
| refulgentis wrote:
| Every single second of every example has a handler holding a
| leash - and not just holding it, holding it without any slack.
|
 | Blindingly obvious interference from the Ouija board effect.
|
 | I don't mean to denigrate the work; I believe the researchers
 | are honest, and I hope there are demos beyond the published
 | one. It's just, at best, an obvious unforced error that leaves
 | a big question open.
|
 | EDIT: The replier below shared a gif with failures; tl;dr, this
 | looks like two different experiment protocols, one for success
 | and one for failure. https://imgur.com/a/DmepBVU
| Imnimo wrote:
| This sample on Twitter shows how other controllers fail:
|
| https://twitter.com/JasonMa2020/status/1786433841613390023
|
| I agree it's hard to tell whether the controller learned with
| DrEureka would be sufficient without the leash, but I'm at
| least convinced that the leash is not sufficient to hold a
| robot on the ball without a decently competent controller.
| refulgentis wrote:
| Oh my, that looks quite damning. https://imgur.com/a/DmepBVU
|
 | The good-case leash is held taut at half the distance of the
 | failures, at an angle parallel to the bot and orthogonal to the
 | failure cases.
 |
 | The failure-case leashes all have slack, are held at 2x the
 | distance of the successes, and at an angle orthogonal to the
 | bot.
|
 | (Do correct me if we're seeing opposite things; those clips are
 | very small, and I last took physics... 16 years ago :< )
| Imnimo wrote:
| Hmm, I do see what you mean.
| FrustratedMonky wrote:
 | Kind of like how a human visualizes before a sport?
|
 | Like how visualizing free throws in basketball makes you
 | measurably better, without actually shooting free throws for
 | real?
| canadiantim wrote:
 | So the robot dog that's going to kill me in the near future
 | will at least be adorably balancing on a big rubber ball.
| codetrotter wrote:
| Death by giant rubber ball.
|
 | In a scene reminiscent of the giant boulder rolling after
 | Indiana Jones, a robot dog balances on top of an enormous
 | rubber ball rolling down the streets of some big city,
 | flattening everything in its way.
|
| Cronch, cronch, cronch, go the cars.
|
| Squish, squish, squish, go the people.
| magicalhippo wrote:
| > Death by giant rubber ball.
|
| As long as it's not white...
|
| https://www.youtube.com/watch?v=I6Ffr1U7KMY
___________________________________________________________________
(page generated 2024-05-03 23:00 UTC)