[HN Gopher] Robot Dexterity Still Seems Hard
___________________________________________________________________
Author : mhb
Score : 30 points
Date : 2025-04-26 17:53 UTC (5 hours ago)
(HTM) web link (www.construction-physics.com)
| levocardia wrote:
| Surprised that there isn't any explicit discussion of _why_
| dexterity is so hard, beyond sensory perception. One of the root
| causes (IMHO the biggest one) is that modeling contact, i.e. the
| static and dynamic friction between two objects, is extremely
| complicated. There are various modeling strategies, but their
| results are highly sensitive to tuning parameters, which makes
| it very hard to learn in simulation. From what I remember, the
| OpenAI Rubik's Cube solver was basically trained across a giant
| set of simulated worlds, each with different tuning parameters
| for the contact models, and was able to generalize reasonably
| well to the real world in various situations.
|
| It seems most likely that this sort of boring domain
| randomization will be what works, or works well enough, for
| solving contact in this generation of robotics, but it would be
| much more exciting if someone figures out a better way to learn
| contact models (or a latent representation of them) in real time.
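A toy sketch of the domain randomization idea above, assuming a two-finger pinch grasp governed by simple Coulomb friction. The object weight, crush limit, friction range, and function names (`grasp_succeeds`, `tune_grip`) are all hypothetical. The "policy" is a single grip force rather than a neural network trained with RL, so this is not OpenAI's method, only the core idea: tune in one simulated world versus across many worlds with randomized contact parameters.

```python
import random

# Hypothetical numbers chosen for illustration only.
WEIGHT = 10.0       # object weight in newtons
CRUSH_FORCE = 30.0  # grip force (N) above which the fragile object breaks


def grasp_succeeds(grip_force, mu):
    """Two-finger pinch with Coulomb friction: each contact resists up to
    mu * grip_force tangentially, so the object is held if the two contacts
    together can support its weight and the grip does not crush it."""
    holds = 2.0 * mu * grip_force >= WEIGHT
    intact = grip_force <= CRUSH_FORCE
    return holds and intact


def success_rate(grip_force, mus):
    """Fraction of simulated worlds (one friction coefficient each) where the grasp works."""
    return sum(grasp_succeeds(grip_force, mu) for mu in mus) / len(mus)


def tune_grip(mus, candidates):
    """Pick the gentlest candidate force with the best success rate over the worlds."""
    rates = {f: success_rate(f, mus) for f in candidates}
    best = max(rates.values())
    return min(f for f, r in rates.items() if r == best)


rng = random.Random(0)
candidates = [0.5 * i for i in range(1, 81)]  # 0.5 N to 40 N in 0.5 N steps

# Tuning in a single simulator with one hand-picked friction coefficient.
nominal_force = tune_grip([0.6], candidates)

# Domain randomization: tune across many worlds with friction drawn from a range.
train_mus = [rng.uniform(0.2, 1.0) for _ in range(1000)]
randomized_force = tune_grip(train_mus, candidates)

# Evaluate both on held-out "real" worlds drawn from the same range.
test_mus = [rng.uniform(0.2, 1.0) for _ in range(1000)]
for name, force in [("nominal", nominal_force), ("randomized", randomized_force)]:
    print(f"{name:>10}: grip {force:.1f} N, success {success_rate(force, test_mus):.0%}")
```

The force tuned for the single world is gentle (8.5 N) but holds the object in only about half of the held-out worlds. The randomized choice squeezes roughly three times harder and holds it almost everywhere, while the crush limit keeps it from simply maxing out the force. Robustness bought by conservatism is a common trait of domain-randomized policies.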
| sho_hn wrote:
| This rings super plausible to me. I dabbled a bit in hobby
| electronics making DIY walkers, and the more time you spend on
| junior stuff like that (trying to model a good response to
| servo load feedback that works in every situation, etc.) the
| more it dawns on you that what humans and other animals do with
| the sensor feedback they get from their limbs is so rich in
| "magic" and intelligence.
|
| Figuring out physical interaction with the environment and
| traversal is truly one of the most stunning early achievements
| of life.
| hahaxdxd123 wrote:
| Here's an interesting blog post on the limitations of domain
| randomization for OpenAI's results:
| https://www.alexirpan.com/2019/10/29/openai-rubiks.html
|
| Basically the solve rate was much lower without the use of a
| Bluetooth sensor, and they did a bunch of other things that
| made the result less impressive. Still a long way to go here.
| rapjr9 wrote:
| Dexterity is also hard because, at least in humans, it relies
| on knowing something of the nature of an object _before_
| manipulating it. Is it light or heavy? Soft or rigid? Is it a
| bag of popcorn, popcorn kernels, a bag of powder, or a pillow?
| How tightly is it packed in the bag? Fabric or cardboard?
| Attached to other objects or not? Is the USB plug the right
| type and oriented correctly? (Even humans have trouble with
| this one.) Does it have a slippery surface or a grippy surface?
| To be immediately successful in manipulation, pre-knowledge
| based on sensing and identification is usually required.
| Possibly it would be ok if a robot took several tries to figure
| this out based on some general principles, but it will seem
| clumsy and be slower. It seems there is an ontology problem
| here, which requires understanding a lot about the world in
| order to be able to successfully manipulate it.
|
| More generally, continuous learning in real-time is something
| current models don't do well. Retraining an entire LLM every
| time something new is encountered is not scalable. Temporary
| learning does not easily transfer to long-term knowledge.
| Continuous learning still seems in its infancy.
| pixl97 wrote:
| Also when we don't know the properties of an object we are
| about to manipulate we'll approach it cautiously and learn it
| before we apply too much force. This tends to happen
| transparently and quickly for adults, but for infants you can
| watch it play out more slowly.
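The "approach cautiously" behavior can be sketched as a force ramp. The sketch assumes a two-finger friction grip and a hypothetical slip sensor that reports whether the object is still sliding. The friction coefficient `mu` is passed only to that simulated sensor; the ramp logic never reads it directly. Step size, margin, and force cap are illustrative values, not taken from any real system.

```python
def object_slips(grip_force, mu, weight):
    """Stand-in for a slip sensor: True if a two-finger friction grip can't hold the weight."""
    return 2.0 * mu * grip_force < weight


def cautious_grasp(mu, weight, start=1.0, step=0.5, margin=1.2, max_force=30.0):
    """Ramp grip force in small steps until the object stops slipping, then
    add a safety margin (capped at max_force). Returns (force, probes);
    force is None if even max_force doesn't hold the object."""
    force, probes = start, 0
    while force <= max_force:
        probes += 1
        if not object_slips(force, mu, weight):
            return min(force * margin, max_force), probes
        force += step
    return None, probes


for mu in (0.6, 0.25, 0.1):
    force, probes = cautious_grasp(mu, weight=10.0)
    result = "gave up" if force is None else f"settled on {force:.1f} N"
    print(f"mu={mu}: {result} after {probes} probes")
```

The trade-off is time: a slippery object takes many more probes than a grippy one. This matches the point that a robot taking several tries will look clumsy and slow compared with an adult who has already recognized the object.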
| beau_g wrote:
| On a freestanding humanoid robot, you have an inverse kinematic
| chain running all the way from the touch point to the ground,
| with many actuators in between, each of which to some degree
| squares the complexity of the problem. The parent article
| mentions a Fanuc or Kuka bot, which, let's say, is 6-axis. Those
| robots are incredibly stiff and strong, in many cases many
| orders of magnitude stronger than they need to be for the job
| they are tasked with. Their base does not move, and modeling
| things like collisions with the environment or with themselves
| is much simpler because they are placed in 100% controlled
| environments. Remove all of those qualifiers (a weak robot,
| because it needs to be light; a dynamic environment; and count
| the DOF between the robot's finger and its ankles) and you get a
| clearer picture than the article offers of why all this stuff
| is difficult. You can't take much of a divide-and-conquer
| approach like you can in other domains.
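The sketch below illustrates one narrow facet of the long-chain argument: small joint errors accumulate along a serial chain. It uses planar forward kinematics and assumes every joint is off by the same small angle, which is a worst case. Link counts and lengths are hypothetical, with a 6-joint fixed-base arm standing in for the Fanuc/Kuka case and a 12-joint chain for ankle-to-finger. The sketch ignores stiffness, compliance, dynamics, and balance, which the comment also raises.

```python
import math


def fingertip(joint_angles, link_lengths):
    """Forward kinematics of a planar serial chain. Each joint angle is relative
    to the previous link; returns the (x, y) position of the end of the last link."""
    x = y = theta = 0.0
    for q, length in zip(joint_angles, link_lengths):
        theta += q
        x += length * math.cos(theta)
        y += length * math.sin(theta)
    return x, y


def tip_error(joint_angles, link_lengths, joint_error):
    """Fingertip displacement when every joint is off by the same small angle (radians)."""
    x0, y0 = fingertip(joint_angles, link_lengths)
    x1, y1 = fingertip([q + joint_error for q in joint_angles], link_lengths)
    return math.hypot(x1 - x0, y1 - y0)


# Hypothetical chains: a fixed-base 6-joint arm vs. a 12-joint ankle-to-finger chain.
arm = tip_error([0.0] * 6, [0.25] * 6, joint_error=0.01)
humanoid = tip_error([0.0] * 12, [0.15] * 12, joint_error=0.01)
print(f"6-joint arm:       {arm * 1000:.1f} mm")
print(f"12-joint humanoid: {humanoid * 1000:.1f} mm")
```

To first order, a straight chain of N equal links of length L with a per-joint error of δ puts the fingertip off by about δ·L·N(N+1)/2. The error grows roughly with the square of the joint count, so the longer chain misses by more than twice as much despite only modestly greater reach.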
| iandanforth wrote:
| For some perspective: we have not yet scaled robot training. The
| amount of data that Pi (Physical Intelligence) is using to train
| their impressively capable robots is in the range of thousands of
| hours. In contrast, language models are trained on trillions of
| tokens covering a large share of recorded human knowledge. So if
| you're saying things like "this still seems hard", just remember
| that we have yet to hit this with the data hammer. Simulation is
| proving a great way to augment or bootstrap robot dexterity, but
| it still pales in comparison to real-world data. So, as the
| author points out, we may get capability scaling like Waymo's,
| where one company painstakingly collects real data over a
| decade, but we may also see rapid progress in simulators and
| simulator _speed_ overtake it for practical household and
| industrial tasks. My bet is on the latter.
| hahaxdxd123 wrote:
| Correct me if I'm wrong, but I haven't seen any simulator
| progress in years (e.g. MuJoCo hasn't changed in 5 years but is
| still state of the art for accuracy).
| sashank_1509 wrote:
| Recently I had a chance to listen to a set of talks on the
| technology powering Waymo. I think the average academic
| roboticist would be shocked by the complete lack of end-to-end
| deep learning models, or even large models, powering Waymo. It's
| interesting to me that the only working self-driving car on the
| market right now has basically painstakingly listed every
| possible road obstacle, coded every possible piece of driving
| logic, and manually addressed every edge case. Maybe Tesla's
| end-to-end approach will work and become the way forward, but
| the real world seems to provide an almost limitless number of
| edge cases that neural networks don't seem great at handling. In
| fact, if Waymo's approach proves to be the right one, the winning
| approach for humanoids might be listing every possible item a
| humanoid can see in an environment, detecting them, and then
| planning for them.
| sho_hn wrote:
| I suppose the (crummy) analog is that a human's "models" are
| equally not entirely general; we have evolved a particular
| architecture that is baked into our hardware and perpetuated
| via our DNA.
|
| It's fuzzy and plastic and complex, but the brain has
| functional areas, there is intelligence more local to specific
| sensors, pipelines where fusion happens, governors and
| supervisors, specific numeric limits to certain tasks, etc.
|
| This is a bit akin to your "listing every possible item", in a
| way, in the sense that there are definitely finite structures
| tuned toward the application of being human.
|
| This interplay between our supposed "AGI" and what is "cached"
| in our hardware (which is also not static, but evolving) is
| really one of the most fascinating aspects of biology.
| Zigurd wrote:
| Not every edge case, but enough that the vehicle can correctly
| determine it doesn't know how to proceed and must ask a human
| to choose from among a menu of choices. This is how Waymo
| described how supervision works. Nobody actually drives the
| vehicle remotely. They just make a decision the on-board
| intelligence has decided it can't make.
|
| One good bet based on Waymo's decision to expand is that the
| amount of supervision each robotaxi needs keeps going down, so
| supervision is not tightly coupled to fleet size.
| huevosabio wrote:
| I think it would be the other way around, academic roboticists
| are very well aware of how damn hard the physical world is.
| boulos wrote:
| (Disclosure: I work for Waymo)
|
| While there is plenty of classical robotics code in our
| planner, I wouldn't want people to assume that we don't use
| neural networks for planning.
|
| Just because we don't deploy end-to-end models (e.g., sensors
| to controls), but have separate perception and planning
| components doesn't mean there isn't ML in each part. Having the
| components separate means we can train and update each
| individually, test them individually, inject overrides as
| needed, and so on. On the flip side, it's true that, because
| it's not learned end-to-end today, there might exist a vastly
| simpler or higher-quality system.
|
| So we do a lot of research in this area, like EMMA
| (https://waymo.com/research/emma/) but don't assume that our
| planning isn't heavily ML based. A lot of our progress in the
| last couple of years has been driven by increasing the amount
| of ML used for planning, especially for behavior prediction
| (e.g., https://waymo.com/research/wayformer/)
| DGAP wrote:
| Do these challenges apply to surgical robots? There's a lot of
| interest in essentially creating automated da Vinci systems, for
| which there is a great deal of training data and for which the
| robots are prepositioned.
|
| Maybe all this setup means that completing surgical tasks doesn't
| count as dexterity.
| Zigurd wrote:
| There are half a dozen successful commercially available surgical
| robot products out there. None try to mimic a surgeon's hands.
|
| Even if biomimicry turns out to be a useful strategy in designing
| general purpose robots, I would bet against humans being the
| right shape to mimic. And that's assuming general purpose robots
| will ever be more useful than robots designed or configured for
| specific tasks.
| michaelt wrote:
| The reason people keep working on human-like hands for robots
| is: The world is absolutely full of things adapted to be
| operated with human hands.
|
| Handling heavy boxes? Baking a cake? Operating a circular saw?
| Assembling a PC? Performing surgery? Loading a ream of paper
| into a printer? Playing a violin? Opening a door? You can do it
| all with two five-fingered hands.
| ndileas wrote:
| I think it's important to note that many individual humans
| are adapted to only a few of these tasks. A construction
| worker's hands and a magician's have very different muscles,
| skin thickness, grip strength, dexterity, etc. even though
| they can both wash a dish and open a door.
___________________________________________________________________
(page generated 2025-04-26 23:00 UTC)