hngopher.com

       [HN Gopher] Robot Dexterity Still Seems Hard
       ___________________________________________________________________
        
       Author : mhb
       Score  : 30 points
       Date   : 2025-04-26 17:53 UTC (5 hours ago)
        
 (HTM) web link (www.construction-physics.com)
        
       | levocardia wrote:
       | Surprised that there isn't any explicit discussion of _why_
       | dexterity is so hard, beyond sensory perception. One of the root
       | causes (IMHO the biggest one) is that modeling contact, i.e. the
       | static and dynamic friction between two objects, is extremely
       | complicated. There are various modeling strategies but their
       | results are highly sensitive to various tuning parameters which
       | makes it very hard to learn in simulation. From what I remember,
       | the OpenAI Rubik's Cube solver basically learned across a giant
       | set of worlds of many different possible tuning parameters for
       | the contact models and was able to generalize okay to the real
       | world, in various situations.
       | 
       | It seems most likely that this sort of boring domain
       | randomization will be what works, or works well enough, for
       | solving contact in this generation of robotics, but it would be
       | much more exciting if someone figures out a better way to learn
       | contact models (or a latent representation of them) in real time.
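
A minimal sketch of the domain randomization idea described above, assuming a toy 1-D Coulomb friction model rather than a real physics engine. The function names, the friction ranges, and the "push a block to a target distance" task are all hypothetical and chosen only for illustration; this is not OpenAI's setup.

```python
import random

G = 9.81  # gravitational acceleration, m/s^2


def simulate_push(force, mu_static, mu_kinetic, mass=1.0, dt=0.01, steps=100):
    """Toy contact model: push a block with constant force for steps*dt seconds.

    Returns the distance travelled. The block stays put if static friction
    can resist the push; otherwise it accelerates against kinetic friction.
    """
    normal = mass * G
    if force <= mu_static * normal:
        return 0.0
    accel = (force - mu_kinetic * normal) / mass
    v = x = 0.0
    for _ in range(steps):
        v = max(0.0, v + accel * dt)
        x += v * dt
    return x


def sample_world(rng):
    """Draw one randomized set of contact parameters (assumed ranges)."""
    mu_static = rng.uniform(0.2, 1.0)
    mu_kinetic = rng.uniform(0.5, 0.95) * mu_static  # kinetic stays below static
    return mu_static, mu_kinetic


def evaluate(force, worlds, target=0.5):
    """Mean absolute distance error of one push force across many worlds."""
    errors = [abs(simulate_push(force, ms, mk) - target) for ms, mk in worlds]
    return sum(errors) / len(errors)


def best_force(worlds, candidates):
    """Pick the candidate force with the lowest average error."""
    return min(candidates, key=lambda f: evaluate(f, worlds))


rng = random.Random(0)
worlds = [sample_world(rng) for _ in range(200)]
candidates = [0.5 * i for i in range(1, 41)]  # 0.5 N to 20 N

robust = best_force(worlds, candidates)          # tuned across random worlds
nominal = best_force([(0.5, 0.4)], candidates)   # tuned on one guessed world

print("robust force:", robust, "mean error:", evaluate(robust, worlds))
print("nominal force:", nominal, "mean error:", evaluate(nominal, worlds))
```

The action tuned on a single guessed friction value can do no better, averaged over the randomized worlds, than the one tuned across all of them. A real system would train a feedback policy instead of picking one open-loop force, but the sensitivity to the friction parameters is the same issue.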
        
         | sho_hn wrote:
         | This rings super plausible to me. I dabbled a bit in hobby
         | electronics making DIY walkers, and the more time you spend on
         | junior stuff like that (trying to model a good response to
         | servo load feedback that works in every situation, etc.) the
         | more it dawns on you that what humans and other animals do with
         | the sensor feedback they get from their limbs is so rich in
         | "magic" and intelligence.
         | 
         | Figuring out physical interaction with the environment and
         | traversal is truly one of the most stunning early achievements
         | of life.
        
         | hahaxdxd123 wrote:
         | Here's an interesting blog post on the limitations of domain
         | randomization for OpenAI's results:
         | https://www.alexirpan.com/2019/10/29/openai-rubiks.html
         | 
         | Basically the solve rate was much lower without the use of a
         | Bluetooth sensor, and they did a bunch of other things that
         | made the result less impressive. Still a long way to go here.
        
         | rapjr9 wrote:
         | Dexterity is also hard because, at least in humans, it relies
         | on knowing something of the nature of an object _before_
         | manipulating it. Is it light or heavy? Soft or rigid? Is it a
         | bag of popcorn, popcorn kernels, a bag of powder, or a pillow?
         | How tightly is it packed in the bag? Fabric or cardboard?
         | Attached to other objects or not? Is the USB plug the right
         | type and oriented correctly? (Even humans have trouble with
         | this one.) Does it have a slippery surface or a grippy surface?
         | To be immediately successful in manipulation, pre-knowledge
         | based on sensing and identification is usually required.
         | Possibly it would be ok if a robot took several tries to figure
         | this out based on some general principles, but it will seem
         | clumsy and be slower. It seems there is an ontology problem
         | here, which requires understanding a lot about the world in
         | order to be able to successfully manipulate it.
         | 
         | More generally, continuous learning in real-time is something
         | current models don't do well. Retraining an entire LLM every
         | time something new is encountered is not scalable. Temporary
         | learning does not easily transfer to long term knowledge.
         | Continuous learning still seems in its infancy.
        
           | pixl97 wrote:
           | Also when we don't know the properties of an object we are
           | about to manipulate we'll approach it cautiously and learn it
           | before we apply too much force. This tends to happen
           | transparently and quickly for adults, but for infants you can
           | watch it play out more slowly.
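
The cautious approach described above can be sketched as a force ramp: start with a light grip and increase it only until the object stops slipping. This is a toy illustration, assuming a two-finger pinch with Coulomb friction; `slips`, `cautious_grasp`, and all numeric defaults are hypothetical names and values, and a real gripper would detect slip from sensors rather than from a formula.

```python
G = 9.81  # gravitational acceleration, m/s^2


def slips(grip_force, mass, mu):
    """Stand-in for a slip sensor on a two-finger pinch grasp.

    The object slips when friction from both fingers (2 * mu * grip_force)
    cannot support its weight.
    """
    return 2 * mu * grip_force < mass * G


def cautious_grasp(probe, start=0.5, step=0.5, max_force=50.0, margin=1.2):
    """Ramp grip force in small steps until probe(force) reports no slip.

    probe: callable taking a force in newtons and returning True on slip.
    Returns the first holding force times a safety margin (capped at
    max_force), or None if the object still slips at max_force.
    """
    force = start
    while force <= max_force:
        if not probe(force):
            return min(force * margin, max_force)
        force += step
    return None


# The controller never sees mass or friction directly; it only probes.
light_cup = cautious_grasp(lambda f: slips(f, mass=0.1, mu=0.5))
heavy_box = cautious_grasp(lambda f: slips(f, mass=2.0, mu=0.3))
oily_part = cautious_grasp(lambda f: slips(f, mass=5.0, mu=0.1))
print(light_cup, heavy_box, oily_part)
```

The trade-off matches the comment: with no prior knowledge the ramp takes many probes for a heavy object, while a good initial guess about the object would let the controller start near the right force.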
        
         | beau_g wrote:
         | On a freestanding humanoid robot, you have an inverse kinematic
         | chain running all the way from the touch point to the ground,
         | with many actuators in between, each of which to some degree
         | squares the complexity of the problem. The parent article
         | mentions a Fanuc or Kuka bot, which let's say is 6-axis - they
         | are incredibly stiff/strong, in many cases many orders of
         | magnitude stronger than they really need to be for the job they
         | are tasked with, they do not move, modeling things like
         | clashing with the environment/itself is much simpler because
         | they are placed in 100% controlled environments - remove all of
         | those qualifiers (weak robot because it needs to be light,
         | dynamic environment, and count the DOF between the robot's
         | finger and its ankles) and it gives a clearer picture than the
         | article offers of why all this stuff is difficult. Can't take
         | much of a divide and conquer approach like you can in other
         | domains.
        
       | iandanforth wrote:
       | For some perspective we have not yet scaled robot training. The
       | amount of data that Pi is using to train their impressively
       | capable robots is in the range of thousands of hours of data. In
       | contrast language models are trained over trillions of tokens
       | comprising the entirety of human knowledge. So if you're saying
       | things like "this still seems hard" just remember we have yet to
       | hit this with the data hammer. Simulation is proving a great way
       | to augment / bootstrap robot dexterity but it still pales in
       | comparison to data in the real world. So, as the author points
       | out, we may get capability scaling like Waymo where one company
       | painstakingly collects real data over a decade, but we may also
       | see the rapid progress in simulators and simulator _speed_
       | overtake for practical household / industrial tasks. My bet is
       | on the latter.
        
         | hahaxdxd123 wrote:
         | Correct me if I'm wrong, but I haven't seen any simulator
         | progress in years (e.g. MuJoCo hasn't changed in 5 years but is
         | still SOTA accuracy)
        
       | sashank_1509 wrote:
       | Recently I had a chance to listen to a set of talks on the
       | technology powering Waymo. I think the average academic
       | roboticist will be shocked by the complete lack of end to end
       | deep learning models or even large models powering Waymo. It's
       | interesting to me that the only working self driving car on the
       | market right now has basically painstakingly listed every
       | possible road obstacle, coded every possible driving logic for
       | it, and manually addressed every edge case. Maybe Tesla's end to
       | end approach will work, and that will be the way moving forward,
       | but the real world seems to provide an almost limitless amount
       | of edge cases that neural networks don't seem great at handling.
       | In fact the winning approach to humanoids, if Waymo's is proven
       | to be the right approach, might be listing every possible item a
       | humanoid can see in an environment, detecting them and then
       | planning for them.
        
         | sho_hn wrote:
         | I suppose the (crummy) analog is that a human's "models" are
         | equally not entirely general; we have evolved a particular
         | architecture that is baked into our hardware and perpetuated
         | via our DNA.
         | 
         | It's fuzzy and plastic and complex, but the brain has
         | functional areas, there is intelligence more local to specific
         | sensors, pipelines where fusion happens, governors and
         | supervisors, specific numeric limits to certain tasks, etc.
         | 
         | This is a bit akin to your "listing every possible item", in a
         | way, in the sense that there are definitely finite structures
         | tuned toward the application of being human.
         | 
         | This interplay between our supposed "AGI" and what is "cached" in
         | our also not static but evolving hardware is really one of the
         | most fascinating aspects of biology.
        
         | Zigurd wrote:
         | Not every edge case, but enough that the vehicle can correctly
         | determine it doesn't know how to proceed and must ask a human
         | to choose from among a menu of choices. This is how Waymo
         | described how supervision works. Nobody actually drives the
         | vehicle remotely. They just make a decision the on-board
         | intelligence has decided it can't make.
         | 
         | One good bet based on Waymo's decision to expand is that the
         | amount of supervision each robotaxi needs keeps going down, so
         | supervision is not tightly coupled to fleet size.
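
A minimal sketch of the supervision pattern described above: act autonomously when confident, otherwise hand a menu of choices to a human. This is not Waymo's system; the `Option` type, the `decide` function, the confidence scores, and the 0.8 threshold are all hypothetical and exist only to illustrate the control flow.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Option:
    name: str          # candidate maneuver, e.g. "wait" or "reroute"
    confidence: float  # planner's own estimate in [0, 1] (assumed to exist)


def decide(
    options: Sequence[Option],
    ask_human: Callable[[List[str]], str],
    threshold: float = 0.8,
) -> Tuple[str, bool]:
    """Return (chosen option name, whether a human was consulted).

    If the best option clears the threshold, the vehicle proceeds on its own.
    Otherwise the human picks from a menu; the human never drives directly.
    """
    best = max(options, key=lambda o: o.confidence)
    if best.confidence >= threshold:
        return best.name, False
    ranked = sorted(options, key=lambda o: o.confidence, reverse=True)
    menu = [o.name for o in ranked]
    choice = ask_human(menu)
    if choice not in menu:
        raise ValueError(f"human chose {choice!r}, which is not on the menu")
    return choice, True


# Clear road: the planner is confident and no human is involved.
clear = decide([Option("proceed", 0.95), Option("wait", 0.40)],
               ask_human=lambda menu: menu[0])

# Ambiguous scene: nothing clears the threshold, so a human picks.
blocked = decide([Option("pass on left", 0.55), Option("wait", 0.50),
                  Option("reroute", 0.30)],
                 ask_human=lambda menu: "reroute")
print(clear, blocked)
```

Under this pattern, the load on human supervisors depends on how often the threshold is missed, not on how many vehicles are running, which is the decoupling from fleet size the comment points to.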
        
         | huevosabio wrote:
         | I think it would be the other way around: academic roboticists
         | are very well aware of how damn hard the physical world is.
        
         | boulos wrote:
         | (Disclosure: I work for Waymo)
         | 
         | While there is plenty of classical robotics code in our
         | planner, I wouldn't want people to assume that we don't use
         | neural networks for planning.
         | 
         | Just because we don't deploy end-to-end models (e.g., sensors
         | to controls), but have separate perception and planning
         | components doesn't mean there isn't ML in each part. Having the
         | components separate means we can train and update each
         | individually, test them individually, inject overrides as
         | needed, and so on. On the flip side, it's true that, because
         | it's not learned end-to-end today, there might exist a
         | vastly simpler or higher quality system.
         | 
         | So we do a lot of research in this area, like EMMA
         | (https://waymo.com/research/emma/) but don't assume that our
         | planning isn't heavily ML based. A lot of our progress in the
         | last couple of years has been driven by increasing the amount
         | of ML used for planning, especially for behavior prediction
         | (e.g., https://waymo.com/research/wayformer/)
        
       | DGAP wrote:
       | Do these challenges apply to surgical robots? There's a lot of
       | interest in essentially creating automated da Vincis, for which
       | there is a great deal of training data and for which the robots
       | are prepositioned.
       | 
       | Maybe all this setup means that completing surgical tasks doesn't
       | count as dexterity.
        
       | Zigurd wrote:
       | There are half a dozen successful commercially available surgical
       | robot products out there. None try to mimic a surgeon's hands.
       | 
       | Even if biomimicry turns out to be a useful strategy in designing
       | general purpose robots, I would bet against humans being the
       | right shape to mimic. And that's assuming general purpose robots
       | will ever be more useful than robots designed or configured for
       | specific tasks.
        
         | michaelt wrote:
         | The reason people keep working on human-like hands for robots
         | is: The world is absolutely full of things adapted to be
         | operated with human hands.
         | 
         | Handling heavy boxes? Baking a cake? Operating a circular saw?
         | Assembling a PC? Performing surgery? Loading a ream of paper
         | into a printer? Playing a violin? Opening a door? You can do it
         | all with two five-fingered hands.
        
           | ndileas wrote:
           | I think it's important to note that many individual humans
           | are adapted to only a few of these tasks. A construction
           | worker's hands and a magician's have very different muscles,
           | skin thickness, grip strength, dexterity, etc. even though
           | they can both wash a dish and open a door.
        
       ___________________________________________________________________
       (page generated 2025-04-26 23:00 UTC)