[HN Gopher] Google DeepMind's Aloha Unleashed is pushing the bou...
___________________________________________________________________
Google DeepMind's Aloha Unleashed is pushing the boundaries of
robot dexterity
Author : modeless
Score : 180 points
Date : 2024-04-16 16:28 UTC (6 hours ago)
(HTM) web link (twitter.com)
(TXT) w3m dump (twitter.com)
| riidom wrote:
 | Reminds me of "Foxes in Love" somehow.
| modeless wrote:
| More videos:
|
| Hanging multiple shirts in a row:
| https://twitter.com/ayzwah/status/1780263770440491073
|
| Generalizing to unseen sweater:
| https://twitter.com/ayzwah/status/1780263771858194809
|
| Struggling to unfold a shirt:
| https://twitter.com/DannyDriess/status/1780270239185588732
|
| Assembling gears:
| https://twitter.com/ayzwah/status/1780263775213629497
| godelski wrote:
 | I wish more of these were shown at 1x (the last one is).
 | Sure, it's a bit slower, but if you watch the OP link at 1/2
 | speed it is still impressive.
|
| Here's a shoe tying at 1x:
| https://twitter.com/ayzwah/status/1780263776694182311
|
 | It's interesting how it ties the knot. The first knot is
 | already in place and they just do the bow. I don't think the
 | way most people tie their shoes (the bunny-around-the-tree
 | method[0]) would work well for a robot, but I actually tie my
 | shoes like this[1], which is the same way the robot ties.
|
| So I gotta ask, was this a purely learned policy or was this
| taught or pushed in that direction[^]? I suspect the latter.
|
| [0] https://www.youtube.com/watch?v=YwqQvKtmefE
|
| [1] https://www.youtube.com/watch?v=XPIgR89jv3Q
|
 | [^] By "pushed in that direction" I'd include watching that
 | video or any other videos like it.
| fragmede wrote:
 | UBTECH and Baidu out of China demoed a clothes-folding robot
 | two weeks ago (early April 2024), and the video is claimed to
 | be 1x/realtime.
|
| https://youtu.be/8MRDF2pkIRs
| throwup238 wrote:
| Finally, a robot that can tie my shoes for me!
| linsomniac wrote:
 | In the last year I've started using the knot those robots
 | use, the "Ian Knot", to tie my shoes, and I'm loving it.
| https://www.fieggen.com/shoelace/ianknot.htm
| mjamesaustin wrote:
| Yeah these robots tie a better shoelace than most humans!
| p1mrx wrote:
| Do most humans leave their laces dragging on the ground?
|
| Though to be fair, those laces are really long. The robot
| needs to unlace the shoes, cut some length from the middle,
| tie a double fisherman's knot, and relace them.
| ozten wrote:
| The speed that these arms/hands move at is incredible compared to
| 4 months ago.
| chabons wrote:
 | The videos are all shown at 2x speed, but your point stands:
 | this is still pretty quick.
| lyapunova wrote:
 | Sorry, but this is a lot of marketing for the same thing over
 | and over again. I'm not against Aloha as an _affordable_
 | platform, but skimping on hardware is kind of a bug, not a
 | feature. Moreover, it's not even _low-cost_: its BoM is still
 | like 20k, and collecting all the data is labor-intensive and
 | not cheap.
|
| And if we're focusing on the idea, it has existed since the 1950s
| and they were doing it relatively well then:
|
| https://www.youtube.com/watch?v=LcIKaKsf4cM
| modeless wrote:
| These videos are all autonomous. They didn't have that in the
| 1950s.
| lyapunova wrote:
| I can appreciate that, but also they are recording and
| replaying motor signals from specific teleoperation
| demonstrations. Something that _was_ possible in the 1950s.
 | You might say that it is challenging to replay demonstrations
 | well on lower-quality hardware, and so there is academic
 | value in trying to make it work on worse hardware, but it
 | would not be my go-to solution for real industry problems.
 | This is not a route I would fund for a startup, for example.
| modeless wrote:
| They do not replay recorded motor signals. They use
| recorded motor signals only to train neural policies, which
| then run autonomously on the robot and can generalize to
| new instances of a task (such as the above video
 | generalizing to an adult-size sweater when it was only ever
 | trained on child-size polo shirts).
|
| Obviously some amount of generalization is required to fold
| a shirt, as no two shirts will ever be in precisely the
| same configuration after being dropped on a table by a
| human. Playback of recorded motor signals could never solve
| this task.
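 |
 | To make "train neural policies" concrete: supervised
 | behavior cloning is, at its core, regression from
 | observations to the demonstrated actions. A minimal sketch
 | in PyTorch (random tensors stand in for real teleop logs;
 | the shapes and names are illustrative, not the actual Aloha
 | code):
 |
 |   import torch
 |   import torch.nn as nn
 |
 |   # Illustrative shapes: a flattened camera embedding plus
 |   # 14 joint positions in, 14 joint targets out.
 |   OBS_DIM, ACT_DIM = 512 + 14, 14
 |
 |   policy = nn.Sequential(
 |       nn.Linear(OBS_DIM, 256), nn.ReLU(),
 |       nn.Linear(256, 256), nn.ReLU(),
 |       nn.Linear(256, ACT_DIM),
 |   )
 |   opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
 |
 |   # Stand-in for (observation, action) pairs logged during
 |   # teleoperation.
 |   obs = torch.randn(4096, OBS_DIM)
 |   act = torch.randn(4096, ACT_DIM)
 |
 |   for step in range(1000):
 |       idx = torch.randint(0, len(obs), (64,))
 |       loss = nn.functional.mse_loss(policy(obs[idx]), act[idx])
 |       opt.zero_grad()
 |       loss.backward()
 |       opt.step()
 |
 |   # At run time the policy maps live observations to motor
 |   # targets each tick; nothing is replayed.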
| adolph wrote:
| > recorded motor signals only to train neural policies
|
 | It's interesting that they are using "Leader Arms" [0] to
| encode tasks instead of motion capture. Is it just a
| matter of reduced complexity to get off the ground? I
| suppose the task of mapping human arm motion to what a
| robot can do is tough.
|
| 0. https://www.trossenrobotics.com/widowx-aloha-set
| ewjt wrote:
 | This is not preprogrammed replay. Replay would not be able to
 | handle even tiny variations in the starting positions of
| the shirt.
| lyapunova wrote:
| So, a couple things here.
|
 | It is true that replay in the world frame will not handle
 | initial position changes for the shirt. But if the commands
 | are in the frame of the end-effector and the data is
 | object-centric, replay will somewhat generalize. (Please also
 | consider the fact that you are watching the videos that have
 | survived the "should I upload this?" filter.)
|
 | The second thing is that large-scale behavior cloning (which
 | is the technique used here) is essentially replay with a
 | little smoothing. Not bad inherently, but just a fact.
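 |
 | To make the frame point concrete, "replay in the object
 | frame" means storing the demo relative to the object and
 | re-anchoring it on the newly detected object pose. A toy
 | numpy sketch with made-up poses, not anything from the Aloha
 | stack:
 |
 |   import numpy as np
 |
 |   def se3(t):
 |       T = np.eye(4)
 |       T[:3, 3] = t   # identity rotation, translation t
 |       return T
 |
 |   # Demo time: world-frame poses of object and gripper.
 |   T_world_obj = se3(np.array([0.40, 0.00, 0.05]))
 |   T_world_ee  = se3(np.array([0.45, 0.02, 0.10]))
 |
 |   # Store the demo relative to the object, not the world.
 |   T_obj_ee = np.linalg.inv(T_world_obj) @ T_world_ee
 |
 |   # Replay time: the object shows up somewhere else, and the
 |   # same relative motion re-anchors onto the new pose.
 |   T_world_obj_new = se3(np.array([0.30, 0.10, 0.05]))
 |   print((T_world_obj_new @ T_obj_ee)[:3, 3])  # [0.35 0.12 0.1 ]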
|
 | My point is that there was an academic contribution made
 | back when the first Aloha paper came out and they showed
 | that BC on low-quality hardware could work, but this is
 | like the 4th paper in a row of sort of the same stuff.
|
| Since this is YC, I'll add - As an academic (physics)
| turned investor, I would like to see more focus on
| systems engineering and first-principles thinking. Less
| PR for the sake of PR. I love robotics and really want to
| see this stuff take off, but for the right reasons.
| modeless wrote:
 | > large-scale behavior cloning (which is the technique used
 | here) is essentially replay with a little smoothing
|
| A definition of "replay" that involves extensive
| correction based on perception in the loop is really
| stretching it. But let me take your argument at face
| value. This is essentially the same argument that people
| use to dismiss GPT-4 as "just" a stochastic parrot. Two
| things about this:
|
| One, like GPT-4, replay with generalization based on
| perception can be exceedingly useful by itself, far more
| so than strict replay, even if the generalization is
| limited.
|
| Two, obviously this doesn't generalize as much as GPT-4.
| But the reason is that it doesn't have enough training
| data. With GPT-4 scale training data it would generalize
| amazingly well and be super useful. Collecting human
| demonstrations may not get us to GPT-4 scale, but it will
| be enough to bootstrap a robot useful enough to be
| deployed in the field. Once there is a commercially
 | successful dexterous robot in the field, we will be able to
| collect orders of magnitude more data, unsupervised data
| collection should start to work, and robotics will fall
 | to the bitter lesson just as vision, ASR, TTS, translation,
 | and NLP did before it.
| lyapunova wrote:
| Thank you for your rebuttal. It is good to think about
| the "just a stochastic parrot" thing. In many ways this
| is true, but it might not be bad. I'm not against replay.
| I'm just pointing out that I would not start with an
| _affordable_ 20k robot with fairly undeveloped
| engineering fundamentals. It's kind of like trying to dig
 | a foundation for your house with a plastic beach shovel.
 | Could you do it? Maybe, if you tried hard enough. Is it
 | the best bet for success? Doubtful.
| klowrey wrote:
| The detail about end-effector frame is pretty critical as
| doing this BC with joint angles would not be tractable.
 | You can tell there was a big shift from RL approaches
 | chasing very general algorithms to more recent work that is
 | heavily focused on these arm/manipulator setups, because
 | end-effector control enables more flashy results.
|
 | Another limiting factor is that data collection is a big
 | problem: not only will you never be sure you've collected
 | enough data, but what you're collecting is a human trying
 | to do this work through a janky teleoperation rig. The
 | behavior they're trying to clone is of a human working
 | poorly, which isn't a great source of data! Furthermore,
 | limiting the data collection to (typically) 10Hz means that
 | the scene will always have to be quasi-static, and I'm not
 | sure these huge models will speed up enough to actually
 | understand velocity as a 'sufficient statistic' of the
 | underlying dynamics.
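 |
 | To put a number on the 10Hz point: consecutive samples are
 | 100ms apart, so even finite-difference velocity estimates
 | degrade badly for motion near that timescale. A toy numpy
 | sketch, illustrative numbers only:
 |
 |   import numpy as np
 |
 |   dt = 0.10                    # 10Hz data collection
 |   t = np.arange(0.0, 1.0, dt)
 |   x = 0.05 * np.sin(2 * np.pi * 4.0 * t)  # 4Hz, 5cm swing
 |
 |   # Velocity from finite differences of the 10Hz samples...
 |   v_est = np.diff(x) / dt
 |   # ...vs. the true velocity midway between samples.
 |   t_mid = t[:-1] + dt / 2
 |   v_true = (0.05 * 2 * np.pi * 4.0
 |             * np.cos(2 * np.pi * 4.0 * t_mid))
 |
 |   # ~0.3 m/s error on a ~1.3 m/s peak: fast motion is badly
 |   # aliased at 10Hz, so demos stay slow and quasi-static.
 |   print(np.max(np.abs(v_est - v_true)))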
|
| Ultimately, it's been frustrating to see so much money
| dumped into the recent humanoid push using teleop / BC.
 | It's going to hamper the folks actually pursuing first-
 | principles thinking.
| johntb86 wrote:
| What do you mean by saying that they're replaying signals
| from teleoperation demonstrations? Like in
| https://twitter.com/DannyDriess/status/1780270239185588732,
| was someone demonstrating how to struggle to fold a shirt,
| then they put a shirt in the same orientation and had the
| robot repeat the same motor commands?
| xg15 wrote:
 | > _skimping on hardware is kind of a bug, not a feature._
|
| I have to disagree here. Not for 20k, but if you could really
| build a robot arm out of basically a desk lamp, some servos and
| a camera and had some software to control it as precisely as
| this video claims it does, this would be a complete game
| changer. We'd probably see an explosion of attempts to automate
| all kind of everyday household tasks that are infeasible to
| automate cost-effectively today (folding laundry, cleaning up
| the room, cooking, etc)
|
| Also, every self-respecting maker out there would probably try
| to build one :)
|
 | > _And if we're focusing on the idea, it has existed since
 | the 1950s and they were doing it relatively well then:_
|
| I don't quite understand how the video fits here. That's a
| manually operated robot arm. The point of Aloha is that it's
| fully controlled by software, right?
| sashank_1509 wrote:
 | I follow this space closely and had never seen the 1950s
 | teleoperation video; it literally blows my mind that people
 | had this working back then. Now you just need to connect
 | that to a transformer / diffusion policy and it will be able
 | to perform the task autonomously, maybe 80% of the time with
 | 200+ demonstrations and close to 100% of the time with 1000+
 | demonstrations.
|
 | Aloha was not new, but it's still good work because robotics
 | researchers were not focused on this form of data collection.
 | The issue was that most people went down the simulation
 | rabbit hole, where they had to solve sim-to-real.
|
 | Others went down the VR headset and hand tracking route,
 | where you never got super precise manipulation, so any
 | robots trained on that always showed choppy movement.
|
 | Others, including OpenAI, decided to go full reinforcement
 | learning, forgoing human demonstrations. That had some
 | decent results, but after 6 months of RL on an arm farm led
 | by Google and Sergey Levine, the results were underwhelming
 | to say the least.
|
 | So yes, it's not like Aloha invented teleoperation. They
 | demonstrated that with this mode of teleoperation you could
 | easily collect a lot of data that can train autonomous robot
 | policies and beat other methods, which I think is a great
 | contribution!
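 |
 | For anyone curious what "connect that to a transformer"
 | looks like: the core of the ACT-style recipe is predicting a
 | chunk of future actions per observation instead of a single
 | step, trained with an L1/MSE loss against the demonstrated
 | sequences. A stripped-down sketch (PyTorch; no CVAE or image
 | backbone, shapes illustrative rather than the real ACT code):
 |
 |   import torch
 |   import torch.nn as nn
 |
 |   CHUNK, ACT_DIM, D = 50, 14, 128  # 50 future actions at once
 |
 |   class ChunkPolicy(nn.Module):
 |       def __init__(self):
 |           super().__init__()
 |           self.obs_proj = nn.Linear(512 + 14, D)  # cam + joints
 |           self.queries = nn.Parameter(torch.randn(CHUNK, D))
 |           layer = nn.TransformerEncoderLayer(
 |               D, nhead=4, batch_first=True)
 |           self.enc = nn.TransformerEncoder(layer, num_layers=2)
 |           self.head = nn.Linear(D, ACT_DIM)
 |
 |       def forward(self, obs):            # obs: (B, 526)
 |           B = obs.shape[0]
 |           tokens = torch.cat(
 |               [self.obs_proj(obs).unsqueeze(1),  # 1 obs token
 |                self.queries.expand(B, -1, -1)],  # action queries
 |               dim=1)
 |           out = self.enc(tokens)[:, 1:]  # keep query outputs
 |           return self.head(out)          # (B, CHUNK, ACT_DIM)
 |
 |   policy = ChunkPolicy()
 |   print(policy(torch.randn(2, 526)).shape)  # [2, 50, 14]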
| taylorfinley wrote:
| I wonder if this unfortunate naming choice will cause a stir
| similar to: https://kawaiola.news/cover/aloha-not-for-sale-
| cultural-in-a...
| math_dandy wrote:
| Hopefully DeepMind will think twice before sending cease-and-
| desist orders to any Hawaiian AI robotics businesses with aloha
| in the name!
| taylorfinley wrote:
| I should definitely hope so! Though I think the name would
| cause a stir in local circles even without any legal actions.
| Tech companies in general are deeply unpopular here (see:
 | Larry Ellison, Mark Zuckerberg, and Marc Benioff buying up
| big chunks of land, AirBnB and digital nomads driving up
 | rental prices so high that more native Hawaiians now
| live on the mainland than in Hawai`i, and perceived lack of
| cultural respect from projects like the Thirty Meter
| Telescope leading to major protests).
|
| The other thing is that words have a lot of power in the
| cultural frame, even just the concept of aloha being
| something that could be "unleashed" is likely to offend.
|
 | That's to say nothing of the palpable fear people have here
 | of robots taking hospitality industry jobs like housekeeping
 | (which are unionized in many hotels out here, and are
 | actually one of the few low-barrier-to-entry jobs that can
 | support a reasonable quality of life).
|
| I'm sure I'll get a ton of downvotes for bringing up cultural
| sensitivity and pointing out these concerns -- I don't mean
| to imply they're all 100% rational nor that no one should say
| "aloha" unless they're Hawaiian, but if anyone at DeepMind
| had a Hawaiian cultural frame I think they likely would have
| flagged these concerns and recommended a different name.
| 1024core wrote:
| > Tech companies in general are deeply unpopular here
|
| Which is such a shame, as Univ of Hawaii was one of the
| pioneers of the Internet:
| https://en.wikipedia.org/wiki/ALOHAnet
| bastawhiz wrote:
| The story you linked either omits the information or buries it
| deep enough to obscure the _actual_ source of the controversy.
| I was living in Chicago at the time, and the scandal wasn't the
| name choice, it was the fact that Aloha Poke sent cease and
| desist letters to other poke shops across the country demanding
| that they remove "aloha" from their names:
|
| https://chicago.eater.com/2018/7/31/17634686/aloha-poke-co-c...
|
| > the Chicago-born restaurant chain whose attorneys sent cease
| and desist messages to poke shop owners in Hawai'i, Alaska, and
| Washington state demanding they change names by dropping the
| terms "aloha" and "poke" when used together. While Aloha Poke
| contends it sent notes in a "cooperative manner" to defend
| intellectual property, Native Hawaiians feel the poke chain is
| trying to restrict how they can embrace their own heritage.
| im3w1l wrote:
| To me the most impressive thing is the arms servicing each other.
 | When they can self-replicate it could potentially have big
| consequences.
|
| I have a dream that we put self-replicating robots on Mars and
| let them build a mostly by-robots for-robots civilization that
| can potentially export stuff to earth, do various science
| projects and build spacecraft.
| btbuildem wrote:
 | I guess this is obvious in retrospect... but having two arms vs
| one greatly expands the range of possible tasks.
| baron816 wrote:
| Skin is also incredible when you think about it. Each square cm
| is able to sense temperature, texture, whether it's wet,
| sticky, etc. And it's self healing. It's hard to imagine robots
| getting very far without artificial skin.
| yakz wrote:
 | Skin can't sense "wet", can it? I thought it was mostly just
 | temperature which, in combination with a few other
 | properties, you perceive as moisture, but it can be easily
 | fooled because there's no direct sense for it.
| reaperman wrote:
| Your point is entirely valid despite my critiques.
|
| Technically, skin can't sense whether something is wet, and
| isn't particularly great at sensing temperature. Skin senses
| pressure and heat flow (derived via sensing temperature
 | change _of the flesh itself_, rather than the temperature of
| the object it is touching), and perhaps can sense shear
| (there is a unique sensation when skin is stretched/pulled
| apart), as well as the weight of an object (if it is
 | absorbent and more wet than damp). This gap between what
 | skin can directly sense and what we perceive is what
 | deceives the human brain about wetness and temperature
 | specifically.
|
| Wetness is a perception derived from feeling higher-than-
 | expected heat loss and unusual pressure/shear, and even
| through the sound made when squeezing an absorbent material
| or the sensation of water pooling around the finger
| (broadening the area of heat loss) when you squeeze into the
| material. Damp laundry at room temperature is perceived as
| obviously wet because it feels colder than it should if it
| were dry, but when we're pulling laundry out of a dryer we
| often can't tell if it's dry vs. still a bit damp -- the
| higher temperature of the object removes the sensation of
| heat flowing away from our fingers, so there's nothing our
| fingers can sense to tell us the clothes aren't dry until the
| clothes finally cool down to room temperature.
|
| Our skin also doesn't sense the temperature of an object well
| if that object has a particularly high or low heat transfer
| coefficient of conduction. I recently bought a 6-pack of beer
| cans which have a moderately thick plastic vinyl label shrunk
| around the can. When I reach in my fridge, I can't convince
| myself to perceive it as chilled no matter how hard I try.
| Even though the vinyl is the same temperature as everything
| else in the fridge, it doesn't pull heat out of my finger
| tissue, so my brain cannot perceive that it isn't "room
| temperature". Conversely, picking up a normal metal can of
| beer that is just barely below room temperature, my brain
| perceives it to be much colder than it actually is because
| the metal draws heat away from my fingers so quickly compared
| to other objects. If wood is cooled 5 degrees below room
| temperature, it doesn't feel cold, but a can of beer
| certainly does!
|
| It is absolutely incredible that our skin can sense things to
| such a high resolution that it seems like we have a lot more
| abilities than we actually have. It is also amazing how our
| brain integrates this into a rich perception. But there
| actually aren't many physical properties actually being
| measured, and this distinction matters sometimes for edge
| cases, some of which are quite common.
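 |
 | For what it's worth, the physics here has a tidy closed
 | form. In the idealized case (two semi-infinite bodies in
 | perfect contact), the interface temperature weights each
 | side by its thermal effusivity e = sqrt(k * rho * c_p):
 |
 |   T_contact = (e_skin * T_skin + e_obj * T_obj)
 |               / (e_skin + e_obj)
 |
 | With rough textbook values (skin e ~ 1000, aluminum
 | e ~ 24000, wood e ~ 400, in W*s^0.5/(m^2*K)), touching 15 C
 | aluminum with 33 C skin gives T_contact ~ 16 C (reads as
 | cold), while 15 C wood gives ~ 28 C (reads as barely cool).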
| pixl97 wrote:
| > skin can't sense whether something is wet
|
| Ah the "Are the clothes in the dryer cold or are they wet"
| effect.
| mikepurvis wrote:
| True, and I think it was on that basis that PR2 was conceived
| as a bi-manual mobile manipulator... it just also has a massive
| impact on cost.
| m3kw9 wrote:
 | Doesn't look impressive, because this is what you see a lot
 | in factories anyway; maybe a little better than SOTA.
| danpalmer wrote:
| The difference with factories is that every movement is
| programmed by someone in quite intricate detail. Factory robots
| aren't "smart" in any sense.
| adrr wrote:
| Factories/Distribution Centers are doing hard goods not soft
| goods.
| dghughes wrote:
| If I ever move and end up living in a giant concrete warehouse
| devoid of furniture I'll keep this robot in mind.
| julienreszka wrote:
 | I am skeptical. Lots of fake demos out there. Can we rule out
 | that it's actually remotely controlled by some dude in India,
 | just like Amazon's "Just Walk Out"?
| InPanthera wrote:
 | Amazon's was remote controlled?
| sp332 wrote:
| https://gizmodo.com/amazon-reportedly-ditches-just-walk-
| out-... "Just Walk Out relied on more than 1,000 people in
| India watching and labeling videos to ensure accurate
| checkouts."
| jjjjjjjkjjjjjj wrote:
| I've worked in humanoid robots and manipulation for the past
| decade and this is mind blowing. For robots. Still pathetic
| compared to any human, but mind blowing for robots. I remember
| when we were hoping one humanoid would someday be able to replace
| a broken limb on another humanoid and we were designing super
| easy quick disconnects to make that possible. This is already way
| beyond that. Very impressive.
| adolph wrote:
| It isn't clear how "Aloha Unleashed" is different from "Mobile
 | ALOHA".
|
| Paper: Learning Bimanual Mobile Manipulation with Low-Cost Whole-
| Body Teleoperation: https://arxiv.org/abs/2401.02117
|
| Video set: https://mobile-aloha.github.io/
|
| Tutorial:
| https://docs.google.com/document/d/1_3yhWjodSNNYlpxkRCPIlvIA...
|
| Kits for sale: https://www.trossenrobotics.com/aloha-kits
| ingend88 wrote:
 | Is there a ready-made low-cost arm available?
| adolph wrote:
| You have to think of this as an entire system. The arm is
| necessary but not sufficient. An "arm" could be as simple as
| small servos and popsicle sticks [0]. In the case of ALOHA,
| below is an outline of the basic components.
 | * arms (aka follower arms)
 |   - effector (i.e. gripper)
 |   - sensors (i.e. cameras, depth sensors; specced Intel
 |     RealSense D405)
 |   - gravity compensation (so the relatively delicate servos
 |     aren't overloaded)
 | * controller
 |   - runs Robot Operating System (ROS [1]) plus other
 |     software (i.e. arm, gripper interfaces [2])
 |   - runs the ALOHA model in inference to tell ROS what to do
 |     based on task and sensor input
 |   - trains ALOHA models using the arm motion encoder and
 |     ACT: Action Chunking with Transformers [4]
 | * leader arms
 |   - motion encoders (essentially an arm in reverse that can
 |     be used by a human to telecontrol the follower arm and
 |     encode motions for model training)
|
| The system at this point is "research grade" which is at once
| expensive due to custom/nice materials/units and not super
| user friendly--you must know a lot. See the build
| instructions [5].
|
| 0. https://github.com/evildmp/BrachioGraph
|
| 1. https://www.ros.org/
|
| 2. https://github.com/interbotix
|
| 3. https://www.trossenrobotics.com/aloha-kits
|
| 4. https://github.com/tonyzhaozh/act
|
| 5. https://docs.google.com/document/d/1sgRZmpS7HMcZTPfGy3kAxD
| rq...
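 |
 | For a sense of how the pieces fit at run time, the control
 | loop is conceptually tiny: observe, predict an action chunk,
 | stream joint targets, repeat. A hypothetical sketch --
 | 'policy' and 'robot' here are stand-ins, not the real
 | ALOHA/ROS API:
 |
 |   import time
 |
 |   def run_episode(policy, robot, hz=50):
 |       """Hypothetical glue: 'policy' wraps a trained ACT
 |       checkpoint, 'robot' wraps the ROS arm/camera drivers."""
 |       dt = 1.0 / hz
 |       while not robot.task_done():
 |           obs = robot.get_observation()  # camera frames + joints
 |           chunk = policy.predict(obs)    # (k, 14) joint targets
 |           for action in chunk:           # execute the chunk,
 |               robot.command_joints(action)
 |               time.sleep(dt)             # then re-observe, re-plan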
| InPanthera wrote:
 | Was expecting more from Google than a robot that can tie
 | shoelaces. What's the use case for this? Toys for the 1%?
| mikepurvis wrote:
| I don't think the point is lace-tying; it's to demonstrate what
| is possible in terms of analogous tasks requiring a similar
| level of dexterity and environmental adaptability.
|
 | In any case, the real star of this show is clearly the shirt
 | hanging.
| yosito wrote:
| There's something odd about the way the arms move, like they are
| two distinct entities cooperating rather than being part of one
| coordinated mind. Maybe this is an example of the uncanny valley,
| or maybe it's because they are two physically separate arms, but
| it seems to me like one arm moves while the other waits for its
| turn. It's as if engineers programmed them to work sequentially.
| I wonder if it might be beneficial for engineers to study videos
| of humans doing these tasks and try to mimic those movements
| rather than trying to program a sequential procedure.
| patcon wrote:
| Now I'm trying to imagine how our limb movements might be
| perceived by a creature that natively evolved the style of
| coordination in the video :) it would be "weird" but how might
| they describe that weirdness and what might underlie it in
| us..?
| dylan604 wrote:
 | Look, it moves its mouth while it reads. Like it can't do one
 | thing without the other thing moving at the same time.
| pixl97 wrote:
| Which reminds me of my favorite interpretation of "They're
| made out of meat"
|
| https://www.youtube.com/watch?v=7tScAyNaRdQ
| visarga wrote:
| this was good
| lachlan_gray wrote:
 | Sometimes I feel this about myself... I don't have to think
 | much to walk or do something with both hands; they work
 | stuff out on their own. How much do my legs or hands
 | understand about each other?
| rotexo wrote:
 | I've been reading Vernor Vinge's A Fire Upon the Deep, where
 | that is a characteristic of one of the species in the novel,
 | and I had the exact same thought.
| CooCooCaCha wrote:
| That's because the robot has gone ultra instinct.
| williamcotton wrote:
 | If you monitor your own movements you'll find plenty of
 | sequential procedures. The big difference with how these
 | robot arms move is that they are firmly planted on a large
 | table, whereas your arms are attached to a self-balancing,
 | lightweight, gyrating torso.
| smusamashah wrote:
 | This is very exciting, but because it's from Google this
 | tech won't get out of their quarters.
| macromaniac wrote:
| It's impressive that transformers, diffusion, and human generated
| data can go so far in robotics. I would have expected simulation
| would be needed to achieve such results.
|
 | My fear is that we'll see a similar problem as with other
 | generative AI, in that it gets stuck in loops on complex
 | problems and is unable to correct itself, because the
 | training data covers the problem but not the failure modes.
| visarga wrote:
 | That's because most models have been trained on data created
 | by humans for humans; it needs data created by AI for itself.
 | Better to learn from your own mistakes than from the mistakes
 | of others; they are more efficient and informative.
|
 | When an AI is set up to learn from its own mistakes it might
 | turn out like AlphaZero, which rediscovered the strategy of
 | Go from scratch. LLMs are often incapable of solving complex
| tasks, but they are greatly helped by evolutionary algorithms.
| If you combine LLMs with EA you get black box optimization and
| intuition. It's all based on learning from the environment,
| interactivity & play. LLMs can provide the mutation operation,
| or function as judge to select surviving agents, or act as the
| agents themselves.
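 |
 | The LLM-plus-EA combination is easy to picture: the LLM
 | supplies mutation (and judging), the loop supplies selection.
 | A sketch, where the llm() function is a hypothetical stand-in
 | for any completion API:
 |
 |   import random
 |
 |   def llm(prompt: str) -> str:
 |       # Hypothetical stand-in for any chat-completion call.
 |       raise NotImplementedError
 |
 |   def evolve(task, seeds, generations=10, pop_size=8):
 |       pop = list(seeds)
 |       for _ in range(generations):
 |           # LLM as mutation operator: rewrite a random parent.
 |           children = [
 |               llm(f"Improve this solution to {task!r}:\n"
 |                   f"{random.choice(pop)}")
 |               for _ in range(pop_size)]
 |           # LLM as judge: score candidates, keep the fittest.
 |           scored = sorted(
 |               ((float(llm(f"Score 0-10 for {task!r}:\n{c}")), c)
 |                for c in pop + children),
 |               key=lambda s: s[0], reverse=True)
 |           pop = [c for _, c in scored[:pop_size]]
 |       return pop[0]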
| netcan wrote:
| Ooh! it can almost fold a shirt.
|
| Shade aside, robotics is so damned hard.
|
| The current under/over for godlike superintelligence before a
| robot that can make sandwiches and work the laundry machine... So
| unintuitive.
| nabla9 wrote:
 | Look at the grippers it has to work with.
 |
 | If you had to work with two chopsticks, or two spanners, as
 | hands, you would not do any better.
| moffkalast wrote:
| The neat part about working in robotics is that nobody can tell
| if you're a genius or a moron because neither of the two can
| get the damn thing working properly.
| RobotToaster wrote:
| I assume, being google, none of this is going to be open source?
| krasin wrote:
| It's already open-source; most of it anyway:
|
| 1. https://github.com/tonyzhaozh/aloha
|
| 2. https://aloha-2.github.io/
|
| 3. https://github.com/tonyzhaozh/aloha/tree/main/aloha2
| we_love_idf wrote:
 | Google's days are numbered. OpenAI showed that AI is about
 | delivering AGI, not playing board games and doing PR stunts.
 | Unfortunately, Google hasn't learned its lesson. It's still
 | doing PR stunts and people are falling for it.
| n0us wrote:
| Can't wait for someone to turn this into a product and make this
| available to the public!
| a_wild_dandan wrote:
| The Laundry Folding Helping Hands will sell so goddamn hard.
| When the tech gets there, I'll be first in line. I'll even buy
| the Vegetable Chopping DLC.
| quux wrote:
| Unexpected Ian Knot https://www.fieggen.com/shoelace/ianknot.htm
| bastawhiz wrote:
| This makes me wonder whether Google regrets spinning Boston
| Dynamics back off as its own entity.
| throwaway29303 wrote:
| https://www.theverge.com/2017/6/8/15766434/alphabet-google-b...
___________________________________________________________________
(page generated 2024-04-16 23:01 UTC)