[HN Gopher] ReKep: Spatio-Temporal Reasoning of Relational Keypo...
___________________________________________________________________
ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints
for Robots
Author : jasondavies
Score : 39 points
Date : 2024-09-29 12:59 UTC (10 hours ago)
(HTM) web link (rekep-robot.github.io)
(TXT) w3m dump (rekep-robot.github.io)
| aithrowawaycomm wrote:
| I didn't think this GitHub Pages write-up was very clear, but the
| linked paper on arXiv is interesting (haven't finished reading
| yet!) and this is a cool project.
|
| Ultimately the weaknesses seem to come from "outsourcing" true
| spatio-mechanical reasoning to a language model which designs the
| corresponding constraints, but does so with the same kind of
| brittle reasoning and odd limitations we've come to expect. It's
| not really "artificial" spatial reasoning so much as "virtual":
| sometimes quite good, but paper-thin and largely based on
| memorized patterns. I think the authors overstated a few
| conclusions: e.g. the clothes folding doesn't appear to follow
| any strategy at all, let alone a "novel" strategy; whatever
| apparent hints of strategy the authors are seeing are probably
| better explained by the symmetry of human clothing, which the
| vision model picks up on.
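|
| To make that concrete: as far as I can tell from the paper, the
| "relational keypoint constraints" the language model writes are
| just short Python functions that map an array of tracked 3D
| keypoints to a scalar cost (zero when the constraint is
| satisfied). A rough sketch of what one might look like for the
| tea-pouring task (the keypoint indices and the 10 cm margin are
| my own illustration, not taken from the paper):
|
|     import numpy as np
|
|     # Hypothetical ReKep-style constraint: keep the spout keypoint
|     # directly above the mug-opening keypoint while pouring.
|     # `keypoints` is an (N, 3) array of 3D positions from the tracker.
|     def pour_alignment_cost(keypoints):
|         spout, mug_opening = keypoints[1], keypoints[2]
|         # penalize horizontal misalignment between spout and mug
|         horizontal_offset = np.linalg.norm(spout[:2] - mug_opening[:2])
|         # keep the spout roughly 10 cm above the rim (assumed value)
|         vertical_error = abs((spout[2] - mug_opening[2]) - 0.10)
|         return horizontal_offset + vertical_error  # 0 when satisfied
|
| As I read it, the solver then just has to drive costs like this
| toward zero over the trajectory, which is why the spatial
| "reasoning" lives almost entirely in whatever functions the LLM
| happens to write.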
|
| And note they didn't ask the robot to fold _messy_ clothing like
| a human does when it's fresh out of the dryer; I suspect the
| robot needs shirts and pants to be laid out neatly, otherwise the
| vision model will misidentify them.
|
| More generally, the authors did not do enough to stress-test the
| robot in situations that don't line up with the training data.
| It's cool to pour tea from a pot into a mug, but the vision model
| presumably has thousands of photos of this for the robot to
| imitate. What if you ask the robot to pour from a mug into an
| open teapot? Presumably the vision model itself is less adept
| with this prompt; maybe the robot will still work, since it's a
| simple task.
|
| But experience with ANNs suggests it's likely to falter in these
| off-the-golden-path cases, and that it'll falter in ways that are
| bizarre and unpredictable. I would have liked to see more
| comprehensive stress testing before using fancy terms like
| "spatio-temporal reasoning." AI does not need more fancy tech
| demos driving unrealistic hype.
|
| Regardless, the results are very cool, and the underlying
| machinery is sophisticated without being too mysterious (once you
| accept the mysterious AI models it's based on...). I think the
| edge-case issue might hinder _industrial_ deployment in e.g. a
| factory, but I think robotics tinkerers and hobbyists would have
| a blast with these ideas, and people much cleverer than me could
| even make a real product.
| leetrout wrote:
| I work tangentially to robots making use of computer vision...
| we're standing on the shoulders of many giants with OpenCV and
| ROS (for all their warts).
|
| Being able to get reliable object detection without custom
| training weights seems like the next domino to fall and become
| ubiquitous. Based on playing around with A1111 and similar tools,
| I suspect we're headed to a place where small, reliable models
| will work well enough to accomplish your task of pouring from the
| mug back into the teapot.
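|
| Concretely, what I have in mind by "no custom training weights"
| is open-vocabulary detectors like OWL-ViT that you can already
| prompt with plain text instead of training per-class models. A
| rough sketch with the Hugging Face transformers API (the image
| path, prompts, and 0.2 threshold are placeholders of mine):
|
|     import torch
|     from PIL import Image
|     from transformers import OwlViTProcessor, OwlViTForObjectDetection
|
|     # Zero-shot, text-prompted detection: describe the objects
|     # instead of training custom weights for them.
|     processor = OwlViTProcessor.from_pretrained(
|         "google/owlvit-base-patch32")
|     model = OwlViTForObjectDetection.from_pretrained(
|         "google/owlvit-base-patch32")
|
|     image = Image.open("workbench.jpg")  # placeholder image path
|     prompts = [["a teapot", "a mug"]]
|     inputs = processor(text=prompts, images=image, return_tensors="pt")
|     with torch.no_grad():
|         outputs = model(**inputs)
|
|     sizes = torch.tensor([image.size[::-1]])  # (height, width)
|     dets = processor.post_process_object_detection(
|         outputs=outputs, threshold=0.2, target_sizes=sizes)[0]
|     for score, label, box in zip(dets["scores"], dets["labels"],
|                                  dets["boxes"]):
|         print(prompts[0][label.item()], round(score.item(), 3),
|               box.tolist())
|
| Smaller distilled variants of that kind of model are exactly
| what I'd bet on becoming ubiquitous.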
|
| I also find the procedural animation demos with UE 5[0]
| interesting, since what I see first-hand is a lot of "key frame"
| robotics programming for complex movements... combining all
| these concepts will lead to some unique solutions with a lot
| less hand-holding. Wonder how fast we'll get there...
|
| 0:
| https://www.linkedin.com/feed/update/urn:li:activity:7235116...
| CodeGroyper wrote:
| I love that Twitter is linked as tl;dr.
___________________________________________________________________
(page generated 2024-09-29 23:01 UTC)