[HN Gopher] ReKep: Spatio-Temporal Reasoning of Relational Keypo...
       ___________________________________________________________________
        
       ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints
       for Robots
        
       Author : jasondavies
       Score  : 39 points
       Date   : 2024-09-29 12:59 UTC (10 hours ago)
        
 (HTM) web link (rekep-robot.github.io)
 (TXT) w3m dump (rekep-robot.github.io)
        
       | aithrowawaycomm wrote:
       | I didn't think this GitHub Pages write-up was very clear, but the
       | linked paper on arXiv is interesting (haven't finished reading
       | yet!) and this is a cool project.
       | 
        | Ultimately the weaknesses seem to come from "outsourcing" true
        | spatio-mechanical reasoning to a language model, which designs
        | the corresponding constraints but does so with the same kind of
        | brittle reasoning and odd limitations we've come to expect.
        | It's not really "artificial" spatial reasoning so much as
        | "virtual": sometimes quite good, but paper-thin and largely
        | based on memorized patterns. I think the authors overstated a
        | few conclusions; e.g. the clothes folding doesn't appear to
        | follow any strategy at all, let alone a "novel" one - whatever
        | apparent hints of strategy the authors are seeing are probably
        | better explained by the symmetry of human clothing, which the
        | vision model picks up on.
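        | 
        | For concreteness: as I read the paper, the constraints the VLM
        | writes are small Python functions over an array of tracked
        | keypoints, returning a cost that should be <= 0 when the
        | constraint is satisfied. A minimal sketch of the kind of thing
        | it might emit (the function name, keypoint indices, and offsets
        | below are invented for illustration):
        | 
        |     import numpy as np
        | 
        |     def pour_subgoal_constraint(keypoints):
        |         """Satisfied (<= 0) when the teapot spout (keypoint 1)
        |         sits about 10 cm above the mug opening (keypoint 5)."""
        |         spout, mug_rim = keypoints[1], keypoints[5]
        |         target = mug_rim + np.array([0.0, 0.0, 0.10])
        |         return np.linalg.norm(spout - target) - 0.02  # 2 cm tol
        | 
        | The end-effector motion is then optimized against functions
        | like this, so any gap in the language model's spatial
        | "understanding" shows up directly as a badly specified cost.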
       | 
        | And note they didn't ask the robot to fold _messy_ clothing
        | like a human does when it's fresh out of the dryer; I suspect
        | the robot needs shirts and pants to be laid out neatly,
        | otherwise the vision model will misidentify them.
       | 
       | More generally, the authors did not do enough to stress-test the
       | robot in situations that don't line up with the training data.
       | It's cool to pour tea from a pot into a mug, but the vision model
       | presumably has thousands of photos of this for the robot to
       | imitate. What if you ask the robot to pour a mug into an open
       | teapot? Presumably the vision model itself is less adept with
        | this prompt; maybe the robot will still work, since it's a
        | simple task.
       | 
       | But experience with ANNs suggests it's likely to falter in these
       | off-the-golden-path cases, and that it'll falter in ways that are
       | bizarre and unpredictable. I would have liked to see more
       | comprehensive stress testing before using fancy terms like
       | "spatio-temporal reasoning." AI does not need more fancy tech
       | demos driving unrealistic hype.
       | 
        | Regardless, the results are very cool, and the underlying
        | machinery is sophisticated without being too mysterious (once
        | you accept the mysterious AI models it's based on...). I think
        | the edge-case issue might hinder _industrial_ deployment in
        | e.g. a factory, but robotics tinkerers and hobbyists would have
        | a blast with these ideas, and people much cleverer than me
        | could even make a real product.
        
         | leetrout wrote:
         | I work tangentially to robots making use of computer vision...
         | we're standing on the shoulders of many giants with OpenCV and
         | ROS (for all their warts).
         | 
          | Being able to get reliable object detection without training
          | custom weights seems like the next domino to fall and become
          | ubiquitous. Based on playing around with A111 and similar
          | tools, I suspect we're headed to a place where we can get
          | small, reliable models that will work well enough to
          | accomplish your task of pouring the tea back into the teapot.
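          | 
          | To make "without training custom weights" concrete: here's a
          | rough sketch of open-vocabulary detection as it looks today,
          | assuming Hugging Face's zero-shot object detection pipeline
          | with an off-the-shelf OWL-ViT checkpoint (the image path and
          | labels are just placeholders):
          | 
          |     from transformers import pipeline
          |     from PIL import Image
          | 
          |     # Off-the-shelf open-vocabulary detector: no custom
          |     # training, labels are given as free-form text.
          |     detector = pipeline(
          |         "zero-shot-object-detection",
          |         model="google/owlvit-base-patch32",
          |     )
          | 
          |     image = Image.open("scene.jpg")  # placeholder frame
          |     labels = ["teapot", "mug", "robot gripper"]
          |     for det in detector(image, candidate_labels=labels):
          |         print(det["label"], round(det["score"], 2), det["box"])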
         | 
          | I also find the procedural animation demos with UE 5[0]
          | interesting, since what I see first-hand is a lot of
          | "keyframe" robotics programming for complex movements...
          | combining all these concepts will lead to some unique
          | solutions with a lot less hand-holding. Wonder how fast we'll
          | get there...
         | 
         | 0:
         | https://www.linkedin.com/feed/update/urn:li:activity:7235116...
        
       | CodeGroyper wrote:
       | I love that Twitter is linked as tl;dr.
        
       ___________________________________________________________________
       (page generated 2024-09-29 23:01 UTC)