       [HN Gopher] How can we make robotics more like generative modeling?
       ___________________________________________________________________
        
       Author : ericjang
       Score  : 39 points
       Date   : 2022-07-30 16:16 UTC (6 hours ago)
        
 (HTM) web link (evjang.com)
 (TXT) w3m dump (evjang.com)
        
       | ilaksh wrote:
       | It seems like these things could perform much better if the
       | modeling was deliberately decomposed more and there was less
       | emphasis on doing everything in one parallel step.
       | 
       | For example, translating the visual sampling into a 3d model
       | first. Or maybe some neural representations that can generate the
       | 3d models. Then train the movement on that rather than raw
       | pixels.
       | 
       | Similarly, for textual prompts of interactions, first create a
       | model that relates the word embedding to the same 3d modeling and
       | physics interactions.
       | 
       | Obviously much easier said than done.
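
A minimal sketch of the decomposition suggested above, assuming a depth image as input and a set of occupied voxels as the 3D representation. `Scene3D`, `perceive`, and `policy` are hypothetical names, and both stages are hand-written stand-ins for what would really be learned models. The only point is the interface: the movement stage sees the 3D scene and never the raw pixels.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Voxel = Tuple[int, int, int]


@dataclass
class Scene3D:
    """Hypothetical intermediate representation: a set of occupied voxels."""
    occupied: Set[Voxel]


def perceive(depth_map: List[List[float]], voxel_size: float = 0.1) -> Scene3D:
    """Stage 1: turn a depth image into occupied voxels (stand-in for a learned model)."""
    occupied = set()
    for row, line in enumerate(depth_map):
        for col, depth in enumerate(line):
            if depth > 0:  # assumption: 0 means no depth reading for this pixel
                occupied.add((col, row, round(depth / voxel_size)))
    return Scene3D(occupied)


def policy(scene: Scene3D, gripper: Voxel) -> Voxel:
    """Stage 2: choose a motion from the 3D scene only; it never sees pixels."""
    if not scene.occupied:
        return (0, 0, 0)
    # Pick the occupied voxel closest to the gripper (squared Euclidean distance).
    target = min(
        scene.occupied,
        key=lambda v: sum((a - b) ** 2 for a, b in zip(v, gripper)),
    )
    # Step one voxel toward the target along each axis (-1, 0, or +1).
    return tuple((t > g) - (t < g) for t, g in zip(target, gripper))


depth_map = [[0.0, 0.5], [1.0, 0.0]]
scene = perceive(depth_map)
step = policy(scene, gripper=(0, 0, 0))
print(scene.occupied, step)
```

Even in this toy, the choice of voxel size and of what counts as "occupied" is baked into the interface between the two stages.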
        
         | lostdog wrote:
          | In industry you do see some decomposition into steps to solve
          | problems, and there are lots of research papers that decompose
          | too. It does work to solve problems, but decomposition also has
          | major weaknesses.
         | 
          | First, you have to come up with a good intermediate
          | representation, and that's pretty difficult. Your suggestion of
          | a 3d model is good, but it comes with a lot of complex design
          | choices. Should you use a mesh or voxel representation? What
          | resolution will work? How do you train the upstream model? As
          | your problem gets more and more complex, so does your
          | intermediate representation, and the engineering to get the
          | intermediate representation working correctly becomes
          | prohibitively difficult.
         | 
         | But we're not done yet! So now you've got an intermediate
         | representation, a net that produces it, and a net that consumes
         | it. Maybe your current results aren't good enough to publish,
         | so you spend some time optimizing the upstream model. You take
         | a few weeks and make massive improvements on the accuracy.
         | Great! Right? But then you plug your improved upstream model
         | into the system and get no change in overall performance. Turns
         | out that your intermediate output got better, but in a way that
         | doesn't matter to the task. Now you get to spend a month
         | guessing how to tune your intermediate loss so a trained
         | upstream model does improve end-to-end performance.
         | 
         | So yeah, decomposing the problem is often what you need to do
         | in practice, but there's a reason that many researchers are
          | trying to work on more scalable end-to-end approaches. It's
          | very, very difficult to answer both "What intermediate
         | representation carries the information to perform this task,"
         | and "Which aspect of the intermediate representation is most
         | important to optimize?"
         | 
         | It's especially difficult in the area the blog post is about,
         | "unstructured robotics." You can't focus your representation on
         | certain types of objects. The representation needs to somehow
         | capture disparate things like "The drill bit must go near the
         | screw," "This is the squishy part," and "This is attached with
         | a hinge." Now you need to program some type of model that can
         | describe everything possible in the world, and no amount of OOP
         | courses will help you.
         | 
         | There is a 3rd way, which doesn't work quite yet: Do
         | unsupervised/self-supervised training of the world model, with
         | a representation that's just a bag of floats. Maybe you can
         | learn a world model that's informative enough that you can plug
         | any sort of goal in and get good results. It's still unclear
         | whether this will beat out an end-to-end system, so there's
         | more research to be done.
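
A toy illustration of the failure mode described in the comment above, where the upstream model gets much better on its own loss but end-to-end performance does not move. Everything here is hypothetical: the intermediate representation is a pair `(x, y)`, the downstream task is assumed to depend only on `x`, and the error values are made up so that the "improved" model fixes only `y`.

```python
# Ground-truth scene states (x, y). Assumption: the task only needs x.
truths = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

# Per-scene error on x, identical for both upstream versions (made-up values).
X_ERRORS = [0.1, 0.6, 0.2, 0.7]


def make_preds(y_error):
    """Simulate an upstream model with fixed x errors and a given y error."""
    return [(x + ex, y + y_error) for (x, y), ex in zip(truths, X_ERRORS)]


def intermediate_mse(preds):
    """Mean squared error over every coordinate of the representation."""
    total, count = 0.0, 0
    for pred, truth in zip(preds, truths):
        for p, t in zip(pred, truth):
            total += (p - t) ** 2
            count += 1
    return total / count


def task_success_rate(preds, tolerance=0.5):
    """Fraction of scenes where the estimated x is close enough to act on."""
    hits = sum(1 for p, t in zip(preds, truths) if abs(p[0] - t[0]) <= tolerance)
    return hits / len(truths)


preds_v1 = make_preds(y_error=2.0)  # original upstream model
preds_v2 = make_preds(y_error=0.1)  # "improved" model: only y got better

mse_v1, mse_v2 = intermediate_mse(preds_v1), intermediate_mse(preds_v2)
success_v1, success_v2 = task_success_rate(preds_v1), task_success_rate(preds_v2)

print(f"intermediate MSE: v1={mse_v1:.4f}  v2={mse_v2:.4f}")
print(f"task success:     v1={success_v1:.2f}  v2={success_v2:.2f}")
```

The intermediate loss weights `x` and `y` equally, so it rewards the `y` fix even though the task never reads `y`. Reweighting the loss toward `x` would align the two, but in a real system you have to guess which part of the representation plays the role of `x`.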
        
       ___________________________________________________________________
       (page generated 2022-07-30 23:01 UTC)