[HN Gopher] How can we make robotics more like generative modeling?
___________________________________________________________________
How can we make robotics more like generative modeling?
Author : ericjang
Score : 39 points
Date : 2022-07-30 16:16 UTC (6 hours ago)
(HTM) web link (evjang.com)
(TXT) w3m dump (evjang.com)
| ilaksh wrote:
| It seems like these things could perform much better if the
| modeling were deliberately decomposed more, with less emphasis
| on doing everything in one parallel step.
|
| For example, translating the visual input into a 3d model
| first, or maybe using neural representations that can generate
| the 3d models, and then training the movement on that rather
| than on raw pixels.
|
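| Concretely, something like this two-stage setup (a toy sketch;
| the names, shapes, and nets below are all made up for
| illustration):
| 
|   import torch
|   import torch.nn as nn
| 
|   # Stage 1: a perception net maps raw pixels to an explicit
|   # 3d scene representation (here, logits for a coarse
|   # occupancy voxel grid).
|   class PerceptionNet(nn.Module):
|       def __init__(self, grid=16):
|           super().__init__()
|           self.backbone = nn.Sequential(
|               nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
|               nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
|               nn.Flatten(),
|           )
|           self.head = nn.LazyLinear(grid ** 3)
|           self.grid = grid
| 
|       def forward(self, img):  # (B, 3, H, W)
|           g = self.grid
|           return self.head(self.backbone(img)).view(-1, g, g, g)
| 
|   # Stage 2: the policy consumes the 3d model, never the pixels.
|   class Policy(nn.Module):
|       def __init__(self, act_dim=7):
|           super().__init__()
|           self.net = nn.Sequential(
|               nn.Flatten(), nn.LazyLinear(256), nn.ReLU(),
|               nn.Linear(256, act_dim),
|           )
| 
|       def forward(self, voxels):
|           return self.net(voxels)
| 
|   img = torch.randn(1, 3, 64, 64)
|   action = Policy()(PerceptionNet()(img).sigmoid())
| 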
| Similarly, for textual prompts of interactions, first create a
| model that relates the word embedding to the same 3d modeling and
| physics interactions.
|
| Obviously much easier said than done.
| lostdog wrote:
| In industry you do see some decomposition into steps to solve
| problems, and there are lots of research papers that decompose
| too. It does work, but decomposition also has major weaknesses.
|
| First, you have to come up with a good intermediate
| representation, and that's pretty difficult. Your suggestion of
| a 3d model is good, but has a lot of complex design choices.
| Should you use a mesh or a voxel representation? What
| resolution will work? How do you train the upstream model? As
| your problem gets more complex, so does your intermediate
| representation, and the engineering to get it working correctly
| becomes prohibitively difficult.
|
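| Just to make the resolution question concrete, a toy
| voxelization step (illustrative only):
| 
|   import numpy as np
| 
|   def voxelize(points, res=32, bound=1.0):
|       """Map an (N, 3) point cloud in [-bound, bound]^3 to a
|       binary occupancy grid. `res` is exactly the contested
|       design choice: memory grows as res**3, and any detail
|       finer than 2 * bound / res is simply lost."""
|       idx = ((points + bound) / (2 * bound) * res).astype(int)
|       idx = idx.clip(0, res - 1)
|       grid = np.zeros((res, res, res), dtype=bool)
|       grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
|       return grid
| 
|   cloud = np.random.uniform(-1, 1, size=(1000, 3))
|   print(voxelize(cloud, res=16).mean())  # occupied fraction
| 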
| But we're not done yet! So now you've got an intermediate
| representation, a net that produces it, and a net that consumes
| it. Maybe your current results aren't good enough to publish,
| so you spend some time optimizing the upstream model. You take
| a few weeks and make massive improvements in accuracy.
| Great! Right? But then you plug your improved upstream model
| into the system and get no change in overall performance. Turns
| out that your intermediate output got better, but in a way that
| doesn't matter to the task. Now you get to spend a month
| guessing how to tune your intermediate loss so that a better
| upstream model actually improves end-to-end performance.
|
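| The failure mode in cartoon form (every piece below is a
| stand-in with made-up shapes):
| 
|   import torch
|   import torch.nn.functional as F
| 
|   perception = torch.nn.Linear(100, 64)  # pixels -> "3d" rep
|   policy = torch.nn.Linear(64, 7)        # rep -> action
|   img = torch.randn(8, 100)
|   voxel_labels = torch.rand(8, 64)
|   expert_action = torch.randn(8, 7)
| 
|   # Upstream is trained and judged on its own loss...
|   rep = perception(img)
|   voxel_loss = F.binary_cross_entropy_with_logits(rep, voxel_labels)
| 
|   # ...while the system is judged on the task loss.
|   task_loss = F.mse_loss(policy(rep.sigmoid()), expert_action)
| 
|   # Nothing couples the two: voxel_loss can keep falling
|   # (sharper walls, cleaner floors) while the few bits the
|   # policy actually needs (drill tip pose, screw location)
|   # stay wrong, so task_loss doesn't move.
| 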
| So yeah, decomposing the problem is often what you need to do
| in practice, but there's a reason that many researchers are
| trying to work on more scalable end-to-end approaches. It's
| very, very difficult to answer both "What intermediate
| representation carries the information to perform this task?"
| and "Which aspect of the intermediate representation is most
| important to optimize?"
|
| It's especially difficult in the area the blog post is about,
| "unstructured robotics." You can't focus your representation on
| certain types of objects. The representation needs to somehow
| capture disparate things like "The drill bit must go near the
| screw," "This is the squishy part," and "This is attached with
| a hinge." Now you need to program some type of model that can
| describe everything possible in the world, and no amount of OOP
| courses will help you.
|
| There is a third way, which doesn't quite work yet: do
| unsupervised/self-supervised training of the world model, with
| a representation that's just a bag of floats. Maybe you can
| learn a world model that's informative enough that you can plug
| any sort of goal in and get good results. It's still unclear
| whether this will beat out an end-to-end system, so there's
| more research to be done.
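| 
| A cartoon of that third option, with the representation
| literally just a vector of floats (everything below is a
| placeholder, not a recipe that works today):
| 
|   import torch
|   import torch.nn as nn
|   import torch.nn.functional as F
| 
|   enc = nn.Linear(100, 32)     # observation -> bag of floats
|   dyn = nn.Linear(32 + 7, 32)  # (latent, action) -> next latent
| 
|   def world_model_loss(obs, act, next_obs):
|       # Self-supervised: predict the next latent from (z, a).
|       # No task labels anywhere. (Stop-gradient on the target
|       # is a common, imperfect guard against latent collapse.)
|       z, z_next = enc(obs), enc(next_obs).detach()
|       return F.mse_loss(dyn(torch.cat([z, act], dim=-1)), z_next)
| 
|   def plan(obs, goal_obs, n=256):
|       # Plug in any goal at test time: pick the sampled action
|       # whose predicted latent lands closest to the goal's.
|       z, z_goal = enc(obs), enc(goal_obs)
|       acts = torch.randn(n, 7)
|       z_pred = dyn(torch.cat([z.expand(n, -1), acts], dim=-1))
|       return acts[(z_pred - z_goal).norm(dim=-1).argmin()]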
___________________________________________________________________
(page generated 2022-07-30 23:01 UTC)