[HN Gopher] Scaling Vision with Sparse Mixture of Experts
___________________________________________________________________
Scaling Vision with Sparse Mixture of Experts
Author : panarky
Score : 28 points
Date : 2022-01-13 18:54 UTC (4 hours ago)
(HTM) web link (ai.googleblog.com)
(TXT) w3m dump (ai.googleblog.com)
| fundamental wrote:
| Perhaps the described patch-based routing to experts isn't a
| problem in practice, but at first glance it does seem to discard
| more spatial information than you'd like, as well as introduce
| more image boundaries than would be ideal. You could argue that
| the former is a known issue with many DNN architectures, though
| if the intent is to enable larger-scale generalization, it seems
| like this paper might be trading away more information in the
| source material for speed than would be desired. AFAIK the
| shuffling would be less of an issue in textual models than in
| image-processing tasks. As for the boundaries, I guess there
| could be padding in play, though I suspect that the resulting
| network will have higher sensitivity to shifts of a few pixels
| up/down or left/right.
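|
| As a toy sanity check on that last worry, here is some
| illustrative numpy (sizes are made up, nothing here is from the
| paper): shifting by a whole patch merely permutes the patch set,
| while a 2-pixel shift changes the content of every patch.
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|     img = rng.normal(size=(64, 64))
|     P = 16  # patch size
|
|     def patchify(x, p):
|         h, w = x.shape
|         return (x.reshape(h // p, p, w // p, p)
|                  .swapaxes(1, 2)
|                  .reshape(-1, p * p))
|
|     base = patchify(img, P)
|
|     for shift in (P, 2):   # full-patch vs. 2-pixel shift
|         moved = patchify(np.roll(img, shift, axis=1), P)
|         # Does each shifted patch exactly equal *some*
|         # original patch?
|         same = sum(np.abs(base - row).sum(axis=1).min() == 0
|                    for row in moved)
|         print(f"shift={shift:2d}px: {same}/{len(moved)} "
|               f"patches unchanged")
|
| With shift=16 every patch survives intact (16/16); with shift=2
| none do (0/16), since each patch now straddles an old boundary.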
|
| Even with those issues I'd imagine there could be some nice
| benefits, and the authors are right (IMO) to lean on conditional
| execution and routing, since it lets the network specialize on a
| given subdomain while staying computationally efficient. We'll
| have to see where subsequent work takes this approach.
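|
| For anyone who wants the routing idea in concrete form, here's a
| minimal top-1 routing sketch in numpy. The shapes, the random
| ReLU-MLP experts, and the router below are illustrative
| assumptions on my part, not the paper's actual setup:
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|
|     num_patches, d_model = 196, 64  # e.g. 14x14 patch tokens
|     num_experts, d_ff = 4, 128
|
|     patches = rng.normal(size=(num_patches, d_model))
|     w_gate = rng.normal(size=(d_model, num_experts))  # router
|
|     # Each expert is a small 2-layer ReLU MLP (random weights
|     # here, just to make the sketch runnable).
|     experts = [(rng.normal(size=(d_model, d_ff)) * 0.1,
|                 rng.normal(size=(d_ff, d_model)) * 0.1)
|                for _ in range(num_experts)]
|
|     def softmax(x):
|         e = np.exp(x - x.max(axis=-1, keepdims=True))
|         return e / e.sum(axis=-1, keepdims=True)
|
|     # Router: softmax over experts per patch, keep only the
|     # top-1 expert; this is the conditional-execution part.
|     gate_probs = softmax(patches @ w_gate)
|     top1 = gate_probs.argmax(axis=-1)  # expert id per patch
|
|     out = np.zeros_like(patches)
|     for e, (w1, w2) in enumerate(experts):
|         mask = top1 == e
|         if mask.any():
|             h = np.maximum(patches[mask] @ w1, 0.0)
|             out[mask] = (h @ w2) * gate_probs[mask, e:e+1]
|
|     # How many patches each expert ended up handling:
|     print({e: int((top1 == e).sum())
|            for e in range(num_experts)})
|
| The efficiency win is that each patch pays for exactly one
| expert's FLOPs instead of all four, and scaling the output by
| the gate probability is what gives the router a gradient for the
| expert it picked.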
___________________________________________________________________
(page generated 2022-01-13 23:01 UTC)