[HN Gopher] Socratic Models - Composing Zero-Shot Multimodal Rea...
___________________________________________________________________
Socratic Models - Composing Zero-Shot Multimodal Reasoning with
Language
Author : parsadotsh
Score : 67 points
Date : 2022-04-10 16:51 UTC (6 hours ago)
(HTM) web link (socraticmodels.github.io)
(TXT) w3m dump (socraticmodels.github.io)
| mountainriver wrote:
| This is really awesome. Multimodal is definitely where
| transformers are headed, and it holds the promise of solving a
| lot of the grounding issues we see with the current SOTA.
| robbedpeter wrote:
| Elon's robots might actually work out, at least in software.
|
| This type of methodology, doing meta-cognitive programming by
| linking together different models, is awesome. They're
| constructing low-resolution imitations of brains: GPT-3 and
| BERT and the like, combined, can do things that no individual
| model can achieve. A predicate logic layer can document and
| explain decision history, and the other modules start to
| resemble something like the subconscious mind.
| nynx wrote:
| This is super impressive. Transformers have consistently done
| better than almost anyone thought.
|
| I still hold the opinion that we're going to need to move to
| spiking neural network (SNN) models in the future to keep
| growing the networks. Spiking networks require lots of storage
| but far less compute. They also propagate additional
| information in the _timing_ of the spikes, not just the values.
| There is a lot of low-hanging fruit in SNNs, and I think people
| are still trying to copy biological systems too much.
|
| Unfortunately, the main issue with SNNs is that no one has
| figured out how to train them as effectively as ANNs.
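|
| A minimal leaky integrate-and-fire sketch (the standard
| textbook SNN neuron; every constant below is made up) of how
| the timing of spikes, not just their count, carries
| information:
|
|   # Leaky integrate-and-fire: the membrane potential v leaks
|   # toward rest, integrates weighted input spikes, and fires
|   # when it crosses a threshold. Constants are illustrative.
|   def lif(input_spikes, weight=0.6, decay=0.9, threshold=1.0):
|       v, out_times = 0.0, []
|       for t, s in enumerate(input_spikes):
|           v = decay * v + weight * s  # leak, then integrate
|           if v >= threshold:          # fire and reset
|               out_times.append(t)
|               v = 0.0
|       return out_times
|
|   # Same number of input spikes, different timing, different
|   # output: a burst fires the neuron; the same spikes spread
|   # out leak away and never do.
|   print(lif([1, 1, 0, 0, 0, 0]))  # -> [1]
|   print(lif([1, 0, 0, 0, 0, 1]))  # -> []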
| vagabund wrote:
| The comments of every ML paper posted on this site are
| dominated by people either baselessly discounting the results
| as a party trick or illusion, or shoehorning in their
| conjecture about what approach the field is overlooking.
|
| As someone just trying to learn more about the implications of
| new research, I find myself resorting to /r/machinelearning, or
| even Twitter threads, to get timely and informed discussion.
| That's a shame, given what HN sets out to be.
| ceeplusplus wrote:
| As a community grows, it attracts people who don't have the
| same background that drew the original members of the
| community together, so it becomes inevitable to see this
| kind of layman commentary. I've seen it happen to r/hardware,
| which has been taken over by gamers with no CS background and
| AMD shareholders when it used to have a lot of knowledgeable
| people commenting.
| nynx wrote:
| I don't claim to be an expert, but I actually do
| undergraduate neuromorphic computing research. So, I don't
| know much, but I do know a little about what I'm talking
| about.
| mountainriver wrote:
| As an ML engineer, I found the comment insightful. I agree HN
| takes a critical approach to ML, but that's largely because
| there's been so much snake oil around it.
| nynx wrote:
| I'm certainly not discounting the results, and I don't see
| anything wrong with suggesting what I think would be a
| promising path to look at in the future.
| vagabund wrote:
| It's not wrong per se, and I'm obviously in no place to
| police the discussion, but it's only tangentially related
| to the post and often crowds out what would be a more
| pointed deliberation over this research.
|
| Maybe I'm expecting too much of HN, but I've seen the same
| two top-level comments under myriad ML posts.
|
| Sorry for the meta-discussion that's gotten us further away
| from this really remarkable paper.
| nynx wrote:
| Point taken, I do agree with you that it's probably best
| to stay on topic in these kinds of posts.
| gwern wrote:
| Don't forget /r/mlscaling!
| derefr wrote:
| > a lot of storage
|
| Is this fundamental, or just a problem with mapping these
| models to our current serially-bottlenecked compute
| architectures? Could a move to "hyperconverged infrastructure
| in-the-small" -- striping DRAM or NVMe and tiny RISC cores
| together on a die, where each CPU gets its own storage (or, you
| might say, where each small cluster of storage cells has its
| own tiny CPU attached), such that one stick has millions of
| independent+concurrent [+slow+memory-constrained] processors --
| resolve these difficulties?
| nynx wrote:
| They require roughly the same amount of storage as modern
| ANNs, except that the neurons and synapses may carry some
| additional state that needs to be stored. It's only relative
| to their compute needs, which are far smaller than those of
| large-scale ANNs, that the storage looks like a lot.
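|
| A back-of-the-envelope illustration (all counts and byte
| sizes below are made up, just to show the shape of the
| comparison):
|
|   # Purely illustrative sizes, not from any real network.
|   n_synapses = 1_000_000
|   n_neurons = 10_000
|
|   ann_bytes = n_synapses * 4    # one fp32 weight per synapse
|   snn_bytes = (n_synapses * 4   # the same weights, plus...
|                + n_neurons * 8) # per-neuron state, e.g. a
|                                 # membrane potential and a
|                                 # refractory timer
|   print(ann_bytes, snn_bytes)   # 4000000 4080000: comparable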
| arjvik wrote:
| We've come to the consensus that large language models are just
| stochastic parrots... What makes us think we can achieve a
| higher level of intelligence by putting them in conversation?
|
| I think the next step in NLP will be a drastic departure from
| today's learning paradigm.
| gjm11 wrote:
| "Stochastic parrots" -- have you seen, e.g., the examples in
| the PaLM paper of how it does on "chained inference" tasks? I
| don't see how you can classify that as mere parroting.
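|
| (For readers unfamiliar with the term: in a chained-inference,
| or chain-of-thought, prompt, the model is asked to write out
| intermediate reasoning steps before giving its answer. An
| illustrative, made-up exchange, not quoted from the paper:
|
|   Q: A shop had 23 apples, sold 20, and then bought 6 more.
|      How many apples does it have now?
|   A: It started with 23. 23 - 20 = 3. 3 + 6 = 9. The answer
|      is 9.)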
| robbedpeter wrote:
| There is no such consensus. Transformers navigate problem
| spaces with various mechanisms that include recursion, and
| multi-pass inference means the depth can be arbitrary. This
| means that models pick up on the functions that generate
| answers, not the simple statistical relationships you see in
| Markov chains.
|
| "Stochastic parrot" is a derogatory term and I've never seen
| anyone who actually understands the technology use that phrase
| unironically. If anything, it's a shibboleth for bias or
| ignorance.
| mountainriver wrote:
| We have not come to that consensus, and large language models
| display really interesting capabilities like few-shot
| learning, which we previously thought would require a wildly
| different architecture.
| moconnor wrote:
| This is not the consensus among ML researchers. Transformers
| are showing strong generalisation[1] and their performance
| continues to surprise us as they scale[2].
|
| The Socratic Models paper is not about "higher intelligence";
| it's about demonstrating useful behaviour purely by connecting
| several large models via language (a minimal sketch of the
| idea follows the links below).
|
| [1] https://arxiv.org/abs/2201.02177
|
| [2] https://arxiv.org/abs/2204.02311
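|
| A hypothetical sketch of that composition (the function names,
| prompts, and stub models below are invented for illustration;
| the paper wires up models like CLIP and GPT-3, but not through
| this exact API):
|
|   # Socratic Models, minimally: one model turns a non-language
|   # input into words, another model reasons over those words.
|   def answer_about_image(image, question, caption_image,
|                          complete_text):
|       caption = caption_image(image)   # VLM: image -> words
|       prompt = (f"Scene: {caption}\n"  # LM reasons over words
|                 f"Q: {question}\n"
|                 "A:")
|       return complete_text(prompt)
|
|   # Stub models so the sketch runs standalone:
|   fake_vlm = lambda img: "a person placing a mug on a counter"
|   fake_lm = lambda p: " Probably tidying up after coffee."
|   print(answer_about_image(None, "What is the person doing?",
|                            fake_vlm, fake_lm))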
| exdsq wrote:
| I asked something similar previously on HN, and a researcher in
| the field said that scaling size/computation actually does keep
| showing significant improvements.
___________________________________________________________________
(page generated 2022-04-10 23:00 UTC)