[HN Gopher] The convolution empire strikes back
___________________________________________________________________
The convolution empire strikes back
Author : che_shr_cat
Score : 63 points
Date : 2023-10-27 19:39 UTC (3 hours ago)
(HTM) web link (gonzoml.substack.com)
(TXT) w3m dump (gonzoml.substack.com)
| adamnemecek wrote:
| All machine learning is just convolution in the context of Hopf
| algebras.
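|
| For readers unfamiliar with the term, here is the textbook
| convolution product on a Hopf algebra (standard background, not
| specific to the claim above): for a Hopf algebra H with coproduct
| \Delta and any algebra A with multiplication m, two linear maps
| f, g : H -> A are convolved as
|
|     f * g := m \circ (f \otimes g) \circ \Delta,
|     (f * g)(x) = \sum f(x_{(1)}) \, g(x_{(2)})   (Sweedler notation).
|
| Under this product the antipode S is the convolution inverse of
| the identity map.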
| mensetmanusman wrote:
| Is this an intellectual leap aiming to make the field more
| cohesive, like the quest for unifying theories in physics?
| adamnemecek wrote:
| That is one of the goals, yes. In addition, it seems like you
| get neural architecture search (the architecture is optimized),
| faster training and inference, and interpretability. I'm working
| it out as we speak.
|
| Ironically, convolution provides some unification in physics
| too, e.g. renormalization is a convolution.
| dpflan wrote:
| Interesting, please do elaborate...
| cwillu wrote:
| I read it as a riff on "monads are just monoids in the
| category of endofunctors", but maybe that wasn't intended.
| uoaei wrote:
| It kind of is. The commenter has been working on this
| formalism for a year or more. I'm sure he will come by soon
| with a link to the Discord channel where he discusses it and
| finds collaborators.
| adamnemecek wrote:
| It has been less than 9 months. But yeah, there is a
| Discord if you want to follow progress:
| https://discord.cofunctional.ai.
| visarga wrote:
| My theory is that architecture doesn't matter - convolutional,
| transformer, or recurrent. As long as you can efficiently train
| models of the same size, what counts is the dataset.
|
| Similarly, humans achieve about the same results when they have
| the same training, with small variations. What matters is not
| the brain but the education they get.
|
| Of course I am exaggerating a bit; I'm just saying that there is
| a multitude of brain and neural-net architectures with similar
| abilities, and the differentiating factor is the data, not the
| model.
|
| For years we have seen hundreds of papers proposing sub-
| quadratic attention. They have all failed to gain traction; big
| labs still use an almost vanilla transformer. At some point a
| paper declared "mixing is all you need" (MLP-Mixer) as a
| replacement for "attention is all you need". Just mixing - the
| optimiser adapts to whatever it gets.
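|
| As a minimal sketch of what "just mixing" means in an MLP-Mixer-
| style block (assuming PyTorch; the hidden sizes are illustrative,
| not the paper's exact configuration):
|
|     import torch
|     import torch.nn as nn
|
|     class MixerBlock(nn.Module):
|         """One Mixer-style block: an MLP across tokens, then an
|         MLP across channels, each with a residual connection."""
|         def __init__(self, num_tokens, dim,
|                      token_hidden=256, channel_hidden=1024):
|             super().__init__()
|             self.norm1 = nn.LayerNorm(dim)
|             self.token_mlp = nn.Sequential(
|                 nn.Linear(num_tokens, token_hidden), nn.GELU(),
|                 nn.Linear(token_hidden, num_tokens))
|             self.norm2 = nn.LayerNorm(dim)
|             self.channel_mlp = nn.Sequential(
|                 nn.Linear(dim, channel_hidden), nn.GELU(),
|                 nn.Linear(channel_hidden, dim))
|
|         def forward(self, x):          # x: (batch, tokens, dim)
|             y = self.norm1(x).transpose(1, 2)
|             x = x + self.token_mlp(y).transpose(1, 2)  # token mixing
|             x = x + self.channel_mlp(self.norm2(x))    # channel mixing
|             return x
|
|     x = torch.randn(2, 196, 512)          # 14x14 patches, 512 dims
|     print(MixerBlock(196, 512)(x).shape)  # torch.Size([2, 196, 512])
|
| There is no attention anywhere in the block; the optimiser has to
| make do with plain matrix mixing.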
|
| If you think about it, maybe language creates a virtual layer
| where language operations are performed. And this works similarly
| in humans and AIs. That's why the architecture doesn't matter,
| because it is running the language-OS on top. Similarly for
| vision.
|
| I place 90% of the merit of AI on language and 10% on the model
| architecture. Finding intelligence was inevitable; it was hiding
| in language - that's how we get to be intelligent as well. A
| human raised without language ends up even worse off than a
| primitive human. Intelligence is encoded in software, not
| hardware. Our language software has more breadth and depth than
| any one of us can create or contain.
| kookamamie wrote:
| Dataset counts, but also the number of total parameters in the
| network, i.e. capacity.
| visarga wrote:
| Agreed, it's in my first sentence: "as long as you can
| efficiently train models of the same size, what counts is the
| dataset". But there are only a few useful sizes - 7, 13, 35, 70,
| 120B - because they are targeted at various families of GPUs. A
| 2T model that I can't run, or that is too expensive to use
| through APIs, is of no use. And it's not just dataset size: data
| quality and diversity matter just as much.
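|
| A rough back-of-envelope for why those sizes track GPU memory
| (a sketch; it counts only the weights at fp16 or 4-bit and
| ignores activations and KV cache):
|
|     def weight_gib(params_billion, bytes_per_param):
|         """GiB needed just to hold the weights."""
|         return params_billion * 1e9 * bytes_per_param / 2**30
|
|     for size in (7, 13, 35, 70, 120):
|         print(f"{size:>4}B: {weight_gib(size, 2):6.1f} GiB fp16, "
|               f"{weight_gib(size, 0.5):5.1f} GiB 4-bit")
|     # 7B is ~13 GiB at fp16 (a 24 GB card); 70B is ~130 GiB at
|     # fp16 (multi-GPU) but ~33 GiB at 4-bit (one 40/48 GB card).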
|
| I believe LLMs will train mostly on synthetic data engineered
| to have extreme diversity and very high quality. This kind of
| data confers 5x gains in efficiency as demonstrated by
| Microsoft in the Phi-1.5 paper.
| rdedev wrote:
| I wish someone would perform a large-scale experiment to
| evaluate all these alternate architectures. I kind of feel that
| they get drowned out by new SOTA results from OpenAI and others.
| What I'd like to see is something that checks whether emergent
| behaviors pop up with enough data and parameters.
|
| Maybe vision is special enough that convnets can approach
| transformer-level performance, or maybe that generalizes to any
| modality. I haven't read enough papers to know if someone has
| already done something like this, but everywhere I look on the
| application side of things, vanilla transformers seem to be
| dominating.
| gradascent wrote:
| This is great, but what is a possible use-case of these massive
| classifier models? I'm guessing they won't be running at the
| edge, which precludes them from real-time applications like self-
| driving cars, smartphones, or military systems. So then what?
| Facial recognition for police/governments, or targeted
| advertising based on your Instagram/Google photos? I'm genuinely
| curious.
| constantly wrote:
| Hard-to-classify items: subclasses of subclasses that have
| little to differentiate them, and possibly few pixels or lots of
| noise in the data.
| currymj wrote:
| 1) it's basic research, 2) you can always chop off the last
| layer and use the embeddings, which I guess might be useful for
| something
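|
| A minimal sketch of the "chop off the last layer" idea (assuming
| torchvision; resnet50 here is just a stand-in for whichever large
| classifier you actually have):
|
|     import torch
|     import torch.nn as nn
|     from torchvision import models
|
|     # Load a pretrained ImageNet classifier and swap its 1000-way
|     # head for the identity, leaving a feature extractor.
|     model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
|     model.fc = nn.Identity()
|     model.eval()
|
|     with torch.no_grad():
|         x = torch.randn(1, 3, 224, 224)  # a preprocessed image batch
|         embedding = model(x)             # shape: (1, 2048)
|     print(embedding.shape)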
| pjs_ wrote:
| https://external-preview.redd.it/du7KQXLvBmVqc5G0T3tIEbWsYn8...
___________________________________________________________________
(page generated 2023-10-27 23:00 UTC)