[HN Gopher] Sequential modeling enables scalable learning for l...
___________________________________________________________________
Sequential modeling enables scalable learning for large vision
models
Author : og_kalu
Score : 97 points
Date : 2023-12-05 14:14 UTC (8 hours ago)
(HTM) web link (yutongbai.com)
(TXT) w3m dump (yutongbai.com)
| og_kalu wrote:
| I'd simply been thinking about Large Vision Models in an
| annotation sense: Q&A, captioning... that sort of thing.
|
| Even though it makes so much sense, I never thought about it like
| this. Inpainting, Object Detection, Rotation, Lighting,
| Segmentation, Edge Detection, Pose Estimation, Surface Normal
| Estimation, Colorization and much more, all achieved by a single
| model.
|
| I believe this and CoDi-2 (https://codi-2.github.io/) offer a
| glimpse of the future of Large Multimodal Models.
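|
| What makes it work, as far as I can tell (a sketch of my own,
| not the authors' code): every image is mapped to discrete tokens
| by a VQ encoder, task examples are concatenated into one "visual
| sentence", and a plain causal transformer just predicts the next
| token. Roughly:
|
|     import torch
|     import torch.nn as nn
|
|     VOCAB = 8192              # assumed VQ codebook size
|     TOKENS_PER_IMAGE = 256    # e.g. a 16x16 grid of codes
|
|     # A "visual sentence": (image, annotation) prompt pairs,
|     # then a query image; the model continues the sequence,
|     # i.e. emits the annotation for the query.
|     def visual_sentence(pairs, query):
|         parts = [t for img, ann in pairs for t in (img, ann)]
|         return torch.cat(parts + [query])
|
|     class TinyLVM(nn.Module):
|         def __init__(self, d=256, heads=4, layers=4):
|             super().__init__()
|             self.emb = nn.Embedding(VOCAB, d)
|             self.pos = nn.Embedding(4096, d)  # learned positions
|             layer = nn.TransformerEncoderLayer(
|                 d, heads, batch_first=True)
|             self.tf = nn.TransformerEncoder(layer, layers)
|             self.head = nn.Linear(d, VOCAB)
|
|         def forward(self, tokens):            # tokens: (B, T)
|             T = tokens.size(1)
|             pos = torch.arange(T, device=tokens.device)
|             mask = nn.Transformer.generate_square_subsequent_mask(T)
|             h = self.tf(self.emb(tokens) + self.pos(pos), mask=mask)
|             return self.head(h)               # next-token logits
|
|     seq = visual_sentence(
|         pairs=[(torch.randint(0, VOCAB, (TOKENS_PER_IMAGE,)),
|                 torch.randint(0, VOCAB, (TOKENS_PER_IMAGE,)))],
|         query=torch.randint(0, VOCAB, (TOKENS_PER_IMAGE,)))
|     logits = TinyLVM()(seq.unsqueeze(0))      # (1, 768, VOCAB)
|
| Swap what the annotation images depict (masks, edges, poses,
| colorized frames) and the same model covers all of those tasks.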
| kaibee wrote:
| So this is artificial general intelligence, right?
| ben_w wrote:
| The problem with the phrase "artificial general intelligence"
| is that everyone is arguing about the definition of all three
| words, and has a different threshold for the boolean
| pass/fail boundary.
| og_kalu wrote:
| General Intelligence is a gradient, not a hard on/off. Obviously
| these are machines, so: artificial. They're certainly not narrow
| in scope or abilities, so: general. They perform tasks we
| consider intelligent. So... sure. Like ENIAC, I imagine we'll
| build (or have built) AGI well before everyone can agree it is
| so.
| BoiledCabbage wrote:
| One of the things sci-fi really seemed to get right is that we
| will have AGI _long_ before we'll agree that it is actually AGI.
|
| People will keep finding some small case or reason not to call
| it AGI. And then finally, once that last case is knocked down
| and we have agreement on a definition, we'll realize we crossed
| that threshold a "long" while back.
|
| And I'm not saying we have AGI now, just that it's now clear to
| me how this process will play out.
|
| (Where "long" on AI development timelines probably doesn't mean
| the same thing "long" meant even in the 2010s.)
| mickdarling wrote:
| Has anyone tried using transformers on weather forecasts yet?
| ben_w wrote:
| Yes, several.
|
| * https://arxiv.org/pdf/2106.14742.pdf
|
| *
| https://rmets.onlinelibrary.wiley.com/doi/full/10.1002/met.2...
|
| * https://ieeexplore.ieee.org/document/9671442
|
| Also non-transformer models, because of the scaling complexity
| on input: https://arxiv.org/pdf/2212.12794.pdf
| toxik wrote:
| Would have been neat to see some animations, since in many cases
| the sequences are video frames.
| mpeg wrote:
| The "add a frame to a video" use-cases are probably the least
| exciting here, the image annotation capabilities seem to me the
| bigger deal.
| cs702 wrote:
| Upon reading this, my immediate thought is:
|
| It's only a matter of time before we have robots powered by large
| models pretrained to "predict the next token" across a bunch of
| different _sensory modalities_ -- sight, sound, smell, touch,
| taste, etc. in a variety of artificial and natural settings,
| including social-interaction settings. Learning to read, learning
| to talk, learning to interact with the physical world, and so on
| -- all of it could very well be built upon the simple idea of
| learning to "predict the next token."
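|
| A toy sketch of what I mean, with stand-in tokenizers (the
| modality names and vocabulary offsets here are made up): give
| each modality its own slice of a shared vocabulary, then
| interleave every sensor stream into one sequence for a single
| next-token predictor.
|
|     # Offsets carve a shared vocabulary into per-modality slices.
|     OFFSETS = {"vision": 0, "audio": 10_000, "touch": 20_000}
|
|     def tokenize(modality, reading):
|         # Stand-in tokenizer; in reality a VQ-VAE, audio codec...
|         base = OFFSETS[modality]
|         return [base + hash((modality, v)) % 10_000 for v in reading]
|
|     def interleave(events):
|         # events: (timestamp, modality, reading) tuples
|         tokens = []
|         for _, modality, reading in sorted(events):
|             tokens += tokenize(modality, reading)
|         return tokens   # one stream; train next-token prediction
|
|     events = [(0.0, "vision", [3, 7]),
|               (0.1, "audio", [42]),
|               (0.2, "touch", [1])]
|     print(interleave(events))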
|
| We live in interesting times.
| catchnear4321 wrote:
| across a bunch of different gradients. senses will be the next
| step, humans grok that good and easy. once multiple gradients
| are considered, non-sensory gradients are going to be next.
|
| this is all a bunch of gobbledygook until it isn't.
| raidicy wrote:
| These sentiments are pretty close to my own. I read a paper[0]
| that claimed that LLMs are General Pattern Machines and could be
| used to complete small games in gym environments. It seems to me
| that if these things really are General Pattern Machines, all we
| have to do is figure out a way to represent any data as a
| pattern and predict the next step in that pattern, right?
|
| The multi_token[1] project, which lets you turn any type of data
| into tokens, is pretty interesting and seems to be going in this
| direction.
|
| I would really like to see a framework where you can take any
| modality, turn it into a series of tokens, and just cram it into
| a language model, effectively turning it into a multimodal model
| with almost no effort.
|
| [0] https://general-pattern-machines.github.io/
| [1] https://github.com/sshh12/multi_token
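|
| The serialization half of that is almost trivial to prototype.
| Something like this, where llm_complete is a stand-in for
| whatever model API you use:
|
|     # Treat an LLM as a pattern machine: flatten observations
|     # into text and ask the model to continue the sequence.
|     def serialize(grid):
|         return " ".join(str(c) for row in grid for c in row)
|
|     def make_prompt(history):
|         return " / ".join(serialize(obs) for obs in history) + " / "
|
|     frames = [[[0, 1], [1, 0]],
|               [[1, 0], [0, 1]],
|               [[0, 1], [1, 0]]]
|     prompt = make_prompt(frames)
|     # next_frame = llm_complete(prompt)  # any LLM API here
|     print(prompt)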
| kelseyfrog wrote:
| If they're looking for a name, The Glass Bead Game isn't a
| bad start.
| og_kalu wrote:
| Oh yeah. The exciting thing is that this is pretty low-hanging
| fruit (at least for the common modalities).
|
| What would a Large Language Model that can manipulate audio-
| visual data as expertly as it can manipulate text look like?
| This is beyond just Text to Speech or Captioning and Image Q&A.
| I think we'll find out very soon.
| gwern wrote:
| That pretty much already exists. Look at DeepMind's Gato: all
| tasks and modalities are simply sequences of tokens, everything
| from 'predict English text' to 'predict VAE image token
| sequences' to 'predict robotic arm commands and movements IRL'.
| jerpint wrote:
| Gato was heavily biased towards tasks in simple simulators and
| didn't exhibit emergent behaviours.
| cs702 wrote:
| Ah, yes, I'd forgotten about Gato. Thank you for reminding me.
| There's so much research activity that the Gato paper feels as
| if it was published _eons_ ago. There's only so much I can
| retain in my puny little human mind at once!
|
| In any case, I'm not sure Gato qualifies as a "large" model at
| 1.2B parameters -- it's kinda right below the threshold at which
| it could or would start exhibiting emergent behaviors. Maybe a
| new Gato with tens or hundreds of billions of parameters
| operating in the physical world?
| gwern wrote:
| Yes. Gato was a good proof of concept that the Decision
| Transformer approach of 'just model literally everything as a
| sequence' scales well, doesn't exhibit some sort of catastrophic
| interference, and can successfully imitation-learn from all the
| expert datasets, with a bit of transfer. But they need to push
| it at least another OOM or 2 to show major transfer and some
| emergence, and ideally do both from-scratch learning and
| additional learning on many new tasks. We continue to wait. :(
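|
| To spell out that flattening for anyone who hasn't seen it:
| each timestep becomes a (return-to-go, state, action) triple in
| one long sequence, so "act well" reduces to "predict the next
| action token given a high target return". A rough sketch of the
| layout, not DeepMind's code:
|
|     def returns_to_go(rewards):
|         # R_t = sum of rewards from step t to the episode's end
|         out, total = [], 0.0
|         for r in reversed(rewards):
|             total += r
|             out.append(total)
|         return out[::-1]
|
|     def flatten_episode(states, actions, rewards):
|         # Interleave (return-to-go, state, action) per timestep;
|         # a causal transformer learns to predict each action.
|         seq = []
|         for R, s, a in zip(returns_to_go(rewards), states, actions):
|             seq += [("R", R), ("s", s), ("a", a)]
|         return seq
|
|     print(flatten_episode([0, 1, 2], [1, 0, 1], [0.0, 0.0, 1.0]))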
|
| I hope it didn't all get rolled up into Gemini and become a
| state secret they'll never publish on again, or get lost in the
| shuffle of the DeepMind/Brain merger/liquidation.
| ldjkfkdsjnv wrote:
| Yeah, it's going to happen. We will see and speak with
| intelligent machines.
| jefft255 wrote:
| I did that a few years back: https://arxiv.org/abs/2011.11751
| cs702 wrote:
| With a _large_ model? How many parameters?
|
| See my other comment here:
|
| https://news.ycombinator.com/item?id=38536178
| dottedmag wrote:
| One can argue there are ~8 billion of these robots already
| roaming the Earth.
| ww520 wrote:
| Let me guess: someone will train an LLM on stock prices to
| predict the stock market. It might work about as well as humans
| have predicted the market.
| astrange wrote:
| Past performance is not a guarantee of future results.
| Kelkonosemmel wrote:
| Yep. I think a robot that can easily follow your commands should
| be doable in 10-20 years.
|
| I plan to buy a farm when I have the money, and while I want to
| do a lot of the hands-on renovation and sculpting (park, etc.)
| myself, I'm pretty sure that long term some type of robot should
| be good and affordable enough to take over when I'm too old.
| agarsev wrote:
| I'm defending my thesis next Tuesday, and the advances in AI
| over the last couple of years have already made most of it
| obsolete :/
|
| Anyway, I'm excited and looking forward to the code and models
| being released; hopefully I can use them for my research! I
| think it's easy to overlook how revolutionary the transformer
| "way" of doing things has been, and the fact that so many
| different tasks can be reformulated in a "language" way hints,
| I believe, at something deeper about how the universe, our minds
| and language work.
| bambax wrote:
| Do captchas have a future? It seems inevitable that AI will beat
| humans on captchas real soon (if not already). What's next?
| jksk61 wrote:
| I don't know about you, but I use ChatGPT to solve captchas
| because I can't.
| jazzyjackson wrote:
| I should patent this idea, but here it goes anyway: in the
| future, captchas will consist of requesting that you make an
| antisemitic or misogynist remark to prove you are human, since
| the bots will be held to higher moral standards than man.
| cooper_ganglia wrote:
| This is a completely silly comment, but I will admit that it
| made me chuckle just because of how incredibly random and
| shocking it was, especially here on HN, lol
| PeterisP wrote:
| Obviously, not all bots will be held to such standards. Those
| are the standards that apply to western corporations offering
| free bots for PR and marketing purposes, since 'politically
| incorrect' behavior stunts that goal. Anyone running their own
| bot (e.g. a spammer wanting to solve millions of captchas) has
| no problem running an 'uncensored' bot: right now you can get
| reasonably large open source models without the RLHF fine-
| tuning, and thus with no attempt at 'moral standards' other
| than those _you_ choose to put in when finetuning the bot for
| your purposes.
|
| It's relatively trivial to flip the sign on that training data
| and have a bot that will instead refuse to make antifascist or
| feminist arguments; if someone wants a Hitlerbot to ghostwrite
| "My Struggle" for them, nothing prevents them from finetuning
| such a model from one of the publicly available ones. No one
| can enforce any 'moral standards' on the bots other than their
| creators.
| swfsql wrote:
| Considering this, the only remaining captcha will be cold, hard
| money. Paywalls.
| mola wrote:
| So this can solve visual analogies from IQ tests?
| jerpint wrote:
| I would have loved to see videos of the completions on the blog
| post.
| dwaltrip wrote:
| Hypothesis: Intelligence _is_ prediction?
| red75prime wrote:
| Yep, prediction of what needs to be done.
| iandanforth wrote:
| You might enjoy 'On Intelligence' by Jeff Hawkins to learn more
| about this hypothesis. (It's an older book / theory at this
| point but still worth reading IMHO)
| fancyfredbot wrote:
| Next step, accept user input from VR glasses and we've basically
| got a holodeck.
| iandanforth wrote:
| If this paper is coming out of BAIR with at most a 3B-parameter
| model, I suspect we'll quickly see much larger models from the
| industrial players. Hopefully Mistral takes an interest and
| releases an OSI-licensed model.
___________________________________________________________________
(page generated 2023-12-05 23:00 UTC)