[HN Gopher] Meta-Transformer: A unified framework for multimodal...
___________________________________________________________________
Meta-Transformer: A unified framework for multimodal learning
Author : ulrikhansen54
Score : 88 points
Date : 2023-07-24 17:33 UTC (5 hours ago)
(HTM) web link (kxgong.github.io)
(TXT) w3m dump (kxgong.github.io)
| ImHereToVote wrote:
| This seems like a step in a dangerous direction.
| sebzim4500 wrote:
| I am also concerned about existential threats from AI, but part
| of the problem is that I have no idea which research directions
| help and which ones hurt.
| faktory wrote:
| Why?
| FrustratedMonky wrote:
| Because up till now, many people who discount AI threats base
| that discount on a few assumptions like 'it's just a parrot',
| 'it doesn't have any drives', 'it doesn't really understand',
| 'it isn't conscious', etc... ad infinitum.
|
| But more and more different technologies are being plugged
| together to start resembling a brain: a visual cortex, a speech
| center, motor controls, etc...
|
| At some point the distinction between carbon-based life and
| silicon becomes meaningless. All the arguments or proofs that
| humans are conscious would equally prove AI is conscious. Or
| that neither truly is. Proving an AI is not conscious would
| also prove humans aren't.
|
| And of course, Terminators.
| valine wrote:
| It'll be ok. The technology for "dangerous" AI doesn't actually
| exist. The near term risks we face from AI are constrained to
| the realms of spam and privacy. World ending super-bots are
| science fiction.
| danielbln wrote:
| Before superintelligence scifi stuff we'll probably get some
| sort of superworm. Some rogue autonomous agent network that
| is improving itself via some framework like SKILL[1] going
| around 0-day'ing systems left and right and wreaking havoc.
|
| [1] https://arxiv.org/abs/2010.11944
| naasking wrote:
| WormGPT already exists. These will only become more
| dangerous as the tech evolves.
| flangola7 wrote:
| Blind denial. No argument or evidence presented, merely bold
| statements made with the expectation they be taken without
| question.
|
| Flying humans was science fiction 120 years ago. A single
| bomb able to destroy an entire city was science fiction 80
| years ago. A machine that can complete more mathematical
| calculations in one minute than all human manual computation
| in history was science fiction 60 years ago. EUV
| photolithography capable of creating molecule-sized
| transistors was science fiction 30 years ago. A computer that
| can create visual art and talk to you in plain English was
| science fiction 2 years ago. A computer that can clone your
| voice and mannerisms was science fiction 1 year ago.
|
| Science fiction has a way of becoming non-fiction, often
| within the span of a generation or less.
| naasking wrote:
| > It'll be ok. The technology for "dangerous" AI doesn't
| actually exist.
|
| Nobody's worried about the tech that exists.
|
| > The near term risks we face from AI are constrained to the
| realms of spam and privacy.
|
| Define "near term".
|
| > World ending super-bots are science fiction.
|
| Science fiction has become science fact before. Where's the
| knockdown argument that it won't happen in this case?
| valine wrote:
| It's not feasible to worry about the implications of every
| imaginary technology. Nuclear chain reactions were first
| theorized to exist a decade before the first bomb dropped.
| Should scientists have stopped exploring quantum mechanics
| in the 30s? Fear of the unknown shouldn't be allowed to
| stop scientific progress.
|
| We can deal with the implications of dangerous AI if and
| when it becomes a problem.
| [deleted]
| FrustratedMonky wrote:
| >> "The technology for "dangerous" AI doesn't actually exist"
|
| What? Did you not see the Netflix documentary on AI for
| military use? They literally have AIs that can beat fighter
| pilots in dogfighting.
|
| Just because it isn't walking around having coffee and
| chatting you up, doesn't mean it isn't already very advanced
| and deadly.
| valine wrote:
| Dog fighting AI isn't going to end the world. When people
| talk about the "risks" associated with AI they're talking
| about an AI that spirals out of control and destroys
| civilization. Something something infinite paper clip
| optimizer.
|
| It's scifi themed end-times cosplay.
| FrustratedMonky wrote:
| I get that.
|
| But the post was just saying it seems 'dangerous'. It is
| already 'dangerous'.
|
| Yes, it will probably become even 'more dangerous'.
|
| I'd disagree that many people agree on common definitions
| of risk. Some people think autonomous drones that can
| beat humans in a dogfight are already too far; others are
| holding out for paper clip optimizers before getting
| worried.
|
| You included 'world ending' in the definition of risk;
| others have a lower bar than that.
| kristjank wrote:
| Yo dawg, we heard you like transformers so we put transformers on
| your transformers so you can train while you train. The spider
| web graph shows metatransformers performing worse than their
| counterparts in almost all fields. Is there a reason I should not
| believe that an expert model will always outperform a general
| purpose one, even if it's a metatransformer?
| sebzim4500 wrote:
| >Is there a reason I should not believe that an expert model
| will always outperform a general purpose one, even if it's a
| metatransformer?
|
| If a general purpose model beats the specialized one, you could
| almost certainly distill the general purpose one into a better
| specialized one.
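|
| A minimal sketch of what that distillation could look like
| (PyTorch-style; the names here are hypothetical, not anything
| from the paper): the specialized student is trained to match
| the general model's softened outputs on the target modality.
|
|   import torch
|   import torch.nn.functional as F
|
|   def distill_step(teacher, student, batch, optimizer, T=2.0):
|       # Teacher: the large general-purpose model, frozen.
|       with torch.no_grad():
|           teacher_logits = teacher(batch)
|       student_logits = student(batch)
|       # Hinton-style KD: match the teacher's softened distribution.
|       loss = F.kl_div(
|           F.log_softmax(student_logits / T, dim=-1),
|           F.softmax(teacher_logits / T, dim=-1),
|           reduction="batchmean",
|       ) * (T * T)
|       optimizer.zero_grad()
|       loss.backward()
|       optimizer.step()
|       return loss.item()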
| throwawayadvsec wrote:
| I'm pretty sure it's a relatively small model?
|
| If you had the same quantity of text data as GPT-4 + comparable
| quantity of data for other domains, it could probably learn
| transferable skills across those domains.
|
| But it would take a huge amount of processing power that is
| probably not attainable today
| nh23423fefe wrote:
| Performance is bounded, and so outperformance will approach
| epsilon?
| danielbln wrote:
| I mean, there is a somewhat unique value proposition to a
| multimodal framework like this Meta-Transformer. Its goal isn't
| necessarily to beat expert models at their own game, but to
| provide a unified framework for processing diverse modalities
| of data.
|
| I think it aims to leverage the cross-modal relationships and
| unified learning, which might not be possible with expert
| models designed for only a single modality.
|
| Even if it performs slightly worse on some tasks, the ability
| to handle multiple modalities within a single framework is a
| pretty sweet advantage in scenarios where data from various
| sources needs to be processed simultaneously and patterns
| across modalities need to be captured somehow.
|
| A general-purpose model could also be a more cost-effective
| solution in some cases; ensembles of experts are difficult to
| scale and parallelize.
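|
| Very roughly, the architectural idea is something like this (a
| simplified PyTorch sketch, not the paper's actual code; module
| names and sizes are made up): per-modality tokenizers map
| everything into one token space, a shared encoder processes all
| of it, and small task heads are what you fine-tune.
|
|   import torch.nn as nn
|
|   class UnifiedMultimodalModel(nn.Module):
|       def __init__(self, dim=768, num_classes=1000):
|           super().__init__()
|           # One tokenizer per modality, projecting into a shared token space.
|           self.tokenizers = nn.ModuleDict({
|               "image": nn.Linear(16 * 16 * 3, dim),  # flattened image patches
|               "text": nn.Embedding(50000, dim),      # token ids
|               "audio": nn.Linear(128, dim),          # spectrogram frames
|           })
|           # Modality-agnostic transformer encoder, shared across everything.
|           layer = nn.TransformerEncoderLayer(d_model=dim, nhead=12,
|                                              batch_first=True)
|           self.encoder = nn.TransformerEncoder(layer, num_layers=12)
|           # A small per-task head is the only part fine-tuned per task.
|           self.cls_head = nn.Linear(dim, num_classes)
|
|       def forward(self, x, modality):
|           tokens = self.tokenizers[modality](x)       # (batch, seq, dim)
|           features = self.encoder(tokens)
|           return self.cls_head(features.mean(dim=1))  # pooled classification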
| bick_nyers wrote:
| Yo dawg, we just need to figure out what x converges to as you
| apply transformer() infinite times and then finally attention
| will no longer be all you need:
|
| transformer(transformer(transformer( ... x ... ))) = ?
| AndrewKemendo wrote:
| >an expert model will always outperform a general purpose one,
| even if it's a metatransformer
|
| It's an interesting question, as it raises questions of
| conceptual "boundaries."
|
| The sense-plan-do loop requires a search-and-filter step for
| task switching, assuming an agent can do more than one
| thing.
|
| So assume you have a robotic/autonomous agent that is a
| collection of systems (locomotion, dexterous gripper, visual
| perception, etc...), and that each system can be represented as
| an "expert module", say for example the dexterous manipulator.
| Then, so long as a discriminator can appropriately switch
| states using the sensor/system inputs, it's conceptually
| possible that there is a canonical "expert module" that
| everyone uses, and therefore "general purpose" would apply to
| the agent as a whole while "expert model" would apply to the
| dexterous manipulator.
|
| You can then walk that reasoning up the abstraction layers to
| conclude that (as usual with these turtle stacks) the
| distinctions arise as each subsystem/module specializes more
| granularly for the environment it operates in.
|
| I think it's probably forever and always true that any
| system designed to explore/exploit a bounded environment with
| comprehensive observations will always outperform a system
| that is required to adapt its sense-plan-do components to the
| bounded environment without similar observations.
|
| A system would either have to generate different observations
| than the native agent, or change the boundaries of the
| environment in a way that is unavailable to the native agent in
| order to outperform it.
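|
| As a toy illustration of that discriminator/expert-module split
| (everything here is hypothetical, just to make the switching
| concrete):
|
|   from typing import Callable, Dict
|
|   # Each "expert module" handles one bounded sub-problem.
|   def locomotion(obs: dict) -> str:
|       return f"walk towards {obs['goal']}"
|
|   def manipulation(obs: dict) -> str:
|       return f"grasp {obs['object']}"
|
|   EXPERTS: Dict[str, Callable[[dict], str]] = {
|       "locomotion": locomotion,
|       "manipulation": manipulation,
|   }
|
|   def discriminator(obs: dict) -> str:
|       # Stand-in for the state-switching logic: pick an expert
|       # based on what the sensors currently report.
|       return "manipulation" if obs.get("object_in_reach") else "locomotion"
|
|   def agent_step(obs: dict) -> str:
|       # The agent is "general purpose" as a whole, while each
|       # module stays an expert in its own bounded environment.
|       return EXPERTS[discriminator(obs)](obs)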
| dorkusBdork wrote:
| [dead]
| Oras wrote:
| According to the website, the model can then be fine-tuned for
| certain tasks such as image classification.
|
| 1. How does the multimodal training help here in improving the
| accuracy of image classification when training data is combined
| from text, images, and audio?
|
| 2. How about the speed? I would imagine a model trained on text,
| audio and image data would be larger than a text-only model?
| orwin wrote:
| Yeah, that's where I thought it would go shortly after I tried
| GPT-4 from OpenAI. We're clearly at the transformer limits imho
| (comparing the effectiveness of 3.5 and 4 with the number of
| parameters in each model is why I think we've reached a soft
| cap).
|
| So since it'll be hard to go deeper, going broader by interlacing
| different model types might be a way to pierce through.
| whimsicalism wrote:
| > We're clearly at the transformer limits imho
|
| GPT-4 did not scale up substantially in depth, going from 175B
| to 220B per transformer.
| CSMastermind wrote:
| Wouldn't making the model multimodal require scaling the
| models significantly?
|
| Or is the idea to keep the network the same size and trade
| off some of its nodes for image, video, etc. data?
|
| If so has anyone shown that doing so results in better
| overall performance?
|
| My lay observation is that GPT-4 seems to be on the border of
| usability for most applications, so if nothing is gained by
| simply changing the input data type, as opposed to expanding
| the model, then it feels like it won't be of much use yet.
|
| Also, apologies if I'm not making sense; I'm almost certainly
| not using the correct technical terms to articulate what I'm
| thinking.
| whimsicalism wrote:
| > Wouldn't making the model multimodal require scaling the
| models significantly?
|
| Just width if that makes sense. Basically, you add another
| encoder model but you are not actually increasing the depth
| that much.
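|
| A rough back-of-the-envelope of what "width, not depth" means
| here (the layer counts and dimensions below are placeholders,
| not GPT-4's real configuration):
|
|   # Adding a modality adds a parallel encoder tower; the number of
|   # layers a token passes through (depth) stays roughly the same.
|   def transformer_params(layers, d_model):
|       # ~12 * d_model^2 parameters per layer (attention + MLP, no biases)
|       return layers * 12 * d_model ** 2
|
|   text_encoder = transformer_params(layers=96, d_model=12288)
|   image_encoder = transformer_params(layers=48, d_model=1664)  # ViT-style tower
|
|   total = text_encoder + image_encoder
|   print(f"text-only: {text_encoder / 1e9:.0f}B params")
|   print(f"plus image encoder: {total / 1e9:.0f}B params, "
|         f"text depth still 96 layers")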
| FrustratedMonky wrote:
| Just a few more steps like this, put it in a robot body, and
| voila, we have the start of the first AI wars. How many
| centuries after this does the Butlerian Jihad start, led by
| John Connor, of course?
| ccheney wrote:
| We need to start ingesting raw scientific data through these
| models and see what it comes up with. What could these models
| identify by parsing through raw JWST or Hubble data? Or training
| against every published scientific paper? Is anyone doing this
| sort of thing already?
| danielbln wrote:
| Meta's Galactica was an attempt to train an LLM predominantly
| on scientific papers, articles and so on. It failed pretty
| spectacularly, but Galactica 2, if that's ever a thing, might
| rectify that.
| RC_ITR wrote:
| GP likely means training transformers on raw data (similar to
| protein folding transformers) to find patterns that humans
| cannot (due to lack of context, bias, or whatever).
|
| The problem with that assumption, though, is that transformers
| are good at identifying and replicating patterns given a set of
| rules (i.e. how proteins fold and misfold depending on the
| environment).
|
| Hubble data isn't so much "we know the rules but not their
| interactions" as much as "we don't really know the full set
| of rules," so that particular example probably wouldn't be
| that fruitful.
|
| In general, biology (where we understand the basic rules but
| not the complex ways they are combined) is the most fertile
| ground for transformer driven research.
___________________________________________________________________
(page generated 2023-07-24 23:01 UTC)