[HN Gopher] 20B-parameter Alexa model sets new marks in few-shot...
___________________________________________________________________
20B-parameter Alexa model sets new marks in few-shot learning
Author : reckel
Score : 55 points
Date : 2022-08-02 18:52 UTC (4 hours ago)
(HTM) web link (www.amazon.science)
(TXT) w3m dump (www.amazon.science)
| zaroth wrote:
| 20 billion parameters and the UI for voice is still cringe level
| terrible.
|
| Or is it just me and I've turned into a get-off-my-lawn
| curmudgeon when it comes to audio interfaces?
|
| > _Find a reservation far from my work location in eight hours
| for 8 people at Union Auto Company._
|
| Said absolutely no one ever, right? I guess if this is what it's
| trained on it's no wonder.
| creeble wrote:
| _I_ can't parse that sentence.
| jonathankoren wrote:
| This is the type of thing you learn to say when you're dealing
| with a slot filling algorithm that allows for
| overspecification. By putting it in one utterance, you avoid it
| coming back and saying "Where do you want a reservation?"
| "When do you want your reservation?" "How many people are in
| your party?"
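|
| A toy sketch of the mechanic (hypothetical slot names, not
| Alexa's actual pipeline): the parser keeps whatever slots the
| utterance already filled and only prompts for the rest, so an
| overspecified one-liner skips every follow-up question.
|
| REQUIRED = ["place", "time", "party_size"]
| PROMPTS = {
|     "place": "Where do you want a reservation?",
|     "time": "When do you want your reservation?",
|     "party_size": "How many people are in your party?",
| }
|
| def fill_slots(parsed):
|     # keep slots parsed from the utterance; ask only for missing ones
|     slots = dict(parsed)
|     for name in REQUIRED:
|         if name not in slots:
|             slots[name] = input(PROMPTS[name] + " ")
|     return slots
|
| # every slot given up front -> zero follow-up questions
| fill_slots({"place": "Union Auto Company",
|             "time": "in eight hours",
|             "party_size": 8})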
| ctoth wrote:
| > We follow Hoffmann et al. (2022) and pre-train the model for
| roughly 1 Trillion tokens (longer than the 300B token updates of
| GPT-3).
|
| If I'm understanding the discussion of the Chinchilla paper
| correctly[0] then this should offer a significantly better boost
| than increasing the number of parameters would have. Also really
| cool that they make the model easy(ish) to run and play with!
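|
| Back-of-envelope, assuming the roughly 20-tokens-per-parameter
| rule of thumb people took from Chinchilla: a compute-optimal
| 20B model wants about 20e9 x 20 = 400B tokens, and they trained
| on ~1T, well past that point.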
|
| [0]:
| https://www.lesswrong.com/posts/6Fpvch8RR29qLEWNH/chinchilla...
| rafaelero wrote:
| Not sure how much the scaling laws apply here, since this is a
| seq-to-seq model instead of an autoregressive causal model.
| It's interesting to see AlexaTM performing better than GPT-3 on
| SuperGLUE and SQuADv2, but it fails at Chain of Thought
| prompting, which is a bummer. So, is that because it's a
| different architecture, or because it is positively leveraging
| multilingual tokens? I wish they had compared this architecture
| to a classic GPT-family model.
| mrlonglong wrote:
| Should we worry if it achieves sentience? Any reason why we
| shouldn't?
| transcriptase wrote:
| Question for those familiar with the backend of things like
| Alexa, Google Home, Siri:
|
| At what point can we say things like "turn off the bedroom light
| in 5 minutes" or "stop music and turn off all the lights"? Even
| something like "keep the lights on" in a motion-sensor system
| seems impossible. To me these feel like low-hanging fruit, and
| yet despite all the advances in machine learning, and these
| systems being around for the better part of a decade, anything
| but the simplest single-task, no-modifier command results in a
| "sorry... I didn't understand that" or completely
| unpredictable results. Is there something inherently difficult
| about these types of queries?
| runnerup wrote:
| Also, why doesn't Siri work at all in Honda Civics and Honda
| HR-Vs when connected to CarPlay and driving on the highway
| with no radio playing?
|
| Google Assistant works fine, for the most part anyway.
| jeffbee wrote:
| Weird. Works fine in an Insight, which is in all respects a
| Civic Hybrid.
| jeffbee wrote:
| My benchmark for Siri will be when it learns to do "Siri, wake
| me up an hour earlier tomorrow". What it currently does is set
| a new alarm for 1 hour in the future.
| alphabetting wrote:
| I have Google Home and the first one worked. The second did
| not, but I don't think I'd ever say that, because I just have
| routines where I say "I'm leaving" or "I'm going to bed" and
| everything shuts off at once.
| Closi wrote:
| These are all subtly different problems, I think, but in
| general most of these architectures currently assume there is
| a single intent.
|
| > stop music and turn off all the lights
|
| This is probably the easiest of the bunch because you are
| asking it to perform two distinct actions.
|
| > turn off the bedroom light in 5 minutes
|
| This is much more complex, because you are asking the
| application to set up some sort of workflow - after it
| understands what you want it to do, it then has to work out how
| to execute that, which will mean utilising the device APIs /
| services. This is a simple example, but there are lots of
| permutations of different actions here. For example, you might
| want to say "turn off the sound system once this song finishes
| playing", which assumes that the assistant can understand you
| want it to create a task waiting specifically for the trigger
| of a particular song finishing, and that it has the ability to
| set up that trigger.
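|
| To make the timer case concrete, a toy sketch (hypothetical
| device call, not any real assistant API) - the parse yields an
| action plus a time trigger, and the trigger has to outlive the
| conversation:
|
| import threading
|
| def turn_off(device):
|     # stand-in for a real smart-home API call
|     print(f"{device}: off")
|
| # "turn off the bedroom light in 5 minutes" -> deferred action
| threading.Timer(5 * 60, turn_off, args=["bedroom light"]).start()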
|
| > "keep the lights on" in a motion sensor system
|
| Now this is where the orchestration gets tricky -
|
| The assistant has to:
|
| * Work out that the lights are being affected by a motion
| sensor system, which is likely outside its own platform.
|
| * Work out that your intent is that you want the assistant to
| override that.
|
| * Understand how to connect to the platform in order to control
| it.
|
| * Work out what parameter it is supposed to alter to achieve
| this task.
|
| * Override the existing user's settings, and presumably
| reinstate them after some period of time.
| visarga wrote:
| Language models can generate code from text instructions; they
| just need a training set to learn the target APIs. I expect to
| see automated desktop operation (RPA) from text commands in
| the next couple of years, generalising access over human
| interfaces.
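|
| The training set for that can be as simple as paired examples,
| utterance in, API call out (the API names here are made up,
| just to show the shape):
|
| pairs = [
|     ("stop music and turn off all the lights",
|      "music.stop(); lights.all_off()"),
|     ("turn off the bedroom light in 5 minutes",
|      "schedule(300, lights.bedroom.off)"),
| ]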
|
| It's really a shame the good language models are not deployed
| as voice assistants. It would probably be expensive to offer
| and they don't have the scale necessary. Just to load one of
| these models you need a $100K computer.
| Closi wrote:
| It also depends on what the biggest priority is - from a
| market/customer-experience perspective, I would assume there
| is a bigger 'quick win' in becoming more reliable at
| single-intent actions than in pursuing highly complex
| multi-intent statements.
|
| >99% of commands will be single-intent, and they probably work
| 80% of the time at the moment, so getting those to 99% will
| have a much bigger short-term impact than focussing on solving
| the 1% (with the added benefit that once you have solved the
| first case of getting single-intent right all the time,
| solving the second, more complex class of queries will be
| easier, as you will have built a more robust base).
| omega3 wrote:
| > This is much more complex, because you are asking the
| application to set up some sort of workflow - after it
| understands what you want it to do, it then has to work out
| how to execute that, which will mean utilising the device
| APIs / services.
|
| I don't see the conceptual difference between the first and
| the second example. You're still executing two distinct
| actions, the first being waiting for x amount of time?
| liquidwax wrote:
| My guess is that it's a very different (and difficult) problem
| to generalize that way. Interpreting intent and taking action
| are different aspects. Someone needs to write code to call a
| vendor's API to execute those actions, and that's a super
| specialized task. The next step is probably instructing a
| Copilot-like tool to do it.
| csnweb wrote:
| > turn off the bedroom light in 5 minutes
|
| This actually works already with Siri (and, as mentioned in a
| sibling comment, with Google Home as well). I just tried it
| for fun a few days ago and was surprised that it actually
| worked.
| isatty wrote:
| It didn't work for me.
|
| "Turn off the lights in 5 minutes did but "turn off the
| floorstanding lamp in 5 minutes" did not
|
| Honestly that's more frustrating when it's not uniform and
| now I've to remember this weird behavior.
| jcoder wrote:
| It's sad that the only option is a voice assistant that must
| learn how to interpret my words through this slow, error-prone
| process. I would much rather have a pure speech-to-text option
| where I must learn the exact words to say to get a reliable
| result.
| kupopuffs wrote:
| Yeah. I wouldn't even mind learning weird syntax or grammar,
| like
|
| Thread.sleep(5 * 60 * 1000); light.off();
| xxpor wrote:
| As a complete outsider: has ML research just become a phallus
| measuring contest to see who can stuff the most parameters into a
| model? In other words, who can acquire the most Nvidia cards? The
| model size seems to always be the headline in stuff I see on HN.
| visarga wrote:
| This is a small model keeping up with the big guys. 20B
| parameters might fit on 2 beefy GPUs; that's a bargain
| compared to GPT-3.
| cuuupid wrote:
| +1, also this is a teacher model. The implications are huge
| here, as AWS will likely spin this into an offering like they
| did with their other AI products. Building a model downstream
| of GPT-3 is difficult and usually yields suboptimal results;
| however, 20B is small enough that it would be easy to
| fine-tune this on a smaller dataset for a specific task.
|
| You could then distill that model and end up with something
| that's a fraction of the size (6B parameters, for example,
| just under 1/3, would fit on commercial GPUs like 3090s).
| There are some interesting examples of this with smaller
| models like BERT/BART or PEGASUS in Huggingface Transformers'
| seq2seq distillation examples.
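|
| A minimal sketch of the distillation objective (generic
| teacher-student KL on softened logits, not the exact
| Huggingface recipe):
|
| import torch.nn.functional as F
|
| def distill_loss(student_logits, teacher_logits, T=2.0):
|     # match the student to the teacher's softened distribution
|     s = F.log_softmax(student_logits / T, dim=-1)
|     t = F.softmax(teacher_logits / T, dim=-1)
|     return F.kl_div(s, t, reduction="batchmean") * (T * T)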
| pjfin123 wrote:
| Yeah this is the opposite, they did impressively well with
| fewer parameters.
|
| In general, larger models and more data have been an effective
| strategy for getting better performance, but getting the right
| ratio is also important:
| https://www.deepmind.com/publications/an-empirical-
| analysis-...
| pjfin123 wrote:
| The most notable thing about this model is that it uses fewer
| parameters (20 billion) than many of the other LLMs, which
| makes it less resource-intensive to train and easier to run.
|
| They also use an encoder-decoder architecture, which is common
| for machine translation, unlike most large language models,
| which are decoder-only.
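|
| The split shows up directly in the Huggingface API, if you
| want to poke at the difference (stand-in checkpoints here, not
| AlexaTM's weights):
|
| from transformers import AutoModelForCausalLM, AutoModelForSeq2SeqLM
|
| enc_dec = AutoModelForSeq2SeqLM.from_pretrained("t5-small")  # encoder-decoder
| dec_only = AutoModelForCausalLM.from_pretrained("gpt2")      # decoder-only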
|
| https://community.libretranslate.com/t/alexatm-a-20b-multili...
___________________________________________________________________
(page generated 2022-08-02 23:00 UTC)