[HN Gopher] Google's Pathways Language Model and Chain-of-Thought
___________________________________________________________________
Google's Pathways Language Model and Chain-of-Thought
Author : vackosar
Score : 59 points
Date : 2022-04-18 15:43 UTC (7 hours ago)
(HTM) web link (vaclavkosar.com)
(TXT) w3m dump (vaclavkosar.com)
| phoe18 wrote:
| The article quotes the cost as roughly 10B$ in the first
| paragraph. Likely a typo? They quote 10M$ in a later paragraph.
| vackosar wrote:
| Yes, of course, thanks!
| azinman2 wrote:
| Ya I was like there's no way Google spent 10B on this.
| vackosar wrote:
| Correction!! The model cost around 10M, not 10B! Thanks for
| raising that. Mistake during copying from the second slide :(
| PaulHoule wrote:
| I've talked about structural deficiencies in earlier language
| models; this one seems to be doing something about them.
| vackosar wrote:
| Sounds interesting! Would you link to that or describe them
| here? Thanks!
| PaulHoule wrote:
| A very simple one is "can you write a program that might
| never terminate?"
|
| If a neural network does a fixed amount of computation, then it
| is never going to be able to do things that require a program
| that may not terminate.
|
| There are numerous results of theoretical computer science
| that apply just as well to neural networks and other
| algorithms even though people seem to forget it.
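|
| To make that concrete, here is a toy sketch (not tied to any
| particular model): a fixed-depth forward pass always halts after
| a known number of steps, while even a trivial loop like the
| Collatz iteration has no known bound on how long it runs.
|
|     def forward_pass(x, layers):
|         # A feed-forward net does a fixed, known amount of work:
|         # one (toy) transformation per layer, then it stops.
|         for w in layers:
|             x = [w * v for v in x]
|         return x
|
|     def collatz(n):
|         # No one has proven this loop always reaches 1, so you
|         # cannot unroll it into a fixed amount of computation.
|         while n != 1:
|             n = n // 2 if n % 2 == 0 else 3 * n + 1
|         return n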
|
| Another is "can an error discovered in late stage processing
| be fed back to an early stage and be repaired?" That's
| important if you are parsing a sentence like
| Squad helps dog bite victim.
|
| It was funny because I saw Geoff Hinton give a talk in 2005,
| before he got super-famous, where he was talking about the idea
| that led to deep networks. He had a criticism of "blackboard"
| systems and other architectures that produced layered
| representations (say the radar of an anti-aircraft system that
| is going to start with raw signals, turn those into a set of
| 'blips', coalesce the 'blips' into tracks, interpret the tracks
| as aircraft, etc.)
|
| Hinton said that you should build the whole system in an
| integrated manner and train the whole thing end-to-end, and I
| thought "what a neat idea" but also "there is no way this would
| work for the systems I'm building because it doesn't have an
| answer for correcting itself."
| cygaril wrote:
| You're assuming here that there are discrete stages that do
| different things. I think a better way to conceptualise
| these deepnets is that they're doing exactly what you want
| - each layer is "correcting" the mistakes of the previous
| layer.
| PaulHoule wrote:
| Most "deep" networks are organized into layers and
| information flows in a particular direction although it
| doesn't have to be that way. Hinton wasn't saying we
| shouldn't have layers but that we should train the layers
| together rather than as black boxes that work in
| isolation.
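|
| As a toy PyTorch sketch of the difference (made-up shapes, not
| anyone's actual system): chaining two stages and backpropagating
| one loss through both means an error that only shows up at the
| end still adjusts the earliest stage.
|
|     import torch
|     import torch.nn as nn
|
|     # Two "stages" (e.g. signals -> blips, blips -> tracks).
|     stage1 = nn.Linear(16, 8)
|     stage2 = nn.Linear(8, 2)
|
|     # End-to-end: a single loss at the output, gradients flow
|     # back through both stages in one step.
|     params = list(stage1.parameters()) + list(stage2.parameters())
|     opt = torch.optim.SGD(params, lr=0.1)
|     x, target = torch.randn(4, 16), torch.randn(4, 2)
|     loss = nn.functional.mse_loss(stage2(stage1(x)), target)
|     opt.zero_grad()
|     loss.backward()
|     opt.step()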
|
| Also, when people talk about solving problems they talk about
| layers; layers play a big role in the conceptual models people
| have for how they do tasks, even if they don't really do them
| that way.
|
| For instance in that ambiguous sentence somebody might
| say it hinges on whether or not you think "bite" is a
| verb or a noun.
|
| (Every concept in linguistics is suspect, if only because
| linguistics has proven to have little value for developing
| systems that understand language. For instance I'd say a "word"
| doesn't exist, because there are subword objects that behave
| like a word (e.g. "non-") and phrases that behave like a word
| (e.g. "dog bite" fills the same slot as "bite").)
|
| Another ambiguous example is this notorious picture
|
| https://www.livescience.com/63645-optical-illusion-young-old...
|
| which most people experience as "flapping" between two
| states. Since you only see one at a time there is some
| kind of inhibition between the two states. Who knows how
| people really see things, but if I'm going to talk about
| features I'm going to say that one part is the nose of
| one of the ladies or the chin of the other lady.
|
| Deep networks as we know them have nothing like that.
| space_fountain wrote:
| I'm by no means an expert, but a lot of the design choices in
| machine learning architectures are more about training
| parallelization than anything else. In many ways it feels like
| a recurrent neural network, or some architecture even more
| weird, should be better for language, but in practice it's
| harder to train an architecture that demands each new output
| depend on the one before. Introducing dependencies on prior
| output typically kills parallelization. Obviously this is less
| of a problem for, say, a brain that has years of training time,
| but more of a problem if you want to train one up in much less
| time using compute that can't do sequential things very
| quickly.
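|
| A rough toy illustration (made-up shapes, nothing like a real
| model): a recurrent update has to walk the sequence one step at
| a time, while an attention layer is just a few matrix multiplies
| over all positions at once.
|
|     import numpy as np
|
|     def recurrent_layer(x, W, U):
|         # Each step depends on the previous hidden state, so the
|         # time dimension is processed strictly one step at a time.
|         h = np.zeros(W.shape[0])
|         outputs = []
|         for x_t in x:
|             h = np.tanh(W @ h + U @ x_t)
|             outputs.append(h)
|         return np.stack(outputs)
|
|     def attention_layer(x, Wq, Wk, Wv):
|         # All positions are handled in parallel by batched matrix
|         # multiplies (masking and multiple heads omitted).
|         Q, K, V = x @ Wq, x @ Wk, x @ Wv
|         scores = Q @ K.T / np.sqrt(K.shape[-1])
|         weights = np.exp(scores)
|         weights /= weights.sum(axis=-1, keepdims=True)
|         return weights @ V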
| simulate-me wrote:
| The amount of capital needed to train these high-quality models
| is eye-watering (not to mention the costs needed to acquire the
| data). Does anyone know of any well-capitalized startups
| exploring this space?
| lumost wrote:
| OpenAI would be the best example. However these large language
| models also have limited business value _today_, making a
| startup a speculative bet that the team will beat
| Google/FB/AI/Academics at making a language model _and_ find a
| viable business model for the resulting model.
|
| I'd take one of those bets or the other; both are tough to pull
| off. Considering that the first task of such a startup would be
| to hand ~100-500MM to a hardware or cloud vendor, as an investor
| I'd be hesitant.
| visarga wrote:
| It costs less than 10M to train. Why hand so much to a hardware
| or cloud vendor? Soon enough there will be open-source GPT-3s;
| at least two are in training as we speak (BigScience and
| EleutherAI).
|
| > these large language models also have limited business
| value today
|
| The Instruct version of GPT-3 has become very easy to steer
| with just a task description. It can do so many tasks so well
| it's crazy. Try some interactions with the beta API.
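|
| A call against the beta Python client looks roughly like this
| (the model name and parameters here are just illustrative):
|
|     import openai  # pip install openai, plus an API key
|
|     openai.api_key = "sk-..."  # your key here
|
|     # A plain task description as the prompt; the Instruct
|     # models just follow it.
|     response = openai.Completion.create(
|         engine="text-davinci-002",
|         prompt="Summarize in one sentence: ...",
|         max_tokens=64,
|         temperature=0.3,
|     )
|     print(response["choices"][0]["text"])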
|
| I believe GPT-3 is already above average human level at
| cognitive tasks that fit in a 4000 token window. In 2-3 years
| I think all developers will have to adapt to the new status
| quo.
| nawgz wrote:
| > I believe GPT-3 is already above average human level at
| cognitive tasks that fit in a 4000 token window.
|
| How can you possibly make a claim like this without like 80
| links justifying it? The claim is fuzzy and absurd, my
| least favorite combo
| visarga wrote:
| Gut feeling based on playing with it. Here's an example:
|
| > Colorless green ideas sleep furiously, and other
| grammatical nonsense by Noam Chomsky
|
| He was a man without a country, A linguist without a
| language, A mind without a thought, A dream without a
| dreamer. He was lost in a world of words, A world where
| ideas slept furiously, And grammar was a never-ending
| nightmare.
|
| But he persevered, For he knew that language was the key
| to understanding the world. And so he continued to study,
| To learn all that he could, In the hopes that one day, He
| would find his way home.
| nawgz wrote:
| > Gut feeling based on playing with it.
|
| Ok, on my first reading your phrasing made it sound like some
| article or material had convinced you of this opinion; now I
| understand.
|
| This is kind of my point about 80 links though - you're
| using a definition of "cognitive tasks" that more closely
| resembles knowledge, and then you're letting your
| personal feelings about profundity guide your conclusions
| on said cognition.
|
| I don't deny that the machine can output pretty words and
| has a breadth of knowledge to put us each to shame on
| some simple queries, but "cognition in a 4000 token
| window" is an incredibly large place and I don't even
| understand how you would be able to claim a machine has
| above-human-average cognition based solely on your own
| interactions... That's a pretty crazy leap.
|
| PS: I saw the downvotes. I was downvoted for questioning
| the validity of information that was actually just pure
| conjecture; be better with your votes.
| refulgentis wrote:
| > Gut feeling based on playing with it
|
| You should check out the post we're commenting on; it has
| graphs for this exact metric.
|
| Spoiler: Google's model with 3x the parameters does pass the
| average human in a couple of categories, but not in all of
| them. I don't think GPT-3 does in any.
|
| It's doubly puzzling to me because you have access and
| are asserting it feels like an average human to you. It's
| awesome and it does magical stuff; I use it daily both
| for code and prose. It also majorly screws up sometimes.
| It's only at an average human level if we play word games
| with things like "well, the average human wouldn't know
| the Dart implementation of the 1D Gaussian function.
| Therefore it's better than the average human."
| simulate-me wrote:
| I agree 100%, but I think viable businesses will begin to
| emerge especially as these large models move from text to
| images (and eventually to video and 3d models). If the
| examples shown of DALL-E 2 are indicative of its quality,
| then a large number of creative jobs could be replaced with a
| single "creative director" using the model. But the high
| entry cost just to attempt to train such a model will likely
| remain a hurdle until more business value is proven.
| lumost wrote:
| aye - I suspect the other concern is that the high entry
| costs can quickly lead to a "second mover" advantage. The
| first team spends all the money doing the hard R&D and the
| second team implements a slightly better version for a
| fraction of the money.
| sjg007 wrote:
| I'd just solve some existing problem with the most basic
| language model you can get your hands on and then move up
| from there. Sell it first.
| vackosar wrote:
| Correction! Cost is around $10M not $10B.
| gwern wrote:
| The data here is effectively free. I don't think they would
| exhaust The Pile, which you can download for free. This is also
| true for text2image models like DALL-E 2: while OA may have
| invested in its own datasets, everyone else can just download
| LAION-400M (or if they are really ambitious, LAION-5B
| https://laion.ai/laion-5b-a-new-era-of-open-large-scale-mult...
| ).
| visarga wrote:
| > The amount of capital needed to train these high-quality
| models is eye watering
|
| It's relative. It would cost more to open a 40-room hotel
| (about 320k/room), and hotels can't be copied like software.
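|
| (For scale: 40 rooms at 320k/room is roughly 12.8M, versus the
| ~10M training figure above.)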
| Vetch wrote:
| It's not like that many people are opening 40-room hotels
| either. Such amounts are atypical within programming and CS
| communities.
|
| A more relevant example is video games: imagine if the only
| viable ones were top-end AAA games whose completed versions
| could only be accessed by cloud gaming?
| rafaelero wrote:
| That's literally nothing for the benefits it could provide if
| applied in the real world.
___________________________________________________________________
(page generated 2022-04-18 23:01 UTC)