[HN Gopher] Some Remarks on Large Language Models
___________________________________________________________________
Some Remarks on Large Language Models
Author : backpropaganda
Score : 129 points
Date : 2023-01-03 15:54 UTC (7 hours ago)
(HTM) web link (gist.github.com)
(TXT) w3m dump (gist.github.com)
| anyonecancode wrote:
| Interesting post. I find myself moving away from the sort of
| "compare/contrast with humans" mode and more "let's figure out
| exactly what this machine _is_" way of thinking.
|
| If we look back at the history of mechanical machines, we see a
| lot of the same kind of debates happening there that we do around
| AI today -- comparing them to the abilities of humans or animals,
| arguing that "sure, this machine can do X, but humans can do Y
| better..." But over time, we've generally stopped doing that as
| we've gotten used to mechanical machines. I don't know that I've
| ever heard anyone compare a wheel to a leg, for instance, even
| though both "do" the same thing, because at this point we take
| wheels for granted. Wheels are much more efficient at
| transporting objects across a surface in some circumstances, but
| no one's going around saying "yeah, but they will never be able
| to climb stairs as well" because, well, at this point we
| recognize that's not an actual argument we need to have. We know
| what wheels do and don't do.
|
| These AI machines are a fairly novel type of machine, so we don't
| yet really understand what arguments make sense to have and which
| ones are unnecessary. But I like these posts that get more into
| exactly what an LLM _is_, as I find them helpful in understanding
| what kind of machine we're actually dealing with. They're not
| "intelligent" any more than any other machine is (and
| historically, people have sometimes ascribed intelligence, even
| sentience, to simple mechanical machines), but that's not so
| important. Exactly what we'll end up doing with these machines
| will be very interesting.
| sgt101 wrote:
| I think ChatGPT is a Chinese Room (as per John Searle's famous
| description). The problem with this is that ChatGPT has no idea
| what it is saying and doesn't/can't understand when it is wrong
| or uncertain (or even when it is right or certain).
|
| I believe that this is dangerous in many valuable applications,
| and will mean that the current generation of LLMs will be more
| limited in value than some people believe. I think this is
| quite similar to the problems that self driving cars have; we
| can make good ones for sure, but they are not good enough or
| predictable enough to be trusted and used without significant
| constraints.
|
| My worry is that LLMs will get used inappropriately and will
| hurt lots of people. I wonder if there is a way to stop this?
| pyrolistical wrote:
| We stop this like any other issue: with the law. Somebody is
| going to use an LLM and cause harm. They will then get sued and
| people will have to reconsider the risk of using LLMs.
|
| It's just a tool
| andrepd wrote:
| It is a relevant argument though, as a reply to claims of "GPT4
| will replace doctors and lawyers and programmers in 6 months".
| abarker wrote:
| Conversely, these models open up philosophical questions of
| "exactly what a human is" beyond language abilities. How much
| of what we think, do, and perceive comes from the use of
| language?
| dekhn wrote:
| I would like to see the section on "Common-yet-boring" arguments
| cleaned up a bit. There is a whole category of "researchers" who
| just spend their time criticizing LLMs with common-yet-boring
| arguments (Emily Bender is the best example) such as "they cost a
| lot to train" (uhhh have you seen how much enterprise spends on
| cloud for non-LLM stuff? Or seen the power consumption of an
| aluminum smelting plant? Or calculated the costs of all the
| airplanes flying around taking tourists to vacation?)
|
| By improving this section I think we can have a standard go-to
| doc to refute the common-but-boring arguments. Anticipating
| what they say (and yes, Bender is _very_ predictable... you
| could almost make a chatbot that predicts her) greatly weakens
| their argument.
| jeffbee wrote:
| > Or calculated the costs of all the airplanes flying around
|
| This is the key comparison. A 747-400 burns 10+ metric tons of
| kerosene per hour, which means its basic energy consumption is
| > 110MW. The energy used to train GPT-3 was approximately the
| same as that spent by _one_ 8-hour airline flight.
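|
| Back-of-the-envelope check (the ~43 MJ/kg heating value for
| kerosene and the ~1,300 MWh GPT-3 training figure are my own
| assumed numbers, not from the parent comment):
|
|   # rough sketch; all figures are approximate assumptions
|   fuel_kg_per_hr = 10_000    # 747-400 burn rate (parent comment)
|   mj_per_kg = 43             # assumed energy density of kerosene
|   power_mw = fuel_kg_per_hr * mj_per_kg / 3600  # MJ/s == MW, ~119
|   flight_mwh = power_mw * 8                     # ~955 MWh
|   gpt3_train_mwh = 1_300     # assumed published estimate
|   print(power_mw, flight_mwh, gpt3_train_mwh)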
| riverdweller wrote:
| Equivalently, the energy used to train GPT-3 was the same as
| the energy consumed by Bitcoin in just four minutes.
| mannykannot wrote:
| It looks pretty good as it stands, I think - to spend too much
| time on these arguments is to play their game.
|
| Having said that, I would add a note about the whole category
| of ontological or "nothing but" arguments - saying that an LLM
| is nothing but a fancy database, search engine, autocomplete or
| whatever. There's an element of question-begging when these
| statements are prefaced with "they will never lead to machine
| understanding because...", and beyond that, the more they are
| conflated with everyday technology, the more noteworthy their
| performance appears.
| evrydayhustling wrote:
| Loved that section as well. An addendum I'd include is that
| many of these arguments are boring as criticisms, but super
| interesting as research areas. AIs burn energy? Great, let's
| make efficient architectures. AIs embed bias? Let's get better
| at measuring and aligning bias. AIs don't cite sources? Most
| humans don't either, but it sure would make the AI more useful
| if it did...
|
| (As a PS, I've seen that last one mainly as a refutation for
| the "LLMs are ready to kill search" meme. In that context it's
| a very valid objection.)
| [deleted]
| [deleted]
| lalaithion wrote:
| > In particular, if the model is trained on multiple news stories
| about the same event, it has no way of knowing that these texts
| all describe the same thing, and it cannot differentiate it from
| several texts describing similar but unrelated events
|
| And... the claim is that humans can do this? Is it just the
| boring "This AI can only receive information via tokens, whereas
| humans get it via more high resolution senses of various types,
| and somehow that is what causes the ability to figure out two
| things are actually the same thing?" thing?
| jerf wrote:
| I'd say this is more related to the observation that LLMs
| aren't going to be good at math. (As the article says, their
| current performance is surprising enough as it is but I agree
| that it seems unlikely that just making bigger and bigger LLMs
| is going to get substantially better at even arithmetic, to say
| nothing of higher math.) They have a decent understanding of "X
| before Y" as a textual phrase, but I think it would be hard for
| them to do much further reasoning on top of that temporal
| relation, because they lack a representation for it, just as
| they lack a representation suitable for math.
|
| I expect if you asked "Did $FAMOUS_EVENT happen before
| $OTHER_FAMOUS_EVENT" it would do OK, just as "What is
| $FAMOUS_NUMBER plus $FAMOUS_NUMBER?" does OK, but as you get
| more obscure it will fall down badly on tasks that humans would
| generally do OK at.
|
| Though, no, humans are not perfect at this by any means either.
|
| It is important to remember that what this entire technology
| boils down to is "what word is most likely to follow the
| content up to this point?", iterated. What that can do is
| impressive, no question, but at the same time, if you can try
| to imagine interacting with the world through that one and only
| tool, you may be able to better understand the limitations of
| this technology too. There are some tasks that just can't be
| performed that way.
|
| (You'll have a hard time doing so, though. It is very hard to
| think in that manner. As a human I really tend to think in a
| bare minimum of sentences at a time, which I then serialize
| into words. Trying to imagine operating in terms of "OK, what's
| the next word?" "OK, what's the next word?" "OK, what's the
| next word?" with no forward planning beyond what is implied by
| your choice of this particular word is not something that comes
| even remotely naturally to us.)
|
| When this tech answers the question "Did $FAMOUS_EVENT happen
| before $OTHER_FAMOUS_EVENT?", it is not thinking, OK, this
| event happened in 1876 and the other event happened in 1986,
| so, yes, it's before. It is thinking "What is the most likely
| next word after '... $OTHER_FAMOUS_EVENT?" "What is the next
| most likely word after that?" and so on. For famous events it
| is reasonably likely to get them right because the training
| data has relationships for the famous events. It might even
| make mistakes in a very human manner. But it's not doing
| temporal logic, because it can't. There's nowhere for "temporal
| logic" to be taking place.
| jjeaff wrote:
| Of course humans can do this. You don't recognize when articles
| are talking about the same thing?
| seydor wrote:
| > Another way to say it is that the model is "not grounded". The
| symbols the model operates on are just symbols, and while they
| can stand in relation to one another, they do not "ground" to any
| real-world item.
|
| This is what math is: abstract syntactic rules. GPTs, however,
| seem to struggle in particular at counting, probably because
| their structure does not have a notion of order. I wonder if
| future LLMs built for math will basically solve all math
| (whether they will be able to find a proof for anything that is
| provable).
|
| Grounding LLMs to images will be super interesting to see,
| though, because images have order, and so much of abstract
| thinking is spatial/geometric at its base. Perhaps those will be
| the first true AIs.
| eternalban wrote:
| GPT-3 is limited, but it has delivered a jolt that demands a
| general reconsideration of machine vs human intelligence. Has it
| made you change your mind about anything?
|
| At this point for me, the notion of machine "intelligence" is a
| more reasonable proposition. However, this shift is the result
| of reconsidering the binary framing of "dumb or intelligent
| like humans".
|
| First, I propose a possible discriminant between "intelligence"
| and "computation": whether an algorithm could brute-force
| compute the same reasonable response the machine gave, given
| only the input corpus of the 'AI' under consideration.
|
| It also seems reasonable to begin to differentiate 'kinds' of
| intelligence. On this very planet there are a variety of
| creatures that exhibit some form of intelligence. And they seem
| to be distinct kinds. Social insects are arguably intelligent.
| Crows are discussed frequently on hacker news. Fluffy is not
| entirely dumb either. But are these all the same 'kind' of
| intelligence?
|
| Putting my cards on the table: at this point it seems eminently
| possible that we will create some form of _mechanical insectoid
| intelligence_. I do not believe insects have any need for
| 'meaning' - form will do. That distinction also takes the sticky
| 'what is consciousness?' question out of the equation.
| CarbonCycles wrote:
| Unfortunate: this was an opportunity to further enlighten
| others, but the author took a dismissive and antagonistic
| perspective.
| [deleted]
| brooksbp wrote:
| Sometimes I read text like this and really enjoy the deep
| insights and arguments once I filter out the emotion, attitude,
| or tone. And I wonder if the core of what they're trying to
| communicate would be better or more efficiently received if the
| text was more neutral or positive. E.g. you can be 'bearish' on
| something and point out 'limitations', or you can say 'this is
| where I think we are' and 'this is how I think we can improve',
| but your insights and arguments about the thing can more or less
| be the same in either form of delivery.
| mannykannot wrote:
| When reading as well-constructed an article as this one, I tend
| to assume its tone pretty accurately reflects the author's
| position.
| teekert wrote:
| Maybe you can ask chatGPT for a summary in a less emotional
| tone ;)
| omeze wrote:
| I tried but the original article is too long :/
| neuronexmachina wrote:
| > Sometimes I read text like this and really enjoy the deep
| insights and arguments once I filter out the emotion, attitude,
| or tone.
|
| Curiously enough, I imagine that sort of filtering/translation
| is the sort of thing a Large Language Model would be pretty
| good at.
| azinman2 wrote:
| But then it would feel less personal and be more boring.
| Writing should convey emotion - it's what we have as humans to
| offer in linking with others, and great writing should in turn
| make you feel something.
| blackbear_ wrote:
| Disagree: great (non-fiction) writing should provide
| information in an efficient and structured way, so that
| readers can quickly understand the key points and decide
| whether reading further is worth their time.
| stevenhuang wrote:
| What emotional tone did you arrive at? If "bearish" is your
| takeaway, I think you should read it more carefully.
| dekhn wrote:
| I'd rewrite sections like this to be a bit less "insulting
| the intelligence of the question-asker".
|
| _The models do not understand language like humans do._
|
| Duh? They are not humans. Of course they differ in some of
| their mechanisms. They still can tell us a lot about language
| structure. And for what they don't tell us, we can look
| elsewhere.
| eli_gottlieb wrote:
| Sometimes I read scientific and technical texts that are candy-
| coated to appeal to people who can't stand criticism, and wish
| the author was allowed to say what they really think.
| stevenhuang wrote:
| I found the "grounding" explanation provided by human feedback
| very insightful:
|
| > Why is this significant? At the core the model is still doing
| language modeling, right? learning to predict the next word,
| based on text alone? Sure, but here the human annotators inject
| some level of grounding to the text. Some symbols ("summarize",
| "translate", "formal") are used in a consistent way together with
| the concept/task they denote. And they always appear in the
| beginning of the text. This makes these symbols (or the
| "instructions") in some loose sense external to the rest of the
| data, making the act of producing a summary grounded to the human
| concept of "summary". Or in other words, this helps the model
| learn the communicative intent of a user who asks for a
| "summary" in its "instruction". An objection here would be that
| such cases likely naturally occur already in large text
| collections, and the model already learned from them, so what is
| new here? I argue that it might be much easier to learn from
| direct instructions like these than it is to learn from non-
| instruction data (think of a direct statement like "this is a
| dog" vs needing to infer from over-hearing people talk about
| dogs). And that shifting the distribution of the training data
| towards these annotated cases substantially alters how the model
| acts, and the amount of "grounding" it has. And that maybe with
| explicit instructions data, we can use much less training text
| compared to what was needed without them. (I promised you hand
| waving didn't I?)
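|
| A toy illustration of the shift being described (the examples
| are made up; real instruction-tuning datasets are much larger
| and messier):
|
|   # instruction-style examples: the task symbol appears
|   # consistently up front, tied to the behaviour that follows
|   instruction_data = [
|       {"instruction": "summarize",
|        "input": "<a long news article>",
|        "output": "<a three-sentence summary>"},
|       {"instruction": "translate to French",
|        "input": "good morning",
|        "output": "bonjour"},
|   ]
|   # vs. plain web text, where the same tasks only show up
|   # indirectly, in whatever way people happen to mention them
|   raw_data = ["...she then summarized the meeting, saying..."]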
| [deleted]
| seydor wrote:
| The only word grounded there was "summary". There are so many
| more that cannot be delivered in the same way.
| ilaksh wrote:
| He seems to have missed the biggest difference, which is the
| lack of visual information.
| aunch wrote:
| Great focus on the core model itself! I think a complementary
| aspect of making LLMs "useful" from a productionization
| perspective is all of the engineering around the model itself.
| This blog post did a pretty good job highlighting those
| complementary points:
| https://lspace.swyx.io/p/what-building-copilot-for-x-really
| light_hue_1 wrote:
| The dismissal of biases and stereotypes is exactly why AI
| research needs more people who are part of the minority. Yoav can
| dismiss this because it just doesn't affect him much.
|
| It's easy to say "Oh well, humans are biased too" when these
| machines don't misgender you, don't mistranslate text that
| relates to you, don't carry negative affect toward you, aren't
| more likely to write violent stories about you, don't perform
| worse on tasks related to you, etc.
| rintakumpu wrote:
| Don't understand the downvotes but I do disagree. What the
| industry needs is not overtly racist hiring practises but
| rather people who are aware of these issues and have the know-
| how and the power to address them.
|
| I'll take an example. I'm making an adventure/strategy game
| that is set in 1990s Finland. We had a lot of Somali refugees
| coming from the Soviet Union back then and to reflect that I've
| created a female Somali character who is unable to find
| employment due to the racist attitudes of the time.
|
| I'm using DALL-E 2 to create some template graphics for the
| game and using the prompt "somali middle aged female pixel art
| with hijab" produces some real monstrosities
| https://imgur.com/a/1o2CEi9 whereas "nordic female middle age
| minister short dark hair pixel art portrait pixelated smiling
| glasses" produces exclusively decent results
| https://imgur.com/a/ag2ifqi .
|
| I'm an extremely privileged white, middle-aged, straight cis
| male and I'm able to point out a problem. Of course I'm not
| against hiring minorities, just saying that you don't need to
| belong to any minority group to spot the biases.
| jeffreyrogers wrote:
| I mean he is presumably Jewish and lives in Israel, so I would
| guess he knows quite a bit about being a minority and
| experiencing bias.
| rideontime wrote:
| That's true, he probably sees a lot of anti-Palestinian bias
| on a regular basis.
| azinman2 wrote:
| And globally a disproportionate amount of anti-Israeli and
| anti-Jewish bias, obviously.
| teddyh wrote:
| Inside Israel?
| jjeaff wrote:
| Jews are a minority in Israel?
| eli_gottlieb wrote:
| We're the "majority" by virtue of internationally
| gerrymandered borders. In the region as a whole? Yes, we're
| an indigenous minority.
| pessimizer wrote:
| The Boers are an indigenous minority in southern Africa,
| but in the 80s I wouldn't have used the Boers as an
| example of people who really understand the experience of
| bias as a minority.
| rideontime wrote:
| What does the word "indigenous" mean?
| eli_gottlieb wrote:
| >The Boers are an indigenous minority in southern Africa,
|
| No they're not.
| jeffreyrogers wrote:
| No of course not, nor did I claim they are.
| tomrod wrote:
| Similar question to the one above: I don't follow why the
| (positive) ad hominem bolsters or detracts from the
| arguments.
| jeffreyrogers wrote:
| Because the claim of the comment I was responding to was
| that Yoav doesn't experience bias and consequently
| dismisses it, so in this case my comment is a direct
| refutation of the argument.
| adamsmith143 wrote:
| "
|
| The models encode many biases and stereotypes.
| Well, sure they do. They model observed human's language, and
| we humans are terrible beings, we are biased and are constantly
| stereotyping. This means we need to be careful when applying
| these models to real-world tasks, but it doesn't make them less
| valid, useful or interesting from a scientiic perspective."
|
| Not sure how this can be seen as dismissive.
|
| >Yoav can dismiss this because it just doesn't affect him much.
|
| Maybe just maybe someone named Yoav Goldberg might maybe be in
| a group where bias affects him quite strongly.
| jjeaff wrote:
| Or maybe he is blind to or unaffected by such biases either
| due to luck or wealth or other outliers. Especially as a
| Jewish person in Israel. There are always plenty of people in
| minority groups that feel (either correctly or incorrectly)
| that bias doesn't affect them. Take Clarence Thomas for
| example, or Candace Owens. Simply being a member of a
| minority group does not make your opinion correct. Thomas
| even said in recent oral arguments that there wasn't much
| diversity in his university when he attended and so he
| doesn't really see how a diverse student body is beneficial
| to one's education.
| adamsmith143 wrote:
| >Simply being a member of a minority group does not make
| your opinion correct.
|
| Nor does being a member of the majority make yours
| incorrect.
| azinman2 wrote:
| Or maybe he recognizes that it's literally impossible to
| train a system to output a result that isn't biased.
| Creating a model will result in bias, even if that model is
| a calculator. You put your perspective in its creation, its
| utility, its fundamental language(s) (including design
| language), its method of interaction, its existence and
| place in the world. If you train a model on the web, it'll
| have billions of biases baked in, including the choice to
| train on the web in the first place. If you train on a
| "sanctioned list," what you include or don't include will
| also be a bias. Even
| training on just Nature papers would give you a gigantic
| amount of bias.
|
| This is what I really don't like about the AI ethics
| critics (of the woke variety): it's super easy to be
| dismissive, but it's crazy hard to do anything that moves
| the world. If you move the world, some people will
| naturally be happy and others angry. Even creating a super
| "balanced" dataset will piss off those who want an
| imbalanced world!
|
| No opinion is "correct" - they're just opinions, including
| mine right now!
| JW_00000 wrote:
| In fact, the author even says this argument is "true but
| uninspiring / irrelevant". He's just deciding not to focus on
| that aspect in this article.
| eli_gottlieb wrote:
| [flagged]
| zzzeek wrote:
| "The models are biased, don't cite their sources, and we have no
| idea if there may be very negative effects on society by machines
| that very confidently spew truth/garbage mixtures which are very
| difficult to fact check"
|
| dumb boring critiques, so what? so boring! we'll "be careful",
| OK? so just shut up!
| phillipcarter wrote:
| While not unsolvable, I think the author is understating this
| problem a lot:
|
| > Also, let's put things in perspective: yes, it is
| environmentally costly, but we aren't training that many of them,
| and the total cost is minuscule compared to all the other energy
| consumptions we humans do.
|
| Part of the reason LLMs aren't that big in the grand scheme of
| things is because they haven't been good enough and businesses
| haven't started to really adopt them. That will change, but the
| costs will be high because they're also extremely expensive to
| run. I think the author is focusing on the training costs for
| now, but that will likely get dwarfed by operational costs. What
| then? Waving one's arms and saying it'll just "get cheaper over
| time" isn't an acceptable answer because it's hard work and we
| don't really know how cheap we can get right now. It must be a
| focus if we actually care about widespread adoption and
| environmental impact.
| dekhn wrote:
| Think about aluminum smelting. At some point in the past, only
| a few researchers could smelt aluminum, and while it used a ton
| of energy, it was just a few research projects. Then, people
| realized that aluminum was lighter than steel and could replace
| it... so suddenly everybody was smelting aluminum. The method
| to do this involves massive amounts of electricity... but it
| was fine, because the _value_ of the product (to society) was
| more than high enough to justify it. Eventually, smelters moved
| to places where there were natural sources of energy... for
| example, the Columbia Gorge dam was used to power a massive
| smelter. Guess where Google put their west coast data center?
| Right there, because aluminum smelting led to a superfund site
| and we exported those to growing countries for pollution
| reasons. So there is lots of "free, carbon-neutral" power from
| hydro plants.
|
| The interesting details are: the companies with large GPU/TPU
| fleets are already running them in fairly efficient setups,
| with high utilization (so you're not blowing carbon emissions
| on idle machines), and can scale those setups if demand
| increases. This is not irresponsible. And the scale-up will only
| happen if the systems are actually useful.
|
| Basically there are 100 other things I'd focus on trimming
| environment impact for before LLMs.
| woodson wrote:
| I think quantization (e.g. 4-bit,
| https://arxiv.org/abs/2212.09720) and sparsity (e.g. SparseGPT,
| https://arxiv.org/abs/2301.00774) will bring down inference
| cost.
|
| Edit: This isn't handwaving btw, this is to say some fairly
| decent solutions are available now.
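|
| For intuition, 4-bit weight quantization is roughly the
| following (a simplified sketch with symmetric per-tensor
| scaling; the methods in the linked papers are per-group and far
| more careful about outliers):
|
|   import numpy as np
|
|   def quantize_4bit(w):
|       # map float weights onto 16 signed integer levels (-8..7)
|       # (assumes w is not all zeros)
|       scale = np.max(np.abs(w)) / 7
|       q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
|       return q, scale
|
|   def dequantize(q, scale):
|       # approximate reconstruction used at inference time
|       return q.astype(np.float32) * scale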
| eli_gottlieb wrote:
| >Part of the reason LLMs aren't that big in the grand scheme of
| things is because they haven't been good enough and businesses
| haven't started to really adopt them. That will change, but the
| costs will be high because they're also extremely expensive to
| run. I think the author is focusing on the training costs for
| now, but that will likely get dwarfed by operational costs.
| What then?
|
| Now maybe I'm naive somehow because I'm a machine-learning
| person who doesn't work on LLMs/big-ass-transformers, but uh...
| why do they actually have to be this large to get this level of
| performance?
| phillipcarter wrote:
| Dunno! It could be the case that there just needs to be a
| trillion parameters to be useful enough outside of highly-
| constrained scenarios. But I would certainly challenge those
| who work on LLMs to figure out how to require far less
| compute for the same outcome.
| seydor wrote:
| The models will become much smaller; there are already some
| papers that show promising results with pruned models.
|
| And transformers are not even the final model; who knows what
| will come next.
| macleginn wrote:
| I cannot understand where the boundary between some of the
| "common-yet-boring arguments" and "real limitations" is. E.g.,
| the ideas that "You cannot learn anything meaningful based only
| on form" and "It only connects pieces its seen before according
| to some statistics" are "boring", but the fact that models have
| no knowledge of knowledge, or knowledge of time, or any
| understanding of how texts relate to each other is "real". These
| are essentially the same things! This is what people may mean
| when they proffer their "boring critiques", if you press them
| hard enough. Of course Yoav, being abreast of the field, knows all
| the details and can talk about the problem in more concrete
| terms, but "vague" and "boring" are still different things.
|
| I also cannot fathom how models can develop a sense of time, or
| structured knowledge of the world consisting of discrete objects,
| even with a large dose of RLHF, if the internal representations
| are continuous, and layer normalised, and otherwise incapable of
| arriving at any hard-ish, logic-like rules? All these models seem
| to have deep-seated architectural limitations, and they are almost
| at the limit of the available training data. Being non-vague and
| positive-minded about this doesn't solve the issue. The models
| can write polite emails and funny reviews of Persian rugs in
| haiku, but they are deeply unreasonable and 100% unreliable.
| There is hardly a solid business or social case for this stuff.
| Der_Einzige wrote:
| Actually, they struggle even with haikus if you care about
| proper syllable counts.
___________________________________________________________________
(page generated 2023-01-03 23:00 UTC)