[HN Gopher] On the Dangers of Stochastic Parrots [pdf]
___________________________________________________________________
On the Dangers of Stochastic Parrots [pdf]
Author : tmfi
Score : 43 points
Date : 2021-03-01 18:23 UTC (4 hours ago)
(HTM) web link (faculty.washington.edu)
(TXT) w3m dump (faculty.washington.edu)
| superbcarrot wrote:
| From the authors: "Shmargaret Shmitchell,
| shmargaret.shmitchell@gmail.com, The Aether"
|
| Is this some meta joke or a reference to anything?
| djoldman wrote:
| https://twitter.com/mmitchell_ai
|
| https://www.theguardian.com/technology/2021/feb/19/google-fi...
| bryanrasmussen wrote:
| Reading it, I thought: if language models can be too big, that
| could be a problem for Google, given that at least one of their
| major competitive advantages is being able to have the biggest
| language models there are.
|
| Although I don't really know if that's so (about the competitive
| advantage), it certainly seems like something Google might
| believe, judging from what I remember of earlier Google arguments
| about automated translation.
| peachfuzz wrote:
| I don't get it. The paper reads like 10 pages of opinion and
| casting aspersions on language models. No math. No graphs.
| tsimionescu wrote:
| You don't need math to explain why, for example, a statistical
| model trained to find the plausible combinations of words or
| phrases in its corpus that match a prompt is not a promising
| approach to NLP and general understanding, and why using it to
| win at tasks designed to assess NLP success essentially amounts
| to cheating. You also don't need math to explain that, in very
| real terms, the meaning of any phrase produced by GPT-3 lies in
| the mind of the reader and not in GPT-3's output; GPT-3 itself
| sees no meaning in the phrases it produces (unlike, say, the
| output of a rule-based system, which, while much more
| rudimentary, generally has reasoning about the topic at hand
| behind it, rather than reasoning about word probabilities).
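|
| A toy sketch of the kind of model being described (the corpus and
| function below are invented purely for illustration, not taken
| from the paper): a bigram sampler that continues a prompt using
| only observed word-to-word frequencies, with no representation of
| meaning anywhere.
|
|   import random
|   from collections import defaultdict
|
|   corpus = "the cat sat on the mat . the dog sat on the rug .".split()
|
|   # count which words were observed to follow which
|   follows = defaultdict(list)
|   for prev, nxt in zip(corpus, corpus[1:]):
|       follows[prev].append(nxt)
|
|   def continue_prompt(prompt, length=8):
|       words = prompt.split()
|       for _ in range(length):
|           candidates = follows.get(words[-1])
|           if not candidates:  # unseen word: nothing to parrot
|               break
|           # choosing from the list weights by observed frequency
|           words.append(random.choice(candidates))
|       return " ".join(words)
|
|   print(continue_prompt("the cat"))
|   # output looks fluent, but nothing in the model refers to what
|   # a cat or a mat actually is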
| blackbear_ wrote:
| > to explain why, for example, [...] is not a promising
| approach to NLP and general understanding
|
| I am not aware of any definitive proof, or at least a convincing
| argument, for why this is _not_ possible, though. Timnit's paper
| takes that as a given fact and illustrates consequent dangers and
| high-level workarounds.
| tsimionescu wrote:
| I didn't say that it's not possible, just that it's not
| likely. It's clearly not how the human mind works, and
| since that is the only computer we know for sure can do
| NLP, any approach that is so obviously alien to it is
| unlikely to be a good approach.
|
| In case this is not clear, the human mind obviously doesn't do
| stochastic prediction on phrases. It has a model of the world
| (based on agents interacting with objects through mechanical
| laws) and it produces or interprets speech by assessing its
| interaction with this model of the world. When deciding whether
| to produce phrase A or phrase B, it doesn't assess the likelihood
| of the phrase in the corpus of phrases it has seen or heard
| before, but its appropriateness to the situation. Most prompts
| for language use are not language at all, but come from the world
| itself [0], something which pure LMs can't even in principle do
| (though they could potentially be combined with other kinds of
| models to achieve this).
|
| [0] For example, 'seeing a fire in a crowded theater' is the
| prompt for yelling 'Fire!'; the correct response/next step from
| hearing a shout of 'Fire!' in a crowded theater is not a phrase,
| it is a desperate attempt to run away.
| blackbear_ wrote:
| > In case this is not clear, the human mind obviously
| doesn't do stochastic prediction on phrases.
|
| Are you sure? Because this is exactly what you and I are
| doing right now. A language model is a very appropriate
| description for how we see each other: you produce some
| text, I produce some more, and you respond based on that.
| And I inject some stochasticity by re-typing this
| paragraph five times trying to make it sound like natural
| English and a cohesive text, a bit like beam search if
| you are familiar with that.
|
| What I am trying to say is that language models are a
| good interface (in the programming sense) to describe
| human interactions on the internet: text in, text out. So
| while I agree that GPT-3 is not a realistic "human-like"
| implementation of this interface, I don't see why _a
| priori_ a neural network cannot eventually incorporate
| world models with agents and so on and reach actual
| understanding (whatever that means) of text.
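|
| As a sketch of that "interface in the programming sense" (the
| names here are invented for illustration, not an existing API):
| anything with a text-in, text-out method satisfies the same
| contract, whether it is backed by a person at a keyboard, a
| rule-based system, or a neural network.
|
|   from typing import Protocol
|
|   class TextAgent(Protocol):
|       """Anything that turns a prompt into a reply."""
|       def respond(self, prompt: str) -> str: ...
|
|   class CannedBot:
|       # stands in for an LM; a human replying by hand would
|       # satisfy the same interface
|       def respond(self, prompt: str) -> str:
|           return "I see your point about " + prompt.split()[-1]
|
|   def converse(a: TextAgent, b: TextAgent, opener: str) -> str:
|       msg = opener
|       for _ in range(3):        # a short exchange
|           msg = a.respond(msg)  # text in, text out
|           msg = b.respond(msg)
|       return msg
|
|   print(converse(CannedBot(), CannedBot(), "language models"))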
| wmf wrote:
| If Google execs believe that AIs trained on the public Web are
| the future of Google, this paper basically argues that those AIs,
| and by extension Google's future, are unethical and probably
| can't be fixed at any reasonable cost.
| oh_sigh wrote:
| This is the paper surrounding Timnit's "departure" from Google.
|
| If you're on Timnit's side, "departure" means "firing", and the
| paper is the reason she was fired.
|
| If you're on Google's side, "departure" means "mutually-agreeable
| resignation", prompted by Timnit's melodramatic and unprofessional
| response to normal feedback.
|
| Personally, I don't see anything in this paper that implicates
| Google or would be reasonable for Google to try to suppress, so
| I'm falling into the camp of trusting Google's side of the story.
| But who knows?
| joshuamorton wrote:
| > Personally, I don't see anything in this paper that
| implicates Google or would be reasonable for Google to try to
| suppress
|
| Can you explain why this leads you to support Google? Google
| _still_ claims that the paper doesn't meet their publication
| standards, despite, as you say, it containing nothing that would
| be reasonable for Google to suppress.
|
| > Timnit's melodramatic and unprofessional response to normal
| feedback.
|
| Keep in mind the feedback was that _you cannot publish this
| paper_, and that isn't disputed by Google.
| oh_sigh wrote:
| As a general principle, when there is a "they-said she-said"
| situation, and certain facts about one side of the story don't
| add up, from my perspective that increases the odds that the
| other side is telling the truth.
|
| I have no idea what Google's standards are for letting a paper be
| published. My guess is that they don't make standards up on the
| fly, and that no unique standards were applied to Timnit that
| aren't applied to literally everyone else.
|
| > Keep in mind the feedback was that you cannot publish this
| paper, and that isn't disputed by Google.
|
| Yes, and that in itself makes me more likely to think there
| is some benign, standard reason for not allowing publication
| rather than some nefarious motive about suppressing Timnit or
| hiding their own guilt, or whatever reason Timnit has
| provided as to why she thinks Google doesn't want the paper
| published.
|
| If the argument is "Google won't let me publish this paper
| because it's too dangerous for them", and then I look and see
| that there isn't really anything dangerous for Google in the
| paper, then I would say that perhaps the argument is incorrect as
| to the reasons Google wouldn't let them publish.
| joshuamorton wrote:
| > My guess is that they don't make standards up on the fly,
| and no unique standards are applied to Timnit and not to
| literally everyone else.
|
| It is not disputed by Google that special standards (an
| additional, non-standard review process) were applied to
| this paper. There is some lack of clarity about how often
| that additional review process is applied
| (https://artificialintelligence-news.com/2020/12/24/google-
| te..., https://www.reuters.com/article/us-alphabet-google-
| research-...). But broadly it seems not to have much to do with
| the technical merit of the papers, despite what Google originally
| claimed, and instead to be a legal/PR process.
| cjbenedikt wrote:
| ...or you can argue exactly the opposite along the same
| lines...
| oh_sigh wrote:
| What? If someone says a paper is too dangerous for
| Google, but then I read the paper and there is nothing
| dangerous at all for Google - that is evidence that there
| is in fact something too dangerous for Google in the
| paper?
| joshuamorton wrote:
| I think the argument is more that Google won't even allow
| milquetoast criticism of LLMs, which falls perfectly in line
| with the events.
| majormajor wrote:
| If Google had a big problem with something you find mild,
| that could easily make you believe _Google_ is making
| errors in judgment.
| nerdponx wrote:
| What's the backstory here?
| joshuamorton wrote:
| Editorializing as little as possible:
|
| This paper was originally going to be coauthored by Timnit
| Gebru, Margaret Mitchell, Emily Bender, and a few other
| collaborators from Google and UW.
|
| The paper, at a high level, offers criticisms of large
| language models (including BERT, a Google model). In
| ~October/November, the paper went through the normal Google
| paper review process and was approved to be published
| externally (i.e. submitted to a conference). Later, Gebru
| was informed that the paper was not fit for publication due
| to some additional review, and needed to be unsubmitted
| from the conference, or the Google coauthors needed to remove
| their names. Initially, this was provided with no
| context or reason.
|
| Upon pushing back, Gebru was given some information on why
| the paper was unfit for publication. Publicly, what we know
| is that Google's reasoning here was that the paper did not
| cite relevant work and was not up to Google's publication
| standards (of note here, the paper cites nearly 200 other
| works, which is huge for a CS paper, and it later passed
| peer review at the conference, so this claim seems
| dubious).
|
| Gebru complained that this feedback was essentially trying
| to bury the paper, especially given that she was not given
| the opportunity to address or incorporate the feedback,
| only drop the paper. She sent two emails, one to her
| management stating that this kind of process was not
| conducive to research and stating that she would consider
| resigning if things didn't change. She also sent an email
| complaining about the process to a mailing list about
| diversity and inclusion work, noting that DE&I work was
| going to continue to be a waste of time without executive
| buy-in.
|
| Google "accepted Gebru's resignation", noting that the
| second email she sent was unprofessional. Under the
| relevant law, Gebru didn't resign and was fired by Google.
| Google has since partially walked back their statements,
| and refers to the situation as her "departure", leaving it
| amusingly vague.
|
| The paper was published in January, after passing peer review,
| coauthored by "Shmargaret Shmitchell", among others. It's since
| come to light that some other papers have also gone through this
| additional review; the additional review process was formalized,
| it seems, only after Gebru's paper was submitted. The sensitive
| review process involves legal approval and requires authors to
| remove statements like "having concerns". [0]
|
| Margaret Mitchell, Gebru's co-lead, was also fired, for sharing
| confidential information. The investigation into her misconduct
| took over a month, and she was fired the same day as a
| reorganization of the AI ethics team was announced.
|
| [0]: https://www.reuters.com/article/us-alphabet-google-
| research-...
| 3940530 wrote:
| Gebru also publicly accused Jeff Dean of being complicit
| in "silenc[ing]" female researchers over a year before
| she resigned/was fired [1], and the "email complaining
| about the process to a mailing list about diversity and
| inclusion work" also included a brief reference to past
| legal tussles between herself and Google:
|
| > I'm always amazed at how people can continue to do
| thing after thing like this and then turn around and ask
| me for some sort of extra DEI work or input. This
| happened to me last year. I was in the middle of a
| potential lawsuit for which Kat Herller and I hired
| feminist lawyers who threatened to sue Google (which is
| when they backed off--before that Google lawyers were
| prepared to throw us under the bus and our leaders were
| following as instructed) and the next day I get some
| random "impact award." Pure gaslighting.
|
| From the outside, it looks like whatever relationship
| Gebru and Google leadership had was extremely strained
| well before this paper.
|
| [1] https://twitter.com/timnitgebru/status/11932384147425
| 48480?l...
|
| [2] https://www.platformer.news/p/the-withering-email-
| that-got-a...
| [deleted]
| tsimionescu wrote:
| Apart from the external dangers described (social,
| environmental), which I'm sure many will disagree with on
| multiple grounds, the article in general raises some very good
| points about the internal dangers these models pose to the field
| of NLP itself:
|
| > The problem is, if one side of the communication does not have
| meaning, then the comprehension of the implicit meaning is an
| illusion arising from our singular human understanding of
| language (independent of the model). Contrary to how it may seem
| when we observe its output, an LM is a system for haphazardly
| stitching together sequences of linguistic forms it has observed
| in its vast training data, according to probabilistic information
| about how they combine, but without any reference to meaning: a
| stochastic parrot.
|
| > However, from the perspective of work on language technology,
| it is far from clear that all of the effort being put into using
| large LMs to 'beat' tasks designed to test natural language
| understanding, and all of the effort to create new such tasks,
| once the existing ones have been bulldozed by the LMs, brings us
| any closer to long-term goals of general language understanding
| systems. If a large LM, endowed with hundreds of billions of
| parameters and trained on a very large dataset, can manipulate
| linguistic form well enough to cheat its way through tests meant
| to require language understanding, have we learned anything of
| value about how to build machine language understanding or have
| we been led down the garden path?
| blackbear_ wrote:
| Designing benchmarks and metrics to test for deep understanding
| is really hard! Even our education system is nowhere close to a
| good solution. It would probably warrant an entire research
| field, maybe using NLP systems to evaluate how good benchmarks
| are at evaluating understanding.
| opwieurposiu wrote:
| In a recent conversation about colors, my three-year-old
| explained to me that clear is to color as silent is to sound. I
| thought this showed a pretty good grasp of the relationship in
| meaning between these concepts. I have no idea if he came up with
| this "meaning" himself or if he learned it from some input corpus
| (likely the Netflix kids' show corpus).
|
| Later he related that because noises become quieter when they
| are far away, colors become more transparent when they are far
| away. This one I am pretty sure he came up with himself, as it
| is conflating "hard to see" with "transparent".
|
| My point is that it's not so easy to tell when humans are
| behaving as stochastic parrots, or when they produce language
| based on meaning. Indeed, this may be a distinction without a
| difference.
| monkeybutton wrote:
| The paper mentions "... similar to the ones used in GPT-2's
| training data, i.e. documents linked to from Reddit [25], plus
| Wikipedia and a collection of books". Does anyone know what
| collection of books they are talking about?
|
| I tried following the chain of references but ended up at a
| paywalled source. Is it based on Project Gutenberg? Also, does
| Google train their models on the contents of all the books they
| scanned for Google Books, or are they not allowed to because of
| copyright issues?
| wfhpw wrote:
| The "bookcorpus" is from this paper by Zhu, Kiro, et. al. [0].
| You can see the project web page here [1] which indicates you
| can crawl books from this site [2] to create your own. This
| repo seeks to replicate the original dataset [3]
|
| [0] https://www.cv-
| foundation.org/openaccess/content_iccv_2015/p...
|
| [1] https://yknzhu.wixsite.com/mbweb
|
| [2] https://www.smashwords.com/
|
| [3] https://github.com/sgraaf/Replicate-Toronto-BookCorpus
| nickvincent wrote:
| I'm curious about this as well!
|
| Search results sent me to this Twitter thread which suggests it
| was still a mystery in Aug 2020:
| https://twitter.com/vboykis/status/1290030614410702848?lang=...
|
| To speculate a bit: the GPT-3 paper (section 2.2) mentions using
| two datasets referred to as "books1" and "books2", which are 12B
| and 55B byte-pair-encoded tokens respectively.
|
| Project Gutenberg has 3B word tokens I believe, so it seems
| like it could be one of them, assuming the ratio of word tokens
| to byte-pair tokens is something like 3:12 to 3:55.
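|
| Spelling out that ratio with just the figures above (a rough
| back-of-the-envelope check, nothing more):
|
|   # implied byte-pair tokens per word if Gutenberg (~3B words)
|   # were "books1" or "books2"
|   gutenberg_words = 3e9
|   books1_bpe, books2_bpe = 12e9, 55e9
|   print(books1_bpe / gutenberg_words)  # 4.0 BPE tokens per word
|   print(books2_bpe / gutenberg_words)  # ~18.3 BPE tokens per word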
|
| Another likely candidate alongside Gutenberg is libgen,
| apparently, and it looks like there have been successful efforts
| to create a similar dataset called bookcorpus:
| https://github.com/soskek/bookcorpus/issues/27. The discussion on
| that GitHub issue suggests bookcorpus is very similar to
| "books2", which would make Gutenberg "books1"?
|
| This might be why the paper is intentionally vague about the
| books used?
| unbiasedml wrote:
| See also
|
| "The Slodderwetenschap (Sloppy Science) of Stochastic Parrots - A
| Plea for Science to NOT take the Route Advocated by Gebru and
| Bender" by Michael Lissack.
|
| https://arxiv.org/ftp/arxiv/papers/2101/2101.10098.pdf
|
| I found this a reasonable critique of the original, despite
| apparent TOS violations by Lissack leading to his Twitter account
| being locked.
| [deleted]
___________________________________________________________________
(page generated 2021-03-01 23:02 UTC)