[HN Gopher] On the Dangers of Stochastic Parrots [pdf]
       ___________________________________________________________________
        
       On the Dangers of Stochastic Parrots [pdf]
        
       Author : tmfi
       Score  : 43 points
       Date   : 2021-03-01 18:23 UTC (4 hours ago)
        
 (HTM) web link (faculty.washington.edu)
 (TXT) w3m dump (faculty.washington.edu)
        
       | superbcarrot wrote:
        | From the authors:
        | 
        |     Shmargaret Shmitchell
        |     shmargaret.shmitchell@gmail.com
        |     The Aether
       | 
       | Is this some meta joke or a reference to anything?
        
         | djoldman wrote:
         | https://twitter.com/mmitchell_ai
         | 
         | https://www.theguardian.com/technology/2021/feb/19/google-fi...
        
       | bryanrasmussen wrote:
        | Reading it, I thought: if language models can be too big, that
        | could be a problem for Google, given that at least one of their
        | major competitive advantages is having the biggest language
        | models there are.
        | 
        | Although I don't really know if that's so (about the competitive
        | advantage), it certainly seems like something Google might
        | believe, judging from what I remember of earlier Google
        | arguments about automated translation.
        
       | peachfuzz wrote:
       | I don't get it. The paper reads like 10 pages of opinion and
       | casting aspersions on language models. No math. No graphs.
        
         | tsimionescu wrote:
          | You don't need math to explain why, for example, a statistical
          | model trained to find the plausible combinations of words or
          | phrases in its corpus that match a prompt is not a promising
          | approach to NLP and general understanding, and why using it to
          | win at tasks designed to assess NLP success essentially
          | amounts to cheating. You also don't need math to explain that,
          | in very real terms, the meaning of any phrase produced by
          | GPT-3 lies in the mind of the reader and not in GPT-3 itself,
          | which doesn't see any meaning in the phrases it produces
          | (unlike, say, the output of a rule-based system, which, while
          | much more rudimentary, generally has reasoning about the topic
          | at hand behind it, rather than word probabilities).
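          | 
          | To make the "word probabilities" point concrete, here is a toy
          | sketch (purely illustrative, not any real LM: just a tiny
          | bigram sampler) of what stitching sequences together according
          | to probabilistic information looks like. The model only ever
          | touches co-occurrence counts, never meaning:
          | 
          |     import random
          |     from collections import defaultdict
          | 
          |     # Count how often each word follows another in a tiny corpus.
          |     corpus = ("the cat sat on the mat . "
          |               "the dog sat on the rug .").split()
          |     counts = defaultdict(lambda: defaultdict(int))
          |     for prev, nxt in zip(corpus, corpus[1:]):
          |         counts[prev][nxt] += 1
          | 
          |     def next_word(prev):
          |         # Sample the next word in proportion to how often it
          |         # followed `prev`; no reference to meaning at all.
          |         followers = counts[prev]
          |         words = list(followers)
          |         weights = [followers[w] for w in words]
          |         return random.choices(words, weights)[0]
          | 
          |     word, output = "the", ["the"]
          |     for _ in range(8):
          |         word = next_word(word)
          |         output.append(word)
          |     print(" ".join(output))  # e.g. "the cat sat on the rug . the dog"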
        
           | blackbear_ wrote:
           | > to explain why, for example, [...] is not a promising
           | approach to NLP and general understanding
           | 
            | I am not aware of any definitive proof, or at least a
            | convincing argument, for why this is _not_ possible, though.
            | Timnit's paper takes that as a given and illustrates the
            | consequent dangers and high-level workarounds.
        
             | tsimionescu wrote:
             | I didn't say that it's not possible, just that it's not
             | likely. It's clearly not how the human mind works, and
             | since that is the only computer we know for sure can do
             | NLP, any approach that is so obviously alien to it is
             | unlikely to be a good approach.
             | 
              | In case this is not clear: the human mind obviously doesn't
              | do stochastic prediction on phrases. It has a model of the
              | world (based on agents interacting with objects through
              | mechanical laws) and it produces or interprets speech by
              | assessing its interaction with this model of the world.
              | When deciding whether to produce phrase A or phrase B, it
              | doesn't assess the likelihood of the phrase in the corpus
              | of phrases it has seen or heard before, but its
              | appropriateness to the situation. Most prompts for language
              | use are not language at all, but come from the world itself
              | [0], something which pure LMs can't even in principle
              | handle (though they could potentially be combined with
              | other kinds of models to achieve this).
              | 
              | [0] For example, 'seeing a fire in a crowded theater' is
              | the prompt for yelling 'Fire!'; the correct response/next
              | step on hearing a shout of 'Fire!' in a crowded theater is
              | not a phrase, it is a desperate attempt to run away.
        
               | blackbear_ wrote:
                | > In case this is not clear: the human mind obviously
                | doesn't do stochastic prediction on phrases.
               | 
               | Are you sure? Because this is exactly what you and I are
               | doing right now. A language model is a very appropriate
               | description for how we see each other: you produce some
               | text, I produce some more, and you respond based on that.
               | And I inject some stochasticity by re-typing this
               | paragraph five times trying to make it sound like natural
               | English and a cohesive text, a bit like beam search if
               | you are familiar with that.
               | 
               | What I am trying to say is that language models are a
               | good interface (in the programming sense) to describe
               | human interactions on the internet: text in, text out. So
               | while I agree that GPT-3 is not a realistic "human-like"
               | implementation of this interface, I don't see why _a
               | priori_ a neural network cannot eventually incorporate
               | world models with agents and so on and reach actual
               | understanding (whatever that means) of text.
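                | 
                | (For anyone not familiar with the beam search I mentioned
                | above, here is a minimal sketch in Python. The toy
                | "model" below is a made-up stand-in, not a real language
                | model; the point is only the mechanic of keeping the k
                | highest-scoring partial sequences at each step.)
                | 
                |     import heapq, math
                | 
                |     def beam_search(expand, start, width=3, steps=5):
                |         # expand(seq) returns (next_token, log_prob) pairs.
                |         beams = [(0.0, [start])]  # (total log prob, tokens)
                |         for _ in range(steps):
                |             cands = []
                |             for score, seq in beams:
                |                 for tok, logp in expand(seq):
                |                     cands.append((score + logp, seq + [tok]))
                |             # Keep only the `width` best partial sequences.
                |             beams = heapq.nlargest(width, cands,
                |                                    key=lambda c: c[0])
                |         return beams
                | 
                |     # Toy stand-in "model": always proposes two tokens.
                |     def toy_expand(seq):
                |         return [("a", math.log(0.6)), ("b", math.log(0.4))]
                | 
                |     print(beam_search(toy_expand, "<s>", width=2, steps=3))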
        
       | wmf wrote:
       | If Google execs believe that AIs trained on the public Web are
       | the future of Google, this paper basically argues that those AIs,
       | and by extension Google's future, are unethical and probably
       | can't be fixed at any reasonable cost.
        
       | oh_sigh wrote:
       | This is the paper surrounding Timnit's "departure" from Google.
       | 
       | If you're on Timnit's side, "departure" means "firing", and the
       | paper is the reason she was fired.
       | 
        | If you're on Google's side, "departure" means "mutually-agreeable
        | resignation", prompted by Timnit's melodramatic and
        | unprofessional response to normal feedback.
       | 
       | Personally, I don't see anything in this paper that implicates
       | Google or would be reasonable for Google to try to suppress, so
       | I'm falling into the camp of trusting Google's side of the story.
       | But who knows?
        
         | joshuamorton wrote:
         | > Personally, I don't see anything in this paper that
         | implicates Google or would be reasonable for Google to try to
         | suppress
         | 
          | Can you explain why this leads you to support Google? Google
          | _still_ claims that the paper doesn't meet their publication
          | standards, despite, as you say, it containing nothing that
          | would be reasonable for Google to suppress.
         | 
         | > Timnit's melodramatic and unprofessional response to normal
         | feedback.
         | 
          | Keep in mind the feedback was that _you cannot publish this
          | paper_, and that isn't disputed by Google.
        
           | oh_sigh wrote:
            | As a general principle, when there is a "they-said she-said"
            | situation and certain facts about one side of the story
            | don't add up, from my perspective that increases the odds
            | that the other side is telling the truth.
            | 
            | I have no idea what Google's standards are for letting a
            | paper be published. My guess is that they don't make
            | standards up on the fly, and that no standards were applied
            | to Timnit that aren't applied to literally everyone else.
           | 
           | > Keep in mind the feedback was that you cannot publish this
           | paper, and that isn't disputed by Google.
           | 
           | Yes, and that in itself makes me more likely to think there
           | is some benign, standard reason for not allowing publication
           | rather than some nefarious motive about suppressing Timnit or
           | hiding their own guilt, or whatever reason Timnit has
           | provided as to why she thinks Google doesn't want the paper
           | published.
           | 
           | If the argument is "Google won't let me publish this paper
           | because it's too dangerous for them", and then I look and see
           | that there isn't really anything dangerous for Google in the
            | paper, then I would say that perhaps the argument is
            | incorrect as to the reasons Google wouldn't let them publish.
        
             | joshuamorton wrote:
              | > My guess is that they don't make standards up on the fly,
              | and that no standards were applied to Timnit that aren't
              | applied to literally everyone else.
             | 
             | It is not disputed by Google that special standards (an
             | additional, non-standard review process) were applied to
             | this paper. There is some lack of clarity about how often
             | that additional review process is applied
             | (https://artificialintelligence-news.com/2020/12/24/google-
             | te..., https://www.reuters.com/article/us-alphabet-google-
             | research-...). But broadly it seems to not have much to do
             | with the technical merit of the papers, despite what Google
             | originally claimed, and instead be a legal/PR process.
        
             | cjbenedikt wrote:
             | ...or you can argue exactly the opposite along the same
             | lines...
        
               | oh_sigh wrote:
               | What? If someone says a paper is too dangerous for
               | Google, but then I read the paper and there is nothing
               | dangerous at all for Google - that is evidence that there
               | is in fact something too dangerous for Google in the
               | paper?
        
               | joshuamorton wrote:
                | I think the argument is more that Google won't even
                | allow milquetoast criticism of LLMs, which falls
                | perfectly in line with the events.
        
               | majormajor wrote:
               | If Google had a big problem with something you find mild,
               | that could easily make you believe _Google_ is making
               | errors in judgment.
        
           | nerdponx wrote:
           | What's the backstory here?
        
             | joshuamorton wrote:
             | Editorializing as little as possible:
             | 
             | This paper was originally going to be coauthored by Timnit
             | Gebru, Margaret Mitchell, Emily Bender, and a few other
             | collaborators from Google and UW.
             | 
             | The paper, at a high level, offers criticisms of large
             | language models (including BERT, a Google model). In
             | ~October/November, the paper went through the normal Google
             | paper review process and was approved to be published
             | externally (i.e. submitted to a conference). Later, Gebru
             | was informed that the paper was not fit for publication due
              | to some additional review, and needed to be unsubmitted
              | from the conference, or the Google coauthors needed to
              | remove their names. Initially, this directive came with no
              | context or reason.
             | 
             | Upon pushing back, Gebru was given some information on why
             | the paper was unfit for publication. Publicly, what we know
             | is that Google's reasoning here was that the paper did not
             | cite relevant work and was not up to Google's publication
             | standards (of note here, the paper cites nearly 200 other
             | works, which is huge for a CS paper, and it later passed
             | peer review at the conference, so this claim seems
             | dubious).
             | 
             | Gebru complained that this feedback was essentially trying
              | to bury the paper, especially given that she was not given
              | the opportunity to address or incorporate the feedback,
              | only to drop the paper. She sent two emails: one to her
              | management stating that this kind of process was not
              | conducive to research and that she would consider
              | resigning if things didn't change. She also sent an email
             | complaining about the process to a mailing list about
             | diversity and inclusion work, noting that DE&I work was
             | going to continue to be a waste of time without executive
             | buy-in.
             | 
             | Google "accepted Gebru's resignation", noting that the
             | second email she sent was unprofessional. Under the
             | relevant law, Gebru didn't resign and was fired by Google.
             | Google has since partially walked back their statements,
             | and refers to the situation as her "departure", leaving it
             | amusingly vague.
             | 
              | The paper was published in January, after passing peer
              | review, coauthored by "Shmargaret Shmitchell", among
              | others. It's since come to light that some other papers
              | have also gone through this additional review; the
              | additional review process was formalized, it seems, only
              | after Gebru's paper was submitted. The sensitive review
              | process involves legal approval and requires authors to
              | remove statements like "having concerns". [0]
             | 
              | Margaret Mitchell, Gebru's co-lead, was also fired, for
              | sharing confidential information. The investigation into
              | her misconduct took over a month, and she was fired the
              | same day a reorganization of the AI ethics team was
              | announced.
             | 
             | [0]: https://www.reuters.com/article/us-alphabet-google-
             | research-...
        
               | 3940530 wrote:
               | Gebru also publicly accused Jeff Dean of being complicit
               | in "silenc[ing]" female researchers over a year before
               | she resigned/was fired [1], and the "email complaining
               | about the process to a mailing list about diversity and
               | inclusion work" also included a brief reference to past
               | legal tussles between herself and Google:
               | 
               | > I'm always amazed at how people can continue to do
               | thing after thing like this and then turn around and ask
               | me for some sort of extra DEI work or input. This
               | happened to me last year. I was in the middle of a
               | potential lawsuit for which Kat Herller and I hired
               | feminist lawyers who threatened to sue Google (which is
               | when they backed off--before that Google lawyers were
               | prepared to throw us under the bus and our leaders were
               | following as instructed) and the next day I get some
               | random "impact award." Pure gaslighting.
               | 
               | From the outside, it looks like whatever relationship
               | Gebru and Google leadership had was extremely strained
               | well before this paper.
               | 
               | [1] https://twitter.com/timnitgebru/status/11932384147425
               | 48480?l...
               | 
               | [2] https://www.platformer.news/p/the-withering-email-
               | that-got-a...
        
       | [deleted]
        
       | tsimionescu wrote:
       | Apart from the external dangers described (social,
       | environmental), which I'm sure many will disagree with on
       | multiple grounds, the article in general raises some very good
       | points about the internal dangers these models pose to the field
       | of NLP itself:
       | 
       | > The problem is, if one side of the communication does not have
       | meaning, then the comprehension of the implicit meaning is an
       | illusion arising from our singular human understanding of
       | language (independent of the model). Contrary to how it may seem
       | when we observe its output, an LM is a system for haphazardly
       | stitching together sequences of linguistic forms it has observed
       | in its vast training data, according to probabilistic information
       | about how they combine, but without any reference to meaning: a
       | stochastic parrot.
       | 
       | > However, from the perspective of work on language technology,
       | it is far from clear that all of the effort being put into using
       | large LMs to 'beat' tasks designed to test natural language
       | understanding, and all of the effort to create new such tasks,
       | once the existing ones have been bulldozed by the LMs, brings us
       | any closer to long-term goals of general language understanding
       | systems. If a large LM, endowed with hundreds of billions of
       | parameters and trained on a very large dataset, can manipulate
       | linguistic form well enough to cheat its way through tests meant
       | to require language understanding, have we learned anything of
       | value about how to build machine language understanding or have
       | we been led down the garden path?
        
         | blackbear_ wrote:
         | Designing benchmarks and metrics to test for deep understanding
         | is really hard! Even our education system is nowhere close to a
         | good solution. It would probably warrant an entire research
         | field, maybe using NLP systems to evaluate how good benchmarks
         | are at evaluating understanding.
        
         | opwieurposiu wrote:
          | In a recent conversation regarding colors, my three-year-old
          | explained to me that clear:color as silent:sound. I thought
          | this showed a pretty good grasp of the meaning between these
          | concepts. I have no idea if he came up with this "meaning"
          | himself or if he learned it from some input corpus (likely the
          | Netflix kids show corpus).
         | 
         | Later he related that because noises become quieter when they
         | are far away, colors become more transparent when they are far
         | away. This one I am pretty sure he came up with himself, as it
         | is conflating "hard to see" with "transparent".
         | 
          | My point is that it's not so easy to tell when humans are
          | behaving as stochastic parrots and when they produce language
          | based on meaning. Indeed, this may be a distinction without a
          | difference.
        
       | monkeybutton wrote:
       | The paper mentions "... similar to the ones used in GPT-2's
       | training data, i.e. documents linked to from Reddit [25], plus
       | Wikipedia and a collection of books". Does anyone know what
       | collection of books they are talking about?
       | 
        | I tried following the chain of references but ended up at a pay-
        | walled source. Is it based on Project Gutenberg? Also, does
        | Google train their models on the contents of all the books they
        | scanned for Google Books, or are they not allowed to because of
        | copyright issues?
        
         | wfhpw wrote:
         | The "bookcorpus" is from this paper by Zhu, Kiro, et. al. [0].
         | You can see the project web page here [1] which indicates you
         | can crawl books from this site [2] to create your own. This
         | repo seeks to replicate the original dataset [3]
         | 
         | [0] https://www.cv-
         | foundation.org/openaccess/content_iccv_2015/p...
         | 
         | [1] https://yknzhu.wixsite.com/mbweb
         | 
         | [2] https://www.smashwords.com/
         | 
         | [3] https://github.com/sgraaf/Replicate-Toronto-BookCorpus
        
         | nickvincent wrote:
         | I'm curious about this as well!
         | 
         | Search results sent me to this Twitter thread which suggests it
         | was still a mystery in Aug 2020:
         | https://twitter.com/vboykis/status/1290030614410702848?lang=...
         | 
          | To speculate a bit: the GPT-3 paper (section 2.2) mentions
          | using two datasets referred to as "books1" and "books2", which
          | are 12B and 55B byte-pair-encoded tokens, respectively.
         | 
          | Project Gutenberg has around 3B word tokens, I believe, so it
          | seems like it could be one of them, assuming the ratio of word
          | tokens to byte-pair tokens is somewhere between 3:12 and 3:55.
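          | 
          | As a quick back-of-the-envelope check on that ratio (using the
          | rough 3B word figure above):
          | 
          |     # Implied BPE tokens per word if Project Gutenberg
          |     # (~3B word tokens) were "books1" or "books2".
          |     gutenberg_words = 3e9
          |     books1_bpe, books2_bpe = 12e9, 55e9
          |     print(books1_bpe / gutenberg_words)  # 4.0 BPE tokens per word
          |     print(books2_bpe / gutenberg_words)  # ~18.3 BPE tokens per word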
         | 
          | Another likely candidate alongside Gutenberg is libgen,
          | apparently, and it looks like there have been successful
          | efforts to create a similar dataset called bookcorpus:
          | https://github.com/soskek/bookcorpus/issues/27. The discussion
          | on that GitHub issue suggests bookcorpus is very similar to
          | "books2", which would make Gutenberg "books1"?
         | 
         | This might be why the paper is intentionally vague about the
         | books used?
        
       | unbiasedml wrote:
       | See also
       | 
       | "The Slodderwetenschap (Sloppy Science) of Stochastic Parrots - A
       | Plea for Science to NOT take the Route Advocated by Gebru and
       | Bender" by Michael Lissack.
       | 
       | https://arxiv.org/ftp/arxiv/papers/2101/2101.10098.pdf
       | 
       | I found this a reasonable critique of the original, despite
       | apparent TOS violations by Lissack leading to his Twitter account
       | being locked.
        
         | [deleted]
        
       ___________________________________________________________________
       (page generated 2021-03-01 23:02 UTC)