https://www.cs.utexas.edu/~gdurrett/courses/online-course/materials.html

CS388: Natural Language Processing (online MS version)

These are the course materials for an online master's course in NLP. All lectures are available as videos on YouTube.

Note on enrollment for on-campus students: This course is listed in the course catalog as "Natural Language Processing-WB". It is a partially asynchronous course taught for certain online master's programs at UT ("Option III" programs, as the university calls them). If you are a student enrolled on-campus at UT Austin, you are not eligible to take this course; this is a hard university requirement because the course is part of an Option III program. There is an on-campus version of CS388, typically taught once per year by me, Eunsol Choi, or Ray Mooney, which you are eligible to take (or CS371N if you're an undergraduate). Regardless, you are free to consult the materials here!

Assignments

Assignment 1: Linear Sentiment Classification [code and dataset download] [see edX for code walkthrough and debugging tips]
Assignment 2: Feedforward Neural Networks, Word Embeddings, and Generalization [code and dataset download] [see edX for code walkthrough and debugging tips]
Assignment 3: Transformer Language Modeling [code and dataset download] [see edX for code walkthrough and debugging tips]
Assignment 4: Factuality and ChatGPT [code and dataset download]
Final Project: Dataset Artifacts [code and dataset download] [example 1] [example 2] [peer assessment instructions]

Lecture Videos and Readings

YouTube playlist containing all videos
Download the slides and handwritten notes here (88MB tgz)

Each week below lists the lecture topics and videos together with their associated readings.

Week 1: Intro and Linear Classification

- Course Preview
- Introduction. Note: this introduction video is from an older run of the class and references an outdated schedule. Please refer to the new course structure here.
- Linear Binary Classification. Readings: Eisenstein 2.0-2.5, 4.2-4.4.1; Perceptron and logistic regression (lecture note).
- Sentiment Analysis and Basic Feature Extraction. Readings: Eisenstein 4.1.
- Basics of Learning, Gradient Descent
- Perceptron (see the sketch after this week's list)
- Perceptron as Minimizing Loss
- Logistic Regression. Readings: Perceptron and LR connections.
- Sentiment Analysis. Readings: Thumbs up? Sentiment Classification using Machine Learning Techniques (Bo Pang et al., 2002); Baselines and Bigrams: Simple, Good Sentiment and Topic Classification (Sida Wang and Christopher Manning, 2012); Convolutional Neural Networks for Sentence Classification (Yoon Kim, 2014); [GitHub] NLP Progress on Sentiment Analysis.
- Optimization Basics
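As a companion to the Week 1 videos, here is a minimal sketch of the binary perceptron update; the toy bag-of-words features and the y in {-1, +1} label convention are illustrative assumptions, not the assignment code.

```python
import numpy as np

def train_perceptron(X, y, epochs=10):
    """Binary perceptron: y in {-1, +1}, X is (n_examples, n_features)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            if y_i * np.dot(w, x_i) <= 0:  # misclassified (or on the boundary)
                w += y_i * x_i             # nudge weights toward the correct label
    return w

# Toy bag-of-words features: counts of ["good", "bad"]
X = np.array([[2.0, 0.0], [0.0, 3.0], [1.0, 0.0], [0.0, 1.0]])
y = np.array([1, -1, 1, -1])
w = train_perceptron(X, y)
print(w, np.sign(X @ w))  # learned weights and training-set predictions
```

Note that the update only fires on misclassified examples, which is the sense in which the perceptron can be viewed as minimizing a loss (the "Perceptron as Minimizing Loss" video).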
Week 2: Multiclass and Neural Classification

- Multiclass Classification. Readings: Eisenstein 4.2; Multiclass lecture note.
- Multiclass Perceptron and Logistic Regression
- Multiclass Classification Examples. Readings: A large annotated corpus for learning natural language inference (Sam Bowman et al., 2015); Authorship Attribution of Micro-Messages (Roy Schwartz et al., 2013).
- Fairness in Classification. Readings: 50 Years of Test (Un)fairness: Lessons for Machine Learning (Ben Hutchinson and Margaret Mitchell, 2018); [Article] Amazon scraps secret AI recruiting tool that showed bias against women.
- Neural Networks
- Neural Network Visualization. Readings: [Blog] Neural Networks, Manifolds, and Topology (Chris Olah).
- Feedforward Neural Networks, Backpropagation. Readings: Eisenstein 3.1-3.3.
- Neural Net Implementation (see the sketch after this week's list)
- Neural Net Training, Optimization. Readings: Dropout: A Simple Way to Prevent Neural Networks from Overfitting (Nitish Srivastava et al., 2014); Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (Sergey Ioffe and Christian Szegedy, 2015); Adam: A Method for Stochastic Optimization (Durk Kingma and Jimmy Ba, 2015); The Marginal Value of Adaptive Gradient Methods in Machine Learning (Ashia Wilson et al., 2017).
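For the neural net implementation videos, the sketch below shows the generic shape of a one-hidden-layer classifier in PyTorch; the dimensions, optimizer settings, and fake batch are made-up assumptions rather than the course's code.

```python
import torch
import torch.nn as nn

class FeedforwardClassifier(nn.Module):
    """One-hidden-layer network for multiclass classification over,
    e.g., averaged word embeddings (dimensions here are invented)."""
    def __init__(self, input_dim=300, hidden_dim=100, num_classes=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, x):
        return self.net(x)  # raw logits; pair with CrossEntropyLoss

model = FeedforwardClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 300)        # a fake batch of 32 sentence vectors
y = torch.randint(0, 3, (32,))  # fake gold labels
loss = loss_fn(model(x), y)
loss.backward()                 # backpropagation
optimizer.step()                # one Adam update
```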
Week 3: Word Embeddings

- Word Embeddings
- Skip-gram. Readings: Distributed Representations of Words and Phrases and their Compositionality (Tomas Mikolov et al., 2013). A sketch of the skip-gram objective follows this week's list.
- Other Word Embedding Methods. Readings: A Scalable Hierarchical Distributed Language Model (Andriy Mnih and Geoff Hinton, 2008); Neural Word Embedding as Implicit Matrix Factorization (Omer Levy and Yoav Goldberg, 2014); GloVe: Global Vectors for Word Representation (Jeffrey Pennington et al., 2014); Enriching Word Vectors with Subword Information (Piotr Bojanowski et al., 2016).
- Bias in Word Embeddings. Readings: Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings (Tolga Bolukbasi et al., 2016); Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings (Thomas Manzini et al., 2019); Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them (Hila Gonen and Yoav Goldberg, 2019).
- Applying Embeddings, Deep Averaging Networks. Readings: Deep Unordered Composition Rivals Syntactic Methods for Text Classification (Mohit Iyyer et al., 2015).
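To make the skip-gram material concrete, here is a minimal sketch of the skip-gram-with-negative-sampling loss for a single (center, context) pair, as in Mikolov et al. (2013); the random vectors and the choice of 5 negatives are illustrative assumptions.

```python
import numpy as np

def sgns_loss(v_center, u_pos, u_negs):
    """Skip-gram with negative sampling, one (center, context) pair:
    minimize -log sigma(u_pos . v) - sum_k log sigma(-u_neg_k . v)."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    loss = -np.log(sigmoid(u_pos @ v_center))
    for u_neg in u_negs:
        loss -= np.log(sigmoid(-u_neg @ v_center))
    return loss

rng = np.random.default_rng(0)
d = 50
v = rng.normal(size=d)            # center-word vector
u_pos = rng.normal(size=d)        # observed context-word vector
u_negs = rng.normal(size=(5, d))  # 5 sampled negative context vectors
print(sgns_loss(v, u_pos, u_negs))
```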
Week 4: Language Modeling and Self-Attention

- n-gram LMs. Readings: Eisenstein 6.1.
- Smoothing in n-gram LMs. Readings: Eisenstein 6.2.
- LM Evaluation. Readings: Eisenstein 6.4.
- Neural Language Models
- RNNs and their Shortcomings. Readings: Eisenstein 6.3; [Blog] Understanding LSTMs (Chris Olah).
- Attention. Readings: Neural Machine Translation by Jointly Learning to Align and Translate (Dzmitry Bahdanau et al., 2015).
- Self-Attention. Readings: Attention Is All You Need (Ashish Vaswani et al., 2017). A scaled dot-product sketch follows this week's list.
- Multi-Head Self-Attention. Readings: Attention Is All You Need (Ashish Vaswani et al., 2017); [Blog] The Illustrated Transformer (Jay Alammar).
- Position Encodings. Readings: Attention Is All You Need (Ashish Vaswani et al., 2017); Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation (Ofir Press et al., 2021); The Impact of Positional Encoding on Length Generalization in Transformers (Amirhossein Kazemnejad et al., 2023).
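The following is a minimal single-head version of the scaled dot-product self-attention from Vaswani et al. (2017); the dimensions and random projection matrices are illustrative, and masking and multiple heads are omitted.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.
    X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k) projections."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])             # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)           # row-wise softmax
    return weights @ V                                  # weighted sum of values

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 16))                            # 6 tokens, d_model=16
Wq, Wk, Wv = [rng.normal(size=(16, 8)) for _ in range(3)]
print(self_attention(X, Wq, Wk, Wv).shape)              # (6, 8)
```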
Week 5: Transformers and Decoding

- Transformer Architecture. Readings: Attention Is All You Need (Ashish Vaswani et al., 2017).
- Using Transformers
- Transformer Language Modeling
- Transformer Extensions. Readings: Scaling Laws for Neural Language Models (Jared Kaplan et al., 2020); Efficient Transformers: A Survey (Yi Tay et al., 2020); Rethinking Attention with Performers (Krzysztof Choromanski et al., 2021); Longformer: The Long-Document Transformer (Iz Beltagy et al., 2020).
- Beam Search
- Nucleus Sampling. Readings: The Curious Case of Neural Text Degeneration (Ari Holtzman et al., 2019). A minimal top-p sampling sketch follows this week's list.
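Here is a minimal sketch of nucleus (top-p) sampling from Holtzman et al. (2019); the toy five-token distribution and the p values are illustrative assumptions.

```python
import numpy as np

def nucleus_sample(probs, p=0.9, rng=None):
    """Top-p (nucleus) sampling: sample only from the smallest set of
    tokens whose cumulative probability reaches p, renormalized."""
    rng = rng or np.random.default_rng()
    order = np.argsort(probs)[::-1]                    # token ids, most probable first
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, p)) + 1   # size of the nucleus
    nucleus = order[:cutoff]
    renormed = probs[nucleus] / probs[nucleus].sum()
    return int(rng.choice(nucleus, p=renormed))

vocab_probs = np.array([0.45, 0.25, 0.15, 0.10, 0.05])  # toy next-token distribution
print(nucleus_sample(vocab_probs, p=0.8))               # only tokens 0-2 can be drawn
```

With p=0.8 the cumulative mass [0.45, 0.70, 0.85, ...] means the nucleus is the top three tokens, so the long tail of unlikely tokens can never be sampled.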
Week 6: Pre-training, seq2seq LMs

- BERT: Masked Language Modeling. Readings: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Jacob Devlin et al., 2019).
- BERT: Model and Applications. Readings: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Jacob Devlin et al., 2019); To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks (Matthew Peters et al., 2019); GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding (Alex Wang et al., 2019); What Does BERT Look At? An Analysis of BERT's Attention (Kevin Clark et al., 2019); RoBERTa: A Robustly Optimized BERT Pretraining Approach (Yinhan Liu et al., 2019).
- Seq2seq Models
- BART. Readings: BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (Mike Lewis et al., 2019).
- T5. Readings: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (Colin Raffel et al., 2020); UnifiedQA: Crossing Format Boundaries With a Single QA System (Daniel Khashabi et al., 2020).
- Word Piece and Byte Pair Encoding. Readings: Neural Machine Translation of Rare Words with Subword Units (Rico Sennrich et al., 2016); Byte Pair Encoding is Suboptimal for Language Model Pretraining (Kaj Bostrom and Greg Durrett, 2020). A toy BPE merge loop follows this week's list.
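The following is a toy version of the BPE merge-learning loop from Sennrich et al. (2016), using the naive string-replace formulation; the word-frequency dictionary is the classic illustrative example, not course data, and real tokenizers handle ties, symbol boundaries, and bytes more carefully.

```python
from collections import Counter

def bpe_merges(words, num_merges=10):
    """Learn BPE merges: repeatedly merge the most frequent adjacent
    symbol pair. words maps a space-separated symbol sequence to its
    corpus count."""
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in words.items():
            symbols = word.split()
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merges.append((a, b))
        # Naive merge by string replacement (fine for this toy example)
        words = {w.replace(f"{a} {b}", f"{a}{b}"): f for w, f in words.items()}
    return merges

vocab = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6, "w i d e s t": 3}
print(bpe_merges(vocab, num_merges=4))  # first merges: ('e','s'), ('es','t'), ...
```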
Week 7-8: Structured Prediction: Part-of-speech, Syntactic Parsing

Note: this unit was previously presented as Week 4, right after classification, and a few videos refer to it as our first brush with structured models. In the current ordering, it is still our first exposure to models of linguistic structure, as opposed to the surface-level sequential structure (i.e., token sequences) seen in generation.

- Part-of-Speech Tagging. Readings: Eisenstein 8.1.
- Sequence Labeling, Tagging with Classifiers. Readings: Eisenstein 7.1.
- Hidden Markov Models. Readings: Eisenstein 7.4.
- HMMs: Parameter Estimation. Readings: Eisenstein 7.4.1.
- HMMs: Viterbi Algorithm. Readings: Eisenstein 7.3. A compact Viterbi implementation follows this week's list.
- HMMs for POS Tagging. Readings: TnT - A Statistical Part-of-Speech Tagger (Thorsten Brants, 2000); Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger (Kristina Toutanova and Christopher Manning, 2000); Part-of-Speech Tagging from 97% to 100%: Is It Time for Some Linguistics? (Christopher Manning, 2011); Natural Language Processing with Small Feed-Forward Networks (Jan Botha et al., 2017).
- Constituency Parsing. Readings: Eisenstein 10.1-10.2.
- Probabilistic Context-Free Grammars. Readings: Eisenstein 10.3-10.4.
- CKY Algorithm. Readings: Eisenstein 10.3.1.
- Refining Grammars. Readings: Accurate Unlexicalized Parsing (Dan Klein and Chris Manning, 2003); Eisenstein 10.5.
- Dependencies. Readings: Eisenstein 11.1; Finding Optimal 1-Endpoint-Crossing Trees (Emily Pitler et al., 2013).
- Transition-based Dependency Parsing. Readings: Eisenstein 11.3.
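Here is a compact log-space Viterbi decoder for an HMM, as covered in the Viterbi videos; the two-tag toy parameters at the bottom are made up for illustration.

```python
import numpy as np

def viterbi(obs, log_pi, log_A, log_B):
    """Viterbi decoding for an HMM in log space.
    obs: observation indices; log_pi: (T,) initial log-probs;
    log_A: (T, T) transition log-probs; log_B: (T, V) emission log-probs."""
    n, T = len(obs), len(log_pi)
    score = np.full((n, T), -np.inf)   # best log-prob of any path ending in tag t
    back = np.zeros((n, T), dtype=int) # backpointers
    score[0] = log_pi + log_B[:, obs[0]]
    for i in range(1, n):
        for t in range(T):
            cand = score[i - 1] + log_A[:, t]
            back[i, t] = np.argmax(cand)
            score[i, t] = cand[back[i, t]] + log_B[t, obs[i]]
    path = [int(np.argmax(score[-1]))] # follow backpointers from best final tag
    for i in range(n - 1, 0, -1):
        path.append(int(back[i, path[-1]]))
    return path[::-1]

# Toy 2-tag, 2-word-vocabulary HMM (probabilities are made up)
log_pi = np.log([0.6, 0.4])
log_A = np.log([[0.7, 0.3], [0.4, 0.6]])
log_B = np.log([[0.9, 0.1], [0.2, 0.8]])
print(viterbi([0, 1, 1], log_pi, log_A, log_B))
```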
Week 9: Modern Large Language Models

- GPT-3. Readings: Language Models are Unsupervised Multitask Learners (Alec Radford et al., 2019); Language Models are Few-Shot Learners (Tom B. Brown et al., 2020); Llama 2: Open Foundation and Fine-Tuned Chat Models (Hugo Touvron et al., 2023). Note: Llama 2 is one of the latest models with publicly available weights (although it is not fully open-source, as many details of the training are not public).
- Zero-shot Prompting. Readings: Demystifying Prompts in Language Models via Perplexity Estimation (Hila Gonen et al., 2022).
- Few-shot Prompting. Readings: Calibrate Before Use: Improving Few-Shot Performance of Language Models (Tony Z. Zhao et al., 2021); Holistic Evaluation of Language Models (Percy Liang et al., 2022); Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? (Sewon Min et al., 2022). A sketch of assembling a few-shot prompt follows this week's list.
- Understanding ICL: Induction Heads. Readings: In-context Learning and Induction Heads (Catherine Olsson et al., 2022).
- Instruction Tuning. Readings: Multitask Prompted Training Enables Zero-Shot Task Generalization (Victor Sanh et al., 2021); Scaling Instruction-Finetuned Language Models (Hyung Won Chung et al., 2022).
- Reinforcement Learning from Human Feedback (RLHF). Readings: Training language models to follow instructions with human feedback (Long Ouyang et al., 2022); [Website] Stanford Alpaca: An Instruction-following LLaMA Model (Rohan Taori et al., 2023).
- Factuality of LLMs. Readings: Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation (Yixin Liu et al., 2023); WiCE: Real-World Entailment for Claims in Wikipedia (Ryo Kamoi et al., 2023); SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization (Philippe Laban et al., 2022); FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation (Sewon Min et al., 2023); RARR: Researching and Revising What Language Models Say, Using Language Models (Luyu Gao et al., 2022).
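To illustrate the mechanics of in-context learning, here is a sketch of assembling a few-shot prompt from labeled demonstrations; the instruction wording, label names, and separators are entirely made-up template choices (and, as the Zhao et al. and Min et al. readings show, such choices matter a lot).

```python
def build_few_shot_prompt(demos, query,
                          instruction="Classify the sentiment as Positive or Negative."):
    """Assemble an in-context learning prompt from (text, label) demos.
    The model is expected to continue after the final 'Sentiment:'."""
    lines = [instruction, ""]
    for text, label in demos:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")  # the LM completes from here
    return "\n".join(lines)

demos = [("A delightful, moving film.", "Positive"),
         ("Two hours I will never get back.", "Negative")]
print(build_few_shot_prompt(demos, "Sharp writing and great pacing."))
```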
Week 10: Explanations

- Explainability in NLP. Readings: The Mythos of Model Interpretability (Zach Lipton, 2016); Deep Unordered Composition Rivals Syntactic Methods for Text Classification (Mohit Iyyer et al., 2015); Analysis Methods in Neural Language Processing: A Survey (Yonatan Belinkov and Jim Glass, 2019).
- Local Explanations: Highlights. Readings: "Why Should I Trust You?" Explaining the Predictions of Any Classifier (Marco Tulio Ribeiro et al., 2016); Axiomatic Attribution for Deep Networks (Mukund Sundararajan et al., 2017). A leave-one-out highlighting sketch follows this week's list.
- Model Probing. Readings: BERT Rediscovers the Classical NLP Pipeline (Ian Tenney et al., 2019); What Do You Learn From Context? Probing For Sentence Structure In Contextualized Word Representations (Ian Tenney et al., 2019).
- Annotation Artifacts. Readings: Annotation Artifacts in Natural Language Inference Data (Suchin Gururangan et al., 2018); Hypothesis Only Baselines in Natural Language Inference (Adam Poliak et al., 2018); Did the Model Understand the Question? (Pramod Kaushik Mudrakarta et al., 2018); Swag: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference (Rowan Zellers et al., 2018).
- Text Explanations. Readings: Generating Visual Explanations (Lisa Anne Hendricks et al., 2016); e-SNLI: Natural Language Inference with Natural Language Explanations (Oana-Maria Camburu et al., 2018); Explaining Question Answering Models through Text Generation (Veronica Latcinnik and Jonathan Berant, 2020).
- Chain-of-thought. Readings: Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems (Wang Ling et al., 2017); Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Jason Wei et al., 2022); The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning (Xi Ye and Greg Durrett, 2022); Large Language Models are Zero-Shot Reasoners (Takeshi Kojima et al., 2022).
- Chain-of-thought: Extensions and Analysis. Readings: Complementary Explanations for Effective In-Context Learning (Xi Ye et al., 2023); PAL: Program-aided Language Models (Luyu Gao et al., 2022); Measuring and Narrowing the Compositionality Gap in Language Models (Ofir Press et al., 2022).
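As a simple concrete instance of highlight-style local explanations, here is a leave-one-out token-importance sketch; the black-box scorer interface and the toy "model" are assumptions for illustration, and LIME and integrated gradients from the readings are the more principled relatives of this idea.

```python
def leave_one_out_importance(tokens, predict_proba):
    """Score each token by how much the model's positive-class probability
    drops when that token is deleted. predict_proba is any black-box
    function from a token list to a probability."""
    base = predict_proba(tokens)
    return [(tok, base - predict_proba(tokens[:i] + tokens[i + 1:]))
            for i, tok in enumerate(tokens)]

# Toy "model": probability rises with each occurrence of "good"
toy_model = lambda toks: min(1.0, 0.2 + 0.4 * toks.count("good"))
print(leave_one_out_importance(["a", "good", "movie"], toy_model))
# -> [('a', 0.0), ('good', ~0.4), ('movie', 0.0)]
```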
Week 11: Question Answering, Dialogue Systems

- Reading comprehension intro
- Reading comprehension: setup and baselines. Readings: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text (Matthew Richardson et al., 2013); SQuAD: 100,000+ Questions for Machine Comprehension of Text (Pranav Rajpurkar et al., 2016). A sketch of the standard token-level F1 metric follows this week's list.
- BERT for QA
- Problems with Reading Comprehension. Readings: Adversarial Examples for Evaluating Reading Comprehension Systems (Robin Jia and Percy Liang, 2017).
- Open-domain QA. Readings: Reading Wikipedia to Answer Open-Domain Questions (Danqi Chen et al., 2017); Latent Retrieval for Weakly Supervised Open Domain Question Answering (Kenton Lee et al., 2019); [Website] Natural Questions (Tom Kwiatkowski et al., 2019). Note: most modern open-domain QA systems are either "closed-book" models like ChatGPT or "open-book" models that do retrieval, similar to the Chen et al. and Lee et al. papers above. These are typically described under the general framework of retrieval-augmented generation; WebGPT (similar to the "new Bing" chatbot) is an example of how such systems work.
- Multi-hop QA. Readings: HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Zhilin Yang et al., 2018); Understanding Dataset Design Choices for Multi-hop Reasoning (Jifan Chen and Greg Durrett, 2019); Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering (Akari Asai et al., 2020). Note: modern QA systems operating over the web are largely multi-hop by default; multi-hop QA has been subsumed by open-domain QA to a large extent. For a more recent multi-hop QA dataset, see QAMPARI.
- Dialogue: Chatbots
- Task-Oriented Dialogue. Readings: Wizard of Wikipedia: Knowledge-Powered Conversational Agents (Emily Dinan et al., 2019); Task-Oriented Dialogue as Dataflow Synthesis (Semantic Machines, 2020).
- Neural Chatbots. Readings: A Neural Network Approach to Context-Sensitive Generation of Conversational Responses (Alessandro Sordoni et al., 2015); A Diversity-Promoting Objective Function for Neural Conversation Models (Jiwei Li et al., 2016); Recipes for building an open-domain chatbot (Stephen Roller et al., 2020). Note: an updated version of BlenderBot is described in Kurt Shuster et al. Other chatbots discussed, like character.ai, can be found online and you can play with them, but less information about their precise internals is available in published papers.
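SQuAD-style span QA is scored with exact match and token-overlap F1; below is a minimal sketch of the F1 computation. Real evaluation scripts also lowercase, strip punctuation and articles, and take a max over multiple gold answers, which this sketch skips.

```python
from collections import Counter

def token_f1(prediction, gold):
    """Token-overlap F1 between a predicted answer span and a gold span."""
    pred_toks, gold_toks = prediction.split(), gold.split()
    overlap = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

print(token_f1("the Eiffel Tower", "Eiffel Tower"))  # 0.8
```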
Week 12: Machine Translation, Summarization

- Machine Translation Intro. Readings: Eisenstein 18.1.
- MT: Framework and Evaluation. Readings: Eisenstein 18.1.
- MT: Word alignment
- MT: IBM Models. Readings: HMM-Based Word Alignment in Statistical Translation (Stephan Vogel et al., 1996).
- Phrase-based Machine Translation. Readings: Pharaoh: A Beam Search Decoder for Phrase-Based Statistical Machine Translation Models (Philipp Koehn, 2004); Minimum Error Rate Training in Statistical Machine Translation (Franz Och, 2003); Eisenstein 18.4.
- Neural and Pre-Trained Machine Translation. Readings: Revisiting Low-Resource Neural Machine Translation: A Case Study (Rico Sennrich and Biao Zhang, 2019); In Neural Machine Translation, What Does Transfer Learning Transfer? (Alham Fikri Aji et al., 2020); Multilingual Denoising Pre-training for Neural Machine Translation (Yinhan Liu et al., 2020); Large Language Models Are State-of-the-Art Evaluators of Translation Quality (Tom Kocmi and Christian Federmann, 2023).
- Summarization Intro
- Extractive Summarization. Readings: The use of MMR, diversity-based reranking for reordering documents and producing summaries (Jaime Carbonell and Jade Goldstein, 1998); LexRank: Graph-based Lexical Centrality as Salience in Text Summarization (Gunes Erkan and Dragomir Radev, 2004); A Scalable Global Model for Summarization (Dan Gillick and Benoit Favre, 2009); Revisiting the Centroid-based Method: A Strong Baseline for Multi-Document Summarization (Demian Gholipour Ghalandari, 2017). An MMR sketch follows this week's list.
- Pre-trained Summarization and Factuality. Readings: BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (Mike Lewis et al., 2019); PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization (Jingqing Zhang et al., 2020); Evaluating Factuality in Generation with Dependency-level Entailment (Tanya Goyal and Greg Durrett, 2020); Asking and Answering Questions to Evaluate the Factual Consistency of Summaries (Alex Wang et al., 2020). Note: while the specific fine-tuned modeling approaches and factuality detection systems are no longer state-of-the-art as stated in the video, they are representative of ideas from pre-training that are still used today. For discussion of how LLMs relate to summarization, see News Summarization and Evaluation in the Era of GPT-3 by Tanya Goyal, Junyi Jessy Li, and Greg Durrett.
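For extractive summarization, here is a minimal sketch of Maximal Marginal Relevance (Carbonell and Goldstein, 1998), which greedily picks sentences that are relevant to a query but not redundant with those already chosen; the random sentence vectors, cosine similarity, and lambda=0.7 are illustrative assumptions (TF-IDF vectors would be a typical choice).

```python
import numpy as np

def mmr_select(sent_vecs, query_vec, k=3, lam=0.7):
    """Greedy MMR selection: score = lam * relevance - (1 - lam) * redundancy.
    Returns indices of chosen sentences, in selection order."""
    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
    selected, remaining = [], list(range(len(sent_vecs)))
    while remaining and len(selected) < k:
        def mmr_score(i):
            relevance = cos(sent_vecs[i], query_vec)
            redundancy = max((cos(sent_vecs[i], sent_vecs[j]) for j in selected),
                             default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(0)
sents = rng.normal(size=(5, 20))        # 5 fake sentence vectors
print(mmr_select(sents, sents.mean(0))) # summarize against the centroid "query"
```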
Week 13-14: Multilinguality, Language Grounding, Ethical Issues

- Morphology
- Cross-lingual Tagging and Parsing. Readings: Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections (Dipanjan Das and Slav Petrov, 2011); Multi-Source Transfer of Delexicalized Dependency Parsers (Ryan McDonald et al., 2011).
- Cross-lingual Pre-training. Readings: Massively Multilingual Word Embeddings (Waleed Ammar et al., 2016); Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond (Mikel Artetxe and Holger Schwenk, 2019); How multilingual is Multilingual BERT? (Telmo Pires et al., 2019).
- Language Grounding. Readings: Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data (Emily Bender and Alexander Koller, 2020); Provable Limitations of Acquiring Meaning from Ungrounded Form: What Will Future Language Models Understand? (Will Merrill et al., 2021); Entailment Semantics Can Be Extracted from an Ideal Language Model (Will Merrill et al., 2022); Experience Grounds Language (Yonatan Bisk et al., 2020).
- Language and Vision. Readings: VQA: Visual Question Answering (Aishwarya Agrawal et al., 2015); Learning Transferable Visual Models From Natural Language Supervision (Alec Radford et al., 2021).
- Ethics: Bias. Readings: The Social Impact of Natural Language Processing (Dirk Hovy and Shannon Spruit, 2016); Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints (Jieyu Zhao et al., 2017).
- Ethics: Exclusion. Readings: GeoMLAMA: Geo-Diverse Commonsense Probing on Multilingual Pre-Trained Language Models (Da Yin et al., 2022); Visually Grounded Reasoning across Languages and Cultures (Fangyu Liu et al., 2021).
- Ethics: Dangers of Automation. Readings: On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? (Emily Bender, Timnit Gebru, Angelina McMillan-Major, Shmargaret Shmitchell, 2021); RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models (Samuel Gehman et al., 2020).
- Ethics: Unethical Use and Paths Forward. Readings: Datasheets for Datasets (Timnit Gebru et al., 2018); Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing (Deb Raji et al., 2020).