arXiv:2106.06981 (cs) [Submitted on 13 Jun 2021]
https://arxiv.org/abs/2106.06981

Title: Thinking Like Transformers
Authors: Gail Weiss, Yoav Goldberg, Eran Yahav

Abstract: What is the computational model behind a Transformer? Where recurrent neural networks have direct parallels in finite state machines, allowing clear discussion and thought around architecture variants or trained models, Transformers have no such familiar parallel. In this paper we aim to change that, proposing a computational model for the transformer-encoder in the form of a programming language. We map the basic components of a transformer-encoder -- attention and feed-forward computation -- into simple primitives, around which we form a programming language: the Restricted Access Sequence Processing Language (RASP). We show how RASP can be used to program solutions to tasks that could conceivably be learned by a Transformer, and how a Transformer can be trained to mimic a RASP solution. In particular, we provide RASP programs for histograms, sorting, and Dyck-languages. We further use our model to relate their difficulty in terms of the number of required layers and attention heads: analyzing a RASP program implies a maximum number of heads and layers necessary to encode a task in a transformer. Finally, we see how insights gained from our abstraction might be used to explain phenomena seen in recent works.

Comments: ICML 2021
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as: arXiv:2106.06981 [cs.LG] (or arXiv:2106.06981v1 [cs.LG] for this version)

Submission history:
From: Gail Weiss
[v1] Sun, 13 Jun 2021 13:04:46 UTC (3,616 KB)
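To make the abstract's mapping concrete, below is a minimal Python sketch of two RASP-style primitives: select, which builds an attention-pattern-like selector from keys, queries, and a predicate, and aggregate, which averages the selected values, followed by the histogram task the abstract mentions. The function names follow the paper's primitives, but the simplified semantics here (hard boolean selection, plain averaging, a directly computed selector_width) are illustrative assumptions, not the paper's exact definitions or its reference interpreter.

```python
# A minimal, self-contained sketch of RASP-style primitives.
# Illustrative re-implementation only; semantics are simplified
# assumptions based on the abstract, not the official RASP tool.

from typing import Callable, List

Sequence = List  # an s-op value: one entry per input position


def select(keys: Sequence, queries: Sequence,
           predicate: Callable) -> List[List[bool]]:
    """Build a selection matrix: entry [q][k] is True when
    predicate(keys[k], queries[q]) holds (mirrors an attention pattern)."""
    return [[predicate(k, q) for k in keys] for q in queries]


def aggregate(selector: List[List[bool]], values: Sequence) -> Sequence:
    """Average the selected (numeric) values at each query position,
    mirroring the value-averaging step of an attention head."""
    out = []
    for row in selector:
        picked = [v for v, sel in zip(values, row) if sel]
        out.append(sum(picked) / len(picked) if picked else 0)
    return out


def selector_width(selector: List[List[bool]]) -> Sequence:
    """Number of selected positions per query position."""
    return [sum(row) for row in selector]


tokens = list("hello")
# Histogram: at each position, count positions holding the same token.
same_tok = select(tokens, tokens, lambda k, q: k == q)
print(selector_width(same_tok))  # [1, 1, 2, 2, 1]
```

In the paper, selector_width is itself derived from the core primitives; it is computed directly here only to keep the sketch short. Counting selected positions per row is exactly what a histogram over the input tokens needs.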