[HN Gopher] Tighter bounds on the expressivity of transformer en...
___________________________________________________________________
Tighter bounds on the expressivity of transformer encoders
Author : bmc7505
Score : 38 points
Date : 2023-04-25 17:02 UTC (5 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| sharemywin wrote:
| We connect and strengthen these results by identifying a variant
| of first-order logic with counting quantifiers that is
| simultaneously an upper bound for fixed-precision transformer
| encoders and a lower bound for transformer encoders.
|
  | Not sure what a fixed-precision transformer is?
| eutectic wrote:
| I think it means the analysis accounts for rounding error.
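  |
  | A minimal numpy sketch (illustrative, not from the paper) of
  | the kind of rounding a fixed-precision analysis has to account
  | for:
  |
  |     import numpy as np
  |
  |     # at 1.0 the spacing between fp16 values is about 1e-3, so
  |     # a sufficiently small update is absorbed by rounding
  |     x = np.float16(1.0)
  |     print(x + np.float16(2.0 ** -12) == x)    # True
  |
  |     # a running fp16 sum of ones stalls at 2048, because
  |     # 2048 + 1 rounds back to 2048
  |     s = np.float16(0.0)
  |     for _ in range(3000):
  |         s = s + np.float16(1.0)
  |     print(s)                                  # 2048.0, not 3000.0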
| ftxbro wrote:
  | Yes, it's this. They have identified an expressivity level
  | that they call FOC[+;MOD], which is intuitively meant to
  | capture the expressivity that transformers and
  | transformers-with-rounding have in common. More formally,
  | they are trying to pin down exactly how this level compares
  | to that of transformers, transformers-with-rounding,
  | simplified stateless counter machines, and uniform TC^0;
  | they have had partial success and are reporting their
  | results.
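  |
  | To give a flavor of what such a logic can say (illustrative
  | only, not an example from the paper): a counting quantifier
  | lets a sentence count the positions satisfying some condition,
  | and a modular predicate lets it test that count modulo a
  | constant. In Python terms, a property like "the number of a's
  | in the string is divisible by 3" is the sort of thing that
  | becomes expressible:
  |
  |     def holds(w: str) -> bool:
  |         # count the positions carrying an 'a' (a counting
  |         # quantifier over positions), then apply a modular
  |         # predicate to that count
  |         num_a = sum(1 for c in w if c == 'a')
  |         return num_a % 3 == 0
  |
  |     print(holds("abcaab"))  # True: three a's
  |     print(holds("aab"))     # False: two a's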
| bmc7505 wrote:
| See Merrill & Sabharwal (2023) [1]: A
  | transformer N is a specific kind of computation graph family
| over vectors of floating-point numbers with p precision. We
| work with fixed-precision transformers, i.e., we take p to be
| constant with respect to the input sequence length n. In
  | practice, transformers typically use p = 32, and several
| newer, larger transformer language models use p = 16 (Brown et
| al., 2020; Zhang et al., 2022), even while allowing larger
| context lengths n than their predecessors.
|
| [1]: https://arxiv.org/pdf/2210.02671.pdf#page=7
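  |
  | Not from the paper, but a quick numpy illustration of why a
  | fixed p matters once the context length n grows: with p = 16
  | bits there are only so many distinct values, so positions and
  | attention weights start to collide.
  |
  |     import numpy as np
  |
  |     # fp16 stops representing every integer position above 2048
  |     print(np.float16(2049) == np.float16(2048))             # True
  |
  |     # uniform attention weights 1/n for nearby context lengths
  |     # collapse to the same fp16 value
  |     print(np.float16(1 / 70000) == np.float16(1 / 70001))   # True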
___________________________________________________________________
(page generated 2023-04-25 23:00 UTC)