[HN Gopher] Tighter bounds on the expressivity of transformer en...
       ___________________________________________________________________
        
       Tighter bounds on the expressivity of transformer encoders
        
       Author : bmc7505
       Score  : 38 points
       Date   : 2023-04-25 17:02 UTC (5 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | sharemywin wrote:
       | We connect and strengthen these results by identifying a variant
       | of first-order logic with counting quantifiers that is
       | simultaneously an upper bound for fixed-precision transformer
       | encoders and a lower bound for transformer encoders.
       | 
        | not sure what a fixed-precision transformer is?
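        | 
        | (If I'm reading that right, the claim is a sandwich of language
        | classes; schematically, in my own notation rather than the
        | paper's:
        | 
        |     \mathcal{L}(\text{fixed-precision transformer encoders})
        |     \subseteq \mathcal{L}(\text{FOC}[+;\mathrm{MOD}])
        |     \subseteq \mathcal{L}(\text{transformer encoders})
        | 
        | with the logic sitting in between the two transformer classes.)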
        
         | eutectic wrote:
         | I think it means the analysis accounts for rounding error.
        
           | ftxbro wrote:
            | Yes, it's this. They have identified an expressivity level,
            | which they call FOC[+;MOD], that is intuitively supposed to
            | capture the expressivity that transformers and
            | transformers-with-rounding have in common. More formally,
            | they are trying to pin down exactly how this level compares
            | to that of transformers, transformers-with-rounding,
            | simplified stateless counter machines, and uniform TC^0;
            | they have done this with partial success and are reporting
            | their results.
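            | 
            | To give a flavor of counting quantifiers (schematic notation,
            | not necessarily the paper's exact syntax for FOC[+;MOD]): the
            | language "the number of a's equals the number of b's", which
            | first-order logic with just the order on positions cannot
            | define, can be written as
            | 
            |     \exists x \, \big( \exists^{=x} i \, Q_a(i) \;\wedge\;
            |                        \exists^{=x} j \, Q_b(j) \big)
            | 
            | where \exists^{=x} i \, \varphi(i) says "exactly x positions
            | i satisfy \varphi".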
        
         | bmc7505 wrote:
          | See Merrill & Sabharwal (2023) [1]:
          | 
          |     A transformer N is a specific kind of a computation graph
          |     family over vectors of floating-point numbers with p
          |     precision. We work with fixed-precision transformers,
          |     i.e., we take p to be constant with respect to the input
          |     sequence length n. In practice, transformers typically use
          |     p = 32, and, several newer, larger transformer language
          |     models use p = 16 (Brown et al., 2020; Zhang et al.,
          |     2022), even while allowing larger context lengths n than
          |     their predecessors.
          | 
          | [1]: https://arxiv.org/pdf/2210.02671.pdf#page=7
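          | 
          | A tiny illustration (mine, not from either paper) of why "p
          | constant with respect to n" matters: a float16 accumulator that
          | adds 1 per token stops changing at 2048, so a fixed-precision
          | "counter" cannot distinguish context lengths beyond that point.
          | 
          |     import numpy as np
          |     
          |     def count_tokens(n, dtype):
          |         # accumulate 1.0 per "token" at a fixed precision
          |         total = dtype(0)
          |         for _ in range(n):
          |             total = total + dtype(1)
          |         return total
          |     
          |     for n in (1000, 2048, 2049, 4096):
          |         print(n, count_tokens(n, np.float16),
          |               count_tokens(n, np.float32))
          |     
          |     # float16: 1000.0, 2048.0, 2048.0, 2048.0  (saturates at
          |     # 2^11, since 2048 + 1 rounds back to 2048)
          |     # float32: exact counts up to 2^24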
        
       ___________________________________________________________________
       (page generated 2023-04-25 23:00 UTC)