[HN Gopher] Why are QR Codes with capital letters smaller than Q...
       ___________________________________________________________________
        
       Why are QR Codes with capital letters smaller than QR codes with
       lower case?
        
       Author : todsacerdoti
       Score  : 113 points
       Date   : 2025-02-23 13:25 UTC (2 days ago)
        
 (HTM) web link (shkspr.mobi)
 (TXT) w3m dump (shkspr.mobi)
        
       | parsimo2010 wrote:
       | tl;dr: Upper case letters can be represented in "alphanumeric"
       | mode, which uses 11 bits per two characters (5.5 bits per
       | character, with a leftover odd character taking 6 bits). Lower
       | case letters are not included in alphanumeric mode, so the QR
       | code has to be encoded in byte mode, which uses 8 bits per
       | character.
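
The size difference described above can be checked with a little arithmetic. This is a minimal sketch, not an encoder: it counts payload bits only and ignores the mode indicator, character count field, and error correction. The function names and the example URL are made up for illustration, and byte mode is assumed to carry UTF-8.

```python
# The 45-character QR alphanumeric set: digits, upper case letters, 9 symbols.
ALPHANUMERIC = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def alphanumeric_bits(text: str) -> int:
    """Payload bits in alphanumeric mode: 11 per pair, 6 for a leftover char."""
    if any(ch not in ALPHANUMERIC for ch in text):
        raise ValueError("text contains characters outside the alphanumeric set")
    pairs, leftover = divmod(len(text), 2)
    return 11 * pairs + 6 * leftover


def byte_bits(text: str) -> int:
    """Payload bits in byte mode (UTF-8 assumed): 8 per byte."""
    return 8 * len(text.encode("utf-8"))


upper = "HTTPS://EXAMPLE.COM/"  # hypothetical payload
lower = upper.lower()
print(alphanumeric_bits(upper))  # 110
print(byte_bits(lower))          # 160
```

For this 20-character string the upper case form needs 110 payload bits against 160 for the lower case form, which is the kind of gap that can push a code into a larger version.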
        
       | thequux wrote:
       | One of the things that Data Matrix got right was being able to
       | shift between encoding regimes mid-stream. Many character sets
       | can be represented in radix-40 (so three characters per two
       | bytes), and the occasional capital character can be handled by a
       | shift byte. If you have a long string of digits, they can be
       | encoded in 4 bits/char. You can even put raw binary in there if
       | need be.
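
The "three characters per two bytes" figure works because 40^3 = 64000 fits in 16 bits (65536 values). Below is a rough sketch of that packing. The 40-symbol alphabet and the names `pack3` and `unpack3` are hypothetical, chosen for illustration; the real Data Matrix character tables and shift handling differ.

```python
# Hypothetical 40-symbol alphabet for illustration only;
# this is NOT the actual Data Matrix table.
ALPHABET = " 0123456789abcdefghijklmnopqrstuvwxyz.-/"
assert len(ALPHABET) == 40


def pack3(triple: str) -> bytes:
    """Pack exactly three alphabet characters into two bytes (radix 40)."""
    a, b, c = (ALPHABET.index(ch) for ch in triple)
    value = 1600 * a + 40 * b + c  # at most 63999, so it fits in 16 bits
    return value.to_bytes(2, "big")


def unpack3(data: bytes) -> str:
    """Inverse of pack3: recover the three characters from two bytes."""
    value = int.from_bytes(data, "big")
    a, rest = divmod(value, 1600)
    b, c = divmod(rest, 40)
    return ALPHABET[a] + ALPHABET[b] + ALPHABET[c]


packed = pack3("abc")
print(len(packed), unpack3(packed))
```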
        
         | nayuki wrote:
         | A QR Code consists of a sequence of segments. Each segment has
         | a mode - numeric, alphanumeric, kanji, or byte. It is possible
         | to shift between encoding regimes by ending a segment and
         | beginning a new segment with a different mode.
         | https://www.nayuki.io/page/optimal-text-segmentation-for-qr-...
         | 
         | Some 1D barcodes have inline shift symbols like you said for
         | Data Matrix, though. e.g.
         | https://en.wikipedia.org/wiki/Code_128
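
To see why splitting into segments can pay off, here is a small sketch that totals the bits of a segment list. It assumes a small symbol (versions 1 to 9), where the character count field is 10, 9, and 8 bits for numeric, alphanumeric, and byte mode, plus a 4-bit mode indicator per segment. Kanji mode is left out, and the function name and example lengths are invented for illustration.

```python
# Character count field widths, assuming QR versions 1-9.
COUNT_BITS = {"numeric": 10, "alphanumeric": 9, "byte": 8}


def segment_bits(mode: str, length: int) -> int:
    """Bits used by one segment: mode indicator + count field + payload."""
    header = 4 + COUNT_BITS[mode]
    if mode == "numeric":
        # 10 bits per 3 digits; 4 or 7 bits for 1 or 2 leftover digits.
        groups, rest = divmod(length, 3)
        payload = 10 * groups + (0, 4, 7)[rest]
    elif mode == "alphanumeric":
        # 11 bits per pair; 6 bits for a leftover character.
        pairs, rest = divmod(length, 2)
        payload = 11 * pairs + 6 * rest
    else:
        payload = 8 * length  # byte mode; length is a byte count
    return header + payload


# Hypothetical payload: a 20-byte lower case URL prefix followed by 16 digits.
one_segment = segment_bits("byte", 36)
two_segments = segment_bits("byte", 20) + segment_bits("numeric", 16)
print(one_segment, two_segments)  # 300 240
```

The second segment costs an extra header, so switching modes only helps when the run in the cheaper mode is long enough to pay for it.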
        
         | silotis wrote:
         | It seems to me the best approach would be to compress the
         | contents with a Huffman code or some other entropy encoding.
         | All this business of restricted character sets is just an
         | ad-hoc way of reducing the size of each symbol, and we've got
         | much more mature solutions for that.
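
One way to put a number on this idea is the zero-order Shannon entropy of a payload, which is a lower bound on the average bits per character for any code that treats characters independently, Huffman included. This is only a rough estimate under that assumption: it ignores the cost of telling the decoder which code table was used, and the function name is made up for illustration.

```python
import math
from collections import Counter


def entropy_bits_per_char(text: str) -> float:
    """Zero-order Shannon entropy of text, in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum(n / total * math.log2(n / total) for n in counts.values())


sample = "HTTPS://EXAMPLE.COM/"  # hypothetical payload
print(round(entropy_bits_per_char(sample), 2), "bits/char (estimate)")
```

The result can then be compared against the fixed 5.5 bits per character of alphanumeric mode or 8 bits per character of byte mode.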
        
       | lifthrasiir wrote:
       | And that's why you should probably use base45 for binary data in
       | QR codes: https://www.rfc-editor.org/rfc/rfc9285.html
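
As a sketch of how the linked RFC's encoding works: each pair of bytes is read as a 16-bit number and written as three base-45 digits, least significant digit first, and a trailing single byte becomes two digits. The alphabet is the QR alphanumeric set, so the output stays in alphanumeric mode. The function name `b45encode` is an illustrative choice, not a standard library call.

```python
# Base45 alphabet from RFC 9285, identical to the QR alphanumeric set.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def b45encode(data: bytes) -> str:
    """Encode bytes as Base45 text, following the scheme in RFC 9285."""
    out = []
    for i in range(0, len(data), 2):
        chunk = data[i:i + 2]
        n = int.from_bytes(chunk, "big")
        # Two bytes -> three digits; a lone final byte -> two digits.
        ndigits = 3 if len(chunk) == 2 else 2
        for _ in range(ndigits):
            n, r = divmod(n, 45)
            out.append(ALPHABET[r])  # least significant digit first
    return "".join(out)


print(b45encode(b"AB"))     # BB8
print(b45encode(b"ietf!"))  # QED8WEX0
```

Three 5.5-bit characters cost 16.5 bits for every 16 bits of data, so the overhead in alphanumeric mode is about 3%.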
        
       | cypherpunks01 wrote:
       | My favorite QR code visual explainer:
       | 
       | https://qr.blinry.org/
        
         | SirFatty wrote:
         | Thank you for that! A lot is over my head, but very interesting
         | nonetheless.
        
       | smashed wrote:
       | > Byte mode uses (you guessed it!) 8 bits per single character.
       | 
       | 8 bits is enough to represent the entire ascii char table, there
       | must be some other limitation going on. QR code control chars
       | maybe?
       | 
       | The linked "byte mode" table only has 45 individual chars. This
       | could be represented with 6 bits with room to spare..
        
         | wvbdmp wrote:
         | Apparently you can specify the text encoding in a thing called
         | "ECI", but support varies and most readers just guess the
         | encoding by the bytes. I imagine these days most are UTF-8:
         | https://stackoverflow.com/questions/9699657/is-utf-8-the-enc...
        
         | duskwuff wrote:
         | > 8 bits is enough to represent the entire ascii char table,
         | there must be some other limitation going on. QR code control
         | chars maybe?
         | 
         | The specified capacity of "25 characters" for QR code size is
         | 25 characters in alphanumeric mode, not in byte mode.
         | 
         | > The linked "byte mode" table only has 45 individual chars.
         | This could be represented with 6 bits with room to spare..
         | 
         | Even better than that - it's 5.5 bits per character! Each pair
         | of characters is represented as a single 11-bit code unit.
         | (This works because 45 x 45 = 2025, which is just barely under
         | 2^11 = 2048.)
         | 
         | There's apparently some support in the QR standard for mixed-
         | encoding codes, but few encoders seem to use that.
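
The pairing trick above can be sketched directly: each pair becomes the single number 45 * first + second, which is at most 44 * 45 + 44 = 2024 and so fits in 11 bits. A leftover character is written in 6 bits. The function name is hypothetical, and the sketch produces only the data bits, without the mode indicator or character count.

```python
# Index in this string is the character's value in alphanumeric mode.
ALPHANUMERIC = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def encode_alphanumeric(text: str) -> list[str]:
    """Return the data bit groups for text in QR alphanumeric mode."""
    values = [ALPHANUMERIC.index(ch) for ch in text]  # ValueError if invalid
    groups = []
    for i in range(0, len(values) - 1, 2):
        pair = 45 * values[i] + values[i + 1]  # 0..2024
        groups.append(format(pair, "011b"))
    if len(values) % 2:
        groups.append(format(values[-1], "06b"))  # odd final character
    return groups


print(encode_alphanumeric("AC-42"))
# ['00111001110', '11100111001', '000010']
```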
        
       | recursive wrote:
       | All this bitwise optimization, but most of the QR codes I see in
       | the wild have >100 bytes of useless cruft like
       | https://engagement.bigcompany.com/campaigns/1485-0123/landin...
        
         | nomel wrote:
         | It's not useless, it just makes the tracking data (which it
         | almost always is) _easier to implement_.
        
       ___________________________________________________________________
       (page generated 2025-02-25 23:00 UTC)