[HN Gopher] Why are QR Codes with capital letters smaller than Q...
___________________________________________________________________
Why are QR Codes with capital letters smaller than QR codes with
lower case?
Author : todsacerdoti
Score : 113 points
Date : 2025-02-23 13:25 UTC (2 days ago)
(HTM) web link (shkspr.mobi)
(TXT) w3m dump (shkspr.mobi)
| parsimo2010 wrote:
| tl;dr: Upper case letters can be represented in "alphanumeric"
| mode, which uses 11 bits per two characters (5.5 bits per
| character; a lone trailing character takes 6 bits). Lower case
| letters are not included in alphanumeric mode, so the QR code has
| to be represented in byte mode, which uses 8 bits per character.
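|
| A minimal sketch of that arithmetic in Python, counting payload bits
| only (the mode indicator and character-count header are ignored).
| The name payload_bits is made up for illustration, and byte mode is
| assumed to carry UTF-8.

```python
# Alphanumeric-mode character set from the QR Code spec (45 symbols).
ALPHANUMERIC = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def payload_bits(text: str) -> tuple[str, int]:
    """Return (mode, data bits) for one segment; header bits are not counted."""
    if all(ch in ALPHANUMERIC for ch in text):
        # 11 bits per pair, plus 6 bits for a lone trailing character.
        pairs, leftover = divmod(len(text), 2)
        return "alphanumeric", 11 * pairs + 6 * leftover
    # Anything else falls back to byte mode (assumption: UTF-8 bytes).
    return "byte", 8 * len(text.encode("utf-8"))


for url in ("HTTPS://EXAMPLE.COM", "https://example.com"):
    print(url, payload_bits(url))
```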
| thequux wrote:
| One of the things that Data Matrix got right was being able to
| shift between encoding regimes mid-stream. Many character sets
| can be represented in radix-40 (so three characters per two
| bytes), and the occasional capital character can be handled by a
| shift byte. If you have a long string of digits, they can be
| encoded in 4 bits/char. You can even put raw binary in there if
| need be.
| nayuki wrote:
| A QR Code consists of a sequence of segments. Each segment has
| a mode - numeric, alphanumeric, kanji, or byte. It is possible
| to shift between encoding regimes by ending a segment and
| beginning a new segment with a different mode.
| https://www.nayuki.io/page/optimal-text-segmentation-for-qr-...
|
| Some 1D barcodes have inline shift symbols like you said for
| Data Matrix, though. e.g.
| https://en.wikipedia.org/wiki/Code_128
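|
| A rough sketch of why segment switching can pay off, assuming QR
| versions 1-9 (where the character-count field is 10, 9, and 8 bits
| for numeric, alphanumeric, and byte mode) and ASCII-only byte
| segments. segment_bits and total_bits are illustrative names, not
| from any library, and this only prices a given split; it does not
| search for the optimal one.

```python
ALNUM = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")
# Character-count field widths for QR versions 1-9 (larger versions use wider fields).
COUNT_BITS = {"numeric": 10, "alphanumeric": 9, "byte": 8}
MODE_INDICATOR_BITS = 4


def segment_bits(mode: str, text: str) -> int:
    """Bits for one segment: mode indicator + character count + data."""
    n = len(text)
    if mode == "numeric":
        assert text.isascii() and text.isdigit()
        # 10 bits per 3 digits; 4 or 7 bits for 1 or 2 leftover digits.
        data = 10 * (n // 3) + (0, 4, 7)[n % 3]
    elif mode == "alphanumeric":
        assert set(text) <= ALNUM
        data = 11 * (n // 2) + 6 * (n % 2)
    else:  # byte mode; ASCII assumed, so one character is one byte
        data = 8 * len(text.encode("ascii"))
    return MODE_INDICATOR_BITS + COUNT_BITS[mode] + data


def total_bits(segments) -> int:
    """Sum the cost of a list of (mode, text) segments."""
    return sum(segment_bits(mode, text) for mode, text in segments)


one_segment = [("byte", "order=12345678901234567890")]
two_segments = [("byte", "order="), ("numeric", "12345678901234567890")]
print(total_bits(one_segment), total_bits(two_segments))
```

| The second segment costs an extra header, so splitting only helps
| when the run in the cheaper mode is long enough to recoup it.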
| silotis wrote:
| It seems to me the best approach would be to compress the contents
| with a Huffman code or some other entropy encoding. All this
| business of restricted character sets is just an ad-hoc way of
| reducing the size of each symbol and we've got much more mature
| solutions for that.
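|
| For a feel of the numbers, here is a small Huffman sketch that only
| counts encoded bits. It is an illustration, not anything the QR spec
| defines: the helper names are invented, and the cost of shipping the
| code table to the decoder (or agreeing on one in advance) is ignored.

```python
import heapq
from collections import Counter


def huffman_code_lengths(text: str) -> dict:
    """Map each symbol in text to its Huffman code length in bits."""
    freq = Counter(text)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): 1}
    # Heap items: (weight, unique tie breaker, {symbol: depth so far}).
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def huffman_bits(text: str) -> int:
    """Total encoded length of text under its own Huffman code."""
    lengths = huffman_code_lengths(text)
    return sum(lengths[ch] for ch in text)


sample = "https://example.com/landing"
print(huffman_bits(sample), "bits vs", 8 * len(sample), "in byte mode")
```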
| lifthrasiir wrote:
| And that's why you should probably use base45 for binary data in
| QR codes: https://www.rfc-editor.org/rfc/rfc9285.html
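|
| A short encoder sketch following the scheme in RFC 9285: each pair
| of bytes becomes three alphanumeric-mode characters (least
| significant digit first), and a lone trailing byte becomes two. The
| function name base45_encode is my own; decoding is left out.

```python
# The Base45 alphabet is the QR alphanumeric character set.
BASE45 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def base45_encode(data: bytes) -> str:
    """Encode bytes as Base45 text (RFC 9285)."""
    out = []
    for i in range(0, len(data), 2):
        chunk = data[i:i + 2]
        n = int.from_bytes(chunk, "big")
        rest, c = divmod(n, 45)
        if len(chunk) == 2:
            # n = c + 45*d + 45*45*e, emitted in the order c, d, e.
            e, d = divmod(rest, 45)
            out.append(BASE45[c] + BASE45[d] + BASE45[e])
        else:
            # A single byte (0..255) needs only two base-45 digits.
            out.append(BASE45[c] + BASE45[rest])
    return "".join(out)


print(base45_encode(b"Hello!!"))
```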
| cypherpunks01 wrote:
| My favorite QR code visual explainer:
|
| https://qr.blinry.org/
| SirFatty wrote:
| Thank you for that! A lot is over my head, but very interesting
| nonetheless.
| smashed wrote:
| > Byte mode uses (you guessed it!) 8 bits per single character.
|
| 8 bits is enough to represent the entire ascii char table, there
| must be some other limitation going on. QR code control chars
| maybe?
|
| The linked "byte mode" table only has 45 individual chars. This
| could be represented with 6 bits with room to spare..
| wvbdmp wrote:
| Apparently you can specify the text encoding in a thing called
| "ECI", but support varies and most readers just guess the
| encoding by the bytes. I imagine these days most are UTF-8:
| https://stackoverflow.com/questions/9699657/is-utf-8-the-enc...
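|
| One plausible version of that guessing, as a sketch: try strict
| UTF-8 first and fall back to ISO-8859-1, which accepts any byte
| sequence. This is an assumed heuristic, not what any particular
| reader is known to do; real readers may also try Shift JIS or others.

```python
def guess_decode(payload: bytes) -> str:
    """Decode byte-mode data with no ECI header (heuristic, assumed)."""
    try:
        # Strict UTF-8 rejects most byte strings that are not really UTF-8.
        return payload.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this never fails.
        return payload.decode("iso-8859-1")


print(guess_decode("café".encode("utf-8")), guess_decode(b"caf\xe9"))
```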
| duskwuff wrote:
| > 8 bits is enough to represent the entire ascii char table,
| there must be some other limitation going on. QR code control
| chars maybe?
|
| The specified capacity of "25 characters" for QR code size is
| 25 characters in alphanumeric mode, not in byte mode.
|
| > The linked "byte mode" table only has 45 individual chars.
| This could be represented with 6 bits with room to spare..
|
| Even better than that - it's 5.5 bits per character! Each pair
| of characters is represented as a single 11-bit code unit.
| (This works because 45 x 45 = 2025, which is just barely under
| 2^11 = 2048.)
|
| There's apparently some support in the QR standard for mixed-
| encoding codes, but few encoders seem to use that.
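|
| The pair packing can be sketched like this. Only the data bits of an
| alphanumeric segment are produced (no mode indicator, count field,
| or error correction), and pack_alphanumeric is an illustrative name.

```python
ALPHANUMERIC = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def pack_alphanumeric(text: str) -> str:
    """Return the data bits of an alphanumeric segment as a '0'/'1' string."""
    # str.index raises ValueError for characters outside the 45-symbol set.
    values = [ALPHANUMERIC.index(ch) for ch in text]
    bits = []
    for i in range(0, len(values) - 1, 2):
        # Each pair is one number in 0..2024, which fits in 11 bits.
        bits.append(format(45 * values[i] + values[i + 1], "011b"))
    if len(values) % 2:
        # A lone trailing character (0..44) takes 6 bits.
        bits.append(format(values[-1], "06b"))
    return "".join(bits)


print(pack_alphanumeric("HE"))           # H=17, E=14 -> 17*45 + 14 = 779
print(len(pack_alphanumeric("HELLO WORLD")), "bits for 11 characters")
```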
| recursive wrote:
| All this bitwise optimization, but most of the QR codes I see in
| the wild have >100 bytes of useless cruft like
| https://engagement.bigcompany.com/campaigns/1485-0123/landin...
| nomel wrote:
| It's not useless, it just makes the tracking data (which it
| almost always is) _easier to implement_.
___________________________________________________________________
(page generated 2025-02-25 23:00 UTC)