[HN Gopher] 10 > 64, in QR Codes
       ___________________________________________________________________
        
       10 > 64, in QR Codes
        
       Author : yvan
       Score  : 250 points
       Date   : 2024-04-01 13:48 UTC (1 day ago)
        
 (HTM) web link (huonw.github.io)
 (TXT) w3m dump (huonw.github.io)
        
       | komlan wrote:
       | This is particularly useful for numeric data that is usually
       | displayed in hex, like UUIDs [1]
       | 
       | I used this for digital QR code tickets [2], and it made the
       | codes so much easier to scan, even with bad lighting.
       | 
       | [1] https://news.ycombinator.com/item?id=39094251
       | 
       | [2]
       | https://workspace.google.com/marketplace/app/qr_code_ticket_...
        
       | zygentoma wrote:
       | qrencode -t UTF8 https://www.service.nsw.gov.au/campaign/service-
       | nsw-mobile-app?data=eyJ0IjoiY292aWQxOV9idXNpbmVzcyIsImJpZCI6IjEyM
       | TMyMSIsImJuYW1lIjoiVGVzdCBOU1cgR292ZXJubWVudCBRUiBjb2RlIiwiYmFkZH
       | Jlc3MiOiJCdXNpbmVzcyBhZGRyZXNzIGdvZXMgaGVyZSAifQ==
       | 
       | vs
       | 
       | qrencode -t UTF8
       | https://www.service.nsw.gov.au/campaign/service-nsw-mobile-app?da
       | ta=07268568088551018982199489257790063821578941925846323948853349
       | 92789559112405122791116333362867370890083842930669319743113055333
       | 37894591404330656702603998035920596585517131555967430155259257402
       | 71167169927643240820915139763817497440984288389845652728902601340
       | 4155725275860173673194594939
       | 
       | The latter one is actually smaller. TIL
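A minimal sketch of how a binary payload could be turned into one long decimal string like the one above, assuming the payload is treated as a single big-endian integer and zero-padded to a fixed width. The function names and the sample payload are made up for illustration; this is not necessarily how the article or the NSW app does it.

```python
def bytes_to_decimal(data: bytes) -> str:
    """Render bytes as one zero-padded base-10 number."""
    # The width is the digit count of the largest value len(data) bytes can
    # hold, so leading zero bytes survive the round trip.
    width = len(str(256 ** len(data) - 1))
    return str(int.from_bytes(data, "big")).zfill(width)


def decimal_to_bytes(digits: str) -> bytes:
    """Invert bytes_to_decimal, recovering the byte length from the width."""
    n = 0
    while len(str(256 ** (n + 1) - 1)) <= len(digits):
        n += 1
    return int(digits).to_bytes(n, "big")


# Hypothetical payload, similar in shape to the JSON in the command above.
payload = b'{"t":"covid19_business"}'
digits = bytes_to_decimal(payload)
print(len(payload), "bytes ->", len(digits), "decimal digits")
```

The decimal string has more characters than the Base64 one, but each group of three digits costs only 10 bits in Numeric mode, while each Base64 character costs 8 bits in Byte mode.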
        
         | pimlottc wrote:
         | Note that the URL also has to be encoded in two segments, so
         | that the decimal part can use a more efficient QR encoding than
         | the alphanumeric base URL.
         | 
         | I'm not sure that qrencode CLI tool will automatically do this
         | for you.
         | 
         | > In a URL, the rest of the URL is not purely numeric, so
         | actually seeing the benefits of this encoding requires using
         | two segments:
         | 
         | > * one with the "boring" bits of the URL at the start, likely
         | using the Binary mode
         | 
         | > * one with the big blob of base10 data, using the Numeric
         | mode
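To get a feel for why the split matters, here is a rough bit-count sketch. It assumes the standard QR segment layout (a 4-bit mode indicator, a character-count field whose width depends on the version, then the data bits); the example URL prefix, the 300-digit payload and the choice of version 10 are arbitrary assumptions.

```python
# Character-count field widths for versions 1-9, 10-26 and 27-40.
COUNT_BITS = {
    "numeric": (10, 12, 14),
    "alphanumeric": (9, 11, 13),
    "byte": (8, 16, 16),
}


def data_bits(mode: str, n: int) -> int:
    """Bits needed for n characters of payload in the given mode."""
    if mode == "numeric":
        # 3 digits -> 10 bits; a trailing 1 or 2 digits -> 4 or 7 bits.
        return 10 * (n // 3) + (0, 4, 7)[n % 3]
    if mode == "alphanumeric":
        # 2 characters -> 11 bits; a trailing single character -> 6 bits.
        return 11 * (n // 2) + 6 * (n % 2)
    return 8 * n  # byte mode


def segment_bits(mode: str, n: int, version: int) -> int:
    """Total bits for one segment: mode indicator + count field + data."""
    tier = 0 if version <= 9 else 1 if version <= 26 else 2
    return 4 + COUNT_BITS[mode][tier] + data_bits(mode, n)


prefix = "https://example.com/?d="  # hypothetical base URL
n_digits = 300                      # hypothetical decimal payload length
version = 10                        # assumed QR version

one_segment = segment_bits("byte", len(prefix) + n_digits, version)
two_segments = (segment_bits("byte", len(prefix), version)
                + segment_bits("numeric", n_digits, version))
print(one_segment, "bits as one Byte segment")
print(two_segments, "bits as Byte + Numeric segments")
```

The second segment costs an extra header, but that is tiny compared with the saving from storing digits at 10 bits per three characters instead of 24.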
        
           | lifthrasiir wrote:
           | > I'm not sure that qrencode CLI tool will automatically do
           | this for you.
           | 
           | If I'm looking at the correct repository, it does [1].
           | 
           | [1]
           | https://github.com/fukuchi/libqrencode/blob/master/split.c
        
             | pimlottc wrote:
             | You're right! I briefly searched the code but I missed
             | seeing that since they don't use the term "segment".
        
       | MrBuddyCasino wrote:
       | We have a similar problem at work right now, but due to different
       | constraints we've settled on Base85. Slightly denser than Base64,
       | but still just plain old printable ASCII characters and the
       | following characters are still "free" so one can use them as
       | field delimiters in a CSV-style format:
       | 
       |     "',/[]\
       | 
       | Incidentally, this also makes them JSON-Safe.
       | 
       | Base94 uses all printable characters, and Base122 uses both
       | printable characters and whitespace.
       | 
       | UUIDs encoded in various alphabets:
       | 
       |     len  algo             value
       |     24   Base64 padded    wScmB8cVS/K05Wk+nORR8Q==
       |     22   Base64 unpadded  osnQ3DUDTDuUQBc9mBRYFw
       |     20   Base85           rHoLuTk%W0fgpY+`c>xc
       |     20   Base94           d(+H"Q/hP}i}d9<KeAt)%
       |     18   Base122          @#FoALt`92vSt@
        
       | pimlottc wrote:
       | This is really great, I didn't know you could switch encoding
       | schemes within the same QR code. There's a nifty visualization
       | tool [0] that shows how this can reduce QR code sizes. It can
       | determine the optimal segmentation strategy for any string and
       | display a color-code version with statistics. Very nice!
       | 
       | 0: https://www.nayuki.io/page/optimal-text-segmentation-for-
       | qr-...
        
         | nick238 wrote:
         | Seems like the lede was buried in the article; I know a bit
         | about QR codes: there are different modes for alphanum, binary,
         | kanji, etc., and error correcting capacity... but being able to
         | switch character sets in the middle was new to me.
        
           | pclmulqdq wrote:
           | I am not entirely sure why you would want to switch encodings
           | for URLs, personally. If you use alphanumeric encoding and a
           | URL in Base36, you are pretty much information-theoretically
           | optimal.
        
             | planede wrote:
             | > you are pretty much information-theoretically optimal
             | 
             | base36 with alphanumeric mode encoding has around 6.38%
             | overhead compared to base10's 0.34% overhead in numeric
             | mode. So numeric mode gets you closer to optimal.
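These percentages can be reproduced with a short calculation: divide the bits a QR mode spends per character by the information each character carries in the chosen base. A small sketch follows; the `overhead` helper name is just for illustration.

```python
import math

# Average bits per character in each QR mode (ignoring segment headers).
BITS_PER_CHAR = {"numeric": 10 / 3, "alphanumeric": 11 / 2, "byte": 8.0}


def overhead(base: int, mode: str) -> float:
    """Fractional overhead of storing base-`base` symbols in a QR mode."""
    return BITS_PER_CHAR[mode] / math.log2(base) - 1


for base, mode in [(10, "numeric"), (36, "alphanumeric"), (64, "byte")]:
    print(f"base{base} in {mode} mode: {overhead(base, mode):.2%} overhead")
```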
        
             | daxterspeed wrote:
             | The issue is that QR's alphanumeric segments are uppercase
             | only, and while browsers will automatically lowercase the
             | protocol and domain name, you'll have to either have all
             | your paths be uppercase or automatically lowercase paths.
             | On top of that, when someone scans the code they will
             | likely be presented with an uppercase URL (if it doesn't
             | automatically open in a browser), and that could alarm
             | anyone who doesn't already know that uppercase domains are
             | equivalent to lowercase domains.
             | 
             | Ideally QR codes would have had a segment to encode URIs
             | more efficiently (73-82 characters depending on how the
             | implementation decided to handle the "unreserved marks"),
             | but that ship has long sailed.
        
               | pclmulqdq wrote:
               | Many QR code readers will auto-lowercase URLs that are
               | encoded in alphanumeric encoding. The rest will recognize
               | uppercase URLs just fine. Alphanumeric encoding was
               | basically made for URLs.
        
               | pimlottc wrote:
               | The QR alphanumeric input encoding does not include basic
               | URL query string characters like '?' '&' '='
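A quick way to check this for any given URL is to compare it against the 45-character Alphanumeric set. A small sketch, with a made-up example URL:

```python
# The 45 characters QR Alphanumeric mode can encode.
ALPHANUMERIC = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def unsupported_chars(text: str) -> list:
    """Characters in text that Alphanumeric mode cannot represent."""
    return sorted(set(text) - set(ALPHANUMERIC))


url = "HTTPS://EXAMPLE.COM/PATH?A=1&B=2"  # hypothetical URL
print(unsupported_chars(url))
```

Any URL for which this list is non-empty needs at least one Byte-mode segment, even if it is fully uppercased.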
        
         | zamfi wrote:
         | Speaking of visualization...that last figure in this post is
         | super interesting in part because you can actually _see_ some
         | of the redundancy in the base64 encoding on the left, in the
         | patterns of vertical lines.
         | 
         | In general, better compression means output that looks more
         | like "randomness"--any redundancy implies there was room for
         | more compression--and that figure makes this quite clear
         | visually!
        
       | londons_explore wrote:
       | Or... Don't encode data in the URL at all. If your data isn't
       | secret or per-user, have it go to https://yoursite.com/gh. If it
       | is security sensitive, go to https://yoursite.com/Qhm4Qr55mS
       | 
       | 2 alphanumerics (~=4000 links) is plenty to encode a link to all
       | the major pages of your website/service you may want to
       | advertise. 10 alphanumerics (~=10^18) is plenty that even if every
       | person in the world had a QR code, nobody could guess one before
       | hitting your rate limiter.
       | 
       | The user experience gained by fast reliable scanning is far
       | greater than that enabled by slightly improved offline support
       | (offline functionality requires that the user already has your
       | app installed, and in that case, you could preload any codes that
       | user had access to).
        
         | pimlottc wrote:
         | As the article mentions, they need to include the data so that
         | the app could work offline, at least to some degree.
        
         | chrisfinazzo wrote:
         | To play devil's advocate for a moment...
         | 
         | Wouldn't this break Deep/Universal links which send a user
         | directly to a specific location within an app?
         | 
         | I get that there are potential security/privacy concerns, but
         | if you are in full control of URL schemes, isn't that the
         | purpose of this feature?
        
         | nneonneo wrote:
         | In the case of vaccine cards, which the OP uses as the case
         | study, it's better to have the entire card offline for both
         | privacy and offline use purposes.
        
       | Karellen wrote:
       | I'm not that familiar with QR codes. Anyone know how
       | base16/hexadecimal encoding with 0-9A-F fares in comparison? It
       | seems like an obvious encoding to test, especially for simplicity
       | of implementation compared to base64 and base10, and an odd one
       | to miss for comparison?
        
         | komlan wrote:
         | Hex is worse, see here [1] for UUIDs
         | 
         | [1] https://news.ycombinator.com/item?id=39094251
        
         | pimlottc wrote:
         | The QR standard does not have a specific encoding mode [0] for
         | hexadecimal, so it would have to use alphanumeric. Since you'd
         | only be using 16 out of 45 possible characters, it would be
         | much less efficient.
         | 
         | 0: https://en.wikipedia.org/wiki/QR_code#Information_capacity
        
         | Karliss wrote:
         | The most compact QR encoding capable of representing hex
         | symbols is alphanumeric mode, which requires 5.5 bits per
         | character. That means the output will be 5.5/4 = 1.375 times
         | longer than the encoded binary data, a 37.5% overhead. That's
         | even worse than the 8/6 = 1.33 you get for doing base64
         | encoding on top of byte mode.
        
         | dbaupp wrote:
         | Ah, it is a good point that it might be worth comparing to, but
         | it is far worse.
         | 
         | Abstractly, it requires approximately log(45)/log(16) output
         | bits per input bit, an overhead of 37%.
         | 
         | Making this more concrete: each input byte is encoded as two
         | hex digits, and two hex digits have to be encoded as two
         | Alphanumeric characters. It thus takes 11 bits in the QR code
         | bit stream to store 8 bits of input.
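The 11-bits-per-byte figure can be seen by packing hex text the way Alphanumeric mode does: each pair of characters becomes the 11-bit number 45*a + b, and a trailing single character takes 6 bits. This sketch produces only the data bits (no mode indicator or character count), and the sample input is arbitrary.

```python
# The 45 characters of QR Alphanumeric mode, in value order.
ALPHANUMERIC = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def pack_alphanumeric(text: str) -> str:
    """Return the Alphanumeric-mode data bits for text as a '0'/'1' string."""
    values = [ALPHANUMERIC.index(c) for c in text]
    bits = []
    # Full pairs: 45 * first + second, stored in 11 bits.
    for i in range(0, len(values) - 1, 2):
        bits.append(format(45 * values[i] + values[i + 1], "011b"))
    # A leftover single character is stored in 6 bits.
    if len(values) % 2:
        bits.append(format(values[-1], "06b"))
    return "".join(bits)


data = bytes(range(16))         # 16 arbitrary input bytes = 128 bits
hex_text = data.hex().upper()   # 32 hex characters
bits = pack_alphanumeric(hex_text)
print(len(bits), "output bits for", 8 * len(data), "input bits")
```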
        
           | dbaupp wrote:
           | (I've added an analysis of this and other bases to the
           | article: https://huonw.github.io/blog/2024/03/qr-
           | base10-base64/#fn:ot...)
        
       | ptramo wrote:
       | https://zat.is uses uppercase base32 for URL checksums, as
       | alphanumeric QR codes can contain 0-9, A-Z (upper-case only),
       | space, $, %, *, +, -, ., /, :. Overhead is only 10% (5.5 bits / 5
       | bits). All links fit in a 3333px image, margins included, so
       | little point in improving on that for URLs so short. The tradeoff
       | is that the checksum to URL mapping is stored in a backend and
       | networking is required to learn anything about the real URL.
        
       | YoshiRulz wrote:
       | Thanks to the author's previous post, I instantly recognised the
       | `eyJ` prefix as the start of a JSON object!
        
         | JadeNB wrote:
         | The previous post: [Mechanical sympathy for QR codes: making
         | NSW check-in better](https://huonw.github.io/blog/2021/10/nsw-
         | covid-qr).
        
       | planede wrote:
       | base10 can be awkward to work with for large data, one can also
       | consider:
       | 
       | base8 in numeric mode: 8 input bits -> 3 digits -> 10 output
       | bits, 25% overhead
       | 
       | base32 in alphanumeric mode: 5 input bits -> 1 character -> 5.5
       | output bits, 10% overhead
       | 
       | I would prefer base32 out of these too, but it's interesting that
       | even base8 beats base64 here.
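A concrete check of these two schemes, assuming "base8" here means writing each input byte as three octal digits (000 to 377) and "base32" means RFC 4648 Base32 with the `=` padding stripped, since `=` is not in the Alphanumeric set. The helper names and the 30-byte sample are arbitrary.

```python
import base64


def numeric_bits(n_digits: int) -> int:
    """Data bits for n digits in Numeric mode."""
    return 10 * (n_digits // 3) + (0, 4, 7)[n_digits % 3]


def alphanumeric_bits(n_chars: int) -> int:
    """Data bits for n characters in Alphanumeric mode."""
    return 11 * (n_chars // 2) + 6 * (n_chars % 2)


def octal_digits(data: bytes) -> str:
    """Three octal digits per byte."""
    return "".join(format(b, "03o") for b in data)


def base32_text(data: bytes) -> str:
    """Uppercase Base32 without padding."""
    return base64.b32encode(data).decode("ascii").rstrip("=")


data = bytes(range(30))  # 240 input bits
raw = 8 * len(data)
octal_cost = numeric_bits(len(octal_digits(data)))
base32_cost = alphanumeric_bits(len(base32_text(data)))
base64_cost = 8 * len(base64.b64encode(data))  # Byte mode, 8 bits per char

for name, cost in [("base8", octal_cost), ("base32", base32_cost),
                   ("base64", base64_cost)]:
    print(f"{name}: {cost} bits, {cost / raw - 1:.1%} overhead")
```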
        
         | dbaupp wrote:
         | Good point! I've added an analysis of this and other bases to
         | https://huonw.github.io/blog/2024/03/qr-base10-base64/#fn:ot...
        
       | sfmz wrote:
       | I had an idea to embed a webpage in a dataurl and convert that to
       | a QR code; the website would only exist if you snapped the QR
       | code. I was dreaming of code-golf, demoscene, nft and weird
       | business card applications, but the web-browsers ruined my fun
       | because they won't display dataURL unless you manually copy/paste
       | it into the URL bar.
       | 
       | https://issues.chromium.org/issues/40502904
        
         | ptramo wrote:
         | Good idea. Built https://srv.us/d that does (edited):
         | <html><body><script>document.body.innerHTML = decodeURI(window.
         | location.hash.substring(1))</script></body></html>
         | 
         | So you can point to https://srv.us/d#<h1>Demo</h1>
        
           | sfmz wrote:
           | Almost, page doesn't have a body yet, so you get a null ref.
           | I thought of spinning it up or hosting on ipfs, but it still
           | won't live forever, somebody will lose interest and stop
           | paying the DNS costs or similar.
        
             | ptramo wrote:
             | Fixed, thanks! Yes, longevity is an issue with anything
             | online. On the other hand, in this case recovery is not a
             | huge issue for anybody technical enough.
        
       | pimlottc wrote:
       | In an ideal world, the QR standard would include a specific URL
       | encoding scheme that exactly matches the URL-safe character set.
       | But I suppose there's no real practical way to make big changes
       | to the QR spec now, what with all the thousands of
       | implementations in the wild.
        
         | adrianmonk wrote:
         | I'd rather they had just done entropy coding. It's good at
         | using a smaller number of bits per character to represent a
         | subset of characters, which is what they're trying to do. But
         | it's more general, so you wouldn't be limited to only those
         | characters.
         | 
         | Huffman is probably simple enough. The typical approach is
         | adaptive Huffman, which doesn't compress the initial data very
         | well since it needs to adjust to actual character frequencies.
         | So that wouldn't work well for QR codes since they're short.
         | 
         | But you can start adaptive Huffman with a pre-agreed initial
         | tree (as static Huffman does), which would give good
         | compression from the start. There could be several standard
         | pre-agreed Huffman trees, and instead of using bits in the QR
         | code to select a character set, those bits could select one of
         | a few pre-agreed initial Huffman trees.
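A rough sketch of the pre-agreed-tree idea, using plain static Huffman with a frequency table both sides already know. The weights below are invented for illustration and are not tuned on real URLs; an adaptive variant would additionally update the tree while coding.

```python
import heapq
from itertools import count

# Hypothetical pre-agreed weights for a URL-like alphabet.
FREQS = {c: 8 for c in "etaoinsrhl"}
FREQS.update({c: 4 for c in "cdmpuwfgybvkxjqz"})
FREQS.update({c: 2 for c in "0123456789./:-"})
FREQS.update({c: 1 for c in "?&=_%"})


def build_code(freqs: dict) -> dict:
    """Build a Huffman prefix code from a fixed symbol -> weight table."""
    tiebreak = count()  # keeps heap entries comparable and deterministic
    heap = [(w, next(tiebreak), sym) for sym, w in sorted(freqs.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))
    code = {}

    def walk(node, prefix):
        if isinstance(node, tuple):  # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                        # leaf: a single character
            code[node] = prefix

    walk(heap[0][2], "")
    return code


def encode(text: str, code: dict) -> str:
    return "".join(code[c] for c in text)


def decode(bits: str, code: dict) -> str:
    inverse = {v: k for k, v in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix-free, so the first match is the symbol
            out.append(inverse[current])
            current = ""
    return "".join(out)


CODE = build_code(FREQS)
text = "https://example.com/?id=42"  # hypothetical URL
bits = encode(text, CODE)
print(len(bits), "bits vs", 8 * len(text), "bits in Byte mode")
```

Because the table is fixed in advance, nothing about the tree has to be stored in the QR code itself beyond an index selecting which table is in use.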
        
       | planede wrote:
       | For dealing with larger data I would probably split the input
       | bits into 63 bit chunks, which can be encoded in 19 decimal
       | digits. 63 input bits turn into 19 digits which in turn is
       | encoded in 63.33... output bits on average. This has an overhead
       | of 0.53% instead of 0.34% of pure base10, which I think is
       | acceptable. But then you don't have to bring bignum libraries
       | into the picture, as each chunk fits into a 64bit integral type.
       | 
       | 64bit chunks are a little bit worse, with 4.16% overhead, so it
       | might be worth dealing with the little complexity of 63 bit
       | chunks.
       | 
       | I would also output the decimal digits in little-endian order.
       | 
       | edit: If you are willing to go for larger chunks then 93bit
       | chunks would be my next candidate, there the overhead is 0.36%,
       | barely more than pure base10's 0.34%. I don't think it's worth
       | going any higher.
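A sketch of the 63-bit chunking, under some assumptions of mine: chunks are taken from the byte stream most-significant-bit first, the final partial chunk is padded with zero bits, and the original byte length is known to the decoder. Digits are emitted in ordinary big-endian order rather than the little-endian order suggested above. Python ints are unbounded, so this only illustrates the layout; each emitted chunk value is below 2^63 and would fit a 64-bit integer.

```python
CHUNK_BITS = 63
CHUNK_DIGITS = 19  # 2**63 - 1 has 19 decimal digits


def encode_chunks(data: bytes) -> str:
    """Encode bytes as 19 decimal digits per 63-bit chunk."""
    out = []
    acc, acc_bits = 0, 0  # pending bits not yet emitted
    for byte in data:
        acc = (acc << 8) | byte
        acc_bits += 8
        if acc_bits >= CHUNK_BITS:
            acc_bits -= CHUNK_BITS
            out.append(f"{acc >> acc_bits:0{CHUNK_DIGITS}d}")
            acc &= (1 << acc_bits) - 1
    if acc_bits:
        # Pad the final partial chunk with zero bits on the right.
        out.append(f"{acc << (CHUNK_BITS - acc_bits):0{CHUNK_DIGITS}d}")
    return "".join(out)


def decode_chunks(digits: str, n_bytes: int) -> bytes:
    """Invert encode_chunks; n_bytes is the original length."""
    out = bytearray()
    acc, acc_bits = 0, 0
    for i in range(0, len(digits), CHUNK_DIGITS):
        acc = (acc << CHUNK_BITS) | int(digits[i:i + CHUNK_DIGITS])
        acc_bits += CHUNK_BITS
        while acc_bits >= 8:
            acc_bits -= 8
            out.append(acc >> acc_bits)
            acc &= (1 << acc_bits) - 1
    return bytes(out[:n_bytes])  # drop bytes that came from padding


sample = bytes(range(256))
digits = encode_chunks(sample)
print(len(sample), "bytes ->", len(digits), "digits")
```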
        
         | __s wrote:
         | 127 bit integers get 38 digits, which lines up well with 128
         | bit integers
        
           | leni536 wrote:
           | No, they get 39 decimal digits (1.7e38 is 39 decimal digits).
           | 127bit chunks would get you 2.36% overhead, which is not bad.
           | However, 93bit chunks can (barely) be encoded in 28 digits
           | (2^93 ~= 9.9e27), and that's more efficient at around 0.36%
           | overhead. So once you have 128 bit arithmetic, it's still not
           | worth using all or most of those bits per chunk, 93bit chunks
           | is the most efficient under 128 bits.
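The claim about 93 bits is easy to check by brute force: for each chunk size, count the decimal digits needed for the largest chunk value and charge 10/3 bits per digit. A small sketch using exact fractions, with a helper name chosen for illustration:

```python
from fractions import Fraction


def chunk_overhead(bits: int) -> Fraction:
    """Overhead of storing `bits`-bit chunks as decimal digits in Numeric mode."""
    digits = len(str(2 ** bits - 1))  # digits needed for the largest chunk
    return Fraction(10 * digits, 3 * bits) - 1


best = min(range(1, 128), key=chunk_overhead)
print("best chunk size under 128 bits:", best)
for bits in (63, 64, 93, 127):
    print(f"{bits} bits: {float(chunk_overhead(bits)):.2%} overhead")
```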
        
       | ingen0s wrote:
       | you had me at hello
        
       | buildsjets wrote:
       | I need to re-run the math based on this info, but a while back, I
       | wanted to figure out the maximum density of QR codes that could
       | be reliably printed on a sheet of plain paper with a laser
       | printer, then optically scanned and re-digitized. I recall the
       | answer was about the same as a double-density 5.25" floppy disk,
       | which is 320 KB.
        
       | JadeNB wrote:
       | Why does the title have a "'" that isn't in the document ("'10 >
       | 64, in QR Codes" versus "10 > 64, in QR Codes")?
        
         | dbaupp wrote:
         | Hacker News strips leading digits targeted at "listicles" (e.g.
         | "10 ways to fizz buzz" -> "Ways to fizz buzz"), so tricks are
         | required if the digits are actually important.
        
       | PanMan wrote:
       | Cool article. What I've wondered, and the article doesn't touch
       | on: in "normal" usage (not damaged QR codes), what's the best
       | error correction to use with a fixed output size (e.g. a
       | sticker)? Using a higher level results in more bytes, and thus a
       | larger QR code, which, when printed, results in smaller details.
       | Is it better to have low error correction, resulting in large
       | blobs, or higher error correction, resulting in smaller details,
       | which I guess will be harder to scan but leave more room for
       | correction?
        
         | master-lincoln wrote:
         | I guess this is a trade off that depends on your use case: from
         | which distance does the qr code need to be scannable, what
         | cameras do we expect to be used for scanning, how likely is
         | what kind of damage to parts of the qr code, ...
        
           | spamatica wrote:
           | Indeed. The local bus transit has a digital ticket system
           | with QR codes for tickets. I haven't actually tried decoding
           | the codes but just seeing them and interacting with them I
           | can tell they have gone WAY overboard with either the amount
           | of data they try to fit or the amount of error correction.
           | Probably both. They are nearly unscannable due to their size
           | and all the bus drivers just wave you along if you don't
           | manage to scan it.
        
         | dbaupp wrote:
         | Yeah, I had had the same question! One of my earlier articles
         | experiments with this: https://huonw.github.io/blog/2021/09/qr-
         | error-correction/
         | 
         | Figure 8 and its surrounding section are the undamaged case.
        
       | chpatrick wrote:
       | Didn't the EU covid passport decide to use the text encoding mode
       | because it's the only one that scanners supported reliably?
        
       | pclmulqdq wrote:
       | I'm not sure if anyone uses Base36 any more (or its more obscure
       | sister, Base32), but it uses [0-9, A-Z] as its alphabet. It is
       | URL safe and also smaller than base 10 in character count for
       | each number, and is the smallest standard URL-safe encoding that
       | works with alphanumeric QR codes.
       | 
       | I sort of assumed this was common knowledge, but I guess not.
        
         | 91bananas wrote:
         | Tooling is probably what dictates this more than anything.
         | atob() is everywhere.
        
           | knallfrosch wrote:
           | Yeah, I don't get it. Assume I have a standard URL with query
           | params, the web browser doesn't understand the decimal
           | encoding - right?
           | 
           | Let's assume... this: https://news.ycombinator.com/reply?id=3
           | 9907672&goto=item%3Fi...
           | 
           | The special encoding is just about sending data to the
           | backend?
        
         | dbaupp wrote:
         | I implicitly ignored encoding schemes like base 36 and 32 (and
         | 16, referenced elsewhere in the thread) because they're not as
         | good as the schemes referenced in the post. The best you can
         | get that's fully URL safe with Alphanumeric is a hypothetical
         | base 39, referenced in a footnote, and only using 39 of the 45
         | possible characters has 3.9% overhead (even ignoring the 50%
         | overhead of the https://www.rfc-editor.org/rfc/rfc9285.html
         | encoding).
         | 
         | I've added an analysis of many more bases to the article:
         | https://huonw.github.io/blog/2024/03/qr-base10-base64/#fn:ot...
        
       | FullyFunctional wrote:
       | This is fascinating, but I was curious about the last two QR
       | codes. The left one is scannable on my iPhone (iOS 17.4.1)
       | leading to http://example.com/AAE..._w8fL whereas the one on the
       | right gets only http://example.com (both Safari and Firefox). Is
       | this an iOS URL length limitation?
        
         | dbaupp wrote:
         | Good catch! I should've tested. I've added a paragraph to
         | https://huonw.github.io/blog/2024/03/qr-base10-base64/#extre...
         | about this.
        
       | bytecodes wrote:
       | 1. Pretty neat to switch encoding in the middle of the URL. It
       | does look like it works and it does look like a better encoding.
       | This is cool.
       | 
       | 2. I'd have called this base-1000. It's using 3-digit numbers
       | encoded into 10 bits. Base64 doesn't encode into 64 bits, it uses
       | 64 characters encoded into 6 bits. And this encoding uses 000 to
       | 999, encoded into 10 bits. But that messes up the title when you
       | compare apples to apples, 1000 > 64 is just obvious and true.
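The "three digits into 10 bits" packing can be sketched directly. This follows the Numeric mode rule that full groups of three digits take 10 bits, while a trailing group of two or one digits takes 7 or 4 bits; it produces only the data bits, without the mode indicator or character count.

```python
def pack_numeric(digits: str) -> str:
    """Return the Numeric-mode data bits for a digit string as '0'/'1' text."""
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("Numeric mode only encodes the digits 0-9")
    widths = {3: 10, 2: 7, 1: 4}  # bits used per group size
    out = []
    for i in range(0, len(digits), 3):
        group = digits[i:i + 3]
        out.append(format(int(group), f"0{widths[len(group)]}b"))
    return "".join(out)


print(pack_numeric("01234567"))
```

Since 999 is below 1024, every three-digit group fits in 10 bits, which is where the "base-1000" reading comes from.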
        
       ___________________________________________________________________
       (page generated 2024-04-02 23:00 UTC)