[HN Gopher] 10 > 64, in QR Codes
___________________________________________________________________
10 > 64, in QR Codes
Author : yvan
Score : 250 points
Date : 2024-04-01 13:48 UTC (1 days ago)
(HTM) web link (huonw.github.io)
(TXT) w3m dump (huonw.github.io)
| komlan wrote:
| This is particularly useful for numeric data that is usually
| displayed in hex, like UUIDs [1]
|
| I used this for digital QR code tickets [2], and it made the
| codes so much easier to scan, even with bad lighting.
|
| [1] https://news.ycombinator.com/item?id=39094251
|
| [2]
| https://workspace.google.com/marketplace/app/qr_code_ticket_...
| zygentoma wrote:
| qrencode -t UTF8 https://www.service.nsw.gov.au/campaign/service-
| nsw-mobile-app?data=eyJ0IjoiY292aWQxOV9idXNpbmVzcyIsImJpZCI6IjEyM
| TMyMSIsImJuYW1lIjoiVGVzdCBOU1cgR292ZXJubWVudCBRUiBjb2RlIiwiYmFkZH
| Jlc3MiOiJCdXNpbmVzcyBhZGRyZXNzIGdvZXMgaGVyZSAifQ==
|
| vs qrencode -t UTF8
| https://www.service.nsw.gov.au/campaign/service-nsw-mobile-app?da
| ta=07268568088551018982199489257790063821578941925846323948853349
| 92789559112405122791116333362867370890083842930669319743113055333
| 37894591404330656702603998035920596585517131555967430155259257402
| 71167169927643240820915139763817497440984288389845652728902601340
| 4155725275860173673194594939
|
| The latter one is actually smaller. TIL
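|
| A minimal sketch of the comparison above, assuming the payload is
| treated as one big-endian integer and zero-padded to a fixed digit
| count (the article's exact scheme may differ; `to_base10`,
| `from_base10`, `numeric_bit_count` and the sample payload are
| illustrative, not taken from the article):

```python
import base64
import math


def to_base10(data: bytes) -> str:
    """Write bytes as a fixed-width decimal string (big-endian integer)."""
    # Enough digits for any value of this byte length, so leading zero
    # bytes survive the round trip.
    width = math.ceil(len(data) * 8 * math.log10(2))
    return str(int.from_bytes(data, "big")).zfill(width)


def from_base10(digits: str) -> bytes:
    """Invert to_base10, recovering the byte length from the digit count."""
    n_bytes = int(len(digits) / (8 * math.log10(2)))
    return int(digits).to_bytes(n_bytes, "big")


def numeric_bit_count(n_digits: int) -> int:
    """Data bits in QR Numeric mode: 10 per 3 digits, 7 for 2, 4 for 1."""
    full, rest = divmod(n_digits, 3)
    return full * 10 + (0, 4, 7)[rest]


payload = b'{"t":"demo","bid":"121321"}'  # made-up sample payload
b64 = base64.urlsafe_b64encode(payload).decode()
dec = to_base10(payload)

byte_mode_bits = 8 * len(b64)               # base64 text stored in Byte mode
numeric_bits = numeric_bit_count(len(dec))  # decimal text stored in Numeric mode
print(len(b64), byte_mode_bits)
print(len(dec), numeric_bits)
```

| The decimal string has more characters than the base64 one, but
| each digit costs far fewer bits in the QR bit stream.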
| pimlottc wrote:
| Note that the URL also has to be encoded in two segments, so
| that the decimal part can use a more efficient QR encoding than
| the alphanumeric base URL.
|
| I'm not sure that qrencode CLI tool will automatically do this
| for you.
|
| > In a URL, the rest of the URL is not purely numeric, so
| actually seeing the benefits of this encoding requires using
| two segments:
|
| > * one with the "boring" bits of the URL at the start, likely
| using the Binary mode
|
| > * one with the big blob of base10 data, using the Numeric
| mode
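|
| A rough sketch of why the split pays off, counting segment header
| and data bits only. The example prefix, the 200-digit payload and
| the fixed version 10 are assumptions for illustration; a real
| encoder picks the version from the total size:

```python
# Character-count field widths for versions 1-9, 10-26 and 27-40.
COUNT_BITS = {
    "numeric": (10, 12, 14),
    "alphanumeric": (9, 11, 13),
    "byte": (8, 16, 16),
}


def version_class(version: int) -> int:
    """Index into COUNT_BITS for a QR version in 1..40."""
    if version <= 9:
        return 0
    if version <= 26:
        return 1
    return 2


def segment_bits(mode: str, length: int, version: int) -> int:
    """Bits for one segment: 4-bit mode indicator, count field, data."""
    header = 4 + COUNT_BITS[mode][version_class(version)]
    if mode == "numeric":
        full, rest = divmod(length, 3)
        data = full * 10 + (0, 4, 7)[rest]
    elif mode == "alphanumeric":
        data = 11 * (length // 2) + 6 * (length % 2)
    else:  # byte mode
        data = 8 * length
    return header + data


prefix = "https://example.com/?data="  # hypothetical base URL
n_digits = 200                         # hypothetical base10 payload length
version = 10                           # assumed, not computed

single = segment_bits("byte", len(prefix) + n_digits, version)
split = (segment_bits("byte", len(prefix), version)
         + segment_bits("numeric", n_digits, version))
print(single, split)
```

| The second segment costs an extra header, but that is small next
| to what the digits save by leaving Byte mode.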
| lifthrasiir wrote:
| > I'm not sure that qrencode CLI tool will automatically do
| this for you.
|
| If I'm looking at the correct repository, it does [1].
|
| [1]
| https://github.com/fukuchi/libqrencode/blob/master/split.c
| pimlottc wrote:
| You're right! I briefly searched the code but I missed
| seeing that since they don't use the term "segment".
| MrBuddyCasino wrote:
| We have a similar problem at work right now, but due to different
| constraints we've settled on Base85. Slightly denser than Base64,
| but still just plain old printable ASCII characters and the
| following characters are still "free" so one can use them as
| field delimiters in a CSV-style format: "',/[]\
|
| Incidentally, this also makes them JSON-Safe.
|
| Base94 uses all printable characters, and Base122 uses both
| printable characters and whitespace.
|
| UUIDs encoded in various alphabets:
|
| len  algo             value
| 24   Base64 padded    wScmB8cVS/K05Wk+nORR8Q==
| 22   Base64 unpadded  osnQ3DUDTDuUQBc9mBRYFw
| 20   Base85           rHoLuTk%W0fgpY+`c>xc
| 20   Base94           d(+H"Q/hP}i}d9<KeAt)%
| 18   Base122          @#FoALt`92vSt@
| pimlottc wrote:
| This is really great, I didn't know you could switch encoding
| schemes within the same QR code. There's a nifty visualization
| tool [0] that shows how this can reduce QR code sizes. It can
| determine the optimal segmentation strategy for any string and
| display a color-code version with statistics. Very nice!
|
| 0: https://www.nayuki.io/page/optimal-text-segmentation-for-
| qr-...
| nick238 wrote:
| Seems like the lede was buried in the article. I know a bit
| about QR codes: there are different modes for alphanum, binary,
| kanji, etc., and error-correcting capacity... but being able to
| switch character sets in the middle was new to me.
| pclmulqdq wrote:
| I am not entirely sure why you would want to switch encodings
| for URLs, personally. If you use alphanumeric encoding and a
| URL in Base36, you are pretty much information-theoretically
| optimal.
| planede wrote:
| > you are pretty much information-theoretically optimal
|
| base36 with alphanumeric mode encoding has around 6.38%
| overhead compared to base10's 0.34% overhead in numeric
| mode. So numeric mode gets you closer to optimal.
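|
| The two figures can be reproduced with a short calculation,
| assuming overhead means QR bits spent per character divided by
| the information each character carries, minus one (`overhead` is
| an illustrative helper name):

```python
import math

# Average data bits per character in each QR mode.
MODE_BITS_PER_CHAR = {"numeric": 10 / 3, "alphanumeric": 11 / 2, "byte": 8}


def overhead(base: int, mode: str) -> float:
    """Fraction of extra QR bits spent per bit of payload."""
    return MODE_BITS_PER_CHAR[mode] / math.log2(base) - 1


for base, mode in [(36, "alphanumeric"), (10, "numeric")]:
    print(f"base{base} in {mode} mode: {overhead(base, mode):.2%}")
# base36 in alphanumeric mode: 6.38%
# base10 in numeric mode: 0.34%
```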
| daxterspeed wrote:
| The issue is that QR's alphanumeric segments are uppercase
| only, and while browsers will automatically lowercase the
| protocol and domain name, you'll have to either have all
| your paths be uppercase or automatically lowercase paths.
| On top of that, when someone scans the code they will likely
| be presented with an uppercase URL (if it doesn't
| automatically open in a browser), and that should alert
| anyone who doesn't already know that uppercase domains are
| equivalent to lowercase domains.
|
| Ideally QR codes would have had a segment to encode URIs
| more efficiently (73-82 characters depending on how the
| implementation decided to handle the "unreserved marks"),
| but that ship has long sailed.
| pclmulqdq wrote:
| Many QR code readers will auto-lowercase URLs that are
| encoded in alphanumeric encoding. The rest will recognize
| uppercase URLs just fine. Alphanumeric encoding was
| basically made for URLs.
| pimlottc wrote:
| The QR alphanumeric input encoding does not include basic
| URL query string characters like '?' '&' '='
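|
| A quick way to check a string against the 45-character
| alphanumeric set (the URL below is a made-up example):

```python
# The 45 characters QR Alphanumeric mode can represent.
ALPHANUMERIC = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")


def unsupported(text: str) -> list[str]:
    """Characters in text that Alphanumeric mode cannot encode."""
    return sorted(set(text) - ALPHANUMERIC)


print(unsupported("HTTPS://EXAMPLE.COM/PATH?A=1&B=2"))
# ['&', '=', '?']
```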
| zamfi wrote:
| Speaking of visualization...that last figure in this post is
| super interesting in part because you can actually _see_ some
| of the redundancy in the base64 encoding on the left, in the
| patterns of vertical lines.
|
| In general, better compression means output that looks more
| like "randomness"--any redundancy implies there was room for
| more compression--and that figure makes this quite clear
| visually!
| londons_explore wrote:
| Or... Don't encode data in the URL at all. If your data isn't
| secret or per-user, have it go to https://yoursite.com/gh. If it
| is security sensitive, go to https://yoursite.com/Qhm4Qr55mS
|
| 2 alphanumerics (=4000 links) is plenty to encode a link to all
| the major pages of your website/service you may want to
| advertise. 10 alphanumerics (=10^18) is plenty that even if every
| person in the world had a QR code, nobody could guess one before
| hitting your rate limiter.
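|
| The counts above work out if "alphanumerics" means the 62
| case-sensitive letters and digits, which is an assumption here. A
| small sketch of generating such a code (`short_code` is a
| hypothetical helper):

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # 62 symbols


def short_code(length: int) -> str:
    """Random URL path component drawn from a 62-symbol alphabet."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(62 ** 2)         # 3844, roughly the 4000 quoted above
print(f"{62**10:.1e}")  # about 8.4e+17, roughly 10^18
print(short_code(10))
```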
|
| The user experience gained by fast reliable scanning is far
| greater than that enabled by slightly improved offline support
| (offline functionality requires that the user already has your
| app installed, and in that case, you could preload any codes that
| user had access to).
| pimlottc wrote:
| As the article mentions, they need to include the data so that
| the app could work offline, at least to some degree.
| chrisfinazzo wrote:
| To play devil's advocate for a moment...
|
| Wouldn't this break Deep/Universal links which send a user
| directly to a specific location within an app?
|
| I get that there are potential security/privacy concerns, but
| if you are in full control of URL schemes, isn't that the purpose
| of this feature?
| nneonneo wrote:
| In the case of vaccine cards, which the OP uses as the case
| study, it's better to have the entire card offline for both
| privacy and offline use purposes.
| Karellen wrote:
| I'm not that familiar with QR codes. Anyone know how
| base16/hexadecimal encoding with 0-9A-F fares in comparison? It
| seems like an obvious encoding to test, especially for simplicity
| of implementation compared to base64 and base10, and an odd one
| to miss for comparison?
| komlan wrote:
| Hex is worse, see here [1] for UUIDs
|
| [1] https://news.ycombinator.com/item?id=39094251
| pimlottc wrote:
| The QR standard does not have a specific encoding mode [0] for
| hexadecimal, so it would have to use alphanumeric. Since you'd
| only be using 16 of the 45 possible characters, it would be
| much less efficient.
|
| 0: https://en.wikipedia.org/wiki/QR_code#Information_capacity
| Karliss wrote:
| Most compact QR encoding capable of representing hex symbols is
| alphanumeric mode which requires 5.5 bits per character. Which
| means the output will be 5.5/4 = 1.375 times longer than
| encoded binary data or 37.5% overhead. That's even worse than
| 8/6 =1.33 you get for doing base64 encoding on top of byte
| mode.
| dbaupp wrote:
| Ah, it is a good point that it might be worth comparing to, but
| it is far worse.
|
| Abstractly, it requires approximately log(45)/log(16) output
| bits per input bit, an overhead of 37%.
|
| Making this more concrete: each input byte is encoded as two
| hex digits, and two hex digits have to be encoded as two
| Alphanumeric characters. It thus takes 11 bits in the QR code
| bit stream to store 8 bits of input.
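|
| A small sketch of the Alphanumeric packing rule (pairs of
| characters become 11 bits, a trailing single character 6 bits)
| makes the 11-bits-per-byte figure concrete. The function name and
| sample bytes are illustrative:

```python
# Character values 0..44 are the positions in this string.
ALPHANUMERIC = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def alphanumeric_bits(text: str) -> str:
    """Data bits for text in QR Alphanumeric mode (no segment header)."""
    values = [ALPHANUMERIC.index(c) for c in text]
    out = []
    # Each pair is packed as 45 * first + second, in 11 bits.
    for i in range(0, len(values) - 1, 2):
        out.append(format(values[i] * 45 + values[i + 1], "011b"))
    # A leftover single character takes 6 bits.
    if len(values) % 2:
        out.append(format(values[-1], "06b"))
    return "".join(out)


data = bytes([0xDE, 0xAD, 0xBE, 0xEF])
hex_text = data.hex().upper()  # "DEADBEEF"
bits = alphanumeric_bits(hex_text)
print(len(data) * 8, len(bits))  # 32 input bits, 44 output bits
```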
| dbaupp wrote:
| (I've added an analysis of this and other bases to the
| article: https://huonw.github.io/blog/2024/03/qr-
| base10-base64/#fn:ot...)
| ptramo wrote:
| https://zat.is uses uppercase base32 for URL checksums, as
| alphanumeric QR codes can contain 0-9, A-Z (upper-case only),
| space, $, %, *, +, -, ., /, :. Overhead is only 10% (5.5 bits / 5
| bits). All links fit in a 33x33px image, margins included, so
| little point in improving on that for URLs so short. The tradeoff
| is that the checksum to URL mapping is stored in a backend and
| networking is required to learn anything about the real URL.
| YoshiRulz wrote:
| Thanks to the author's previous post, I instantly recognised the
| `eyJ` prefix as the start of a JSON object!
| JadeNB wrote:
| The previous post: [Mechanical sympathy for QR codes: making
| NSW check-in better](https://huonw.github.io/blog/2021/10/nsw-
| covid-qr).
| planede wrote:
| base10 can be awkward to work with for large data, one can also
| consider:
|
| base8 in numeric mode: 8 input bits -> 3 digits -> 10 output
| bits, 25% overhead
|
| base32 in alphanumeric mode: 5 input bits -> 1 character -> 5.5
| output bits, 10% overhead
|
| I would prefer base32 out of these too, but it's interesting that
| even base8 beats base64 here.
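|
| Reading the first option as "each byte written as three decimal
| digits" (an interpretation of the 8 bits -> 3 digits line, not
| something the comment spells out), a sketch needs no big-integer
| arithmetic at all:

```python
def bytes_to_digits(data: bytes) -> str:
    """Each byte becomes exactly three decimal digits, 000..255."""
    return "".join(f"{b:03d}" for b in data)


def digits_to_bytes(digits: str) -> bytes:
    """Invert bytes_to_digits."""
    if len(digits) % 3:
        raise ValueError("digit count must be a multiple of 3")
    return bytes(int(digits[i:i + 3]) for i in range(0, len(digits), 3))


encoded = bytes_to_digits(b"\x00\xffQR")
print(encoded)
# 000255081082
```

| Numeric mode stores each three-digit group in 10 bits, so every
| input byte costs 10 bits: the 25% overhead quoted above.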
| dbaupp wrote:
| Good point! I've added an analysis of this and other bases to
| https://huonw.github.io/blog/2024/03/qr-base10-base64/#fn:ot...
| sfmz wrote:
| I had an idea to embed a webpage in a dataurl and convert that to
| a QR code; the website would only exist if you snapped the QR
| Code. I was dreaming of code-golf, demoscene, nft and weird
| business card applications, but the web-browsers ruined my fun
| because they won't display dataURL unless you manually copy/paste
| it into the URL bar.
|
| https://issues.chromium.org/issues/40502904
| ptramo wrote:
| Good idea. Built https://srv.us/d that does (edited):
| <html><body><script>
| document.body.innerHTML =
|   decodeURI(window.location.hash.substring(1))
| </script></body></html>
|
| So you can point to https://srv.us/d#<h1>Demo</h1>
| sfmz wrote:
| Almost, page doesn't have a body yet, so you get a null ref.
| I thought of spinning it up or hosting on ipfs, but it still
| won't live forever, somebody will lose interest and stop
| paying the DNS costs or similar.
| ptramo wrote:
| Fixed, thanks! Yes, longevity is an issue with anything
| online. On the other hand, in this case recovery is not a
| huge issue for anybody technical enough.
| pimlottc wrote:
| In an ideal world, the QR standard would include a specific URL
| encoding scheme that exactly matches the URL-safe character set.
| But I suppose there's no real practical way to make big changes
| to the QR spec now, what with all the thousands of
| implementations in the wild.
| adrianmonk wrote:
| I'd rather they had just done entropy coding. It's good at
| using a smaller number of bits per character to represent a
| subset of characters, which is what they're trying to do. But
| it's more general, so you wouldn't be limited to only those
| characters.
|
| Huffman is probably simple enough. The typical approach is
| adaptive Huffman, which doesn't compress the initial data very
| well since it needs to adjust to actual character frequencies.
| So that wouldn't work well for QR codes since they're short.
|
| But you can start adaptive Huffman with a pre-agreed initial
| tree (as static Huffman does), which would give good
| compression from the start. There could be several standard
| pre-agreed Huffman trees, and instead of using bits in the QR
| code to select a character set, those bits could select one of
| a few pre-agreed initial Huffman trees.
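|
| A sketch of the pre-agreed-tree part only: plain static Huffman
| built from a fixed frequency table, with no adaptive updating. The
| weights in `URL_TABLE` are invented for illustration and stand in
| for a table that a spec would publish:

```python
import heapq
from itertools import count


def build_codes(freqs: dict[str, int]) -> dict[str, str]:
    """Build a prefix-free Huffman code table from symbol weights."""
    tie = count()  # tie-breaker so the heap never compares dicts
    heap = [(w, next(tie), {ch: ""}) for ch, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merge the two lightest subtrees, extending their codes.
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return heap[0][2]


# Invented weights, not measured from real URLs.
URL_TABLE = {"/": 8, ".": 6, "e": 10, "t": 7, "a": 7, "o": 6,
             "c": 5, "m": 4, "p": 3, "l": 3, "x": 1}


def encode(text: str, codes: dict[str, str]) -> str:
    return "".join(codes[c] for c in text)


def decode(bits: str, codes: dict[str, str]) -> str:
    inverse = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix-free, so the first match is right
            out.append(inverse[current])
            current = ""
    return "".join(out)


codes = build_codes(URL_TABLE)
packed = encode("example.com/", codes)
print(len(packed), "bits vs", 8 * len("example.com/"), "in Byte mode")
```

| Both sides derive the same table from the same weights, so no
| tree needs to be stored in the code itself.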
| planede wrote:
| For dealing with larger data I would probably split the input
| bits into 63 bit chunks, which can be encoded in 19 decimal
| digits. 63 input bits turn into 19 digits which in turn is
| encoded in 63.33... output bits on average. This has an overhead
| of 0.53% instead of 0.34% of pure base10, which I think is
| acceptable. But then you don't have to bring bignum libraries
| into the picture, as each chunk fits into a 64bit integral type.
|
| 64bit chunks are a little bit worse, with 4.16% overhead, so it
| might be worth dealing with the little complexity of 63 bit
| chunks.
|
| I would also output the decimal digits in little-endian order.
|
| edit: If you are willing to go for larger chunks then 93bit
| chunks would be my next candidate, there the overhead is 0.36%,
| barely more than pure base10's 0.34%. I don't think it's worth
| going any higher.
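|
| A minimal sketch of the 63-bit chunking, under a few assumptions:
| the final chunk is zero-padded, the decoder is told the original
| byte length, and digits are written most-significant first rather
| than in the little-endian order suggested above. Python integers
| are unbounded, so the point is only that each chunk value would fit
| a 64-bit type elsewhere:

```python
CHUNK_BITS = 63
CHUNK_DIGITS = 19  # 2**63 - 1 has 19 decimal digits


def encode_chunks(data: bytes) -> str:
    """Encode bytes as 19 decimal digits per 63-bit chunk."""
    bits = "".join(f"{b:08b}" for b in data)
    bits += "0" * (-len(bits) % CHUNK_BITS)  # pad the final chunk
    return "".join(
        f"{int(bits[i:i + CHUNK_BITS], 2):0{CHUNK_DIGITS}d}"
        for i in range(0, len(bits), CHUNK_BITS)
    )


def decode_chunks(digits: str, n_bytes: int) -> bytes:
    """Invert encode_chunks; n_bytes trims the zero padding."""
    bits = "".join(
        f"{int(digits[i:i + CHUNK_DIGITS]):0{CHUNK_BITS}b}"
        for i in range(0, len(digits), CHUNK_DIGITS)
    )
    return bytes(int(bits[i:i + 8], 2) for i in range(0, n_bytes * 8, 8))


data = bytes(range(20))  # 160 bits -> 3 chunks -> 57 digits
digits = encode_chunks(data)
print(len(digits), decode_chunks(digits, len(data)) == data)
```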
| __s wrote:
| 127 bit integers get 38 digits, which lines up well with 128
| bit integers
| leni536 wrote:
| No, they get 39 decimal digits (1.7e38 is 39 decimal digits).
| 127bit chunks would get you 2.36% overhead, which is not bad.
| However, 93bit chunks can (barely) be encoded in 28 digits
| (2^93 ~= 9.9e27), and that's more efficient at around 0.36%
| overhead. So once you have 128 bit arithmetic, it's still not
| worth using all or most of those bits per chunk; 93bit chunks
| are the most efficient under 128 bits.
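|
| The chunk-size figures in this subthread can be checked by brute
| force, assuming Numeric mode's average of 10/3 bits per digit
| (`digits_needed` and `chunk_overhead` are illustrative names):

```python
def digits_needed(bits: int) -> int:
    """Decimal digits needed for the largest value of a bits-wide chunk."""
    return len(str(2**bits - 1))


def chunk_overhead(bits: int) -> float:
    """Extra Numeric-mode bits per input bit when chunking at this size."""
    return digits_needed(bits) * 10 / 3 / bits - 1


for size in (63, 64, 93, 127):
    print(size, digits_needed(size), f"{chunk_overhead(size):.2%}")

best = min(range(1, 129), key=chunk_overhead)
print("best chunk size up to 128 bits:", best)
```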
| ingen0s wrote:
| you had me at hello
| buildsjets wrote:
| I need to re-run the math based on this info, but a while back, I
| wanted to figure out the maximum density of QR codes that could
| be reliably printed on a sheet of plain paper with a laser
| printer, then optically scanned and re-digitized. I recall the
| answer was about the same as a double-density 5.25" floppy disk,
| which is 320 KB.
| JadeNB wrote:
| Why does the title have a "'" that isn't in the document ("'10 >
| 64, in QR Codes" versus "10 > 64, in QR Codes")?
| dbaupp wrote:
| Hacker News strips leading digits targeted at "listicles" (e.g.
| "10 ways to fizz buzz" -> "Ways to fizz buzz"), so tricks are
| required if the digits are actually important.
| PanMan wrote:
| Cool article. What I've wondered, and the article doesn't touch
| on: In "normal" usage (not damaged QR codes), what's the best
| error correction to use, with a fixed output size (eg a sticker)?
| Using a higher level results in more bytes, and thus a larger QR
| code, which, when printed at a fixed size, has smaller details. Is
| it better to have low error correction, resulting in large blobs,
| or higher error correction, resulting in smaller details, which I
| guess will be harder to scan but leave more room for correction?
| master-lincoln wrote:
| I guess this is a trade off that depends on your use case: from
| which distance does the qr code need to be scannable, what
| cameras do we expect to be used for scanning, how likely is
| what kind of damage to parts of the qr code, ...
| spamatica wrote:
| Indeed. The local bus transit has a digital ticket system
| with QR codes for tickets. I haven't actually tried decoding
| the codes but just seeing them and interacting with them I
| can tell they have gone WAY overboard with either the amount
| of data they try to fit or the amount of error correction.
| Probably both. They are nearly unscannable due to their size
| and all the bus drivers just wave you along if you don't
| manage to scan it.
| dbaupp wrote:
| Yeah, I had the same question! One of my earlier articles
| experiments with this: https://huonw.github.io/blog/2021/09/qr-
| error-correction/
|
| Figure 8 and its surrounding section are the undamaged case.
| chpatrick wrote:
| Didn't the EU covid passport decide to use the text encoding mode
| because it's the only one that scanners supported reliably?
| pclmulqdq wrote:
| I'm not sure if anyone uses Base36 any more (or its more obscure
| sister, Base32), but it uses [0-9, A-Z] as its alphabet. It is
| URL safe and also smaller than base 10 in character count for
| each number, and is the smallest standard URL-safe encoding that
| works with alphanumeric QR codes.
|
| I sort of assumed this was common knowledge, but I guess not.
| 91bananas wrote:
| Tooling is probably what dictates this more than anything.
| atob() is everywhere.
| knallfrosch wrote:
| Yeah, I don't get it. Assume I have a standard URL with query
| params, the web browser doesn't understand the decimal
| encoding - right?
|
| Let's assume... this: https://news.ycombinator.com/reply?id=3
| 9907672&goto=item%3Fi...
|
| The special encoding is just about sending data to the
| backend?
| dbaupp wrote:
| I implicitly ignored encoding schemes like base 36 and 32 (and
| 16, referenced elsewhere in the thread) because they're not as
| good as the schemes referenced in the post. The best you can
| get that's fully URL safe with Alphanumeric is a hypothetical
| base 39, referenced in a footnote, and only using 39 of the 45
| possible characters has 3.9% overhead (even ignoring the 50%
| overhead of the https://www.rfc-editor.org/rfc/rfc9285.html
| encoding).
|
| I've added an analysis of many more bases to the article:
| https://huonw.github.io/blog/2024/03/qr-base10-base64/#fn:ot...
| FullyFunctional wrote:
| This is fascinating, but I was curious about the last two QR
| codes. The left one is scannable on my iPhone (iOS 17.4.1)
| leading to http://example.com/AAE..._w8fL whereas the one on the
| right gets only http://example.com (both Safari and Firefox). Is
| this an iOS URL length limitation?
| dbaupp wrote:
| Good catch! I should've tested. I've added a paragraph to
| https://huonw.github.io/blog/2024/03/qr-base10-base64/#extre...
| about this.
| bytecodes wrote:
| 1. Pretty neat to switch encoding in the middle of the URL. It
| does look like it works and it does look like a better encoding.
| This is cool.
|
| 2. I'd have called this base-1000. It's using 3-digit numbers
| encoded into 10 bits. Base64 doesn't encode into 64 bits, it uses
| 64 characters encoded into 6 bits. And this encoding uses 000 to
| 999, encoded into 10 bits. But that messes up the title when you
| compare apples to apples, 1000 > 64 is just obvious and true.
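|
| The "base-1000" reading matches how Numeric mode packs its data
| bits: three digits at a time into 10 bits, with 7 bits for a
| trailing pair and 4 for a single digit. A short sketch (segment
| header omitted; the function name is illustrative):

```python
def numeric_mode_bits(digits: str) -> str:
    """Data bits for a digit string in QR Numeric mode."""
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("Numeric mode only stores the digits 0-9")
    out = []
    for i in range(0, len(digits), 3):
        group = digits[i:i + 3]
        width = {3: 10, 2: 7, 1: 4}[len(group)]
        out.append(format(int(group), f"0{width}b"))
    return "".join(out)


print(numeric_mode_bits("999"))
# 1111100111
print(numeric_mode_bits("01234567"))
# 000000110001010110011000011
```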
___________________________________________________________________
(page generated 2024-04-02 23:00 UTC)