hngopher.com

       [HN Gopher] 10 > 64, in QR Codes
       ___________________________________________________________________
        
       10 > 64, in QR Codes
        
       Author : yvan
       Score  : 326 points
       Date   : 2024-04-01 13:48 UTC (2 days ago)
        
 (HTM) web link (huonw.github.io)
 (TXT) w3m dump (huonw.github.io)
        
       | komlan wrote:
       | This is particularly useful for numeric data that is usually
       | displayed in hex, like UUIDs [1]
       | 
       | I used this for digital QR code tickets [2], and it made the
       | codes so much easier to scan, even with bad lighting.
       | 
       | [1] https://news.ycombinator.com/item?id=39094251
       | 
       | [2]
       | https://workspace.google.com/marketplace/app/qr_code_ticket_...
        
       | zygentoma wrote:
       | qrencode -t UTF8 https://www.service.nsw.gov.au/campaign/service-
       | nsw-mobile-app?data=eyJ0IjoiY292aWQxOV9idXNpbmVzcyIsImJpZCI6IjEyM
       | TMyMSIsImJuYW1lIjoiVGVzdCBOU1cgR292ZXJubWVudCBRUiBjb2RlIiwiYmFkZH
       | Jlc3MiOiJCdXNpbmVzcyBhZGRyZXNzIGdvZXMgaGVyZSAifQ==
       | 
       | vs
       | 
       | qrencode -t UTF8
       | https://www.service.nsw.gov.au/campaign/service-nsw-mobile-app?da
       | ta=07268568088551018982199489257790063821578941925846323948853349
       | 92789559112405122791116333362867370890083842930669319743113055333
       | 37894591404330656702603998035920596585517131555967430155259257402
       | 71167169927643240820915139763817497440984288389845652728902601340
       | 4155725275860173673194594939
       | 
       | The latter one is actually smaller. TIL
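
For anyone wanting to reproduce the second form, here is a minimal sketch of the bytes-to-decimal step. It assumes the payload is treated as one big-endian integer and zero-padded to a fixed width; the function names are made up for illustration and the article's exact convention may differ.

```python
def bytes_to_decimal(data: bytes) -> str:
    """Treat the bytes as one big-endian integer and write it in base 10.

    The result is zero-padded to the widest value possible for this byte
    length, so leading zero bytes survive (an assumption of this sketch).
    """
    width = len(str(256 ** len(data) - 1))
    return str(int.from_bytes(data, "big")).zfill(width)


def decimal_to_bytes(digits: str, length: int) -> bytes:
    """Inverse of bytes_to_decimal; the original byte length must be known."""
    return int(digits).to_bytes(length, "big")


payload = b'{"t":"covid19_business"}'   # illustrative payload only
digits = bytes_to_decimal(payload)
print(len(payload), "bytes ->", len(digits), "decimal digits")
```

Python's arbitrary-precision integers make this a one-liner; in other languages a bignum library (or chunking) would be needed.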
        
         | pimlottc wrote:
         | Note that the URL also has to be encoded in two segments, so
         | that the decimal part can use a more efficient QR encoding than
         | the alphanumeric base URL.
         | 
         | I'm not sure that qrencode CLI tool will automatically do this
         | for you.
         | 
         | > In a URL, the rest of the URL is not purely numeric, so
         | actually seeing the benefits of this encoding requires using
         | two segments:
         | 
         | > * one with the "boring" bits of the URL at the start, likely
         | using the Binary mode
         | 
         | > * one with the big blob of base10 data, using the Numeric
         | mode
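
A rough sketch of why the two-segment layout wins, counting only the mode indicator, character-count field and data bits of each segment (no terminator, padding or error correction). The prefix length, payload size and QR version below are hypothetical numbers picked for illustration.

```python
import math

# Character-count field widths in bits for QR versions 1-9, 10-26, 27-40.
COUNT_BITS = {
    "numeric": (10, 12, 14),
    "alphanumeric": (9, 11, 13),
    "byte": (8, 16, 16),
}


def segment_bits(mode: str, n: int, version: int = 15) -> int:
    """Bits used by one segment: 4-bit mode indicator + count field + data."""
    tier = 0 if version <= 9 else 1 if version <= 26 else 2
    if mode == "numeric":
        data = 10 * (n // 3) + (0, 4, 7)[n % 3]   # 3 digits per 10 bits
    elif mode == "alphanumeric":
        data = 11 * (n // 2) + 6 * (n % 2)        # 2 chars per 11 bits
    elif mode == "byte":
        data = 8 * n
    else:
        raise ValueError(f"unknown mode: {mode}")
    return 4 + COUNT_BITS[mode][tier] + data


prefix_len = 50        # hypothetical length of the "boring" URL prefix
payload_bytes = 300    # hypothetical payload size
b64_len = 4 * math.ceil(payload_bytes / 3)
dec_len = len(str(256 ** payload_bytes - 1))

one_segment = segment_bits("byte", prefix_len + b64_len)
two_segments = segment_bits("byte", prefix_len) + segment_bits("numeric", dec_len)
print(one_segment, two_segments)
```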
        
           | lifthrasiir wrote:
           | > I'm not sure that qrencode CLI tool will automatically do
           | this for you.
           | 
           | If I'm looking at the correct repository, it does [1].
           | 
           | [1]
           | https://github.com/fukuchi/libqrencode/blob/master/split.c
        
             | pimlottc wrote:
             | You're right! I briefly searched the code but I missed
             | seeing that since they don't use the term "segment".
        
       | MrBuddyCasino wrote:
       | We have a similar problem at work right now, but due to different
       | constraints we've settled on Base85. Slightly denser than Base64,
       | but still just plain old printable ASCII characters and the
       | following characters are still "free" so one can use them as
       | field delimiters in a CSV-style format:
       | 
       |     "',/[]\
       | 
       | Incidentally, this also makes them JSON-Safe.
       | 
       | Base94 uses all printable characters, and Base122 uses both
       | printable characters and whitespace.
       | 
       | UUIDs encoded in various alphabets:
       | 
       |     len  algo             value
       |     24   Base64 padded    wScmB8cVS/K05Wk+nORR8Q==
       |     22   Base64 unpadded  osnQ3DUDTDuUQBc9mBRYFw
       |     20   Base85           rHoLuTk%W0fgpY+`c>xc
       |     20   Base94           d(+H"Q/hP}i}d9<KeAt)%
       |     18   Base122          @#FoALt`92vSt@
        
         | Zamicol wrote:
         | I'm not following as Base85 isn't JSON safe. For example, { and
         | } carry meaning in JSON.
        
       | pimlottc wrote:
       | This is really great, I didn't know you could switch encoding
       | schemes within the same QR code. There's a nifty visualization
       | tool [0] that shows how this can reduce QR code sizes. It can
       | determine the optimal segmentation strategy for any string and
       | display a color-code version with statistics. Very nice!
       | 
       | 0: https://www.nayuki.io/page/optimal-text-segmentation-for-qr-...
        
         | nick238 wrote:
         | Seems like the lede was buried in the article; I know a bit
         | about QR codes: there's different modes for alphanum, binary,
         | kanji, etc, and error correcting capacity...but being able to
         | switch character sets in the middle was new to me.
        
           | pclmulqdq wrote:
           | I am not entirely sure why you would want to switch encodings
           | for URLs, personally. If you use alphanumeric encoding and a
           | URL in Base36, you are pretty much information-theoretically
           | optimal.
        
             | planede wrote:
             | > you are pretty much information-theoretically optimal
             | 
             | base36 with alphanumeric mode encoding has around 6.38%
             | overhead compared to base10's 0.34% overhead in numeric
             | mode. So numeric mode gets you closer to optimal.
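
The percentages above can be checked with a few lines. This sketch assumes the usual per-character costs of each QR mode (10 bits per 3 digits, 11 bits per 2 alphanumeric characters, 8 bits per byte) and compares them with the information each symbol of the chosen base carries.

```python
import math


def overhead(bits_per_symbol: float, base: int) -> float:
    """Extra bits spent, as a fraction of the information each symbol carries."""
    return bits_per_symbol / math.log2(base) - 1


NUMERIC = 10 / 3        # numeric mode: 3 digits per 10 bits
ALPHANUMERIC = 11 / 2   # alphanumeric mode: 2 characters per 11 bits
BYTE = 8                # byte mode: 1 character per 8 bits

for name, bits, base in [
    ("base10 / numeric", NUMERIC, 10),
    ("base36 / alphanumeric", ALPHANUMERIC, 36),
    ("base64 / byte", BYTE, 64),
]:
    print(f"{name:24s} {overhead(bits, base):7.2%}")
```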
        
             | daxterspeed wrote:
             | The issue is that QR's alphanumeric segments are uppercase
             | only, and while browsers will automatically lowercase the
             | protocol and domain name, you'll have to either have all
             | your paths be uppercase or automatically lowercase paths.
             | On top of that when someone scans the code it will likely
             | be presented with an uppercase URL (if it doesn't
             | automatically open in a browser) and that should alert
             | anyone that doesn't already know that uppercase domains are
             | equivalent to lowercase domains.
             | 
             | Ideally QR codes would have had a segment to encode URIs
             | more efficiently (73-82 characters depending on how the
             | implementation decided to handle the "unreserved marks"),
             | but that ship has long sailed.
        
               | pclmulqdq wrote:
               | Many QR code readers will auto-lowercase URLs that are
               | encoded in alphanumeric encoding. The rest will recognize
               | uppercase URLs just fine. Alphanumeric encoding was
               | basically made for URLs.
        
               | pimlottc wrote:
               | The QR alphanumeric input encoding does not include basic
               | URL query string characters like '?' '&' '='
        
               | djbusby wrote:
               | I've been putting URL in QR for like a decade, mixed case
               | and query string included. How has it never been an
               | issue?
        
               | IAmLiterallyAB wrote:
               | Because you used bytes mode, not alphanumeric mode
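
A small sketch of the rule being discussed: which single QR mode a given string can use. The 45-character alphanumeric set is the standard one; the function name is made up, and real encoders may also split a string into several segments.

```python
DIGITS = set("0123456789")
ALPHANUMERIC_CHARS = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")


def smallest_single_mode(text: str) -> str:
    """Pick the densest single mode whose character set covers the text."""
    chars = set(text)
    if chars.issubset(DIGITS):
        return "numeric"
    if chars.issubset(ALPHANUMERIC_CHARS):
        return "alphanumeric"
    return "byte"


for url in ["HTTPS://EXAMPLE.COM/ABC",
            "https://example.com/abc",
            "HTTPS://EXAMPLE.COM/?A=1"]:
    print(url, "->", smallest_single_mode(url))
```

Lowercase letters and the query-string characters '?', '&' and '=' all push the string out of alphanumeric mode, which is why mixed-case URLs with query strings end up in byte mode.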
        
         | zamfi wrote:
         | Speaking of visualization...that last figure in this post is
         | super interesting in part because you can actually _see_ some
         | of the redundancy in the base64 encoding on the left, in the
         | patterns of vertical lines.
         | 
         | In general, better compression means output that looks more
         | like "randomness"--any redundancy implies there was room for
         | more compression--and that figure makes this quite clear
         | visually!
        
           | klodolph wrote:
           | That's undoubtedly some redundancy in the underlying data,
           | not in the encoding itself.
        
             | dbaupp wrote:
             | Yes, the data is the bytes 00, 01, ..., FF repeating, and
              | that pattern is highly visible with power-of-2 encodings,
             | but not visible with other bases (for similar reasons that
             | 0.1 as a (binary) float doesn't behave as people expect).
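
A quick way to see this for yourself, assuming the test data really is the bytes 00..FF repeated: 768 bytes is the smallest length that is a multiple of both the 256-byte data period and base64's 3-byte group, so the base64 text repeats every 1024 characters. The decimal rendering is printed only for eyeballing.

```python
import base64

data = bytes(range(256)) * 6                 # 1536 bytes of 00, 01, ..., FF
b64 = base64.b64encode(data).decode()

period = 1024                                # 768 bytes -> 1024 base64 chars
b64_repeats = b64[:period] == b64[period:2 * period]
print("base64 repeats every 1024 chars:", b64_repeats)

decimal = str(int.from_bytes(data, "big"))   # one big base-10 integer
print(decimal[:60], "...")
```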
        
       | londons_explore wrote:
       | Or... Don't encode data in the URL at all. If your data isn't
       | secret or per-user, have it go to https://yoursite.com/gh. If it
       | is security sensitive, go to https://yoursite.com/Qhm4Qr55mS
       | 
       | 2 alphanumerics (=4000 links) is plenty to encode a link to all
       | the major pages of your website/service you may want to
       | advertise. 10 alphanumerics (=10^18) is plenty that even if every
       | person in the world had a QR code, nobody could guess one before
       | hitting your rate limiter.
       | 
       | The user experience gained by fast reliable scanning is far
       | greater than that enabled by slightly improved offline support
       | (offline functionality requires that the user already has your
       | app installed, and in that case, you could preload any codes that
       | user had access to).
        
         | pimlottc wrote:
         | As the article mentions, they need to include the data so that
         | the app could work offline, at least to some degree.
        
         | chrisfinazzo wrote:
         | To play devil's advocate for a moment...
         | 
         | Wouldn't this break Deep/Universal links which send a user
         | directly to a specific location within an app?
         | 
         | I get that there are potential security/privacy concerns, but
         | if you are in full control of URL schemes, isn't that the purpose
         | of this feature?
        
         | nneonneo wrote:
         | In the case of vaccine cards, which the OP uses as the case
         | study, it's better to have the entire card offline for both
         | privacy and offline use purposes.
        
         | Zamicol wrote:
         | Or just go directly to 2^256 and have enough links for every
         | atom in the observable universe.
         | 
         | More importantly, it's enough links that at the Landauer limit
         | a collision can't happen without consuming ~300,000 solar
         | systems of energy, vastly beyond human technological ability.
         | With this property, each link can also be considered private.
        
       | Karellen wrote:
       | I'm not that familiar with QR codes. Anyone know how
       | base16/hexadecimal encoding with 0-9A-F fares in comparison? It
       | seems like an obvious encoding to test, especially for simplicity
       | of implementation compared to base64 and base10, and an odd one
       | to miss for comparison?
        
         | komlan wrote:
         | Hex is worse, see here [1] for UUIDs
         | 
         | [1] https://news.ycombinator.com/item?id=39094251
        
           | Zamicol wrote:
           | I'm not confident of the math there.
           | 
           | https://i.imgur.com/cAVbqka.png
           | 
           | Because of quirks, in edge cases decimal is more efficient,
           | but overall alphanumeric is better in QR code.
        
             | komlan wrote:
             | Ah, I was assuming _numeric_ data rendered as hex, like
             | UUIDs. Decimal works wonders for those, because the numeric
             | mode of QR codes is the most efficient.
        
         | pimlottc wrote:
         | The QR standard does not have a specific encoding mode [0] for
         | hexadecimal; it would have to use alphanumeric. Since you'd
         | only be using 16 out of 45 possible characters, it would be
         | much less efficient.
         | 
         | 0: https://en.wikipedia.org/wiki/QR_code#Information_capacity
        
         | Karliss wrote:
         | Most compact QR encoding capable of representing hex symbols is
         | alphanumeric mode which requires 5.5 bits per character. Which
         | means the output will be 5.5/4 = 1.375 times longer than
         | encoded binary data or 37.5% overhead. That's even worse than
         | 8/6 =1.33 you get for doing base64 encoding on top of byte
         | mode.
        
         | dbaupp wrote:
         | Ah, it is a good point that it might be worth comparing to, but
         | it is far worse.
         | 
         | Abstractly, it requires approximately log(45)/log(16) output
         | bits per input bit, an overhead of 37%.
         | 
         | Making this more concrete: each input byte is encoded as two
         | hex digits, and two hex digits have to be encoded as two
         | Alphanumeric characters. It thus takes 11 bits in the QR code
         | bit stream to store 8 bits of input.
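
To make the "11 bits per byte" point concrete, here is a sketch of how alphanumeric mode packs characters: each pair becomes 45 * first + second in 11 bits, and a trailing single character takes 6 bits. Only the data bits are produced (no mode indicator or count field); the function name is illustrative.

```python
ALPHANUMERIC = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def alphanumeric_bits(text: str) -> str:
    """Data bits for an alphanumeric-mode segment (pairs -> 11 bits each)."""
    bits = []
    for i in range(0, len(text) - 1, 2):
        pair = 45 * ALPHANUMERIC.index(text[i]) + ALPHANUMERIC.index(text[i + 1])
        bits.append(format(pair, "011b"))
    if len(text) % 2:
        bits.append(format(ALPHANUMERIC.index(text[-1]), "06b"))
    return "".join(bits)


hex_text = bytes([0xFF, 0x00]).hex().upper()   # two input bytes as hex
stream = alphanumeric_bits(hex_text)
print(hex_text, "->", len(stream), "bits for", len(hex_text) // 2, "bytes")
```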
        
           | dbaupp wrote:
           | (I've added an analysis of this and other bases to the
           | article:
           | https://huonw.github.io/blog/2024/03/qr-base10-base64/#fn:ot...)
        
             | Zamicol wrote:
             | RFC 3986 says that * is a sub-delim. It cannot be assumed
             | to be URI safe.
             | 
             | A base 38 alphabet is the maximal possible URI unreserved
             | alphabet.
        
       | ptramo wrote:
       | https://zat.is uses uppercase base32 for URL checksums, as
       | alphanumeric QR codes can contain 0-9, A-Z (upper-case only),
       | space, $, %, *, +, -, ., /, :. Overhead is only 10% (5.5 bits / 5
       | bits). All links fit in a 3333px image, margins included, so
       | little point in improving on that for URLs so short. The tradeoff
       | is that the checksum to URL mapping is stored in a backend and
       | networking is required to learn anything about the real URL.
        
       | YoshiRulz wrote:
       | Thanks to the author's previous post, I instantly recognised the
       | `eyJ` prefix as the start of a JSON object!
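
The reason the prefix is so recognisable: `{"` followed by any ASCII letter always base64-encodes to `eyJ`, because the third character depends only on the top two bits of the letter. A tiny check, using the first few bytes that the payload in this thread appears to start with:

```python
import base64

prefix = base64.b64encode(b'{"t":"').decode()
print(prefix)

# Any ASCII letter as the first character of the first key gives "eyJ".
letters = b"abcxyzABCXYZ"
all_eyj = all(
    base64.b64encode(b'{"' + bytes([c])).decode().startswith("eyJ")
    for c in letters
)
print(all_eyj)
```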
        
         | JadeNB wrote:
         | The previous post: [Mechanical sympathy for QR codes: making
         | NSW check-in
         | better](https://huonw.github.io/blog/2021/10/nsw-covid-qr).
        
       | planede wrote:
       | base10 can be awkward to work with for large data, one can also
       | consider:
       | 
       | base8 in numeric mode: 8 input bits -> 3 digits -> 10 output
       | bits, 25% overhead
       | 
       | base32 in alphanumeric mode: 5 input bits -> 1 character -> 5.5
       | output bits, 10% overhead
       | 
       | I would prefer base32 out of these too, but it's interesting that
       | even base8 beats base64 here.
        
         | dbaupp wrote:
         | Good point! I've added an analysis of this and other bases to
         | https://huonw.github.io/blog/2024/03/qr-base10-base64/#fn:ot...
        
           | planede wrote:
           | Doh! I don't know why I went with per-byte encoding for
           | octal. Yeah, you can do 11.1% overhead with base8, not much
           | worse than base32, surprisingly.
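
A sketch of the streaming variant mentioned here: instead of 3 octal digits per byte, treat the input as one bit stream and emit one digit per 3 bits, which is where the 11.1% figure comes from (3.33 output bits per 3 input bits). Padding with zero bits at the end is an assumption of this sketch.

```python
def bytes_to_octal_stream(data: bytes) -> str:
    """One octal digit per 3 input bits, zero-padding the final group."""
    bits = "".join(f"{b:08b}" for b in data)
    bits += "0" * (-len(bits) % 3)
    return "".join(str(int(bits[i:i + 3], 2)) for i in range(0, len(bits), 3))


def octal_stream_to_bytes(digits: str, length: int) -> bytes:
    """Inverse of bytes_to_octal_stream; the byte length must be known."""
    bits = "".join(f"{int(d):03b}" for d in digits)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, 8 * length, 8))


sample = b"hello"
encoded = bytes_to_octal_stream(sample)
print(len(sample), "bytes ->", len(encoded), "octal digits")
```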
        
       | sfmz wrote:
       | I had an idea to embed a webpage in a dataurl and convert that to
       | a QR code; the website would only exist if you snapped the QR
       | Code. I was dreaming of code-golf, demoscene, nft and weird
       | business card applications, but the web-browsers ruined my fun
       | because they won't display dataURL unless you manually copy/paste
       | it into the URL bar.
       | 
       | https://issues.chromium.org/issues/40502904
        
         | ptramo wrote:
         | Good idea. Built https://srv.us/d that does (edited):
         | <html><body><script>document.body.innerHTML =
         | decodeURI(window.location.hash.substring(1))</script></body></html>
         | 
         | So you can point to https://srv.us/d#<h1>Demo</h1>
        
           | sfmz wrote:
           | Almost, page doesn't have a body yet, so you get a null ref.
           | I thought of spinning it up or hosting on ipfs, but it still
           | won't live forever, somebody will lose interest and stop
           | paying the DNS costs or similar.
        
             | ptramo wrote:
             | Fixed, thanks! Yes, longevity is an issue with anything
             | online. On the other hand, in this case recovery is not a
             | huge issue for anybody technical enough.
        
               | sfmz wrote:
               | I was thinking at first it would be better if it takes
               | the entire webpage as a datauri instead of the raw html
               | like this basic business card template:
               | 
               | data:text/html;base64,PCFET0NUWVBFIGh0bWw+DQo8aHRtbCBsYW5
               | nPSJlbiI+DQogIDxoZWFkPg0KICAgIDxtZXRhIGNoYXJzZXQ9IlVURi04
               | IiAvPg0KICAgIDxtZXRhIG5hbWU9InZpZXdwb3J0IiBjb250ZW50PSJ3a
               | WR0aD1kZXZpY2Utd2lkdGgsIGluaXRpYWwtc2NhbGU9MS4wIiAvPg0KIC
               | AgIDx0aXRsZT5GbGV4IENhcmQ8L3RpdGxlPg0KICAgIDxzdHlsZT4NCiA
               | gICAgIGJvZHkgew0KICAgICAgICBmb250LWZhbWlseTogQXJpYWwsIHNh
               | bnMtc2VyaWY7DQogICAgICAgIG1hcmdpbjogMDsNCiAgICAgICAgcGFkZ
               | GluZzogMDsNCiAgICAgICAgYmFja2dyb3VuZC1jb2xvcjogI2YwZjBmMD
               | sNCiAgICAgICAgZGlzcGxheTogZmxleDsNCiAgICAgICAganVzdGlmeS1
               | jb250ZW50OiBjZW50ZXI7DQogICAgICAgIGFsaWduLWl0ZW1zOiBjZW50
               | ZXI7DQogICAgICAgIGhlaWdodDogMTAwdmg7DQogICAgICB9DQogICAgI
               | CAuY2FyZCB7DQogICAgICAgIGJhY2tncm91bmQtY29sb3I6ICNmZmY7DQ
               | ogICAgICAgIGJvcmRlci1yYWRpdXM6IDEwcHg7DQogICAgICAgIGJveC1
               | zaGFkb3c6IDAgMCAxMHB4IHJnYmEoMCwgMCwgMCwgMC4xKTsNCiAgICAg
               | ICAgcGFkZGluZzogMjBweDsNCiAgICAgICAgbWF4LXdpZHRoOiAzMDBwe
               | DsNCiAgICAgICAgdGV4dC1hbGlnbjogY2VudGVyOw0KICAgICAgfQ0KIC
               | AgICAgLmNhcmQgaW1nIHsNCiAgICAgICAgYm9yZGVyLXJhZGl1czogNTA
               | lOw0KICAgICAgICBtYXgtd2lkdGg6IDE1MHB4Ow0KICAgICAgICBtYXJn
               | aW4tYm90dG9tOiAyMHB4Ow0KICAgICAgfQ0KICAgICAgLmNhcmQgaDEge
               | w0KICAgICAgICBtYXJnaW4tYm90dG9tOiAxMHB4Ow0KICAgICAgfQ0KIC
               | AgICAgLmNhcmQgcCB7DQogICAgICAgIGNvbG9yOiAjNjY2Ow0KICAgICA
               | gfQ0KICAgIDwvc3R5bGU+DQogIDwvaGVhZD4NCiAgPGJvZHk+DQogICAg
               | PGRpdiBjbGFzcz0iY2FyZCI+DQogICAgICA8aDEgc3R5bGU9ImZvbnQtd
               | mFyaWFudDogc21hbGwtY2FwcyI+RWV5b3JlPC9oMT4NCiAgICAgIDxwPk
               | ltYWdpbmVlcjwvcD4NCiAgICAgIDxwPk5pbmJvIGZsb2F0aW5nIGNpdHk
               | sIHByZWZlY3R1cmUgOTwvcD4NCiAgICAgIDxwPigxMjMpIDQ1Ni03ODkw
               | PC9wPg0KICAgICAgPHA+am9obkBleGFtcGxlLmNvbTwvcD4NCiAgICA8L
               | 2Rpdj4NCiAgPC9ib2R5Pg0KPC9odG1sPg0K
               | 
               | but that's already about half the max bytes for a QR
               | Code, so maybe it's not really that interesting.
        
               | Gormo wrote:
               | There's no need to base64 encode HTML, since it's already
               | plaintext. You can just omit the encoding declaration in
               | the data URL and include the raw HTML, e.g.
               | 'data:text/html,<html><body><h1>Hello,
               | world!</h1></body></html>'. That should save a bit of
               | overhead.
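
A quick length comparison for the example above. This is only a sketch: it assumes the raw HTML can go into the URL unescaped, which will not hold for every character (for instance '#' and '%' would need percent-encoding).

```python
import base64

html = "<html><body><h1>Hello, world!</h1></body></html>"

b64_url = "data:text/html;base64," + base64.b64encode(html.encode()).decode()
raw_url = "data:text/html," + html

print(len(b64_url), "chars with base64")
print(len(raw_url), "chars raw")
```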
        
       | pimlottc wrote:
       | In an ideal world, the QR standard would include a specific URL
       | encoding scheme that exactly matches the URL-safe character set.
       | But I suppose there's no real practical way to make big changes
       | to the QR spec now, what with all the thousands of
       | implementations in the wild.
        
         | adrianmonk wrote:
         | I'd rather they had just done entropy coding. It's good at
         | using a smaller number of bits per character to represent a
         | subset of characters, which is what they're trying to do. But
         | it's more general, so you wouldn't be limited to only those
         | characters.
         | 
         | Huffman is probably simple enough. The typical approach is
         | adaptive Huffman, which doesn't compress the initial data very
         | well since it needs to adjust to actual character frequencies.
         | So that wouldn't work well for QR codes since they're short.
         | 
         | But you can start adaptive Huffman with a pre-agreed initial
         | tree (as static Huffman does), which would give good
         | compression from the start. There could be several standard
         | pre-agreed Huffman trees, and instead of using bits in the QR
         | code to select a character set, those bits could select one of
         | a few pre-agreed initial Huffman trees.
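
A minimal sketch of the "pre-agreed tree" part of this idea, using plain static Huffman coding (not the adaptive variant). The frequency table is entirely made up for illustration; a real standard would have to fix one from actual URL statistics.

```python
import heapq
from itertools import count


def build_code(freqs):
    """Build a Huffman code (symbol -> bit string) from a frequency table."""
    tiebreak = count()   # keeps heap entries comparable when weights tie
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in left.items()}
        merged.update({s: "1" + b for s, b in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical "pre-agreed" weights for URL-like text.
FREQS = {}
FREQS.update({ch: 8 for ch in "abcdefghijklmnopqrstuvwxyz"})
FREQS.update({ch: 4 for ch in "0123456789"})
FREQS.update({ch: 3 for ch in "/.:-_?=&%"})
FREQS.update({ch: 1 for ch in "ABCDEFGHIJKLMNOPQRSTUVWXYZ"})
CODE = build_code(FREQS)


def encode(text, code=CODE):
    return "".join(code[c] for c in text)


def decode(bits, code=CODE):
    inverse = {b: s for s, b in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:       # prefix-free, so the first match is right
            out.append(inverse[current])
            current = ""
    return "".join(out)


url = "https://example.com/a?id=42"
bits = encode(url)
print(len(bits), "bits vs", 8 * len(url), "bits in byte mode")
```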
        
       | planede wrote:
       | For dealing with larger data I would probably split the input
       | bits into 63 bit chunks, which can be encoded in 19 decimal
       | digits. 63 input bits turn into 19 digits which in turn is
       | encoded in 63.33... output bits on average. This has an overhead
       | of 0.53% instead of 0.34% of pure base10, which I think is
       | acceptable. But then you don't have to bring bignum libraries
       | into the picture, as each chunk fits into a 64bit integral type.
       | 
       | 64bit chunks are a little bit worse, with 4.16% overhead, so it
       | might be worth dealing with the little complexity of 63 bit
       | chunks.
       | 
       | I would also output the decimal digits in little-endian order.
       | 
       | edit: If you are willing to go for larger chunks then 93bit
       | chunks would be my next candidate, there the overhead is 0.36%,
       | barely more than pure base10's 0.34%. I don't think it's worth
       | going any higher.
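
A sketch of the 63-bit chunking described above, assuming zero-bit padding of the last chunk and big-endian digit order within each chunk (the comment suggests little-endian; that choice is not shown here). Python does not need the chunking to avoid bignums, so this only illustrates the layout.

```python
CHUNK_BITS = 63
CHUNK_DIGITS = 19   # 2**63 - 1 has 19 decimal digits


def encode_chunks(data: bytes) -> str:
    """Split the bit stream into 63-bit chunks, 19 decimal digits each."""
    bits = "".join(f"{b:08b}" for b in data)
    bits += "0" * (-len(bits) % CHUNK_BITS)
    return "".join(
        f"{int(bits[i:i + CHUNK_BITS], 2):0{CHUNK_DIGITS}d}"
        for i in range(0, len(bits), CHUNK_BITS)
    )


def decode_chunks(digits: str, length: int) -> bytes:
    """Inverse of encode_chunks; the original byte length must be known."""
    bits = "".join(
        f"{int(digits[i:i + CHUNK_DIGITS]):0{CHUNK_BITS}b}"
        for i in range(0, len(digits), CHUNK_DIGITS)
    )
    return bytes(int(bits[i:i + 8], 2) for i in range(0, 8 * length, 8))


sample = bytes(range(256))
encoded = encode_chunks(sample)
print(len(sample), "bytes ->", len(encoded), "digits")
```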
        
         | __s wrote:
         | 127 bit integers get 38 digits, which lines up well with 128
         | bit integers
        
           | leni536 wrote:
           | No, they get 39 decimal digits (1.7e38 is 39 decimal digits).
           | 127bit chunks would get you 2.36% overhead, which is not bad.
           | However, 93bit chunks can (barely) be encoded in 28 digits
           | (2^93 ~= 9.9e27) and it's more efficient at around 0.36%
           | overhead. So once you have 128 bit arithmetic, it's still not
           | worth using all or most of those bits per chunk, 93bit chunks
           | is the most efficient under 128 bits.
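
The chunk-size figures in this subthread can be reproduced with a short search. The sketch assumes numeric mode's average cost of 10/3 bits per digit and ignores the shorter final group of a segment.

```python
def chunk_overhead(bits: int) -> float:
    """Overhead of storing `bits` input bits as a fixed-width decimal chunk."""
    digits = len(str(2 ** bits - 1))          # digits needed for the largest value
    return digits * (10 / 3) / bits - 1


for k in (63, 64, 93, 127):
    print(f"{k:3d} bits -> {len(str(2 ** k - 1))} digits, "
          f"{chunk_overhead(k):.2%} overhead")

best = min(range(1, 129), key=chunk_overhead)
print("best chunk size up to 128 bits:", best)
```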
        
       | ingen0s wrote:
       | you had me at hello
        
       | buildsjets wrote:
       | I need to re-run the math based on this info, but a while back, I
       | wanted to figure out the maximum density of QR codes that could
       | be reliably printed on a sheet of plain paper with a laser
       | printer, then optically scanned and re-digitized. I recall the
       | answer was about the same as a double-density 5.25" floppy disk,
       | which is 320kb.
        
         | Zamicol wrote:
         | https://i.imgur.com/cAVbqka.png
         | 
         | Of course, pure binary (byte) encoding is best, but many
         | systems have the constraint of text characters or non-control
         | characters. With that constraint, alphanumeric encoding is
         | best.
         | 
         | https://zamicol.com/assets/11580064.pdf
        
       | JadeNB wrote:
       | Why does the title have a "'" that isn't in the document ("'10 >
       | 64, in QR Codes" versus "10 > 64, in QR Codes")?
        
         | dbaupp wrote:
         | Hacker News strips leading digits targeted at "listicles" (e.g.
         | "10 ways to fizz buzz" -> "Ways to fizz buzz"), so tricks are
         | required if the digits are actually important.
        
       | PanMan wrote:
       | Cool article. What I've wondered, and the article doesn't touch
       | on: In "normal" usage (not damaged QR codes), what's the best
       | error correction to use, with a fixed output size (eg a sticker)?
       | Using a higher level, results in more bytes, and thus a larger
       | QR, which, when printed, results in smaller details. Is it better
       | to have a low error correction, resulting in large blobs, or to
       | have higher error correction, resulting in smaller details, which
       | I guess will be harder to scan, but more room for correction?
        
         | master-lincoln wrote:
         | I guess this is a trade off that depends on your use case: from
         | which distance does the qr code need to be scannable, what
         | cameras do we expect to be used for scanning, how likely is
         | what kind of damage to parts of the qr code, ...
        
           | spamatica wrote:
           | Indeed. The local bus transit has a digital ticket system
           | with QR codes for tickets. I haven't actually tried decoding
           | the codes but just seeing them and interacting with them I
           | can tell they have gone WAY overboard with either the amount
           | of data they try to fit or the amount of error correction.
           | Probably both. They are nearly unscannable due to their size
           | and all the bus drivers just wave you along if you don't
           | manage to scan it.
        
         | dbaupp wrote:
         | Yeah, I had the same question! One of my earlier articles
         | experiments with this:
         | https://huonw.github.io/blog/2021/09/qr-error-correction/
         | 
         | Figure 8 and its surrounding section are the undamaged case.
        
           | derf_ wrote:
           | That was a nice read.
           | 
           |  _> I also tested only one background image, so the behaviour
           | may differ greatly with QR codes contained in different
           | surrounds._
           | 
           | This likely does not matter much. It could theoretically
           | affect binarization near the edges of the code (near module
           | boundaries, depending on how you did the resizing), but in
           | practice as long as the code itself is high-contrast, this is
           | unlikely. The more usual issue is that real images often do
           | not have a proper quiet zone around the code, but that is
           | mostly going to be irrelevant for what you are trying to test
           | here.
           | 
           |  _> The QR codes are generated to be perfectly rectangular
           | and aligned to the image pixel grid, which is unlikely to
           | happen in the real world._
           | 
           | This is a much bigger deal. A large source of decoding errors
           | for larger versions (for a fixed "field of view") is due to
           | alignment / sampling issues. A lot of work goes into trying
           | to find _where_ the code is in the image and identify the
           | grid pattern, and that is just inherently less reliable for
           | larger versions, particularly if there is projective
           | distortion (so the module size is not constant). The periodic
           | alignment patterns try to keep the number of parameters that
           | can be used to fit this grid roughly constant relative to the
           | number of modules in the grid, but locating those patterns is
           | itself error-prone and subject to false positives (they are
           | not nearly as unique-looking as finder patterns), and the
           | initial global transform estimate has to get pretty close for
           | them to work. I am actually happy that damaging these was not
           | causing you more trouble. This is definitely somewhere that
           | ZBar can be improved. It currently does not use the timing
           | patterns at all, for example. I'm not actually aware of an
           | open-source QR decoder that does.
           | 
           | (I'm the original author of ZBar's QR decoder)
        
             | dbaupp wrote:
             | Thanks for the kind words and the insight!
        
               | derf_ wrote:
               | A slightly more common way to express "field of view" is
               | "module pitch", measured in pixels between adjacent
               | module centers. I went back and tried to express the
               | numbers from Figure 6 as a module pitch, and I think it
               | works out to around 1.6 pixels / module. IIRC, the QR
               | code standard recommends a module pitch of at least 4
               | pixels. So it is nice that ZBar is able to do around 2.5x
               | better before running into issues (a margin that is a lot
               | bigger than the gains from higher EC levels).
               | 
               | In theory there could still be room for improvement.
               | Right now ZBar estimates finder and alignment pattern
               | locations to quarter-pel precision, but rounds each
               | module location to the nearest pixel so it can sample a
               | binarized version of the image to decide the value of
               | that module. At the extreme limits of small module pitch
               | this effectively turns the resampling filter in whatever
               | you are using to resize your QR code image into a nearest
               | neighbor filter. You can see why that would start to
               | cause issues. Imagine a version 7 code (45x45 modules)
               | sized to be 80x80 pixels. Most of your columns will be 2
               | pixels wide, but with binarization, somewhere in there
               | you have to have 10 columns that are only 1 pixel wide.
               | Good luck lining up your grid to hit all of them
               | perfectly (without looking at the timing pattern, which
               | would likely only help in the perfectly axis-aligned
               | case). Some kind of sub-pixel integration of the original
               | image before thresholding to decide each module value
               | could probably do better. That would make decoding a lot
               | more computationally expensive, though.
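
The 45-modules-in-80-pixels arithmetic can be checked directly. This sketch places module boundaries by flooring, which is a simplification of what a real resampling filter or decoder does.

```python
modules, pixels = 45, 80

# Pixel index where each module column starts, rounding boundaries down.
edges = [i * pixels // modules for i in range(modules + 1)]
widths = [b - a for a, b in zip(edges, edges[1:])]

print("columns 2 px wide:", widths.count(2))
print("columns 1 px wide:", widths.count(1))
print("module pitch:", pixels / modules, "px")
```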
        
       | chpatrick wrote:
       | Didn't the EU covid passport decide to use the text encoding mode
       | because it's the only one that scanners supported reliably?
        
       | pclmulqdq wrote:
       | I'm not sure if anyone uses Base36 any more (or its more obscure
       | sister, Base32), but it uses [0-9, A-Z] as its alphabet. It is
       | URL safe and also smaller than base 10 in character count for
       | each number, and is the smallest standard URL-safe encoding that
       | works with alphanumeric QR codes.
       | 
       | I sort of assumed this was common knowledge, but I guess not.
        
         | 91bananas wrote:
         | Tooling is probably what dictates this more than anything.
         | atob() is everywhere.
        
           | knallfrosch wrote:
           | Yeah, I don't get it. Assume I have a standard URL with query
           | params, the web browser doesn't understand the decimal
           | encoding - right?
           | 
           | Let's assume... this:
           | https://news.ycombinator.com/reply?id=39907672&goto=item%3Fi...
           | 
           | The special encoding is just about sending data to the
           | backend?
        
         | dbaupp wrote:
         | I implicitly ignored encoding schemes like base 36 and 32 (and
         | 16, referenced elsewhere in the thread) because they're not as
         | good as the schemes referenced in the post. The best you can
         | get that's fully URL safe with Alphanumeric is a hypothetical
         | base 39, referenced in a footnote, and only using 39 of the 45
         | possible characters has 3.9% overhead (even ignoring the 50%
         | overhead of the https://www.rfc-editor.org/rfc/rfc9285.html
         | encoding).
         | 
         | I've added an analysis of many more bases to the article:
         | https://huonw.github.io/blog/2024/03/qr-base10-base64/#fn:ot...
        
         | jbaber wrote:
         | I independently discovered base36 for a personal project
         | recently and was very happy to have an explanation for why
         | Python's base conversion goes up to 36.
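
For reference, Python's built-in `int(text, 36)` handles the decoding direction, but there is no built-in encoder, so a small helper is needed. A sketch, with an illustrative function name:

```python
import string

ALPHABET = string.digits + string.ascii_uppercase   # 0-9 then A-Z


def to_base36(n: int) -> str:
    """Render a non-negative integer using the digits 0-9 and A-Z."""
    if n < 0:
        raise ValueError("negative numbers are not supported")
    if n == 0:
        return "0"
    out = []
    while n:
        n, remainder = divmod(n, 36)
        out.append(ALPHABET[remainder])
    return "".join(reversed(out))


value = 2 ** 128 - 1   # largest 128-bit value, e.g. a UUID
print(len(to_base36(value)), "base36 chars vs", len(str(value)), "decimal digits")
```

Fewer characters does not mean fewer QR bits, though: alphanumeric mode spends 5.5 bits per character against about 3.33 bits per digit in numeric mode.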
        
       | FullyFunctional wrote:
       | This is fascinating, but I was curious about the last two QR
       | codes. The left one is scannable on my iPhone (iOS 17.4.1)
       | leading to http://example.com/AAE..._w8fL whereas the one on the
       | right gets only http://example.com (both Safari and Firefox). Is
       | this an iOS URL length limitation?
        
         | dbaupp wrote:
         | Good catch! I should've tested. I've added a paragraph to
         | https://huonw.github.io/blog/2024/03/qr-base10-base64/#extre...
         | about this.
        
         | capitainenemo wrote:
         | You probably know this, but Firefox and Chrome don't have the
         | freedom to run their own browser engines on iOS and are little
         | more than browser skins around the webkit core, so listing
         | multiple browsers for a web issue on iOS usually doesn't mean
         | much.
        
       | bytecodes wrote:
       | 1. Pretty neat to switch encoding in the middle of the URL. It
       | does look like it works and it does look like a better encoding.
       | This is cool.
       | 
       | 2. I'd have called this base-1000. It's using 3-digit numbers
       | encoded into 10 bits. Base64 doesn't encode into 64 bits, it uses
       | 64 characters encoded into 6 bits. And this encoding uses 000 to
       | 999, encoded into 10 bits. But that messes up the title when you
       | compare apples to apples, 1000 > 64 is just obvious and true.
        
         | dbaupp wrote:
         | The base 10 is referring to conversion of bytes into a long
         | decimal (base 10) integer, not that it's being stored in chunks
         | of 10 bits.
         | 
         | But yes, you're right, it would be reasonable to think of this
         | as encoding the bytes in base 1000, where each "digit" just
         | happens to be shown to humans as 3 digits.
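
The "base 1000" view matches how numeric mode packs its data: each group of three digits is stored as a 10-bit number, with 7 bits for a trailing pair and 4 bits for a trailing single digit. A sketch producing only the data bits (no mode indicator or count field):

```python
def numeric_mode_bits(digits: str) -> str:
    """Data bits for a numeric-mode segment: 3 digits -> 10 bits."""
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("numeric mode only accepts the digits 0-9")
    widths = {3: 10, 2: 7, 1: 4}
    out = []
    for i in range(0, len(digits), 3):
        group = digits[i:i + 3]
        out.append(format(int(group), f"0{widths[len(group)]}b"))
    return "".join(out)


print(numeric_mode_bits("01234567"))
```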
        
       | JoshMandel wrote:
       | We used essentially this technique in the SMART Health Cards
       | specification for vaccine and lab result QRs.
       | 
       | https://spec.smarthealth.cards/#encoding-qrs
       | 
       | It's well supported by scanners but can create unwieldy values
       | for users to copy/paste.
       | 
       | For more recent work with dynamic content (and the assumption
       | that a web server is involved in the flow), we're just limiting
       | the payload size and using ordinary byte mode
       | (https://docs.smarthealthit.org/smart-health-links/spec)
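
As I read the spec linked above, the SMART Health Cards numeric encoding works per character rather than on one big integer: each character of the JWS becomes two decimal digits, `ord(c) - 45`. A sketch under that reading (function names are illustrative, and the `shc:/` prefix and chunking are left out):

```python
def shc_numeric(jws: str) -> str:
    """Two decimal digits per character: ord(c) - 45, zero-padded."""
    return "".join(f"{ord(c) - 45:02d}" for c in jws)


def shc_decode(digits: str) -> str:
    """Inverse of shc_numeric."""
    return "".join(
        chr(int(digits[i:i + 2]) + 45) for i in range(0, len(digits), 2)
    )


print(shc_numeric("eyJ"))
```

This spends about 6.67 bits per base64url character instead of 8 in byte mode, which is simpler than the big-integer conversion but less dense.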
        
       | richardkiss wrote:
       | I discovered the same thing when I was writing a tool to
       | transmit data to a radio-free (no wifi or Bluetooth) airgapped
       | computer and created a de-facto standard called "qrint". The
       | comment in this file has enough text for a blog post.
       | 
       | https://github.com/Chia-Network/hsms/blob/main/hsms/util/qri...
       | 
       | Anyone who wants to use this, feel free.
        
       | Zamicol wrote:
       | I built an open source tool to specifically work on this problem:
       | https://convert.zamicol.com
       | 
       | We know only two open source JS projects that even support
       | alphanumeric, Nayuki and Paul's.
       | https://github.com/Cyphrme/QRGenJS. We have it hosted here:
       | https://cyphr.me/qrgen
       | 
       | I've also done a lot of work on this problem:
       | https://image-ppubs.uspto.gov/dirsearch-public/print/downloa...
       | 
       | Also, regarding alphanumeric, RFC 3986 states that:
       | 
       | > An implementation should accept uppercase letters as equivalent
       | to lowercase in scheme names (e.g., allow "HTTP" as well as
       | "http")
        
       ___________________________________________________________________
       (page generated 2024-04-03 23:02 UTC)