[HN Gopher] 10 > 64, in QR Codes
___________________________________________________________________
10 > 64, in QR Codes
Author : yvan
Score : 326 points
Date : 2024-04-01 13:48 UTC (2 days ago)
(HTM) web link (huonw.github.io)
(TXT) w3m dump (huonw.github.io)
| komlan wrote:
| This is particularly useful for numeric data that is usually
| displayed in hex, like UUIDs [1]
|
| I used this for digital QR code tickets [2], and it made the
| codes so much easier to scan, even with bad lighting.
|
| [1] https://news.ycombinator.com/item?id=39094251
|
| [2]
| https://workspace.google.com/marketplace/app/qr_code_ticket_...
| zygentoma wrote:
| qrencode -t UTF8 https://www.service.nsw.gov.au/campaign/service-
| nsw-mobile-app?data=eyJ0IjoiY292aWQxOV9idXNpbmVzcyIsImJpZCI6IjEyM
| TMyMSIsImJuYW1lIjoiVGVzdCBOU1cgR292ZXJubWVudCBRUiBjb2RlIiwiYmFkZH
| Jlc3MiOiJCdXNpbmVzcyBhZGRyZXNzIGdvZXMgaGVyZSAifQ==
|
| vs qrencode -t UTF8
| https://www.service.nsw.gov.au/campaign/service-nsw-mobile-app?da
| ta=07268568088551018982199489257790063821578941925846323948853349
| 92789559112405122791116333362867370890083842930669319743113055333
| 37894591404330656702603998035920596585517131555967430155259257402
| 71167169927643240820915139763817497440984288389845652728902601340
| 4155725275860173673194594939
|
| The latter one is actually smaller. TIL
| pimlottc wrote:
| Note that the URL also has to be encoded in two segments, so
| that the decimal part can use a more efficient QR encoding than
| the alphanumeric base URL.
|
| I'm not sure that qrencode CLI tool will automatically do this
| for you.
|
| > In a URL, the rest of the URL is not purely numeric, so
| actually seeing the benefits of this encoding requires using
| two segments:
|
| > * one with the "boring" bits of the URL at the start, likely
| using the Binary mode
|
| > * one with the big blob of base10 data, using the Numeric
| mode
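
A rough sketch of why the two-segment split quoted above pays off. It assumes the usual QR segment layout (a 4-bit mode indicator, a character count field, then the data) and the count-field widths for versions 10 to 26 (12 bits for Numeric, 16 for Byte). The function names, the URL prefix, and the 100-byte payload size are hypothetical.

```python
# Data bits per segment: 4-bit mode indicator + character count field + payload.
# Count-field widths are those for QR versions 10-26 (an assumption of this sketch).
COUNT_BITS = {"numeric": 12, "byte": 16}

def numeric_bits(n_digits: int) -> int:
    # 3 digits -> 10 bits; a trailing group of 2 digits -> 7 bits, 1 digit -> 4 bits.
    return 10 * (n_digits // 3) + (0, 4, 7)[n_digits % 3]

def byte_bits(n_bytes: int) -> int:
    return 8 * n_bytes

def segment_bits(mode: str, length: int) -> int:
    payload = numeric_bits(length) if mode == "numeric" else byte_bits(length)
    return 4 + COUNT_BITS[mode] + payload

prefix = "https://example.com/app?data="   # hypothetical URL prefix
payload_bytes = 100                          # hypothetical payload size

# Option 1: base64 payload, whole URL in a single Byte segment.
b64_len = 4 * ((payload_bytes + 2) // 3)     # padded base64 length
one_segment = segment_bits("byte", len(prefix) + b64_len)

# Option 2: Byte segment for the prefix, Numeric segment for a decimal payload.
dec_len = len(str(2 ** (8 * payload_bytes) - 1))   # digits needed for the largest value
two_segments = segment_bits("byte", len(prefix)) + segment_bits("numeric", dec_len)

print("single Byte segment:", one_segment, "bits")
print("Byte + Numeric segments:", two_segments, "bits")
```

The second segment costs an extra mode indicator and count field, but the denser Numeric payload more than makes up for it at this payload size.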
| lifthrasiir wrote:
| > I'm not sure that qrencode CLI tool will automatically do
| this for you.
|
| If I'm looking at the correct repository, it does [1].
|
| [1]
| https://github.com/fukuchi/libqrencode/blob/master/split.c
| pimlottc wrote:
| You're right! I briefly searched the code but I missed
| seeing that since they don't use the term "segment".
| MrBuddyCasino wrote:
| We have a similar problem at work right now, but due to different
| constraints we've settled on Base85. Slightly denser than Base64,
| but still just plain old printable ASCII characters and the
| following characters are still "free" so one can use them as
| field delimiters in a CSV-style format: "',/[]\
|
| Incidentally, this also makes them JSON-Safe.
|
| Base94 uses all printable characters, and Base122 uses both
| printable characters and whitespace.
|
| UUIDs encoded in various alphabets:
|
|   len  algo             value
|   24   Base64 padded    wScmB8cVS/K05Wk+nORR8Q==
|   22   Base64 unpadded  osnQ3DUDTDuUQBc9mBRYFw
|   20   Base85           rHoLuTk%W0fgpY+`c>xc
|   20   Base94           d(+H"Q/hP}i}d9<KeAt)%
|   18   Base122          @#FoALt`92vSt@
| Zamicol wrote:
| I'm not following as Base85 isn't JSON safe. For example, { and
| } carry meaning in JSON.
| pimlottc wrote:
| This is really great, I didn't know you could switch encoding
| schemes within the same QR code. There's a nifty visualization
| tool [0] that shows how this can reduce QR code sizes. It can
| determine the optimal segmentation strategy for any string and
| display a color-code version with statistics. Very nice!
|
| 0: https://www.nayuki.io/page/optimal-text-segmentation-for-
| qr-...
| nick238 wrote:
| Seems like the lede was buried in the article; I know a bit
| about QR codes: there's different modes for alphanum, binary,
| kanji, etc, and error correcting capacity...but being able to
| switch character sets in the middle was new to me.
| pclmulqdq wrote:
| I am not entirely sure why you would want to switch encodings
| for URLs, personally. If you use alphanumeric encoding and a
| URL in Base36, you are pretty much information-theoretically
| optimal.
| planede wrote:
| > you are pretty much information-theoretically optimal
|
| base36 with alphanumeric mode encoding has around 6.38%
| overhead compared to base10's 0.34% overhead in numeric
| mode. So numeric mode gets you closer to optimal.
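
The overhead figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch; the helper name `overhead` is made up for illustration, and it only counts payload bits (no mode indicator or count field).

```python
import math

def overhead(bits_per_symbol: float, alphabet_size: int) -> float:
    """Fractional overhead of spending bits_per_symbol QR bits on a symbol
    that carries log2(alphabet_size) bits of data."""
    return bits_per_symbol / math.log2(alphabet_size) - 1

# Alphanumeric mode: 2 characters per 11 bits, i.e. 5.5 bits per character.
base36_alnum = overhead(11 / 2, 36)
# Numeric mode: 3 digits per 10 bits.
base10_numeric = overhead(10 / 3, 10)
# Byte mode carrying base64: 8 bits per character, 6 bits of data each.
base64_byte = overhead(8, 64)

for name, value in [("base36 in alphanumeric", base36_alnum),
                    ("base10 in numeric", base10_numeric),
                    ("base64 in byte", base64_byte)]:
    print(f"{name}: {value:.2%}")
```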
| daxterspeed wrote:
| The issue is that QR's alphanumeric segments are uppercase
| only, and while browsers will automatically lowercase the
| protocol and domain name, you'll have to either have all
| your paths be uppercase or automatically lowercase paths.
| On top of that when someone scans the code it will likely
| be presented with an uppercase URL (if it doesn't
| automatically open in a browser) and that should alert
| anyone that doesn't already know that uppercase domains are
| equivalent to lowercase domains.
|
| Ideally QR codes would have had a segment to encode URIs
| more efficiently (73-82 characters depending on how the
| implementation decided to handle the "unreserved marks"),
| but that ship has long sailed.
| pclmulqdq wrote:
| Many QR code readers will auto-lowercase URLs that are
| encoded in alphanumeric encoding. The rest will recognize
| uppercase URLs just fine. Alphanumeric encoding was
| basically made for URLs.
| pimlottc wrote:
| The QR alphanumeric input encoding does not include basic
| URL query string characters like '?' '&' '='
| djbusby wrote:
| I've been putting URL in QR for like a decade, mixed case
| and query string included. How has it never been an
| issue?
| IAmLiterallyAB wrote:
| Because you used bytes mode, not alphanumeric mode
| zamfi wrote:
| Speaking of visualization...that last figure in this post is
| super interesting in part because you can actually _see_ some
| of the redundancy in the base64 encoding on the left, in the
| patterns of vertical lines.
|
| In general, better compression means output that looks more
| like "randomness"--any redundancy implies there was room for
| more compression--and that figure makes this quite clear
| visually!
| klodolph wrote:
| That's undoubtedly some redundancy in the underlying data,
| not in the encoding itself.
| dbaupp wrote:
| Yes, the data is the bytes 00, 01, ..., FF repeating, and
| that pattern is highly visible with power-of-2 encodings,
| but not visible with other bases (for similar reasons that
| 0.1 as a (binary) float doesn't behave as people expect).
| londons_explore wrote:
| Or... Don't encode data in the URL at all. If your data isn't
| secret or per-user, have it go to https://yoursite.com/gh. If it
| is security sensitive, go to https://yoursite.com/Qhm4Qr55mS
|
| 2 alphanumerics (=4000 links) is plenty to encode a link to all
| the major pages of your website/service you may want to
| advertise. 10 alphanumerics (=10^18) is plenty that even if every
| person in the world had a QR code, nobody could guess one before
| hitting your rate limiter.
|
| The user experience gained by fast reliable scanning is far
| greater than that enabled by slightly improved offline support
| (offline functionality requires that the user already has your
| app installed, and in that case, you could preload any codes that
| user had access to).
| pimlottc wrote:
| As the article mentions, they need to include the data so that
| the app could work offline, at least to some degree.
| chrisfinazzo wrote:
| To play devil's advocate for a moment...
|
| Wouldn't this break Deep/Universal links which send a user
| directly to a specific location within an app?
|
| I get that there are potential security/privacy concerns, but
| if you are in full control of URL schemes, isn't that the purpose
| of this feature?
| nneonneo wrote:
| In the case of vaccine cards, which the OP uses as the case
| study, it's better to have the entire card offline for both
| privacy and offline use purposes.
| Zamicol wrote:
| Or just go directly to 2^256 and have enough links for every
| atom in the observable universe.
|
| More importantly, it's enough links that at the Landauer limit
| a collision can't happen without consuming ~300,000 solar
| systems of energy, vastly beyond human technological ability.
| With this property, each link can also be considered private.
| Karellen wrote:
| I'm not that familiar with QR codes. Anyone know how
| base16/hexadecimal encoding with 0-9A-F fares in comparison? It
| seems like an obvious encoding to test, especially for simplicity
| of implementation compared to base64 and base10, and an odd one
| to miss for comparison?
| komlan wrote:
| Hex is worse, see here [1] for UUIDs
|
| [1] https://news.ycombinator.com/item?id=39094251
| Zamicol wrote:
| I'm not confident of the math there.
|
| https://i.imgur.com/cAVbqka.png
|
| Because of quirks, in edge cases decimal is more efficient,
| but overall alphanumeric is better in QR code.
| komlan wrote:
| Ah, I was assuming _numeric_ data rendered as hex, like
| UUIDs. Decimal works wonders for those, because the numeric
| mode of QR codes is the most efficient.
| pimlottc wrote:
| The QR standard does not have a specific encoding mode [0] for
| hexadecimal, so it would have to use alphanumeric. Since you'd
| only be using 16 out of 45 possible characters, it would be
| much less efficient.
|
| 0: https://en.wikipedia.org/wiki/QR_code#Information_capacity
| Karliss wrote:
| Most compact QR encoding capable of representing hex symbols is
| alphanumeric mode which requires 5.5 bits per character. Which
| means the output will be 5.5/4 = 1.375 times longer than
| encoded binary data or 37.5% overhead. That's even worse than
| 8/6 = 1.33 you get for doing base64 encoding on top of byte
| mode.
| dbaupp wrote:
| Ah, it is a good point that it might be worth comparing to, but
| it is far worse.
|
| Abstractly, it requires approximately log(45)/log(16) output
| bits per input bit, an overhead of 37%.
|
| Making this more concrete: each input byte is encoded as two
| hex digits, and two hex digits have to be encoded as two
| Alphanumeric characters. It thus takes 11 bits in the QR code
| bit stream to store 8 bits of input.
| dbaupp wrote:
| (I've added an analysis of this and other bases to the
| article: https://huonw.github.io/blog/2024/03/qr-
| base10-base64/#fn:ot...)
| Zamicol wrote:
| RFC 3986 says that * is a sub-delim. It cannot be assumed
| to be URI safe.
|
| A base 38 alphabet is the maximal possible URI unreserved
| alphabet.
| ptramo wrote:
| https://zat.is uses uppercase base32 for URL checksums, as
| alphanumeric QR codes can contain 0-9, A-Z (upper-case only),
| space, $, %, *, +, -, ., /, :. Overhead is only 10% (5.5 bits / 5
| bits). All links fit in a 3333px image, margins included, so
| little point in improving on that for URLs so short. The tradeoff
| is that the checksum to URL mapping is stored in a backend and
| networking is required to learn anything about the real URL.
| YoshiRulz wrote:
| Thanks to the author's previous post, I instantly recognised the
| `eyJ` prefix as the start of a JSON object!
| JadeNB wrote:
| The previous post: [Mechanical sympathy for QR codes: making
| NSW check-in better](https://huonw.github.io/blog/2021/10/nsw-
| covid-qr).
| planede wrote:
| base10 can be awkward to work with for large data, one can also
| consider:
|
| base8 in numeric mode: 8 input bits -> 3 digits -> 10 output
| bits, 25% overhead
|
| base32 in alphanumeric mode: 5 input bits -> 1 character -> 5.5
| output bits, 10% overhead
|
| I would prefer base32 out of these too, but it's interesting that
| even base8 beats base64 here.
| dbaupp wrote:
| Good point! I've added an analysis of this and other bases to
| https://huonw.github.io/blog/2024/03/qr-base10-base64/#fn:ot...
| planede wrote:
| Doh! I don't know why I went with per-byte encoding for
| octal. Yeah, you can do 11.1% overhead with base8, not much
| worse than base32, surprisingly.
| sfmz wrote:
| I had an idea to embed a webpage in a dataurl and convert that to
| a QR code; the website would only exist if you snapped the QR
| Code. I was dreaming of code-golf, demoscene, nft and weird
| business card applications, but the web-browsers ruined my fun
| because they won't display dataURL unless you manually copy/paste
| it into the URL bar.
|
| https://issues.chromium.org/issues/40502904
| ptramo wrote:
| Good idea. Built https://srv.us/d that does (edited):
| <html><body><script>document.body.innerHTML =
| decodeURI(window.location.hash.substring(1))</script></body></html>
|
| So you can point to https://srv.us/d#<h1>Demo</h1>
| sfmz wrote:
| Almost, page doesn't have a body yet, so you get a null ref.
| I thought of spinning it up or hosting on ipfs, but it still
| won't live forever, somebody will lose interest and stop
| paying the DNS costs or similar.
| ptramo wrote:
| Fixed, thanks! Yes, longevity is an issue with anything
| online. On the other hand, in this case recovery is not a
| huge issue for anybody technical enough.
| sfmz wrote:
| I was thinking at first it would be better if it takes
| the entire webpage as a datauri instead of the raw html
| like this basic business card template:
|
| data:text/html;base64,PCFET0NUWVBFIGh0bWw+DQo8aHRtbCBsYW5
| nPSJlbiI+DQogIDxoZWFkPg0KICAgIDxtZXRhIGNoYXJzZXQ9IlVURi04
| IiAvPg0KICAgIDxtZXRhIG5hbWU9InZpZXdwb3J0IiBjb250ZW50PSJ3a
| WR0aD1kZXZpY2Utd2lkdGgsIGluaXRpYWwtc2NhbGU9MS4wIiAvPg0KIC
| AgIDx0aXRsZT5GbGV4IENhcmQ8L3RpdGxlPg0KICAgIDxzdHlsZT4NCiA
| gICAgIGJvZHkgew0KICAgICAgICBmb250LWZhbWlseTogQXJpYWwsIHNh
| bnMtc2VyaWY7DQogICAgICAgIG1hcmdpbjogMDsNCiAgICAgICAgcGFkZ
| GluZzogMDsNCiAgICAgICAgYmFja2dyb3VuZC1jb2xvcjogI2YwZjBmMD
| sNCiAgICAgICAgZGlzcGxheTogZmxleDsNCiAgICAgICAganVzdGlmeS1
| jb250ZW50OiBjZW50ZXI7DQogICAgICAgIGFsaWduLWl0ZW1zOiBjZW50
| ZXI7DQogICAgICAgIGhlaWdodDogMTAwdmg7DQogICAgICB9DQogICAgI
| CAuY2FyZCB7DQogICAgICAgIGJhY2tncm91bmQtY29sb3I6ICNmZmY7DQ
| ogICAgICAgIGJvcmRlci1yYWRpdXM6IDEwcHg7DQogICAgICAgIGJveC1
| zaGFkb3c6IDAgMCAxMHB4IHJnYmEoMCwgMCwgMCwgMC4xKTsNCiAgICAg
| ICAgcGFkZGluZzogMjBweDsNCiAgICAgICAgbWF4LXdpZHRoOiAzMDBwe
| DsNCiAgICAgICAgdGV4dC1hbGlnbjogY2VudGVyOw0KICAgICAgfQ0KIC
| AgICAgLmNhcmQgaW1nIHsNCiAgICAgICAgYm9yZGVyLXJhZGl1czogNTA
| lOw0KICAgICAgICBtYXgtd2lkdGg6IDE1MHB4Ow0KICAgICAgICBtYXJn
| aW4tYm90dG9tOiAyMHB4Ow0KICAgICAgfQ0KICAgICAgLmNhcmQgaDEge
| w0KICAgICAgICBtYXJnaW4tYm90dG9tOiAxMHB4Ow0KICAgICAgfQ0KIC
| AgICAgLmNhcmQgcCB7DQogICAgICAgIGNvbG9yOiAjNjY2Ow0KICAgICA
| gfQ0KICAgIDwvc3R5bGU+DQogIDwvaGVhZD4NCiAgPGJvZHk+DQogICAg
| PGRpdiBjbGFzcz0iY2FyZCI+DQogICAgICA8aDEgc3R5bGU9ImZvbnQtd
| mFyaWFudDogc21hbGwtY2FwcyI+RWV5b3JlPC9oMT4NCiAgICAgIDxwPk
| ltYWdpbmVlcjwvcD4NCiAgICAgIDxwPk5pbmJvIGZsb2F0aW5nIGNpdHk
| sIHByZWZlY3R1cmUgOTwvcD4NCiAgICAgIDxwPigxMjMpIDQ1Ni03ODkw
| PC9wPg0KICAgICAgPHA+am9obkBleGFtcGxlLmNvbTwvcD4NCiAgICA8L
| 2Rpdj4NCiAgPC9ib2R5Pg0KPC9odG1sPg0K
|
| but that's already about half the max bytes for a QR
| Code, so maybe it's not really that interesting.
| Gormo wrote:
| There's no need to base64 encode HTML, since it's already
| plaintext. You can just omit the encoding declaration in
| the data URL and include the raw HTML, e.g.
| 'data:text/html,<html><body><h1>Hello,
| world!</h1></body></html>'. That should save a bit of
| overhead.
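
A small sketch of the size difference between the two data URL forms. The set of characters left unescaped (`safe=`) is an assumption about what a browser will accept in a pasted data URL, not something taken from a specification, and the saving depends on the content: HTML with many characters that need percent-encoding can end up larger than its base64 form.

```python
import base64
from urllib.parse import quote

html = "<html><body><h1>Hello, world!</h1></body></html>"

# Base64 form: always 4 output characters per 3 input bytes.
b64_url = "data:text/html;base64," + base64.b64encode(html.encode("utf-8")).decode("ascii")

# Plain form: percent-encode everything outside the assumed safe set below.
# Characters such as '#', '%' and spaces are still escaped.
plain_url = "data:text/html," + quote(html, safe="<>/=!,;:'()")

print(len(b64_url), b64_url)
print(len(plain_url), plain_url)
```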
| pimlottc wrote:
| In an ideal world, the QR standard would include a specific URL
| encoding scheme that exactly matches the URL-safe character set.
| But I suppose there's no real practical way to make big changes
| to the QR spec now, what with all the thousands of
| implementations in the wild.
| adrianmonk wrote:
| I'd rather they had just done entropy coding. It's good at
| using a smaller number of bits per character to represent a
| subset of characters, which is what they're trying to do. But
| it's more general, so you wouldn't be limited to only those
| characters.
|
| Huffman is probably simple enough. The typical approach is
| adaptive Huffman, which doesn't compress the initial data very
| well since it needs to adjust to actual character frequencies.
| So that wouldn't work well for QR codes since they're short.
|
| But you can start adaptive Huffman with a pre-agreed initial
| tree (as static Huffman does), which would give good
| compression from the start. There could be several standard
| pre-agreed Huffman trees, and instead of using bits in the QR
| code to select a character set, those bits could select one of
| a few pre-agreed initial Huffman trees.
| planede wrote:
| For dealing with larger data I would probably split the input
| bits into 63 bit chunks, which can be encoded in 19 decimal
| digits. 63 input bits turn into 19 digits which in turn is
| encoded in 63.33... output bits on average. This has an overhead
| of 0.53% instead of 0.34% of pure base10, which I think is
| acceptable. But then you don't have to bring bignum libraries
| into the picture, as each chunk fits into a 64bit integral type.
|
| 64bit chunks are a little bit worse, with 4.16% overhead, so it
| might be worth dealing with the little complexity of 63 bit
| chunks.
|
| I would also output the decimal digits in little-endian order.
|
| edit: If you are willing to go for larger chunks then 93bit
| chunks would be my next candidate, there the overhead is 0.36%,
| barely more than pure base10's 0.34%. I don't think it's worth
| going any higher.
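
A minimal sketch of the 63-bit chunking idea, under a few assumptions: bits are taken in big-endian order, the last chunk is padded with zero bits, every chunk is written as exactly 19 digits, and the decoder knows the original byte length. It does not use the little-endian digit order suggested above. The function names are hypothetical.

```python
CHUNK_BITS = 63
CHUNK_DIGITS = 19  # 2**63 - 1 < 10**19, so every 63-bit value fits in 19 digits

def encode_chunks(data: bytes) -> str:
    """Encode bytes as decimal digits, 19 digits per 63-bit chunk."""
    digits = []
    acc = 0        # bit accumulator; holds at most 62 + 8 bits at any time
    acc_bits = 0
    for byte in data:
        acc = (acc << 8) | byte
        acc_bits += 8
        if acc_bits >= CHUNK_BITS:
            extra = acc_bits - CHUNK_BITS
            digits.append(f"{acc >> extra:019d}")   # emit the top 63 bits
            acc &= (1 << extra) - 1                  # keep the leftover low bits
            acc_bits = extra
    if acc_bits:
        # Final partial chunk: pad with zero bits on the right.
        digits.append(f"{acc << (CHUNK_BITS - acc_bits):019d}")
    return "".join(digits)

def decode_chunks(text: str, n_bytes: int) -> bytes:
    """Inverse of encode_chunks; n_bytes is the original length."""
    out = bytearray()
    acc = 0
    acc_bits = 0
    for i in range(0, len(text), CHUNK_DIGITS):
        acc = (acc << CHUNK_BITS) | int(text[i:i + CHUNK_DIGITS])
        acc_bits += CHUNK_BITS
        while acc_bits >= 8 and len(out) < n_bytes:
            acc_bits -= 8
            out.append((acc >> acc_bits) & 0xFF)
            acc &= (1 << acc_bits) - 1
    return bytes(out)

payload = bytes(range(256))
encoded = encode_chunks(payload)
print(len(payload), "bytes ->", len(encoded), "digits")
```

Python integers are unbounded, so the accumulator is only for illustration; in a language with fixed-width integers each emitted chunk still fits in a 64-bit type, but the accumulator would need slightly more care.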
| __s wrote:
| 127 bit integers get 38 digits, which lines up well with 128
| bit integers
| leni536 wrote:
| No, they get 39 decimal digits (1.7e38 is 39 decimal digits).
| 127bit chunks would get you 2.36% overhead, which is not bad.
| However, 93bit chunks can (barely) be encoded in 28 digits
| (2^93 ~= 9.9e27) and it's more efficient at around 0.36%
| overhead. So once you have 128 bit arithmetic, it's still not
| worth using all or most of those bits per chunk, 93bit chunks
| is the most efficient under 128 bits.
| ingen0s wrote:
| you had me at hello
| buildsjets wrote:
| I need to re-run the math based on this info, but a while back, I
| wanted to figure out the maximum density of QR codes that could
| be reliably printed on a sheet of plain paper with a laser
| printer, then optically scanned and re-digitized. I recall the
| answer was about the same as a double-density 5.25" floppy disk,
| which is 320kb.
| Zamicol wrote:
| https://i.imgur.com/cAVbqka.png
|
| Of course, pure binary (byte) encoding is best, but many
| systems have the constraint of text characters or non-control
| characters. With that constraint, alphanumeric encoding is
| best.
|
| https://zamicol.com/assets/11580064.pdf
| JadeNB wrote:
| Why does the title have a "'" that isn't in the document ("'10 >
| 64, in QR Codes" versus "10 > 64, in QR Codes")?
| dbaupp wrote:
| Hacker News strips leading digits targeted at "listicles" (e.g.
| "10 ways to fizz buzz" -> "Ways to fizz buzz"), so tricks are
| required if the digits are actually important.
| PanMan wrote:
| Cool article. What I've wondered, and the article doesn't touch
| on: In "normal" usage (not damaged QR codes), what's the best
| error correction to use, with a fixed output size (eg a sticker)?
| Using a higher level results in more bytes, and thus a larger
| QR, which, when printed, results in smaller details. Is it better
| to have a low error correction, resulting in large blobs, or to
| have higher error correction, resulting in smaller details, which
| I guess will be harder to scan, but more room for correction?
| master-lincoln wrote:
| I guess this is a trade off that depends on your use case: from
| which distance does the qr code need to be scannable, what
| cameras do we expect to be used for scanning, how likely is
| what kind of damage to parts of the qr code, ...
| spamatica wrote:
| Indeed. The local bus transit has a digital ticket system
| with QR codes for tickets. I haven't actually tried decoding
| the codes but just seeing them and interacting with them I
| can tell they have gone WAY overboard with either the amount
| of data they try to fit or the amount of error correction.
| Probably both. They are nearly unscannable due to their size
| and all the bus drivers just wave you along if you don't
| manage to scan it.
| dbaupp wrote:
| Yeah, I had had the same question! One of my earlier articles
| experiments with this: https://huonw.github.io/blog/2021/09/qr-
| error-correction/
|
| Figure 8 and its surrounding section are the undamaged case.
| derf_ wrote:
| That was a nice read.
|
| _> I also tested only one background image, so the behaviour
| may differ greatly with QR codes contained in different
| surrounds._
|
| This likely does not matter much. It could theoretically
| affect binarization near the edges of the code (near module
| boundaries, depending on how you did the resizing), but in
| practice as long as the code itself is high-contrast, this is
| unlikely. The more usual issue is that real images often do
| not have a proper quiet zone around the code, but that is
| mostly going to be irrelevant for what you are trying to test
| here.
|
| _> The QR codes are generated to be perfectly rectangular
| and aligned to the image pixel grid, which is unlikely to
| happen in the real world._
|
| This is a much bigger deal. A large source of decoding errors
| for larger versions (for a fixed "field of view") is due to
| alignment / sampling issues. A lot of work goes into trying
| to find _where_ the code is in the image and identify the
| grid pattern, and that is just inherently less reliable for
| larger versions, particularly if there is projective
| distortion (so the module size is not constant). The periodic
| alignment patterns try to keep the number of parameters that
| can be used to fit this grid roughly constant relative to the
| number of modules in the grid, but locating those patterns is
| itself error-prone and subject to false positives (they are
| not nearly as unique-looking as finder patterns), and the
| initial global transform estimate has to get pretty close for
| them to work. I am actually happy that damaging these was not
| causing you more trouble. This is definitely somewhere that
| ZBar can be improved. It currently does not use the timing
| patterns at all, for example. I'm not actually aware of an
| open-source QR decoder that does.
|
| (I'm the original author of ZBar's QR decoder)
| dbaupp wrote:
| Thanks for the kind words and the insight!
| derf_ wrote:
| A slightly more common way to express "field of view" is
| "module pitch", measured in pixels between adjacent
| module centers. I went back and tried to express the
| numbers from Figure 6 as a module pitch, and I think it
| works out to around 1.6 pixels / module. IIRC, the QR
| code standard recommends a module pitch of at least 4
| pixels. So it is nice that ZBar is able to do around 2.5x
| better before running into issues (a margin that is a lot
| bigger than the gains from higher EC levels).
|
| In theory there could still be room for improvement.
| Right now ZBar estimates finder and alignment pattern
| locations to quarter-pel precision, but rounds each
| module location to the nearest pixel so it can sample a
| binarized version of the image to decide the value of
| that module. At the extreme limits of small module pitch
| this effectively turns the resampling filter in whatever
| you are using to resize your QR code image into a nearest
| neighbor filter. You can see why that would start to
| cause issues. Imagine a version 7 code (45x45 modules)
| sized to be 80x80 pixels. Most of your columns will be 2
| pixels wide, but with binarization, somewhere in there
| you have to have 10 columns that are only 1 pixel wide.
| Good luck lining up your grid to hit all of them
| perfectly (without looking at the timing pattern, which
| would likely only help in the perfectly axis-aligned
| case). Some kind of sub-pixel integration of the original
| image before thresholding to decide each module value
| could probably do better. That would make decoding a lot
| more computationally expensive, though.
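
The 45-modules-in-80-pixels example can be checked directly. This sketch only models the rounding of module edges to whole pixels on an axis-aligned grid; it is an illustration of the arithmetic, not of ZBar's actual sampling code.

```python
MODULES = 45   # a version 7 QR code is 45x45 modules
PIXELS = 80    # rendered width in pixels

# Pixel position of each module edge, rounded to the nearest whole pixel.
# Integer arithmetic: floor(i * PIXELS / MODULES + 0.5).
edges = [(2 * i * PIXELS + MODULES) // (2 * MODULES) for i in range(MODULES + 1)]
widths = [right - left for left, right in zip(edges, edges[1:])]

print(widths)
print("1-pixel columns:", widths.count(1))
print("2-pixel columns:", widths.count(2))
```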
| chpatrick wrote:
| Didn't the EU covid passport decide to use the text encoding mode
| because it's the only one that scanners supported reliably?
| pclmulqdq wrote:
| I'm not sure if anyone uses Base36 any more (or its more obscure
| sister, Base32), but it uses [0-9, A-Z] as its alphabet. It is
| URL safe and also smaller than base 10 in character count for
| each number, and is the smallest standard URL-safe encoding that
| works with alphanumeric QR codes.
|
| I sort of assumed this was common knowledge, but I guess not.
| 91bananas wrote:
| Tooling is probably what dictates this more than anything.
| atob() is everywhere.
| knallfrosch wrote:
| Yeah, I don't get it. Assume I have a standard URL with query
| params, the web browser doesn't understand the decimal
| encoding - right?
|
| Let's assume... this: https://news.ycombinator.com/reply?id=3
| 9907672&goto=item%3Fi...
|
| The special encoding is just about sending data to the
| backend?
| dbaupp wrote:
| I implicitly ignored encoding schemes like base 36 and 32 (and
| 16, referenced elsewhere in the thread) because they're not as
| good as the schemes referenced in the post. The best you can
| get that's fully URL safe with Alphanumeric is a hypothetical
| base 39, referenced in a footnote, and only using 39 of the 45
| possible characters has 3.9% overhead (even ignoring the 50%
| overhead of the https://www.rfc-editor.org/rfc/rfc9285.html
| encoding).
|
| I've added an analysis of many more bases to the article:
| https://huonw.github.io/blog/2024/03/qr-base10-base64/#fn:ot...
| jbaber wrote:
| I independently discovered base36 for a personal project
| recently and was very happy to have an explanation for why
| Python's base conversion goes up to 36.
| FullyFunctional wrote:
| This is fascinating, but I was curious about the last two QR
| codes. The left one is scannable on my iPhone (iOS 17.4.1)
| leading to http://example.com/AAE..._w8fL whereas the one on the
| right gets only http://example.com (both Safari and Firefox). Is
| this an iOS URL length limitation?
| dbaupp wrote:
| Good catch! I should've tested. I've added a paragraph to
| https://huonw.github.io/blog/2024/03/qr-base10-base64/#extre...
| about this.
| capitainenemo wrote:
| You probably know this, but Firefox and Chrome don't have the
| freedom to run their own browser engines on iOS and are little
| more than browser skins around the webkit core, so listing
| multiple browsers for a web issue on iOS usually doesn't mean
| much.
| bytecodes wrote:
| 1. Pretty neat to switch encoding in the middle of the URL. It
| does look like it works and it does look like a better encoding.
| This is cool.
|
| 2. I'd have called this base-1000. It's using 3-digit numbers
| encoded into 10 bits. Base64 doesn't encode into 64 bits, it uses
| 64 characters encoded into 6 bits. And this encoding uses 000 to
| 999, encoded into 10 bits. But that messes up the title when you
| compare apples to apples, 1000 > 64 is just obvious and true.
| dbaupp wrote:
| The base 10 is referring to conversion of bytes into a long
| decimal (base 10) integer, not that it's being stored in chunks
| of 10 bits.
|
| But yes, you're right, it would be reasonable to think of this
| as encoding the bytes in base 1000, where each "digit" just
| happens to be shown to humans as 3 digits.
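
Both readings can be seen in a short sketch: the bytes become one long decimal integer, and Numeric mode then packs that integer three digits at a time into 10-bit groups. The 16-byte payload is hypothetical, and prepending a 1 byte to preserve leading zero bytes is an assumption of this sketch, not necessarily what the article does.

```python
data = bytes(range(16))  # hypothetical 16-byte payload

# "Base 10": treat the bytes as one big integer and write it in decimal.
# A leading 1 byte is prepended so that leading zero bytes survive the round trip.
number = int.from_bytes(b"\x01" + data, "big")
digits = str(number)

# "Base 1000": each full group of 3 decimal digits is a value 0..999,
# which Numeric mode stores in 10 bits.
groups = [digits[i:i + 3] for i in range(0, len(digits), 3)]
packed = [format(int(g), "010b") for g in groups if len(g) == 3]

# Round trip back to the original bytes.
restored = int(digits).to_bytes(len(data) + 1, "big")[1:]

print(digits)
print(groups)
print(restored == data)
```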
| JoshMandel wrote:
| We used essentially this technique in the SMART Health Cards
| specification for vaccine and lab result QRs.
|
| https://spec.smarthealth.cards/#encoding-qrs
|
| It's well supported by scanners but can create unwieldy values
| for users to copy/paste.
|
| For more recent work with dynamic content (and the assumption
| that a web server is involved in the flow), we're just limiting
| the payload size and using ordinary byte mode
| (https://docs.smarthealthit.org/smart-health-links/spec)
| richardkiss wrote:
| I discovered the same thing when I was writing a tool to
| transmit data to a radio-free (no wifi or Bluetooth) airgapped
| computer and created a de-facto standard called "qrint". The
| comment in this file has enough text for a blog post.
|
| https://github.com/Chia-Network/hsms/blob/main/hsms/util/qri...
|
| Anyone who wants to use this, feel free.
| Zamicol wrote:
| I built an open source tool to specifically work on this problem:
| https://convert.zamicol.com
|
| We know only two open source JS projects that even support
| alphanumeric, Nayuki and Paul's.
| https://github.com/Cyphrme/QRGenJS. We have it hosted here:
| https://cyphr.me/qrgen
|
| I've also done a lot of work on this problem: https://image-
| ppubs.uspto.gov/dirsearch-public/print/downloa...
|
| Also, regarding alphanumeric, RFC 3986 states that:
|
| > An implementation should accept uppercase letters as equivalent
| to lowercase in scheme names (e.g., allow "HTTP" as well as
| "http")
___________________________________________________________________
(page generated 2024-04-03 23:02 UTC)