[HN Gopher] Spotting base64 encoded JSON, certificates, and priv...
       ___________________________________________________________________
        
       Spotting base64 encoded JSON, certificates, and private keys
        
       Author : jandeboevrie
       Score  : 196 points
       Date   : 2025-08-05 19:17 UTC (3 hours ago)
        
 (HTM) web link (ergaster.org)
 (TXT) w3m dump (ergaster.org)
        
       | delecti wrote:
       | Oh that's nifty. Spotting base64 encoded strings is easy enough
       | (and easy enough to test that I give it a shot if I'm even
       | vaguely curious), but I'd never looked at them closely enough to
       | spot patterns.
        
         | morkalork wrote:
         | After copy and pasting enough access tokens into various tools
         | you pick up on it pretty fast.
        
       | rgovostes wrote:
       | There is a Base64 quasi-fixed point:
       | 
       |     $ echo -n Vm0 | base64
       |     Vm0w
       | 
       | It can be extended indefinitely one character at a time, but
       | there will always be some suffix.
        
         | thaumasiotes wrote:
         | Note that the suffix will grow in length with the input, making
         | it less and less interesting.
         | 
         | (Because the output is necessarily 8/6 the size of the input,
         | the suffix always adds 33% to the length.)
        
         | o11c wrote:
         | For reference, a program to generate the quasi-fixed point from
         | scratch:
         | 
         |     #!/usr/bin/env python3
         |     import base64
         | 
         |     def len_common_prefix(a, b):
         |         assert len(a) < len(b)
         |         for i in range(len(a)):
         |             if a[i] != b[i]:
         |                 return i
         |         return len(a)
         | 
         |     def calculate_quasi_fixed_point(start, length):
         |         while True:
         |             tmp = base64.b64encode(start)
         |             l = len_common_prefix(start, tmp)
         |             if l >= length:
         |                 return tmp[:length]
         |             print(tmp[:l].decode('ascii'), tmp[l:].decode('ascii'), sep='\v')
         |             # Slicing beyond end of buffer will safely truncate in Python.
         |             start = tmp[:l*4//3+4]  # TODO is this ideal?
         | 
         |     if __name__ == '__main__':
         |         final = calculate_quasi_fixed_point(b'\0', 80)
         |         print(final.decode('ascii'))
         | 
         | This ultimately produces:
         | 
         |     Vm0wd2QyUXlVWGxWV0d4V1YwZDRWMVl3WkRSV01WbDNXa1JTVjAxV2JETlhhMUpUVmpBeFYySkVUbGho
        
         | syncsynchalt wrote:
         | From the other direction, you'd call it a tail-eating unquine?
        
       | metalliqaz wrote:
       | Something similar pops up if you have to spend a lot of time
       | looking at binary blobs with a hex editor. Certain common
       | character sequences become familiar. This also leads to choosing
       | magic numbers in data formats that decode to easily recognized
       | ASCII strings. I'm sure if I worked with base64 I'd be choosing
       | something that encoded nicely into particular strings for the
       | same purpose.
        
         | skissane wrote:
         | Related trick I've learnt: binary data containing lots of 0x40
         | may be EBCDIC text, or binary data containing embedded EBCDIC
         | strings - 0x40 is EBCDIC space character
         | 
         | Probably not a very useful trick outside of certain specific
         | environments
        
           | metalliqaz wrote:
           | so, uhh... insurance or banking?
        
       | shortrounddev2 wrote:
       | I discovered this when I created a JWT system for my internship.
       | I got really good at spotting JWTs, or any base64 encoded json
       | payloads in our Kafka streams
        
       | cogman10 wrote:
       | I don't really love this. It just feels so wasteful.
       | 
       | JWT does it as well.
       | 
       | Even in this example, they are double base64 encoding strings
       | (the salt).
       | 
       | It's really too bad that there's really nothing quite like json.
       | Everything speaks it and can write it. It'd be nice if something
       | like protobuf was easier to write and read in a schemaless
       | fashion.
        
         | reactordev wrote:
         | We just need to sacrifice _n*field_count_ to a header
         | describing the structure. We also need to define allowed types.
        
         | Muromec wrote:
         | >Everything speaks it and can write it.
         | 
         | asn.1 is super nice -- everything speaks it and tooling is just
         | great (runs away and hides)
        
         | dlt713705 wrote:
         | What's wrong with this?
         | 
         | The purpose of Base64 is to encode data--especially binary data
         | --into a limited set of ASCII characters to allow transmission
         | over text-based protocols.
         | 
         | It is not a cryptographic library nor an obfuscation tool.
         | 
         | Avoid encoding sensitive data using Base64 or including sensitive
         | data in your JWT payload unless it is encrypted first.
        
           | zokier wrote:
           | JSON is already text based and not binary so encoding it with
           | base64 is a bit wasteful. Especially if you are going to just
           | embed the text in another json document.
           | 
           | And of course text-based things themselves are quite
           | wasteful.
        
           | pak9rabid wrote:
           | Exactly. Using base64 as an obfuscation tool, or (shudder)
           | encryption, is seriously misusing it relative to what it was
           | originally intended for. If that's what you need to do, then
           | avoid base64 in favor of something that was designed to do
           | that.
        
           | xg15 wrote:
           | I think it's more the waste of space in it all. Encoding data
           | in base64 increases the length by 33%. So base64-encoding
           | twice will blow it up by 33% of the original data and then
           | again 33% of the encoded data, making about 78% in total. And
           | that's before adding JSON to the mix...
           | 
           | And before "space is cheap": JWT is used in contexts where
           | space is generally _not_ cheap, such as in HTTP headers.
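
The compounding is easy to check empirically. Each base64 pass multiplies the size by 4/3, so two passes give 16/9 of the original, which is roughly 78% overhead. The sketch below is only an illustration; the `overhead` helper and the 3 KiB random payload are assumptions, not anything from the article.

```python
import base64
import os


def overhead(raw: bytes, rounds: int) -> float:
    """Relative size increase after base64-encoding `raw` `rounds` times."""
    data = raw
    for _ in range(rounds):
        data = base64.b64encode(data)
    return len(data) / len(raw) - 1.0


# Length divisible by 3, so the first pass needs no '=' padding.
payload = os.urandom(3 * 1024)

print(f"encoded once:  {overhead(payload, 1):.1%}")
print(f"encoded twice: {overhead(payload, 2):.1%}")
```

Padding adds a few extra characters when the input length is not a multiple of 3, so small payloads fare slightly worse than the asymptotic figures.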
        
             | cogman10 wrote:
             | Precisely my thoughts.
             | 
             | You have to ask the question "why are we encoding this as
             | base64 in the first place?"
             | 
             | The answer to that is generally that base64 plays nice with
             | http headers. It has no newlines or special characters that
             | need special handling. Then you ask "why encode JSON?" And
             | the answer is "because JSON is easy to handle". Then you
             | ask the question "why embed a base64 field in the JSON?"
             | And the answer is "JSON doesn't handle binary data".
             | 
             | These are all choices that ultimately create a much larger
             | text blob than needs be. And because this blob is being
             | used for security purposes, it gets forwarded onto the
             | request headers for every request. Now your simple "DELETE
             | foo/bar" endpoint ends up requiring a 10kb header of
             | security data just to make the request. Or if you are doing
             | http2, then it means your LB will end up storing that 10kb
             | blob for every connected client.
             | 
             | Just wasteful. Especially since it's a total of about 3 or
             | 4 different fields with relatively fixed sizes. It could
             | have been
             | 
             |     base64(key_length(1byte)|iterations(4bytes)|hash_function(1byte)|salt(32bytes))
             | 
             | which would have produced something like a 51 byte base64
             | string. The example is 3x that size (156 characters). It
             | gets much worse than that on real systems I've seen.
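
A rough sketch of the fixed binary layout proposed above, using Python's `struct` module. The field order follows the comment; the big-endian byte order, the numeric hash-function id, and the `pack_params`/`unpack_params` names are assumptions made for illustration.

```python
import base64
import os
import struct

# Assumed layout: 1-byte key length, 4-byte iteration count,
# 1-byte hash function id, 32-byte salt. ">" means big-endian, no alignment.
LAYOUT = struct.Struct(">BIB32s")


def pack_params(key_length: int, iterations: int, hash_id: int, salt: bytes) -> str:
    """Pack the fields into 38 bytes and base64-encode them without padding."""
    blob = LAYOUT.pack(key_length, iterations, hash_id, salt)
    return base64.b64encode(blob).rstrip(b"=").decode("ascii")


def unpack_params(text: str) -> tuple:
    """Restore the '=' padding, decode, and split the fields back out."""
    padded = text + "=" * (-len(text) % 4)
    return LAYOUT.unpack(base64.b64decode(padded))


salt = os.urandom(32)
token = pack_params(32, 100_000, 1, salt)
print(LAYOUT.size, len(token), token)
```

The packed structure is 38 bytes, which encodes to 51 base64 characters once the single `=` pad is dropped, matching the estimate in the comment.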
        
         | zokier wrote:
         | > It's really too bad that there's really nothing quite like
         | json
         | 
         | messagepack/cbor are very similar to json (schemaless, similar
         | primitive types) but can support binary data. bson is another
         | similar alternative. All three have implementations available
         | in many languages, and have been used in big mature projects.
        
         | derefr wrote:
         | > It'd be nice if something like protobuf was easier to write
         | and read in a schemaless fashion.
         | 
         | If you just want a generic, binary, hierarchical type-length-
         | value encoding, have you considered
         | https://en.wikipedia.org/wiki/Interchange_File_Format ?
         | 
         | It's not that there are widely-supported IFF libraries, per se;
         | but rather that the format is _so_ simple that as long as your
         | language has a byte-array type, you can code a _bug-free_ IFF
         | encoder/decoder in said language in about five minutes.
         | 
         | (And this is _why_ there are no generic IFF metaformat
         | libraries, a la JSON or XML libraries; it's "too simple to
         | bother everyone depending on my library with a transitive
         | dependency", so everyone just implements IFF encoding/decoding
         | as part of the parser + generator for their IFF-based concrete
         | file format.)
         | 
         | What's IFF used in? AIFF; RIFF (and therefore WAV, AVI, ANI,
         | and -- perhaps surprisingly -- WebP); JPEG2000; PNG [with
         | tweaks]...
         | 
         | * There's also a descendant metaformat, the ISO Base Media File
         | Format ("BMFF"), which in turn means that MP4, MOV, and
         | HEIF/HEIC can all be parsed by a generic IFF parser (though
         | you'll miss breaking some per-leaf-chunk metadata fields out
         | from the chunk body if you don't use a BMFF-specific parser.)
         | 
         | * And, as an alternative, there's
         | https://en.wikipedia.org/wiki/Extensible_Binary_Meta_Languag...
         | ("EBML"), which is basically IFF but with varint-encoding of
         | the "type" and "length" parts of TLV (see
         | https://matroska-org.github.io/libebml/specs.html). This is
         | mostly currently
         | used as the metaformat of the Matroska (MKV) format. It's also
         | _just_ complex enough to have a standalone generic codec
         | library (https://github.com/Matroska-Org/libebml).
         | 
         | My personal recommendation, if you have some structured binary
         | data to dump to disk, is to just hand-generate IFF chunks
         | inline in your dump/export/send logic, the same way one would
         | e.g. hand-emit CSV inline in a printf call. Just say "this is
         | an IFF-based format" or put an .iff extension on it or send it
         | as application/x-iff, and an ecosystem _should_ be able to run
         | with that. (And just like with JSON, if you give the IFF chunks
         | descriptive names, people will probably be able to suss out
         | what the chunks "mean" from context, without any kind of
         | schema docs being necessary.)
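
To give a sense of how small a hand-rolled IFF codec can be, here is a minimal sketch. It assumes the classic IFF chunk layout: a 4-byte ASCII chunk ID, a 32-bit big-endian length, the data, and one pad byte when the length is odd. RIFF uses the same shape with a little-endian length. The chunk IDs `NAME` and `BODY` are made up for the example, and nested `FORM`/`LIST` groups are not handled.

```python
import struct


def iff_chunk(chunk_id: bytes, data: bytes) -> bytes:
    """Serialize one chunk: 4-byte id, big-endian u32 length, data, even padding."""
    if len(chunk_id) != 4:
        raise ValueError("chunk id must be exactly 4 bytes")
    pad = b"\x00" if len(data) % 2 else b""
    return chunk_id + struct.pack(">I", len(data)) + data + pad


def iff_chunks(buf: bytes):
    """Yield (chunk_id, data) pairs from a flat sequence of chunks."""
    offset = 0
    while offset + 8 <= len(buf):
        chunk_id = buf[offset:offset + 4]
        (size,) = struct.unpack_from(">I", buf, offset + 4)
        start = offset + 8
        yield chunk_id, buf[start:start + size]
        # Skip the pad byte that follows odd-length data.
        offset = start + size + (size % 2)


blob = iff_chunk(b"NAME", b"demo") + iff_chunk(b"BODY", b"\x01\x02\x03")
print(list(iff_chunks(blob)))
```

A real format would wrap these chunks in an outer container chunk and validate lengths against the buffer size before slicing.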
        
       | gabesullice wrote:
       | I love this post style. Never stop learning friend!
        
         | zavec wrote:
         | Yeah, people are being snarky and saying it's obvious, but it
         | was new to me! I guess I'm not staring at base64 all that
         | often. It's a neat trick though, now I'm going to pay attention
         | next time I have an opportunity to use it.
        
       | mmastrac wrote:
       | I built a JWT support library at work
       | (https://github.com/geldata/gel-rust/tree/master/gel-jwt) and I
       | can confirm that JWTs all sound like "eyyyyyy" in my head.
        
         | isoprophlex wrote:
         | eyy lmao
        
         | Muromec wrote:
         | >JWTs all sound like "eyyyyyy" in my head.
         | 
         | "eeey bruh, open the API it's me"
        
         | proactivesvcs wrote:
         | "Is that you again, punk zip?" when seeing the first few bytes
         | of a zip file.
        
           | ragmodel226 wrote:
           | Probably shouldn't call Phil Katz a punk
        
             | ragmodel226 wrote:
             | Also, MZ in exe is Mark Zbikowski.
        
         | layer8 wrote:
         | It's like how all certificates sound like "miiiiii".
        
           | schoen wrote:
           | "It's miiiii! And I can _prove it_!"
        
         | pedropaulovc wrote:
         | Ey, I'm JSON!
        
       | snickerdoodle12 wrote:
       | Isn't this obvious to anyone who has seen a few base64 encoded
       | json strings or certificates? ey and LS are a staple.
        
         | mmastrac wrote:
         | `MII` for RSA private keys.
        
           | Muromec wrote:
           | MII is not RSA, it's the opening header of an asn1 structure
           | encoded to DER -- 30 82 0x, which is basically "{" and which
           | can be pretty much anything from an x509 certificate to
           | private keys for ECDSA.
           | 
           | Actual RSA oid is somewhere in the middle.
        
             | mmastrac wrote:
             | True, but for the most part, RSA keys are the only keys
             | that anyone encounters that start with long SEQUENCEs
             | requiring two-byte lengths.
             | 
             | `ey` could be any JSON, but it's most likely going to be a
             | JWT.
             | 
             | Neither is a perfect signal, but contextually is more
             | likely correct than not.
        
               | Muromec wrote:
               | That depends on the kind of abyss you are staring into.
               | Mine had plenty of non-RSA keys, certificates (which are
               | of course two-byte length all the time) and CMS
               | containers.
        
         | FelipeCortez wrote:
         | I thought so too, but xkcd 1053 / lucky 10000, I guess! I knew
         | about ey but not LS
        
         | InfoSecErik wrote:
         | IMO depends on your career. I did a lot of pentesting with Burp
         | Suite so I was able to (forced to) pick it up.
        
         | SkyPuncher wrote:
         | Probably is, but I still found it to be a fun tidbit.
         | 
         | I work with this stuff often enough to recognize something that
         | looks like a key or a hash. I don't work with it often enough
         | to have picked up `ey` and `LS`.
        
       | athorax wrote:
       | Base64 encoded yaml files will also be LS-prefixed if they have
       | the document marker (---)
        
         | thibaultamartin wrote:
         | That's right, I've added an erratum to clarify. Thanks for the
         | heads up!
        
       | gnabgib wrote:
       | You can spot Base64 encoded JSON.
       | 
       | The PEM format (that begins with `-----BEGIN
       | [CERTIFICATE|CERTIFICATE REQUEST|PRIVATE KEY|X509 CRL|PUBLIC
       | KEY]-----`) is already Base64 within the body.. the header and
       | footer are ASCII, and shouldn't be encoded[0] (there's no link to
       | the claim so perhaps there's another format similar to PEM?)
       | 
       | You can't spot private keys, unless they start with a repeating
       | text sequence (or use the PEM format with header also encoded).
       | 
       | [0]: https://datatracker.ietf.org/doc/html/rfc7468
        
         | ctz wrote:
         | The other base64 prefix to look out for is `MI`. `MI` is common
         | to every ASN.1 DER encoded object (all public and private keys
         | in standard encodings, all certificates, all CRLs) because
         | overwhelmingly every object is a `SEQUENCE` (0x30 tag byte)
         | followed by a length introducer (top nibble 0x8). `MII` is very
         | very common, because it introduces a `SEQUENCE` with a two byte
         | length.
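
The `MI`/`MII` pattern can be reproduced by building only the DER header. This sketch assumes standard DER definite-length encoding (tag `0x30` for `SEQUENCE`, short form below 128 bytes, long form `0x80 | n` followed by `n` length bytes). The content is zero-filled purely as a placeholder; it is not a valid certificate or key.

```python
import base64


def der_sequence_header(content_length: int) -> bytes:
    """Tag and length bytes of a DER SEQUENCE with the given content length."""
    if content_length < 0x80:
        return bytes([0x30, content_length])
    length_bytes = content_length.to_bytes((content_length.bit_length() + 7) // 8, "big")
    return bytes([0x30, 0x80 | len(length_bytes)]) + length_bytes


def base64_prefix(content_length: int) -> str:
    """First three base64 characters of a SEQUENCE with placeholder content."""
    der = der_sequence_header(content_length) + bytes(content_length)
    return base64.b64encode(der).decode("ascii")[:3]


for n in (200, 1000, 4000):
    print(n, der_sequence_header(n).hex(), base64_prefix(n))
```

A one-byte long-form length (`30 81 xx`) still starts with `MI` but continues with `G` or `H`. The third `I` appears with a two-byte length below 16384, which is the common case for certificates and RSA keys.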
        
           | Muromec wrote:
           | I for one wait for the day when quantum computers will break
           | all the encryption forever so nobody will have to suffer
           | broken asn1 decoders, plaintext specifications of machine-
           | readable formats and unearned aura of arcane art that
           | surrounds the whole thing.
        
             | ctz wrote:
             | asn1 enjoyers can also look forward to the sweet release of
             | death. though if you end up in hell you might end up
             | staring at XER for the rest of eternity
        
         | thibaultamartin wrote:
         | Thanks for pointing it out! I've added an erratum to the blog
         | post
        
       | yahoozoo wrote:
       | babby's first base64
        
       | cluckindan wrote:
       | On mobile, the long rows in the code blocks blow up the layout.
        
       | Muromec wrote:
       | After staring one time too much at base64-encoded or hex-encoded
       | asn1 I started to believe that scene in the Matrix where operator
       | was looking at raw stream from Matrix at his terminal and was
       | seeing things in it.
        
         | cestith wrote:
         | Years ago I was part of a group of people I knew who could read
         | and edit large parts of sendmail.cf by hand without using m4.
         | Other people who had to deal with mail servers at the time
         | certainly treated it like a superpower.
        
           | Muromec wrote:
           | Where I work right now superpower of the day is pressing
           | ctrl-r in the terminal.
        
           | nickdothutton wrote:
           | A significant part of my 1st ever job consisted of editing
           | sendmail.cf's by hand. Occasionally had to defer to my boss
           | at the time for the real mind bending stuff. I now believe
           | that he was in fact a non-human alien.
        
           | quesera wrote:
           | In some ways, I miss those days.
           | 
           | Spending hours wrangling sendmail.cf, and finally succeeding,
           | felt like a genuine accomplishment.
           | 
           | Nowadays, things just work, mostly. How boring.
        
           | PeterWhittaker wrote:
           | In 1989, my Toronto-based team was at TJ Watson for the final
           | push on porting IBM's first TCP/IP implementation to MVS.
           | Some of our tests ran raw, no RACF, no other system
           | protections. I was responsible for testing the C sockets API,
           | a very cool job for a co-op.
           | 
           | When one of my tests crashed one of those unprotected
           | mainframes, two guys who were then close to my age now stared
           | at an EBCDIC core dump, one of them slowly hitting page down,
           | one Matrix-like screen after another, until they both jabbed
           | at the screen and shouted "THERE!" simultaneously.
           | 
           | (One of them hand delivered the first WATFOR compiler to
           | Yorktown, returning from Waterloo with a car full of tapes. I
           | have thought of him - and this "THERE!" moment - every time I
           | have come across the old saw about the bandwidth of a station
           | wagon.)
        
       | curiousObject wrote:
       | Yikes! It would be smart to bury these strings in an ad hoc
       | obfuscation so they aren't so obvious.
       | 
       | It doesn't even need to be much better than ROT13. Security by
       | obscurity is good for this situation.
        
       | Faaak wrote:
       | Nitpick, but enclosing the first string in single quotes would
       | make the reading better:
       | 
       | $ echo '{"' | base64
       | 
       | Vs
       | 
       | $ echo "{\"" | base64
        
       | dhosek wrote:
       | Kind of reminds me of a junior being amazed when I was able to
       | read ascii strings out of a hex stream. Us old folks have seen a
       | _lot_.
        
         | schoen wrote:
         | For anyone here who's never pondered it ("today's lucky
         | 10,000"?), there's a lot of intentional structure in the
         | organization of ASCII that comes through readily in binary or
         | hex.
         | 
         | https://altcodeunicode.com/ascii-american-standard-code-for-...
         | 
         | The first nibble (hex digit) shows your position within the
         | chart, approximately like 2 = punctuation, 3 = digits, 4 =
         | uppercase letters, 6 = lowercase letters. (Yes, there's more
         | structure than that considering it in binary.)
         | 
         | For digits (first nibble 3), the value of the digit is equal to
         | the value of the second nibble.
         | 
         | For punctuation (first nibble 2), the punctuation is the
         | character you'd get on a traditional U.S. keyboard layout
         | pressing shift and the digit of the second nibble.
         | 
         | For uppercase letters (first nibble 4, then overflowing into
         | first nibble 5), the second nibble is the ordinal position of
         | the letter within the alphabet. So 41 = A (letter #1), 42 = B
         | (letter #2), 43 = C (letter #3).
         | 
         | Lowercase letters do the same thing starting at 6, so 61 = a
         | (letter #1), 62 = b (letter #2), 63 = c (letter #3), etc.
         | 
         | The tricky ones are the overflow/wraparound into first nibble 5
         | (the letters from letter #16, P) and into first nibble 7 (from
         | letter #16, p). There you have to add 16 to the second nibble
         | to get the letter position, or think of it as "letter #0x10,
         | letter #0x11, letter #0x12..." (which may be less intuitive
         | for some people).
         | 
         | Again, there's even more structure and pattern than that in
         | ASCII, and it's all fully intentional, largely to facilitate
         | meaningful bit manipulations. E.g. converting uppercase to
         | lowercase is just a matter of adding 32, or logical OR with
         | binary 00100000 (0x20). Converting lowercase to uppercase is
         | just a matter of subtracting 32, or logical AND with binary
         | 11011111 (0xDF).
         | 
         | For reading hex dumps of ASCII, it's also helpful to know that
         | the very first printable character (0x20) is, ironically, blank
         | -- it's the space character.
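
The nibble structure and the case-flipping bit can be poked at directly. This is a small sketch; the helper names `to_lower`, `to_upper`, and `describe` are invented for the example, and the range checks are added so that non-letters pass through unchanged.

```python
CASE_BIT = 0b0010_0000  # 0x20, the single bit that separates 'A' from 'a'


def to_lower(ch: str) -> str:
    """Set the case bit on an ASCII uppercase letter."""
    return chr(ord(ch) | CASE_BIT) if "A" <= ch <= "Z" else ch


def to_upper(ch: str) -> str:
    """Clear the case bit on an ASCII lowercase letter."""
    return chr(ord(ch) & ~CASE_BIT & 0xFF) if "a" <= ch <= "z" else ch


def describe(ch: str) -> str:
    """Show the hex code split into its two nibbles."""
    code = ord(ch)
    return f"{ch!r}: 0x{code:02x} -> first nibble {code >> 4:x}, second nibble {code & 0xF:x}"


for ch in ("A", "a", "C", "P", "p", "7", " "):
    print(describe(ch))

print(to_lower("Q"), to_upper("q"))
```

Masking a letter with `0x1F` gives its position in the alphabet for either case, which also covers the wraparound from `O` (0x4F) to `P` (0x50).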
        
       | iou wrote:
       | "Welcome to the party, pal!"
        
       | netsharc wrote:
       | Good knowledge, now explain why it's like that.
       | 
       | {" is ASCII 01111011, 00100010
       | 
       | Base64 takes 3 bytes x 8 bits = 24 bits, groups that 24 bit-
       | sequence into four parts of 6 bits each, and then converts each
       | to a number between 0-63. If there aren't enough bits (we only
       | have 2 bytes = 16 bits, we need 18 bits), pad them with 0. Of
       | course in reality the last 2 bits would be taken from the 3rd
       | character of the JSON string, which is variable.
       | 
       | The first 6 bits are 011110, which in decimal is 30.
       | 
       | The second 6 bits are 110010, which in decimal is 50.
       | 
       | The last 4 bits are 0010. Pad it with 00 and you get 001000,
       | which is 8.
       | 
       | Using an encoding table
       | (https://base64.guru/learn/base64-characters), 30 is e, 50 is y
       | and 8 is I. There's your "ey".
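
The same walk-through can be done mechanically. The sketch below regroups the bytes of `{"` into 6-bit values and looks them up in the standard base64 alphabet; `six_bit_groups` is a name made up for this example.

```python
import base64
import string

# Standard base64 alphabet: A-Z, a-z, 0-9, '+', '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def six_bit_groups(data: bytes) -> list:
    """Concatenate the bits of `data`, zero-pad to a multiple of 6, split into groups."""
    bits = "".join(f"{byte:08b}" for byte in data)
    bits += "0" * (-len(bits) % 6)
    return [bits[i:i + 6] for i in range(0, len(bits), 6)]


groups = six_bit_groups(b'{"')
for group in groups:
    value = int(group, 2)
    print(group, value, ALPHABET[value])

print(base64.b64encode(b'{"').decode("ascii"))
```

The library output carries a trailing `=` because two input bytes fill only three of the four output characters. In real JSON the third character depends on the top two bits of whatever follows the quote, which is why a key starting with an ASCII letter yields `eyJ` rather than `eyI`.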
       | 
       | Funny how CS people are so incurious now, this blog post touches
       | the surface but didn't get into the explanation.
        
         | perching_aix wrote:
         | I'd be _very_ hesitant to consider this as some runaway symbol
         | of "CS people being incurious now" over the author simply not
         | being this deeply invested in this at the time of writing in
         | the context of their discovery, especially since it almost
         | certainly doesn't actually matter for them beyond the pattern
         | existing, if even that does.
        
           | netsharc wrote:
           | > it almost certainly doesn't actually matter for them beyond
           | the pattern existing, if even that does.
           | 
           | https://web.cs.ucdavis.edu/~rogaway/classes/188/materials/th...
        
             | perching_aix wrote:
             | And there goes my limit of curiosity now regarding this.
             | I'm interested in what _you_ have to say, but not in a
             | 25-page mini-novel PDF from someone else. I'm glad you
             | enjoyed that piece, but I have no interest in reading it,
             | nor do I think it's reasonable for you to expect me to be
             | interested. Much like with the author and the specifics of
             | this encoding.
        
               | netsharc wrote:
               | Hah, how about a 3 letter response: QED.
               | 
               | Both the author (evidently, in my opinion) not
               | understanding what base64 does and the short story are
               | about depending on technology without knowing/caring how
               | it works underneath. If you can live with that, well, go
               | ahead. I find it not very hackerly.
        
               | perching_aix wrote:
               | I guess I fully deserve this as some sort of karmic
               | retribution, because I'm usually the person in the room
               | who's frustrated about people poking things they don't
               | fully understand, about folks continuing to spitball
               | rather than looking a layer deeper, and the one who over-
               | obsesses over details. It took me a very long time to
               | accept that sometimes ignorance is not only acceptable,
               | but optimal, and it continues to challenge me to this
               | day.
               | 
               | You mention "being hackerly". Imagine you were reverse
               | engineering some gnarly 100 MB obfuscated x86 binary.
               | Surely you can appreciate that especially if you have a
               | specific goal, it is overwhelmingly preferable to guess,
               | experiment, and poke than to kick off some heroic RE
               | effort that will take tens of people years, just so that
               | you can supposedly "fully understand" what's happening.
               | Attention is precious - not everything is worth equal
               | attention. And it is absolutely possible to correctly
               | guess things from limited information, and is even
               | essential to be able to.
               | 
               | You find base64 encoding interesting enough that you were
               | able to either recall detailed facts about its operation
               | from memory here, or looked it up quickly to break it
               | down. How is the author, or me, not doing so any evidence
               | to you that we:
               | 
               | - are ignorant about how base64 works and always have been
               | 
               | - don't care about (CS) things at depth in general
               | 
               | These are _such_ immensely strong claims to make. Surely
               | you can appreciate that some people just have different
               | interests sometimes? That they might focus on different
               | things? That they can learn things and then forget about
               | them? That to some level everything is connected, so
               | appealing to that is not exactly some grand revelation of
               | missing a "key piece"?
               | 
               | Few years, or I guess more than just a few years ago, in
               | college, I met up with a former classmate from primary
               | school. He was studying history and shared some great
               | (historical) stories that I really enjoyed. But then
               | another thought formulated in my mind: if I had to
               | actively _study_ this, rather than just catch a story or
               | two, I'd definitely be dropping out. And that's when I
               | realized that there can be value to things, they can be
               | interesting, yet at the same time it's OK for me not to
               | be interested by them or pursue them deeper. Just like
               | how I think it is perfectly OK to be interested in this
               | pattern, but not care for the underlying mapping
               | mechanism, as it is essentially irrelevant. The fun was
               | in the fact, not in the mechanism (in my view for the
               | author anyways).
        
         | positisop wrote:
         | I think CS grads often skip the part of how something actually
         | works and are happy with abstractions.
        
         | syncsynchalt wrote:
         | I think the audience already understands why it works, it's
         | more the knowing there's a relatively small set of mnemonics
         | for these things that's interesting. "eyJ" for JSON, "LS0" for
         | dashes (PEM encoding), "MII" for the DER payload inside a PEM,
         | and so on.
         | 
         | I've been doing this a long time but until today the only one
         | I'd noticed was "MII".
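
Those mnemonics translate into a very small sniffing helper. This is a heuristic sketch only: the prefix table lists the three patterns mentioned in the thread, the descriptions are paraphrases, and `guess_base64_content` is a hypothetical name. A match means the payload is worth decoding, not that its type is proven.

```python
import base64
from typing import Optional

# Prefixes taken from the thread; descriptions are informal.
KNOWN_PREFIXES = {
    "eyJ": 'JSON object whose first key starts with an ASCII letter ({"a...)',
    "LS0": "text starting with dashes (PEM header or YAML document marker)",
    "MII": "ASN.1 DER SEQUENCE with a two-byte length",
}


def guess_base64_content(text: str) -> Optional[str]:
    """Return a hint about what a base64 string probably contains, or None."""
    for prefix, meaning in KNOWN_PREFIXES.items():
        if text.startswith(prefix):
            return meaning
    return None


samples = [
    base64.b64encode(b'{"alg":"none"}').decode("ascii"),
    base64.b64encode(b"-----BEGIN CERTIFICATE-----").decode("ascii"),
    base64.b64encode(bytes([0x30, 0x82, 0x01, 0x0A, 0x02])).decode("ascii"),
    base64.b64encode(b"Hello").decode("ascii"),
]
for sample in samples:
    print(sample[:12], "->", guess_base64_content(sample))
```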
        
         | appreciatorBus wrote:
         | That's really a leap about the writer's interest.
         | 
         | They could just as easily have felt the underlying reason was
         | so obvious it wasn't worth mentioning.
         | 
         | I know how base64 encoding works but had never noticed the
         | pattern the author pointed out. As soon as I read it, I
         | understood why. It didn't occur to me that the author should
         | have explained it at a deeper level.
        
         | nxnsxnbx wrote:
         | The author also doesn't explain what JSON is. Because it's
         | obvious to the target audience. There's simply no explanation
         | necessary
        
       | karel-3d wrote:
       | I debugged way too many JWT tokens
       | 
       | I know eyJhbG by heart
        
         | karel-3d wrote:
         | they technically don't need to begin like that! JWT is JSON and
         | is therefore infamously vague... but in practice they for some
         | reason always begin with "alg" so always like eyJhbG
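
The `eyJhbG` prefix falls straight out of the header's key order, as a quick experiment shows. The sketch below encodes a typical JWT header the way JWTs do (compact JSON, URL-safe base64, padding stripped) and then again with the fields swapped. The `b64url_json` helper is an assumption made for illustration and is not taken from any JWT library.

```python
import base64
import json


def b64url_json(obj: dict) -> str:
    """Compact-serialize `obj` and base64url-encode it without '=' padding."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


# Python dicts keep insertion order, so the key order here is the wire order.
alg_first = b64url_json({"alg": "HS256", "typ": "JWT"})
typ_first = b64url_json({"typ": "JWT", "alg": "HS256"})

print(alg_first)
print(typ_first)
```

Both headers are equally valid JSON, but only the first one starts with `eyJhbG`; the second starts with `eyJ0eXAi`. Any tool that matches on the longer prefix is relying on convention, not on the spec.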
        
           | xg15 wrote:
           | Has anyone tried to send a JWT token with the fields in a
           | different order (e.g. a long key first and key ID and
           | algorithm behind) and see how many implementations will
           | break?
        
         | syncsynchalt wrote:
         | I didn't even realize I knew that string, but I recognized it
         | immediately from your post.
        
       | Sophira wrote:
       | Mathematically, base64 is such that every block of three
       | characters of raw input will result in four characters of
       | base64'd output.
       | 
       | These blocks can be considered independent of each other. So for
       | example, with the string "Hello world", you can do the following
       | base64 transformations:
       | 
       | * "Hel" -> "SGVs"
       | 
       | * "lo " -> "bG8g"
       | 
       | * "wor" -> "d29y"
       | 
       | * "ld" -> "bGQ="
       | 
       | These encoded blocks can then be concatenated together and you
       | have your final encoded string: "SGVsbG8gd29ybGQ="
       | 
       | (Notice that the last one ends in an equals sign. This is because
       | the input is less than 3 characters, and so in order to produce 4
       | characters of output, it has to apply padding - part of which is
       | encoded in the third digit as well.)
       | 
       | It's important to note that this is simply a byproduct of the way
       | that base64 works, not actually an intended thing. My
       | understanding is that it's basically like how if you take an
       | ASCII character - which could be considered a base 256 digit -
       | and convert it to hexadecimal (base 16), the resulting hex number
       | will always be two digits long - the _same_ two digits, at that -
       | even if the original was part of a larger string.
       | 
       | In this case, every three base 256 digits will convert to four
       | base 64 digits, in the same way that it would convert to six base
       | 16 digits.
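The block independence described above can be checked with a short sketch using Python's standard `base64` module (the helper name `encode_in_blocks` is made up for illustration). It encodes each 3-byte block separately and compares the concatenation with encoding the whole string at once:

```python
import base64


def encode_in_blocks(data: bytes) -> str:
    """Encode each 3-byte block on its own and join the results."""
    blocks = [data[i:i + 3] for i in range(0, len(data), 3)]
    return "".join(base64.b64encode(b).decode("ascii") for b in blocks)


text = b"Hello world"

# Show the per-block encodings listed in the comment above.
for i in range(0, len(text), 3):
    block = text[i:i + 3]
    print(block, "->", base64.b64encode(block).decode("ascii"))

whole = base64.b64encode(text).decode("ascii")
print(encode_in_blocks(text) == whole)

# The hex analogy: each byte maps to the same two hex digits
# regardless of its neighbours.
print(text.hex() == "".join(f"{byte:02x}" for byte in text))
```

This only works because every block except the last is exactly 3 bytes. Splitting at any other boundary would introduce padding in the middle of the output.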
        
         | zokier wrote:
         | nitpick but ascii would be base128, largest ascii value is 0x7f
         | which in itself is a telltale if you are looking at hex dumps.
        
           | Sophira wrote:
           | Yeah, I was aware of that, but I figured it was the easiest
           | way to explain it. It's true that "character representation
           | of a byte" is more accurate, but it doesn't roll off the
           | tongue as easily.
        
         | Sophira wrote:
         | By the way, I would guess that this is almost certainly why
         | LLMs can actually decode/encode base64 somewhat well, even
         | without the help of any MCP-provided tools - it's possible to
         | 'read' it in a similar way to how an LLM might read any other
         | language, and most encoded base64 on the web will come with its
         | decoded version alongside it.
        
       | tetha wrote:
       | Reminds me of 1213486160[1]
       | 
       | Besides that, I just spent way too much time figuring out this is
       | an encrypted OpenTofu state. It just looked way too much like a
       | terraform state but not entirely. Tells ya what I spend a lot of
       | time with at work.
       | 
       | This is probably another interesting situation in which you
       | cannot read the state, but you can observe changes and growth by
       | observing the ciphertext. It's probably fine, but remains
       | interesting.
       | 
       | 1: https://rachelbythebay.com/w/2016/02/21/malloc/
        
       | koolba wrote:
       | Well duh. It's a deterministic encoding. Does not matter if it's
       | base64, hex, or even rot13.
       | 
       | Is this the state of modern understanding of basic primitives?
        
       | calibas wrote:
       | The encoded JSON string is going to start with "ey", unless
       | there's whitespace in the first couple characters.
       | 
       | Also, it seems like the really important point is kind of glossed
       | over. Base64 is not a kind of encryption, it's an encoding that
       | anybody can easily decode. Using it to hide secrets in a GitHub
       | repo is a really really dumb thing to do.
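A small sketch of the "ey" claim above, with made-up sample documents. The first two base64 characters depend only on the first byte and the top four bits of the second, so `{"` and `{ ` both give "ey", while a newline after the brace or any leading whitespace changes the prefix:

```python
import base64


def b64(s: str) -> str:
    """Standard base64 of a UTF-8 string, returned as text."""
    return base64.b64encode(s.encode("utf-8")).decode("ascii")


samples = [
    '{"a":1}',         # compact object
    '{ "a": 1 }',      # space after the brace
    '{\n  "a": 1\n}',  # pretty-printed, newline after the brace
    ' {"a":1}',        # leading space before the brace
    '[1,2]',           # top-level array instead of object
]

for s in samples:
    print(repr(s), "->", b64(s))

# Decoding is just as easy, which is the point about secrets:
print(base64.b64decode(b64('{"password":"hunter2"}')).decode("utf-8"))
```

The last line is the practical takeaway: anything that can be spotted this way can also be decoded by anyone who sees it.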
        
       | cfontes wrote:
       | Not directly related, but I know an old guy who can decode
       | EBCDIC and credit card positional data formats on the fly. And
       | sometimes it was a "feeling": he couldn't explain it properly but
       | knew exactly the value, name and other data.
       | 
       | It was amazing to see him decode VISA and MASTER transactions on
       | the fly in logs and other places.
        
         | andrepd wrote:
         | That's got to be the most _niche_ party trick I've ever heard
         | of.
        
         | VoidWhisperer wrote:
         | I would hope that these logs don't include the full details of
         | the credit card (such as number/CVV). If they do, the company
         | that is logging this info could end up having some issues with
         | Visa/MC.
         | 
         | Edit: Now that I looked at it a little deeper, I'm assuming
         | they are talking about these[0] sort of files?
         | 
         | [0]: https://docs.helix.q2.com/docs/card-transaction-file
        
           | Aspos wrote:
           | PCI DSS is a relatively new thing. Before it, card data flew
           | in the open.
        
       | benatkin wrote:
       | I'm more partial to PCFkb2N0eXBlIGh0bWw+
        
       ___________________________________________________________________
       (page generated 2025-08-05 23:00 UTC)