[HN Gopher] Representing SHA-256 Hashes as Avatars
       ___________________________________________________________________
        
       Representing SHA-256 Hashes as Avatars
        
       Author : franky47
       Score  : 141 points
       Date   : 2021-04-19 11:54 UTC (11 hours ago)
        
 (HTM) web link (francoisbest.com)
 (TXT) w3m dump (francoisbest.com)
        
       | rumblefrog wrote:
       | Would be more awesome if it could export as an image! Right now
       | I'm just manually inspecting it and copying the entire <g>
       | section.
        
         | franky47 wrote:
         | Someone actually built exactly that while I was writing the
         | article :)
         | 
         | https://github.com/wzulfikar/hashvatar
        
       | zellyn wrote:
       | Suggestion: shave off two bits, and switch between the variants
       | in the "A bit of fun" section:
       | https://francoisbest.com/posts/2021/hashvatars#a-bit-of-fun
        
         | franky47 wrote:
         | What do you mean by "shave off 2 bits"?
        
           | vimax wrote:
           | I think he means use 2 bits to decide the variation
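A minimal sketch of that reading, assuming the lowest two bits of the SHA-256 digest pick one of four variants and the remaining 254 bits drive the drawing. The variant names in `VARIANTS` and the function `split_variant` are hypothetical, not taken from the article.

```python
import hashlib

# Hypothetical variant names; the article's actual list may differ.
VARIANTS = ["normal", "stagger", "spider", "flower"]

def split_variant(data: bytes) -> tuple[str, int]:
    """Return (variant name, remaining 254-bit value) for the input."""
    digest = hashlib.sha256(data).digest()
    value = int.from_bytes(digest, "big")
    variant = VARIANTS[value & 0b11]  # lowest 2 bits select 1 of 4 variants
    remaining = value >> 2            # 254 bits left for colors and segments
    return variant, remaining

variant, remaining = split_variant(b"Hello, Hacker News!")
print(variant, remaining.bit_length())
```

The trade-off is that two bits of the hash stop being shown as color and are shown as structure instead.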
        
       | EdSchouten wrote:
       | Still, these images are kind of hard to compare/remember.
       | 
       | Why not convert a hash to a correct horse battery staple?
       | https://xkcd.com/936/
        
       | milkey_mouse wrote:
       | Urbit also developed a solution for turning a number into an
       | avatar, although theirs only have 32 bits of entropy, and to be
       | honest there are many that are difficult to tell apart:
       | 
       | https://urbit.org/blog/creating-sigils/
        
       | sm4rk0 wrote:
       | Strange that neither the article nor the comments mention
       | https://gravatar.com/
       | 
       | It hashes the user's email
       | http://en.gravatar.com/site/implement/hash/ and creates an
       | "identicon" from the hash
       | http://scott.sherrillmix.com/blog/blogger/wp_identicon/ or loads
       | a user-defined image.
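For reference, a small sketch of the email hashing step described there: trim whitespace, lowercase, then take the MD5 hex digest. The helper name `gravatar_url` is made up for illustration, and the `d=identicon` and `s` query parameters are used here on the assumption that the service still accepts them.

```python
import hashlib

def gravatar_url(email: str, size: int = 80) -> str:
    """Build an avatar URL from an email address (illustrative helper)."""
    normalized = email.strip().lower()  # trim, then lowercase
    digest = hashlib.md5(normalized.encode("utf-8")).hexdigest()
    # d=identicon asks for a generated identicon when no image is uploaded.
    return f"https://www.gravatar.com/avatar/{digest}?d=identicon&s={size}"

print(gravatar_url("  Someone@Example.com "))
```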
        
         | leipert wrote:
         | I really like the article's method over the Gravatar
         | identicon because the circular shape is not going to end up
         | with "accidental swastikas".
        
       | nemo1618 wrote:
       | The problem with hash avatars in general is that people want to
       | use them for _identity verification_ -- and humans are wired to
       | do so automatically -- but technologically, they cannot provide
       | this. The space of possible avatars (2^256, in this case) is far,
       | far larger than the number of distinct objects that humans can
       | distinguish between. Which means that there will invariably be
       | "collisions:" two avatars that are not identical, but _appear_
       | identical to humans. As a result, if an attacker can brute-force
       | an avatar that looks very similar to, say, Elon Musk's avatar,
       | they can trivially scam people.
       | 
       | It follows that, since avatars do not provide any proof of
       | identity, there is actually no harm in greatly truncating the
       | hash space when generating them! That is, rather than trying to
       | encode all 256 bits into the avatar, you can use a much more
       | manageable number, like 16. But isn't this too small? Won't there
       | be lots of collisions? Yes -- but that's a feature! If collisions
       | are _common_, then the average user will be aware that avatar !=
       | identity, which makes them _less_ susceptible to scamming. But 16
       | bits is still enough to meet the real goal of avatars: quickly
       | distinguishing between different people in a conversation (or
       | transaction, or whatever).
       | 
       | (This also shows why making avatars more costly to generate, e.g.
       | with scrypt, can do more harm than good: doing so makes
       | collisions less likely, but still _not impossible_. Meaning that
       | if a collision _does_ occur, whether accidental or malicious, you
       | are less likely to notice it.)
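A rough sketch of the truncation argument, assuming the avatar is drawn from only the top 16 bits of the SHA-256 digest. The function names are illustrative; `collision_probability` is the usual birthday-style estimate, not anything from the article.

```python
import hashlib

def avatar_bits(data: bytes, bits: int = 16) -> int:
    """Keep only the top `bits` bits of the SHA-256 digest."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

def collision_probability(users: int, bits: int) -> float:
    """Chance that at least two of `users` share the same truncated value."""
    space = 2 ** bits
    p_unique = 1.0
    for i in range(users):
        p_unique *= (space - i) / space
    return 1.0 - p_unique

print(avatar_bits(b"alice"), avatar_bits(b"bob"))
print(round(collision_probability(300, 16), 2))
```

With 16 bits, a few hundred users are already likely to produce at least one shared avatar, which is the "collisions are common" behaviour the comment argues for.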
        
         | FqOD4xih7Uq6m9Z wrote:
         | There might not be 2^256 distinguishable objects but maybe
         | someone can come up with 2^16 distinguishable objects and just
         | string 16 of them together. If there is one character off in a
         | string of 40 hexadecimal characters it is hard to notice but
         | that would be easier to detect in a set of 16 symbols.
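The arithmetic works out exactly: 16 symbols of 16 bits each cover all 256 bits. A sketch, assuming a hypothetical alphabet of 65,536 symbols addressed by index:

```python
import hashlib

def symbol_indices(data: bytes) -> list[int]:
    """Split a SHA-256 digest into 16 indices, each in range(2**16)."""
    digest = hashlib.sha256(data).digest()  # 32 bytes
    return [int.from_bytes(digest[i:i + 2], "big") for i in range(0, 32, 2)]

print(symbol_indices(b"hello\n"))
```

A single flipped hex character changes exactly one of the 16 indices, so one whole symbol changes instead of one character in 64.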
        
         | adzm wrote:
         | On a related note, I've been experimenting with using a simple
         | word list (like the eff diceware list) to generate strings of
         | words encoding data. Trickiest part is figuring out how to
         | encode padding, and the eventual size of the word list, and how
         | complicated the final solution should be (eg using word lists
         | that are not even binary numbers and leftover bits and all
         | that). The diceware word list is nice since the words are not
         | ambiguous and don't have homophones.
         | 
         | I assumed there would be existing implementations of something
         | similar but have not found one that fits criteria other than
         | some that use very small word lists. Diceware has 7776 words
         | and pushing that to 8192 should be feasible and is a bit easier
         | to work with.
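One way to sidestep the leftover-bits problem with a list whose size is not a power of two is to treat the data as one big integer and write it in base N. This is only a sketch under that assumption; indices stand in for words, and the function names are illustrative.

```python
def words_needed(bits: int, list_size: int) -> int:
    """Smallest word count whose combinations cover 2**bits values."""
    count, capacity = 0, 1
    while capacity < 2 ** bits:
        capacity *= list_size
        count += 1
    return count

def to_indices(value: int, list_size: int, count: int) -> list[int]:
    """Write `value` as `count` base-`list_size` digits, most significant first."""
    digits = []
    for _ in range(count):
        value, digit = divmod(value, list_size)
        digits.append(digit)
    return digits[::-1]

def from_indices(indices: list[int], list_size: int) -> int:
    """Inverse of to_indices."""
    value = 0
    for index in indices:
        value = value * list_size + index
    return value

print(words_needed(256, 7776), words_needed(256, 8192))
```

For a 256-bit value both list sizes come out at the same word count, so growing the list to 8192 mainly buys simpler bit handling rather than shorter phrases.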
        
           | shoghicp wrote:
           | BIP-39 uses 2048 words, which can all be distinguished from
           | each other using the first four characters of each word. This
           | is used to encode raw binary entropy, but adapting it to
           | arbitrary amounts of data is straightforward. For padding I
           | would suggest either pre-encoding length at the start or
           | using classic block cipher padding
           | (https://en.wikipedia.org/wiki/Padding_(cryptography) )
           | 
           | For BIP-39, see the wordlists in this folder:
           | https://github.com/bitcoin/bips/blob/master/bip-0039.mediawi...
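A sketch of the length-prefix suggestion with 11 bits per word. `WORDS` is a placeholder list generated on the spot, not the real BIP-39 wordlist, and this scheme omits BIP-39's checksum; the one-byte length prefix (so at most 255 bytes) is an assumption made for brevity.

```python
# Placeholder 2048-entry list; swap in a real wordlist in practice.
WORDS = [f"w{i:04d}" for i in range(2048)]
INDEX = {word: i for i, word in enumerate(WORDS)}

def encode(data: bytes) -> list[str]:
    """Prefix the length, pad with zero bits to a multiple of 11, emit words."""
    if len(data) > 255:
        raise ValueError("one-byte length prefix allows at most 255 bytes")
    payload = bytes([len(data)]) + data
    bits = len(payload) * 8
    pad = (-bits) % 11
    value = int.from_bytes(payload, "big") << pad
    count = (bits + pad) // 11
    return [WORDS[(value >> (11 * (count - 1 - i))) & 0x7FF] for i in range(count)]

def decode(words: list[str]) -> bytes:
    """Rebuild the bytes and use the length prefix to drop the padding."""
    value = 0
    for word in words:
        value = (value << 11) | INDEX[word]
    total_bits = 11 * len(words)
    value >>= total_bits % 8  # drop bits that cannot form a whole byte
    payload = value.to_bytes(total_bits // 8, "big")
    return payload[1:1 + payload[0]]

print(encode(b"hello"))
```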
        
         | sva_ wrote:
         | > The space of possible avatars (2^256, in this case) is far,
         | far larger than the number of distinct objects that humans can
         | distinguish between.
         | 
         | That sounds intriguing to me. Are you aware of any research
         | into this?
        
           | HideousKojima wrote:
           | https://www.researchgate.net/publication/236023905_Color_dif...
        
           | g_sch wrote:
           | It's a simple order of magnitude calculation. 2^256 is
           | greater than one billion to the eighth power, times 100,000.
           | There are probably several possible ways you could estimate
           | how many different objects a person could distinguish
           | between, but I think it's unlikely you'd come up with even a
           | single billion.
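The comparison is easy to check with exact integer arithmetic. A tiny sketch (the function name is illustrative):

```python
def check_bound() -> tuple[bool, int]:
    """Compare 2**256 with (one billion)**8 * 100,000 and count its digits."""
    space = 2 ** 256
    bound = (10 ** 9) ** 8 * 100_000  # 10**77
    return space > bound, len(str(space))

print(check_bound())
```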
        
             | littlestymaar wrote:
             | > but I think it's unlikely you'd come up with even a
             | single billion.
             | 
             | One billion is really, really tiny though. Let's just play a
             | game:
             | 
             | - Unless you're colorblind, you can easily tell ten hues
             | apart. Let's pick two colors, one with a saturated hue, and
             | the other with a pastel one. That's 100 possibilities.
             | 
             | - I'm pretty sure you can easily recognize pictures of a
             | hundred people you've met at some point in your life. Let's
             | pick two of them, that's 10 thousand combinations.
             | 
             | - Can you recognize ten different road signs? Ten country
             | shapes? Ten animals? Ten fictional characters? Ten book
             | covers? Ten celebrities? Just pick three categories, and
             | you've got a thousand combinations.
             | 
             | Now I'm pretty sure you can tell your grandma sitting under
             | a vivid pink UK shape next to your 9th grade math teacher
             | staring at Bruce Willis holding a giant light blue stop
             | sign apart from any other imaginable combinations.
             | 
             | An untrained[1] human brain probably cannot distinguish
             | between 2^256 items, but it's still able to do it for
             | massive numbers.
             | 
             | [1]: but maybe it's possible with training: for instance,
             | chess professionals might be able to do it.
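Multiplying the three choices in that game gives exactly the billion being discussed. A quick sketch of the count, with variable names chosen here for illustration:

```python
import math

def game_combinations() -> tuple[int, float]:
    """Total combinations in the game above and the equivalent bit count."""
    colors = 10 * 10      # one saturated hue and one pastel hue
    faces = 100 * 100     # two people out of a hundred
    categories = 10 ** 3  # three categories with ten options each
    total = colors * faces * categories
    return total, math.log2(total)

total, bits = game_combinations()
print(total, round(bits, 1))
```

That is roughly 30 bits of distinguishable combinations, so the scheme would need to scale a long way before approaching 256.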
        
           | nemo1618 wrote:
           | To be honest, I have no idea how many distinct objects humans
           | can distinguish between, but I am 99% confident that it is
           | fewer than 2^128, much less 2^256.
           | 
           | I suppose it's a somewhat nuanced question, though. For
           | example, if I were shown every avatar in sequence, I'm quite
           | sure I would always notice the "diff" between two consecutive
           | avatars. But the bar that I have in mind is much, much
           | higher: given a sequence of avatars, can I recognize my
           | friend's avatar with 100% accuracy? Given that we can't even
           | do this within the set of <8 billion human faces (we
           | occasionally accost a stranger as though they were a friend),
           | I have to conclude that doing so within a set of 2^256
           | abstract shapes is entirely hopeless.
        
             | [deleted]
        
             | dwpdwpdwpdwpdwp wrote:
             | It is definitely a nuanced question. The definition of
             | object is surely up for debate as well. A silly example: if
             | one were to define an object as a string of 64 hex
             | characters, even non-literate people could distinguish
             | between any two distinct objects.
             | 
             | echo "hello" | sha256sum
             | 
             | >> 5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03
             | 
             | echo "world" | sha256sum
             | 
             | >> e258d248fda94c63753607f7c4494ee0fcbe92f1a76bfdac795c9d84101eb317
             | 
             | But were I to briefly glance at a computer screen, I'd
             | probably confuse these next two:
             | 
             | e258d248fda94c63753607f7c4494ee0fcbe92f1a76bfdac795c9d84101eb317
             | 
             | e258d248fda94c63753607f7c4494ee0fcbe92f1a76bfaac795c9d84101eb317
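A short sketch of what a machine comparison of those two look-alike strings finds. Note that `echo` appends a newline, so the equivalent input in Python is `"hello\n"`; the helper names are illustrative.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Hex digest of the UTF-8 bytes of `text`."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def diff_positions(a: str, b: str) -> list[int]:
    """Indices at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

# The two strings quoted at the end of the comment above.
first = "e258d248fda94c63753607f7c4494ee0fcbe92f1a76bfdac795c9d84101eb317"
second = "e258d248fda94c63753607f7c4494ee0fcbe92f1a76bfaac795c9d84101eb317"

print(sha256_hex("hello\n"))
print(diff_positions(first, second))
```

The two strings differ at a single hex character, which is trivial for code to find and easy for a glancing human to miss.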
        
             | sva_ wrote:
             | Yeah, I should've been more clear. There's no doubt that
             | humans can't distinguish/memorize 2^256 abstract shapes.
             | However, only a small subset of those would have to be
             | memorized - those which are relevant to the individual. I'd
             | agree that this particular pattern doesn't have enough
             | variability for each pattern to be unique enough to
             | reliably identify it, but I'd conjecture that it's possible
             | to make such a pattern, which has enough variability and
             | unique characteristics to be recognizable (ignoring the
             | fact that an adversary could make a very similar pattern to
             | mislead the individual - I'm not curious about it for
             | verification.)
             | 
             | Your example of the fact that we can't reliably recognize
             | every face on this planet is very interesting. Let's
             | imagine I know n faces which I can reliably distinguish
             | from one another, but now there is an (n+1)th face which I
             | mix up with one of the previous ones. Now let's assume this
             | face would instead have a very unique characteristic,
             | unlike all previous faces - let's imagine for example, the
             | nose on this face is upside down. Surely I'd be able to
             | differentiate it from the previous n faces, hence the issue
             | of identifying it might've been the limited
             | variability/characteristics in the various previous
             | instantiations of a face.
             | 
             | So there are a number of characteristics in a face, which
             | have a certain degree of variability, which enable us to
             | distinguish from a number of them. I've been pondering on
             | how many of those characteristics could be combined in an
             | object, and how high the variability could be; to create
             | uniquely identifiable patterns. It probably depends a lot
             | on the meaning we attribute to the pattern, different
             | associations we have to it.
             | 
             | I apologize - quite the tangent, I guess. I've just been
             | pondering a lot on this for a project I've been working on
             | for some time.
        
       | InfiniteCode wrote:
       | I did this one some time ago, allows custom visual effects using
       | hash and seeded random: https://www.blankjs.com/
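The "hash plus seeded random" approach can be sketched as follows: the digest seeds a pseudo-random generator, and every visual parameter is drawn from it, so the same input always yields the same picture. The parameter names are invented for illustration and are not taken from the linked project.

```python
import hashlib
import random

def effect_params(data: bytes) -> dict:
    """Derive repeatable drawing parameters from a hash-seeded PRNG."""
    seed = int.from_bytes(hashlib.sha256(data).digest(), "big")
    rng = random.Random(seed)  # private generator, global state untouched
    return {
        "hue": rng.randrange(360),
        "rings": rng.randint(3, 6),
        "rotation": rng.uniform(0.0, 360.0),
    }

print(effect_params(b"franky47"))
```

Unlike a direct bit-to-pixel mapping, this is lossy: the picture depends on the hash but cannot be decoded back into it.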
        
         | tosh wrote:
         | Wow, those are beautiful. Starred.
        
       | sneak wrote:
       | See also:
       | 
       | https://robohash.org/
        
       | raphlinus wrote:
       | I still like snowflakes for this:
       | https://levien.com/snowflake-explain.html is a half-finished
       | blog post explaining the motivation and algorithm I came up
       | with. I never did careful user testing, but suspect that the
       | answer would be that some people can reliably distinguish the
       | patterns, others won't be able to.
       | 
       | In any case, there are a lot of variations on this "visual hash"
       | idea, including the original fractal one, and I heard of more
       | recent work to use the hash to seed StyleGAN face generation.
        
         | ianopolous wrote:
         | This is a great idea! Trying random ones, I couldn't find two I
         | thought looked confusable.
        
       | joshbuddy wrote:
       | You should check out this paper where they tested different
       | representations on humans to see what they could tell apart, and
       | came up with a novel representation called Moji.
       | 
       | https://exascale.info/assets/pdf/students/MSc_Thesis_-_Micha...
        
       | geoah wrote:
       | One of the prettiest identicons I've seen.
       | 
       | Since it doesn't seem to be lossy, I was wondering if it could be
       | somehow adapted to something that could be scanned as a QR code.
       | I guess the minor color shifts might be hard to get right, but
       | maybe combined/replaced with some form of symbol inside rings to
       | help, a dot/dash combination?
        
         | geoah wrote:
         | I'll also leave here this very nice list of identicon
         | implementations: https://github.com/drhus/awesome-identicons
        
       | RcouF1uZ4gsC wrote:
       | It would be a lot more work, but it might work better if you
       | picked something which humans are particularly tuned to notice
       | subtle details such as faces.
        
         | franky47 wrote:
         | Using the hash as a seed for an AI face generator like
         | thispersondoesnotexist would be pretty powerful. Free idea for
         | anyone who wants to give it a shot.
        
           | capableweb wrote:
           | Look at that, you reinvented NFTs such as CryptoPunks :)
           | https://www.larvalabs.com/cryptopunks
        
       | tosh wrote:
       | How about using the variants as well so the avatars also
       | structurally look different from each other (and adding even more
       | variants)?
        
       | petee wrote:
       | I always thought ssh randomart representations were visually
       | unique enough; maybe combine smaller, simpler shapes with color
       | too?
       | 
       | The rings are neat, but I found many to be too similar based on
       | color alone, and with the segments too it's really hard to pick
       | up on a pattern or something memorable.
        
       | KingMachiavelli wrote:
       | Despite the issue where it would be trivial to brute force
       | similar looking but not identical 'avatars', I think this still
       | has a few good uses for non-identification.
       | 
       | 1. Creating at least some default avatar. Not to be used to
       | verify identity but just somewhat better than having a very
       | limited set of default images. Having rate limits on account
       | creation would prevent most brute force methods.
       | 
       | 2. Avatar suitable for partial identification for very small
       | populations. Imagine a Matrix/Element room that has <100,000
       | people. The hash/math could be modified to drastically trim
       | down the space of the hash (e.g. 2^256) to something similar to
       | the size of the room.
       | 
       | #2 sounds pretty interesting. It could be expanded by making
       | parts of the image/avatar dependent on some other input other
       | than the user ID like the user's role in the chat group. Another
       | segment/ring could be something more short-lived and relative like
       | just identifying users in recent chat messages.
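A sketch of idea #2, assuming each ring is derived from its own input and the identity ring is truncated to roughly the size of the room. All names are hypothetical, and the 4-bit role ring is an arbitrary choice for illustration.

```python
import hashlib

def bits_for_population(size: int) -> int:
    """Bits needed to give each of `size` members a distinct value."""
    return max(1, (size - 1).bit_length())

def ring_value(label: str, value: str, bits: int) -> int:
    """Top `bits` bits of SHA-256 over a labeled input."""
    digest = hashlib.sha256(f"{label}:{value}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

def avatar_rings(user_id: str, role: str, room_size: int) -> dict:
    """One ring for identity (sized to the room), one for the role."""
    return {
        "identity": ring_value("user", user_id, bits_for_population(room_size)),
        "role": ring_value("role", role, 4),
    }

print(bits_for_population(100_000))
print(avatar_rings("@alice:example.org", "moderator", 100_000))
```

With the hash space this close to the room size, some members will share an identity ring, which fits the "partial identification" framing rather than verification.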
        
       | kop316 wrote:
       | As a warning, this would not be good for colorblind people (such
       | as myself).
       | 
       | Half of the middle ring of the "Hello, Hacker News!" hash looks
       | identical to me, and unless I look carefully, the entire ring
       | looks the same.
        
         | franky47 wrote:
         | What would you suggest as a solution? I considered swapping
         | Hue for Lightness in order to increase contrast changes. Would
         | you be interested in testing out some variants?
        
           | floatingatoll wrote:
           | Your Hue choices are selected from a pool of 16 with various
           | mutators applied. Hue alone isn't a viable path forward, so
           | finding a translation of Hue to a non-Hue representation that
           | doesn't worsen the diagram is essential.
           | 
           | You could apply repeating surface textures inside each slice,
           | rather than showing a solid color, so that Hue 1 shows
           | repeating dots, Hue 2 shows repeating lines, Hue 3 shows
           | repeating triangles.
           | 
           | You could use a Braille-like 2x2 grid to represent the 4-bit
           | Hue space as circles and lines within each slice.
           | 
           | If you imagine that each slice has 4 walls, replacing the
           | missing fourth wall of the innermost slices with the
           | innermost corner, then you could map the binary
           | representation of 2^4 hue (such as 0110) onto "bites" out of
           | the walls. For example, given 0110, map the 0s onto "bites"
           | and punch a small hole into two adjacent walls of the slice;
           | given 0000, punch a small hole into all four walls.
           | 
           | ASCII art of what I mean by "punch a hole into the wall", for
           | Hue 0111 (one zero, so one hole punched). This is an uncurved
           | slice, because ASCII art.
           | 
           |  ____________
           | |            |
           | |     __     |
           | |____/  \____|
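The hole-punching mapping can be sketched as a lookup from the 4-bit hue index to the walls that get a hole, with each 0 bit meaning "punch". The wall order in `WALLS` is an assumption chosen so that 0111 punches the bottom wall, as in the ASCII art.

```python
# Hypothetical wall order, most significant bit first.
WALLS = ["bottom", "left", "top", "right"]

def punched_walls(hue_index: int) -> list[str]:
    """Walls that get a hole: one per 0 bit in the 4-bit hue index."""
    if not 0 <= hue_index < 16:
        raise ValueError("hue index must fit in 4 bits")
    bits = format(hue_index, "04b")
    return [wall for wall, bit in zip(WALLS, bits) if bit == "0"]

for hue in (0b0111, 0b0110, 0b0000):
    print(format(hue, "04b"), punched_walls(hue))
```

Since the pattern carries all four bits, the slice stays decodable even when two hues are indistinguishable to the viewer.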
        
             | franky47 wrote:
             | That's a great suggestion, I love the "hole punching" idea
             | (although it'll probably end up looking like Swiss cheese).
             | 
             | The key is to find a solution that looks good enough when
             | zoomed out to ~64px square, which is tricky for details,
             | especially in the inner ring where sections are packed so
             | close to one another.
        
               | floatingatoll wrote:
               | I imagine that's why square hashes are more common than
               | circles: the raw information density problem.
        
           | thehappypm wrote:
           | I am also colorblind. The gold standard is to use a color
           | palette that is engineered for color blindness, which uses a
           | suite of color-blind-friendly colors and heavily utilizes
           | lightness. Here's a good article on an example from Tableau:
           | https://public.tableau.com/en-us/s/blog/2013/10/choosing-col...
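As a rough illustration of leaning on lightness, here is a sketch that maps a 4-bit value to 16 lightness steps of a single hue using the standard `colorsys` module. The hue, saturation, and lightness range are arbitrary assumptions, and this is not a vetted color-blind-safe palette.

```python
import colorsys

def nibble_to_rgb(nibble: int, hue: float = 0.6) -> tuple[int, int, int]:
    """Map 0..15 to a dark-to-light ramp of one hue (8-bit RGB)."""
    if not 0 <= nibble < 16:
        raise ValueError("expected a 4-bit value")
    lightness = 0.15 + 0.7 * nibble / 15  # stay away from pure black/white
    r, g, b = colorsys.hls_to_rgb(hue, lightness, 0.8)
    return round(r * 255), round(g * 255), round(b * 255)

for n in (0, 5, 10, 15):
    print(n, nibble_to_rgb(n))
```

Whether 16 lightness steps are actually distinguishable at ~64px would still need testing with real viewers.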
        
           | jerf wrote:
           | Given the nature of what you're trying to do, arguably, there
           | is very little for you to do. As you've already observed,
           | "normal" human visual acuity is already wildly incapable of
           | perceiving 2^256 different possibilities as distinct anyhow.
           | If normal humans are 200 bits short of the desired 256 (and
           | I'm still feeling generous claiming we could distinguish 2^56
           | different images of this type, but it's a nice round number
           | to make my point here), color blind people are 203-ish bits
           | short or so. It's not that materially different.
           | 
           | Normally when discussing being color-blind sensitive we're
           | discussing not embedding 2 or 3 bits of information into
           | colors that can't be distinguished by those who are color
           | blind, but in this case, we're trying to jam massively more
           | bits than anyone can handle into an image, so it's not clear
           | that much is called for other than tweaking where the bits
           | get lost a bit.
           | 
           | Or, to put it another way, relative to the desired goal,
           | we're _all_ already massively "colorblind". Those who are
           | what we humans would call colorblind are, in relative terms,
           | hardly at a disadvantage at all for once, because we're _all_
           | so many orders of magnitude short of the mark.
        
           | kop316 wrote:
           | The primary issue in colorblindness is:
           | 
           | 1) One confuses two types of colors as the same one (e.g.
           | red-green, blue-yellow, etc. colorblind) or
           | 
           | 2) Colors that are close together appear to be identical
           | (like the case that I saw: half of the row looked exactly the
           | same to me, and the entire row looked the same until I looked
           | closely).
           | 
           | Perhaps a mix of shapes and colors would make it more
           | obvious? Or contrasting the border colors too, to highlight
           | close differences (like if you have "f8" and "f0", which have
           | a Hamming distance of 1, you make the border somehow
           | highlight the difference).
           | 
           | Don't get me wrong, I think it is a neat idea! I just want
           | you to be aware.
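A small sketch of the Hamming distance check mentioned there, plus a helper that flags neighbouring segments whose byte values differ by only a bit or so and might deserve a contrasting border. `close_neighbours` and its threshold are illustrative assumptions.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two integers differ."""
    return bin(a ^ b).count("1")

def close_neighbours(segments: list[int], threshold: int = 1) -> list[int]:
    """Indices i where segment i and i+1 differ, but by at most `threshold` bits."""
    return [
        i
        for i in range(len(segments) - 1)
        if 0 < hamming_distance(segments[i], segments[i + 1]) <= threshold
    ]

print(hamming_distance(0xF8, 0xF0))
print(close_neighbours([0xF8, 0xF0, 0x0F]))
```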
        
             | franky47 wrote:
             | Thanks for your feedback. Are the color-blindness
             | simulators in Firefox devtools good enough to reproduce
             | your experience? Or do you have tools that you'd recommend?
        
               | kop316 wrote:
               | You're welcome! I am happy to help.
               | 
               | > Are the color-blindness simulators in Firefox devtools
               | good enough to reproduce your experience?
               | 
               | I have honestly not used them, sorry.... (I don't make
               | GUIs).
               | 
               | I am looking at the documentation, and I am a bit
               | disappointed though. By far the most common issue is
               | contrast loss like they say:
               | https://developer.mozilla.org/en-US/docs/Tools/Accessibility...
               | 
               | The condition to not see one color completely is
               | incredibly rare, and not seeing any colors even more so.
               | By and large the issue is contrast loss.
        
       ___________________________________________________________________
       (page generated 2021-04-19 23:01 UTC)