[HN Gopher] TIL: Versions of UUID and when to use them
       ___________________________________________________________________
        
       TIL: Versions of UUID and when to use them
        
       Author : fagnerbrack
       Score  : 128 points
       Date   : 2024-08-25 19:05 UTC (3 hours ago)
        
 (HTM) web link (ntietz.com)
 (TXT) w3m dump (ntietz.com)
        
       | MrDarcy wrote:
       | Just use v7.
       | 
       | Cue the security experts who say otherwise...
        
         | wongarsu wrote:
         | Use v4 if creation date could conceivably be sensitive
         | information or if you depend on your uuids being completely
         | unguessable. Otherwise use v7
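To illustrate why creation date matters here: a v7 UUID carries a 48-bit Unix millisecond timestamp in its leading bits, so anyone holding the ID can read it back. A minimal sketch, assuming Python's standard library only; `make_uuid7` and `creation_time` are hypothetical helper names, not a stdlib API.

```python
import os
import time
import uuid
from datetime import datetime, timezone
from typing import Optional


def make_uuid7(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUIDv7-shaped value by hand (hypothetical helper)."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    raw = bytearray(os.urandom(16))
    raw[0:6] = unix_ms.to_bytes(6, "big")  # 48-bit big-endian Unix time in ms
    raw[6] = (raw[6] & 0x0F) | 0x70        # version nibble = 7
    raw[8] = (raw[8] & 0x3F) | 0x80        # variant bits = 10
    return uuid.UUID(bytes=bytes(raw))


def creation_time(u: uuid.UUID) -> datetime:
    """Recover the embedded creation time from a v7 UUID."""
    unix_ms = u.int >> 80                  # top 48 of the 128 bits
    return datetime.fromtimestamp(unix_ms / 1000, tz=timezone.utc)


if __name__ == "__main__":
    u = make_uuid7()
    print(u, "was created at", creation_time(u))
```

A v4 UUID has no such field, which is the trade-off described above: v7 sorts by creation time, v4 reveals nothing about it.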
        
           | stavros wrote:
           | If we want a v7, shouldn't we use a ULID instead?
        
             | wongarsu wrote:
             | When we didn't have UUIDv7, ULID was great. But now that we
             | have v7 it's the more widely supported alternative. And
             | apart from v7 setting the UUID version bits and having a
             | different default representation they are not that
             | different.
        
               | stavros wrote:
               | Oh, I didn't realize v7 was newer than ULID, thanks.
        
               | eropple wrote:
               | ULID's presentation format is probably better for humans,
               | though. You can double-click-to-highlight a ULID; the
               | standard UUID representation doesn't like this.
               | 
               | (You can use ULID's presentational tools with UUIDv7,
               | though.)
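As a rough sketch of that last point: ULID's text form is Crockford base32 over the 128-bit value, so the same encoding can be applied to any UUID. The helper names below are hypothetical, and this only covers the encoding, not ULID generation.

```python
import uuid

# Crockford base32 alphabet: digits plus uppercase letters without I, L, O, U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def to_crockford(u: uuid.UUID) -> str:
    """Render a UUID as 26 Crockford base32 characters (ULID-style text)."""
    n = u.int
    chars = []
    for _ in range(26):  # 26 * 5 = 130 bits, enough for 128
        chars.append(CROCKFORD[n & 31])
        n >>= 5
    return "".join(reversed(chars))


def from_crockford(text: str) -> uuid.UUID:
    """Parse the 26-character form back into a UUID."""
    n = 0
    for ch in text.upper():
        n = (n << 5) | CROCKFORD.index(ch)
    return uuid.UUID(int=n)


if __name__ == "__main__":
    u = uuid.uuid4()
    print(u, "->", to_crockford(u))
```

The result has no hyphens, so a double-click selects the whole ID, and because it is fixed-width with an ASCII-ordered alphabet it should sort the same way as the underlying number.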
        
               | wtetzner wrote:
               | > the standard UUID representation doesn't like this.
               | 
               | Yeah, I've gotten in the habit of stripping hyphens from
               | the string representation of UUIDs in a lot of the code I
               | write for that reason.
        
               | beart wrote:
               | It doesn't help in other tools, but there is a css rule
               | to help with this.
               | 
               | https://developer.mozilla.org/en-US/docs/Web/CSS/user-select
        
               | Lammy wrote:
               | > You can double-click-to-highlight a ULID; the standard
               | UUID representation doesn't like this.
               | 
               | You can control this behavior in CSS with `user-select`.
               | Peep my fiddle: https://jsfiddle.net/gLyph5km/
        
               | eropple wrote:
               | Yup, in a browser you can. In my terminal or my text
               | editor or Slack, I can't.
        
         | fiddlerwoaroof wrote:
         | An issue I've always had with UUIDs and ULIDs is there isn't a
         | great way to generate one deterministically, as far as I can
         | tell: for a lot of use-cases, being able to reprocess data and
         | generate identical IDs is really useful and there isn't a
         | standard way that I know of to achieve this.
        
           | exe34 wrote:
           | https://stackoverflow.com/a/64229385
        
             | fiddlerwoaroof wrote:
             | Sure, there are workarounds in various languages, but it
             | would be nice to have a standardized hash-based UUID or ULID
        
               | 1986 wrote:
               | from the article, it sounds like this is V5?
        
               | fiddlerwoaroof wrote:
               | I missed that because I typically am using ULIDs these
               | days. But, yeah, some standardized format for a hash of
               | message data is what I want.
        
               | IggleSniggle wrote:
               | If it's a standardized sequence, then that's no different
               | than just 0, 1, 2, 3 but with different names. If you
               | just want a non-sequential but deterministic sequence,
               | then that's every random number generator that accepts a
               | seed value, and being any more standardized than that
               | makes zero sense.
        
           | VWWHFSfQ wrote:
           | Are you looking for something other than just a custom seed
           | in the RNG?
        
           | 1986 wrote:
           | why wouldn't you use some sort of collision resistant hashing
           | function on the data to achieve this instead?
        
           | mmiyer wrote:
           | That's UUID v5 (uses a sha1 hash of input data).
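A short sketch of the v5 approach using Python's `uuid` module, for the reprocessing use case raised earlier in this subthread. The namespace name `orders.example.com` and the function `record_id` are made up for illustration; any fixed namespace UUID works as long as it never changes.

```python
import uuid

# Hypothetical application namespace, itself derived deterministically.
ORDERS_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "orders.example.com")


def record_id(payload: str) -> uuid.UUID:
    """Same payload in, same UUID out (SHA-1 of namespace + name, truncated)."""
    return uuid.uuid5(ORDERS_NAMESPACE, payload)


if __name__ == "__main__":
    first = record_id("customer=42;order=1001")
    again = record_id("customer=42;order=1001")
    print(first, first == again)
```

Re-running a pipeline over the same input then yields identical IDs without any stored state.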
        
         | treve wrote:
         | Why are you dismissive of security-related issues?
        
           | tonetegeatinst wrote:
           | Because developers don't always consider the security aspect.
           | Not saying this is what he's doing but could also just be due
           | to how complex good software can be to write.
           | 
           | There is a reason cybersecurity or UI/UX or product design
           | isn't always left to the developer. The coder writes code that
           | fits certain criteria they are given, then someone down the
           | line might QA check it, fuzz inputs or security review the
           | code. How well this is done depends on the product, market,
           | and environment.
        
       | Vecr wrote:
       | I suggest not using any of the MAC based versions. In theory that
       | could be anything other than v4 and v7, but v1 is the worst. Avoid
       | v3 as well, since MD5 is horribly broken.
        
         | tashbarg wrote:
         | MD5 is "broken" as a cryptographic hash function. It still is
         | perfectly fine as a non-cryptographic hash function.
        
           | Vecr wrote:
           | Not really, it's slower than truncated blake3 for no gain and
           | much loss.
        
           | slaymaker1907 wrote:
           | Yeah, if you really need non-guessability, you should be
           | using the version that's completely random anyways.
        
             | ozim wrote:
             | If you rely on non-guessability you use it as a security
             | measure? So your sentence doesn't invalidate the previous
             | poster.
        
       | motohagiography wrote:
       | While I didn't know the details of ones other than 4, the one
       | really useful one missing would be using some SHA256 data with a
       | counter, not unlike PBKDF2. It could be a privacy preserving
       | derived identifier, where you could loosely prove a given
       | UUID had been derived from a given seed.
        
       | hamasho wrote:
       | I wish there were a standard for short UUIDs, like
       | `73WakrfVbNJBaAmhQtEeDv` or `bK7nP9xM`. I mean, it's not a UUID
       | because it can be duplicated somewhere; I just want a standard ID
       | that is random and short enough to remember.
        
         | jfdjkfdhjds wrote:
         | just use creation timedate plus auto increment int.
         | 
         | and then a small hash with base64 or 37 or whatever is in vogue
         | these days.
         | 
         | that's what old timers used before uuid 1.
         | 
         | guess we should guerilla standardize something like this as
         | uuid-0 or uuid-deprecated-2.0 for keeping up with the spirit.
        
         | gregmac wrote:
         | The closest that comes to mind is ULID[0]. It is short (26
         | character base32), 128 bit and lexicographically sortable.
         | 
         | I think the reason there's no other popular standard is you
         | give up something. 128 bit gives a pretty low risk of
         | collisions in almost all uses, but as you go smaller you start
         | having to consider the specific scenario and impact, etc, which
         | doesn't work well for a _standard_.
         | 
         | You could use another encoding (eg base64 or base85) to get it
         | shorter, but you start sacrificing other things (case
         | sensitivity, url-safeness) - again, not great for a standard.
         | 
         | [0] https://github.com/ulid/spec
        
         | tommy_axle wrote:
         | Not a standard per se but nanoid seems to fit the bill. Widely
         | implemented.
        
         | geitir wrote:
         | Git uses SHA and then dynamically sets the number of characters
         | to use based on repository size. You could do something like
         | this.
        
         | pants2 wrote:
         | Sqids[1] might fit the bill for you - the IDs it produces are
         | much shorter than UUIDs, however they're not universally unique
         | - they're generated from an integer sequence.
         | 
         | 1. https://sqids.org/
        
         | andrewstuart wrote:
         | I was just today wanting shorter UUIDs so if you like a more
         | compact/short UUID you can convert them like so to url safe
         | base64.
         | 
         | It's the same UUID just in 22 character form and can be
         | converted back. It's not really a conversion because a UUID is
         | just a 128 bit value so it's an alternative representation.
         | 
         |     483971cf-aad7-4c84-abf1-4a94c9d72f99 -> SDlxz6rXTISr8UqUydcvmQ (length: 22)
         |     fb67926f-3cfb-486c-a7da-30662147a20b -> A2eSbzz7SGyn2jBmIUeiCw (length: 22)
         |     799069a9-b32a-415f-b689-a8cc3f51bfa4 -> eZBpqbMqQVA2iajMP1GBpA (length: 22)
         |     8161ee0b-f7a5-4b32-95ea-9b9efe94e5f2 -> gWHuCBelSzKV6pueBpTl8g (length: 22)
         |     b1ea416c-f209-43cb-bfaf-d9cf6229459e -> sepBbPIJQ8uBr9nPYilFng (length: 22)
         |     ee70989a-b614-4665-9881-41054544c313 -> 7nCYmrYURmWYgUEFRUTDEw (length: 22)
         |     cce06fe2-b64f-47bc-a91a-d3dfd343e1e5 -> zOBv4rZPR7ypGtPf00Ph5Q (length: 22)
         |     aea3de6e-e769-4c8d-ba2d-77922d227176 -> rqPebudpTI26LXeSLSJxdg (length: 22)
         | 
         |     import uuid
         |     import base64
         | 
         |     def make_short_uuid(data):
         |         encoded = base64.urlsafe_b64encode(data).rstrip(b'=').decode('utf-8')
         |         return encoded.replace('-', 'A').replace('_', 'B')
         | 
         |     def generate_and_print_uuids():
         |         for _ in range(8):
         |             uuid_obj = uuid.uuid4()
         |             uuid_bytes = uuid_obj.bytes
         |             print(f'{uuid_obj} -> {make_short_uuid(uuid_bytes)} (length: {len(make_short_uuid(uuid_bytes))})')
         | 
         |     generate_and_print_uuids()
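One caveat with the snippet above: replacing `-` and `_` with `A` and `B` makes the short form ambiguous (a real `A` and a replaced `-` look the same), so it cannot reliably be converted back. A reversible variant, sketched with hypothetical helper names, keeps the URL-safe alphabet untouched and restores the padding when decoding:

```python
import base64
import uuid


def shorten(u: uuid.UUID) -> str:
    """Encode the 16 raw bytes as 22 URL-safe base64 characters."""
    return base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")


def expand(short: str) -> uuid.UUID:
    """Restore the stripped '=' padding and decode back to the same UUID."""
    padded = short + "=" * (-len(short) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(padded))


if __name__ == "__main__":
    u = uuid.uuid4()
    s = shorten(u)
    print(u, "->", s, "->", expand(s))
```

The output may contain `-` and `_`, both of which are safe in URLs; the cost is that the short form is case-sensitive.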
        
         | wereHamster wrote:
         | I usually generate N bits of randomness and base58 encode it.
         | Choose N to your liking. You lose the benefits of monotonic
         | sorting that is present in some UUID versions. Base58 is url
         | safe and does not contain any special characters. And you can
         | still store values as binary (eg. bytea in Postgres instead of
         | a text column).
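A minimal sketch of that approach, assuming the Bitcoin-style base58 alphabet (no `0`, `O`, `I`, `l`) and a hand-rolled encoder so it has no third-party dependency. The function names and the 96-bit default are illustrative choices, not anything from the comment.

```python
import secrets

BASE58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(data: bytes) -> str:
    """Encode bytes as base58; leading zero bytes become leading '1's."""
    n = int.from_bytes(data, "big")
    out = []
    while n > 0:
        n, rem = divmod(n, 58)
        out.append(BASE58[rem])
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + "".join(reversed(out))


def base58_decode(text: str) -> bytes:
    """Inverse of base58_encode, so the ID can be stored as raw bytes."""
    n = 0
    for ch in text:
        n = n * 58 + BASE58.index(ch)
    pad = len(text) - len(text.lstrip("1"))
    return b"\x00" * pad + n.to_bytes((n.bit_length() + 7) // 8, "big")


def random_id(bits: int = 96) -> str:
    """N bits of randomness (rounded down to whole bytes), base58 encoded."""
    return base58_encode(secrets.token_bytes(bits // 8))


if __name__ == "__main__":
    print(random_id())
```

Decoding before storage is what allows the binary column mentioned above: the text form is only a presentation of the underlying bytes.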
        
       | jagrsw wrote:
       | Imagine how many careers have been built on inventing and
       | promoting something that, in the end, turned out to be a cleverly
       | encoded output from /dev/urandom.
        
       | JSDevOps wrote:
       | Interesting read. You learn something every day.
        
       | Lammy wrote:
       | > UUID Version 2 (v2) is reserved for security IDs with no known
       | details.
       | 
       | Only no known details if the only document you're reading is the
       | notoriously poorly-specified RFC. Here you go:
       | https://pubs.opengroup.org/onlinepubs/9696989899/chap5.htm#t...
       | 
       | There are also "version 0" UUIDs that you are very unlikely to
       | ever come across but should be noted because they are the source
       | of the reserved bits (via wastefully setting aside an entire
       | octet for Address Family) that later allowed the other "versions"
       | to be specified in a compatible way. Read my research about them
       | here in my UUID library:
       | https://github.com/okeeblow/DistorteD/blob/NEW%E2%80%85SENSA...
       | 
       | I decided to support them Because It's Cool(tm) but still need to
       | figure out how to handle the date rollover of them and the
       | even-older Apollo UIDs:
       | 
       |     irb> ::GlobeGlitter::from_ncs_time
       |     => "#<GlobeGlitter 40639cd25341.02.00.00.e0.4c.18.00.69>"
       |     irb> ::GlobeGlitter::from_ncs_time.to_time
       |     => 1988-12-21 14:52:02 UTC
       |     irb> ::GlobeGlitter::from_aegis_time
       |     => "#<GlobeGlitter 00000000-0000-0000-4814-17c8b0080069>"
       | 
       | (Proper AEGIS `#to_str` not implemented yet lol)
        
       | dlgeek wrote:
       | > UUID Version 2 (v2) is reserved for security IDs with no known
       | details.
       | 
       | I found the details in about 2 minutes: Click the link in the
       | article to take me to the section of RFC 9562 that says it's
       | defined as part of DCE, click the first link in that paragraph to
       | go to the spec, ctrl-f "UUID", then jump to appendix A
       | (deceptively named "Universal Unique Identifier") which has all
       | the details.
       | 
       | Is it really too much to ask to CLICK YOUR OWN LINKS?
        
         | octernion wrote:
         | haha I had the precise same thought process and immediately
         | didn't finish the article since they didn't have much attention
         | to detail.
         | 
         | i enjoyed reading the appendix though as a snapshot of time.
        
         | THBC wrote:
         | Probably written by a language model
        
       | efilife wrote:
       | Uuid 4 is just a random bytes generator that inserts hyphens in
       | specified places. You don't need to use it, you can just generate
       | random bytes yourself and save on space (unnecessary hyphens,
       | version info and so on)
        
         | sweca wrote:
         | True but the appeal for most developers is it's simple to
         | implement. Virtually every language has a UUID library that
         | works in one line of code.
         | 
         | Like in Go, it's just uuid.New().String() vs using crypto/rand
         | to read random data, convert it into Base64 or hex... which
         | will take more lines and effort.
        
           | lopkeny12ko wrote:
           | This is an unfair argument. Which standard library in Go
           | gives you uuid.New().String()? Anyone can publish a third
           | party library that condenses reading random data, creating an
           | identifier from it, and rendering it as a string into a
           | single line of code API.
        
         | Lammy wrote:
         | UUIDs are 128-bit numbers and the hyphenated-string
         | representation is only one of many ways to represent that
         | number, sort of like how an IPv4 address is a 32-bit number of
         | which the "dotted-quad" is only one representation. If you are
         | thinking of UUID as a string format then your most fundamental
         | concept of UUID is flawed.
         | 
         | Even if you do just want a random identifier (not really the
         | original point of UUID but has become their most popular form)
         | I still think it's cool how random UUIDs have a little flag bit
         | to tell you that it's intended to be random. Useful when one
         | runs across a lone identifier with zero context.
        
         | wongarsu wrote:
         | UUID 4 also sets 4 bits to fixed values to indicate it's
         | version 4. You can argue whether creating different namespaces
         | between the different methods of creating UUIDs is useful. But
         | your plain random number generator has only a 1/16 chance of
         | generating a valid UUIDv4. (setting the bits correctly is
         | however trivial if you do want to roll your own uuid generator)
        
       | pajeets wrote:
       | is there something shorter than UUID
       | 
       | i hate how long it is
       | 
       | something like youtube URLs but guaranteed to be without
       | duplicates
        
         | asperous wrote:
         | One advantage of uuids is they can be generated on several
         | distributed systems without having to check with each other
         | that they are unique. Only long ids make this reliable. Youtube
         | ids are random and short, but youtube has to check they are
         | unique when generating them.
         | 
         | Maybe one way is to split up a random assignment space and
         | assign to each distributed node, but that would be more
         | complex.
        
         | wtetzner wrote:
         | Even UUIDs are not guaranteed to not have duplicates. It's just
         | extremely unlikely, largely due to their length.
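To put a number on "extremely unlikely", the usual birthday-bound approximation can be sketched as below. It assumes IDs are drawn uniformly at random, with 122 random bits as in a v4 UUID; the function name is made up for illustration.

```python
import math


def collision_probability(n: int, random_bits: int = 122) -> float:
    """Approximate chance of at least one duplicate among n random IDs.

    Birthday bound: p ~= 1 - exp(-n * (n - 1) / (2 * 2**random_bits)).
    """
    space = 2 ** random_bits
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-n * (n - 1) / (2 * space))


if __name__ == "__main__":
    for bits in (64, 122):
        p = collision_probability(10**9, bits)
        print(f"{bits} random bits, one billion IDs: p = {p:.3e}")
```

Running it with a smaller `random_bits` shows why shorter IDs need a uniqueness check: the probability grows quickly as the ID space shrinks.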
        
         | wongarsu wrote:
         | If you are fine with creating IDs in a centralized way (as you
         | would do in 99% of cases anyways) you can just use a normal
         | incrementing integer primary key. Then encrypt it with XTEA
         | (either at your API boundary or in the database) to get non-
         | sequential unguessable 64 bit keys. [1] has example code for
         | postgres. If the original keys don't have duplicates then the
         | XTEA encrypted keys don't have duplicates either.
         | 
         | Then just encode it in a format of your choosing. Youtube uses
         | a modified base64 encoding (no padding, and + and / are
         | replaced by - and _). And youtube video ids seem to also be 64
         | bits, just like xtea output.
         | 
         | 1: https://wiki.postgresql.org/wiki/XTEA_(crypt_64_bits)
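A rough application-side sketch of the same idea: XTEA over a 64-bit integer key, then unpadded URL-safe base64 for an 11-character public ID. The key below is a placeholder, the helper names are hypothetical, and the byte and word ordering here is an assumption, so the output is not claimed to match the Postgres function linked above.

```python
import base64
import struct

MASK32 = 0xFFFFFFFF
DELTA = 0x9E3779B9


def xtea_encrypt(value: int, key: tuple, rounds: int = 32) -> int:
    """Encrypt one 64-bit block; key is four 32-bit integers."""
    v0, v1 = value >> 32, value & MASK32
    total = 0
    for _ in range(rounds):
        v0 = (v0 + ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (total + key[total & 3]))) & MASK32
        total = (total + DELTA) & MASK32
        v1 = (v1 + ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (total + key[(total >> 11) & 3]))) & MASK32
    return (v0 << 32) | v1


def xtea_decrypt(value: int, key: tuple, rounds: int = 32) -> int:
    """Inverse of xtea_encrypt."""
    v0, v1 = value >> 32, value & MASK32
    total = (DELTA * rounds) & MASK32
    for _ in range(rounds):
        v1 = (v1 - ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (total + key[(total >> 11) & 3]))) & MASK32
        total = (total - DELTA) & MASK32
        v0 = (v0 - ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (total + key[total & 3]))) & MASK32
    return (v0 << 32) | v1


def public_id(row_id: int, key: tuple) -> str:
    """Sequential primary key -> 11-character non-sequential ID."""
    raw = struct.pack(">Q", xtea_encrypt(row_id, key))
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def row_id_from_public(pid: str, key: tuple) -> int:
    raw = base64.urlsafe_b64decode(pid + "=" * (-len(pid) % 4))
    return xtea_decrypt(struct.unpack(">Q", raw)[0], key)


if __name__ == "__main__":
    KEY = (0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210)  # placeholder key
    for i in (1, 2, 3):
        print(i, "->", public_id(i, KEY))
```

Because a block cipher is a permutation of the 64-bit space, distinct row IDs always map to distinct public IDs, which is the no-duplicates property described above.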
        
       | amarcheschi wrote:
       | I'm failing at understanding what is the purpose of having uuid2.
       | I didn't even know that more types existed till now. I had only
       | encountered uuid2 when asking xandr to remove my personal data
       | from its database. (discussion about xandr being asked to be
       | investigated in Europe by noyb here
       | https://news.ycombinator.com/item?id=40913915)
       | 
       | By reading the Wikipedia page I'm failing at understanding why we
       | invented something called universally unique identifier and have
       | different types of it, some of which can be traced back to the
       | original pc. Is it because mixing in some MAC addresses increases
       | the chance of the uuid2 being random or does it have a different
       | reason? For privacy reason, could we just not have a very long
       | identifier with many different chars to choose from so that we
       | have so many combinations that we're almost guaranteed we're
       | using non duplicated uuids?
        
       ___________________________________________________________________
       (page generated 2024-08-25 23:00 UTC)