[HN Gopher] TIL: Versions of UUID and when to use them
___________________________________________________________________
TIL: Versions of UUID and when to use them
Author : fagnerbrack
Score : 128 points
Date : 2024-08-25 19:05 UTC (3 hours ago)
(HTM) web link (ntietz.com)
(TXT) w3m dump (ntietz.com)
| MrDarcy wrote:
| Just use v7.
|
| Cue the security experts who say otherwise...
| wongarsu wrote:
| Use v4 if creation date could conceivably be sensitive
| information or if you depend on your uuids being completely
| unguessable. Otherwise use v7
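A minimal sketch of why creation time can leak from a v7 ID, assuming the RFC 9562 layout (48-bit Unix millisecond timestamp, then version bits, 12 random bits, variant bits, 62 random bits). `make_uuid7` and `uuid7_timestamp_ms` are hypothetical helpers written by hand here, since older Python releases do not ship a v7 generator.

```python
import secrets
import time
import uuid


def make_uuid7(unix_ms=None):
    """Build a UUIDv7-shaped value by hand (illustrative helper, not a library API)."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    value = (unix_ms & ((1 << 48) - 1)) << 80  # bits 127..80: timestamp in ms
    value |= 0x7 << 76                         # bits 79..76: version 7
    value |= secrets.randbits(12) << 64        # bits 75..64: random
    value |= 0b10 << 62                        # bits 63..62: RFC variant
    value |= secrets.randbits(62)              # bits 61..0: random
    return uuid.UUID(int=value)


def uuid7_timestamp_ms(u):
    """Anyone holding the ID can read the creation time back out."""
    return u.int >> 80
```

Because the timestamp sits in the most significant bits, IDs created later sort after earlier ones, which is the property v4 deliberately lacks.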
| stavros wrote:
| If we want a v7, shouldn't we use a ULID instead?
| wongarsu wrote:
| When we didn't have UUIDv7, ULID was great. But now that we
| have v7 it's the more widely supported alternative. And
| apart from v7 setting the UUID version bits and having a
| different default representation they are not that
| different.
| stavros wrote:
| Oh, I didn't realize v7 was newer than ULID, thanks.
| eropple wrote:
| ULID's presentation format is probably better for humans,
| though. You can double-click-to-highlight a ULID; the
| standard UUID representation doesn't like this.
|
| (You can use ULID's presentational tools with UUIDv7,
| though.)
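As a rough illustration of using ULID's text form with any UUID: the sketch below encodes the 128-bit value as 26 Crockford base32 characters, which is the alphabet the ULID spec uses. The function names are made up for this example.

```python
import uuid

# Crockford base32 alphabet (no I, L, O, U), as used by the ULID spec.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def uuid_to_ulid_text(u):
    """Render a UUID's 128 bits as 26 base32 characters (5 bits each)."""
    value = u.int
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


def ulid_text_to_uuid(text):
    """Inverse of uuid_to_ulid_text; raises ValueError on bad characters."""
    value = 0
    for ch in text.upper():
        value = (value << 5) | CROCKFORD.index(ch)
    return uuid.UUID(int=value)
```

The result has no hyphens, so a double-click selects the whole ID in most terminals and editors.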
| wtetzner wrote:
| > the standard UUID representation doesn't like this.
|
| Yeah, I've gotten in the habit of stripping hyphens from
| the string representation of UUIDs in a lot of the code I
| write for that reason.
| beart wrote:
| It doesn't help in other tools, but there is a css rule
| to help with this.
|
| https://developer.mozilla.org/en-US/docs/Web/CSS/user-select
| Lammy wrote:
| > You can double-click-to-highlight a ULID; the standard
| UUID representation doesn't like this.
|
| You can control this behavior in CSS with `user-select`.
| Peep my fiddle: https://jsfiddle.net/gLyph5km/
| eropple wrote:
| Yup, in a browser you can. In my terminal or my text
| editor or Slack, I can't.
| fiddlerwoaroof wrote:
| An issue I've always had with UUIDs and ULIDs is there isn't a
| great way to generate one deterministically, as far as I can
| tell: for a lot of use-cases, being able to reprocess data and
| generate identical IDs is really useful and there isn't a
| standard way that I know of to achieve this.
| exe34 wrote:
| https://stackoverflow.com/a/64229385
| fiddlerwoaroof wrote:
| Sure, there are workarounds in various languages, but it
| would be nice to have a standardized hash-based UUID or ULID
| 1986 wrote:
| from the article, it sounds like this is V5?
| fiddlerwoaroof wrote:
| I missed that because I typically am using ULIDs these
| days. But, yeah, some standardized format for a hash of
| message data is what I want.
| IggleSniggle wrote:
| If it's a standardized sequence, then that's no different
| than just 0, 1, 2, 3 but with different names. If you
| just want a non-sequential but deterministic sequence,
| then that's every random number generator that accepts a
| seed value, and being any more standardized than that
| makes zero sense.
| VWWHFSfQ wrote:
| Are you looking for something other than just a custom seed
| in the RNG?
| 1986 wrote:
| why wouldn't you use some sort of collision resistant hashing
| function on the data to achieve this instead?
| mmiyer wrote:
| That's UUID v5 (uses a sha1 hash of input data).
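A small sketch of the deterministic IDs discussed above, using Python's standard `uuid.uuid5`. The namespace and the `record_id` helper are assumptions for illustration; any fixed namespace UUID works as long as it never changes between runs.

```python
import uuid

# Hypothetical application namespace, itself derived deterministically.
APP_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")


def record_id(payload: str) -> uuid.UUID:
    """Same namespace + same payload always yields the same UUIDv5."""
    return uuid.uuid5(APP_NAMESPACE, payload)
```

Reprocessing the same message data therefore regenerates identical IDs without any lookup table.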
| treve wrote:
| Why are you dismissive of security-related issues?
| tonetegeatinst wrote:
| Because developers don't always consider the security aspect.
| Not saying this is what he's doing but could also just be due
| to how complex good software can be to write.
|
| There is a reason cybersecurity or UI/UX or product design
| isn't always left to the developer. The coder writes code that
| fits certain criteria they are given, then someone down the
| line might QA check it, fuzz inputs or security review the
| code. How well this is done depends on the product, market,
| and environment.
| Vecr wrote:
| I suggest not using any of the MAC based versions. In theory that
| could be anything other than v4 and v7, but v1 is the worst. I'd
| also avoid v3, since MD5 is horribly broken.
| tashbarg wrote:
| MD5 is "broken" as a cryptographic hash function. It still is
| perfectly fine as a non-cryptographic hash function.
| Vecr wrote:
| Not really, it's slower than truncated blake3 for no gain and
| much loss.
| slaymaker1907 wrote:
| Yeah, if you really need non-guessability, you should be
| using the version that's completely random anyways.
| ozim wrote:
| If you rely on non-guessability you use it as a security
| measure? So your sentence doesn't invalidate the previous
| poster.
| motohagiography wrote:
| While I didn't know the details of ones other than 4, the one
| really useful one missing would be using some SHA256 data with a
| counter, not unlike PBKDF2. It could be a privacy preserving
| derived identifier, where you could loosely prove a given
| UUID had been derived from a given seed.
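One possible reading of this idea, sketched under stated assumptions: hash the seed together with a counter using SHA-256, keep the first 128 bits, and mark the result as version 8 (the RFC 9562 slot for custom layouts). This is not PBKDF2 and not a standardized scheme; `derived_id` and `was_derived_from` are hypothetical names.

```python
import hashlib
import uuid


def derived_id(seed: bytes, counter: int) -> uuid.UUID:
    """Derive a UUID-shaped ID from SHA-256(seed || counter)."""
    digest = hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
    value = int.from_bytes(digest[:16], "big")
    value = (value & ~(0xF << 76)) | (0x8 << 76)  # version nibble = 8 (custom)
    value = (value & ~(0x3 << 62)) | (0x2 << 62)  # RFC variant bits = 10
    return uuid.UUID(int=value)


def was_derived_from(candidate: uuid.UUID, seed: bytes, max_counter: int = 1000) -> bool:
    """Whoever knows the seed can check a candidate by re-deriving."""
    return any(derived_id(seed, i) == candidate for i in range(max_counter))
```

Without the seed, the IDs look unrelated to each other; with it, membership can be checked by recomputation.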
| hamasho wrote:
| I wish there were a standard for short UUID, like
| `73WakrfVbNJBaAmhQtEeDv` or `bK7nP9xM`. I mean, it's not UUID
| cause it can be duplicated somewhere, I just want a standard ID
| that is random and short enough to remember.
| jfdjkfdhjds wrote:
| just use creation timedate plus auto increment int.
|
| and then a small hash with base64 or 37 or whatever is in vogue
| these days.
|
| that's what old timers used before uuid 1.
|
| guess we should guerilla standardize something like this as
| uuid-0 or uuid-deprecated-2.0 for keeping up with the spirit.
| gregmac wrote:
| The closest that comes to mind is ULID[0]. It is short (26
| character base32), 128 bit and lexicographically sortable.
|
| I think the reason there's no other popular standard is you
| give up something. 128 bit gives a pretty low risk of
| collisions in almost all uses, but as you go smaller you start
| having to consider the specific scenario and impact, etc, which
| doesn't work well for a _standard_.
|
| You could use another encoding (eg base64 or base85) to get it
| shorter, but you start sacrificing other things (case
| sensitivity, url-safeness) - again, not great for a standard.
|
| [0] https://github.com/ulid/spec
| tommy_axle wrote:
| Not a standard per se but nanoid seems to fit the bill. Widely
| implemented.
| geitir wrote:
| Git uses SHA and then dynamically sets the number of characters
| to use based on repository size. You could do something like
| this.
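A loose sketch of the "grow the prefix as the set grows" idea, assuming you hold the full set of IDs and want the shortest prefix length that still tells them apart. It is not Git's actual algorithm; `shortest_unique_length` is a hypothetical helper and expects a non-empty collection.

```python
import hashlib


def shortest_unique_length(ids, minimum=4):
    """Smallest prefix length (>= minimum) at which all ids stay distinct."""
    ids = set(ids)
    longest = max(len(i) for i in ids)
    for length in range(minimum, longest + 1):
        if len({i[:length] for i in ids}) == len(ids):
            return length
    return longest


# Example input: SHA-1 hex digests of some made-up object contents.
digests = [hashlib.sha1(f"object {n}".encode()).hexdigest() for n in range(1000)]
short_len = shortest_unique_length(digests)
short_ids = [d[:short_len] for d in digests]
```

The trade-off is that a short ID is only unique relative to the set it was computed against, so it can become ambiguous as more IDs are added.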
| pants2 wrote:
| Sqids[1] might fit the bill for you - the IDs it produces are
| much shorter than UUIDs, however they're not universally unique
| - they're generated from an integer sequence.
|
| 1. https://sqids.org/
| andrewstuart wrote:
| I was just today wanting shorter UUIDs so if you like a more
| compact/short UUID you can convert them like so to url safe
| base64.
|
| It's the same UUID just in 22 character form and can be
| converted back. It's not really a conversion because a UUID is
| just a 128 bit value so it's an alternative representation.
|     483971cf-aad7-4c84-abf1-4a94c9d72f99 -> SDlxz6rXTISr8UqUydcvmQ (length: 22)
|     fb67926f-3cfb-486c-a7da-30662147a20b -> A2eSbzz7SGyn2jBmIUeiCw (length: 22)
|     799069a9-b32a-415f-b689-a8cc3f51bfa4 -> eZBpqbMqQVA2iajMP1GBpA (length: 22)
|     8161ee0b-f7a5-4b32-95ea-9b9efe94e5f2 -> gWHuCBelSzKV6pueBpTl8g (length: 22)
|     b1ea416c-f209-43cb-bfaf-d9cf6229459e -> sepBbPIJQ8uBr9nPYilFng (length: 22)
|     ee70989a-b614-4665-9881-41054544c313 -> 7nCYmrYURmWYgUEFRUTDEw (length: 22)
|     cce06fe2-b64f-47bc-a91a-d3dfd343e1e5 -> zOBv4rZPR7ypGtPf00Ph5Q (length: 22)
|     aea3de6e-e769-4c8d-ba2d-77922d227176 -> rqPebudpTI26LXeSLSJxdg (length: 22)
|
|     import uuid
|     import base64
|
|     def make_short_uuid(data):
|         encoded = base64.urlsafe_b64encode(data).rstrip(b'=').decode('utf-8')
|         return encoded.replace('-', 'A').replace('_', 'B')
|
|     def generate_and_print_uuids():
|         for _ in range(8):
|             uuid_obj = uuid.uuid4()
|             uuid_bytes = uuid_obj.bytes
|             print(f'{uuid_obj} -> {make_short_uuid(uuid_bytes)} (length: {len(make_short_uuid(uuid_bytes))})')
|
|     generate_and_print_uuids()
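One caveat on "can be converted back": swapping `-` and `_` for `A` and `B`, as the snippet above does, makes the short form ambiguous, because genuine `A` and `B` characters can no longer be told apart from substituted ones. A minimal reversible variant, assuming the URL-safe characters are simply kept (`shorten` and `expand` are illustrative names):

```python
import base64
import uuid


def shorten(u: uuid.UUID) -> str:
    """22-character URL-safe base64 form of the UUID's 16 bytes."""
    return base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")


def expand(short: str) -> uuid.UUID:
    """Restore the stripped padding and decode back to the same UUID."""
    padded = short + "=" * (-len(short) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(padded))
```

With this version the short string may contain `-` or `_`, which are still safe in URLs.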
| wereHamster wrote:
| I usually generate N bits of randomness and base58 encode it.
| Choose N to your liking. You lose the benefits of monotonic
| sorting that is present in some UUID versions. Base58 is url
| safe and does not contain any special characters. And you can
| still store values as binary (eg. bytea in Postgres instead of
| a text column).
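A rough sketch of that approach, assuming the Bitcoin-style base58 alphabet (no `0`, `O`, `I`, or `l`) and Python's `secrets` module for the randomness. The encoder is hand-written for illustration, and `random_id` is a hypothetical helper.

```python
import secrets

# Bitcoin-style base58 alphabet: digits and letters minus 0, O, I, l.
BASE58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(data: bytes) -> str:
    """Encode bytes as base58; each leading zero byte becomes a '1'."""
    value = int.from_bytes(data, "big")
    out = []
    while value:
        value, rem = divmod(value, 58)
        out.append(BASE58[rem])
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(out))


def random_id(bits: int = 96) -> str:
    """N random bits (rounded up to whole bytes), rendered as base58."""
    return base58_encode(secrets.token_bytes((bits + 7) // 8))
```

The raw bytes are what you would store in a binary column; the base58 string is only the presentation form, and its length can vary by a character or so between IDs.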
| jagrsw wrote:
| Imagine how many careers have been built on inventing and
| promoting something that, in the end, turned out to be a cleverly
| encoded output from /dev/urandom.
| JSDevOps wrote:
| Interesting read. You learn something every day.
| Lammy wrote:
| > UUID Version 2 (v2) is reserved for security IDs with no known
| details.
|
| Only no known details if the only document you're reading is the
| notoriously poorly-specified RFC. Here you go:
| https://pubs.opengroup.org/onlinepubs/9696989899/chap5.htm#t...
|
| There are also "version 0" UUIDs that you are very unlikely to
| ever come across but should be noted because they are the source
| of the reserved bits (via wastefully setting aside an entire
| octet for Address Family) that later allowed the other "versions"
| to be specified in a compatible way. Read my research about them
| here in my UUID library:
| https://github.com/okeeblow/DistorteD/blob/NEW%E2%80%85SENSA...
|
| I decided to support them Because It's Cool(tm) but still need to
| figure out how to handle the date rollover of them and the even-
| older Apollo UIDs:
|
|     irb> ::GlobeGlitter::from_ncs_time
|     => "#<GlobeGlitter 40639cd25341.02.00.00.e0.4c.18.00.69>"
|     irb> ::GlobeGlitter::from_ncs_time.to_time
|     => 1988-12-21 14:52:02 UTC
|     irb> ::GlobeGlitter::from_aegis_time
|     => "#<GlobeGlitter 00000000-0000-0000-4814-17c8b0080069>"
|
| (Proper AEGIS `#to_str` not implemented yet lol)
| dlgeek wrote:
| > UUID Version 2 (v2) is reserved for security IDs with no known
| details.
|
| I found the details in about 2 minutes: Click the link in the
| article to take me to the section of RFC 9562 that says it's
| defined as part of DCE, click the first link in that paragraph to
| go to the spec, ctrl-f "UUID", then jump to appendix A
| (deceptively named "Universal Unique Identifier") which has all
| the details.
|
| Is it really too much to ask to CLICK YOUR OWN LINKS?
| octernion wrote:
| haha I had the precise same thought process and immediately
| didn't finish the article since they didn't have much attention
| to detail.
|
| i enjoyed reading the appendix though as a snapshot of time.
| THBC wrote:
| Probably written by a language model
| efilife wrote:
| Uuid 4 is just a random bytes generator that inserts hyphens in
| specified places. You don't need to use it, you can just generate
| random bytes yourself and save on space (unnecessary hyphens,
| version info and so on)
| sweca wrote:
| True but the appeal for most developers is it's simple to
| implement. Virtually every language has a UUID library that
| works in one line of code.
|
| Like in Go, it's just uuid.New().String() vs using crypto/rand
| to read random data, convert it into Base64 or hex... which
| will take more lines and effort.
| lopkeny12ko wrote:
| This is an unfair argument. Which standard library in Go
| gives you uuid.New().String()? Anyone can publish a third
| party library that condenses reading random data, creating an
| identifier from it, and rendering it as a string into a
| single line of code API.
| Lammy wrote:
| UUIDs are 128-bit numbers and the hyphenated-string
| representation is only one of many ways to represent that
| number, sort of like how an IPv4 address is a 32-bit number of
| which the "dotted-quad" is only one representation. If you are
| thinking of UUID as a string format then your most fundamental
| concept of UUID is flawed.
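To make the "one number, many spellings" point concrete, here is a short sketch using Python's standard `uuid.UUID`, which exposes several representations of the same 128-bit value. The sample UUID is arbitrary.

```python
import uuid

u = uuid.UUID("483971cf-aad7-4c84-abf1-4a94c9d72f99")

as_text = str(u)    # hyphenated string, 36 characters
as_hex = u.hex      # 32 hex digits, no hyphens
as_int = u.int      # plain 128-bit integer
as_bytes = u.bytes  # 16 raw bytes, e.g. for a binary database column
as_urn = u.urn      # "urn:uuid:..." form

# Every representation parses back to the same UUID.
same = {
    uuid.UUID(as_text),
    uuid.UUID(hex=as_hex),
    uuid.UUID(int=as_int),
    uuid.UUID(bytes=as_bytes),
    uuid.UUID(as_urn),
}
```

Storing `as_bytes` rather than `as_text` takes 16 bytes instead of 36 characters.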
|
| Even if you do just want a random identifier (not really the
| original point of UUID but has become their most popular form)
| I still think it's cool how random UUIDs have a little flag bit
| to tell you that it's intended to be random. Useful when one
| runs across a lone identifier with zero context.
| wongarsu wrote:
| UUID 4 also sets 4 bits to fixed values to indicate it's
| version 4. You can argue whether creating different namespaces
| between the different methods of creating UUIDs is useful. But
| your plain random number generator has only a 1/16 chance of
| generating a valid UUIDv4. (setting the bits correctly is
| however trivial if you do want to roll your own uuid generator)
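A minimal sketch of rolling your own v4 from random bytes, assuming the usual layout: the high nibble of byte 6 holds the version and the top two bits of byte 8 hold the variant. The version nibble alone gives the 1/16 figure; a check that also requires the variant bits is stricter. Both function names are illustrative.

```python
import secrets
import uuid


def random_uuid4() -> uuid.UUID:
    """16 random bytes with the version and variant bits forced."""
    raw = bytearray(secrets.token_bytes(16))
    raw[6] = (raw[6] & 0x0F) | 0x40  # version nibble = 4
    raw[8] = (raw[8] & 0x3F) | 0x80  # variant bits = 10
    return uuid.UUID(bytes=bytes(raw))


def looks_like_v4(raw: bytes) -> bool:
    """True if 16 arbitrary bytes happen to carry the v4 marker bits."""
    return (raw[6] >> 4) == 4 and (raw[8] >> 6) == 0b10
```

That leaves 122 of the 128 bits random.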
| pajeets wrote:
| is there something shorter than UUID
|
| i hate how long it is
|
| something like youtube URLs but guaranteed to be without
| duplicates
| asperous wrote:
| One advantage of uuids is they can be generated on several
| distributed systems without having to check with each other
| that they are unique. Only long ids make this reliable. Youtube
| ids are random and short, but youtube has to check they are
| unique when generating them.
|
| Maybe one way is to split up a random assignment space and
| assign to each distributed node, but that would be more
| complex.
| wtetzner wrote:
| Even UUIDs are not guaranteed to not have duplicates. It's just
| extremely unlikely, largely due to their length.
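To put a number on "extremely unlikely", here is a sketch of the standard birthday approximation, p ≈ 1 - exp(-n(n-1) / 2N), assuming IDs are drawn uniformly at random. The default of 122 bits is the random portion of a v4 UUID; `collision_probability` is a name chosen for this example.

```python
import math


def collision_probability(n_ids: int, random_bits: int = 122) -> float:
    """Approximate chance that at least two of n_ids random IDs collide."""
    space = 2 ** random_bits
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-(n_ids * (n_ids - 1)) / (2 * space))


one_billion_v4 = collision_probability(10 ** 9)
four_billion_64bit = collision_probability(2 ** 32, random_bits=64)
```

Shrinking the ID changes the picture quickly: at 64 random bits, a few billion IDs already carry a substantial chance of at least one duplicate.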
| wongarsu wrote:
| If you are fine with creating IDs in a centralized way (as you
| would do in 99% of cases anyways) you can just use a normal
| incrementing integer primary key. Then encrypt it with XTEA
| (either at your API boundary or in the database) to get non-
| sequential unguessable 64 bit keys. [1] has example code for
| postgres. If the original keys don't have duplicates then the
| XTEA encrypted keys don't have duplicates either.
|
| Then just encode it in a format of your choosing. Youtube uses
| a modified base64 encoding (no padding, and + and / are
| replaced by - and _). And youtube video ids seem to also be 64
| bits, just like xtea output.
|
| 1: https://wiki.postgresql.org/wiki/XTEA_(crypt_64_bits)
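A sketch of the scheme described above, written from the published XTEA algorithm (64-bit block, 128-bit key, 32 rounds) rather than from the linked Postgres function, so outputs are not claimed to match it. The key in the usage note and the `public_id` helper are assumptions for illustration.

```python
import base64
import struct

MASK = 0xFFFFFFFF
DELTA = 0x9E3779B9
ROUNDS = 32


def xtea_encrypt(block: int, key) -> int:
    """Encrypt one 64-bit integer with a key of four 32-bit words."""
    v0, v1 = (block >> 32) & MASK, block & MASK
    total = 0
    for _ in range(ROUNDS):
        v0 = (v0 + ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (total + key[total & 3]))) & MASK
        total = (total + DELTA) & MASK
        v1 = (v1 + ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (total + key[(total >> 11) & 3]))) & MASK
    return (v0 << 32) | v1


def xtea_decrypt(block: int, key) -> int:
    """Inverse of xtea_encrypt: recovers the original sequential key."""
    v0, v1 = (block >> 32) & MASK, block & MASK
    total = (DELTA * ROUNDS) & MASK
    for _ in range(ROUNDS):
        v1 = (v1 - ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (total + key[(total >> 11) & 3]))) & MASK
        total = (total - DELTA) & MASK
        v0 = (v0 - ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (total + key[total & 3]))) & MASK
    return (v0 << 32) | v1


def public_id(row_id: int, key) -> str:
    """Encrypted row id as 11 characters of unpadded URL-safe base64."""
    cipher = xtea_encrypt(row_id, key)
    return base64.urlsafe_b64encode(struct.pack(">Q", cipher)).rstrip(b"=").decode("ascii")
```

With a secret key such as `(0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210)` (a placeholder, not a recommendation), `public_id(1, key)` and `public_id(2, key)` look unrelated, and because encryption is a bijection on 64-bit blocks, distinct row ids can never map to the same public id.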
| amarcheschi wrote:
| I'm failing at understanding what is the purpose of having uuid2.
| I didn't even know that more types existed till now. I had only
| encountered uuid2 when asking xandr to remove my personal data
| from its database. (discussion about xandr being asked to be
| investigated in Europe by noyb here
| https://news.ycombinator.com/item?id=40913915)
|
| By reading the Wikipedia page I'm failing at understanding why we
| invented something called universally unique identifier and have
| different types of it, some of which can be traced back to the
| original pc. Is it because mixing in some MAC addresses increases the
| chance of the uuid2 being random or does it have a different
| reason? For privacy reasons, could we just not have a very long
| identifier with many different chars to choose from so that we
| have so many combinations that we're almost guaranteed we're
| using non duplicated uuids?
___________________________________________________________________
(page generated 2024-08-25 23:00 UTC)