hngopher.com

       [HN Gopher] New UUID Formats
       ___________________________________________________________________
        
       New UUID Formats
        
       Author : swyx
       Score  : 265 points
       Date   : 2022-06-12 15:17 UTC (7 hours ago)
        
 (HTM) web link (www.ietf.org)
 (TXT) w3m dump (www.ietf.org)
        
       | nixpulvis wrote:
       | Maybe I'm being slow right now, but can somebody help me
       | understand why Max UUID is ever specifically useful?
        
         | okr wrote:
         | My guess is, that it is just a constant, ready to be used for
         | bitwise operations.
        
           | nixpulvis wrote:
           | Yep... I was just being slow. I thought the specialized
           | variant was a new kind of UUID (being inverse of RFC4122),
           | but it's just the inverse of RFC4122's Nil UUID; a single
           | value.
           | 
           | Sorry for the silly comment. This value is just a bunch of
           | binary 1s.
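
A small Python sketch of the point above: the Nil and Max UUIDs are single constant values, all zero bits and all one bits respectively. The constant names are illustrative only, and the sentinel use at the end is one plausible application rather than anything the draft mandates.

```python
import uuid

# Nil UUID: all 128 bits are 0. Max UUID: all 128 bits are 1.
NIL_UUID = uuid.UUID(int=0)
MAX_UUID = uuid.UUID(int=(1 << 128) - 1)

print(NIL_UUID)  # 00000000-0000-0000-0000-000000000000
print(MAX_UUID)  # ffffffff-ffff-ffff-ffff-ffffffffffff

# Max is the bitwise inverse of Nil within 128 bits.
assert MAX_UUID.int == NIL_UUID.int ^ ((1 << 128) - 1)

# Any version 4 UUID sorts strictly between the two, so they can act as
# sentinels or as lower/upper bounds in range comparisons.
some_id = uuid.uuid4()
assert NIL_UUID < some_id < MAX_UUID
```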
        
       | lewisl9029 wrote:
       | I've been using ULID for a while now, which is analogous to UUID v7
       | but with a different (better IMHO) string representation. They've
       | been awesome for using as sort keys in dynamo for instance, since
       | they're lexicographically sortable as strings.
       | 
       | But one thing I'm still wary about is exposing these IDs with
       | millisecond-precision time components to end users, since I've
       | seen multiple discussions here on HN about the potential for
       | timing attacks.
       | 
       | How worried should I really be? Do people have useful heuristics
       | on the kinds of data where it's safe/unsafe to expose timing
       | information, or should I just only expose a separate UUID v4
       | externally across the board just to be safe?
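
To make the exposure concrete, here is a minimal Python sketch. It assumes the draft's UUIDv7 layout, where the first 48 bits are a Unix timestamp in milliseconds (ULID puts its timestamp in the same position). The helper name is hypothetical, and the sample ID is the one quoted elsewhere in this thread.

```python
import uuid
from datetime import datetime, timezone

def uuid7_timestamp_ms(u: uuid.UUID) -> int:
    """Return the top 48 bits of the UUID, read as Unix milliseconds."""
    return u.int >> 80

u = uuid.UUID("017f21cf-d130-7cc3-98c4-dc0c0c07398f")
ms = uuid7_timestamp_ms(u)
created = datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
print(ms, created.isoformat())
```

Anyone who sees the ID can run the same few lines, so the question is only whether the creation time of that record is sensitive.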
        
         | doliveira wrote:
         | I came up with a scheme in which the random part of the ULID is
         | a slice of the hash of a UUID, which seems to work fine because
         | I'm generating it all server side, I guess? It works very well
         | for insertion into NoSQL databases, for instance.
         | 
         | So I'd also like to know the threat model for these timing
         | attacks.
        
       | kbumsik wrote:
       | UUIDv7 looks interesting, but how is it different from ULID [1]
       | in practice? I was considering using ULID for an upcoming new
       | project because it is lexicographically sortable but it looks
       | like UUIDv7 just can replace that.
       | 
       | [1]:
       | https://cran.r-project.org/web/packages/ulid/vignettes/intro...
        
         | rekwah wrote:
         | As the author of a popular ULID implementation in python[1],
         | the spec has no stewardship anymore. The specification repo[2]
         | has plenty of open issues and no real guidance or communication
         | beyond language implementation authors discussing corner cases
         | and the gaps in the spec. The monotonic functionality is
         | ambiguous (at best), doesn't consider distributed id
         | generation, and is implemented differently per-language [3].
         | 
         | Functionally, UUIDv7 might be the _same_ but the hope would be
         | for a more rigid specification for interoperability.
         | 
         | [1]: https://github.com/ahawker/ulid
         | 
         | [2]: https://github.com/ulid/spec
         | 
         | [3]: https://github.com/ulid/spec/issues/11
        
           | RobertRoberts wrote:
           | Thank you, I've been using ULID for a while now, and it
           | serves my purposes. But I have long term support concerns.
           | 
           | UUIDv7 really seems like the sweet spot between pure
           | INT/BIGINT auto incrementing PKs and universally sortable
           | universal ids.
        
           | kortex wrote:
           | I've been using ULIDs in python for about a year now and so
           | far have been super happy with them, so a) thank you for
           | maintaining this! b) I always felt a bit uneasy about the way
           | the spec describes the monotonicity component. Personally I
           | just rely on the random aspect as I am fortunate enough to
           | say that two events in the same millisecond are effectively
           | simultaneous.
           | 
           | At that point, it's basically just UUID7 with Crockford
           | base32 encoding, more or less.
           | 
           | IMHO the in-process monotonically increasing feature of ULID
           | is misguided. As you mention, distributed ids are a pain. The
           | instant you start talking distributed, monotonic counters or
           | orderable events (two threads count as distributed in this
           | case), you need to talk things like Lamport clocks or other
           | hybrid clock strategies. It's better to reach for the right
           | tools in this case, vs half-baked monotonic-only-in-this-
           | process vague guarantee.
        
       | canadiantim wrote:
       | How would one go about trying to use UUID 7 in a Postgres
       | database / python codebase now?
        
         | Starlevel001 wrote:
         | Postgres UUIDs are treated as an opaque 16-byte array, you can
         | use anything as long as it's in the textual format.
        
         | oittaa wrote:
         | I made a simple Python test library that extends the standard
         | UUID class with UUIDv6 and UUIDv7. You might want to check it
         | out. https://github.com/oittaa/uuid6-python
         | 
         | The official UUID Draft repository has also some alternatives
         | if you'd like to check those out.
         | https://github.com/uuid6/prototypes
         | 
         | Postgres supports UUIDs with any version number natively so you
         | can then do something like this with it:
         | 
         |     create table data (id uuid, firstname varchar(100));
         |     insert into data (id, firstname) values
         |       ('017f21cf-d130-7cc3-98c4-dc0c0c07398f', 'John');
         |     select * from data;
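
For the Python side of the question, a minimal sketch of generating such a value with only the standard library. It assumes the draft's UUIDv7 layout (48-bit Unix millisecond timestamp, 4 version bits, 2 variant bits, the rest random); `uuid7_sketch` is a hypothetical name, and this is not the API of the libraries linked above.

```python
import os
import time
import uuid

def uuid7_sketch() -> uuid.UUID:
    """Build a UUIDv7-style value: 48-bit ms timestamp followed by random bits."""
    ts_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = ((ts_ms & ((1 << 48) - 1)) << 80) | rand
    # Overwrite the 4 version bits (bits 76-79) with 0b0111.
    value = (value & ~(0xF << 76)) | (0x7 << 76)
    # Overwrite the 2 variant bits (bits 62-63) with 0b10 (RFC 4122 variant).
    value = (value & ~(0x3 << 62)) | (0x2 << 62)
    return uuid.UUID(int=value)

new_id = uuid7_sketch()
print(new_id, new_id.version)
```

The string form `str(new_id)` can be passed as the `id` value in an insert like the one above. IDs created in different milliseconds sort by creation time; within the same millisecond the order is random in this sketch.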
        
       | amluto wrote:
       | Fortunately this is a bit less relevant today as Windows loses
       | market share in database and server applications, but:
       | 
       | UUIDs have historically massively screwed up endian handling.
       | While this new draft discusses sorting UUIDs as strings of octets
       | (bytes) and the _text_ of RFC4122 is fairly explicit about most
       | significant bytes coming first, the C UUID structure in RFC 4122
       | appendix A is entirely misguided:
       | 
       |     typedef struct {
       |         unsigned32  time_low;
       |         unsigned16  time_mid;
       |         unsigned16  time_hi_and_version;
       |         unsigned8   clock_seq_hi_and_reserved;
       |         unsigned8   clock_seq_low;
       |         byte        node[6];
       |     } uuid_t;
       | 
       | Those aren't bytes -- they're integers of various sizes. (Hint:
       | do _not_ use integer types in C code for portable data
       | structures. ntohl, etc are a mess. Just use arrays of bytes.)
       | 
       | I don't know the whole history, but MS somehow took this
       | structure at face value and caused problems like this:
       | 
       | https://github.com/uuid-rs/uuid/issues/277
       | 
       | So, if you want to do anything (e.g. sorting) that depends on the
       | representation of a UUID (or even depends on converting between
       | string and binary representations), be aware that UUIDs coming
       | from Windows may be little-endian. In my book, this is a Windows
       | bug, but opinions may differ here.
        
         | Someone wrote:
         | > Hint: do not use integer types in C code for portable data
         | structures. ntohl, etc are a mess. Just use arrays of bytes.
         | 
         | I don't see how that helps much. If a developer forgets to call
         | _ntohl_ on multi-byte integer fields, I don't trust them to
         | correctly convert said integers to arrays of bytes, either.
        
           | kevincox wrote:
           | If you write it the naive way it works.
           |     uint8_t bytes[4];
           |     uint32_t x = bytes[0] << 24 | bytes[1] << 16 |
           |                  bytes[2] << 8 | bytes[3];
           | 
           | The endianness is whatever you write in the indexing and will
           | be the same across architectures.
        
             | e4m2 wrote:
             | Unfortunately, the naive way also turns out to be wrong in
             | C. uint8_t gets promoted to a signed int when shifting,
             | which in turn causes undefined behavior for specific input.
             | One way of fixing this is casting to the desired type
             | before the shift, thus avoiding surprising conversions.
             | 
             | On a side note, compiler warnings and sanitizers help with
             | this kind of stuff greatly, use them if you have the
             | option: https://godbolt.org/z/8oq9GTcze
        
         | quotemstr wrote:
         | Little endian won. I don't see the point of maintaining
         | theoretical compatibility with big endian systems that are at
         | best esoteric right now and soon to be extinct. Likewise, every
         | reasonable platform aligns data fields on natural alignment
         | these days. It's just a waste of effort to make software
         | portable to evolutionary dead ends.
        
           | wolverine876 wrote:
           | You're saying it won for this application (UUIDs)?
           | Universally?
           | 
           | What is the most common remaining use of big endian?
        
             | tech2 wrote:
             | TCP/IP might be a reasonable example. There's a reason
             | "network byte order" and "big endian" are the same thing.
        
               | wolverine876 wrote:
               | > TCP/IP might be a reasonable example.
               | 
               | That's a pretty significant example of widespread usage
               | that isn't going away soon. Perhaps the GGP was only
               | referring to UUID applications.
        
           | mort96 wrote:
           | This isn't about compatibility with big-endian machines. This
           | is about compatibility between different uuid libraries,
           | potentially on different operating systems, all on little
           | endian CPU architectures.
        
             | ntauthority wrote:
             | Every existing library follows the standard which specifies
             | host byte order, which usually means little endian. The
             | Rust library cited a few levels up this chain ignored that,
             | somehow assuming big endian, and then had to correct for
             | this mistake.
        
         | firebird84 wrote:
         | It's actually worse than that. The first 3 groupings
         | (textually) of the uuid might be little endian while the other
         | 2 are big endian. Learning this cost me more time than I care
         | to admit.
         | 
         | https://en.wikipedia.org/wiki/Universally_unique_identifier#...
        
           | cpach wrote:
           | What the...?
           | 
           | This fact will haunt me in my dreams :-p
        
           | amluto wrote:
           | This is consistent with the misguided structure in the RFC.
           | The first three fields (the time fields) are multibyte
           | integers. The remainder is just bytes. The dashes in the
           | textual representation are just there to confuse you.
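
Python's standard `uuid` module exposes both layouts, which makes the difference easy to see. In this sketch the sample UUID is arbitrary; `bytes` is the big-endian RFC 4122 order and `bytes_le` is the mixed-endian order described above.

```python
import uuid

u = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")

big = u.bytes       # every field most significant byte first
mixed = u.bytes_le  # first three fields byte-swapped, remaining 8 bytes unchanged

print(big.hex())    # 00112233445566778899aabbccddeeff
print(mixed.hex())  # 33221100554477668899aabbccddeeff

# Reading mixed-endian bytes with the matching constructor round-trips...
assert uuid.UUID(bytes_le=mixed) == u
# ...but reading them as big-endian silently yields a different UUID.
print(uuid.UUID(bytes=mixed))  # 33221100-5544-7766-8899-aabbccddeeff
```

Nothing in the 16 bytes says which layout was used, so the reader has to know the source's convention out of band.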
        
           | throwaway9870 wrote:
           | I had to write some EFI/GPT code earlier this year and was
           | dumbfounded to learn this. This is right up there with the
           | mork file format.
        
           | formerly_proven wrote:
           | Making mixed-endian _even more haunted_ is quite the
           | achievement. I congratulate whoever did this at Microsoft for
           | their lasting contribution.
        
         | secondcoming wrote:
         | I thought Windows uses GUIDs, not UUIDs
        
           | takeda wrote:
           | It looks like GUID is synonymous with UUID, but the name GUID
           | also implies that it could contain that bug/feature
           | mentioned[1]
           | 
           | [1] https://en.wikipedia.org/wiki/Universally_unique_identifier#...
        
         | JonathonW wrote:
         | UUIDs (as GUIDs) in Windows predate RFC 4122, so I don't think
         | it's that unreasonable that they're not compliant with the spec
         | (since the exact contents of a UUID aren't typically that
         | important, but consistency in how you produce them _is_ ).
         | 
         | The GUID implementation in Windows derives from the DCE RPC
         | specification [1]. That's where the multibyte integers in the
         | RFC 4122 specification come from (they're stated the same way
         | in the DCE RPC spec), and it doesn't explicitly specify their
         | endianness. It does call them "NDR integers", but NDR integers
         | can be little endian or big endian depending on implementation.
         | DCE specifies a mechanism by which you'd indicate which you're
         | using in an RPC call, but that data's not included in the UUID
         | format-- you get whatever byte order the system's decided to
         | use, which, for Windows, is little-endian.
         | 
         | [1]
         | https://pubs.opengroup.org/onlinepubs/9629399/apdxa.htm#tagc...
        
         | blueflow wrote:
         | This mishap lives on in the UUID stored in a machine's DMI
         | data, as well as in GPT partition tables, which are required
         | when EFI is used. It would be really cool if we had some
         | replacement for EFI that would not harbour these kinds of
         | painful legacies.
        
           | dtech wrote:
           | On the other hand, this is an easy to implement conversion,
           | while changing such a fundamental thing from EFI sounds
           | pretty hard, making it not worth it.
        
             | blueflow wrote:
             | You cannot fix this with a conversion because you do not
             | know if your UUID is correct or needs conversion. DMI data
             | for example has inconsistent endian-ness depending on the
             | vendor. So if you have a UUID sticker on a new server, you
             | still have two options which UUID the machine will send
             | during PXE, either the printed UUID in big-endian encoding
             | or in the Microsoft mixed-endian encoding.
             | 
             | Use BIOS boot instead of EFI, it has less legacy to
             | implement: PE executables, FAT file system, Win64 ABI
        
       | operator-name wrote:
       | With the introduction of UUIDv8, my fun script to generate vanity
       | uuids[0] can finally be spec conformant!
       | 
       | [0]:https://github.com/operator-name/vanity-uuid
        
         | davidjfelix wrote:
         | I've been staring at the supposedly readable and memorable uuid
         | for like 4 minutes without any idea what it says.
        
           | operator-name wrote:
           | Yeah, looking back I could have done a bit more than simple
           | substitution. Since the sentences don't have meaning,
           | interpreting between 1 as i vs l is especially difficult.
           | 
           | 5eedbed5-f05e-b055-ada0-d15ab11171e5
           | 
           | seedbeds-fose-boss-adao-disabilities
           | 
           | "Memorable" was definitely tongue in cheek, as was the spec
           | bending.
        
       | gigatexal wrote:
       | Uuidv7 looks really interesting. I like that the article mentions
       | all the other projects that attempted to fix uuid issues for
       | different use cases.
        
       | stevesimmons wrote:
       | This has appeared a few times before at earlier stages of the
       | draft process...
       | 
       | https://news.ycombinator.com/item?id=28088213 [244 comments]
       | 
       | Personally, I really like UUIDv7, except I transform the UUID to
       | a 25-character string which has all the same properties except it
       | doesn't look like a UUID. The last thing I want is my index of
       | time-sortable UUIDs getting contaminated with some UUIDv4
       | fully random ones. Since UUIDs may be generated and persisted in
       | a distributed manner, it's a simple way to at least spot this.
        
         | ikornaselur wrote:
         | The UUID version is in the actual ID, so you would be able to
         | spot the version as it isn't fully random.
         | 
         | Although you might mean just visually you'd spot the difference
         | much easier!
        
         | theptip wrote:
         | Can't you just check the "version" bits and reject if it's not
         | 4/7? Or are you worried about someone generating a completely
         | random (non-compliant) set of bits that happens to parse as a
         | v4/7
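
The version check suggested here is a few lines with Python's standard `uuid` module. This is a minimal sketch; `is_version` is a hypothetical helper name and the sample ID is the one quoted elsewhere in this thread.

```python
import uuid

def is_version(text: str, allowed=(7,)) -> bool:
    """True if text parses as an RFC 4122 variant UUID with an allowed version."""
    try:
        u = uuid.UUID(text)
    except ValueError:
        return False
    # .version is None when the variant bits are not the RFC 4122 pattern.
    return u.version in allowed

print(is_version("017f21cf-d130-7cc3-98c4-dc0c0c07398f"))  # True
print(is_version(str(uuid.uuid4())))                       # False
```

As the comment notes, this cannot catch arbitrary bits that happen to carry the right version and variant: a uniformly random 128-bit value passes a single-version check with probability 1/64 (1/16 for the version nibble times 1/4 for the variant bits).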
        
       | duxup wrote:
       | I'm always amazed how much work goes into these.
       | 
       | Meanwhile 99% of the time I just call a function when "I just
       | need something unique here", -calls function-, and it works!
       | 
       | Thanks everyone!
        
       | GiorgioG wrote:
       | I used to be a big proponent of using UUIDs for database PKs but
       | I've found them inherently difficult to work with. It's much
       | easier to remember/recognize an integer based PK when
       | troubleshooting a data problem.
       | 
       | This isn't to say you shouldn't use UUIDs at all, but I much
       | prefer to use an "ExternalId" column of UUID type if you don't
       | want to expose your integer based PKs externally.
        
         | staticassertion wrote:
         | I feel like `serial` has got to be pretty fast for writes and
         | gives you some nice properties like sorted ordering based on
         | insertion time, a smaller key, and a key that can be compressed
         | far better than a uuid.
         | 
         | I get the idea of a ULID/UUID7 encoding a timestamp so you get
         | sorted order, but I wonder at what scale that beats just using
         | serial.
        
           | stormbrew wrote:
           | At any scale where you have to have multiple writers
           | generating IDs or need to merge results from multiple
           | sources, basically. Synchronizing a serial increment across
           | hosts and across time is a pain.
           | 
           | Obviously you can composite a timestamp to an incrementing
           | id, but then the serial part of it is kind of useless for
           | ordering, so you may as well use a random number and avoid
           | needing to synchronize at all. And then you've just
           | reinvented a time ordered uuid (but to be fair, without the
           | endian compatibility bs mentioned elsewhere).
        
             | staticassertion wrote:
             | I'm wondering what that performance actually looks like
             | though. The closest benchmark I have is a single postgres
             | instance with a uuid pkey and a bigint column,
             | transactionally incrementing the integer in the column.
             | 
             | I mean I guess mathing it out, updating an atomic integer
             | is ~2-100ns, depending on contention. If you need to
             | coordinate the writes you have anywhere from ~250ms-10ms.
             | 
             | We can basically throw away the increment at that point
             | since the network is hundreds/thousands of times slower.
             | 
             | So at 10ms, that's 100 op/s. At 250ms more like 40kops/s.
             | 
             | That ignores the fact that your db can perform those writes
             | concurrently and then batch the writes off to the other
             | database, so long as it doesn't pretend that they're all
             | committed at once. psql HOT updates would presumably be a
             | thing here idk.
             | 
             | If I had to guess, I'd lean towards the "40kop/s" being
             | closer than the "100op/s" but idk! I wish we had benchmarks
             | but I can't find anything :\
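
A quick check of the arithmetic, as a sketch that assumes each write must wait for one full coordination round trip before the next begins (no batching or pipelining). The function name is illustrative. Under that assumption 10 ms gives 100 op/s, 250 ms gives only 4 op/s, and 40,000 op/s would need a round trip of about 25 microseconds.

```python
def serialized_ops_per_second(latency_us: int) -> int:
    """Upper bound on strictly serialized operations for a given latency."""
    return 1_000_000 // latency_us

for label, latency_us in [("10 ms", 10_000), ("250 ms", 250_000), ("25 us", 25)]:
    print(label, serialized_ops_per_second(latency_us), "op/s")
```

Concurrent or batched writes, as mentioned above, raise the real ceiling well beyond this serialized bound.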
        
         | tpetry wrote:
         | That's in my experience also the best approach. I wrote an
         | article a few days ago about the exact thing: An integer auto
         | incrementing PK with an UUID you use externally:
         | 
         | https://sqlfordevs.io/uuid-prevent-enumeration-attack
        
           | tylerscott wrote:
           | Just seconding this as a sane way to use UUIDs IME. Basically
           | the sequential integer PK is the "internal ID". IIRC in
           | SQLite regardless of type or existence of PK there is a
           | private sequential integer. Super handy pattern to use.
        
           | hn_throwaway_99 wrote:
           | That is how I used to do (and currently still do) DB design,
           | but honestly I think if DBs start supporting UUID v7s well
           | that I would use that as the sole primary DB key as well as
           | the external ID:
           | 
           | 1. They are still sorted in increasing timestamp order (at
           | millisecond granularity), so they should have good DB index
           | characteristics.
           | 
           | 2. At the same time, they contain 62 bits of randomness,
           | which should pretty much eliminate IDOR attacks if there is a
           | bug elsewhere that isn't doing proper access checks. Not good
           | enough for secure tokens, but just good defense against
           | access permission check bugs.
           | 
           | That is, you should basically get the best of both worlds:
           | ordered keys with enough randomness to make ID-increment
           | attacks infeasible.
        
             | RobertRoberts wrote:
             | Yep, I want UUID v7, because right now I am using ULID and
             | it's fantastic, but I'd like more official and wide support
             | as well.
        
               | jeremyjh wrote:
               | I like the textual representation of ULIDs as well
               | though; I wish they'd just adopted ULID as v7. At least
               | it is binary compatible with existing UUID types.
        
               | RobertRoberts wrote:
               | Is there not an equivalent text representation for
               | UUIDv7?
        
               | jeremyjh wrote:
               | Sure you could encode it in Crockford's base-32 but if it
               | isn't part of the standard then tools won't implement it
               | natively, so you couldn't copy a key from a url and look
               | it up in postgres without running it through a conversion
               | function, for example.
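
A minimal sketch of the conversion function being described: rendering any 128-bit UUID as 26 characters of Crockford base-32 (the text form ULID uses) and back. The function names are hypothetical, and the decoder skips Crockford's optional aliases (such as treating O as 0) for brevity.

```python
import uuid

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # no I, L, O, U

def encode_crockford(u: uuid.UUID) -> str:
    """Encode the 128-bit value as 26 base-32 characters, most significant first."""
    n = u.int
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[n & 31])
        n >>= 5
    return "".join(reversed(chars))

def decode_crockford(text: str) -> uuid.UUID:
    """Inverse of encode_crockford; raises ValueError on invalid input."""
    n = 0
    for ch in text.upper():
        n = (n << 5) | CROCKFORD.index(ch)
    return uuid.UUID(int=n)

u = uuid.UUID("017f21cf-d130-7cc3-98c4-dc0c0c07398f")
text = encode_crockford(u)
assert decode_crockford(text) == u
print(text)
```

Because the alphabet is in ascending ASCII order and the output has fixed width, the encoded strings sort the same way as the underlying 128-bit values.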
        
         | kortex wrote:
         | > It's much easier to remember/recognize an integer based PK
         | when troubleshooting a data problem.
         | 
         | How often are you actually relying on memory of an ID to
         | troubleshoot a problem? I mean sure, if you are scanning
         | visually, it's good to recognize the same ID over and over
         | again, but my ability to do so caps around 4-6 characters. So I
         | just look at the last 4 chars regardless when fast scanning.
         | 
         | I use copy-paste for any time I need to transport IDs between
         | contexts (that isn't just scripted, which is best). Having a
         | copy-paste stack (Alfred, Raycast and others have this feature)
         | is a huge game changer here.
        
         | marcos100 wrote:
         | I can't see a case where an UUID PK is better than an INT (or
         | BIGINT). Why would you do that?
        
           | __s wrote:
           | If the id appears in a url, you may not want people to guess
           | ids. The information leak exists even if you do
           | authentication: maybe you don't want someone to be able to
           | guess how many records there are, or how quickly records are
           | being generated
        
           | tylerscott wrote:
           | If for no other reason than simplified debugging I find there
           | is value. Maybe I'm old but if you have more than one UUID
           | involved in the debugging I'm more likely to trip up than
           | just integers.
        
           | lkrubner wrote:
           | There were two problems.
           | 
           | One problem was the style that started around 2004, and was
           | very popular with Ruby on Rails and WordPress, and then
           | Symfony and Django, where you expose the PK in the URL. If
           | your integer starts with 1 and then increments, you may not
           | get to a billion, and you'll never get to a trillion. So it
           | became ridiculously easy for outsiders to scan your site:
           | 
           | http://www.example.com/1
           | 
           | http://www.example.com/2
           | 
           | http://www.example.com/3
           | 
           | ...
           | 
           | http://www.example.com/10000000000
           | 
           | That was one problem. Using UUIDs for PKs means outsiders
           | can't simply scan your site.
           | 
           | The other problem was that over the years, everyone ran into
           | the problem of moving a database, or needing to combine
           | multiple databases. With PKs that start at 1 and then
           | increment, a collision between PKs from different databases
           | is 100% guaranteed. This often happens when combining
           | WordPress sites, for instance. If you use UUIDs as your PK,
           | then such collisions become unlikely.
        
           | madsbuch wrote:
           | Some applications have the need to create IDs in a
           | distributed manner, e.g. on clients, and use that identity
           | _before_ the database returns it. These systems benefit from
           | randomly generated IDs.
           | 
           | You could potentially just use a random number between 0 and
           | 2^128-1 and still use ints, though I haven't seen that in
           | action; usually PKs with ints are centrally generated and
           | consecutive.
        
         | chrismorgan wrote:
         | I've been working on a robust scheme for encrypted sequential
         | IDs, which is done, including library implementations in Rust,
         | JavaScript and Python, pending just a smidgeon more writing
         | about it and reviewing a decision on naming. You store an
         | integer in the database, then encrypt it with a real block
         | cipher, and stringify with Base58. I have three modes: one for
         | 32-bit IDs, using Speck32/64 and producing 4-6 character IDs;
         | one for 64-bit IDs, using Speck64/128 and producing 8-11
         | character IDs; and one hybrid, using the 32-bit mode for IDs
         | below 2^32 and the 64-bit mode above that, providing both a
         | forwards-compatibility measure and a way of producing short IDs
         | as long as possible. Contact me (see my profile) if you're
         | interested, or I'll probably publish it in another day or two.
         | Trouble is that I've been getting distracted with other related
         | concepts, like optimally-short encoding by using encryption
         | domains [0, 58^1), [58^1, 58^2), ..., [58^10, 2^64) (this is
         | format-preserving encryption; the main reputable and practical choices
         | I've found are Hasty Pudding, which I've just about finished
         | implementing but would like test vectors for but they're on a
         | dead FTP site, and NIST's FF1 and FF3, which are patent-
         | encumbered), and ways of avoiding undesirable patterns (curse
         | words and such) by skipping integers from the database's ID
         | sequence if they encode to what you don't want, and check
         | characters with the Damm algorithm. If I didn't keep getting
         | distracted with these things, I'd have published a couple of
         | weeks ago.
         | 
         | (I am not aware of any open-source library embodying a scheme
         | like what I propose--all that I've found have either reduced
         | scope or badly broken encryption;
         | https://github.com/yi-jiayu/presents encrypts soundly, but doesn't
         | stringify; Hashids
         | is broken almost beyond belief and should not be considered
         | encryption; Optimus uses an extremely weak encryption.)
         | 
         | UUIDs are crazy overkill in any situation where you can have
         | centralised ID allocation. Fully decentralised? Sure, 128 bits
         | of randomness or mixed clock and randomness or similar, knock
         | yourself out. But got a master database? Nah, you're just
         | generating unreasonably long values that take up unnecessary
         | space and make for messy URLs and such.
        
           | mappu wrote:
           | I've done something similar to obfuscate private DB IDs in a
           | large existing application - just ensure they're all
           | Skip32-encoded in all query parameters with an app-wide
           | secret.
           | 
           | It works well but you have to be very disciplined to catch
           | every case individually. Using GUID PKs from the start just
           | removes this entire category of problem.
        
         | cameronh90 wrote:
         | UUIDs have a few distinct advantages: you'll never run out, you
         | don't need a roundtrip to find out what they are after saving
         | them, they often make a good partitioning key and it makes
         | things easier if you ever need to combine multiple data sources
         | together in migration and recovery type scenarios. I also quite
         | like how they're unique across all data sources and tables, so
         | if you just encounter a random contextless UUID in the wild,
         | for example in a support ticket, you can probably still find
         | what it refers to.
         | 
         | They are quite unwieldy though. There are a few compact
         | representations you can use in URLs which make it a bit less
         | ugly, but they can make your database and logs quite bloated,
         | in particular if you've got a large number of small records.
        
           | CharlesW wrote:
           | > _There are a few compact representations you can use in
           | URLs which make it a bit less ugly..._
           | 
           | Any thoughts on where to find best-practices guidance? I need
           | to create an external ID scheme for several million items.
           | hashids (hashids.org) seems interesting, but I have anxiety
           | about choosing a solution with weaknesses that I can't
           | identify given my current level of experience in regards to
           | this.
        
             | munawwar wrote:
             | I took a shot at the math behind this at
             | https://www.codepasta.com/databases/2020/09/10/shorter-
             | uniqu...
             | 
             | Using the equation listed in the article I couldn't
             | generate a collision so far. Yet, I still check (in code)
             | for id collision, and pick new id, just to be 100% sure.
        
         | deepsun wrote:
         | The best strategy is to have integer primary keys for internal
         | purposes, and some form of uuid for external (think
         | permalinks).
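A minimal sketch of that split, assuming SQLite and a hypothetical `items` table (the table and column names are illustrative and not from the thread). The integer key stays internal for joins and indexes, and only the random UUID is ever handed out.

```python
import sqlite3
import uuid


def create_schema(conn):
    # id: compact integer key used internally.
    # public_id: random UUIDv4 string, the only identifier exposed externally.
    conn.execute(
        """
        CREATE TABLE items (
            id INTEGER PRIMARY KEY,
            public_id TEXT NOT NULL UNIQUE,
            name TEXT NOT NULL
        )
        """
    )


def add_item(conn, name):
    # Generate the external id on the client, so no round trip is needed.
    public_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO items (public_id, name) VALUES (?, ?)",
        (public_id, name),
    )
    return public_id


def find_by_public_id(conn, public_id):
    # Resolve an external id back to the internal row; None if unknown.
    return conn.execute(
        "SELECT id, name FROM items WHERE public_id = ?",
        (public_id,),
    ).fetchone()
```

The UNIQUE constraint gives `public_id` its own index, so permalink lookups do not scan the table.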
        
         | zbuf wrote:
         | "Unique IDs" _can_ be super really easy to work with if they're
         | not so bafflingly complicated.
         | 
         | A random string generated using quality randomness can be
         | adjusted to length to suit the quantity of data (negligible
         | probability of a collision) which in most cases is very short.
         | 
         | It's easy to increase the length as you get more data.
         | 
         | They are visually very different for each item of data.
         | 
         | They're evenly spread which means they hash/index well.
         | 
         | You can tune a subset of characters if you want to decrease
         | ambiguity eg. when exchanged by voice (no zero vs. letter O,
         | upper/lower case etc.)
         | 
         | And a final bonus, when working with user input only a short
         | prefix is needed to uniquely identify an item (in contrast, it
         | seems like UUIDs deliberately share a common prefix)
         | 
         | I'm very happy to concede I must be missing something here, and
         | would be interested to know. But the above approach has served
         | me well in a range of uses.
         | 
         | I can see how UUIDs work, and perhaps "looks like a UUID" is a
         | useful feature. But reading the URL above and a bit of
         | Wikipedia doesn't give me much to go on as to _why_ any of this
         | is happening, and why the hyphens aim to retain meaning to what
         | is ostensibly a 'unique' number.
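A rough sketch of the random-string approach described above, assuming Python's `secrets` module as the source of quality randomness. The alphabet is a hypothetical choice that drops look-alike characters (0/o, 1/l/i). The length is a parameter, so it can grow with the data.

```python
import secrets

# Hypothetical alphabet: digits and lowercase letters, minus 0, 1, i, l, o.
ALPHABET = "23456789abcdefghjkmnpqrstuvwxyz"


def random_id(length=10, alphabet=ALPHABET):
    # Each character is drawn independently from a CSPRNG.
    return "".join(secrets.choice(alphabet) for _ in range(length))
```

With 31 symbols each character carries just under 5 bits, so a 10-character id has a little under 50 bits of randomness.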
        
           | jxcole wrote:
           | We used almost this exact scheme for app id indices and the
           | curious problem we had to design against was inadvertent
           | profanity. At some point we decided to just never use vowels
           | to avoid ever having a complaint about 12f*ck in the URL
        
             | jasonwatkinspdx wrote:
             | Another approach is to use something like EFF's dice words
             | lists. One of the smaller lists in particular is
             | interesting as it's 6^4 words, filtered for profanity, and
             | where all words have both a unique 3 letter prefix and an
             | edit distance of 3. That makes them robust for the use case
             | of someone reading out the phrase to someone typing or
             | such.
             | 
             | Never using vowels is a smart idea I wish I'd used in the
             | past. Previously when I've needed something like this I've
             | used other dictionary lists vs EFF's, and those were not
             | curated sufficiently to avoid some really unfortunate
             | combinations.
        
             | manigandham wrote:
             | Use integer IDs and a library like Hashids for friendly
             | alphanumeric representations: https://hashids.org/
             | 
             | This particular implementation is available in dozens of
             | languages.
        
           | jasonwatkinspdx wrote:
           | Using purely random ids in your database destroys locality.
           | They mention this in the introduction:
           | 
           | > Non-time-ordered UUID versions such as UUIDv4 have poor
           | database index locality. Meaning new values created in
           | succession are not close to each other in the index and thus
           | require inserts to be performed at random locations. The
           | negative performance effects of which on common structures
           | used for this (B-tree and its variants) can be dramatic.
           | 
           | The V7 ids work similarly to what you like, as they're just a
           | unix timestamp and 74 bits of pseudorandom data (they present
           | several different schemes you could use to generate this
           | randomness), but the basic birthday bound says we'd need to be
           | above 100 billion ids generated in a single millisecond to
           | worry about collisions. Obviously most systems are nowhere
           | near that territory.
           | 
           | So using these ids gives you the practical advantages of
           | random unique ids, but with the performance of autoincrement
           | ids.
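The layout described above can be sketched as follows. This is an illustration of the bit packing (48-bit Unix millisecond timestamp, 4 version bits, 2 variant bits, 74 random bits), not a reference implementation of the draft. The function name `uuid7_sketch` is made up here.

```python
import secrets
import time
import uuid


def uuid7_sketch(unix_ms=None):
    # unix_ms can be passed in for testing; defaults to the current time.
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand_a = secrets.randbits(12)  # random bits after the version field
    rand_b = secrets.randbits(62)  # random bits after the variant field
    value = (unix_ms & ((1 << 48) - 1)) << 80  # timestamp in the top 48 bits
    value |= 0x7 << 76                         # version nibble = 7
    value |= rand_a << 64
    value |= 0b10 << 62                        # RFC 4122 variant bits
    value |= rand_b
    return uuid.UUID(int=value)
```

Because the timestamp occupies the most significant bits, ids from later milliseconds compare greater, both as bytes and in the usual hex string form.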
        
             | zbuf wrote:
             | Thanks for drawing my attention to that, it's the useful
             | answer I was looking for; my use cases haven't been bound
             | by write performance in this manner. However, I'd still be
             | considering carefully before making use of these UUID
             | schemes.
        
               | jasonwatkinspdx wrote:
               | If you want something like this, and need an "I just want
               | it to work, require no central coordination, and to have
               | vanishingly small probability of collision" then just use
               | the same concepts but wider than the 128 bit footprint
               | limit of this scheme. This limit makes sense for the RFC
               | in the post, as there backwards compatibility is an
               | explicit goal. But if you used, say, a 64 bit nanosecond
               | counter (to preserve best case precision on a single
               | machine) along with 128+ bits of random data, you'd
               | need to worry more about gamma ray bursts than
               | collisions.
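A minimal sketch of that wider scheme, assuming a 64-bit big-endian nanosecond timestamp followed by 128 random bits. The name `wide_id` and the exact layout are hypothetical and are not taken from any RFC.

```python
import secrets
import time


def wide_id(now_ns=None):
    # now_ns can be passed in for testing; defaults to the current time.
    if now_ns is None:
        now_ns = time.time_ns()
    # 8 bytes of big-endian timestamp keep ids sortable by creation time;
    # 16 bytes from the OS CSPRNG make collisions vanishingly unlikely.
    return now_ns.to_bytes(8, "big") + secrets.token_bytes(16)
```

The result is 24 bytes (192 bits), so it does not fit the UUID footprint and needs its own column type and text encoding.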
        
             | jandrewrogers wrote:
             | 100 billion UUIDs per millisecond is the 50% collision
             | probability threshold. Achieving an acceptable collision
             | probability for most applications would limit the UUID
             | generation rate to more like thousands of UUIDs per
             | millisecond.
             | 
             | Even if one was not generating millions of UUIDs per second
             | on average, the risk of spiky temporal distributions when
             | generating UUIDs would still need to be considered.
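For anyone who wants to check these figures, here is a small sketch using the standard birthday approximation n ≈ sqrt(2 · N · ln(1 / (1 − p))), where N is the size of the random space and p is the tolerated collision probability within one timestamp bucket. The function name is made up for illustration.

```python
import math


def ids_for_collision_probability(random_bits, p):
    # Approximate number of random ids that can share one timestamp
    # before the chance of at least one collision reaches p.
    space = 2.0 ** random_bits
    return math.sqrt(2.0 * space * -math.log1p(-p))


# 74 random bits at p = 0.5 gives roughly 1.6e11 ids per millisecond.
# Lower values of p shrink that budget with the square root of p.
for p in (0.5, 1e-9, 1e-15):
    print(p, f"{ids_for_collision_probability(74, p):.3g}")
```

The acceptable rate depends entirely on the p chosen, which is why the same 74 bits can be read as either generous or tight.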
        
               | jasonwatkinspdx wrote:
               | I'm aware of what the bound I quoted is, and am just too
               | lazy to type the refinement into Wolfram for a discussion
               | like this. But just knowing the value off the top of my
               | head for 64 bits is 200k for 1e-9 probability of
               | collision, I'm pretty happy with 72 bits.
               | 
               | And although this standard obviously wants to stick
               | within the existing UUID footprint, if you were say doing
               | some IoT software that would run on billions of nodes
               | simultaneously, just add another 32/64/whatever bits of
               | random data and deal with the minor annoyance of longer
               | ids and lack of UUID RFC compatibility. But even then,
               | you can truncate and reformat these to v4 UUIDs
               | trivially without meaningfully impacting the collision
               | resistance, for unsorted external ids in systems that
               | need the compatibility.
               | 
               | The actual big risk with this sort of scheme is vm
               | initialization. You need to be sure the CSPRNG you're
               | using is initialized, which can be slightly tricky in
               | cloud environments with configuration/control layer stuff
               | that's racy. Mess this up and two nodes hydrated from the
               | same snapshot may overlap in sequence as they start
               | generating ids, and of course the time component cannot
               | be trusted to save you in this instance.
        
           | manigandham wrote:
           | > _" They're evenly spread which means they hash/index
           | well."_
           | 
           | What do you mean by this? Why would you hash it further? Hash
           | distribution is primarily down to the hashing algorithm, not
           | the input data.
           | 
           | Also indexes are better with somewhat ordered and smaller
           | data. A 64-bit int sequential counter is much faster and half
           | the size, and compatible everywhere without the annoyances of
           | a UUID.
        
       | jwilk wrote:
       | Old-school formatting of the same document:
       | 
       | https://datatracker.ietf.org/doc/html/draft-peabody-dispatch...
        
         | [deleted]
        
       | Zamicol wrote:
       | Why isn't there an option with a strong cryptographic hash like
       | SHA-256?
        
         | staticassertion wrote:
         | It would be slow and doesn't really serve a purpose since you
         | either have uuids that are totally random or uuids that need to
         | preserve their structure.
        
           | jwilk wrote:
           | Or UUIDs derived from other identifiers:
           | 
           | https://datatracker.ietf.org/doc/html/rfc4122#section-4.3
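As a quick illustration of the name-based UUIDs in that section, Python's standard `uuid` module exposes them as `uuid3` (MD5) and `uuid5` (SHA-1). The wrapper name and the example input below are arbitrary.

```python
import uuid


def derived_id(name, namespace=uuid.NAMESPACE_URL):
    # Same namespace and name always yield the same UUID;
    # no randomness or clock is involved.
    return uuid.uuid5(namespace, name)


a = derived_id("https://example.com/items/42")
b = derived_id("https://example.com/items/42")
print(a == b, a.version)
```

The hash output is truncated to fit the 128-bit layout, with the version and variant bits overwritten.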
        
       | kstenerud wrote:
       | It's a shame they didn't finally drop big endian. Little endian
       | won out and our newer protocols should reflect that for
       | efficiency's sake.
        
         | wyager wrote:
         | I guarantee you that byte order swapping is not the limiting
         | factor on the efficiency of any system involving UUIDs on the
         | wire.
        
         | amluto wrote:
         | No, they should have tried to find a way to drop _little_
         | endian. A cycle or two when generating UUIDs is irrelevant, and
         | almost all UUID implementations, and most of the RFC 4122 text,
         | are big-endian.
        
         | zuzun wrote:
         | It's necessary to keep them sortable at byte level.
        
         | finnh wrote:
         | But network byte order is big endian, so I can see arguments
         | both ways (as it were)
        
         | chrismorgan wrote:
         | Big-endianness is mandatory for opaque bytewise sorting (or
         | lexicographic ordering in string form), which is a very
         | desirable property of UUIDv6 and UUIDv7.
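A small demonstration of the sorting point, using plain integers in place of timestamps: comparing big-endian encodings byte by byte matches numeric order, while little-endian encodings do not. The helper name is made up for illustration.

```python
def sort_by_encoded_bytes(values, byteorder):
    # Sort integers by their fixed-width byte encoding, the way a database
    # or filesystem would when it treats keys as opaque bytes.
    return sorted(values, key=lambda v: v.to_bytes(8, byteorder))


values = [256, 1, 65536, 255]
print(sort_by_encoded_bytes(values, "big"))     # numeric order
print(sort_by_encoded_bytes(values, "little"))  # scrambled order
```

This is why a timestamp-prefixed id has to put the most significant byte first if it is to sort by creation time.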
        
       | ripe wrote:
       | Side note: I love the HTML format of these IETF RFCs, as in TFA.
       | Over the decades, I was used to seeing the old text format (which
       | I like), but this one is particularly easy on the eye, especially
       | on my Android phone.
        
         | [deleted]
        
         | hoten wrote:
         | Only mobile issue I see is there is a really long link that
         | overflows the intended document width, which introduces a
         | horizontal scrollbar. That makes scrolling a bit finicky on
         | mobile.
        
         | sureglymop wrote:
         | I love the design and would like to copy it for my blog!
         | Especially the code blocks with the nice 'figure' description
         | below them.
        
       | bearjaws wrote:
       | UUIDv6 coming in just about 10 years late. Guess it's something.
        
         | [deleted]
        
       ___________________________________________________________________
       (page generated 2022-06-12 23:00 UTC)