[HN Gopher] Postgres UUIDv7 and per-backend monotonicity
       ___________________________________________________________________
        
        Postgres UUIDv7 and per-backend monotonicity
        
       Author : craigkerstiens
       Score  : 145 points
       Date   : 2025-01-02 16:32 UTC (6 hours ago)
        
 (HTM) web link (brandur.org)
 (TXT) w3m dump (brandur.org)
        
       | urronglol wrote:
        | What is a v7 UUID? Why do we need more than (1) a UUID from a
        | random seed and (2) one derived from that and a timestamp
        | (orderable)?
        
         | n2d4 wrote:
         | UUID v7 is the latter, whereas v4 is the former.
         | 
          | All the other versions are somewhat legacy, and you shouldn't
          | use them in new systems (besides v8, which is a "custom format
          | UUID", if you need that).
        
           | elehack wrote:
            | UUID v5 is quite useful if you want to deterministically
            | convert external identifiers into UUIDs -- define a namespace
            | UUID for each potential identifier source (to keep them
            | separate), then use that to derive a v5 UUID from the
            | external identifier. It's very useful for idempotent data
            | imports.
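The namespace-per-source approach described above can be sketched in Python's standard `uuid` module (the namespace names and identifiers here are hypothetical):

```python
import uuid

# Hypothetical namespace UUIDs, one per external identifier source, so the
# same raw identifier from different systems maps to different UUIDs.
CRM_NS = uuid.uuid5(uuid.NAMESPACE_DNS, "crm.example.com")
BILLING_NS = uuid.uuid5(uuid.NAMESPACE_DNS, "billing.example.com")

def external_to_uuid(namespace: uuid.UUID, external_id: str) -> uuid.UUID:
    """Deterministically derive a v5 UUID from an external identifier."""
    return uuid.uuid5(namespace, external_id)

# The same input always yields the same UUID, which makes imports
# idempotent; the namespace keeps identical raw IDs from different
# sources separate.
assert external_to_uuid(CRM_NS, "42") == external_to_uuid(CRM_NS, "42")
assert external_to_uuid(CRM_NS, "42") != external_to_uuid(BILLING_NS, "42")
```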
        
             | jandrewrogers wrote:
             | Both UUIDv3 and UUIDv5 are prohibited for some use cases in
             | some countries (including the US), which is something to be
             | aware of. Unfortunately, no one has created an updated
             | standard UUID that uses a hash function that is not broken.
              | While useful, it is not always an option.
        
         | cube2222 wrote:
         | UUID v7 is what you numbered #2.
         | 
         | For the others, it's best to read up on Wikipedia[0]. I believe
         | they all have their unique use-cases and tradeoffs.
         | 
         | E.g. including information about which node of the system
         | generated an ID.
         | 
         | [0]:
         | https://en.m.wikipedia.org/wiki/Universally_unique_identifie...
        
         | mind-blight wrote:
          | A deterministic UUID based on a hash is also very useful
          | (UUIDv5). I've used that for deduping records from multiple
          | sources.
        
         | chimpontherun wrote:
         | As it is usual in many areas of human endeavor, newcomers to
         | the field tend to criticize design decisions that were made
         | before them, only to re-invent what was already invented.
         | 
         | Sometimes it leads to improvements in the field, via rejection
         | of the accumulated legacy crud, or just simply affording a new
          | perspective. Most other times it's well-intentioned but low-
          | effort noise.
         | 
         | I, personally, do it myself. This is how I learn.
        
         | urronglol wrote:
         | And yet again. A perfectly reasonable question downvoted for no
         | reason. God I hate this site.
        
           | purerandomness wrote:
           | It comes off as a low-effort question that seems to try to
           | evoke a reply from a peer, while it's the kind of question
           | that is best answered by an LLM, Google, or Wikipedia.
        
             | a3w wrote:
              | Well, an LLM could answer it or write total bullshit. But
              | yes, Wikipedia or other quick internet research will
              | generally help. And exactly here, it will tell you that
              | there are competing standards since we have competing use
              | cases.
        
               | kraftman wrote:
               | For simple questions like this with unambiguous answers,
               | it is statistically very unlikely that you'll get a
               | bullshit answer from an LLM.
        
           | treve wrote:
            | I didn't downvote you, but the terseness made it immediately
            | come off to me as a kind of criticism, e.g.: "Why would
           | we ever need it". May not have been your intent but if it was
           | a genuine question, form matters.
        
             | urronglol wrote:
             | Be kind. Don't be snarky. Converse curiously; don't cross-
             | examine. Edit out swipes.
             | 
             | Comments should get more thoughtful and substantive, not
             | less, as a topic gets more divisive.
             | 
             | When disagreeing, please reply to the argument instead of
             | calling names. "That is idiotic; 1 + 1 is 2, not 3" can be
             | shortened to "1 + 1 is 2, not 3."
             | 
             | Please don't fulminate. Please don't sneer, including at
             | the rest of the community.
             | 
             | Please respond to the strongest plausible interpretation of
             | what someone says, not a weaker one that's easier to
             | criticize. Assume good faith.
        
               | mtmail wrote:
               | From the same guidelines "Please don't comment about the
               | voting on comments. It never does any good, and it makes
               | boring reading." treve gave insight into their thought
               | process when they read your initial comment, and I had
               | the same reaction. Neither treve nor I downvoted it.
        
       | kingkilr wrote:
        | I would strongly implore people not to follow the example this
        | post suggests by writing code that relies on this monotonicity.
       | 
       | The reason for this is simple: the documentation doesn't promise
       | this property. Moreover, even if it did, the RFC for UUIDv7
       | doesn't promise this property. If you decide to depend on it,
       | you're setting yourself up for a bad time when PostgreSQL decides
       | to change their implementation strategy, or you move to a
       | different database.
       | 
       | Further, the stated motivations for this, to slightly simplify
       | testing code, are massively under-motivating. Saving a single
       | line of code can hardly be said to be worth it, but even if it
       | were, this is a problem far better solved by simply writing a
       | function that will both generate the objects and sort them.
       | 
       | As a profession, I strongly feel we need to do a better job
       | orienting ourselves to the reality that our code has a tendency
       | to live for a long time, and we need to optimize not for "how
       | quickly can I type it", but "what will this code cost over its
       | lifetime".
        
         | 3eb7988a1663 wrote:
          | I too am missing the win on this. It is breaking the spec, and
          | does not seem like it offers a significant advantage. In the
          | event that you have a collection of UUIDv7s, you are only ever
          | going to be able to rely on the millisecond precision anyway.
        
           | sbuttgereit wrote:
           | You say it's breaking the spec, but is it?
           | 
           | From https://www.rfc-editor.org/rfc/rfc9562.html#name-uuid-
           | versio...:
           | 
            | "UUIDv7 values are created by allocating a Unix timestamp in
            | milliseconds in the most significant 48 bits and filling the
            | remaining 74 bits, excluding the required version and variant
            | bits, with random bits for each new UUIDv7 generated to
            | provide uniqueness as per Section 6.9. Alternatively,
            | implementations MAY fill the 74 bits, jointly, with a
            | combination of the following subfields, in this order from
            | the most significant bits to the least, to guarantee
            | additional monotonicity within a millisecond:
            | 
            | 1.  An OPTIONAL sub-millisecond timestamp fraction (12 bits
            |     at maximum) as per Section 6.2 (Method 3).
            | 2.  An OPTIONAL carefully seeded counter as per Section 6.2
            |     (Method 1 or 2).
            | 3.  Random data for each new UUIDv7 generated for any
            |     remaining space."
           | 
           | Which the referenced "method 3" is:
           | 
           | "Replace Leftmost Random Bits with Increased Clock Precision
           | (Method 3):
           | 
           | For UUIDv7, which has millisecond timestamp precision, it is
           | possible to use additional clock precision available on the
           | system to substitute for up to 12 random bits immediately
           | following the timestamp. This can provide values that are
           | time ordered with sub-millisecond precision, using however
           | many bits are appropriate in the implementation environment.
           | With this method, the additional time precision bits MUST
           | follow the timestamp as the next available bit in the rand_a
           | field for UUIDv7."
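As a rough illustration of Method 3 (a sketch, not the actual Postgres C implementation), a UUIDv7 generator that spends the 12 `rand_a` bits on sub-millisecond clock precision might look like:

```python
import os
import time
import uuid

def uuidv7_method3() -> uuid.UUID:
    """Sketch of RFC 9562 Method 3: fill rand_a with extra clock precision."""
    ns = time.time_ns()
    ms = ns // 1_000_000                       # 48-bit Unix timestamp in ms
    # Scale the leftover sub-millisecond fraction into 12 bits (0..4095),
    # giving roughly 244 ns granularity.
    frac = ((ns % 1_000_000) * 4096) // 1_000_000
    rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)
    value = (ms & ((1 << 48) - 1)) << 80       # unix_ts_ms
    value |= 0x7 << 76                         # version 7
    value |= frac << 64                        # rand_a = sub-ms fraction
    value |= 0b10 << 62                        # RFC 4122 variant
    value |= rand_b                            # 62 random bits
    return uuid.UUID(int=value)
```

Note that two calls landing inside the same ~244 ns tick can still sort by their random bits, which is one reason the per-backend ordering is an implementation property rather than a spec guarantee.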
        
           | throw0101c wrote:
           | > _It is breaking the spec_ [...]
           | 
            | As per a sibling comment, it is not breaking the spec. The
            | comment in the Pg code even cites the spec that says what to
            | do (and is quoted in the post):
            | 
            |     * Generate UUID version 7 per RFC 9562, with the given
            |     * timestamp.
            |     *
            |     * UUID version 7 consists of a Unix timestamp in
            |     * milliseconds (48 bits) and 74 random bits, excluding
            |     * the required version and variant bits. To ensure
            |     * monotonicity in scenarios of high-frequency UUID
            |     * generation, we employ the method "Replace Leftmost
            |     * Random Bits with Increased Clock Precision (Method 3)",
            |     * described in the RFC. [...]
        
         | peterldowns wrote:
         | The test should do a set comparison, not an ordered list
         | comparison, if it wants to check that the same 5 accounts were
         | returned by the API. I think it's as simple as that.
         | 
         | The blogpost is interesting and I appreciated learning the
         | details of how the UUIDv7 implementation works.
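For instance, with hypothetical account IDs, the difference between the two comparisons:

```python
# Hypothetical API response: the same accounts, in storage order.
returned = ["acct_c", "acct_a", "acct_b"]
expected = ["acct_a", "acct_b", "acct_c"]

# An ordered comparison encodes an ordering guarantee the API may not make:
assert returned != expected

# A set comparison checks only that the same accounts came back:
assert set(returned) == set(expected)
```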
        
           | vips7L wrote:
           | Don't you think that depends on what you're guaranteeing in
           | your api? If you're guaranteeing that your api returns the
           | accounts ordered you need to test for that. But I do agree in
           | general that using a set is the correct move.
        
             | Too wrote:
              | The test is a very strange example indeed. Is it testing
              | the backend, the database, or both? If the API guarantees
              | ordered values, then pre-UUIDv7 the backend must have
              | sorted them by other means before returning, making the
              | test identical. If the backend does not guarantee order,
              | that shouldn't be tested either.
        
         | paulddraper wrote:
          | > Moreover, _even if it did,_ the RFC for UUIDv7 doesn't
          | promise this property.
         | 
         | Huh?
         | 
         | If the docs were to guarantee it, they guarantee it. Why are
         | you looking for everything to be part of RFC UUIDv7?
         | 
         | Failure of logic.
        
           | fwip wrote:
           | Their next sentence explains. Other databases might not make
           | that guarantee, including future versions of Postgres.
        
         | throw0101c wrote:
          | > [...] _code that relies on this monotonicity._ _The reason
          | for this is simple: the documentation doesn't promise this
          | property. Moreover, even if it did, the RFC for UUIDv7 doesn't
          | promise this property._
         | 
          | The "RFC for UUIDv7", RFC 9562, explicitly mentions
          | monotonicity in SS6.2 ("Monotonicity and Counters"):
          | 
          |     Monotonicity (each subsequent value being greater than the
          |     last) is the backbone of time-based sortable UUIDs.
          |     Normally, time-based UUIDs from this document will be
          |     monotonic due to an embedded timestamp; however,
          |     implementations can guarantee additional monotonicity via
          |     the concepts covered in this section.
         | 
         | * https://datatracker.ietf.org/doc/html/rfc9562#name-
         | monotonic...
         | 
          | In the UUIDv7 definition (SS5.7) it explicitly mentions the
          | technique that Postgres employs for _rand_a_ :
          | 
          |     rand_a:
          |         12 bits of pseudorandom data to provide uniqueness as
          |         per Section 6.9 and/or optional constructs to guarantee
          |         additional monotonicity as per Section 6.2. Occupies
          |         bits 52 through 63 (octets 6-7).
         | 
         | * https://datatracker.ietf.org/doc/html/rfc9562#name-uuid-
         | vers...
         | 
         | Note: "optional constructs to guarantee additional
         | monotonicity". Pg makes use of that option.
        
           | stonemetal12 wrote:
           | >explicitly mentions monotonicity
           | 
           | >optional constructs
           | 
            | So it is explicitly mentioned in the RFC as optional, and Pg
            | doesn't state that they guarantee that option. The point
            | still stands: depending on optional behavior is a recipe for
            | failure when the option is no longer taken.
        
             | idconvict wrote:
             | The "optional" portion is this part of the spec, not the
             | time part.
             | 
             | > implementations can guarantee additional monotonicity via
             | the concepts covered in this section
        
             | arghwhat wrote:
             | Relying on an explicitly documented implementation behavior
             | that the specification explicitly describes as an option is
             | not an issue. Especially if the behavior is only relied on
             | in a test, where the worst outcome is a failed testcase
             | that is easily fixed.
             | 
             | Even if the behavior went away, UUIDs unlike serials can
             | always be safely generated directly by the application just
             | as well as they can be generated by the database.
             | 
             | Going straight for that would arguably be the "better"
             | path, and allows mocking PRNG to get sequential IDs.
        
             | mlyle wrote:
              | It's mentioned in the RFC as being explicitly monotonic
              | based on its time-based design.
             | 
             | Implementations that need monotonicity beyond the
             | resolution of a timestamp-- like when you allocate 30 UUIDs
             | at one instant in a batch-- can optionally use those
             | additional bits for that purpose.
             | 
             | > Implementations _SHOULD_ employ the following methods for
             | single-node UUID implementations that require batch UUID
             | creation or are otherwise concerned about monotonicity with
             | high-frequency UUID generation.
             | 
             | (And it goes on to recommend the obvious things you'd do:
             | use a counter in those bits when assigning a batch; use
             | more bits of time precision; etc.)
             | 
             | The comment in PostgreSQL before the implementation makes
             | it clear that they chose the third option for this in the
              | RFC:
              | 
              |     * variant bits. To ensure monotonicity in scenarios of
              |     * high-frequency UUID generation, we employ the method
              |     * "Replace Leftmost Random Bits with Increased Clock
              |     * Precision (Method 3)", described in the RFC. ...
        
               | Dylan16807 wrote:
               | > It's mentioned in the RFC as being explicitly monotonic
               | based the time-based design.
               | 
               | It's explicitly partially monotonic.
               | 
               | Or as other people would call it, "not monotonic".
               | 
               | People are talking past each other based on their use of
               | the word "monotonic".
        
             | sbuttgereit wrote:
             | Software is arbitrary. Any so-called "guarantee" is only as
             | good as the developers and organizations maintaining a
             | piece of software want to make it regardless of prior
              | statements. At some point, the practical likelihood of a
              | documented, but not guaranteed, process being violated vs.
              | the willful abandonment of a guarantee start to look very
              | similar... at which point nothing saves you.
             | 
             | Sometimes the best you can do is recognize who you're
              | working with today, know how they work, and be prepared for
              | those people to be different in the future (or of a
              | different mind) and for things to change regardless of
              | expressed guarantees.
             | 
             | ....unless we're talking about the laws of physics...
             | ...that's different...
        
             | throw0101c wrote:
             | > _So it is explicitly mentioned in the RFC as optional_
             | [...]
             | 
             | The use of rand_a for _extra_ monotonicity is optional. The
             | monotonicity _itself_ is not optional.
             | 
              | SS5.7 states:
              | 
              |     Alternatively, implementations MAY fill the 74 bits,
              |     jointly, with a combination of the following subfields,
              |     in this order from the most significant bits to the
              |     least, to guarantee additional monotonicity within a
              |     millisecond:
             | 
             | Guaranteeing _additional_ monotonicity means that there is
             | _already_ a  'base' level of monotonicity, and there are
              | provisions for _even more_ ("additional") levels of it.
              | This 'base level' is why SS6.2 states:
              | 
              |     Monotonicity (each subsequent value being greater than
              |     the last) is the backbone of time-based sortable UUIDs.
              |     Normally, time-based UUIDs from this document will be
              |     monotonic due to an embedded timestamp; however,
              |     implementations can guarantee additional monotonicity
              |     via the concepts covered in this section.
             | 
             | "Backbone of time-based sortable UUIDs"; "additional
             | monotonicity". Additional: adding to what's already there.
             | 
             | * https://datatracker.ietf.org/doc/html/rfc9562
        
               | reshlo wrote:
               | > Normally, time-based UUIDs from this document will be
               | monotonic due to an embedded timestamp; however,
               | implementations can guarantee additional monotonicity via
               | the concepts covered in this section.
               | 
               | "Normally, I am at home because I do not have a reason to
               | go out; however, sometimes I am at home because I am
               | sleeping."
               | 
               | Notice how this statement does not actually mean that I
               | am always at home.
        
               | Dylan16807 wrote:
               | "this monotonicity" that OP suggests people not use is
               | specifically the _additional_ monotonicity.
               | 
               | Or to put it another way: OP is suggesting you don't
               | depend on it being properly monotonic, because the
               | default is that it is only _partially_ monotonic.
        
             | btown wrote:
          | I was recently bitten during a Postgres upgrade by the
          | Postgres team considering statements like `select 1 group by
          | true` fine to silently break in Postgres 15. See
             | https://postgrespro.com/list/thread-id/2661353 - and this
             | behavior remains undocumented in
             | https://www.postgresql.org/docs/release/ . It's an
             | absolutely incredible project, and I don't disagree with
             | the decision to classify it as wontfix - but it's an
             | anecdote to not rely on undefined behavior!
        
         | braiamp wrote:
          | I don't think most people will heed this warning. I warned
          | people in a programming forum that Python's ordering of dict
          | entries by insertion time was an implementation detail, because
          | it's not guaranteed by any PEP [0]. I could literally write a
          | PEP-compliant Python interpreter and it could blow up someone's
          | code because they rely on CPython interpreter behavior.
         | 
         | [0]: https://mail.python.org/pipermail/python-
         | dev/2017-December/1...
        
           | kstrauser wrote:
            | That definitely _was_ true, and I used to jitter my code a
            | little to deliberately find and break tests that depended on
            | any particular ordering.
           | 
           | It's now explicitly documented to be true, and you can
           | officially rely on it. From
           | https://docs.python.org/3/library/stdtypes.html#dict:
           | 
           | > Changed in version 3.7: Dictionary order is guaranteed to
           | be insertion order.
           | 
           | That link documents the Python language's semantics, not the
           | behavior of any particular interpreter.
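A quick demonstration of the guarantee (Python 3.7+):

```python
d = {}
for key in ["banana", "apple", "cherry"]:
    d[key] = len(key)

# Iteration follows insertion order, not alphabetical or hash order:
assert list(d) == ["banana", "apple", "cherry"]
```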
        
           | dragonwriter wrote:
           | > I warned people in a programming forum that Python ordering
           | of objects by insertion time was a implementation detail,
           | because it's not guaranteed by any PEP
           | 
           | PEPs do not provide a spec for Python, they neither cover the
           | initial base language before the PEP process started, nor
           | were all subsequent language changes made through PEPs. The
           | closest thing Python has to a cross-implementation standard
           | is the Python Language Reference for a particular version,
           | treating as excluded anything explicitly noted as a CPython
           | implementation detail. Dictionaries being insertion-ordered
           | went from a CPython implementation detail in 3.6 to
           | guaranteed language feature in 3.7+.
        
         | sedatk wrote:
         | As a counter-argument, it will inevitably turn into a spec if
         | it becomes widely-used enough.
         | 
         | What was that saying, like: "every behavior of software
         | eventually becomes API"
        
           | tomstuart wrote:
           | https://www.hyrumslaw.com/
        
             | sedatk wrote:
             | Yes, that one! Thanks :)
        
           | the8472 wrote:
           | Consider the incentives you're setting up there. An API
           | contract goes both ways, the vendor promises some things and
           | not others to preserve flexibility, and the user has to abide
           | by it to not get broken in the future. If you unilaterally
           | ignore the contract, even plan to do so in advance, then
            | eventually the kindness and capacity to accommodate such
            | abuse might run out, and they may switch to an adversarial
            | stance. See QUIC, for example, which is a big middle finger
            | to middle boxes.
        
             | sedatk wrote:
             | Sure, there is a risk. But, it all depends on how great and
             | desirable the benefits are.
        
         | deadbabe wrote:
         | Most code does not live for a long time. Similar to how
         | consumer products are built for planned obsolescence, code is
         | also built with a specific lifespan in mind.
         | 
         | If you spend time making code bulletproof so it can run for
         | like 100 years, you will have wasted a lot of effort for
         | nothing when someone comes along and wipes it clean and
         | replaces it with new code in 2 years. Requirements change, code
         | changes, it's the nature of business.
         | 
          | Remember: any fool can build a bridge that stands; it takes an
          | engineer to make a bridge that barely stands.
        
           | Pxtl wrote:
           | Uh, more people work on 20-year-old codebases than you'd
           | think.
        
             | 9dev wrote:
              | And yet these people are dwarfed by the number of
              | developers crunching out generic line-of-business CRUD apps
              | every day.
        
           | agilob wrote:
           | >Most code does not live for a long time.
           | 
            | Sure, and here I am at a third company doing a cloud
            | migration and changing our default DB from MySQL to SQL
            | Server. The pain is real: a 2-year roadmap is now 5 years
            | longer. All because some dude negotiated a discount on cloud
            | services. And we still develop integrations that talk to
            | systems written for DOS.
        
       | fngjdflmdflg wrote:
       | >The Postgres patch solves the problem by repurposing 12 bits of
       | the UUID's random component to increase the precision of the
       | timestamp down to nanosecond granularity [...]
       | 
       | >It makes a repeated UUID between processes more likely, but
       | there's still 62 bits of randomness left to make use of, so
       | collisions remain vastly unlikely.
       | 
       | Does it? Even though the number of random bits has decreased, the
       | time interval to create such a duplicate has also decreased,
       | namely to an interval of one nanosecond.
        
         | paulddraper wrote:
         | Depends if you think sub-millisecond locality is significant.
        
         | londons_explore wrote:
         | I could imagine that certain nanoseconds might be vastly more
         | likely than other nanoseconds.
         | 
         | For example, imagine you have a router that sends network
         | packets out at the start of each microsecond, synced to wall
         | time.
         | 
         | Or the OS scheduler always wakes processes up on a millisecond
         | timer tick or some polling loop.
         | 
         | Now, when those packets are received by a postgres server and
         | processed, the time to do that is probably fairly consistent -
         | meaning that X nanoseconds past the microsecond you probably
         | get most records being created.
        
           | UltraSane wrote:
           | But only one nanosecond slower or faster and you get another
           | set of 4.611 billion billion random IDs. I think random
           | variations in buffer depths and CPU speeds will easily
           | introduce hundreds of nanoseconds of timing variations.
            | Syncing any two things to less than 1 nanosecond is
            | incredibly hard and doesn't happen by accident.
        
             | zamadatix wrote:
             | The important part is the events in time aren't going to be
             | as random as the actual random source. The chances of an
             | actual collision remain low but the distribution of events
             | over time is a weaker (in relative terms) source of random
             | bits compared to proper "random" sources which won't have
             | obvious bias at all.
        
               | UltraSane wrote:
               | I am sure there is bias but 1 nanosecond is an incredibly
               | narrow window. It really would be an interesting
               | experiment to evaluate the optimal balance of bits for
               | timestamp and for random value. What about hostname and
               | even process ID? Snowflake IDs are 63 bits long with 41
               | bits as a millisecond timestamp, 10 bits as a machine ID,
               | and 12 bits as a sequential counter.
        
               | WorldMaker wrote:
               | Similarly for direct comparison, ULID has 48-bit
               | timestamps, also at the millisecond, and 80 random bits.
               | 
                | Also to compare: the ULID spec's technique for
                | monotonicity is to take a single random value and then
                | start incrementing the lowest bits, trading random
                | entropy for direct "nearness", one after another. The
                | rand_a approach, by contrast, effectively uses the most
                | significant bits, keeping more random entropy.
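A sketch of that ULID rule (simplified; the spec also mandates failing on counter overflow, which this ignores):

```python
import os

_state = {"ms": -1, "rand": 0}

def ulid_rand80(now_ms: int) -> int:
    """ULID-spec monotonicity, simplified: within the same millisecond,
    reuse the previous 80-bit random component incremented by one; on a
    new millisecond, draw fresh randomness."""
    if now_ms == _state["ms"]:
        _state["rand"] += 1
    else:
        _state["ms"] = now_ms
        _state["rand"] = int.from_bytes(os.urandom(10), "big")
    return _state["rand"]
```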
        
               | zamadatix wrote:
               | I suppose that would depend entirely on how you measure
               | what optimal is. Optimal randomness is 128 bits from the
               | best random source and 0 bits from anything else, like
               | time. Optimal "just random enough for my use case but no
               | more so I can fit other information in the value" depends
               | entirely on the requirement of your use case (more
               | specifically, not just "for databases" but "for my
               | database to... on the hardware... in which the access
               | is... on the presumed growth..." and so on). For picking
               | a "good enough" value 12 bits is probably as reasonable
               | as one will find generic reason for.
        
             | mlyle wrote:
             | We're not talking about nanoseconds of real time; we're
             | talking about nanoseconds as measured by the CPU doing the
             | processing. Nanoseconds are not likely to be a uniform
             | variate.
        
               | UltraSane wrote:
                | Yes, and they are also not likely to be so non-uniform
                | that more than 4.611 billion billion events all happen in
                | one nanosecond.
        
               | mlyle wrote:
               | Note it's not that number, but roughly the square root of
               | that number, that matters.
               | 
               | And they might be quite non-uniform. If the scheduler
               | tick and the nanosecond clock are synchronous, you could
               | end up with a few thousand popular values instead of a
               | billion.
               | 
               | It's not a real concern today, and probably won't be a
               | real concern in 10 years, but it's not so far removed
               | from possibility that no one has to think about it.
        
               | UltraSane wrote:
               | Good point about the square root of the random part. I
               | guess that is why the 63 bit Snowflake ID uses a
               | sequential counter.
        
         | michaelt wrote:
         | Imagine if you were generating 16 UUIDs per nanosecond, every
         | nanosecond.
         | 
          | According to [1], due to the birthday paradox, the probability
          | of a collision in any given nanosecond would be 3E-17, which of
          | course sounds pretty low.
         | 
         | But there are 3.154e+16 nanoseconds in a year - and if you get
         | out your high-precision calculator, it'll tell you there's a
         | 61.41% chance of a collision in a year.
         | 
         | Of course you might very well say "Who needs 16 UUIDs per
         | nanosecond anyway?"
         | 
         | [1] https://www.bdayprob.com/
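The birthday math above can be reproduced directly. A sketch, assuming only rand_b's 62 bits are random (rand_a presumed spent on sub-millisecond time), which is the assumption that reproduces the quoted 3E-17:

```python
import math

RANDOM_BITS = 62            # rand_b only; rand_a assumed used for sub-ms time
IDS_PER_NS = 16
NS_PER_YEAR = 3.154e16

# Birthday approximation: P(collision among k draws from n values) ~ C(k,2)/n
p_ns = math.comb(IDS_PER_NS, 2) / 2**RANDOM_BITS   # ~2.6e-17, rounds to 3E-17

# Chance of at least one colliding nanosecond over a year of independent trials
p_year = 1 - math.exp(-p_ns * NS_PER_YEAR)

print(f"per-ns: {p_ns:.1e}  per-year: {p_year:.0%}")
```

With the unrounded per-nanosecond probability the yearly figure comes out near 56%; rounding up to 3E-17 before compounding is what yields the 61% quoted above.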
        
           | Horffupolde wrote:
           | So what if there's a collision? If the column is UNIQUE at
           | most it'll ROLLBACK on INSERT. 16 INSERTS per nanosecond is
           | 16 billion TPS. At that scale you'll have other problems.
        
       | dotdi wrote:
       | My org has been using ULID[0] extensively for a few years, and
        | generally we've been quite happy with it. After initially
        | dabbling with a few implementations, I reimplemented the spec
        | in Kotlin,
       | and this has been working out quite well for us. We will open-
       | source our implementation in the following weeks.
       | 
        | Unlike the UUIDv7 RFC, ULID specifically requires generated IDs
        | to be monotonically increasing, which is a big deal IMHO.
       | 
       | [0]: https://github.com/ulid/spec
        
         | willvarfar wrote:
         | Having used a lot of the ULID variants that the UUIDv7 spec
         | cites as prior art, including the ULID spec you link to, I've
         | gotta say that UUIDv7 has some real advantages.
         | 
         | The biggest advantage is that it is hex. Haven't yet met a
         | database system that doesn't have functions for substr and
         | from_hex etc, meaning you can extract the time part using
         | vanilla sql.
         | 
         | ULID and others that use custom variants of base32 or base62 or
         | whatever are just about impossible to wrangle with normal
         | tooling.
         | 
          | Your future selves will thank you for being able to manipulate
         | it in whatever database you use in the future to analyse old
         | logs or import whatever data you generate today.
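Since a UUIDv7's first 48 bits are a big-endian Unix timestamp in milliseconds (RFC 9562), the time part falls out of a plain hex substring anywhere hex tooling exists. A Python sketch of the same trick:

```python
import datetime
import uuid

def uuid7_timestamp(u: uuid.UUID) -> datetime.datetime:
    """Extract the embedded timestamp from a UUIDv7: the first 12 hex
    digits (48 bits) are milliseconds since the Unix epoch."""
    ms = int(u.hex[:12], 16)
    return datetime.datetime.fromtimestamp(ms / 1000, tz=datetime.timezone.utc)

# The example UUIDv7 from RFC 9562; its timestamp field is
# 0x017F22E279B0 ms = 2022-02-22T19:22:22Z.
u = uuid.UUID("017f22e2-79b0-7cc3-98c4-dc0c0c07398f")
print(uuid7_timestamp(u))  # 2022-02-22 19:22:22+00:00
```

The SQL equivalent is the substr-plus-from_hex incantation described above, which custom base32/base62 encodings don't allow.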
        
           | mixmastamyk wrote:
           | Aren't they stored as 16 bytes in binary? How to format it
           | later as text is then your choice.
        
             | WorldMaker wrote:
             | It's that eternal push/pull "war" between "we need a sproc
             | that can report this directly from the SQL server" and
             | "please don't do things directly on the SQL server because
             | you'll route around important application code" and "it's a
             | feature not a bug that you can't just look things up by ID
             | in the DB without a little bit of extra work".
             | 
             | I did work on a project using ULIDs in SQL Server. They
             | were stored in uniqueidentifier fields with a complex byte
             | swap from ULID to fake-UUID to get better storage/indexing
             | out of SQL Server [1]. There was an attempt to use SQL
             | Functions to display/search the ULID form directly in the
              | database, but it was never as bug-free as the C# byte-order
              | code, so doing it directly in the DB was definitely not
              | recommended; if a "report" was missing, it should be part of
              | the application (which was already almost nothing but a
              | bloated "reporting" tool) or a related "configuration"
              | application. It did feel more like a feature than a bug,
              | because it kept some meddling and drama out of the DB. I
              | also see the arguments that in some other types of
              | applications this makes debugging a lot harder; those
              | arguments make sense and it is definitely a trade-off to
              | consider.
             | 
             | [1] The rabbit hole into SQL Server's ancient weird UUID
             | sort order: https://devblogs.microsoft.com/oldnewthing/2019
             | 0426-00/?p=10...
        
           | sedatk wrote:
           | Additionally, v7 UUIDs can be generated simultaneously on the
           | client-side by multiple threads without waiting for an oracle
           | to release the next available ID. That's quite important for
           | parallel processing. Otherwise, you might as well use an
           | autoincrement BIGINT.
        
         | sedatk wrote:
         | ULID guarantees monotonicity only per process, and it requires
         | ID generation to be serialized. I find the promise quite
         | misleading because of that. You might as well use a wide-enough
         | integer with the current timestamp + random as baseline for the
         | same purpose, but I wouldn't recommend that either.
        
       | lordofgibbons wrote:
       | What benefit does this have over something like Twitter's
       | Snowflake, which can be used to generate distributed
       | monotonically increasing IDs without synchronization?
       | 
       | We've been using an implementation of it in Go for many years in
       | production without issues.
        
         | WorldMaker wrote:
         | UUIDv7 interoperates with all the other versions of UUID. The
         | v7 support in Postgres doesn't add a new column type, it makes
         | the existing column type more powerful/capable. Applications
         | that had been using UUIDv4 everywhere can get cheap Snowflake-
         | like benefits in existing code just from switching the
         | generator function. Most languages have a GUID or UUID
         | class/struct that is compatibly upgradable from v4 to v7, too.
        
       | pphysch wrote:
       | The naming of "rand_a" and "rand_b" in the spec is a bit
       | misleading. They don't have to be generated randomly. I'm sure
       | there's a historical reason for it.
       | 
       | "extra_" or "distinct_" would be a more accurate prefix for
       | UUIDv7.
       | 
       | UUIDv7 is actually quite a flexible standard due to these two
       | underspecified fields. I'm glad Postgres took advantage of that!
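Concretely, the two underspecified fields sit in the layout like this (field names follow RFC 9562; the helper is a sketch of mine):

```python
import uuid

def uuid7_fields(u: uuid.UUID) -> dict:
    """Split a UUIDv7 into the fields named by RFC 9562."""
    n = u.int
    return {
        "unix_ts_ms": n >> 80,             # 48-bit millisecond timestamp
        "ver":        (n >> 76) & 0xF,     # always 7
        "rand_a":     (n >> 64) & 0xFFF,   # 12 bits, free for sub-ms time/counter
        "var":        (n >> 62) & 0x3,     # always 0b10
        "rand_b":     n & ((1 << 62) - 1), # 62 bits, free for counter/randomness
    }

f = uuid7_fields(uuid.UUID("017f22e2-79b0-7cc3-98c4-dc0c0c07398f"))
print(f["ver"], hex(f["rand_a"]))  # 7 0xcc3
```

Postgres's patch exploits exactly this freedom, filling rand_a with extra timestamp precision rather than randomness.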
        
       | mmerickel wrote:
       | Remember even if timestamps may be generated using a
       | monotonically increasing value that does not mean they were
       | committed in the same order to the database. It is an entirely
       | separate problem if you are trying to actually determine what
       | rows are "new" versus "previously seen" for things like cursor-
       | based APIs and background job processing. This problem exists
       | even with things like a serial/autoincrement primary key.
        
         | shalabhc wrote:
         | +1
         | 
         | What could be useful here is if postgres provided a way to
         | determine the latest frozen uuid. This could be a few ms behind
         | the last committed uuid but should guarantee that no new rows
          | will land before the frozen uuid. Then we can use a single
          | cursor to track previously seen rows.
        
       | willvarfar wrote:
       | Ordering for UUIDv7s in the same millisecond is super useful when
       | some rows represent actions and others reactions.
       | 
       | I have used this guarantee for events generated on clients. It
       | really simplifies a lot of reasoning.
        
       | Glyptodon wrote:
       | On one hand I too am looking forward to more widespread use of
       | UUIDv7, but on the other I don't really get the problem this is
       | solving for their spec. If you care about timestamp ordering I
       | don't think doing it in a way that forces you to fake a PK if you
       | insert an earlier dated record at a future point makes sense. But
       | I guess I'm implicitly assuming that human meaningful dates
       | differ from insertion times in many domains.
        
       | samatman wrote:
       | I maintain that people are too eager to use UUIDv7 to begin with.
       | It's a dessert topping and a floor wax.
       | 
       | Let's say you need an opaque unique handle, and a timestamp, and
       | a monotonically increasing row ID. Common enough. Do they have to
       | be the same thing? _Should_ they be the same thing? Because to me
       | that sounds like three things: an autoincrementing primary key, a
       | UUIDv4, and a nanosecond timestamp.
       | 
       | Is it always ok that the 'opaque' unique ID isn't opaque at all,
       | that it's carrying around a timestamp? Will that allow
       | correlating things which maybe you didn't want hostiles to
       | correlate? Are you 100% sure that you'll never want, or need, to
       | re-timestamp data without changing its global ID?
       | 
       | Maybe you do need these things unnormalized and conflated. Do you
       | though? At least ask the question.
        
         | user3939382 wrote:
         | Re-timestamp would be a new one for me. What's a conceivable
         | use case? An NTP fault?
        
       | scrollaway wrote:
       | UUID7 is excellent.
       | 
       | I want to share a django library I wrote a little while back
       | which allows for prefixed identity fields, in the same style as
       | Stripe's ID fields (obj_XXXXXXXXX):
       | 
       | https://github.com/jleclanche/django-prefixed-identity-field...
       | 
       | This gives a PrefixedIdentityField(prefix="obj_"), which is
       | backed by uuid7 and base58. In the database, the IDs are stored
       | as UUIDs, which makes them an efficient field -- they are
       | transformed into prefixed IDs when coming out of the database,
       | which makes them perfect for APIs.
       | 
       | (I know, no documentation .. if someone wants to use this, feel
       | free to file issues to ask questions, I'd love to help)
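The store-as-UUID, render-as-prefixed-string round trip can be sketched in a few lines (this is an illustration of the idea, not the library's exact encoding; the helper names are mine):

```python
import uuid

# Bitcoin-style base58 alphabet: no 0, O, I, or l
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def to_prefixed(prefix: str, u: uuid.UUID) -> str:
    """Encode a UUID as prefix + base58 of its 128-bit integer."""
    n = u.int
    digits = ""
    while n:
        n, r = divmod(n, 58)
        digits = ALPHABET[r] + digits
    return prefix + (digits or ALPHABET[0])

def from_prefixed(prefix: str, s: str) -> uuid.UUID:
    """Decode a prefixed base58 string back to the stored UUID."""
    assert s.startswith(prefix)
    n = 0
    for ch in s[len(prefix):]:
        n = n * 58 + ALPHABET.index(ch)
    return uuid.UUID(int=n)

u = uuid.uuid4()
s = to_prefixed("obj_", u)
assert from_prefixed("obj_", s) == u
```

The database only ever sees the 16-byte UUID; the prefix and base58 form exist purely at the API boundary.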
        
       | nikisweeting wrote:
       | I implemented this in pure Python a few days ago in case anyone
       | finds it helpful, here it is:
       | https://gist.github.com/pirate/7e44387c12f434a77072d50c52a3d...
       | 
       | My implementation supports graceful degradation between
       | nanosecond scale resolution, microsecond, and millisecond, by
       | using 12 bits for each and filling up the leftmost bits of rand_a
       | and rand_b. Not all environments provide high resolution system
        | clocks with no drift, so it's important to maintain
       | monotonicity when generating IDs with a low-res timestamp as
       | input. You still want the bits that would've held the nanosecond
       | value to be monotonic.
       | 
        | Neither of the existing uuid_utils and uuid7 Python libs that
        | can generate UUIDv7s supports this monotonicity property.
       | 
       | Am planning on using this for ArchiveBox append-only "snapshot"
       | records, which are intrinsically linked to time, so it's a good
       | use-case imo.
       | 
       | There's another great resource here that I think is one of the
       | best explainers of UUIDv7: https://antonz.org/uuidv7/
       | 
       | Whatever you do, don't implement the cursed 36-bit whole-second
       | based time UUIDv7 variant that you occasionally see on
       | StackOverflow / blog posts, stick to 48!
        
       | Dylan16807 wrote:
       | > The Postgres patch solves the problem by repurposing 12 bits of
       | the UUID's random component to increase the precision of the
       | timestamp down to nanosecond granularity (filling rand_a above),
       | which in practice is too precise to contain two UUIDv7s generated
       | in the same process.
       | 
       | A millisecond divided by 4096 is not a nanosecond. It's about 250
       | nanoseconds.
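The arithmetic is easy to check: splitting a millisecond across 2^12 values gives roughly quarter-microsecond buckets, not nanoseconds:

```python
ns_per_ms = 1_000_000
slots = 2 ** 12                  # the 12 repurposed rand_a bits
print(ns_per_ms / slots)         # 244.140625 ns per slot
```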
        
       ___________________________________________________________________
       (page generated 2025-01-02 23:00 UTC)