[HN Gopher] Postgres UUIDv7 and per-backend monotonicity
___________________________________________________________________
Postgres UUIDv7 and per-backend monotonicity
Author : craigkerstiens
Score : 145 points
Date : 2025-01-02 16:32 UTC (6 hours ago)
(HTM) web link (brandur.org)
(TXT) w3m dump (brandur.org)
| urronglol wrote:
| What is a v7 UUID? And why do we need more than (1) a UUID
| from a random seed and (2) one derived from that and a
| timestamp (orderable)?
| n2d4 wrote:
| UUID v7 is the latter, whereas v4 is the former.
|
| All the other versions are somewhat legacy, and you shouldn't
| use them in new systems (besides v8, which is a "custom
| format UUID", if you need that).
| elehack wrote:
| UUID v5 is quite useful if you want to deterministically
| convert external identifiers into UUIDs -- define a namespace
| UUID for each potential identifier source (to keep them
| separate), then use that to derive a v5 UUID from the
| external identifier. It's very useful for idempotent data
| imports.
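|
| e.g., with Python's stdlib (the namespace name and IDs here
| are made up):
|
|     import uuid
|
|     # One fixed namespace UUID per identifier source keeps
|     # sources from colliding with each other.
|     CRM_NS = uuid.uuid5(uuid.NAMESPACE_DNS, "crm.example.com")
|
|     def import_id(external_id: str) -> uuid.UUID:
|         # Deterministic: re-running an import maps the same
|         # external record to the same UUID.
|         return uuid.uuid5(CRM_NS, external_id)
|
|     assert import_id("cust-42") == import_id("cust-42")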
| jandrewrogers wrote:
| Both UUIDv3 and UUIDv5 are prohibited for some use cases in
| some countries (including the US), which is something to be
| aware of. Unfortunately, no one has created an updated
| standard UUID that uses a hash function that is not broken.
| While useful, they are not always an option.
| cube2222 wrote:
| UUID v7 is what you numbered #2.
|
| For the others, it's best to read up on Wikipedia[0]. I believe
| they all have their unique use-cases and tradeoffs.
|
| E.g. including information about which node of the system
| generated an ID.
|
| [0]:
| https://en.m.wikipedia.org/wiki/Universally_unique_identifie...
| mind-blight wrote:
| A deterministic UUID based on a hash is also very useful
| (UUIDv5). I've used that for deduping records from
| multiple sources.
| chimpontherun wrote:
| As is usual in many areas of human endeavor, newcomers to
| the field tend to criticize design decisions that were made
| before them, only to re-invent what was already invented.
|
| Sometimes it leads to improvements in the field, via
| rejection of the accumulated legacy crud, or simply by
| affording a new perspective. Most other times it's
| well-intentioned but low-effort noise.
|
| I do it myself, personally. This is how I learn.
| urronglol wrote:
| And yet again. A perfectly reasonable question downvoted for no
| reason. God I hate this site.
| purerandomness wrote:
| It comes off as a low-effort question that tries to evoke a
| reply from a peer, when it's the kind of question that is
| best answered by an LLM, Google, or Wikipedia.
| a3w wrote:
| Well, a LLM could answer it or write total bullshit. But
| yes, Wikipedia or other research quick internet research
| will help generally. And excactly here, it will tell you
| that there are competing standards since we have competing
| use cases.
| kraftman wrote:
| For simple questions like this with unambiguous answers,
| it is statistically very unlikely that you'll get a
| bullshit answer from an LLM.
| treve wrote:
| I didn't downvote you, but the terseness made it
| immediately come off to me as a kind of criticism, e.g.:
| "Why would we ever need it?" That may not have been your
| intent, but if it was a genuine question, form matters.
| urronglol wrote:
| Be kind. Don't be snarky. Converse curiously; don't cross-
| examine. Edit out swipes.
|
| Comments should get more thoughtful and substantive, not
| less, as a topic gets more divisive.
|
| When disagreeing, please reply to the argument instead of
| calling names. "That is idiotic; 1 + 1 is 2, not 3" can be
| shortened to "1 + 1 is 2, not 3."
|
| Please don't fulminate. Please don't sneer, including at
| the rest of the community.
|
| Please respond to the strongest plausible interpretation of
| what someone says, not a weaker one that's easier to
| criticize. Assume good faith.
| mtmail wrote:
| From the same guidelines "Please don't comment about the
| voting on comments. It never does any good, and it makes
| boring reading." treve gave insight into their thought
| process when they read your initial comment, and I had
| the same reaction. Neither treve nor I downvoted it.
| kingkilr wrote:
| I would strongly implore people not to follow the example this
| post suggests, and write code that relies on this monotonicity.
|
| The reason for this is simple: the documentation doesn't promise
| this property. Moreover, even if it did, the RFC for UUIDv7
| doesn't promise this property. If you decide to depend on it,
| you're setting yourself up for a bad time when PostgreSQL decides
| to change their implementation strategy, or you move to a
| different database.
|
| Further, the stated motivations for this, to slightly simplify
| testing code, are massively under-motivating. Saving a single
| line of code can hardly be said to be worth it, but even if it
| were, this is a problem far better solved by simply writing a
| function that will both generate the objects and sort them.
|
| As a profession, I strongly feel we need to do a better job
| orienting ourselves to the reality that our code has a tendency
| to live for a long time, and we need to optimize not for "how
| quickly can I type it", but "what will this code cost over its
| lifetime".
| 3eb7988a1663 wrote:
| I too am missing the win on this. It is breaking the spec,
| and does not seem to offer a significant advantage. In the
| eventual case where you have a collection of UUIDv7s, you
| are only ever going to be able to rely on the millisecond
| precision anyway.
| sbuttgereit wrote:
| You say it's breaking the spec, but is it?
|
| From https://www.rfc-editor.org/rfc/rfc9562.html#name-uuid-
| versio...:
|
| "UUIDv7 values are created by allocating a Unix timestamp in
| milliseconds in the most significant 48 bits and filling the
| remaining 74 bits, excluding the required version and variant
| bits, with random bits for each new UUIDv7 generated to
| provide uniqueness as per Section 6.9. Alternatively,
| implementations MAY fill the 74 bits, jointly, with a
| combination of the following subfields, in this order from
| the most significant bits to the least, to guarantee
| additional monotonicity within a millisecond:
|     1. An OPTIONAL sub-millisecond timestamp fraction (12
|        bits at maximum) as per Section 6.2 (Method 3).
|     2. An OPTIONAL carefully seeded counter as per Section
|        6.2 (Method 1 or 2).
|     3. Random data for each new UUIDv7 generated for any
|        remaining space."
|
| The referenced "Method 3" is:
|
| "Replace Leftmost Random Bits with Increased Clock Precision
| (Method 3):
|
| For UUIDv7, which has millisecond timestamp precision, it is
| possible to use additional clock precision available on the
| system to substitute for up to 12 random bits immediately
| following the timestamp. This can provide values that are
| time ordered with sub-millisecond precision, using however
| many bits are appropriate in the implementation environment.
| With this method, the additional time precision bits MUST
| follow the timestamp as the next available bit in the rand_a
| field for UUIDv7."
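|
| For illustration, Method 3 boils down to something like this
| Python sketch (mine, not the Postgres C code; it assumes
| time.time_ns() has sub-millisecond resolution):
|
|     import os, time, uuid
|
|     def uuidv7_method3() -> uuid.UUID:
|         ns = time.time_ns()
|         ms = ns // 1_000_000
|         # Method 3: the 12 rand_a bits carry the
|         # sub-millisecond fraction (0..4095) instead of
|         # random data.
|         frac = ((ns % 1_000_000) << 12) // 1_000_000
|         value = ms << 80               # 48-bit ms timestamp
|         value |= 0x7 << 76             # version 7
|         value |= frac << 64            # rand_a = extra precision
|         rand_b = int.from_bytes(os.urandom(8), "big")
|         # variant 10 + 62 random bits
|         value |= (0x2 << 62) | (rand_b & ((1 << 62) - 1))
|         return uuid.UUID(int=value)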
| throw0101c wrote:
| > _It is breaking the spec_ [...]
|
| As per a sibling comment, it is not breaking the spec. The
| comment in the Pg code even cites the spec that says what to
| do (and is quoted in the post):
|
|     * Generate UUID version 7 per RFC 9562, with the given
|     * timestamp.
|     *
|     * UUID version 7 consists of a Unix timestamp in
|     * milliseconds (48 bits) and 74 random bits, excluding
|     * the required version and variant bits. To ensure
|     * monotonicity in scenarios of high-frequency UUID
|     * generation, we employ the method "Replace Leftmost
|     * Random Bits with Increased Clock Precision (Method 3)",
|     * described in the RFC. [...]
| peterldowns wrote:
| The test should do a set comparison, not an ordered list
| comparison, if it wants to check that the same 5 accounts were
| returned by the API. I think it's as simple as that.
|
| The blogpost is interesting and I appreciated learning the
| details of how the UUIDv7 implementation works.
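|
| Something like this hypothetical pytest sketch, where api
| and created_accounts are invented fixtures:
|
|     def test_list_accounts(api, created_accounts):
|         returned = api.list_accounts()
|         # Compare membership, not order: no dependence on
|         # any ID-ordering guarantee.
|         assert {a.id for a in returned} == \
|                {a.id for a in created_accounts}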
| vips7L wrote:
| Don't you think that depends on what you're guaranteeing in
| your api? If you're guaranteeing that your api returns the
| accounts ordered you need to test for that. But I do agree in
| general that using a set is the correct move.
| Too wrote:
| The test is a very strange example indeed. Is it testing
| the backend, the database or both? If the api was
| guaranteeing ordered values, pre-uuid7 the backend must
| have sorted them by other means before returning, making
| the test identical. If the backend is not guaranteeing
| order, that shouldn't be tested either.
| paulddraper wrote:
| > Moreover, _even if it did,_ the RFC for UUIDv7 doesn't
| promise this property.
|
| Huh?
|
| If the docs were to guarantee it, they guarantee it. Why are
| you looking for everything to be part of RFC UUIDv7?
|
| Failure of logic.
| fwip wrote:
| Their next sentence explains. Other databases might not make
| that guarantee, including future versions of Postgres.
| throw0101c wrote:
| > [...] _code that relies on this monotonicity._ _The reason
| for this is simple: the documentation doesn't promise this
| property. Moreover, even if it did, the RFC for UUIDv7
| doesn't promise this property._
|
| The "RFC for UUIDv7", RFC 9562, explicitly mentions
| monotonicity in SS6.2 ("Monotonicity and Counters"):
| Monotonicity (each subsequent value being greater than the
| last) is the backbone of time-based sortable UUIDs.
| Normally, time-based UUIDs from this document will be
| monotonic due to an embedded timestamp; however,
| implementations can guarantee additional monotonicity via
| the concepts covered in this section.
|
| * https://datatracker.ietf.org/doc/html/rfc9562#name-
| monotonic...
|
| In the UUIDv7 definition (§5.7) it explicitly mentions the
| technique that Postgres employs for _rand_a_:
|
|     rand_a: 12 bits of pseudorandom data to provide
|     uniqueness as per Section 6.9 and/or optional
|     constructs to guarantee additional monotonicity as
|     per Section 6.2. Occupies bits 52 through 63
|     (octets 6-7).
|
| * https://datatracker.ietf.org/doc/html/rfc9562#name-uuid-
| vers...
|
| Note: "optional constructs to guarantee additional
| monotonicity". Pg makes use of that option.
| stonemetal12 wrote:
| >explicitly mentions monotonicity
|
| >optional constructs
|
| So it is explicitly mentioned in the RFC as optional, and Pg
| doesn't state that they guarantee that option. The point
| still stands: depending on optional behavior is a recipe for
| failure when the option is no longer taken.
| idconvict wrote:
| The "optional" portion is this part of the spec, not the
| time part.
|
| > implementations can guarantee additional monotonicity via
| the concepts covered in this section
| arghwhat wrote:
| Relying on an explicitly documented implementation behavior
| that the specification explicitly describes as an option is
| not an issue. Especially if the behavior is only relied on
| in a test, where the worst outcome is a failed testcase
| that is easily fixed.
|
| Even if the behavior went away, UUIDs unlike serials can
| always be safely generated directly by the application just
| as well as they can be generated by the database.
|
| Going straight for that would arguably be the "better"
| path, and allows mocking the PRNG to get sequential IDs.
| mlyle wrote:
| It's mentioned in the RFC as being explicitly monotonic
| based on the time-based design.
|
| Implementations that need monotonicity beyond the
| resolution of a timestamp-- like when you allocate 30 UUIDs
| at one instant in a batch-- can optionally use those
| additional bits for that purpose.
|
| > Implementations _SHOULD_ employ the following methods for
| single-node UUID implementations that require batch UUID
| creation or are otherwise concerned about monotonicity with
| high-frequency UUID generation.
|
| (And it goes on to recommend the obvious things you'd do:
| use a counter in those bits when assigning a batch; use
| more bits of time precision; etc.)
|
| The comment in PostgreSQL before the implementation makes
| it clear that they chose the third option for this in the
| RFC:
|
|     * variant bits. To ensure monotonicity in scenarios of
|     * high-frequency UUID generation, we employ the method
|     * "Replace Leftmost Random Bits with Increased Clock
|     * Precision (Method 3)", described in the RFC. ...
| Dylan16807 wrote:
| > It's mentioned in the RFC as being explicitly monotonic
| based on the time-based design.
|
| It's explicitly partially monotonic.
|
| Or as other people would call it, "not monotonic".
|
| People are talking past each other based on their use of
| the word "monotonic".
| sbuttgereit wrote:
| Software is arbitrary. Any so-called "guarantee" is only as
| good as the developers and organizations maintaining a
| piece of software want to make it regardless of prior
| statements. At some point, the practical likelihood of a
| documented, but not guaranteed, process being violated vs.
| the willful abandonment of a guarantee start to look very
| similar.... at which point nothing saves you.
|
| Sometimes the best you can do is recognize who you're
| working with today, know how they work, and be prepared for
| those people to be different in the future (or of a
| different mind) and for things to change regardless of
| expressed guarantees.
|
| ....unless we're talking about the laws of physics...
| ...that's different...
| throw0101c wrote:
| > _So it is explicitly mentioned in the RFC as optional_
| [...]
|
| The use of rand_a for _extra_ monotonicity is optional. The
| monotonicity _itself_ is not optional.
|
| §5.7 states:
|
|     Alternatively, implementations MAY fill the 74 bits,
|     jointly, with a combination of the following subfields,
|     in this order from the most significant bits to the
|     least, to guarantee additional monotonicity within a
|     millisecond:
|
| Guaranteeing _additional_ monotonicity means that there is
| _already_ a 'base' level of monotonicity, and there are
| provisions for _even more_ ("additional") levels of it.
| This 'base level' is why §6.2 states:
|
|     Monotonicity (each subsequent value being greater than
|     the last) is the backbone of time-based sortable UUIDs.
|     Normally, time-based UUIDs from this document will be
|     monotonic due to an embedded timestamp; however,
|     implementations can guarantee additional monotonicity
|     via the concepts covered in this section.
|
| "Backbone of time-based sortable UUIDs"; "additional
| monotonicity". Additional: adding to what's already there.
|
| * https://datatracker.ietf.org/doc/html/rfc9562
| reshlo wrote:
| > Normally, time-based UUIDs from this document will be
| monotonic due to an embedded timestamp; however,
| implementations can guarantee additional monotonicity via
| the concepts covered in this section.
|
| "Normally, I am at home because I do not have a reason to
| go out; however, sometimes I am at home because I am
| sleeping."
|
| Notice how this statement does not actually mean that I
| am always at home.
| Dylan16807 wrote:
| "this monotonicity" that OP suggests people not use is
| specifically the _additional_ monotonicity.
|
| Or to put it another way: OP is suggesting you don't
| depend on it being properly monotonic, because the
| default is that it is only _partially_ monotonic.
| btown wrote:
| I was recently bitten during a Postgres upgrade: the
| Postgres team considered statements like `select 1 group by
| true` fine to silently break in Postgres 15. See
| https://postgrespro.com/list/thread-id/2661353 - and this
| behavior remains undocumented in
| https://www.postgresql.org/docs/release/ . It's an
| absolutely incredible project, and I don't disagree with
| the decision to classify it as wontfix - but it's an
| anecdote about not relying on undefined behavior!
| braiamp wrote:
| I don't think most people will heed this warning. I warned
| people in a programming forum that Python's ordering of
| objects by insertion time was an implementation detail,
| because it's not guaranteed by any PEP [0]. I could
| literally write a PEP-compliant Python interpreter, and it
| could blow up someone's code because they rely on CPython
| interpreter behavior.
|
| [0]: https://mail.python.org/pipermail/python-
| dev/2017-December/1...
| kstrauser wrote:
| That definitely _was_ true, and I used to jitter my code a
| little to deliberately find and break tests that depended on
| any particular ordering.
|
| It's now explicitly documented to be true, and you can
| officially rely on it. From
| https://docs.python.org/3/library/stdtypes.html#dict:
|
| > Changed in version 3.7: Dictionary order is guaranteed to
| be insertion order.
|
| That link documents the Python language's semantics, not the
| behavior of any particular interpreter.
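|
| E.g.:
|
|     d = {}
|     d["b"] = 1
|     d["a"] = 2
|     # Guaranteed since Python 3.7: iteration follows
|     # insertion order, not key order.
|     assert list(d) == ["b", "a"]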
| dragonwriter wrote:
| > I warned people in a programming forum that Python ordering
| of objects by insertion time was a implementation detail,
| because it's not guaranteed by any PEP
|
| PEPs do not provide a spec for Python, they neither cover the
| initial base language before the PEP process started, nor
| were all subsequent language changes made through PEPs. The
| closest thing Python has to a cross-implementation standard
| is the Python Language Reference for a particular version,
| treating as excluded anything explicitly noted as a CPython
| implementation detail. Dictionaries being insertion-ordered
| went from a CPython implementation detail in 3.6 to
| guaranteed language feature in 3.7+.
| sedatk wrote:
| As a counter-argument, it will inevitably turn into a spec if
| it becomes widely-used enough.
|
| What was that saying, like: "every behavior of software
| eventually becomes API"
| tomstuart wrote:
| https://www.hyrumslaw.com/
| sedatk wrote:
| Yes, that one! Thanks :)
| the8472 wrote:
| Consider the incentives you're setting up there. An API
| contract goes both ways: the vendor promises some things and
| not others to preserve flexibility, and the user has to
| abide by it to not get broken in the future. If you
| unilaterally ignore the contract, or even plan to do so in
| advance, then eventually the kindness and capacity to
| accommodate such abuse might run out, and they may switch to
| an adversarial stance. See QUIC, for example, which is a big
| middle finger to middle boxes.
| sedatk wrote:
| Sure, there is a risk. But, it all depends on how great and
| desirable the benefits are.
| deadbabe wrote:
| Most code does not live for a long time. Similar to how
| consumer products are built for planned obsolescence, code is
| also built with a specific lifespan in mind.
|
| If you spend time making code bulletproof so it can run for
| like 100 years, you will have wasted a lot of effort for
| nothing when someone comes along and wipes it clean and
| replaces it with new code in 2 years. Requirements change, code
| changes, it's the nature of business.
|
| Remember any fool can build a bridge that stands, it takes an
| engineer to make a bridge that barely stands.
| Pxtl wrote:
| Uh, more people work on 20-year-old codebases than you'd
| think.
| 9dev wrote:
| And yet these people are dwarfed by the number of
| developers crunching out generic line-of-business CRUD apps
| every day.
| agilob wrote:
| >Most code does not live for a long time.
|
| Sure, and here I am at a third company doing a cloud
| migration and changing our default DB from MySQL to SQL
| Server. The pain is real; the 2-year-long roadmap is now 5
| years longer. All because some dude negotiated a discount on
| cloud services. And we still develop integrations that talk
| to systems written for DOS.
| fngjdflmdflg wrote:
| >The Postgres patch solves the problem by repurposing 12 bits of
| the UUID's random component to increase the precision of the
| timestamp down to nanosecond granularity [...]
|
| >It makes a repeated UUID between processes more likely, but
| there's still 62 bits of randomness left to make use of, so
| collisions remain vastly unlikely.
|
| Does it? Even though the number of random bits has decreased, the
| time interval to create such a duplicate has also decreased,
| namely to an interval of one nanosecond.
| paulddraper wrote:
| Depends if you think sub-millisecond locality is significant.
| londons_explore wrote:
| I could imagine that certain nanoseconds might be vastly more
| likely than other nanoseconds.
|
| For example, imagine you have a router that sends network
| packets out at the start of each microsecond, synced to wall
| time.
|
| Or the OS scheduler always wakes processes up on a millisecond
| timer tick or some polling loop.
|
| Now, when those packets are received by a postgres server and
| processed, the time to do that is probably fairly consistent -
| meaning that X nanoseconds past the microsecond you probably
| get most records being created.
| UltraSane wrote:
| But only one nanosecond slower or faster and you get another
| set of 4.611 billion billion random IDs. I think random
| variations in buffer depths and CPU speeds will easily
| introduce hundreds of nanoseconds of timing variation.
| Syncing any two things to less than 1 nanosecond is
| incredibly hard and doesn't happen by accident.
| zamadatix wrote:
| The important part is the events in time aren't going to be
| as random as the actual random source. The chances of an
| actual collision remain low but the distribution of events
| over time is a weaker (in relative terms) source of random
| bits compared to proper "random" sources which won't have
| obvious bias at all.
| UltraSane wrote:
| I am sure there is bias but 1 nanosecond is an incredibly
| narrow window. It really would be an interesting
| experiment to evaluate the optimal balance of bits for
| timestamp and for random value. What about hostname and
| even process ID? Snowflake IDs are 63 bits long with 41
| bits as a millisecond timestamp, 10 bits as a machine ID,
| and 12 bits as a sequential counter.
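|
| For comparison, the classic Snowflake layout packs like this
| (a Python sketch; the per-millisecond counter management and
| clock-rollback handling are elided):
|
|     import time
|
|     EPOCH_MS = 1288834974657  # Twitter's custom epoch
|
|     def snowflake(machine_id: int, sequence: int) -> int:
|         # 41 bits of ms since epoch | 10-bit machine ID |
|         # 12-bit per-millisecond sequence counter
|         ms = int(time.time() * 1000) - EPOCH_MS
|         return ((ms << 22)
|                 | ((machine_id & 0x3FF) << 12)
|                 | (sequence & 0xFFF))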
| WorldMaker wrote:
| Similarly for direct comparison, ULID has 48-bit
| timestamps, also at the millisecond, and 80 random bits.
|
| Also to compare, the ULID spec technique for monotonicity
| is to take a single random value and then start
| incrementing the lowest bits, trading random entropy for
| direct "nearness", one after another. Versus the rand_a
| approach is effectively using the most significant bits,
| but keeping more random entropy.
| zamadatix wrote:
| I suppose that would depend entirely on how you measure
| what optimal is. Optimal randomness is 128 bits from the
| best random source and 0 bits from anything else, like
| time. Optimal "just random enough for my use case but no
| more so I can fit other information in the value" depends
| entirely on the requirement of your use case (more
| specifically, not just "for databases" but "for my
| database to... on the hardware... in which the access
| is... on the presumed growth..." and so on). For picking
| a "good enough" value 12 bits is probably as reasonable
| as one will find generic reason for.
| mlyle wrote:
| We're not talking about nanoseconds of real time; we're
| talking about nanoseconds as measured by the CPU doing the
| processing. Nanoseconds are not likely to be a uniform
| variate.
| UltraSane wrote:
| Yes, and they are also not likely to be so non-uniform
| that more than 4.611 billion billion events all happen in
| one nanosecond.
| mlyle wrote:
| Note it's not that number, but roughly the square root of
| that number, that matters.
|
| And they might be quite non-uniform. If the scheduler
| tick and the nanosecond clock are synchronous, you could
| end up with a few thousand popular values instead of a
| billion.
|
| It's not a real concern today, and probably won't be a
| real concern in 10 years, but it's not so far removed
| from possibility that no one has to think about it.
| UltraSane wrote:
| Good point about the square root of the random part. I
| guess that is why the 63 bit Snowflake ID uses a
| sequential counter.
| michaelt wrote:
| Imagine if you were generating 16 UUIDs per nanosecond, every
| nanosecond.
|
| According to [1] due to the birthday paradox, the probability
| of a collision in any given nanosecond would be 3E-17 which of
| course sounds pretty low
|
| But there are 3.154e+16 nanoseconds in a year - and if you get
| out your high-precision calculator, it'll tell you there's a
| 61.41% chance of a collision in a year.
|
| Of course you might very well say "Who needs 16 UUIDs per
| nanosecond anyway?"
|
| [1] https://www.bdayprob.com/
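|
| A back-of-envelope version of that math (the linked
| calculator's exact formula lands a bit higher, at the 3E-17
| and 61.41% figures above):
|
|     import math
|
|     bits, per_ns = 62, 16
|     ns_per_year = 3.154e16
|     # Birthday approximation: P ~= n*(n-1) / (2 * 2**bits)
|     p_ns = per_ns * (per_ns - 1) / (2 * 2**bits)
|     p_year = 1 - math.exp(-p_ns * ns_per_year)
|     print(f"{p_ns:.1e} per ns, {p_year:.0%} per year")
|     # -> 2.6e-17 per ns, 56% per year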
| Horffupolde wrote:
| So what if there's a collision? If the column is UNIQUE at
| most it'll ROLLBACK on INSERT. 16 INSERTS per nanosecond is
| 16 billion TPS. At that scale you'll have other problems.
| dotdi wrote:
| My org has been using ULID[0] extensively for a few years,
| and generally we've been quite happy with it. After
| initially dabbling with a few implementations, I
| reimplemented the spec in Kotlin, and this has been working
| out quite well for us. We will open-source our
| implementation in the coming weeks.
|
| ULID does specifically require generated IDs to be monotonically
| increasing as opposed to what the RFC for UUIDv7 states, which is
| a big deal IMHO.
|
| [0]: https://github.com/ulid/spec
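|
| For illustration, the spec's monotonicity rule boils down to
| roughly this Python sketch (Crockford base32 encoding and
| counter overflow handling omitted):
|
|     import os, time
|
|     _last = (0, 0)  # (timestamp_ms, 80-bit random component)
|
|     def ulid_monotonic() -> int:
|         # Same millisecond: increment the previous random
|         # component instead of drawing fresh bits.
|         global _last
|         ms = time.time_ns() // 1_000_000
|         if ms == _last[0]:
|             _last = (ms, _last[1] + 1)
|         else:
|             _last = (ms, int.from_bytes(os.urandom(10), "big"))
|         return (_last[0] << 80) | _last[1]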
| willvarfar wrote:
| Having used a lot of the ULID variants that the UUIDv7 spec
| cites as prior art, including the ULID spec you link to, I've
| gotta say that UUIDv7 has some real advantages.
|
| The biggest advantage is that it is hex. Haven't yet met a
| database system that doesn't have functions for substr and
| from_hex etc, meaning you can extract the time part using
| vanilla sql.
|
| ULID and others that use custom variants of base32 or base62 or
| whatever are just about impossible to wrangle with normal
| tooling.
|
| Your future selves will thank you for being able to
| manipulate it in whatever database you use in the future to
| analyze old logs or import whatever data you generate today.
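|
| E.g., the same trick outside the database (the first 12 hex
| chars of a UUIDv7 are the 48-bit millisecond timestamp):
|
|     import datetime, uuid
|
|     def uuid7_time(u: uuid.UUID) -> datetime.datetime:
|         # substr + from_hex, same as the vanilla-SQL version
|         ms = int(u.hex[:12], 16)
|         return datetime.datetime.fromtimestamp(
|             ms / 1000, tz=datetime.timezone.utc)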
| mixmastamyk wrote:
| Aren't they stored as 16 bytes in binary? How to format it
| later as text is then your choice.
| WorldMaker wrote:
| It's that eternal push/pull "war" between "we need a sproc
| that can report this directly from the SQL server" and
| "please don't do things directly on the SQL server because
| you'll route around important application code" and "it's a
| feature not a bug that you can't just look things up by ID
| in the DB without a little bit of extra work".
|
| I did work on a project using ULIDs in SQL Server. They
| were stored in uniqueidentifier fields with a complex byte
| swap from ULID to fake-UUID to get better storage/indexing
| out of SQL Server [1]. There was an attempt to use SQL
| functions to display/search the ULID form directly in the
| database, but it was never as bug-free as the C# byte-order
| code, so doing it directly in the DB was not recommended;
| if a "report" was missing, it should be part of the
| application (which was already almost nothing but a bloated
| "reporting" tool) or a related "configuration" application.
| It did feel more like a feature than a bug, because it kept
| some meddling and drama out of the DB. I also see the
| arguments that in some types of applications this makes
| debugging a lot harder; those arguments make sense, and it
| is definitely a trade-off to consider.
|
| [1] The rabbit hole into SQL Server's ancient weird UUID
| sort order: https://devblogs.microsoft.com/oldnewthing/2019
| 0426-00/?p=10...
| sedatk wrote:
| Additionally, v7 UUIDs can be generated simultaneously on the
| client-side by multiple threads without waiting for an oracle
| to release the next available ID. That's quite important for
| parallel processing. Otherwise, you might as well use an
| autoincrement BIGINT.
| sedatk wrote:
| ULID guarantees monotonicity only per process, and it requires
| ID generation to be serialized. I find the promise quite
| misleading because of that. You might as well use a wide-enough
| integer with the current timestamp + random as baseline for the
| same purpose, but I wouldn't recommend that either.
| lordofgibbons wrote:
| What benefit does this have over something like Twitter's
| Snowflake, which can be used to generate distributed
| monotonically increasing IDs without synchronization?
|
| We've been using an implementation of it in Go for many years in
| production without issues.
| WorldMaker wrote:
| UUIDv7 interoperates with all the other versions of UUID. The
| v7 support in Postgres doesn't add a new column type, it makes
| the existing column type more powerful/capable. Applications
| that had been using UUIDv4 everywhere can get cheap Snowflake-
| like benefits in existing code just from switching the
| generator function. Most languages have a GUID or UUID
| class/struct that is compatibly upgradable from v4 to v7, too.
| pphysch wrote:
| The naming of "rand_a" and "rand_b" in the spec is a bit
| misleading. They don't have to be generated randomly. I'm sure
| there's a historical reason for it.
|
| "extra_" or "distinct_" would be a more accurate prefix for
| UUIDv7.
|
| UUIDv7 is actually quite a flexible standard due to these two
| underspecified fields. I'm glad Postgres took advantage of that!
| mmerickel wrote:
| Remember: even if IDs are generated using a monotonically
| increasing value, that does not mean they were committed to
| the database in the same order. Actually determining which
| rows are "new" versus "previously seen", for things like
| cursor-based APIs and background job processing, is an
| entirely separate problem. It exists even with things like a
| serial/autoincrement primary key.
| shalabhc wrote:
| +1
|
| What could be useful here is if Postgres provided a way to
| determine the latest frozen UUID. This could be a few ms
| behind the last committed UUID but should guarantee that no
| new rows will land before the frozen UUID. Then we could use
| a single cursor to track previously seen rows.
| willvarfar wrote:
| Ordering for UUIDv7s in the same millisecond is super useful when
| some rows represent actions and others reactions.
|
| I have used this guarantee for events generated on clients. It
| really simplifies a lot of reasoning.
| Glyptodon wrote:
| On one hand I too am looking forward to more widespread use of
| UUIDv7, but on the other I don't really get the problem this is
| solving for their spec. If you care about timestamp ordering,
| I don't think doing it in a way that forces you to fake a PK
| if you insert an earlier-dated record at a future point makes
| sense. But I guess I'm implicitly assuming that
| human-meaningful dates differ from insertion times in many
| domains.
| samatman wrote:
| I maintain that people are too eager to use UUIDv7 to begin with.
| It's a dessert topping and a floor wax.
|
| Let's say you need an opaque unique handle, and a timestamp, and
| a monotonically increasing row ID. Common enough. Do they have to
| be the same thing? _Should_ they be the same thing? Because to me
| that sounds like three things: an autoincrementing primary key, a
| UUIDv4, and a nanosecond timestamp.
|
| Is it always ok that the 'opaque' unique ID isn't opaque at all,
| that it's carrying around a timestamp? Will that allow
| correlating things which maybe you didn't want hostiles to
| correlate? Are you 100% sure that you'll never want, or need, to
| re-timestamp data without changing its global ID?
|
| Maybe you do need these things unnormalized and conflated. Do you
| though? At least ask the question.
| user3939382 wrote:
| Re-timestamp would be a new one for me. What's a conceivable
| use case? An NTP fault?
| scrollaway wrote:
| UUID7 is excellent.
|
| I want to share a django library I wrote a little while back
| which allows for prefixed identity fields, in the same style as
| Stripe's ID fields (obj_XXXXXXXXX):
|
| https://github.com/jleclanche/django-prefixed-identity-field...
|
| This gives a PrefixedIdentityField(prefix="obj_"), which is
| backed by uuid7 and base58. In the database, the IDs are stored
| as UUIDs, which makes them an efficient field -- they are
| transformed into prefixed IDs when coming out of the database,
| which makes them perfect for APIs.
|
| (I know, no documentation .. if someone wants to use this, feel
| free to file issues to ask questions, I'd love to help)
| nikisweeting wrote:
| I implemented this in pure Python a few days ago in case anyone
| finds it helpful, here it is:
| https://gist.github.com/pirate/7e44387c12f434a77072d50c52a3d...
|
| My implementation supports graceful degradation between
| nanosecond scale resolution, microsecond, and millisecond, by
| using 12 bits for each and filling up the leftmost bits of rand_a
| and rand_b. Not all environments provide high-resolution
| system clocks with no drift, so it's important to maintain
| monotonicity when generating IDs with a low-res timestamp as
| input. You still want the bits that would've held the nanosecond
| value to be monotonic.
|
| Neither of the existing uuid_utils and uuid7 Python libs that
| can generate UUIDv7s supports this monotonicity property.
|
| Am planning on using this for ArchiveBox append-only "snapshot"
| records, which are intrinsically linked to time, so it's a good
| use-case imo.
|
| There's another great resource here that I think is one of the
| best explainers of UUIDv7: https://antonz.org/uuidv7/
|
| Whatever you do, don't implement the cursed 36-bit whole-second
| based time UUIDv7 variant that you occasionally see on
| StackOverflow / blog posts, stick to 48!
| Dylan16807 wrote:
| > The Postgres patch solves the problem by repurposing 12 bits of
| the UUID's random component to increase the precision of the
| timestamp down to nanosecond granularity (filling rand_a above),
| which in practice is too precise to contain two UUIDv7s generated
| in the same process.
|
| A millisecond divided by 4096 is not a nanosecond. It's about 250
| nanoseconds.
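|
| Quick check:
|
|     >>> 1_000_000 / 4096   # ns per ms, split across 2**12 steps
|     244.140625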
___________________________________________________________________
(page generated 2025-01-02 23:00 UTC)