[HN Gopher] Nanosecond timestamp collisions are common
       ___________________________________________________________________
        
       Nanosecond timestamp collisions are common
        
       Author : ingve
       Score  : 273 points
       Date   : 2023-07-21 07:01 UTC (16 hours ago)
        
 (HTM) web link (www.evanjones.ca)
 (TXT) w3m dump (www.evanjones.ca)
        
       | vbezhenar wrote:
        | If you need unique nanosecond timestamps, keep track of the
        | previously generated one and increment past it if necessary.
        | This would require a global lock or atomics, but should be
        | good enough for practical uses.
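A minimal sketch of this idea in Python (illustrative names, not from the thread; a module-level lock guards the last issued value):

```python
import threading
import time

_lock = threading.Lock()
_last_ns = 0  # last timestamp handed out

def unique_ns() -> int:
    """Return a unique, strictly increasing nanosecond timestamp.

    If two calls land in the same nanosecond (or the clock steps
    backwards), bump past the previously issued value instead.
    """
    global _last_ns
    with _lock:
        now = time.time_ns()
        if now <= _last_ns:
            now = _last_ns + 1
        _last_ns = now
        return now
```

As LeoNatan25 notes further down, the lock becomes a contention point exactly when collisions are frequent, so this trades throughput for uniqueness.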
        
         | pyrale wrote:
         | One benefit of UUID is that you don't need coordination. In
         | coordinated systems, unique IDs are a non-issue.
        
           | OJFord wrote:
           | Is there any other benefit? That's the raison d'etre -
           | _Universally_.
        
             | eru wrote:
             | Depends on what you compare them with.
        
         | buster wrote:
          | If you do that, why not just take some Redis server and
          | increment an integer every time you need a new number?
        
           | marcus_holmes wrote:
           | or y'know, use the thing that was designed specifically for
           | this case: UUID's
        
             | hnlmorg wrote:
             | UUIDs are designed to solve a slightly different, albeit
             | related, problem: where you don't have synchronisation of
              | the system as a whole. The problem the GP is addressing
              | involves systems that are synchronised, where generating
              | a UUID is additional and unnecessary overhead.
        
         | LAC-Tech wrote:
          | "Logical Physical Clocks" may be of interest. The timestamps
          | are monotonic and don't require atomic clocks like Google
          | Spanner does.
         | 
         |  _HLC captures the causality relationship like logical clocks,
         | and enables easy identification of consistent snapshots in
         | distributed systems. Dually, HLC can be used in lieu of
          | physical/NTP clocks since it maintains its logical clock to be
         | always close to the NTP clock. Moreover HLC fits in to 64 bits
         | NTP timestamp format, and is masking tolerant to NTP kinks and
         | uncertainties._
         | 
         | https://cse.buffalo.edu/~demirbas/publications/hlc.pdf
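The update rules from the paper can be sketched roughly like this (Python, illustrative only; the real algorithm also bounds how far l may run ahead of physical time):

```python
import time

class HLC:
    """Minimal sketch of a Hybrid Logical Clock.

    l tracks the maximum physical time seen so far; c is a logical
    counter that breaks ties among events sharing the same l.
    """
    def __init__(self, now=time.time_ns):
        self.now = now  # physical clock source
        self.l = 0
        self.c = 0

    def tick(self):
        """Local or send event."""
        pt = self.now()
        if pt > self.l:
            self.l, self.c = pt, 0
        else:
            self.c += 1
        return (self.l, self.c)

    def recv(self, l_m, c_m):
        """Receive event carrying a remote (l, c) timestamp."""
        pt = self.now()
        l_new = max(self.l, l_m, pt)
        if l_new == self.l == l_m:
            self.c = max(self.c, c_m) + 1
        elif l_new == self.l:
            self.c += 1
        elif l_new == l_m:
            self.c = c_m + 1
        else:
            self.c = 0
        self.l = l_new
        return (self.l, self.c)
```

Timestamps compare as (l, c) tuples, so causally later events always sort later even when the physical clocks disagree.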
        
         | LeoNatan25 wrote:
          | If collisions are common enough to be seen, there would be
          | lock contention at this critical section, so it's not a
          | "good enough" solution.
        
       | getmeinrn wrote:
       | I was going to post about "use a UUID", but I was surprised to
       | learn that no UUID uses both timestamp + a random component. You
       | can either get fully random with UUID4, or have a time + MAC
        | based UUID with UUID1. Strange, I would have thought there would
        | exist a UUID that uses time + random to minimize collisions as
        | described in the post.
        
         | wood_spirit wrote:
          | the new UUIDv6, 7 and 8 work the way you describe.
        
           | dtech wrote:
           | Only v6 and v7, v8 is "whatever you want".
        
             | mcv wrote:
             | That's not much of a standard.
        
         | nabogh wrote:
         | Can't find much info about it but is this UUID v7?
        
           | [deleted]
        
         | zote wrote:
         | someone else in the thread mentioned UUIDv7
        
         | LAC-Tech wrote:
         | I believe the proposed UUIDv7 standard uses this.
         | 
         | https://datatracker.ietf.org/doc/html/draft-peabody-dispatch...
         | 
         | It's a draft but there's a lot of implementations out there.
        
         | OJFord wrote:
         | ULID (and I think UUIDv7 draft) has millisecond timestamp + 80b
         | randomness.
        
         | sethammons wrote:
          | KSUID may be what you want. It's pretty much a time-sortable
          | UUID.
         | 
         | Go implementation: https://github.com/segmentio/ksuid
        
       | wood_spirit wrote:
       | This is why you should use ids that combine both a time component
       | and a sequence.
       | 
        | E.g. UUIDv7 has a millisecond time component and then a field that
       | increments for each event in the same millisecond, and then
       | enough random bits to make collisions between ids generated on
       | different machines astronomically unlikely.
       | 
       | Of course there are only so many bits so you might generate too
       | many events in the same time slice so the sequence overflows, and
       | you might actually get collisions between machines, and you are
       | limiting your event generation speed by forcing your cpu to sync
       | on the increment etc.
       | 
       | But in practice UUIDv7 works great at scale.
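The bit layout being described can be sketched like this (Python; here rand_a is filled with random bits, though the spec allows using it as the per-millisecond counter instead):

```python
import os
import time
import uuid

def uuid7() -> uuid.UUID:
    """Sketch of a UUIDv7: 48-bit Unix ms timestamp, version nibble,
    12 bits of rand_a (counter or random), variant bits, then 62
    random bits of rand_b."""
    ts_ms = int(time.time() * 1000) & ((1 << 48) - 1)
    rand_a = int.from_bytes(os.urandom(2), "big") & 0x0FFF        # 12 bits
    rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)
    value = (ts_ms << 80) | (0x7 << 76) | (rand_a << 64) \
            | (0b10 << 62) | rand_b
    return uuid.UUID(int=value)
```

Since the timestamp occupies the high bits, ids generated in different milliseconds compare in creation order.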
        
         | hardwaresofton wrote:
         | And they play nicely with sort ordering in popular databases
         | like postgres!
         | 
         | They're not in tree yet but there are a bunch of awesome pg
         | extensions that provide access to uuidv7 (outside of just using
         | it at the application level)
        
         | junon wrote:
         | Don't even need the "same millisecond" part to save a few
         | cycles, use case depending. An overflowing increment with a
         | counter of any sort plus a few random bits is usually enough.
         | 
         | If you're clever enough, neither needs a branch.
        
           | SV_BubbleTime wrote:
            | UUIDv7 has at least two such monotonic strategies. You can
            | use a counter, or you can use those bytes to increase the
            | random entropy.
        
         | qwerty456127 wrote:
          | The problem with UUIDs is that they are completely unreadable.
          | Not only can you not understand them, it is prohibitively hard
          | even to distinguish between them visually. This is why
          | identifiers that don't include any bit of information (or
          | noise) beyond the necessary minimum are useful in some cases.
        
           | SV_BubbleTime wrote:
           | Thus UUIDv7.
        
         | hinkley wrote:
         | I'm feeling a bit like an accidental time traveler, because I
         | can recall a conversation at a tech meetup that had to have
         | been at least ten years ago where someone was struggling with
         | unique UUIDs because they were bursting above 1000 UUIDs per
         | millisecond and not happy with the available options.
         | 
         | How old is UUID7? I can't get the internet to tell me.
        
           | yohannesk wrote:
           | Were they not happy due to possible collisions or something
           | else?
        
             | hinkley wrote:
             | Yeah they had a problem where the amortized ID rate was
             | less than the number of valid UUIDs you could generate in a
             | given time interval, but with a peak rate well above that,
             | so UUIDs weren't quite going to function for
             | naming/numbering them. You gotta be pushing a lot of
             | messages through a system to hit that, but it's possible.
             | And adding more layers of coordination on top of something
             | meant to reduce coordination overhead tends to... make
             | things messy.
             | 
             | I've half-heartedly tried to look up his problem every time
             | I've had to introduce UUIDs to something (most recently
             | fixing agents not generating correlation IDs), and I have
             | not figured out which version of UUID he was talking about.
             | I now suspect it was some proprietary or industry-specific
             | quote-unquote UUID spec rather than an industry wide one. I
             | may look through the UUID7 spec at some point to see if
             | they mention prior art. Much more plausible than him being
             | a time traveler.
        
           | tredre3 wrote:
           | UUID7 was first drafted a bit over a year ago:
           | https://datatracker.ietf.org/doc/html/draft-peabody-
           | dispatch...
        
             | hinkley wrote:
             | That's kinda what I figured. So you can see my confusion.
        
         | eru wrote:
         | Why do you need the time component anyway? It's just eating up
         | bits in your UUID without contributing much entropy.
        
           | ericHosick wrote:
           | UUIDs can serve different purposes. As others have mentioned,
           | database performance on inserts might trump the need for
           | difficult to guess UUIDs.
           | 
           | In other cases, the UUID needs to be as random as possible.
           | 
           | It really depends on the use case.
        
             | SV_BubbleTime wrote:
             | Yea. This is an easy one. We use both.
             | 
             | For our "session" records, it's a UUIDv7. This sorts
             | beautifully, and if I wanted to, I could look at a log and
             | easily see all the entries in a particular session.
             | 
             | For a larger db, we just need unique entries and at least
             | in Dynamo, it is an advantage to equally distribute them as
             | much as possible. UUIDv4 there.
        
           | madsbuch wrote:
           | You need it to make database indices perform better.
           | 
           | If you don't need that, but just need a random UUID, UUIDv4
           | is better.
        
             | EGreg wrote:
              | I don't know why people use relational databases other than
             | they were first and "that's the way it's always been done".
             | 
             | Why not use a graph database? O(1) lookups instead of O(N).
             | Why need indices if you can just point to the data. Why use
             | JOINs when map-reduce querying is far more flexible?
        
               | madsbuch wrote:
                | How do you do constant-time lookups in graph databases?
                | 
                | My intuition tells me that you cannot get below O(n log
                | n) (the lower limit for comparison-based ordering).
        
               | tharkun__ wrote:
               | They were not first.
               | 
                | ISAM and VSAM come to mind. Yes, it says "files" in
                | there a lot, but it got used like a database, with
                | programming interfaces for finding records. If you
                | will, this method is more like NoSQL databases than a
                | relational DB. The I in ISAM was a step towards not
                | having to know the key (true "NoSQL"), kind of like how
                | today's NoSQL databases all also give the ability to
                | add indexes now.
               | 
               | https://en.m.wikipedia.org/wiki/ISAM
        
               | dlisboa wrote:
               | I've been interested in learning more about them and how
               | to best utilize them in my company. What graph database
               | and query language would you recommend (regardless of
               | stack)?
        
               | EGreg wrote:
               | 1. Memgraph 2. Neo4j
               | 
               | As for query language: definitely Cypher!
        
               | insanitybit wrote:
               | You aren't describing a property of a graph database.
               | You're describing a property of some set of key value
               | based systems.
               | 
               | The reason why you want indices is because some query
               | patterns don't have a key to perform a look up on.
        
             | dtech wrote:
             | I feel like v7 is almost a strictly better v4. Assuming you
             | can generate a v7 (you have a time source), what are the
             | disadvantages?
        
               | andersa wrote:
               | Anyone who can see the uuidv7 can determine at what time
               | it was generated. You might not want that for things
               | exposed to users.
        
               | madsbuch wrote:
                | Entropy. You lose bits to a known source, which reduces
                | the entropy of the UUID.
        
               | dtech wrote:
               | I do feel 74 bits per second is enough though. That would
               | require 2^(74/2) = 137 billion UUIDs generated in a
               | single second for a 50% chance of a single collision.
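That estimate can be sanity-checked with the standard birthday approximation p ≈ 1 − e^(−k(k−1)/2N) (a worked check, not from the thread): 2^37 draws from a 74-bit space actually lands near a 39% collision chance, with 50% at roughly 1.18·2^37 draws.

```python
import math

def collision_probability(k: int, bits: int) -> float:
    """Birthday approximation: chance of at least one collision when
    drawing k values uniformly from a space of 2**bits."""
    n = 2.0 ** bits
    return 1.0 - math.exp(-k * (k - 1) / (2.0 * n))

# 2**37 (~137 billion) ids drawn from 74 random bits in one tick:
p = collision_probability(2 ** 37, 74)        # ≈ 0.39
p50 = collision_probability(int(1.1774 * 2 ** 37), 74)  # ≈ 0.50
```

The rule-of-thumb k = √N slightly undershoots 50%; the exact 50% point is k ≈ √(2N·ln 2) ≈ 1.1774·√N.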
        
               | dralley wrote:
               | 128 bits is a lot of entropy to go around, though.
        
               | wolf550e wrote:
               | You can hand out uuidv4 to clients without revealing
               | anything.
        
             | mcv wrote:
             | But shouldn't that be a separate field then?
        
               | mattashii wrote:
               | The logistics of combining fields in indexes and
               | identifiers is relatively complex, while the logistics of
               | indexing a single field is comparatively trivial. This is
                | also why you don't ship timestamps using separate fields
                | for second/minute/hour/day/month/year, but a single ISO
                | string or UNIX timestamp as the representation: it makes
                | querying and interpreting the value more consistent.
        
               | btilly wrote:
               | Disagree on "relatively complex".
               | create index on $table ($field1, $field2);
               | 
               | Seems pretty simple to me.
        
               | frankreyes wrote:
               | Create a new column with an MD5 hash of the other
               | columns. Easy.
               | 
               | /s
        
               | rvnx wrote:
               | This is how it works, except it's not MD5 but a less-
               | costly hash function
        
               | madsbuch wrote:
                | I am curious about this, and might be misunderstanding
                | what you mean.
                | 
                | Can you lay out a demo architecture where you use
                | multiple keys like you propose?
        
               | junon wrote:
               | There are many DBMSes that combine columns into a
               | singular primary key. Performance tradeoffs vary,
               | especially when it comes to indexing.
        
               | e12e wrote:
               | Any pure relational database design will eschew surrogate
               | keys - most real-world systems will (should) add them
               | back - because a surprising number of good natural keys
               | end up changing (names change, phone numbers change,
               | emails change, twitter handles become
               | irrelevant/disappear, location of birth may change
               | subject to geographic regions changing size...).
               | 
               | And on top of all that, there are efficiency concerns.
               | 
                | That said, at least for join tables AFAIK, there isn't
                | often a need for row IDs beyond the involved foreign
                | keys, unless you need to add metadata from other
                | relations/tables...
        
               | TRiG_Ireland wrote:
               | > location of birth may change
               | 
               | My British passport says I was born in Birr; my Irish
               | passport says I was born in Galway. These are both
               | correct, because they are answering different questions.
               | (I was born in the town of Ballinasloe in County Galway,
               | but my mother's place of residence at the time of my
               | birth was the town of Birr in County Offaly.)
        
               | pc86 wrote:
               | So British place of birth means "mother's residence" and
               | not place of birth?
        
               | rvnx wrote:
                | Official guidance here: https://assets.publishing.service.gov.uk/government/uploads/...
                | 
                | Could it maybe just be a mistake made during the UK
                | registration? It's an easy fix if wanted.
        
               | dtech wrote:
               | But then you have 2 identifiers which complicates
               | everything.
        
           | qwerty456127 wrote:
           | > Why do you need the time component anyway?
           | 
            | To sort or filter the records by time. Sure, you can just add
            | an extra column if you need this, but there are cases where
            | this is not convenient. E.g. when you want to be able to
            | export the records into files, naming the files with the IDs,
            | and still be able to sort them.
        
             | willis936 wrote:
             | One benefit of keeping it separate is that you can choose
             | the precision of the timestamp necessary. "Millisecond
             | precision" is arbitrary and commonly insufficient.
        
               | preseinger wrote:
               | it's commonly totally sufficient for IDs to represent a
               | rough sort order
               | 
               | millisecond precision is great for a lot of use cases
        
               | willis936 wrote:
               | I didn't say that it wasn't. Hell even ms is too precise
               | for many use cases (usually where date is used instead).
               | 
               | What I said was that it's useful to be able to select
               | timestamp precision independently of UUID implementation.
               | One size that fits all fits none best.
        
               | dtech wrote:
               | Lucky for you, they also define UUIDv8 as a free-for-all
               | where you can do whatever you want, and nanosecond
               | precision is one of the examples given in the RFC.
        
             | danbruc wrote:
          | More importantly, if you have an index on purely random IDs,
          | then each insert will go to some random position in the tree,
          | whereas IDs that increase with time will make all new entries
          | land at the end of the tree, which reduces index
          | fragmentation.
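A toy illustration of the effect (Python, simulating the index as a sorted list; distance 0 means the insert was a pure append at the tail):

```python
import bisect
import os
import time

def insert_position_spread(ids):
    """Insert each id into a sorted list, recording how far from the
    tail it lands (0 == appended at the end, the B-tree's happy path)."""
    index, distances = [], []
    for i in ids:
        pos = bisect.bisect(index, i)
        distances.append(len(index) - pos)
        index.insert(pos, i)
    return distances

# purely random 128-bit ids: inserts scatter across the whole index
random_ids = [os.urandom(16) for _ in range(1000)]

# time-prefixed ids (UUIDv7-style): monotonic timestamp + random tail
counter = iter(range(10**6))  # break ties within the same nanosecond
ordered_ids = [
    (time.monotonic_ns() + next(counter)).to_bytes(8, "big") + os.urandom(8)
    for _ in range(1000)
]
```

With ordered_ids every insert lands at the tail; with random_ids inserts land all over the structure, which is what drives page splits in a real B-tree.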
        
               | lazide wrote:
               | depending on scale and architecture, either behavior can
               | be better. it's easier to shard when writes occur
               | randomly over the overall space. it's easier to coalesce
               | when writes all happen in a given place (head or tail)
        
               | remram wrote:
               | If that is the case then why shouldn't the storage system
               | hash the IDs itself, to spread them as it requires?
        
               | lazide wrote:
               | Because sometimes you want some data to be collocated,
               | while the rest sharded.
               | 
               | For instance, you might use a random object ID as a
               | prefix value in the index, followed by attribute ID which
               | isn't. Or a modified time, so you can have a history of
               | values which can be read out linearly.
               | 
               | If using it directly, that means Objects and their data
                | are sharded randomly across the space, but when looking
                | for an object's attributes (or an attribute by time),
                | their index
               | entries are always co-located and you can read them out
               | linearly with good performance.
               | 
               | If blindly hashing keys to distribute them, you can't do
               | that. Also, you can't really do a linear read at all,
               | since no data will be 'associatable' with others, as the
               | index value is randomized, and what is stored in the
                | index has no relation to the key provided by the user.
               | 
               | You can only do a straight get, not a read. That is very
               | limiting, and expensive with large data sets as most
               | algorithms benefit greatly from having ordered data.
               | (Well, you could do a read, but you'd get back entries in
               | completely random order)
               | 
               | Needless to say, this is 'advanced' usage and requires
               | pretty deep understanding of your data and
               | indexing/write/read patterns, which is why random hashing
               | is the most common hash map behavior.
        
               | sgarland wrote:
               | Page splits... page splits everywhere.
        
               | EGreg wrote:
               | Or, you could use a graph database and stop having
               | frustrating relational impedance mismatch, nonlocality
               | etc. You can have O(1) lookups instead of O(log N) for
               | almost everything
        
               | coding123 wrote:
                | Graph databases don't solve that. All databases
                | (document, graph, relational) implement indexes to find
                | specific things in exactly the same way: well-known
                | tree, hash, and other techniques.
               | 
               | The representation (outside of indexing) has properties
               | that make your USE CASE better or worse. EGreg would not
               | be someone to hire to architect a solution. He'll just
               | put your 1Trillion row per month use-case in a graph DB
               | like Neo4J and you'll just watch it fall over when you
               | run billing.
        
               | victor106 wrote:
                | That sounds too good to be true. Is that really true of
                | all graph DBs?
                | 
                | Also, if that's really true, why can't everyone just use
                | graph DBs?
        
               | EGreg wrote:
               | Because of vendor lock-in with the LAMP stack over the
               | years. Every host used MySQL, how many had Neo4j as
               | available?
        
               | e12e wrote:
               | How big is the 1 though?
        
               | EGreg wrote:
               | Look, if you have N items related to X, at insert time,
               | you store them in an array and have X point to that
               | array, instead of foreign keys.
               | 
               | For example, when a user has 7 articles. Do you want to
               | just point to where the articles are stored? Or do you
               | want to do O(log n) lookup for each article?
               | 
               | And if you have many-to-many, do you want to join an
               | Intermediate Table for even more processing, or just
               | follow a pointer to a range of an intermediate node
               | directly and traverse?
        
               | ndriscoll wrote:
               | How is that different from a clustered index?
        
               | EGreg wrote:
               | A clustered index requires O(log N) lookups, since it's
               | still an index.
               | 
               | I'm talking about pointing directly to the location of an
               | array where things are stored. That array isn't a
               | clustered index. Each array is different.
        
               | ndriscoll wrote:
               | What about when you delete rows? Do you just leave the
               | space unused now? Or if you update a row to be larger?
               | Rewrite the whole array (so possibly O(n) updates)?
               | 
               | How do you deal with data that gets accessed in different
               | orders based on relationships from multiple other
               | entities? Duplicate the data? If so, updates now get
               | amplified and you can fit less data in RAM so you're more
               | likely to require disk IO. If not, you need a layer of
               | indirection (so you have an array of pointers instead of
               | an array of data).
               | 
               | Even with a layer of indirection, updates that grow a row
               | and require a reallocation will cause you to have to go
               | update all pointers (also, those pointers need to be
               | indexed to find who you need to update). To avoid write
               | amplification, you can use an array of ids instead of an
               | array of pointers. Now you want an id <-> pointer lookup,
               | which could be done with a hash map in O(1), or with a
               | tree in O(logN) if you want to also allow efficient range
               | queries.
               | 
               | I'm not exactly sure on the implementation details, but
               | for ballpark numbers, for an 8 kB page size with 80% fill
               | factor and 16 B entries (8 B key + 8 B pointer), with
               | 10E9 rows, log(N) ought to be ~4 (page IOs). Not ideal,
               | but the point is for a lot of real-world engineering,
               | O(logN) is effectively O(1) (with a small enough constant
               | factor that it's fine for general use).
               | 
               | So if your data is not write-once, trees are a well-
               | rounded default data structure.
        
               | hinkley wrote:
               | When you're talking about data sets so large they dictate
               | what hardware you use, and introduce terms like
                | "cluster", then 1 = √n
               | 
               | Which is why we need a version 2 of complexity theory,
               | that doesn't treat memory access or arithmetic on
               | arbitrary precision numbers (aka as n actually goes to
               | infinity) as O(1) operations. They aren't. Which every
               | large system engineer knows but few will talk about.
        
               | hansvm wrote:
               | Why sqrt(n) and not log(n)?
               | 
               | And that complexity theory already exists. Typical
               | whiteboard engineering uses transdichotomous models to
               | gloss over some polylogarithmic factors (as do much of
               | the literature), but more accurate models exist.
               | 
               | The difference isn't usually super relevant when
               | comparing multiple solutions all using the same model of
               | computation though since the extra terms don't tend to
               | bump one complexity class above another when switching
               | models, and if you cared about actual runtime
               | characteristics you wouldn't be relying on log factors in
               | such a crude tool anyway.
        
               | hinkley wrote:
               | Speed of light.
               | 
               | Imagine a data center containing exabytes of data. How
               | long does it take to access an arbitrary bit of that
               | data?
               | 
               | We use clusters because computers cannot contain an
               | infinite amount of memory, storage, or CPUs, because of
               | physics. You see this same thing play out at smaller
               | scales but it's more obvious at the macro scale. More
               | addresses take logn time to sort out, but time to access
               | is measured in radii, not gate depth.
               | 
               | In a world where clusters are rare, Knuth made decent
               | approximations. In a world where clusters are not only de
                | rigueur but hosted on multitenant hardware spread out over
               | upwards of 100 miles, those approximations are bullshit
               | and need to change.
        
               | hansvm wrote:
               | Oooh interesting, thank you. I totally misunderstood this
               | as the "integers never need more than 64 bits, so hash
               | tables are constant" argument.
        
               | hinkley wrote:
               | Integer arithmetic is really quantized logarithmic
               | complexity. If your hardware has a bucket your number
                | fits into, you're calculating n+1 or n×n in constant
               | time. But if your data set doubles in size (especially
               | for multiplication) you may find yourself in a bucket
               | that doesn't exist or a more expensive one. Contemporary
               | code is more likely to reach for bignum which is logn,
               | but again stairstepped to each number of integers it
               | takes to represent the entire number. A bigger problem
               | when your data set is sparse, so that values grow faster
               | than population (eg, UUID).
               | 
               | But on the topic of hash tables, you can only reach
               | 'O(1)' if you can avoid collisions. And to avoid
               | collisions you need a key of length m, which grows as n
               | grows. You cannot put 51 pigeons in 50 pigeonholes. So
               | the key length of your hash keys is m >= logn, which
               | means the time to compare keys is also logn, which means
               | hash tables are never actually O(1) access or insert
               | time. Actual access time for a hash table on non-
                | imaginary hardware is √n·log n, not O(1).
               | 
               | If you consider that we have applications that are just a
               | single hash table occupying the entire RAM of a single
               | machine, then this is not picking nits. It's capacity
               | planning.
        
               | danbruc wrote:
               | That will depend on which graph database you use as a
               | graph database might just store the graph in an
               | underlying relational database. And it will also depend
               | on what kind of data you have and what kind of queries
               | you want to perform. For a graph database it might be
               | faster to navigate along links in the graph but I would
               | guess you will have to pay a big performance penalty if
               | you have to operate orthogonally to your links, like
               | aggregate across all instances of some entity.
        
               | [deleted]
        
           | dtech wrote:
           | When you sort them they will be ordered by generation time.
           | This is useful in UIs, and greatly improves performance in
           | databases (how much depends on DB and use case).
        
           | glodalitazic wrote:
            | IIRC you don't want to spend entropy that you don't need.
            | The advantage of using the time is that you need to spend
            | fewer bits on expensive randomness, and thus generation is
            | cheaper.
        
             | SV_BubbleTime wrote:
             | I only work in embedded, so I'm not sure about fancy
             | computers - but when we make UUIDv7s, I am certain the
             | random is faster than pulling the time, formatting for
             | milliseconds, then storing in position.
        
           | KMag wrote:
            | Otherwise, failure recovery requires robust storage. With a
            | time component, upon startup you just wait until your
            | timestamp ticks over, and then you know you're not
            | re-issuing any UUIDs.
           | 
           | With a pure counter-based system, you need robust distributed
           | storage and you need the machine to reserve batches of IDs
           | via committing writes to the robust distributed storage.
           | Otherwise, a disk failure may cause you re-issue UUIDs.
           | 
           | Though, seeding a thread-local AES-256-ctr or ChaCha20
           | instance via getrandom()/getentropy() is probably a better
           | fit for most situations. If you need sequential IDs for
           | database performance, seed a simple 128-bit counter using
            | getrandom()/getentropy(). If you're going to need more than
           | 2^40 unique IDs, then use IDs larger than 128 bits.
           | 
           | Assuming getrandom()/getentropy() is getting you a high-
           | entropy seed, it's easy to size the IDs to bring the
           | probability of collision much lower than the probability of
           | background radiation flipping a bit and breaking your
           | equality test between IDs. If you're not running redundant
           | calculations and/or on radiation-hardened hardware, it's
           | difficult to argue against randomized IDs.
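KMag's random-seed-plus-counter idea can be sketched in Python. The class name and sizes are illustrative, and a plain 128-bit counter stands in for the AES-256-ctr/ChaCha20 variants mentioned above:

```python
import os
import threading

class RandomSeededCounter:
    """Sequential IDs for DB locality: seed a wide counter from OS
    entropy once, then just increment. A fresh seed per process/thread
    avoids the robust-storage requirement of pure counter schemes."""

    def __init__(self, bits: int = 128):
        self.bits = bits
        # High-entropy starting point from the OS CSPRNG.
        self.counter = int.from_bytes(os.urandom(bits // 8), "big")
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            self.counter = (self.counter + 1) % (1 << self.bits)
            return self.counter
```

In practice you'd keep one instance per thread (as the comment suggests) so the lock is uncontended.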
        
           | c0balt wrote:
            | Because it allows the encoding of information in the id.
            | This makes it, at least in my experience, somewhat sortable.
        
         | darkclouds wrote:
         | > This is why you should use ids that combine both a time
         | component and a sequence.
         | 
         | Computers should run like clockwork, so in this example of
         | using all the cores, in Windows and likely some other OS's,
         | threads are assigned to cores when they are started and you can
         | have many threads per core, ergo the time component should also
         | have the thread and the core number combined with it with multi
         | core systems.
         | 
          | It's possible to write the code in such a way that some
          | variables are bounced out of the CPU cache back to memory to
          | avoid caching issues, because I think the cache on some CPUs
          | is per core, and on some is per CPU.
        
           | Joker_vD wrote:
           | > threads are assigned to cores when they are started
           | 
           | Do they? I thought the normal behaviour was for cores to pick
           | any available thread to run, so core migration is quite
           | normal.
           | 
           | > ergo the time component should also have the thread and the
           | core number combined with it with multi core systems.
           | 
           | Sorry, how exactly does it follow from the previous? You seem
           | to have omitted the other half of your syllogism. After all,
           | clockworks do not have thread nor core numbers so I don't
           | quite see how having those in UUIDs will make computers run
           | like clockwork.
        
         | Mikhail_Edoshin wrote:
         | Isn't it simpler to use sequence keys then?
        
           | Cthulhu_ wrote:
           | Yes, but only on single machines, UUID and co are intended
           | for distributed systems.
           | 
           | Although now I wonder if / how UUID v7 can do sequential keys
           | on distributed systems. Mind you, on those systems "close
           | enough" will probably be good enough, and sorting will be
           | done by date instead of incremental ID.
        
             | moffkalast wrote:
             | You could also keep a global incremental index for that,
             | assuming there's some authoritative server that has to be
             | polled sequentially to get them.
             | 
             | Probably too much overhead when conflicting UUIDs are a few
             | orders of magnitude less likely than the clients crashing
             | from some random bug though.
        
             | EGreg wrote:
             | So just prefix the node ID which is assigned when the node
             | joins a swarm
        
               | Timon3 wrote:
               | How is that easier or better? Now you're letting your
               | infrastructure details bleed into your database.
        
               | EGreg wrote:
               | They _should_ bleed into your database. The database is
               | part of your infrastructure and it contains details like
               | where to find the replicas and so on
        
               | Timon3 wrote:
               | Why? Tight coupling doesn't make things easier.
        
               | EGreg wrote:
                | You're complaining that the infrastructure details are
                | bleeding into the infrastructure implementation?
        
               | Timon3 wrote:
               | Infrastructure details are bleeding into non-
               | infrastructure implementation. Your database content is
               | _not_ an infrastructure implementation.
        
               | EGreg wrote:
               | the content is not, but it can use a database function,
               | such as CURRENT_TIMESTAMP or RANDOM() can it not?
        
           | michaelt wrote:
           | Ah but you see, UUIDs are web scale.
           | 
           | And by web scale, I mean they're too big to exchange by any
           | offline channel.
        
             | SV_BubbleTime wrote:
             | FWIW... we're using UUIDv7s over BLE.
        
               | _flux wrote:
               | Should you, though? BLE itself uses UUIDs, but the ones
               | it uses are in a shortened format just because the BLE
               | frames are small and the protocol is not too fast.
               | 
               | Granted I've even used JSON over GATT so maybe I should
               | keep my mouth shut :).
        
               | SV_BubbleTime wrote:
               | It's for an auth use. And at least we're using CBOR. But
               | I get your point.
        
             | Tade0 wrote:
             | It's 128 bits vs 64 bits - not really that much of a
             | difference.
        
       | efitz wrote:
       | Related
       | 
       | I used to be the program manager owner of the security event log
       | in windows.
       | 
       | When things happen simultaneously or very closely in time on
       | multi-core systems, thread scheduling can significantly affect
       | observations of those things. For example, your thread quantum
       | might expire before you get to the syscall to get a time stamp,
       | or before you can pass a buffer to queue your event for
       | subsequent time stamping.
       | 
       | In fact, on multiprocessing systems, it was very common to see
       | out of order event log entries on Windows back in the day
       | (2000-oughts). You also could not count on the log timestamp
       | accuracy too precisely; 1s was pretty much the safe lower bound
       | (some components truncated or rounded time stamps IIRC).
        
       | maxekman wrote:
       | There is some interesting content regarding this in the
       | Timestamps and Sequencing section in this post about the Go
       | execution tracer overhaul:
       | https://go.googlesource.com/proposal/+/ac09a140c3d26f8bb62cb...
        
       | forrestthewoods wrote:
       | > On my system, the minimum increment was 32 ns.
       | 
       | Then you aren't using a nanosecond clock.
       | 
       | If you want actual nanosecond precision then you probably want
       | rdtsc.
        
       | non-e-moose wrote:
       | Silly Rabbit - absolutely accurate times are a security problem.
       | CPU designers (even waay back in Alpha @ DEC) intentionally
       | introduced clock jitter, just to prevent total predictability.
        | For x86, I think if you performed 3-4 back-to-back RDTSC reads,
        | saved the values into registers, and then upon completion
        | reported those values, you would find that the time deltas are
        | NOT exactly the same.
        
       | grose wrote:
       | This is why it scares me a bit to use a raw timestamp as a sort
       | key in DynamoDB. I append a (random) unique ID to the timestamp
       | text to avoid it. Better safe than sorry, I figure.
        
         | moduspol wrote:
         | We ran into this, though we were using millisecond timestamps
         | with the Number type.
         | 
         | Ended up working around it by adding numbers after the decimal
         | point. DynamoDB apparently supports up to 38 digits of
         | precision, and JavaScript (we're NodeJS based) by default
         | encodes numbers to allow for up to 16 digits (combined, before
         | and after the decimal point). A UNIX epoch millisecond
         | timestamp is 13 digits, so we were able to use 3 just to add
         | uniqueness without having to update anything else in our
         | codebase.
         | 
         | We certainly now plan to essentially always use string-based
         | (and generically named) partition and sort keys, which would
         | allow us to more easily do what you're describing.
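The digit-budget trick above can be sketched in Python (the function name is made up, and a random 3-digit suffix stands in for whatever uniqueness scheme was actually used):

```python
import random
import time

def unique_sort_key() -> float:
    """Millisecond UNIX timestamp plus 3 digits after the decimal
    point: 13 + 3 = 16 significant digits. Distinct suffixes within
    the same millisecond yield distinct doubles, since a double's
    spacing near 1.7e12 is far smaller than 0.001."""
    millis = int(time.time() * 1000)   # 13-digit ms timestamp
    suffix = random.randrange(1000)    # 3-digit tiebreaker
    return millis + suffix / 1000.0
```

Note this only reduces, not eliminates, collisions: two writes in the same millisecond still collide with probability 1/1000 per pair.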
        
         | fer wrote:
         | I have a use case that uses a similar solution, but for a
          | different reason: it doesn't scare me to use a timestamp as
          | sort key since I know my hash keys are unique and I only write
          | every few seconds. But I still add a random number of ms (well
          | under the frequency of writes/updates), because otherwise I'll
         | be hitting hot partition issues on the GSI (indexed by
         | timestamp, as you can guess).
        
       | rdtsc wrote:
       | Erlang/Elixir (BEAM VM) makes this very clear - it's a
       | distinction between monotonic vs strictly monotonic.
       | 
       | https://www.erlang.org/doc/apps/erts/time_correction.html#mo...
       | 
       | > In a monotonically increasing sequence of values, all values
       | that have a predecessor are either larger than or equal to its
       | predecessor.
       | 
       | These are available via the
       | https://www.erlang.org/doc/man/erlang.html#monotonic_time-0
       | function.
       | 
       | https://www.erlang.org/doc/apps/erts/time_correction.html#st...
       | 
       | > In a strictly monotonically increasing sequence of values, all
       | values that have a predecessor are larger than its predecessor.
       | 
       | Strictly monotonic values imply some synchronization /
       | coordination, with an associated performance impact when there
       | are many concurrent processes. This functionality is available
       | via the
       | https://www.erlang.org/doc/man/erlang.html#unique_integer-1
       | function. With a warning associated with it:
       | 
       | > Strictly monotonically increasing values are inherently quite
       | expensive to generate and scales poorly. This is because the
       | values need to be synchronized between CPU cores. That is, do not
       | pass the monotonic modifier unless you really need strictly
       | monotonically increasing values.
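The monotonic vs. strictly monotonic distinction can be sketched in Python; the lock here plays the role of the cross-core synchronization the Erlang docs warn about (names are illustrative):

```python
import threading
import time

_last = 0
_lock = threading.Lock()

def strictly_monotonic_ns() -> int:
    """Strictly monotonic timestamps: every caller serializes on one
    lock, which is exactly the coordination cost that makes this
    'inherently quite expensive' under heavy concurrency."""
    global _last
    with _lock:
        now = time.monotonic_ns()
        # If the clock hasn't advanced past the last issued value,
        # bump by 1 ns so the sequence is strictly increasing.
        _last = max(now, _last + 1)
        return _last
```

`time.monotonic_ns()` alone is only monotonic (repeats are allowed); the lock plus the `max(..., _last + 1)` step is what buys strictness.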
        
         | derefr wrote:
         | Relatedly, Erlang's own refs aren't created by a strictly-
         | monotonic global generator; but rather are internally a pair of
         | a regular monotonic identifier, and the PID of the requesting
         | process. In other words, they're akin to UUIDv1s, or to
         | https://en.wikipedia.org/wiki/Snowflake_ID s.
         | 
         | You only really _need_ strictly monotonic global identifiers if
         | you need immediately-consistent first /last-write-wins. If you
         | can instead rely on eventually-consistent first/last-write-wins
         | (i.e. if your write events enter an event store/queue that
         | linearizes them by ID, and then all "simultaneous" writes but
         | the one with highest ID priority can be dropped/ignored, either
         | during processing, or on read), then I'd recommend first
         | considering packed (nodeID, seq) pairs. And, if you want global
          | event orderability, considering the snowflake ID formulation
         | (timestampMajor, nodeID, timestampMinor, seq) specifically.
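The (timestampMajor, nodeID, timestampMinor, seq) packing can be sketched in Python; the field widths here are assumptions for illustration, not taken from Erlang or any particular snowflake implementation:

```python
import time

def snowflake_like(node_id: int, seq: int) -> int:
    """Pack a 64-bit snowflake-style ID. Assumed layout:
    32-bit seconds | 10-bit node ID | 10-bit ms-within-second |
    12-bit sequence. IDs sort by time first, then node, then seq."""
    now_ms = time.time_ns() // 1_000_000
    major = (now_ms // 1000) & 0xFFFFFFFF   # timestampMajor: seconds
    minor = now_ms % 1000                   # timestampMinor: ms in second
    assert 0 <= node_id < 1024 and 0 <= seq < 4096
    return (major << 32) | (node_id << 22) | (minor << 12) | seq
```

Because the node ID sits above the fine-grained time bits, "simultaneous" writes from different nodes get a deterministic priority order, which is what the eventual-consistency scheme above relies on.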
        
       | datenwolf wrote:
       | In other news: Water is wet.
       | 
       | The high resolution precision time counters are derived from the
       | system base clock, usually operating at ~33MHz, which translates
       | exactly into that 30ns granularity observed.
       | 
        | If you really want robust time-derived timestamp identifiers,
        | truncate the high resolution timer to at best 10us resolution
        | and replace the low bits with a hash of them, concatenated with
        | the value of the CPU cycle counter (`RDTSC` on x86).
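The recipe above can be sketched in Python; since Python can't read RDTSC directly, `time.perf_counter_ns()` stands in for the cycle counter and `zlib.crc32` for a fast hash like xxHash32 (both are substitutions):

```python
import time
import zlib

def hashed_timestamp_id() -> int:
    """Truncate the wall clock to 10us resolution, then fill the
    discarded low bits with a hash of (low bits || cycle counter),
    so two reads in the same 10us window still differ."""
    ns = time.time_ns()
    high, low = divmod(ns, 10_000)       # truncate to 10us granularity
    counter = time.perf_counter_ns()     # stand-in for RDTSC
    mixed = zlib.crc32(low.to_bytes(2, "big") +
                       counter.to_bytes(8, "big"))
    # low < 10_000 < 2^14, so 14 hash bits fill the truncated field.
    return (high << 14) | (mixed & 0x3FFF)
```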
        
         | hinkley wrote:
         | The HRT patch set for Linux (which is eight years old) claims
         | that Linux had a 10ms clock resolution before the patch, and
         | conversations on SO suggest the resolution is now 1ns, so I
         | believe this information is dated, or at least OS dependent.
        
         | usefulcat wrote:
         | I thought clock_gettime() usually does use rdtsc(p) on Linux?
         | Possibly depending on the particular clock type (montonic,
         | realtime, etc). Either way I'd be interested in knowing more.
        
           | datenwolf wrote:
           | RDTSC is directly influenced by frequency scaling. So while
           | monotonic, its clock interval is neither constant, nor
           | deterministic on modern systems.
           | 
           | Here's a small online visualization of RDTSC average and
            | standard deviation I just hacked together:
            | https://gist.github.com/datenwolf/151486f6d73c9b25ac701bdbde...
           | 
           | On a system with frequency scaling you can see that under
           | higher load the difference between RDTSC in subsequent
           | iterations of a tight loop that does nothing else than
           | reading that register will drop. Here's how it looks on the
           | system I'm currently using:
           | https://www.youtube.com/watch?v=FKKjSJ1JZ78
        
             | vient wrote:
             | > RDTSC is directly influenced by frequency scaling
             | 
             | Unfortunate wording, RDTSC itself is not influenced by
             | frequency scaling, it has constant frequency on modern CPUs
             | after all. Your video nicely shows that RDTSC delta is
             | influenced by CPU frequency, as expected, but how does it
             | affect using RDTSC as a clock? On my CPU RDTSC seems to
             | tick at 3GHz, for example. I wonder how precise it is
             | though, how much its frequency can drift from spec.
        
               | datenwolf wrote:
               | I did update my program, now it measures the ratio.
        
               | datenwolf wrote:
               | Ah crud, you're right. What'd really be interesting to
               | see is the ratio between d/dt rdtsc and d/dt
               | clock_monotonic.
        
         | amelius wrote:
         | > replace the low bits with the hash of them
         | 
         | Doesn't this hash-step only increase the probability of
         | collisions?
         | 
         | What is this step intended to solve?
        
           | jjk166 wrote:
           | > replace the low bits with the hash of them, concatenated
           | with the value of the CPU cycle counter (`RDTSC` on x86)
           | 
           | you're concatenating two values and then taking the hash of
           | the combination, ie:
           | 
           | hash(low bits + CPU cycle counter)
        
             | dheera wrote:
             | Why?                   low bits + CPU cycle counter
             | 
             | is enough. No need of the hash()
        
               | datenwolf wrote:
               | You don't know precisely at which frequency the cycle
               | counter runs. Depending on the system load it might
                | either run faster or slower than the lowest bits of the
                | HPTC. For what it's worth this part is more or less
                | nondeterministic, so the sane thing to do is spread out
               | the information as much as possible (maximize entropy),
               | in order to minimize the probability of collisions.
        
               | dheera wrote:
               | That's ok, the bits past the low bits are just there to
               | avoid collisions, not an actual measure of high precision
               | time beyond the low bits.
               | 
               | It's not worse than the hash solution, I'm just saying
               | it's not necessary to hash it if the only objective is to
               | reduce collisions.
               | 
               | In fact the hashing solution, if it is replacing the low
               | bits with a hash of low bits plus something else, is
               | actually destroying valuable time information.
        
               | datenwolf wrote:
               | That only works, if you know exactly, that the low bits
               | are constant. Otherwise you may run into the issue that
               | due to unsteady rate of RDTSC between two
               | processes/threads that may be preemptively unscheduled
               | between reading the HPTC and the RDTSC you might again
               | end up with colliding time stamps. And if you took the
               | derivative between timestamps taken in succession, you
               | might even find is to be non-monotonic in places.
               | 
               | The combination of multiple counters incremented by
               | individual unsteady clocks used to be a source for pseudo
               | random scrambler sequences; these days we prefer LFSRs,
               | but overall this is something that can be weird.
               | 
                | Hence my recommendation: Just throw xxHash32 on the
                | concatenation of the HPTC's low bits and the CPU clock
                | cycle counter, and forgo any pretense of monotonicity
                | in the low bits (because very likely you don't have it
                | anyway).
        
             | amelius wrote:
             | But a hash() destroys information.
             | 
             | When you are trying to reduce collisions, why would you use
             | a function that is known to introduce collisions?
        
               | datenwolf wrote:
               | Because the raw value from the timestamp will be very low
               | entropy and have the short scale variation concentrated
               | in just a few bits. A hash not just destroys information,
               | it also creates entropy by mixing that information over
               | all the bits that come out of it. And using 64 bit hash
               | replacing a 64 bit nanosecond counter that has short term
               | entropy of less than 16 bits, you're in fact reducing the
               | likelihood of a collision by a factor of 2^48.
        
               | marcosdumay wrote:
               | > it also creates entropy
               | 
               | It's a nitpick, but it concentrates the entropy. It
               | doesn't create any.
               | 
               | I do think it will make the answer more clear, as the
               | hash concentrates the less than 64 bits of entropy on
               | that 128 bits of original data into a usable 64 bits
               | package.
        
               | datenwolf wrote:
               | Actually hashes do create entropy (every computation
               | creates entropy in some form or another). What's the
               | entropy of a 4 bit number? What's the entropy of a 4 bit
               | number hashed by a 64 bit hash function? The act of
               | computation does in fact create entropy, as per the 2nd
               | law of thermodynamics, a part of which shows up in the
               | hash.
        
               | amelius wrote:
               | > every computation creates entropy in some form or
               | another
               | 
               | Ok, what is the entropy created by this function that
               | maps a 4-bit number to a 64 bit number:
               | 0 -> 0         1 -> 1         2 -> 1         3 -> 1
               | 4 -> 1         ...         15 -> 1
        
               | datenwolf wrote:
               | 60 bits. Yes, I know, you can compress it down very well.
               | But consider that entropy in computation involves not
               | just the bits you store, but also the bits that the
               | processor touches and eventually dissipates as heat into
               | the universe.
        
               | amelius wrote:
               | What definition of entropy do you use?
               | 
               | (I'm using Shannon entropy.)
        
               | pxx wrote:
               | I don't think you understand what this conversation is
               | about. We are considering information theoretic entropy,
               | not thermodynamic entropy from the mechanism of
               | computation itself.
               | 
               | The result of applying a deterministic function on a
               | random variable cannot have more entropy than the
               | underlying random variable. This is a theorem, one that
               | is trivial enough to not have a name. But you can find
                | solution sets to homework that will prove it for you:
                | https://my.ece.utah.edu/~rchen/courses/homework1_sol_rr.pdf
        
               | zamadatix wrote:
               | The hash, in this case, is just one deterministic way to
               | shorten the CPU counter to few bits which can then be
               | used to increase the entropy of the timestamp by
               | replacing the timestamps stale bits. What's being asked
               | here is not why use some compressed bits of the CPU
               | counter increases the entropy of the timestamp overall.
               | Rather, why you'd use a hash of the entropy providing
               | information to do this since hashes allow for many:1
               | mappings (i.e. allow for decreases in entropy) while
               | never providing better than 1:1 mappings (i.e. preserving
               | entropy). At best, the hash is preserving the entropy of
               | the timestamp and counter bits and, at worst, it is
               | throwing some away. The question that follows: is there a
               | better way that preserves the entropy of the inputs
               | without the risk of throwing some away and, if so, was
               | there some other reason to still use the hash? This is
               | what I took amelius to be asking.
               | 
               | Knowing the timestamp delta is ~30ns then even a 1 THz
               | processor would only execute 30,000 cycles before said
               | timestamp increases and the outcome stays unique anyways.
               | From that perspective, taking the low 16 bits of the
               | cycle counter is already guaranteed to help produce
               | perfectly unique timestamps, and without the risk of
                | introducing hash collisions. It's also significantly
                | easier to compute and monotonic* now, whereas hashes
                | were neither. From that perspective, I too don't see what
               | value the hash is supposed to add to the information
               | being fed to it in this case.
               | 
               | In threaded/distributed/parallel scenarios you may wish
               | to throw the lane/core/node number in the bits as well.
               | In the case having the same timestamp is supposed to be
               | invalid this leaves you a simple deterministic way to
               | tiebreak the situation as part of the creation of the
               | timestamp itself.
               | 
               | *A 10 GHz CPU running for a little over 58 years could
               | risk a 64 bit counter rollover in the middle of a time
               | delta. If that's too close for comfort or you want the
               | code to work on e.g. 32 bit counter systems, you'd need
               | to eat more cycles and registers to set another bit to
               | the outcome of whether current cycle count is < the one
               | at the start of the current timestamp delta.
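The low-bits-of-the-cycle-counter alternative can be sketched the same way; again `time.perf_counter_ns()` is only a stand-in for the real CPU cycle counter:

```python
import time

def timestamp_with_counter_bits() -> int:
    """Append the low 16 bits of a cycle-counter stand-in to the
    coarse timestamp: within one stale ~30ns timestamp the counter
    bits tiebreak deterministically (up to a 16-bit rollover), with
    no hashing required."""
    ts = time.time_ns()
    counter = time.perf_counter_ns()   # stand-in for RDTSC
    return (ts << 16) | (counter & 0xFFFF)
```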
        
               | sfink wrote:
               | Yeah, when I was thinking about the hash, I thought of it
               | as stuffing to fill the unused portion of the number that
               | would look better than using zeroes.
               | 
               | TIMESTAMP010111101010101TSC "looks better" than
               | TIMESTAMP000000000000000TSC even though they contain the
               | same information.
               | 
               | I would drop the hash, it's deceptive.
               | 
               | I don't believe it breaks the monotonicity, though? I
               | mean, it would if it weren't already broken. If you're
               | taking the low 16 bits of the TSC, then a rollover in
               | those 16 bits during the same timestamp will already go
               | backwards. TIMESTAMP0...0 follows TIMESTAMPf...f.
        
         | byteknight wrote:
         | Wouldn't those operations reduce the accuracy of the time
         | stamp?
        
           | crote wrote:
           | Modern CPUs don't really give you accurate nanosecond-scale
           | time stamps anyways. The CPU will randomly speed up or slow
           | down, execute instructions out of order, and even
           | speculatively execute instructions. Not to mention that it'll
           | have a dozen different clocks - which are not guaranteed to
           | be in sync.
        
             | blibble wrote:
             | > The CPU will randomly speed up or slow down
             | 
             | constant_tsc has been a thing for more than a decade
             | 
             | > execute instructions out of order, and even speculatively
             | execute
             | 
             | rdtscp serialises
             | 
             | > Not to mention that it'll have a dozen different clocks -
             | which are not guaranteed to be in sync.
             | 
              | synchronised_tsc has been a thing for about 6 years now
        
               | lazide wrote:
               | None of these are performant, no?
               | 
               | Generally, you can have consistency, speed, or low cost -
               | but not more than two at the same time.
        
               | kprotty wrote:
               | Invariant (Constant) TSC is detectable via `cpuid` and
               | applies to `rdtsc/rdtscp` by default. In that aspect,
               | there's no tradeoff being made there (observable to
               | software) AFAICK.
        
               | lazide wrote:
               | interesting results! i guess it depends in if you
               | consider 35ish cycles expensive or not.
               | 
                | [https://community.intel.com/t5/Software-Tuning-Performance/H...]
        
             | byteknight wrote:
             | This may be coming from a place of ignorance, but if that
             | were the case, then time would drift significantly
             | constantly due to Intel speed step for example. And if that
             | were the source of truth for time, then when your computer
              | is off, it wouldn't be able to keep time. I'm pretty sure
              | they all have real time clock chips in the motherboards.
        
               | nomel wrote:
               | There's time keeping (which uses the real time clock) and
               | high resolution clocks, which are good for relative
               | timers. They're never the same components.
        
           | datenwolf wrote:
           | What is this "accuracy" you're talking about? Between the
           | scheduler hanging over a process's head, ready to yank away
           | its compute time for milliseconds between the syscall to read
           | out the HPTC or the CPU cycle counter, and the process
           | actually writing the timestamp into a variable/in-memory
           | struct, reading the HPTC from the underlying hardware also
           | not being super accurate, and the CPU cycle counter being
            | influenced by frequency scaling, on a bog-standard machine
            | the highest precision you can reasonably expect is on the
            | order of 100us or so.
        
         | imglorp wrote:
         | Integrating a hash might improve (not guarantee) the uniqueness
         | situation but not the monotonicity situation. Right?
        
           | datenwolf wrote:
           | Well, you can always attempt to catch the CPU cycle counter
           | overflow (happens at roughly 10Hz on current machines),
           | adding up the carries and add it to a nanosecond counter
           | shifted up by a few bits. Problem with the CPU cycle counter
           | is, that it's not in lockstep with the HPTC, due to dynamic
           | frequency scaling.
           | 
           | If you really, really, really need system wide, nanosecond
           | precision timestamps, you'll have to resort to dedicated
           | hardware incrementing a counter at the desired rate, and with
           | a well known latency for each read access. On the hardware
           | and driver level you'd have some MMIO port mapped into user
           | space with an identity transform between bus address and
           | process address space.
           | 
           | However this still doesn't solve the problem, of the
           | scheduler being able to throw in several milliseconds between
           | reading the high precision timer value into a register and
           | writing it back into the memory holding the timestamp
           | variable. Seriously, on your typical computer system software
           | derived timestamps at more than 100us resolution are kind of
           | bogus.
        
             | jsolson wrote:
             | Some of this is not current.
             | 
             | Constant-rate TSCs are ~15-20 years old. Synchronized TSCs
             | are at least ten.
             | 
             | Also RDTSC produces a 64-bit result, so it does not
             | overflow at a rate of several Hz.
        
               | datenwolf wrote:
                | It's 64-bit on 64-bit systems. In the world of hard
                | realtime
               | applications there's still a huge armada of systems out
               | there in the field running on 32 bit (think motion
               | control, CNC machines, industrial robotics). If you're
               | developing software that's concerned with this kind of
               | precision, you might find yourself confronted with such
               | "outdated" systems more often, than not.
               | 
               | Also see https://news.ycombinator.com/item?id=36814762
        
         | BiteCode_dev wrote:
         | Or UUID7
        
         | dan-robertson wrote:
         | I don't think this is right. I am pretty doubtful of the
         | discussion of the granularity of the results of clock_gettime,
         | but I failed to find any sources and I don't have a suitable
         | computer to hand to experiment.
         | 
         | But here are two more doubts:
         | 
         | 1. On at least some systems, clock_gettime will be implemented
         | by a vdso call that (on x86) uses rdtsc to give you the time,
         | so you should expect its results to be pretty highly correlated
         | with an rdtsc immediately afterwards, so you aren't really
         | adding useful entropy (whereas you are losing time-locality of
         | your identifiers which may be desirable if they are to end up
         | as primary keys somewhere)
         | 
         | 2. On some systems (eg some AMD processors), rdtsc gave me
         | pretty low precision results (eg rounding to the nearest 10
         | cycles) so you also won't necessarily get the entropy you would
         | expect from the description of the instruction. I failed to
         | find a reference for this too apart from an offhand mention in
         | a paper about timing attacks.
        
       | godelski wrote:
       | Despite the resolution being nanoseconds, what is the actual
       | precision of computer clocks? I can't imagine it is actually
       | nanoseconds. Takes me back to teaching physics labs where I had
       | to hound students to remember that the accuracy of their
       | measuring device is not identical to the smallest number it
       | displays...
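The distinction can actually be probed from userspace (a sketch, Unix-only; the numbers vary by kernel, clock source, and interpreter overhead): `clock_getres` reports the nominal resolution, while the smallest nonzero step between back-to-back reads shows the effective granularity you really get.

```python
import time

def observed_step_ns(samples: int = 200_000) -> int:
    """Smallest nonzero delta between consecutive monotonic clock reads."""
    smallest = 0
    prev = time.monotonic_ns()
    for _ in range(samples):
        now = time.monotonic_ns()
        d = now - prev
        if d > 0 and (smallest == 0 or d < smallest):
            smallest = d
        prev = now
    return smallest

if __name__ == "__main__":
    # Nominal resolution per the kernel, converted to nanoseconds.
    nominal = time.clock_getres(time.CLOCK_MONOTONIC) * 1e9
    print(f"nominal resolution: {nominal:.0f} ns")
    print(f"smallest observed step: {observed_step_ns()} ns")
```

On a typical Linux box the nominal figure is 1 ns while the observed step is tens of nanoseconds, which is exactly the "smallest displayed digit is not the accuracy" point.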
        
         | Rapzid wrote:
         | > I can't imagine it is actually nanoseconds
         | 
         | It's nanoseconds.
        
           | godelski wrote:
           | 1? 10? 100? Those are all nanoseconds. When someone is asking
           | about precision it tends to be a good thing to be precise in
           | your answer.
        
             | wewtyflakes wrote:
             | You were similarly not precise in your skepticism. ;)
        
         | ajb wrote:
         | For devices clocked above 1Ghz it's perfectly possible for the
         | clock to increment every ns, although that doesn't make it
         | accurate to that level, and multi core systems may have clocks
         | that are not synchronised to that level.
         | 
         | ARMv8 (since v8.6) guarantees that its counter increments at
         | 1 GHz; for Intel and earlier ARM it's more complicated.
        
           | zokier wrote:
           | Cycle counting is one thing, but it gets tricky when you have
           | frequency scaling in play. Another problem is that even
           | without freq scaling, CPU clocks are not designed to be
           | super accurate/exact, and the true frequency might vary
           | significantly even if the nominal freq is fixed.
        
             | daniel-cussen wrote:
             | It doesn't matter much that they're not super accurate or
             | exact; they do count ticks and tocks just fine. I wonder
             | whether the two edges are counted equally (interesting
             | question, if I do say so myself about my own question). I
             | know each rising edge is steady with respect to the next
             | rising edge, and each falling edge with the next falling
             | edge, but I think it is also a specification figure of
             | merit that the two be balanced with respect to each
             | other. And Intel, despite its many failures (long ago
             | forecast and long overdue, given the sheer difficulty of
             | their business model and how long they kept it going;
             | they were already saying it would end soon in the late
             | seventies), doesn't make old chips rot like other
             | software companies I could mention. Bits rot, but glass
             | is timeless.
             | 
             | Which brings me back to the point: in my analysis the
             | problem is not the clocks being inaccurate but rather the
             | jitter, which means a single run will not suffice to
             | describe a repeatable, exact clock time for, e.g., an
             | inner loop, which is what is worth bothering with. The
             | minimum jitter attainable currently is one cycle, so you
             | run the same code repeatedly and take the minimum, which
             | is repeatable as a consequence of that low jitter.
             | 
             | In the early nineties it was not so; you'd get the same
             | number of clock cycles again and again and again.
             | 
             | Then it gets tricky, because cycle counts are thresholds
             | within which, if voltages and frequency are proper, the
             | operation will complete deterministically, with a soft
             | error rate no greater than the system as a whole: about
             | one per 30 years of hammering on the chip at sea level.
             | Which is not enough for my tastes, and the jitter is a
             | mess.
             | 
             | I much prefer the GA144: the least jitter of any
             | platform, bar none, which is sensible because it is fully
             | asynchronous logic. There is no clock anywhere in sight
             | until you connect an oscillator, and even then the
             | vibration doesn't rock the system the way one grandfather
             | clock synchronizes with another whose pendulum swings in
             | alignment with its own. On the GA144 it's pretty easy to
             | measure the average-case time of a function down to tens
             | of picoseconds, at which point you have to check that
             | there's no open window letting a draft in. In fact the
             | time trial will tell you such a draft is coming into the
             | house; it happened to me. While not from a parallel
             | universe in spatial dimensions, it is by all means from
             | another universe in the time dimension.
        
       | ChuckMcM wrote:
       | Let me tell you about the time the performance guy figured out
       | one of the big cycle eaters was that every CPU in the SMP system
       | was trying to read the time of day from the same place in memory
       | so they "fixed" that by giving every CPU its own copy of the
       | current time ...
        
       | techNoob123 wrote:
       | At some point doesn't this come down to the ISA? A CPU running at
       | 3GHz gets 3 clock cycles per nanosecond.
       | 
       | I bet there is a fair amount of optimization in the compiler
       | that leads to back-to-back assembly calls reading the clock
       | register. If subsequent time.Now() calls happen within 3 clock
       | cycles of each other, can you really fairly expect unique
       | nanosecond precision...
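The back-to-back scenario is easy to try directly (a sketch; the rate depends heavily on the OS, the clock source, and the call overhead, which in an interpreter usually dwarfs 3 cycles):

```python
import time

def back_to_back_collision_rate(pairs: int = 200_000) -> float:
    """Fraction of immediately consecutive time.time_ns() reads that
    return the exact same nanosecond value."""
    same = sum(1 for _ in range(pairs) if time.time_ns() == time.time_ns())
    return same / pairs

if __name__ == "__main__":
    print(f"identical consecutive reads: {back_to_back_collision_rate():.4%}")
```

On Linux with a fine-grained clock source the Python call overhead alone keeps this near zero; on systems that round the clock to coarser ticks it can be substantial.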
        
         | rwmj wrote:
         | On x86_64, Linux will use RDTSC and adjust it against values
         | read from the vDSO, so it can indeed happen very quickly.
        
       | zokier wrote:
       | I guess I'm old, the macOS behaviour is more in line with my
       | expectations.
       | 
       | But this got me thinking: how feasible would it be to tie the
       | various clock systems of a computer to some reference clock,
       | like a 10 MHz GPSDO? Obviously it wouldn't improve the
       | granularity, but
       | you could ensure that the timestamps are actually accurate.
       | Because otherwise I doubt that random computer clock would be
       | accurate down to 32ns even with NTP.
        
         | mgaunard wrote:
         | Distributing a 10 MHz reference (or PPS) signal requires
         | specialized, expensive hardware and typically doesn't scale
         | well to cover every server. That sort of thing is best left
         | to extreme applications with FPGAs or ASICs; in particular
         | the 10 MHz version, since a lot of commodity hardware only
         | supports the 1 Hz (PPS) one.
         | 
         | There's still an in-between of PPS and NTP: PTP.
        
           | mlichvar wrote:
           | A major problem in synchronization of the system clock is
           | PCIe. Hardware can timestamp PPS signal or PTP/NTP packets
           | with accuracy of a few nanoseconds if everything is well
           | compensated, but the PCIe bus between the CPU and the
           | timestamping HW has a latency of hundreds of nanoseconds with
           | potentially large asymmetry, degrading the accuracy
           | significantly.
           | 
           | Measuring that error is difficult. A possibility is to run a
           | program periodically making random reads from memory
           | (avoiding the CPU cache) to generate a PPS signal on the
           | memory bus, which can be observed with a scope. There is a
           | lot of noise due to other memory activity, RAM refresh, etc.
           | From the configured memory speed and tRCD+tCL timings the
           | uncertainty of the error can be reduced.
           | 
           | This might improve with new hardware. There is a feature
           | called Precision Time Measurement (PTM), which is a hardware
           | implementation of an NTP-like protocol in PCIe, but so far I
           | have seen this working only on some onboard NICs.
        
       | quickthrower2 wrote:
       | Another case of precision but not accuracy.
        
       | dnstalk wrote:
       | I commented out CLOCK_MONOTONIC_RAW because that's not
       | available on FreeBSD, and it looks OK to me. My understanding
       | is that we
       | should see some timestamps repeating if there's a collision,
       | correct? I can't get any collisions...
       | 
       |     > ./clock_gettime_demo/clock_gettime_demo
       |     clock_getres(CLOCK_REALTIME, ...)=1 ns
       |     clock_getres(CLOCK_MONOTONIC, ...)=1 ns
       | 
       |     CLOCK_REALTIME 30 samples:
       |     1689957434662039455            1689957434662039526 (diff=71)
       |     1689957434662039566 (diff=40)  1689957434662039606 (diff=40)
       |     1689957434662039636 (diff=30)  1689957434662039676 (diff=40)
       |     1689957434662039706 (diff=30)  1689957434662039736 (diff=30)
       |     1689957434662039766 (diff=30)  1689957434662039796 (diff=30)
       |     1689957434662039826 (diff=30)  1689957434662039856 (diff=30)
       |     1689957434662039886 (diff=30)  1689957434662039916 (diff=30)
       |     1689957434662039946 (diff=30)  1689957434662039987 (diff=41)
       |     1689957434662040016 (diff=29)  1689957434662040047 (diff=31)
       |     1689957434662040076 (diff=29)  1689957434662040107 (diff=31)
       |     1689957434662040137 (diff=30)  1689957434662040167 (diff=30)
       |     1689957434662040197 (diff=30)  1689957434662040227 (diff=30)
       |     1689957434662040257 (diff=30)  1689957434662040297 (diff=40)
       |     1689957434662040327 (diff=30)  1689957434662040357 (diff=30)
       |     1689957434662040387 (diff=30)  1689957434662040417 (diff=30)
       | 
       |     CLOCK_MONOTONIC 30 samples:
       |     2168262770533974               2168262770534023 (diff=49)
       |     2168262770534054 (diff=31)     2168262770534094 (diff=40)
       |     2168262770534124 (diff=30)     2168262770534164 (diff=40)
       |     2168262770534194 (diff=30)     2168262770534224 (diff=30)
       |     2168262770534254 (diff=30)     2168262770534284 (diff=30)
       |     2168262770534314 (diff=30)     2168262770534344 (diff=30)
       |     2168262770534374 (diff=30)     2168262770534404 (diff=30)
       |     2168262770534435 (diff=31)     2168262770534464 (diff=29)
       |     2168262770534495 (diff=31)     2168262770534524 (diff=29)
       |     2168262770534555 (diff=31)     2168262770534585 (diff=30)
       |     2168262770534615 (diff=30)     2168262770534645 (diff=30)
       |     2168262770534685 (diff=40)     2168262770534715 (diff=30)
       |     2168262770534745 (diff=30)     2168262770534775 (diff=30)
       |     2168262770534805 (diff=30)     2168262770534835 (diff=30)
       |     2168262770534865 (diff=30)     2168262770534895 (diff=30)
        
         | vient wrote:
         | Do you run it on 4 cores simultaneously as the author did?
        
           | dnstalk wrote:
           | I'm running it on real hardware (Ryzen 9 5900X, 12 cores 24
           | threads) and all I'm doing is executing the program once. My
           | understanding is that the program is using
           | runtime.GOMAXPROCS(0) to ensure it's using all available CPU
           | cores because that falls back to runtime.NumCPU ?
        
           | dnstalk wrote:
           | Oh, I was only running the C program not the Go binary which
           | orchestrates it, that's why I wasn't seeing the longer stats.
           | But I'm still getting no duplicate collisions.
           |     running longer zeros test ...
           |     sampled 10000000 pairs; 0 time diff zeros = 0.000000%;
           |       0 nano diff zeros = 0.000000%
           |     starting parallel test 24 goroutines x 10000000
           |       samples ...
           |     10000000 samples per thread; 0 collisions inside any
           |       thread; collisions with other threads, per thread:
           |       0, 945466, 1982688, 2739919, 3334361, 4109772,
           |       4489881, 5157178, 5596508, 5854763, 5937583, 6434076,
           |       6521917, 6932626, 7104428, 7285076, 7514904, 7833317,
           |       7737265, 7723545, 7730441, 8311161, 7965117, 8460173
           |     102297835 final samples; 137702165 total collisions
           |       = 57.375902%; possible duplicate collisions? 0
        
       | qwerty456127 wrote:
       | > it is unsafe to assume that a raw nanosecond timestamp is a
       | unique identifier
       | 
       | It is reasonable to assume that a raw nanosecond timestamp is
       | a unique identifier in cases where you can reasonably expect
       | such identifiers are not generated too often.
       | 
       | I.e. this is OK (and even redundant) for identifying manual
       | interactions a single user would produce. It is probably also
       | OK for labelling manual interactions multiple users would
       | produce, if that's a small team.
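Even for the low-rate case, the standard belt-and-braces trick (also suggested elsewhere in this thread) is to remember the last value handed out and bump past it on a tie. A minimal sketch, with hypothetical names:

```python
import threading
import time

_lock = threading.Lock()
_last_ns = 0

def unique_ns() -> int:
    """Nanosecond timestamp that is also a unique, strictly increasing
    identifier within this process: on a tie (or a clock step
    backwards) it is bumped one nanosecond past the previous value."""
    global _last_ns
    with _lock:
        now = time.time_ns()
        if now <= _last_ns:
            now = _last_ns + 1
        _last_ns = now
        return now

if __name__ == "__main__":
    ids = [unique_ns() for _ in range(100_000)]
    # No duplicates, even when many calls land in the same nanosecond.
    print(len(set(ids)) == len(ids))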
        
         | aaron695 wrote:
         | [dead]
        
       | LAC-Tech wrote:
       | A lot of mention of UUIDv7 in this thread, which is good. But
       | I also wonder what the collision rate for ULIDs is.
        
         | 3cats-in-a-coat wrote:
         | I frankly don't understand how it's good. UUID originally was
         | intended as something you use very sparingly, to name, say, a
         | product SKU maybe, an organization, something like that. Not
         | literally content that collides commonly at the same
         | nanosecond, in the same application, in the same platform/org.
         | 
         | At some point we have to question the sanity of using one
         | single flat address space for everything from the tiniest
         | identifier to the... well, "Universe", as if it makes sense.
         | 
         | We can have registrars for global ids, and we can nest local
         | ids inside them, we can have a hierarchy, and we can have local
         | compact ids, 32, 64 or 128 bit, which will never collide, and
         | be locally cohesive.
         | 
         | So why aren't we doing this? Is it ignorance? Is it because
         | magic is easier than actually figuring out the synchronization
         | and order of things in a system like this (and no,
         | synchronization does NOT imply you need to issue ids strictly
         | linearly)?
         | 
         | Honestly, I'm at a loss for words.
        
           | creatonez wrote:
           | > I frankly don't understand how it's good. UUID originally
           | was intended as something you use very sparingly, to name,
           | say, a product SKU maybe, an organization, something like
           | that. Not literally content that collides commonly at the
           | same nanosecond, in the same application, in the same
           | platform/org.
           | 
           | Wikipedia gives a different history, that it was originally
           | for networked computers - https://en.wikipedia.org/wiki/Unive
           | rsally_unique_identifier#...
        
           | RicardoLuis0 wrote:
           | > So why aren't we doing this?
           | 
           | isn't IPv6 basically that? Just restricted to internet
           | addresses.
        
             | 3cats-in-a-coat wrote:
             | It's sort of this. Although it would've been nice if the
             | size of the IP wasn't restricted, so one day we can add an
             | optional segment on top and connect the whole Milky Way, or
             | something.
        
               | fer wrote:
               | Fun fact: a ping from the center to the edge of the
               | Milky Way would overflow a signed 64-bit nanosecond
               | timestamp.
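A quick sanity check of the arithmetic (assuming roughly 50,000 light-years from the galactic center to the rim): the round trip takes about 100,000 years, which blows well past the ~292-year span of a signed 64-bit nanosecond counter, although a millisecond one (span ~292 million years) would actually survive.

```python
SECONDS_PER_YEAR = 365.25 * 86400

def ping_rtt_seconds(distance_ly: float = 50_000) -> float:
    """Round-trip light time across the given distance, in seconds."""
    return 2 * distance_ly * SECONDS_PER_YEAR  # light covers 1 ly/year

def i64_span_seconds(ticks_per_second: float) -> float:
    """Seconds representable by a signed 64-bit tick counter."""
    return (2**63 - 1) / ticks_per_second

if __name__ == "__main__":
    print(f"round trip: {ping_rtt_seconds() / SECONDS_PER_YEAR:,.0f} years")
    print(f"int64 ns span: {i64_span_seconds(1e9) / SECONDS_PER_YEAR:.0f} years")
    print(f"int64 ms span: {i64_span_seconds(1e3) / SECONDS_PER_YEAR:.2e} years")
```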
        
           | pravus wrote:
           | > We can have registrars for global ids, and we can nest
           | local ids inside them, we can have a hierarchy, and we can
           | have local compact ids, 32, 64 or 128 bit, which will never
           | collide, and be locally cohesive.
           | 
           | We have this. It's called OID (Object Identifier) and is used
           | in X.509, LDAP, SNMP, and many other technologies. A global
           | registry exists (I have an entry) and it's a giant tree.
           | 
           | > So why aren't we doing this? Is it ignorance?
           | 
           | The problem you are solving here is for durable, long-lasting
           | keys. There is also a need to generate large batches of
           | short-lived keys that need to never collide for
           | security/identification purposes. A centralized registry will
           | not work for that and it requires a wholly different
           | technique.
        
             | 3cats-in-a-coat wrote:
             | My point was, let's start with that large OID tree, for
             | example.
             | 
             | And continue this concept downward. Except when it's inside
             | your org, you're the registrar of your namespace's sub-
             | OIDs, and so on. And there's precisely no reason not to
             | have hierarchical sequential ids for everything. You need
             | to generate ids on 100 servers? Good, give each server a
             | namespace. You need to do that in a 100 processes on each
             | server? Good, give a sub-namespace to those too.
             | 
             | And the best of all is that the above-organization OID part
             | of the tree is ONLY needed if you show these ids outside
             | your organization. Otherwise you can use the internal
             | private part of the identifiers.
             | 
             | So what am I missing here? Maybe they need to be hard to
             | guess, so add a random key part to them (distinct from the
             | generated sequence part) as a "password". Done.
        
               | pravus wrote:
               | I'll also have the need to make namespaces dynamic and
               | centrally coordinate their lifecycle along with the rest
               | of my infrastructure as I stand up anything that needs
               | some sort of coordinated ID. So I've just moved this
               | issue to an even larger scope with even more complexity.
               | 
               | What do I gain for this? UUIDs solve all of this because
               | the chance of creating a duplicate is so low it can
               | effectively be ignored. I can namespace UUIDS and create
               | chains of them if needed.
               | 
               | This is the reason both exist. We need both. We can use
               | both.
        
               | 3cats-in-a-coat wrote:
               | When you get used to this basic principle of when you
               | create something, the creator slaps a name on it, it
               | stops being considered a hassle or confusing.
               | 
               | More specifically:
               | 
               | 1. When you create something, the creator names it.
               | 
               | 2. When you bring something you created outside your
               | scope, you prepend your name to it.
               | 
               | It's kind of what we do with (actual) children if you
               | think, slightly less structured, but the idea was always
               | there. Just add recursion to complete the principle.
        
               | pravus wrote:
               | I generate children faster than the birth registry can
               | issue certificates, and Cronus eats them before a
               | certificate could be issued anyway. Nesting namespaces
               | doesn't solve the problem of scale within a single
               | namespace.
        
           | newaccount74 wrote:
           | The problem with hierarchic identifiers is that you need to
           | be extremely diligent when assigning them, and you make it
           | extremely hard to refactor something in the future. But
           | refactoring is something that happens all the time.
           | 
           | For example, you might have "comments" and "posts", and later
           | merge the two entities and call them "notes". Now you have to
           | figure out a way to merge the identifiers -- with UUIDs you
           | don't have issues like this, because they are universally
           | unique.
           | 
           | A lot of developers have found that UUIDv7 provides the
           | properties they want (unique identifiers which sort by
           | creation time), and they don't come with the hassle that
           | other approaches come with.
        
           | aljarry wrote:
           | What's wrong with using UUIDs for things that are commonly
           | used every nanosecond? It's still improbable to get 2 UUIDs
           | for identifier for the same "thing" in the system, even if
           | you generate billions of them per second. It's a pretty good
           | way to get non-colliding IDs without a central registry.
           | That's the main feature.
           | 
           | I've never seen duplicated UUID generated, and if that's the
           | issue, you may think about using cryptographic randomness
           | source to generate it.
           | 
           | UUIDv7 has millisecond precision + 74 bits of randomness,
           | which is _a lot_.
           | 
           | Also, a central registry for IDs seems like a great way to
           | shoot yourself in the foot in case it's down, your network
           | is down, you're on a cellular connection, on a plane...
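A rough birthday-bound check of that claim (per the UUIDv7 layout in RFC 9562, 74 random bits sit alongside the 48-bit millisecond timestamp; the approximation below is only valid while the result is much smaller than 1):

```python
def uuid7_ms_collision_probability(ids_per_ms: int,
                                   random_bits: int = 74) -> float:
    """Approximate probability that at least two of ids_per_ms UUIDv7s
    generated in the same millisecond share all their random bits
    (birthday approximation: n*(n-1)/2 pairs over 2**bits values)."""
    return ids_per_ms * (ids_per_ms - 1) / 2.0 / 2.0 ** random_bits

if __name__ == "__main__":
    # One billion IDs per second = one million per millisecond bucket:
    # the per-millisecond collision probability is still around 1e-11.
    print(f"{uuid7_ms_collision_probability(1_000_000):.2e}")
```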
        
         | newaccount74 wrote:
         | ULIDs are almost the same as UUIDv7, except that they have a
         | few extra random bits where UUIDv7 encodes the version and
         | variant. Every UUIDv7 is a valid ULID, and the main
         | difference is the encoding when displayed in textual form.
        
       | bArray wrote:
       | > I was wondering: how often do nanosecond timestamps collide on
       | modern systems? The answer is: very often, like 5% of all
       | samples, when reading the clock on all 4 physical cores at the
       | same time.
       | 
       | A few things to consider:
       | 
       | 1. This would depend a lot on how frequently you are checking:
       | the more frequent the checks, the more collisions.
       | 
       | 2. There may also be some weirdness where threads doing similar
       | tasks will synchronise or desynchronise to maximize throughput.
       | 
       | 3. Your system may not offer accurate nanosecond time. For this
       | reason some OSes tend not to offer lower than microsecond time,
       | as the clocks simply cannot offer higher usable accuracy. You're
       | also measuring the time to call the function(s), create the
       | timestamp, etc.
       | 
       | A simple solution I had years ago was to append random extra
       | digits of precision to reduce collisions. That breaks most
       | ties. If you have 5% ties at nanosecond resolution, three
       | random "picosecond" digits bring collisions down to 0.005%.
       | You could also seed the random offset with an ID.
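That tie-breaking scheme can be sketched like this (names hypothetical; the suffix is noise, not real precision, and two ties can still collide with probability 1/10**extra_digits):

```python
import random
import time

def padded_ns(extra_digits: int = 3) -> int:
    """Nanosecond timestamp extended with random low-order digits.
    The suffix adds no real precision; it only breaks ties, cutting
    residual collisions by a factor of 10**extra_digits (e.g. a 5%
    nanosecond tie rate drops to roughly 0.005%)."""
    scale = 10 ** extra_digits
    return time.time_ns() * scale + random.randrange(scale)

if __name__ == "__main__":
    print(padded_ns())  # nanoseconds plus 3 fake "picosecond" digits
```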
        
       | p-e-w wrote:
       | If you want unique identifiers, use version 4 (random) UUIDs.
       | Problem solved.
       | 
       | The probability of a collision is roughly the same as the
       | probability of a fully grown dinosaur spontaneously manifesting
       | in your bedroom due to quantum fluctuations.
        
         | beefield wrote:
         | Knowing nothing about UUID v4 generation, I have likely a
         | stupid question. What makes you so confident that all
         | implementations and their entropy sources are flawless enough
         | to make actual collision probability close enough to theory?
        
           | Thiez wrote:
           | What makes us so confident that our database implements ACID
           | correctly, the RAM stores bits correctly, and the disk
           | drivers store the data correctly?
           | 
           | In the end we have to make _some_ assumptions about the
           | correctness of (some of the) components.
        
             | tjoff wrote:
             | We are not confident in any of that. Which is why we
             | mitigate it with checksums, ECC etc.
             | 
             | So depending on the consequences you might opt to reduce
             | the risk or help mitigate the consequences.
             | 
             | To just state that it is about as likely as my coffee maker
             | being a portal to the future isn't very helpful. Poor
             | entropy sources or bugs are not uncommon.
        
         | littlestymaar wrote:
         | > The probability of a collision is roughly the same as the
         | probability of a fully grown dinosaur spontaneously manifesting
         | in your bedroom due to quantum fluctuations.
         | 
         | I'd love to see the math for the probability of the second
         | option.
        
         | amelius wrote:
         | > The probability of a collision is roughly the same as the
         | probability of a fully grown dinosaur spontaneously manifesting
         | in your bedroom due to quantum fluctuations.
         | 
         | But now the probability of bad things happening increased by
         | about a factor of two, which is not acceptable.
        
         | jraph wrote:
         | I will take my chance :-)
         | 
         | More seriously, If you can use them, good old increments are
         | probably best. They are fast and cheap. Especially in a
         | database. They can have privacy/security issues (you could
         | guess things by the values of ids of stuff). UUIDs are better
         | in those case or when you deal with a distributed system.
        
           | p-e-w wrote:
           | Unless the RNG itself is the problem, there's no reason not
           | to. All kinds of civilization-ending catastrophes are
           | _vastly_ more likely than a collision in a space of size
           | 2^122.
        
           | p-e-w wrote:
           | > They can have privacy/security issues (you could guess
           | things by the values of ids of stuff).
           | 
           | Push them through a secure hash function, and that problem is
           | solved too (assuming you can keep the base counter private).
        
             | cacheyourdreams wrote:
             | If you're going to do that then you might as well just use
             | UUID, since you effectively reintroduce the negative
             | aspects of that (infinitesimally minuscule chance of
             | collisions, computation involved in the calculation, etc.)
        
               | p-e-w wrote:
               | The difference is that you can still use sequential IDs
               | internally, while exposing hashed IDs to the outside.
               | This protects your database from collisions under all
               | circumstances, while in the absolute worst case, a single
               | user might experience bugs because two external IDs
               | collide.
        
               | asimpletune wrote:
               | If you're hashing for security reasons, I think you
               | should still maintain a salt.
        
               | cacheyourdreams wrote:
               | Yes, I tend to like this philosophy in database design,
               | of internal sequential ids which are used for joins
               | between tables etc. and an exposed "external reference".
               | But I typically would use a UUID for my external
               | reference rather than a hash of the internal id.
        
               | newaccount74 wrote:
               | Doesn't that just add a whole lot of unnecessary
               | complexity? If elements have multiple IDs, one of which
               | should not be leaked to the outside, that's just asking
               | for trouble in my opinion.
               | 
               | Is generating UUIDv4 or UUIDv7 really too much effort?
               | I'd assume that writing the row to the database takes
               | longer than generating the UUID.
        
               | iforgotpassword wrote:
               | It also means once your hash function leaks for whatever
               | reason or gets brute forced because of whatever weird
               | weakness in your system, it's game over and everybody
               | will forever be able to predict any future ids, guess
               | neighboring ids, etc., unless you're willing to change
               | the hash and invalidate all links to any content on your
               | site.
               | 
               | If I'm in a scenario where I think I need consecutive ids
               | internally and random ones externally, I'll just have two
               | fields in my tables.
        
               | jstanley wrote:
               | You need 2 fields anyway, unless you want to have to
               | brute force your hash function when you need to invert
               | it.
        
               | wongarsu wrote:
               | Store just the sequential id, compute the hash on the
               | edge.
               | 
               | This keeps your database simple and performant, and
               | pushes complexity and work to the backend servers. This
               | can be nice because developers are typically more at home
               | at that layer, and scaling the backend can be a lot
               | easier than scaling your database. But it also comes with
               | the downsides listed in this thread.
        
               | jstanley wrote:
               | That's fine, but when a request comes in referencing only
               | a hash and not an id (because you're not leaking ids to
               | clients), how do you get the id?
        
               | wongarsu wrote:
               | Good point. Back when we did that we just used a
               | reversible hash function (some would call it encryption).
               | There are some simple algorithms meant for encrypting
               | single integers with a reasonable key.
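The "reversible hash" alluded to here can be sketched as a tiny Feistel network over 32-bit integers: any round function gives a bijection, so every id maps to a distinct scrambled value and back. The round keys and round function below are toy placeholders, not a vetted cipher:

```python
KEYS = (0x1A2B, 0x3C4D, 0x5E6F, 0x7081)  # hypothetical secret round keys

def F(half: int, key: int) -> int:
    # Toy round function; invertibility of the whole construction does
    # not depend on F being invertible, only on the Feistel structure.
    return (half * 0x9E37 + key) & 0xFFFF

def encrypt32(x: int) -> int:
    l, r = x >> 16, x & 0xFFFF
    for k in KEYS:
        l, r = r, l ^ F(r, k)       # standard Feistel round
    return (l << 16) | r

def decrypt32(y: int) -> int:
    l, r = y >> 16, y & 0xFFFF
    for k in reversed(KEYS):
        r, l = l, r ^ F(l, k)       # same rounds, keys in reverse
    return (l << 16) | r
```

Because it is a permutation of the 32-bit space, there are no collisions by construction, which is the advantage over a truncated hash.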
        
               | iforgotpassword wrote:
               | I might be misremembering, but didn't YouTube do this in
               | the early days? So yeah, that was what I was thinking of
               | when replying, not a traditional hash function.
        
               | stickfigure wrote:
               | This is a weird proposal. If you're using non-hashed IDs
               | internally and exposing hashed IDs externally, you are
               | going to need to map those (securely hashed) ids back to
               | internal ids when the client hands them to you.
               | 
               | I guess you could do this with complete table scans,
               | hashing the ids and looking for matches, but that would
               | be horribly inefficient. You could maintain your own
               | internal reverse index of hash -> id but now I have to
               | ask what's the point? You aren't saving any storage and
               | you're adding a lot of complexity.
               | 
               | Seems like if you want random unguessable external ids,
               | you're always better off just generating them and using
               | them as primary keys.
               | 
               | Also, you aren't protecting your database "from
               | collisions under all circumstances" - there's no
               | guarantee your hash won't collide even if the input is
               | small.
        
             | fluoridation wrote:
             | A block cipher is better, as it's guaranteed to not have
             | collisions.
        
             | marcosdumay wrote:
             | Hum, no. You can easily hash numbers from 1 up to the value
             | you see and guess the next value.
             | 
             | If you want a secure identifier, make a random 64 or 128
             | bits number (a UUID type 4). And do not use this number as
             | an internal identifier, because identifiers performance is
             | all about predictability and low entropy.
        
               | gnulinux wrote:
                | If you use a robust salt, such as a long, randomly
                | generated one, an attacker won't be able to guess the
                | next hash.
        
         | enyone wrote:
         | the dinosaur is now on my bed, what next?
        
           | quickthrower2 wrote:
           | Enjoy the non heat-death of the universe
        
           | MattPalmer1086 wrote:
           | Close the door and immediately switch to v5.
        
           | lyu07282 wrote:
           | I would just call Randall Munroe, he will know what to do
        
         | fendy3002 wrote:
         | but is it higher or lower if the dinosaur is not fully grown?
        
         | dtech wrote:
         | v7 looks nicer since it solves v4's locality issue and you're
         | still a gazillion times more likely to win the lottery than
         | generate a collision.
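For reference, the draft UUIDv7 layout is simple enough to sketch by hand: a 48-bit big-endian Unix timestamp in milliseconds followed by random bits, with the version and variant bits patched in. This illustrates the draft layout only, it is not a conformant library:

```python
import os
import time
import uuid

def uuid7() -> uuid.UUID:
    # 48-bit ms timestamp gives sortable, locality-friendly prefixes;
    # the remaining 10 random bytes make collisions negligible.
    ms = time.time_ns() // 1_000_000
    b = bytearray(ms.to_bytes(6, "big") + os.urandom(10))
    b[6] = (b[6] & 0x0F) | 0x70    # version 7 in the high nibble
    b[8] = (b[8] & 0x3F) | 0x80    # RFC 4122 variant bits
    return uuid.UUID(bytes=bytes(b))
```

Because the timestamp leads, UUIDs generated in different milliseconds sort in creation order, which is the locality property v4 lacks.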
        
           | p-e-w wrote:
           | Unfortunately, many libraries don't implement it yet.
        
             | gmhafiz wrote:
              | It is still in draft and I've watched the spec change at
              | least once.
        
       | hinkley wrote:
       | I've met too many people who are surprised by millisecond or
       | microsecond timestamp collisions.
       | 
       | My most memorable and least favorite variety of this is when
       | people try to assemble a timestamp from two system calls, one for
       | the most significant digits and a second for the least.
       | 
       | If the smallest digits roll over from 99x to 00x after you read
       | the large digits, due to process preemption, you can create a
        | timestamp for an entity that happens before the entity that
       | caused it to exist. This breaks some code spectacularly (I've
       | seen infinite loops at least twice).
       | 
       | If you haven't memorized this as a thing to always avoid, you end
       | up with tests that pass 99.5% of the time, and it can take
       | someone with very good pattern matching skills to catch that the
       | same test has been red once a week for a month and a half, which
       | is a very long time for a logic bomb to live in CI/CD code before
       | getting fixed.
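The safe pattern for assembling a timestamp from two reads is to re-read the coarse part and retry if it changed between the two reads. A sketch; in Python one would simply call time.time_ns() directly, so the split reads here exist only to illustrate the idea:

```python
import time

def seconds() -> int:
    # Coarse call: most significant digits.
    return int(time.time())

def nanos_in_second() -> int:
    # Fine call: least significant digits, rolls over every second.
    return time.time_ns() % 1_000_000_000

def safe_timestamp_ns() -> int:
    # If the coarse clock changed between the two reads, the fine part
    # may have rolled over (e.g. due to preemption), so read again.
    while True:
        hi = seconds()
        lo = nanos_in_second()
        if seconds() == hi:
            return hi * 1_000_000_000 + lo
```

Without the re-read, a rollover between the two calls produces a timestamp up to a full second in the past, which is exactly the bug described above.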
        
       | tsenart wrote:
       | Our Go ULID package has millisecond precision + monotonic random
       | bytes for disambiguation while preserving ordering within the
       | same millisecond. https://github.com/oklog/ulid
        
       | chaoticmass wrote:
       | Reminds me of some Lotus Notes folklore. Apparently they used to
       | use timestamps with a resolution of 1 second as unique IDs for
       | things. When there was a collision, they would just add 1 second.
       | Eventually, you'd end up with items being in the future because
       | there would be so many collisions.
        
       | donatj wrote:
        | Slack's API uses straight nanoseconds for the IDs of their
       | messages which I always found very curious but figured they must
       | have some sort of way on the backend of resolving collisions.
        
         | rocho wrote:
         | A message ID only needs to be unique per workspace, in which
         | case you'd expect very few collisions to begin with, and you
         | could even retry on insert failure with a new timestamp. I
         | don't think that would cause significant performance penalties.
        
       | scandum wrote:
       | I get the current time in microseconds, and increment by one
       | nanosecond for each call between updates.
       | 
       | Problem solved for the next 200 years.
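A sketch of that scheme, assuming all calls are serialized through a lock: the clock is truncated to microseconds, leaving 1,000 spare nanosecond values per microsecond for the collision counter to hide in:

```python
import threading
import time

_lock = threading.Lock()
_last = 0

def unique_ns() -> int:
    # Wall clock truncated to microsecond precision, stored in a
    # nanosecond-sized field; calls within the same microsecond are
    # disambiguated by bumping the previous value by 1 ns.
    global _last
    with _lock:
        now = time.time_ns() // 1_000 * 1_000
        _last = max(now, _last + 1)
        return _last
```

The result is strictly increasing, so it doubles as a unique id, at the cost of the low three digits being fabricated rather than real nanoseconds.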
        
         | ourmandave wrote:
         | Next article: _Picosecond timestamp collisions are common._
        
         | tourist2d wrote:
         | Problem solved until your boss calls you an idiot for storing
          | fake nanosecond precision.
        
       | JacobSeated wrote:
       | Timestamps should probably never be used as a "unique" id.
        
         | 3cats-in-a-coat wrote:
         | The problem is achieving locality in cohesion/meaning, which
         | usually involves locality in time, which provokes using
         | timestamps as part of the id at the very least. But it's a
            | chain of very lazy thinking IMHO, and it offers a peek at
            | the house of cards that some large systems are built on.
        
           | Mawr wrote:
           | Would a Snowflake ID work for this?
           | https://en.wikipedia.org/wiki/Snowflake_ID
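A Snowflake ID packs a millisecond timestamp, a machine id, and a per-millisecond sequence into one 64-bit integer, so ids are roughly time-ordered (good locality) yet unique without coordination. A sketch of the bit layout, with sequence management omitted:

```python
import time

EPOCH_MS = 1_288_834_974_657   # Twitter's custom Snowflake epoch (2010-11-04)

def snowflake(machine_id: int, seq: int) -> int:
    # Layout: 41-bit ms timestamp | 10-bit machine id | 12-bit sequence.
    # A real generator increments `seq` within a millisecond and resets
    # it when the clock ticks forward.
    ts = time.time_ns() // 1_000_000 - EPOCH_MS
    return (ts << 22) | ((machine_id & 0x3FF) << 12) | (seq & 0xFFF)
```

Because the timestamp occupies the high bits, ids sort by creation time, which is what makes them friendly to database indices.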
        
           | eru wrote:
           | Why do you want this kind of locality anyway?
        
             | traviscj wrote:
             | Database indices.
        
       ___________________________________________________________________
       (page generated 2023-07-21 23:01 UTC)