[HN Gopher] Show HN: High-precision date/time in SQLite
       ___________________________________________________________________
        
       Show HN: High-precision date/time in SQLite
        
       Author : nalgeon
       Score  : 226 points
       Date   : 2024-08-15 11:13 UTC (11 hours ago)
        
 (HTM) web link (antonz.org)
 (TXT) w3m dump (antonz.org)
        
       | davidhyde wrote:
        | I think it's important to be explicit about whether or not signed
        | integers are used. From reading the document it seems that they
        | may be signed, but it isn't stated explicitly. If they are signed,
        | then you could have multiple bit strings that represent the same
        | date and time, which is not great.
        
         | jagged-chisel wrote:
         | Definitely signed - "use negative duration to subtract"
         | 
         | But bit pattern is an issue internal to the library. If you can
         | find a bug in the code, certainly point it out and offer a fix
         | if it's in your skillset.
        
           | sigseg1v wrote:
           | I think the negative number here refers to the amount of
           | days/etc to subtract (eg. add negative days to subtract, not
           | supply a negative date).
           | 
           | However, at the same time it seems to indicate that it stores
           | data using sqlites built in number type, which to my
           | understanding does not support unsigned? Secondly, the docs
           | mention you can store with a range of 290 years and the
           | precision is nanoseconds, which if you calculate it out works
           | out to about 63 bits of information, suggesting a signed
           | implementation.
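            | 
            | A rough back-of-the-envelope check in plain SQLite (the
            | 290-year figure comes from the docs):
            | 
            |     -- nanoseconds in ~290 years
            |     select 290 * 365.25 * 86400 * 1e9;   -- ~9.15e18
            |     -- int64 maximum for comparison
            |     select 9223372036854775807;          -- ~9.22e18
            | 
            | So the stated range uses essentially all 63 value bits of a
            | signed 64-bit integer.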
        
             | tyingq wrote:
              | Yes, it's signed... https://www.sqlite.org/datatype3.html
              | 
              | > Each value stored in an SQLite database (or manipulated
              | > by the database engine) has one of the following storage
              | > classes: [some omitted...] INTEGER. The value is a signed
              | > integer, stored in 0, 1, 2, 3, 4, 6, or 8 bytes depending
              | > on the magnitude of the value.
        
           | gcr wrote:
            | Subtraction of negative values stored as unsigned still works
            | just fine because of two's complement.
            | 
            | (uint8)(-3) is 253, for example, and (uint8)5 - (uint8)253 =
            | (uint8)8, corresponding to 5 - (-3).
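            | 
            | A quick way to see this in SQLite itself, masking to 8 bits
            | to emulate uint8 wraparound:
            | 
            |     select (5 - 253) & 255;   -- 8
            |     select (-3) & 255;        -- 253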
        
         | kaoD wrote:
         | > multiple bit strings that represent the same date and time
         | 
         | How so?
        
           | davidhyde wrote:
            | You're right, whether or not the integers are signed has
            | nothing to do with the issue above. Unsigned integers have
            | the same issue.
            | 
            | Here is an example for signed integers. These represent zero
            | time but have different representations in memory:
            | 
            | Seconds: 2, Nanoseconds: -2,000,000,000 (fits in a 32-bit
            | number) -> Time: zero seconds
            | 
            | Seconds: -2, Nanoseconds: 2,000,000,000 -> Time: zero seconds
            | 
            | Here is an example for unsigned:
            | 
            | Seconds: 1, Nanoseconds: 0 -> Time: 1 second
            | 
            | Seconds: 0, Nanoseconds: 1,000,000,000 -> Time: 1 second
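            | 
            | A minimal sketch of that redundancy in plain SQLite
            | arithmetic, assuming a seconds + nanoseconds pair:
            | 
            |     select 2 + (-2000000000) / 1e9;   -- 0.0
            |     select -2 + 2000000000 / 1e9;     -- 0.0
            |     select 0 + 1000000000 / 1e9;      -- 1.0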
        
             | kaoD wrote:
             | Thanks, but I'm not gonna pretend that was my point. Dumb
             | question from me, I just forgot the context that time was a
             | pair of integers and was utterly confused, haha. You're
             | spot on!
        
       | zokier wrote:
       | I just wish people would stop using the phrase "seconds since
       | epoch" (or equivalent) unless that is exactly what they mean.
       | 
        | I wonder what
        | 
        |     select time_sub(time_date(2011, 11, 19), time_date(1311, 11, 18));
        | 
        | returns?
        
         | ralferoo wrote:
         | Why do you wish that?
         | 
         | I can think of a few plausible reasons, but the only one that
         | is really significant is "what epoch"? In the case of UNIX-
         | based systems and systems that try to mimic that behaviour,
         | that is well defined. But as you haven't said what your
         | complaints are, it's hard to provide any counterpoint or
         | justification for why things are as they are.
         | 
         | > time_date(1311, 11, 18)
         | 
         | That isn't defined in the epoch used by most computer systems,
         | so all bets are off. Perhaps it'll return MAX_INT, MIN_INT, 0,
          | something that's plausible but doesn't take into account
          | calendar reforms that have no bearing on the epoch being used, or
         | perhaps it translates into a different epoch and calculates the
         | exact number of seconds, or anything else. One could even argue
         | that there are no valid epochs before GMT/UTC because it was
         | all just local time before then.
         | 
         | But of course, you can argue either way whether -ve values
         | should be supported. Exactly 24 hours before 1970-1-1 0:00:00
         | UTC could be reasonably expected to be -86400, on the other
         | hand "since" strongly implies positive only.
         | 
         | Other people might have entirely different epochs for different
         | reasons, again within the domain it's being used, that's fine
         | as long as everyone agrees.
         | 
         | Or did you have some other objection?
        
           | zokier wrote:
           | The problem with "seconds since epoch" expression is that
           | almost always it doesn't mean literally seconds since epoch,
           | but instead some unix-style monstrosity. And it's annoying
           | that you need to read some footnote to figure out what
           | exactly it means; it's annoying that it is basically a code-
           | phrase that you just need to know that it's not supposed to
           | be taken literally.
        
             | ralferoo wrote:
             | > it doesn't mean literally seconds since epoch, but
             | instead some unix-style monstrosity
             | 
             | That "unix-style monstrosity" is literally seconds since
             | the UNIX time epoch, which is unambiguously defined as
             | starting on 1970-1-1 0:00:00 UTC.
             | 
             | Or it would have been, had leap seconds not been forced
             | upon the world in 1972, at which point yes, arguably it's
             | no longer "physical earth seconds" since the epoch but
             | "UNIX seconds" where a day is defined as exactly 86400 UNIX
             | seconds.
             | 
              | In retrospect, it'd have been better if a UNIX second was
              | exactly an SI second, with leap seconds accounted for by
              | the tz database, but that didn't exist until over a decade
             | after the first leap seconds were added, so probably
             | everybody thought it was easier just to take the pragmatic
             | option to skip the missing seconds, exactly the same way
             | that the rest of the world was doing.
             | 
             | I'm still not sure if that's what your complaint is about,
              | as I don't know of time systems defined any other way that
             | handle this correctly if you were to ask for the time
             | difference in seconds between a time before and after a
             | leap second.
             | 
             | Maybe a better question would be: what do you think would
             | be a better way of defining a representation of a date and
             | time, and that would allow for easy calculations and also
             | easy transformations into how it's presented for users?
        
               | edflsafoiewq wrote:
               | You literally did the exact thing GP is complaining
               | about.
        
               | fluoridation wrote:
               | It's unclear if that's what they're complaining about,
               | but if it is, the reason people just say "seconds since
               | epoch" instead of saying what they actually mean is
               | precisely because no one wants to get into the weeds of
               | leap seconds. POSIX timestamps are a convenient format
               | and they represent an idealized "second" of constant
               | length that's in a uniform correspondence with the
               | rotation of the Earth. Most likely if you're using such a
               | representation you won't care that 1723740711 is not
               | literally and exactly 1723740711 seconds since
               | 1970-01-01T00:00:00Z and the error of a few seconds will
               | not even be noticed.
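                | 
                | For example, using SQLite's built-in unixepoch()
                | (available since 3.38): 2016-12-31 actually lasted 86401
                | SI seconds because of a leap second, but POSIX-style
                | arithmetic pretends otherwise:
                | 
                |     select unixepoch('2017-01-01') - unixepoch('2016-12-31');
                |     -- 86400, not 86401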
        
               | edflsafoiewq wrote:
               | No, that is literally exactly what they just complained
               | about.
               | 
               | > And it's annoying that you need to read some footnote
               | to figure out what exactly it means; it's annoying that
               | it is basically a code-phrase that you just need to know
               | that it's not supposed to be taken literally.
               | 
                | Even to the point of deferring the real explanation to a
                | secondary paragraph.
        
               | fluoridation wrote:
               | I mean if they're referring specifically to leap seconds
               | or to some other obscure complexity about dealing with
               | real time.
        
               | ralferoo wrote:
               | It seems you're arguing based on the assumption that real
               | time is exactly 365x24x60x60 seconds every year or
               | exactly 366x24x60x60 seconds on leap years. It's not.
               | 
                | The problem is that we have a very precise definition of
                | a second in terms of atomic clocks (strictly, it's the
                | radiation from a hyperfine transition of caesium atoms,
                | not radioactive decay, but whatever). The problem is
                | arguably that this is over-defined.
               | 
                | There's a good case that a second is actually what it had
                | always been defined as historically, up until 1967: 1/60th
                | of a minute, which is 1/60th of an hour, which is 1/24th
                | of a day. That's what UNIX seconds are. 86400 of
               | them in one day. And we have a pretty good idea what a
               | solar day is, and have been doing calendars based on them
               | for thousands of years.
               | 
                | But if you want to base your times on caesium
                | oscillations, then you can do that, but you have to
                | accept that it no longer corresponds neatly to a solar
                | day any more. The length of a day fluctuates by a couple
                | of milliseconds in either direction, which we largely
                | just ignore because over time, that mostly cancels out.
                | Personally, I don't think leap seconds should ever have
                | been introduced - since they were introduced, the earth
                | has mostly been rotating "slower" than our idealised
                | 86400-second day, so we've added leap seconds. But more
                | recently, it's been rotating "faster", and we're
                | approaching the point where we'd need a negative leap
                | second. Maybe really, we should have just left it alone
                | and over a longer period it'd all have averaged out
                | anyway.
               | 
               | But what's interesting is that apart from the meddling
               | with leap seconds, we've decided that a "typical day" has
               | exactly 86400 seconds where a second is some constant
               | time, even though that isn't true of the reality of our
               | planet. Some days are too short when defined this way,
               | some days are too long. But on average, this 86400
               | seconds is pretty much right.
               | 
               | And arguably, any day that needs a leap second isn't
               | "wrong", the problem is actually that we over defined a
               | second before we realised that the periodicity of the
               | solar day wasn't a constant. I wouldn't advocate trying
               | to redefine what a second is again, because actually
               | having a constant time second is incredibly useful for
               | defining all the other derived SI units. But with that
               | usefulness, you also need to be aware that it's not the
               | same as the traditional timekeeping second.
               | 
                | But in any case, except for leap seconds, all the world's
                | time systems agree on 86400 seconds per day. So, can you
               | make the case for why you think UNIX time in particular
               | is a problem? And what would you rather have instead?
        
               | ralferoo wrote:
               | Again, the same question: what would be a better
               | solution?
        
         | nalgeon wrote:
         | > If the result exceeds the maximum value that can be stored in
         | a Duration, the maximum duration will be returned.
        
       | simontheowl wrote:
        | Very cool - definitely an important missing feature in SQLite.
        
       | mynameisash wrote:
       | I find the three different time representations/sizes curious
       | (eg, what possible use case would need nanosecond precision over
       | a span of billions of years?). More confusing is that there's
       | pretty extreme time granularity, but only +-290 years range with
       | nanosecond precision for time durations?
        
         | nalgeon wrote:
         | It works very well for me and thousands of other Go developers.
         | That's why I chose this approach.
        
           | g15jv2dp wrote:
           | There's no reason it wouldn't "work", the question is "why".
           | Having such precise dates obviously comes with some
           | compromises (e.g., the representation is larger, or it's
           | variable depending on the value which comes with additional
           | complexity, etc.). So surely there must be some pros to
           | counterbalance the cons. "Because it's what Go does" is an
           | answer, but I don't know if it's a convincing one.
        
           | bongodongobob wrote:
           | Nice. Smoking cigarettes works for me and millions of others
            | but it's still stupid and will take years or decades off your
            | life.
        
         | michaelt wrote:
         | _> what possible use case would need nanosecond precision over
         | a span of billions of years?_
         | 
         | Once you've decided you're using nanosecond precision, a 64-bit
         | representation can only cover 584 years which ain't enough. You
         | really want at least 2 more bits, so you can represent 2024
         | years.
         | 
         | But once you're adding on 2 bits, why not just add on 16 or
         | even 32? Then your library can cover the needs of everyone from
          | people calculating how long it takes light to travel 30cm, to
          | people calculating the age of the universe.
         | 
         | That's how I imagine the design decisions went, anyway :)
         | 
          | Of course you can't _really_ provide sub-second accuracy
          | without leap second support, and what does pre-human-
          | civilisation leap second support even mean?
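          | 
          | The 584-year figure is easy to sanity-check in plain SQLite:
          | 
          |     -- years covered by signed int64 nanoseconds, either side
          |     -- of the epoch
          |     select 9223372036854775807 / (1e9 * 86400 * 365.25);
          |     -- ~292.3, i.e. ~584 years in total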
        
       | alberth wrote:
       | Does this handle the special case of timezone changes (and local
       | time discontinuity) that Jon Skeet famously documented?
       | 
       | https://stackoverflow.com/questions/6841333/why-is-subtracti...
       | 
       | And computerphile explains so well in their 10-min video:
       | 
       | https://www.youtube.com/watch?v=-5wpm-gesOY
       | 
       | ---
       | 
        | I learned long ago never to build my own date/time or encryption
        | libraries. There are endless edge cases that can bite you hard.
       | 
       | (Which is also why I'm skeptical when I encounter new such
       | libraries)
        
         | sltkr wrote:
         | This library doesn't deal with the notion of local time at all.
         | It's all UTC-based times, possibly with a user-supplied
         | timezone offset, but then the hard part of calculating the
         | timezone offset must be done by the caller.
         | 
         | I do think the documentation could be a little clearer. The
         | author talks about "time zones" but the library only deals with
         | time zone offsets. (A time zone is something like
         | America/New_York, while a time zone offset is the difference to
         | UTC time, which is -14400 seconds for New York today, but will
         | be -18000 in a few months due to daylight saving time changes.)
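          | 
          | (For comparison, SQLite's built-in date functions already
          | accept fixed offsets in the input string and normalize to UTC:
          | 
          |     select datetime('2024-08-15 12:00:00-04:00');
          |     -- 2024-08-15 16:00:00
          | 
          | but they know nothing about America/New_York or DST either.)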
        
           | nalgeon wrote:
           | Thanks for the suggestion! True, only fixed offsets are
           | supported, not timezone names.
        
             | alberth wrote:
             | @nalgeon
             | 
             | Do you plan to address the use cases in the SO post, or
             | asked differently - what is the intended use case of this
             | library?
             | 
              | I tried to recreate it on your site (which is very cool btw
              | in allowing the code to run in browser) and it seems to
              | fail and give the wrong time difference.
              | 
              |     select time_compare(
              |       time_date(1927, 12, 31, 23, 58, 08, 0, 28800000),
              |       time_date(1927, 12, 31, 23, 58, 09, 0, 28800000)
              |     );
              | 
              | Results in an answer of '1', which is incorrect.
             | 
              | Please don't take my comments as being negative or
              | unappreciative; this is super difficult stuff and anyone who
             | tries to make the world an easier place should be thanked
             | for that. So thank you.
             | 
             | ----
             | 
             | EDIT: this post explains why the answer isn't "1"
             | 
             | https://stackoverflow.com/questions/6841333/why-is-
             | subtracti...
        
               | nalgeon wrote:
               | I appreciate your comments, and thank you for trying out
               | the extension.
               | 
                | This query returns -1 (minus one, not one), which seems
                | correct to me. The first date is before the second:
                | 
                |     select time_compare(
                |       time_date(1927, 12, 31, 23, 58, 08, 0, 28800000),
                |       time_date(1927, 12, 31, 23, 58, 09, 0, 28800000)
                |     );
                |     -- -1
        
               | kaoD wrote:
               | As discussed in the top-level comment, this library has
               | no concept of timezones (only offsets) so the SO link
               | does not apply. The time rollback only happened in
               | Asia/Shanghai.
        
           | Someone wrote:
           | > It's all UTC-based times
           | 
           | Not even that. UTC has leap seconds, which this code doesn't
           | handle (FTA: _"The calendrical calculations always assume a
           | Gregorian calendar, with no leap seconds"_ )
           | 
           | It copies that from the golang _time_ package, which makes
           | the same claim (https://pkg.go.dev/time)
           | 
            | That makes life a lot simpler for the implementer, but
            | doesn't that mean you can only reliably use these two
            | libraries for computing with durations or with moments in
            | time, but not for mixing the two? The moment you start
            | mapping these times to real world clocks and adding durations
            | to them, you run the risk of getting small (up to about half
            | a minute, at the moment) inconsistencies.
        
             | doctorpangloss wrote:
             | Another POV is, why build for the SQLite "ecosystem" at
             | all?
        
               | devmor wrote:
                | Probably because SQLite is one of the most (if not THE
                | most) widely used database implementations for IoT
                | software in the world.
               | 
               | That's like asking why someone made a package for
               | javascript.
        
               | ErikBjare wrote:
               | SQLite is the most widely used database in the world.
               | Full stop.
        
             | nottorp wrote:
             | > The calendrical calculations always assume a Gregorian
             | calendar, with no leap seconds.
             | 
              | That could drift a bit if you can represent a 580 million
              | year interval (with millisecond precision), couldn't it?
        
               | Someone wrote:
               | A library that handles leap seconds can't do much better,
               | as it's as good as impossible to predict leap seconds,
               | certainly not that far ahead.
               | 
                | Luckily, leap seconds are on the way out
                | (https://en.wikipedia.org/wiki/Leap_second#International_prop...)
        
           | akira2501 wrote:
           | > A time zone is something like America/New_York
           | 
           | It's US/Eastern. Paul Eggert can call this a "deprecated
           | compatibility time" all he wants, but "Eastern Time Zone" is
           | the official name of the time zone as maintained by the civil
           | time keeping authority.
        
       | out_of_protocol wrote:
        | Why not go the Go (golang) route: a unix timestamp in
        | nanoseconds, stored in a signed int64? Maybe you can't cover
        | millions of years with nanosecond precision, but do you really
        | need to?
        
         | commodoreboxer wrote:
         | With that precision and size, you can only cover the years from
         | 1678 to 2262, which strongly limits your ability to represent
         | historical dates and times.
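          | 
          | Roughly: int64 nanoseconds reach about +-292 years from 1970,
          | and SQLite's own date functions can show the approximate
          | endpoints:
          | 
          |     select datetime(-9223372036854775807 / 1000000000, 'unixepoch');
          |     -- around 1677-09-21
          |     select datetime(9223372036854775807 / 1000000000, 'unixepoch');
          |     -- around 2262-04-11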
        
           | azornathogron wrote:
           | If you're representing dates back into the 1600s you need to
           | keep in mind that calendar maths and things like "was this
           | year a leap year" become more complicated. The Gregorian
           | calendar was introduced in the 1500s but worldwide adoption
           | took a long time - for example, the UK didn't adopt it until
           | the 1700s. So you've got more than a century where just
           | having "a date" isn't really sufficient information to know
           | when something happened, you'll need to also know what
           | calendar system that date is in.
           | 
           | Overall, this means if you're representing historical dates I
           | would question whether a seconds-since-epoch timestamp
           | representation is what you want at all, regardless of range
           | and precision.
           | 
           | Edit: yes, you can kinda handle this as part of handling
           | timezones, but still, it's complicated enough that you may
           | want to retain more or different information if you're
           | displaying or letting users enter historical dates.
        
           | out_of_protocol wrote:
           | > represent historical dates and times.
           | 
            | With nanosecond precision? Just decide what you want to do
            | beforehand. I bet even a datetime doesn't make much sense for
            | that time period; a bare date would suffice. Also, you'll
            | likely need location, calendar system, etc., since real dates
            | were not that standardized back then.
        
         | nalgeon wrote:
         | Storing unix timestamp as nanoseconds is not Go's style, but
         | you can do just that with this extension.
          | 
          |     select time_to_nano(time_now());
          |     -- 1722979335431295000
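          | 
          | You can cross-check such a value against SQLite's built-in
          | date functions by truncating it to seconds:
          | 
          |     select datetime(1722979335431295000 / 1000000000, 'unixepoch');
          |     -- 2024-08-06 21:22:15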
        
       | lifeisstillgood wrote:
        | This is a sort of lazy Ask HN, but in your experience, what is
        | more useful / valuable: nanosecond representation, or years
        | outside the nano range of something like 1678-2200?
        | 
        | I don't do "proper" science, so the value of nanoseconds seems
        | limited to very clever experiments (or some financial trade
        | tracking that is probably even more limited in scope).
       | 
       | But being able to represent historical dates seems more likely to
       | come up?
       | 
       | Thoughts?
        
         | cyberax wrote:
         | Historical dates, for sure.
         | 
         | Simply reducing the precision to 10ns will provide enough range
         | in practice.
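          | 
          | For instance, at 10 ns resolution a signed int64 covers roughly
          | +-2,900 years around the epoch:
          | 
          |     select 9223372036854775807 / (1e8 * 86400 * 365.25);
          |     -- ~2922.7 years each way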
        
         | rokkamokka wrote:
         | A bit like asking if a hammer or a screwdriver is more useful.
         | It depends on the work
        
       | cryptonector wrote:
       | I so wish that SQLite3 had an extensible type system.
        
         | funny_falcon wrote:
          | As a smallish PostgreSQL contributor I can only say: NO, DON'T
          | DO THIS!!!!
          | 
          | An extensible type system is the worst thing that could happen
          | to database end-user performance. You can no longer short-cut
          | a single thing in query parsing and optimization: you must
          | check the type of every operand, find the correct operator
          | implementation, find the correct index operator family/class,
          | and much more, all by querying the system catalog. Input and
          | output of values also go through functions stored in the system
          | catalog. You can't even answer "select 1" without consulting
          | the system catalog.
          | 
          | There should be a sane set of builtin types plus a struct/JSON-
          | like way of composition. That is what most DBs do, except
          | PostgreSQL. And I strongly believe that is the right way.
        
           | cryptonector wrote:
           | > you must check type of any single operand, find correct
           | operator implemenation, find correct index operator
           | family/class and many more all through querying system
           | catalog.
           | 
           | Not with static typing.
           | 
           | The problem with PG is that it's not fully statically typed
           | internally. SQLite3 is worse still, naturally. But a
           | statically typed SQL RDBMS should be possible.
        
       | quotemstr wrote:
       | Related tangent: databases should track units. If I have a time
       | column, I should be able to say a column represents, say,
       | durations in float64 seconds. Then I should be able to write
        | 
        |     SELECT * FROM my_table WHERE duration_s >= 2h
       | 
       | and have the database DWIM, converting "2h" to 7200.0 seconds and
       | comparing like-for-like during the table scan.
       | 
       | Years ago, I wrote a special-purpose SQL database that had this
       | kind of native unit handling, but I've seen nothing before or
       | since, and it seems like a gap in the UI ecosystem.
       | 
        | And it shouldn't just be for time. We should have the whole
        | inventory of units --- mass, volume, information, temperature,
        | and so on. Why not? We can also teach the database to reject
        | mathematical nonsense, e.g.
        | 
        |     SELECT 2h + 15kg -- type error!
       | 
       | Doing so would go a long way towards catching analysis errors
       | early.
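        | 
        | (Today the closest you can get in plain SQLite is doing the unit
        | conversion by hand, with the unit carried in the column name:
        | 
        |     SELECT * FROM my_table WHERE duration_s >= 2 * 3600.0;  -- "2h"
        | 
        | which is exactly the kind of silent convention a unit-aware type
        | system could check for you.)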
        
         | zokier wrote:
          | PostgreSQL's interval type already allows querying with
          | natural-ish expressions:
          | https://www.postgresql.org/docs/current/datatype-datetime.ht...
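          | 
          | For example (PostgreSQL; the events table and duration column
          | here are made up):
          | 
          |     select * from events where duration >= interval '2 hours';
          |     select interval '2 hours' + interval '30 minutes';  -- 02:30:00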
        
         | n_plus_1_acc wrote:
         | What about leap seconds?
        
           | quotemstr wrote:
           | The leap second mechanism amounts to a collective agreement
           | to rewrite chronological history. It's like a git rebase for
           | your clock. Everyone (almost) in practice does math as if
           | leap seconds never happened, and the consequent divergence
           | from physical time ends up not mattering.
        
             | SonOfLilit wrote:
             | ... no?
             | 
             | If we add a leap second at the end of 2025, nothing in 2024
              | gets rewritten. Only pointer expressions that refer to
              | times after the leap second, like "12 pm on January 2nd
              | 2026", change their value. When I want exactly 48 hours
              | after 12 pm Dec 31, I
             | use a leap second independent time representation. But
             | since usually I want the same thing everyone calls 12 pm
             | Jan 2, I usually use a representation that gives me that.
             | 
             | And I, among many, take meticulous care to do my date math
             | (for a bank core system) only in ways that naturally
             | support leap seconds.
        
       ___________________________________________________________________
       (page generated 2024-08-15 23:00 UTC)