[HN Gopher] Show HN: High-precision date/time in SQLite
___________________________________________________________________
Show HN: High-precision date/time in SQLite
Author : nalgeon
Score : 226 points
Date : 2024-08-15 11:13 UTC (11 hours ago)
(HTM) web link (antonz.org)
(TXT) w3m dump (antonz.org)
| davidhyde wrote:
| I think it's important to be explicit about whether or not signed
| integers are used. From reading the document it seems that they
| may be signed but they could not be. If they are signed then you
| could have multiple bit strings that represent the same date and
| time which is not great.
| jagged-chisel wrote:
| Definitely signed - "use negative duration to subtract"
|
| But bit pattern is an issue internal to the library. If you can
| find a bug in the code, certainly point it out and offer a fix
| if it's in your skillset.
| sigseg1v wrote:
| I think the negative number here refers to the amount of
| days/etc to subtract (eg. add negative days to subtract, not
| supply a negative date).
|
| However, at the same time it seems to indicate that it stores
| data using sqlites built in number type, which to my
| understanding does not support unsigned? Secondly, the docs
| mention you can store with a range of 290 years and the
| precision is nanoseconds, which if you calculate it out works
| out to about 63 bits of information, suggesting a signed
| implementation.
| tyingq wrote:
| Yes, it's signed...https://www.sqlite.org/datatype3.html
| Each value stored in an SQLite database (or manipulated by
| the database engine) has one of the following storage
| classes: # some omitted... INTEGER. The value
| is a signed integer, stored in 0, 1, 2, 3, 4, 6, or 8 bytes
| depending on the magnitude of the value.
| gcr wrote:
| Subtraction of unsigned negative values still works just fine
| because of two's compliment.
|
| (uint8)(-3) is 253, for example, and (uint8)5-(uint8)253 =
| (uint8)8, corresponding to 5 - (-3)
| kaoD wrote:
| > multiple bit strings that represent the same date and time
|
| How so?
| davidhyde wrote:
| You're right, whether or not the integers are signed has
| nothing to do with the issue above. Unsigned integers have
| the same issue.
|
| Here is an example for signed integers.
|
| These represent zero time but have different representations
| in memory:
|
| Seconds: 2 Nanoseconds: -2,000,000,000 (fits in a 32 bit
| number) Time: zero seconds
|
| Seconds: -2 Nanoseconds: 2,000,000,000 Time: zero seconds
|
| Here is an example for unsigned: Seconds: 1 Nanoseconds: 0
| Time: 1 second
|
| Seconds: 0 Nanoseconds: 1,000,000,000 Time 1 second
| kaoD wrote:
| Thanks, but I'm not gonna pretend that was my point. Dumb
| question from me, I just forgot the context that time was a
| pair of integers and was utterly confused, haha. You're
| spot on!
| zokier wrote:
| I just wish people would stop using the phrase "seconds since
| epoch" (or equivalent) unless that is exactly what they mean.
|
| I wonder what does select
| time_sub(time_date(2011, 11, 19), time_date(1311, 11, 18));
|
| return?
| ralferoo wrote:
| Why do you wish that?
|
| I can think of a few plausible reasons, but the only one that
| is really significant is "what epoch"? In the case of UNIX-
| based systems and systems that try to mimic that behaviour,
| that is well defined. But as you haven't said what your
| complaints are, it's hard to provide any counterpoint or
| justification for why things are as they are.
|
| > time_date(1311, 11, 18)
|
| That isn't defined in the epoch used by most computer systems,
| so all bets are off. Perhaps it'll return MAX_INT, MIN_INT, 0,
| something that's plausible but doesn't take into calendar
| reforms that have no bearing on the epoch being used, or
| perhaps it translates into a different epoch and calculates the
| exact number of seconds, or anything else. One could even argue
| that there are no valid epochs before GMT/UTC because it was
| all just local time before then.
|
| But of course, you can argue either way whether -ve values
| should be supported. Exactly 24 hours before 1970-1-1 0:00:00
| UTC could be reasonably expected to be -86400, on the other
| hand "since" strongly implies positive only.
|
| Other people might have entirely different epochs for different
| reasons, again within the domain it's being used, that's fine
| as long as everyone agrees.
|
| Or did you have some other objection?
| zokier wrote:
| The problem with "seconds since epoch" expression is that
| almost always it doesn't mean literally seconds since epoch,
| but instead some unix-style monstrosity. And it's annoying
| that you need to read some footnote to figure out what
| exactly it means; it's annoying that it is basically a code-
| phrase that you just need to know that it's not supposed to
| be taken literally.
| ralferoo wrote:
| > it doesn't mean literally seconds since epoch, but
| instead some unix-style monstrosity
|
| That "unix-style monstrosity" is literally seconds since
| the UNIX time epoch, which is unambiguously defined as
| starting on 1970-1-1 0:00:00 UTC.
|
| Or it would have been, had leap seconds not been forced
| upon the world in 1972, at which point yes, arguably it's
| no longer "physical earth seconds" since the epoch but
| "UNIX seconds" where a day is defined as exactly 86400 UNIX
| seconds.
|
| In retrospect, it'd have been better if UNIX time was
| exactly a second, and the leap seconds accounted for by the
| tz database, but that didn't exist until over a decade
| after the first leap seconds were added, so probably
| everybody thought it was easier just to take the pragmatic
| option to skip the missing seconds, exactly the same way
| that the rest of the world was doing.
|
| I'm still not sure if that's what your complaint is about,
| as I don't know of time systems defined any other way
| handle this correctly if you were to ask for the time
| difference in seconds between a time before and after a
| leap second.
|
| Maybe a better question would be: what do you think would
| be a better way of defining a representation of a date and
| time, and that would allow for easy calculations and also
| easy transformations into how it's presented for users?
| edflsafoiewq wrote:
| You literally did the exact thing GP is complaining
| about.
| fluoridation wrote:
| It's unclear if that's what they're complaining about,
| but if it is, the reason people just say "seconds since
| epoch" instead of saying what they actually mean is
| precisely because no one wants to get into the weeds of
| leap seconds. POSIX timestamps are a convenient format
| and they represent an idealized "second" of constant
| length that's in a uniform correspondence with the
| rotation of the Earth. Most likely if you're using such a
| representation you won't care that 1723740711 is not
| literally and exactly 1723740711 seconds since
| 1970-01-01T00:00:00Z and the error of a few seconds will
| not even be noticed.
| edflsafoiewq wrote:
| No, that is literally exactly what they just complained
| about.
|
| > And it's annoying that you need to read some footnote
| to figure out what exactly it means; it's annoying that
| it is basically a code-phrase that you just need to know
| that it's not supposed to be taken literally.
|
| Even to point of deferring the real explanation to a
| secondary paragraph.
| fluoridation wrote:
| I mean if they're referring specifically to leap seconds
| or to some other obscure complexity about dealing with
| real time.
| ralferoo wrote:
| It seems you're arguing based on the assumption that real
| time is exactly 365x24x60x60 seconds every year or
| exactly 366x24x60x60 seconds on leap years. It's not.
|
| The problem is that we have a very precise definition of
| a second in terms of decay of atoms (maybe precise is the
| wrong word, as it's the statistical likelihood of x atoms
| decaying given certain conditions, but whatever). The
| problem is arguably that this is over-defined.
|
| There's a good case that a second is actually how it's
| always been defined historically up until 1967 - as
| 1/60th of a minute, which is 1/60th of an hour, which is
| 1/24th of a day. That's what UNIX seconds are. 86400 of
| them in one day. And we have a pretty good idea what a
| solar day is, and have been doing calendars based on them
| for thousands of years.
|
| But if you want to base your times on the decay of
| caesium, then you can do that, but you have to accept
| that it no longer corresponds neatly to a solar day any
| more. The length of a day fluctuates by a couple of
| seconds a day in either direction, which we largely just
| ignore because over time, that mostly cancels out.
| Personally, I don't think leap seconds should ever have
| been introduced - over the last century or so, the earth
| has been rotating "faster" than our idealised second
| based on radioactive decay, so we've added leap seconds.
| But more recently, it's been rotating "slower", and we're
| at the situation where we need negative leap seconds.
| Maybe really, we should have just left it alone and over
| a longer period it'd all have averaged out anyway.
|
| But what's interesting is that apart from the meddling
| with leap seconds, we've decided that a "typical day" has
| exactly 86400 seconds where a second is some constant
| time, even though that isn't true of the reality of our
| planet. Some days are too short when defined this way,
| some days are too long. But on average, this 86400
| seconds is pretty much right.
|
| And arguably, any day that needs a leap second isn't
| "wrong", the problem is actually that we over defined a
| second before we realised that the periodicity of the
| solar day wasn't a constant. I wouldn't advocate trying
| to redefine what a second is again, because actually
| having a constant time second is incredibly useful for
| defining all the other derived SI units. But with that
| usefulness, you also need to be aware that it's not the
| same as the traditional timekeeping second.
|
| But in any case, except for leap seconds, all the world's
| time systems agree on 84000 seconds per day. So, can you
| make the case for why you think UNIX time in particular
| is a problem? And what would you rather have instead?
| ralferoo wrote:
| Again, the same question: what would be a better
| solution?
| nalgeon wrote:
| > If the result exceeds the maximum value that can be stored in
| a Duration, the maximum duration will be returned.
| simontheowl wrote:
| Very cool - definitely an important missing feature in SQlite.
| mynameisash wrote:
| I find the three different time representations/sizes curious
| (eg, what possible use case would need nanosecond precision over
| a span of billions of years?). More confusing is that there's
| pretty extreme time granularity, but only +-290 years range with
| nanosecond precision for time durations?
| nalgeon wrote:
| It works very well for me and thousands of other Go developers.
| That's why I chose this approach.
| g15jv2dp wrote:
| There's no reason it wouldn't "work", the question is "why".
| Having such precise dates obviously comes with some
| compromises (e.g., the representation is larger, or it's
| variable depending on the value which comes with additional
| complexity, etc.). So surely there must be some pros to
| counterbalance the cons. "Because it's what Go does" is an
| answer, but I don't know if it's a convincing one.
| bongodongobob wrote:
| Nice. Smoking cigarettes works for me and millions of others
| but it's still stupid and will take years or decades of your
| life.
| michaelt wrote:
| _> what possible use case would need nanosecond precision over
| a span of billions of years?_
|
| Once you've decided you're using nanosecond precision, a 64-bit
| representation can only cover 584 years which ain't enough. You
| really want at least 2 more bits, so you can represent 2024
| years.
|
| But once you're adding on 2 bits, why not just add on 16 or
| even 32? Then your library can cover the needs of everyone from
| people calculating how it takes light to travel 30cm, to people
| calculating the age of the universe.
|
| That's how I imagine the design decisions went, anyway :)
|
| Of course you can't _really_ provide sub-second accuracy
| without leapsecond support and what does pre-human-civilisation
| leapsecond support even mean?
| alberth wrote:
| Does this handle the special case of timezone changes (and local
| time discontinuity) that Jon Skeet famously documented?
|
| https://stackoverflow.com/questions/6841333/why-is-subtracti...
|
| And computerphile explains so well in their 10-min video:
|
| https://www.youtube.com/watch?v=-5wpm-gesOY
|
| ---
|
| I've long ago learned to never build my own Date/Time nor
| Encryption libraries. There's endless edge cases that can bite
| you hard.
|
| (Which is also why I'm skeptical when I encounter new such
| libraries)
| sltkr wrote:
| This library doesn't deal with the notion of local time at all.
| It's all UTC-based times, possibly with a user-supplied
| timezone offset, but then the hard part of calculating the
| timezone offset must be done by the caller.
|
| I do think the documentation could be a little clearer. The
| author talks about "time zones" but the library only deals with
| time zone offsets. (A time zone is something like
| America/New_York, while a time zone offset is the difference to
| UTC time, which is -14400 seconds for New York today, but will
| be -18000 in a few months due to daylight saving time changes.)
| nalgeon wrote:
| Thanks for the suggestion! True, only fixed offsets are
| supported, not timezone names.
| alberth wrote:
| @nalgeon
|
| Do you plan to address the use cases in the SO post, or
| asked differently - what is the intended use case of this
| library?
|
| I tried to recreate it on your site (which is very cool btw
| in allowing the code to run in browser) and it seems to
| fail and give the wrong time difference.
| select time_compare(time_date(1927, 12, 31, 23, 58, 08, 0,
| 28800000), time_date(1927, 12, 31, 23, 58, 09, 0,
| 28800000));
|
| Results in an answer of '1', which is incorrect.
|
| Please don't take my comments as being negative or
| unappreciated, this is super difficult stuff and anyone who
| tries to make the world an easier place should be thanked
| for that. So thank you.
|
| ----
|
| EDIT: this post explains why the answer isn't "1"
|
| https://stackoverflow.com/questions/6841333/why-is-
| subtracti...
| nalgeon wrote:
| I appreciate your comments, and thank you for trying out
| the extension.
|
| This query returns -1 (minus one, not one), which seems
| correct to me. The first date is before the second:
| select time_compare( time_date(1927, 12, 31,
| 23, 58, 08, 0, 28800000), time_date(1927, 12,
| 31, 23, 58, 09, 0, 28800000) ); -1
| kaoD wrote:
| As discussed in the top-level comment, this library has
| no concept of timezones (only offsets) so the SO link
| does not apply. The time rollback only happened in
| Asia/Shanghai.
| Someone wrote:
| > It's all UTC-based times
|
| Not even that. UTC has leap seconds, which this code doesn't
| handle (FTA: _"The calendrical calculations always assume a
| Gregorian calendar, with no leap seconds"_ )
|
| It copies that from the golang _time_ package, which makes
| the same claim (https://pkg.go.dev/time)
|
| That makes life a lot simpler for the implementer, but
| doesn't that mean you can only reliably use these two
| libraries for computing with durations, not with moments in
| time or vice versa? The moment you start mapping these times
| to real world clocks and adding durations to them, you run
| the risk of getting small (up to about half a minute, at the
| moment) inconsistencies.
| doctorpangloss wrote:
| Another POV is, why build for the SQLite "ecosystem" at
| all?
| devmor wrote:
| Probably because SQLite is one of (if not THE most)
| widely used database implementations for IoT software in
| the world.
|
| That's like asking why someone made a package for
| javascript.
| ErikBjare wrote:
| SQLite is the most widely used database in the world.
| Full stop.
| nottorp wrote:
| > The calendrical calculations always assume a Gregorian
| calendar, with no leap seconds.
|
| That could drift a bit if you can represent a 580 million
| year interval (with millisecond precision), wouldn't it?
| Someone wrote:
| A library that handles leap seconds can't do much better,
| as it's as good as impossible to predict leap seconds,
| certainly not that far ahead.
|
| Luckily, leap seconds are on the way out (https://en.wiki
| pedia.org/wiki/Leap_second#International_prop...)
| akira2501 wrote:
| > A time zone is something like America/New_York
|
| It's US/Eastern. Paul Eggert can call this a "deprecated
| compatibility time" all he wants, but "Eastern Time Zone" is
| the official name of the time zone as maintained by the civil
| time keeping authority.
| out_of_protocol wrote:
| Why not go golang style, unix timestamp as nanoseconds, in signed
| int64. Maybe you can't cover millions of years with nanosecond
| precision, do you really need it?
| commodoreboxer wrote:
| With that precision and size, you can only cover the years from
| 1678 to 2262, which strongly limits your ability to represent
| historical dates and times.
| azornathogron wrote:
| If you're representing dates back into the 1600s you need to
| keep in mind that calendar maths and things like "was this
| year a leap year" become more complicated. The Gregorian
| calendar was introduced in the 1500s but worldwide adoption
| took a long time - for example, the UK didn't adopt it until
| the 1700s. So you've got more than a century where just
| having "a date" isn't really sufficient information to know
| when something happened, you'll need to also know what
| calendar system that date is in.
|
| Overall, this means if you're representing historical dates I
| would question whether a seconds-since-epoch timestamp
| representation is what you want at all, regardless of range
| and precision.
|
| Edit: yes, you can kinda handle this as part of handling
| timezones, but still, it's complicated enough that you may
| want to retain more or different information if you're
| displaying or letting users enter historical dates.
| out_of_protocol wrote:
| > represent historical dates and times.
|
| With nanosecond precision? Just decide what you want to do
| beforehand, i bet even datetime don't make much sense for
| that time period, bare date would suffice. also, you'll
| likely need location, calendar system etc since real dates
| were not that standardized back then
| nalgeon wrote:
| Storing unix timestamp as nanoseconds is not Go's style, but
| you can do just that with this extension.
| select time_to_nano(time_now()); -- 1722979335431295000
| lifeisstillgood wrote:
| This is a sort of lazy Ask HN: but in your experience, what is
| more useful / valuable - nanosecond representation, or years
| outside the nano range of something like 1678-2200
|
| I don't do "proper" science so the value of nanoseconds seems
| limited to very clever experiments (or some financial trade
| tracking that is probalby even more limited in scope).
|
| But being able to represent historical dates seems more likely to
| come up?
|
| Thoughts?
| cyberax wrote:
| Historical dates, for sure.
|
| Simply reducing the precision to 10ns will provide enough range
| in practice.
| rokkamokka wrote:
| A bit like asking if a hammer or a screwdriver is more useful.
| It depends on the work
| cryptonector wrote:
| I so wish that SQLite3 had an extensible type system.
| funny_falcon wrote:
| As a PostgreSQL smallish contributor I just can say: NO, DON'T
| DO THIS!!!!
|
| Extensible type system is a worst thing that could happend with
| database end-user performance. Then one may not short-cut no
| single thing in query parsing and optimization: you must check
| type of any single operand, find correct operator
| implemenation, find correct index operator family/class and
| many more all through querying system catalog. And input/output
| of values are also goes through the functions, stored in system
| catalog. You may not even answer to "select 1" without
| consulting with system catalog.
|
| There should be sane set of builtin types + struct/json like
| way of composition. That is like most DBs do except PostgreSQL.
| And I strongly believe it is right way.
| cryptonector wrote:
| > you must check type of any single operand, find correct
| operator implemenation, find correct index operator
| family/class and many more all through querying system
| catalog.
|
| Not with static typing.
|
| The problem with PG is that it's not fully statically typed
| internally. SQLite3 is worse still, naturally. But a
| statically typed SQL RDBMS should be possible.
| quotemstr wrote:
| Related tangent: databases should track units. If I have a time
| column, I should be able to say a column represents, say,
| durations in float64 seconds. Then I should be able to write
| SELECT * FROM my_table WHERE duration_s >= 2h
|
| and have the database DWIM, converting "2h" to 7200.0 seconds and
| comparing like-for-like during the table scan.
|
| Years ago, I wrote a special-purpose SQL database that had this
| kind of native unit handling, but I've seen nothing before or
| since, and it seems like a gap in the UI ecosystem.
|
| And it shouldn't be for time. We should have the whole inventory
| of units --- mass, volume, information, temperature, and so on.
| Why not? We can also teach the database to reject mathematical
| nonsense, e.g. SELECT 2h + 15kg -- type error!
|
| Doing so would go a long way towards catching analysis errors
| early.
| zokier wrote:
| Postgresql interval units allow already querying with natural-
| like expressions:
| https://www.postgresql.org/docs/current/datatype-datetime.ht...
| n_plus_1_acc wrote:
| What about leap seconds?
| quotemstr wrote:
| The leap second mechanism amounts to a collective agreement
| to rewrite chronological history. It's like a git rebase for
| your clock. Everyone (almost) in practice does math as if
| leap seconds never happened, and the consequent divergence
| from physical time ends up not mattering.
| SonOfLilit wrote:
| ... no?
|
| If we add a leap second at the end of 2025, nothing in 2024
| gets rewritten. Only the future meaning of pointer
| expressions like "12 pm on January 2nd 2025" change their
| value. When I want exactly 48 hours after 12 pm Dec 31, I
| use a leap second independent time representation. But
| since usually I want the same thing everyone calls 12 pm
| Jan 2, I usually use a representation that gives me that.
|
| And I, among many, take meticulous care to do my date math
| (for a bank core system) only in ways that naturally
| support leap seconds.
___________________________________________________________________
(page generated 2024-08-15 23:00 UTC)