https://lemire.me/blog/2023/07/01/parsing-time-stamps-faster-with-simd-instructions/

Skip to content

Daniel Lemire's blog

Daniel Lemire is a computer science professor at the Data Science
Laboratory of the Universite du Quebec (TELUQ) in Montreal. His
research is focused on software performance and data engineering. He
is a techno-optimist and a free-speech advocate.

Menu and widgets

  * My home page
  * My papers
  * My software

Join over 12,500 email subscribers:

[                    ][Go!]
You can follow this blog on telegram.
You can find me on twitter as @lemire or on Mastodon.
Search for: [                    ] [Search]
Support my work!

I do not accept any advertisement. However, you can you can sponsor
my open-source work on GitHub.

Recent Posts

  * Parsing time stamps faster with SIMD instructions
  * Dynamic bit shuffle using AVX-512
  * Science and Technology links (June 25 2023)
  * Citogenesis in science and the importance of real problems
  * Science and Technology links (June 11 2023)

Recent Comments

  * aqrit on Parsing time stamps faster with SIMD instructions
  * Daniel on Parsing time stamps faster with SIMD instructions
  * Sergey Bronnikov on Parsing time stamps faster with SIMD
    instructions
  * Davidmh on Parsing time stamps faster with SIMD instructions
  * Arthur Chance on Parsing time stamps faster with SIMD
    instructions

Pages

  * A short history of technology
  * About me
  * Book recommendations
  * Cognitive biases
  * Interviews and talks
  * My bets
  * My favorite articles
  * My favorite quotes
  * My readers
  * My rules
  * Newsletter
  * Predictions
  * Privacy Policy
  * Recommended video games
  * Terms of use
  * Write good papers

Archives

Archives [Select Month        ]

Boring stuff

  * Log in
  * Entries feed
  * Comments feed
  * WordPress.org

Parsing time stamps faster with SIMD instructions

In software, it is common to represent time as a time-stamp string.
It is usually specified by a time format string. Some standards use
the format %Y%m%d%H%M%S meaning that we print the year, the month,
the day, the hours, the minutes and the seconds. The current time as
I write this blog post would be 20230701205436 as a time stamp in
this format. It is convenient because it is short, easy to read and
if you sort the strings lexicographically, you also sort them
chronologically.

You can generate time stamps using any programming language. In C,
the following program will print the current time (universal, not
local time):

#include <time.h>
#include <stdio.h>
int main() {
  char buffer[15];
  struct tm timeinfo;
  time_t rawtime;
  time(&rawtime);
  gmtime_r(&rawtime, &timeinfo);
  size_t len = strftime(buffer, 15, "%Y%m%d%H%M%S", &timeinfo);
  buffer[14] = '\0';
  puts(buffer);
}

We are interested in the problem of parsing these strings. In
practice, this means that we want to convert them to an integer
presenting the number of seconds since the Unix epoch. The Unix epoch
is January 1st 1970. For my purposes, I will consider the time to be
an unsigned 32-bit integer so we can represent time between 1970 and
2106. It is not difficult to switch over to a 64-bit integer or to
signed integers.

The way you typically solve this problem is to use something like the
C function strptime. Can we do better?

Modern processors have fast instructions that operate on several
words at once, called SIMD instructions. We have a block of 14
characters. Let us assume that we can read 16 characters safely,
ignoring the content of the leftover characters.

We load the block of digits in a SIMD register. We subtract 0x30 (the
code point value of the character '0'), and all bytes values should
be between 0 and 9, inclusively. We know that some character must be
smaller than 9, for example, we generally cannot have more than 59
seconds and never 60 seconds, in the time stamp string. So one
character must be between 0 and 5. Similarly, we start the hours at
00 and end at 23, so one character must be between 0 and 2. We do a
saturating subtraction of the maximum: the result of such a
subtraction should be zero if the value is no larger. We then use a
special instruction to multiply one byte by 10, and sum it up with
the next byte, getting a 16-bit value. We then repeat the same
approach as before, checking that the result is not too large.

The code might look as follow using Intel intrinsic functions:

 __m128i v = _mm_loadu_si128((const __m128i *)date_string);
v = _mm_sub_epi8(v, _mm_set1_epi8(0x30));
__m128i limit =
_mm_setr_epi8(9, 9, 9, 9, 1, 9, 3, 9, 2, 9, 5, 9, 5, 9, -1, -1);
__m128i abide_by_limits = _mm_subs_epu8(v, limit); // must be all zero
const __m128i weights = _mm_setr_epi8(
10, 1, 10, 1, 10, 1, 10, 1, 10, 1, 10, 1, 10, 1, 0, 0);
v = _mm_maddubs_epi16(v, weights);
__m128i limit16 =
_mm_setr_epi16(99,99, 12, 31, 23, 59, 59, -1);
__m128i abide_by_limits16 = _mm_subs_epu16(v, limit16);
__m128i limits = _mm_or_si128(abide_by_limits16,abide_by_limits);
if (!_mm_test_all_zeros(limits, limits)) {
  return false;
}


It does not get all the parsing done, but at this point, you have the
months, days, hours, minutes and seconds as valid binary integer
values. The year is parsed in two components (the first two digits,
and the next two digits).

We can just use standard C code for the result.

Is it fast? I wrote a benchmark that I compile using GCC 12 on an
Intel Ice Lake Linux server.

                         instructions per stamp time per stamp
standard C with strptime 700                    46
SIMD approach            65                     7.9

We use about 10 times fewer instructions, and we go 6 times faster.
That is not bad, but I suspect it is not nearly optimal.

The source code is available.

Credit: Thanks to Jeroen Koekkoek from NLnetLabs for initial work and
for proposing the problem, and to @aqrit for sketching the current
code.

 

Published by

[2ca999]

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).
View all posts by Daniel Lemire

Posted on July 1, 2023July 2, 2023Author Daniel LemireCategories  

7 thoughts on "Parsing time stamps faster with SIMD instructions"

 1. [e03257] -.- says:
    July 2, 2023 at 1:29 am

    The SSE code looks good. Possible tweaks (probably won't show any
    difference in benchmarks, but may have theoretical advantages):

    replace _mm_sub_epi8 with _mm_xor_si128 - latter is commutative,
    so may give compiler more freedom with ordering (e.g. use load-op
    on the source if the 0x30 vector is already in a register). Some
    CPUs may have more ports for bitwise ops over arithmetic
    maddubs has longish latency; the _mm_subs_epu16 could actually be
    done before it if you use a BCD-like representation for limit16
    which might help ILP

    Reply
 2. [a133ac] Duck says:
    July 2, 2023 at 7:32 am

    I solved a very similar problem on Stackoverflow some time ago.
    Its speed was < 1ns per time stamp.

    https://stackoverflow.com/questions/75680256/
    most-insanely-fast-way-to-convert-yymmdd-hhmmss-timestamp-to-uint64-t-number

    Reply
 3. [07e182] Arthur Chance says:
    July 2, 2023 at 1:35 pm

    we cannot have more than 59 seconds and never 60 seconds

    A slight nitpick: technically the seconds field can be '60' at
    23:59 on the 30th of June or 31st of December if there's a
    positive leap second.

    Reply
     1. [2f4c56] aqrit says:
        July 2, 2023 at 9:01 pm

        According to my reading of the linked RFC 4034 draft:

        A value of 60+ seconds is explicitly forbidden.

        The field being serialized is a Unix Timestamp. Unix Time
        explicitly ignores leap seconds.
        Converting Unix Time to UTC will never yield a leap second,
        as it can not be represented.

        Reply
 4. [18a30c] Davidmh says:
    July 2, 2023 at 1:57 pm

    As Arthur indicates, you are ignoring all the leap seconds, which
    explains at least some of the reasons why the library version is
    slower.
    Not enough to get to 10 times more instructions, but still not a
    fair comparison.

    Reply
 5. [fc8788] Sergey Bronnikov says:
    July 2, 2023 at 4:30 pm

    SQL standard allows 62 seconds (0-61) in a minute, see https://
    twitter.com/noop_noob/status/1166982640118845442

    Reply
 6. [0b6cf4] Daniel says:
    July 2, 2023 at 7:56 pm

    Just wanted to say hello and I'd wish I understood half of the
    code shared

    Reply

Leave a Reply Cancel reply

Your email address will not be published.

To create code blocks or other preformatted text, indent by four
spaces:

    This will be displayed in a monospaced font. The first four
    spaces will be stripped off, but all other whitespace
    will be preserved.
    
    Markdown is turned off in code blocks:
     [This is not a link](http://example.com)

To create not a block, but an inline code span, use backticks:

Here is some inline `code`.

For more help see http://daringfireball.net/projects/markdown/syntax

          [                                             ]
          [                                             ]
          [                                             ]
          [                                             ]
          [                                             ]
          [                                             ]
          [                                             ]
Comment * [                                             ]

Name * [                              ]

Email * [                              ]

Website [                              ]

[ ] Save my name, email, and website in this browser for the next
time I comment.

Receive Email Notifications? [no, do not subscribe      ] 
[instantly    ]
Or, you can subscribe without commenting.

[Post Comment] 

 [                                             ] 
 [                                             ] 
 [                                             ] 
 [                                             ] 
 [                                             ] 
 [                                             ] 
 [                                             ] 
D[                                             ] 

You may subscribe to this blog by email.

Post navigation

Previous Previous post: Dynamic bit shuffle using AVX-512
Terms of use Proudly powered by WordPress