https://lemire.me/blog/2023/11/28/parsing-8-bit-integers-quickly/

Skip to content

Daniel Lemire's blog

Daniel Lemire is a computer science professor at the Data Science
Laboratory of the Universite du Quebec (TELUQ) in Montreal. His
research is focused on software performance and data engineering. He
is a techno-optimist and a free-speech advocate.

Menu and widgets

  * My home page
  * My papers
  * My software

Join over 12,500 email subscribers:

[                    ][Go!]
You can follow this blog on telegram.
You can find me on twitter as @lemire or on Mastodon.
Search for: [                    ] [Search]
Support my work!

I do not accept any advertisement. However, you can you can sponsor
my open-source work on GitHub.

Recent Posts

  * Parsing 8-bit integers quickly
  * A simple WebSocket benchmark in Python
  * A simple WebSocket benchmark in JavaScript: Node.js versus Bun
  * Science and Technology links (November 12 2023)
  * Generating arrays at compile-time in C++ with lambdas

Recent Comments

  * Daniel Lemire on A simple WebSocket benchmark in Python
  * Sergey on A simple WebSocket benchmark in Python
  * Nedgeva on A simple WebSocket benchmark in JavaScript: Node.js
    versus Bun
  * Juho Vepsalainen on A simple WebSocket benchmark in JavaScript:
    Node.js versus Bun
  * Rusty on A simple WebSocket benchmark in JavaScript: Node.js
    versus Bun

Pages

  * A short history of technology
  * About me
  * Book recommendations
  * Cognitive biases
  * Interviews and talks
  * My bets
  * My favorite articles
  * My favorite quotes
  * My rules
  * Newsletter
  * Predictions
  * Privacy Policy
  * Recommended video games
  * Terms of use
  * Write good papers

Archives

Archives [Select Month        ]

Boring stuff

  * Log in
  * Entries feed
  * Comments feed
  * WordPress.org

Parsing 8-bit integers quickly

Suppose that you want to parse quickly 8-bit integers (0, 1, 2, ...,
254, 255) from an ASCII/UTF-8 string. The problem comes up in the
simdzone project lead by Jeroen Koekkoek (NLnet Labs). You are given
a string and its length: e.g., '22' and length is 2. The naive
approach in C might be:

int parse_uint8_naive(const char *str, size_t len, uint8_t *num) {
  uint32_t n = 0;
  for (size_t i = 0, r = len & 0x3; i < r; i++) {
    uint8_t d = (uint8_t)(str[i] - '0');
    if (d > 9)
     return 0;
    n = n * 10 + d;
  }
  *num = (uint8_t)n;
  return n < 256 && len && len < 4;
}

This code is a C function that takes a string of characters, its
length, and a pointer to an unsigned 8-bit integer as input
arguments. The function returns a Boolean value indicating whether
the input string can be parsed into an unsigned 8-bit integer or not.

The function first initializes a 32-bit unsigned integer n to zero,
we will store our answer there. The function then iterates over the
input string, extracting each digit character from the string and
converting it to an unsigned 8-bit integer. The extracted digit is
then added to n after being multiplied by 10. This process continues
until the end of the string or until the function has processed 4
bytes of the string. Finally, the function assigns the value of n to
the unsigned 8-bit integer pointed to by num. It then returns a
boolean value indicating whether the parsed integer is less than 256
and the length of the input string is between 1 and 3 characters.  If
the input string contains any non-digit characters or if the length
of the string is greater than 3 bytes, the function returns false.

In C++, you could call from_chars:

int parse_uint8_fromchars(const char *str, size_t len, uint8_t *num) {
  auto [p, ec] = std::from_chars(str, str + len, *num);
  return (ec == std::errc());
}


The std::from_chars function takes three arguments: a pointer to the
beginning of the input character sequence, a pointer to the end of
the input character sequence, and a reference to the output variable
that will hold the parsed integer value. The function returns
a std::from_chars_result object that contains two members: a pointer
to the first character that was not parsed, and an error code that
indicates whether the parsing was successful or not.

In this function, the std::from_chars function is called with the
input string and its length as arguments, along with a pointer to the
unsigned 8-bit integer that will hold the parsed value. The function
then checks whether the error code returned by std::from_chars is
equal to std::errc(), which indicates that the parsing was
successful. If the parsing was successful, the function returns true.
Otherwise, it returns false.

Can we do better than these functions? Suppose that you can always
read 4 bytes, even if the string is shorter (i.e., there is a
buffer). This is often a safe assumption. Then you can load your
input into a 32-bit word and process all bytes at once, in a single
register. This often called SWAR: In computer science, SWAR means
SIMD within a register, which is a technique for performing parallel
operations on data contained in a processor register.

Jeroen Koekkoek first came up with a valid SWAR approach, but it was
only about 40% than the naive approach in the case where we had
unpredictable inputs, and no faster than the naive approach given
predictable inputs. Let us examine an approach that might be
competitive all around:

int parse_uint8_fastswar(const char *str, size_t len, uint8_t *num) {
  union {
    uint8_t as_str[4];
    uint32_t as_int;
  } digits;
  memcpy(&digits.as_int, str, sizeof(digits));
  digits.as_int ^= 0x30303030lu;
  digits.as_int <<= (4 - (len & 0x3)) * 8;
  uint32_t y = ((UINT64_C(0x640a0100) * digits.as_int)>>32)&0xff;
  *num = (uint8_t)(y);
  return (digits.as_int & 0xf0f0f0f0) == 0 && y < 256 && len != 0
          && len < 4;
}


Again, this code is a C function that takes a string of characters,
its length, and a pointer to an unsigned 8-bit integer as input
arguments. The function returns a boolean value indicating whether
the input string can be parsed into an unsigned 8-bit integer or not.
The function first initializes a union digits that contains an array
of 4 unsigned 8-bit integers and a 32-bit unsigned integer. The
function then copies the input string into the 32-bit unsigned
integer using the memcpy function. The memcpy function copies the
input string into the 32-bit unsigned integer.

The function then flips the bits of the 32-bit unsigned integer using
the XOR operator and the constant value 0x30303030lu. This operation
converts each digit character in the input string to its
corresponding decimal value. Indeed, the ASCII characters from 0 to 9
have code point values 0x30 to 0x39 in ASCII. The function then
shifts the 32-bit unsigned integer to the left by a certain number of
bits, depending on the length of the input string. This operation
removes any trailing bytes that were not part of the input string.

The next part is where I contributed to the routine. The function
then multiplies the 32-bit unsigned integer by the constant value
0x640a0100 using a 64-bit unsigned integer. It is a concise way to do
two multiplications (by 100 and by 10) and two sums at once. Observe
that 0x64 is 100 and 0x0a is 10. The result of this multiplication is
then shifted to the right by 32 bits and masked with the value 0xff.
This operation extracts the least significant byte of the 32-bit
unsigned integer, which represents the parsed unsigned 8-bit integer.
Finally, the function assigns the value of the parsed unsigned 8-bit
integer to the unsigned 8-bit integer pointed to by num. It then
returns a boolean value indicating whether the parsed integer is less
than 256 and the length of the input string is between 1 and 3
characters.

To test these functions, I wrote a benchmark. The benchmark uses
random inputs, or sequential inputs (0,1,...), and it ends up being
very relevant. I use GCC 12 and an Ice Lake (Intel) linux server
running at 3.2 GHz. I report the number of millions of numbers parsed
by second.

           random numbers sequential numbers
from_chars 145 M/s        255 M/s
naive      210 M/s        345 M/s
SWAR       450 M/s        450 M/s

So the SWAR approach is effectively twice as fast as the naive
approach when the inputs are unpredictable. Otherwise, for
predictable inputs, we still have a 30% gain according to my numbers.
The from_chars results are disappointing all around.

Can you do better? The benchmark is available.

Credit: I am grateful to Jeroen Koekkoek from NLnet Labs for
suggesting this problem.

Published by

[2ca999]

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).
View all posts by Daniel Lemire

Posted on November 28, 2023November 28, 2023Author Daniel Lemire
Categories  

Leave a Reply Cancel reply

Your email address will not be published.

To create code blocks or other preformatted text, indent by four
spaces:

    This will be displayed in a monospaced font. The first four
    spaces will be stripped off, but all other whitespace
    will be preserved.
    
    Markdown is turned off in code blocks:
     [This is not a link](http://example.com)

To create not a block, but an inline code span, use backticks:

Here is some inline `code`.

For more help see http://daringfireball.net/projects/markdown/syntax

          [                                             ]
          [                                             ]
          [                                             ]
          [                                             ]
          [                                             ]
          [                                             ]
          [                                             ]
Comment * [                                             ]

Name * [                              ]

Email * [                              ]

Website [                              ]

[ ] Save my name, email, and website in this browser for the next
time I comment.

Receive Email Notifications? [no, do not subscribe      ] 
[instantly    ]
Or, you can subscribe without commenting.

[Post Comment] 

 [                                             ] 
 [                                             ] 
 [                                             ] 
 [                                             ] 
 [                                             ] 
 [                                             ] 
 [                                             ] 
D[                                             ] 

You may subscribe to this blog by email.

Post navigation

Previous Previous post: A simple WebSocket benchmark in Python
Terms of use Proudly powered by WordPress