https://lemire.me/blog/2023/11/28/parsing-8-bit-integers-quickly/ Skip to content Daniel Lemire's blog Daniel Lemire is a computer science professor at the Data Science Laboratory of the Universite du Quebec (TELUQ) in Montreal. His research is focused on software performance and data engineering. He is a techno-optimist and a free-speech advocate. Menu and widgets * My home page * My papers * My software Join over 12,500 email subscribers: [ ][Go!] You can follow this blog on telegram. You can find me on twitter as @lemire or on Mastodon. Search for: [ ] [Search] Support my work! I do not accept any advertisement. However, you can you can sponsor my open-source work on GitHub. Recent Posts * Parsing 8-bit integers quickly * A simple WebSocket benchmark in Python * A simple WebSocket benchmark in JavaScript: Node.js versus Bun * Science and Technology links (November 12 2023) * Generating arrays at compile-time in C++ with lambdas Recent Comments * Daniel Lemire on A simple WebSocket benchmark in Python * Sergey on A simple WebSocket benchmark in Python * Nedgeva on A simple WebSocket benchmark in JavaScript: Node.js versus Bun * Juho Vepsalainen on A simple WebSocket benchmark in JavaScript: Node.js versus Bun * Rusty on A simple WebSocket benchmark in JavaScript: Node.js versus Bun Pages * A short history of technology * About me * Book recommendations * Cognitive biases * Interviews and talks * My bets * My favorite articles * My favorite quotes * My rules * Newsletter * Predictions * Privacy Policy * Recommended video games * Terms of use * Write good papers Archives Archives [Select Month ] Boring stuff * Log in * Entries feed * Comments feed * WordPress.org Parsing 8-bit integers quickly Suppose that you want to parse quickly 8-bit integers (0, 1, 2, ..., 254, 255) from an ASCII/UTF-8 string. The problem comes up in the simdzone project lead by Jeroen Koekkoek (NLnet Labs). You are given a string and its length: e.g., '22' and length is 2. The naive approach in C might be: int parse_uint8_naive(const char *str, size_t len, uint8_t *num) { uint32_t n = 0; for (size_t i = 0, r = len & 0x3; i < r; i++) { uint8_t d = (uint8_t)(str[i] - '0'); if (d > 9) return 0; n = n * 10 + d; } *num = (uint8_t)n; return n < 256 && len && len < 4; } This code is a C function that takes a string of characters, its length, and a pointer to an unsigned 8-bit integer as input arguments. The function returns a Boolean value indicating whether the input string can be parsed into an unsigned 8-bit integer or not. The function first initializes a 32-bit unsigned integer n to zero, we will store our answer there. The function then iterates over the input string, extracting each digit character from the string and converting it to an unsigned 8-bit integer. The extracted digit is then added to n after being multiplied by 10. This process continues until the end of the string or until the function has processed 4 bytes of the string. Finally, the function assigns the value of n to the unsigned 8-bit integer pointed to by num. It then returns a boolean value indicating whether the parsed integer is less than 256 and the length of the input string is between 1 and 3 characters. If the input string contains any non-digit characters or if the length of the string is greater than 3 bytes, the function returns false. In C++, you could call from_chars: int parse_uint8_fromchars(const char *str, size_t len, uint8_t *num) { auto [p, ec] = std::from_chars(str, str + len, *num); return (ec == std::errc()); } The std::from_chars function takes three arguments: a pointer to the beginning of the input character sequence, a pointer to the end of the input character sequence, and a reference to the output variable that will hold the parsed integer value. The function returns a std::from_chars_result object that contains two members: a pointer to the first character that was not parsed, and an error code that indicates whether the parsing was successful or not. In this function, the std::from_chars function is called with the input string and its length as arguments, along with a pointer to the unsigned 8-bit integer that will hold the parsed value. The function then checks whether the error code returned by std::from_chars is equal to std::errc(), which indicates that the parsing was successful. If the parsing was successful, the function returns true. Otherwise, it returns false. Can we do better than these functions? Suppose that you can always read 4 bytes, even if the string is shorter (i.e., there is a buffer). This is often a safe assumption. Then you can load your input into a 32-bit word and process all bytes at once, in a single register. This often called SWAR: In computer science, SWAR means SIMD within a register, which is a technique for performing parallel operations on data contained in a processor register. Jeroen Koekkoek first came up with a valid SWAR approach, but it was only about 40% than the naive approach in the case where we had unpredictable inputs, and no faster than the naive approach given predictable inputs. Let us examine an approach that might be competitive all around: int parse_uint8_fastswar(const char *str, size_t len, uint8_t *num) { union { uint8_t as_str[4]; uint32_t as_int; } digits; memcpy(&digits.as_int, str, sizeof(digits)); digits.as_int ^= 0x30303030lu; digits.as_int <<= (4 - (len & 0x3)) * 8; uint32_t y = ((UINT64_C(0x640a0100) * digits.as_int)>>32)&0xff; *num = (uint8_t)(y); return (digits.as_int & 0xf0f0f0f0) == 0 && y < 256 && len != 0 && len < 4; } Again, this code is a C function that takes a string of characters, its length, and a pointer to an unsigned 8-bit integer as input arguments. The function returns a boolean value indicating whether the input string can be parsed into an unsigned 8-bit integer or not. The function first initializes a union digits that contains an array of 4 unsigned 8-bit integers and a 32-bit unsigned integer. The function then copies the input string into the 32-bit unsigned integer using the memcpy function. The memcpy function copies the input string into the 32-bit unsigned integer. The function then flips the bits of the 32-bit unsigned integer using the XOR operator and the constant value 0x30303030lu. This operation converts each digit character in the input string to its corresponding decimal value. Indeed, the ASCII characters from 0 to 9 have code point values 0x30 to 0x39 in ASCII. The function then shifts the 32-bit unsigned integer to the left by a certain number of bits, depending on the length of the input string. This operation removes any trailing bytes that were not part of the input string. The next part is where I contributed to the routine. The function then multiplies the 32-bit unsigned integer by the constant value 0x640a0100 using a 64-bit unsigned integer. It is a concise way to do two multiplications (by 100 and by 10) and two sums at once. Observe that 0x64 is 100 and 0x0a is 10. The result of this multiplication is then shifted to the right by 32 bits and masked with the value 0xff. This operation extracts the least significant byte of the 32-bit unsigned integer, which represents the parsed unsigned 8-bit integer. Finally, the function assigns the value of the parsed unsigned 8-bit integer to the unsigned 8-bit integer pointed to by num. It then returns a boolean value indicating whether the parsed integer is less than 256 and the length of the input string is between 1 and 3 characters. To test these functions, I wrote a benchmark. The benchmark uses random inputs, or sequential inputs (0,1,...), and it ends up being very relevant. I use GCC 12 and an Ice Lake (Intel) linux server running at 3.2 GHz. I report the number of millions of numbers parsed by second. random numbers sequential numbers from_chars 145 M/s 255 M/s naive 210 M/s 345 M/s SWAR 450 M/s 450 M/s So the SWAR approach is effectively twice as fast as the naive approach when the inputs are unpredictable. Otherwise, for predictable inputs, we still have a 30% gain according to my numbers. The from_chars results are disappointing all around. Can you do better? The benchmark is available. Credit: I am grateful to Jeroen Koekkoek from NLnet Labs for suggesting this problem. Published by [2ca999] Daniel Lemire A computer science professor at the University of Quebec (TELUQ). View all posts by Daniel Lemire Posted on November 28, 2023November 28, 2023Author Daniel Lemire Categories Leave a Reply Cancel reply Your email address will not be published. To create code blocks or other preformatted text, indent by four spaces: This will be displayed in a monospaced font. The first four spaces will be stripped off, but all other whitespace will be preserved. Markdown is turned off in code blocks: [This is not a link](http://example.com) To create not a block, but an inline code span, use backticks: Here is some inline `code`. For more help see http://daringfireball.net/projects/markdown/syntax [ ] [ ] [ ] [ ] [ ] [ ] [ ] Comment * [ ] Name * [ ] Email * [ ] Website [ ] [ ] Save my name, email, and website in this browser for the next time I comment. Receive Email Notifications? [no, do not subscribe ] [instantly ] Or, you can subscribe without commenting. [Post Comment] [ ] [ ] [ ] [ ] [ ] [ ] [ ] D[ ] You may subscribe to this blog by email. Post navigation Previous Previous post: A simple WebSocket benchmark in Python Terms of use Proudly powered by WordPress