https://lemire.me/blog/2022/09/14/escaping-strings-faster-with-avx-512/ Skip to content Daniel Lemire's blog Daniel Lemire is a computer science professor at the University of Quebec (TELUQ) in Montreal. His research is focused on software performance and data engineering. He is a techno-optimist and a free-speech advocate. Menu and widgets * My home page * My papers * My software Subscribe Join 12,500 subscribers: Email Address [ ] [ ] [Subscribe by email] You can also follow this blog on telegram. Search for: [ ] [Search] Support my work! I do not accept any advertisement. However, you can support the blog with donations through paypal. Please consider getting in touch if you are a supporter so that I can thank you. You can also support my work on GitHub. Recent Posts * Escaping strings faster with AVX-512 * Science and Technology links (September 12 2022) * Catching sanitizer errors programmatically * "Hello world" is slower in C++ than in C (Linux) * Science and Technology links (August 7 2022) Recent Comments * Leo Boytsov on Science and Technology links (September 12 2022) * Daniel Lemire on Comparing strtod with from_chars (GCC 12) * Tobin Baker on How fast does interpolation search converge? * Anupam Kapoor on Comparing strtod with from_chars (GCC 12) * Dominic Amann on "Hello world" is slower in C++ than in C (Linux) Pages * A short history of technology * About me * Book recommendations * Cognitive biases * Interviews and talks * My bets * My favorite articles * My favorite quotes * My readers * My sayings * Predictions * Recommended video games * Terms of use * Write good papers Archives Archives [Select Month ] Boring stuff * Log in * Entries feed * Comments feed * WordPress.org Escaping strings faster with AVX-512 When programming, we often have to 'escape' strings. A standard way to do it is to insert the backslash character (\) before some characters such as the double quote. For example, the string my title is "La vie" becomes my title is \"La vie\" A simple routine in C++ to escape a string might look as follows: for (...) { if ((*in == '\\') || (*in == '"')) { *out++ = '\\'; } *out++ = *in; } Such a character-by-character approach is unlikely to provide the best possible performance on modern hardware. Recent Intel processors have fast instructions (AVX-512) that are well suited for such problems. I decided to sketch a solution using Intel intrinsic functions. The routine goes as follows: 1. I use two constant registers containing 64 copies of the backslash character and 64 copies of the quote characters. 2. I start a loop by loading 32 bytes from the input. 3. I expands these 32 bytes into a 64 byte register, interleaving zero bytes. 4. I copy these bytes with the quotes and backslash characters. 5. From the resulting mask, I then construct (by shifting and blending) escaped characters. 6. I 'compress' the result, removing the zero bytes that appear before the unescaped characters. 7. I advance the output pointer by the number of written bytes and I continue the loop. The C++ code roughly looks like this... __m512i solidus = _mm512_set1_epi8('\\'); __m512i quote = _mm512_set1_epi8('"'); for (; in + 32 <= finalin; in += 32) { __m256i input = _mm256_loadu_si256(in); __m512i input1 = _mm512_cvtepu8_epi16(input); __mmask64 is_solidus = _mm512_cmpeq_epi8_mask(input1, solidus); __mmask64 is_quote = _mm512_cmpeq_epi8_mask(input1, quote); __mmask64 is_quote_or_solidus = _kor_mask64(is_solidus, is_quote); __mmask64 to_keep = _kor_mask64(is_quote_or_solidus, 0xaaaaaaaaaaaaaaaa); __m512i shifted_input1 = _mm512_bslli_epi128(input1, 1); __m512i escaped = _mm512_mask_blend_epi8(is_quote_or_solidus, shifted_input1, solidus); _mm512_mask_compressstoreu_epi8(out, to_keep, escaped); out += _mm_popcnt_u64(_cvtmask64_u64(to_keep)); } This code can be greatly improved. Nevertheless, it is a good first step. What are the results an Intel icelake processor using GCC 11 (Linux) ? A simple benchmark indicates a 5x performance boost compared to a naive implementation: regular code 0.6 ns/character AVX-512 code 0.1 ns/character It looks quite encouraging ! My source code is available. I require a recent x64 processor with AVX-512 VBMI2 support. Published by [2ca999] Daniel Lemire A computer science professor at the University of Quebec (TELUQ). View all posts by Daniel Lemire Posted on September 14, 2022September 14, 2022Author Daniel Lemire Categories Leave a Reply Cancel reply Your email address will not be published. The comment form expects plain text. If you need to format your text, you can use HTML elements such strong, blockquote, cite, code and em. For formatting code as HTML automatically, I recommend tohtml.com. [ ] [ ] [ ] [ ] [ ] [ ] [ ] Comment * [ ] Name * [ ] Email * [ ] Website [ ] [ ] Save my name, email, and website in this browser for the next time I comment. Receive Email Notifications? [no, do not subscribe ] [instantly ] Or, you can subscribe without commenting. [Post Comment] [ ] [ ] [ ] [ ] [ ] [ ] [ ] D[ ] You may subscribe to this blog by email. Post navigation Previous Previous post: Science and Technology links (September 12 2022) Terms of use Proudly powered by WordPress