https://lemire.me/blog/2023/11/07/generating-arrays-at-compile-time-in-c-with-lambdas/ Skip to content Daniel Lemire's blog Daniel Lemire is a computer science professor at the Data Science Laboratory of the Universite du Quebec (TELUQ) in Montreal. His research is focused on software performance and data engineering. He is a techno-optimist and a free-speech advocate. Menu and widgets * My home page * My papers * My software Join over 12,500 email subscribers: [ ][Go!] You can follow this blog on telegram. You can find me on twitter as @lemire or on Mastodon. Search for: [ ] [Search] Support my work! I do not accept any advertisement. However, you can you can sponsor my open-source work on GitHub. Recent Posts * Generating arrays at compile-time in C++ with lambdas * Appending to an std::string character-by-character: how does the capacity grow? * For processing strings, streams in C++ can be slow * How many billions of transistors in your iPhone processor? * Randomness in programming (with Go code) Recent Comments * Daniel Lemire on How many billions of transistors in your iPhone processor? * Sam Mason on How many billions of transistors in your iPhone processor? * Daniel Lemire on How many billions of transistors in your iPhone processor? * Sam Mason on How many billions of transistors in your iPhone processor? * Daniel Lemire on How many billions of transistors in your iPhone processor? Pages * A short history of technology * About me * Book recommendations * Cognitive biases * Interviews and talks * My bets * My favorite articles * My favorite quotes * My rules * Newsletter * Predictions * Privacy Policy * Recommended video games * Terms of use * Write good papers Archives Archives [Select Month ] Boring stuff * Log in * Entries feed * Comments feed * WordPress.org Generating arrays at compile-time in C++ with lambdas Suppose that you want to check whether a character in C++ belongs to a fixed set, such as '\0', '\x09', '\x0a','\x0d', ' ', '#', '/', ':', '<', '>', '?', '@', '[', '\\', ']', '^', '|'. A simple way is to generate a 256-byte array of Boolean values and lookup the value. This approach is sometimes called memoization (and not memorization!!!). You might do it as follows: constexpr static bool is_forbidden_host_code_point_table[] = { 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}; bool is_forbidden_host_code_point(char c) { return is_forbidden_host_code_point_table[uint8_t(c)]; } It is reasonably efficient in practice. Some people might object to how the table is generated. Can you have the C++ compiler generate the array at compile-time from a function? Using C++17, you might do it with an std::array as follows: constexpr static std::array is_forbidden_array = []() { std::array result{}; for (uint8_t c : {'\0', '\x09', '\x0a','\x0d', ' ', '#', '/', ':', '<', '>', '?', '@', '[', '\\', ']', '^', '|'}) { result[c] = true; } return result; }(); bool is_forbidden_host_code_point_array(char c) { return is_forbidden_array[uint8_t(c)]; } These two approaches should be equivalent in practice. The might compile down to a single lookup instruction, or the equivalent. You may want to compare it against the default (without memoization) which might be... bool is_forbidden_host_code_point_default(char c) noexcept { return c == '\0' || c == '\x09' || c == '\x0a' || c == '\x0d' || c == ' ' || c == '#' || c == '/'|| c == ':' || c == '<'|| c == '>' || c == '?' || c == '@' || c == '['|| c == '\\' || c == ']'|| c == '^' || c == '|'; } A compiler like GCC might compile this routine to a bitset approach such as ... cmp dil, 62 ja .L2 movabs rax, 6052978675329017345 bt rax, rdi setc al ret .L2: sub edi, 63 cmp dil, 61 ja .L4 movabs rax, 2305843013240225795 bt rax, rdi setc al ret The default approach certainly generates more instructions, and might be less efficient in some cases. Published by [2ca999] Daniel Lemire A computer science professor at the University of Quebec (TELUQ). View all posts by Daniel Lemire Posted on November 7, 2023November 7, 2023Author Daniel Lemire Categories Leave a Reply Cancel reply Your email address will not be published. To create code blocks or other preformatted text, indent by four spaces: This will be displayed in a monospaced font. The first four spaces will be stripped off, but all other whitespace will be preserved. Markdown is turned off in code blocks: [This is not a link](http://example.com) To create not a block, but an inline code span, use backticks: Here is some inline `code`. For more help see http://daringfireball.net/projects/markdown/syntax [ ] [ ] [ ] [ ] [ ] [ ] [ ] Comment * [ ] Name * [ ] Email * [ ] Website [ ] [ ] Save my name, email, and website in this browser for the next time I comment. Receive Email Notifications? [no, do not subscribe ] [instantly ] Or, you can subscribe without commenting. [Post Comment] [ ] [ ] [ ] [ ] [ ] [ ] [ ] D[ ] You may subscribe to this blog by email. Post navigation Previous Previous post: Appending to an std::string character-by-character: how does the capacity grow? Terms of use Proudly powered by WordPress