[HN Gopher] Demystifying bitwise operations, a gentle C tutorial
___________________________________________________________________
Demystifying bitwise operations, a gentle C tutorial
Author : todsacerdoti
Score : 217 points
Date : 2023-03-03 15:01 UTC (7 hours ago)
(HTM) web link (www.andreinc.net)
(TXT) w3m dump (www.andreinc.net)
| justincredible wrote:
| [dead]
| whb07 wrote:
| I've found significant code in c/c++ where bitwise operations are
| done for things like division etc by shifting a certain way.
|
| I Can imagine in the past, this was "faster", yet clang/gcc can
| emit the same by just writing a basic A/B function.
|
| Seems the win goes to readability by reducing some of these old
| school hacks.
|
| What say you, greybeards ?
| nomemory wrote:
| https://www.andreinc.net/2023/02/01/demystifying-bitwise-ops...
|
| With -O1 it performed the optimisation.
| dhosek wrote:
| Oh definitely. Some of this goes back to my 6502 assembly days
| when there was no hardware multiply instruction. So to multiply
| by 40, for example. I would shift right 3 bits, store the
| result, shift right 2 more bits and add the stored result.
|
| Similarly, a fast divisibility test (we'll assume we're
| dividing _n_ by some odd prime _p_ ):
|
| 1. Shift the bits of _p_ right so that there is a 1 in the last
| position.
|
| 2. If _n_ = _p_ then _p_ | _n_ , if _n_ < _p_ then _p_ [?] _n_
| , otherwise continue.
|
| 3. Subtract _p_ and go back to step 1.
|
| (One of my ADD habits during meetings is to find the prime
| factors phone numbers or anything else very long. I do
| something similar with the numbers in decimal, but for this,
| I'll subtract multiples of the potential divisor to get 0s at
| the end of the number in decimal. I remember co-workers
| puzzling over a piece of paper with my notes I left behind in a
| conference room trying to figure out what the numbers
| represented and how the process worked.)
| unwind wrote:
| Left, you would (obviously, this is a typo) shift left. And 3
| followed by 2 since 1<<3 is 8, and 1<<5 is 32 and 8+32 is 40.
| mras0 wrote:
| One thing to consider is that the compiler can't simply replace
| a division by just a right shift for signed variables (it will
| round towards -inf for negative numbers), so even today there's
| a tiny bit of benefit of explicitly using shifts if you know
| that the number can't be negative (and the compiler can't prove
| it) or you don't care about the rounding
| (https://godbolt.org/z/vTzYYxqz9).
|
| Of course that tiny bit of extra work is usually negligible,
| but might explain why the idiom has stuck around longer than
| you might otherwise expect.
| AlotOfReading wrote:
| I only have a little grey in my beard so far, but like all
| optimizations it heavily depends on context. My broad rule of
| thumb is that if you only care what the code does, you should
| let the compiler figure it out. If you care how the compiler
| accomplishes that goal, you should specify that rather than
| hoping things don't silently break in the future. This is a
| fairly common thing in crypto and systems code.
|
| But yes, fast inverse sqrt is obsolete.
| cornstalks wrote:
| Depends on the situation. The compiler is smart, but in a way
| it's also dumb. It's very good at recognizing certain patterns
| and optimizing them, but not all patterns are recognized, and
| thus not all optimizations are applied, let alone consistently
| applied.
|
| For example, see the article and discussion on Bitwise Division
| from two days ago:
| https://news.ycombinator.com/item?id=34981027
| JohnFen wrote:
| > I've found significant code in c/c++ where bitwise operations
| are done for things like division etc by shifting a certain
| way.
|
| Oh, yes. I used to do that sort of thing frequently because the
| time savings was significant enough. As you say, though,
| compilers have improved a great deal since then, so it's not
| generally needed anymore.
|
| If stupid bit tricks like that aren't necessary, they shouldn't
| be used. They do bring a readability/mental load cost with
| them.
| zh3 wrote:
| Nowaways you just write 'divout = divin/8192' and assume the
| compiler is going to do the right thing (and very possibly do
| something deeper than "divin>>12" at the assembler level).
|
| Makes me wonder who pays attention to this sort of thing these
| days :)
| JohnFen wrote:
| I do! When optimizing code that must run obscenely fast, I
| look at the assembly the compiler spits out to make sure that
| it can't be improved on, speed-wise.
|
| Usually, it can't -- but sometimes...
| wg0 wrote:
| I'm trying to find some good resources or some book that covers
| modern C.
|
| For everything else from Python to Golangz Rust and everything in
| between - there are tons of quality books and even their own
| documentation in some cases is pretty decent. But not so with C.
|
| If you read this and have something, please share. Just C (not
| much interested in C++)
| mannyv wrote:
| I thought this was going to be ridiculous but it's actually a
| really good. It's clear enough that you could extend it down to
| showing how logic gates work.
|
| The only thing missing is a little endian discussion; it assumes
| big endian (network byte order), and that may be confusing for
| all those x86 users out there.
| riceart wrote:
| > that may be confusing for all those x86 users out there.
|
| It's pretty much _every_ user these days - even most MIPS based
| network oriented devices run LE. BE lost many years ago.
| mannyv wrote:
| The Network is big-endian (network byte order).
| edflsafoiewq wrote:
| That's just a data serialization format, there are zillions
| of those. The important thing is the endianness of the CPU.
| nwallin wrote:
| Endianness isn't relevant to any of the stuff discussed in the
| article. It is equally applicable to both big endian and little
| endian architectures.
| alain94040 wrote:
| This long article is a great example of why I believe GPT has a
| strong future.
|
| The article spends several pages to explain hexadecimal, bases,
| etc. Probably some fraction of the audience already knows that.
| In that case, the article should automatically adapt and skip
| that section. Some people will react negatively to the
| mathematical formulation, preferring the intuitive section that
| follows instead.
|
| But this is a static blog post, so it doesn't know how to adapt.
| Imagine the same post, but with some toggles/sliders (you can
| come up with even more sophisticated mechanisms): I indicate my
| level of competency, and the article re-writes itself to match my
| knowledge. Skip the boring math, show a lot of examples, etc. Or
| the opposite. The point is that it adapts to you. GPT (or some
| future variant) is really good at doing this.
| ablob wrote:
| Sure, let's just skip on the minor details that modern
| computation rests on. Who needed that anyway.
|
| Although I do agree that repeating the basics on every post is
| cumbersome, it seems adequate for this post.
| antegamisou wrote:
| > Some people will react negatively to the mathematical
| formulation, preferring the intuitive section that follows
| instead.
|
| They have no place in software engineering then.
| Jtsummers wrote:
| GPT isn't needed. Just hyperlinks, like exist at the top of the
| document. Paired with the ability to expand sections either in
| place or in an adjacent view. State the prerequisite knowledge,
| then give a way to skip past the explanations or have the
| explanations hidden by default. Permit further expansions up to
| the limit of what the author cares to provide, can be added to
| later.
| nicoburns wrote:
| The problem with GPT is there's a good chance that it
| introduces information which is plain wrong while doing that
| transformation.
| cinntaile wrote:
| "Hey chatGPT, paraphrase my comment without the passive
| aggressive dismissal of its contents."
|
| In all seriousness though, this blogpost does exactly what it
| intends to do. Demistify bitwise operations and you can't skip
| the math.
| Dowwie wrote:
| Some very interesting advent of code contributions written in
| Rust, available on github, use bitwise operations. Shout out to
| Tim Visee! https://github.com/timvisee/advent-of-
| code-2021/blob/master/...
| tapatio wrote:
| The graphics are impressive. Specifically, I've never seen
| someone fold a "bit surface" in half like that. I've also never
| seen a ladder view. Assuming Andrei Ciobanu himself posted this,
| how did you come up with those views? Bored on a Friday night?
| CodeMage wrote:
| Heads up: the gray_code function does not return the results
| shown in the table of correspondence above it.
|
| I suspect that this is because both the table and the code used
| were sourced from Wikipedia, and they correspond to different
| Gray codes. The table is for the BRGC, but the implementation
| isn't.
| sebastianconcpt wrote:
| Articles like these seen today should make us value more the love
| and effort that requires to train people. Because there is no
| shortage of "love" and effort (and money) for training AI models.
| cactusplant7374 wrote:
| I don't understand why bit masking and manipulation is so popular
| when it makes the code impossible to read. I like using Ruby,
| string, and pack and unpack.
| flohofwoe wrote:
| C23 finally introduces binary literals, e.g. you can write now
| "0b11110000" instead of "0xf0" for situations where binaries
| are more readable (but with a bit of 'training', hexadecimal
| works just as well).
|
| Also, bitwise operations are essentially "SIMD for bits" and
| important down on the machine code level, it makes a lot of
| sense to have that same functionality also in higher level
| languages (unfortunately not all common bitwise instructions -
| like rotations - made it into high level languages).
|
| (edited)
| dahart wrote:
| Well one reason is that bit operations are significantly faster
| than other methods, it's one reason for example, why compilers
| will turn a divide by a constant into bit manipulation.
|
| Using a scripting language's string packing is hundreds of
| times slower than doing a couple of bit operations, and if
| there's a memory allocation involved, it can be thousands of
| times slower. If you're curious about this, I recommend doing
| some profiling to find out how fast your favorite algorithms
| are when using C bitwise operators compared to Ruby string
| packing.
|
| FWIW, having the code be readable is mostly a familiarity
| problem that you can resolve by practicing more bitwise
| operations, if you want. If you spend your days in Ruby, then
| there might not be strong reasons to, but if you're curious and
| want to improve, you might have fun playing in C or C++. I
| spend my days mostly in CUDA, and using bit manipulation is par
| for the course, failure to use the best tricks can result in
| much lower compute throughput and much higher power
| consumption.
| anthomtb wrote:
| Do Ruby's pack and unpack routines let you access a field
| smaller than a byte? What about a field that is split across a
| byte boundary?
|
| I agree that bit masking and manipulation is hard to read and
| can be misused by ego-driven programmers. But the "popularity"
| because because bit masking and manipulation are both
| fundamental to computer science and an absolute necessity in
| certain situations.
| klyrs wrote:
| Do you ever wonder how Ruby's pack and unpack are implemented?
|
| https://github.com/ruby/ruby/blob/4ce642620f10ae18171b41e166...
| sylware wrote:
| integer promotion yuminess...
| zh3 wrote:
| Funnily enough just spent a day tracking down a weird problem in
| an embedded system - some event timestamps were getting corrupted
| in a weird way. The pattern wasn't obvious until in desperation I
| dumped them in hex and found the second MSB was toggling between
| 0 and 1 (whereas it should have been been part of a count
| sequence). That told me exactly where to look - where the count
| was re-assembled from four bytes and I found this upstream
| (paraphrased):- uint8_t tx_data[64];
| .... tx_data[43] = utime>>24; tx_data[44] =
| utime>16; tx_data[45] = utime>>8; tx_data[46] =
| utime;
| Gibbon1 wrote:
| This is a good example
|
| Why C's implicit conversions are trash. Why big endian is
| trash. Why you should do a reverse copy instead of stuff like
| this.
| zoomablemind wrote:
| tx_data[44] = utime>16;
|
| Good catch!
|
| I wonder if this would have been flagged with -Wall?
| edflsafoiewq wrote:
| Nope. There's no warning for bool->int conversion:
| https://stackoverflow.com/questions/28716391/gcc-forbid-
| impl...
| HarHarVeryFunny wrote:
| Somewhat oddly even in C++, where bool is a distinct type,
| g++ doesn't have any warning for implicit bool to int
| conversion, although clang's clang-tidy "linter" tool does.
| zh3 wrote:
| Nope, code was clean with -Wall on arm-9 compiler (i.e. gcc).
|
| Interesting thought now you say that (no warning) - only
| found it because of seeing the pattern (apropos the article)
| and from that having a good idea what was causing it (knowing
| the int was assembled a byte at a time).
|
| If I had a criticism of modern compilers, it's the blizzard
| of uninteresting warnings ("strncmp takes const char star,
| did you really mean to pass it unsigned char star") that make
| people want to not use -Wall.
| JohnFen wrote:
| All warnings are uninteresting until they're not.
|
| > ("strncmp takes const char star, did you really mean to
| pass it unsigned char star")
|
| This warning (with a different function) actually saved my
| bacon once, pointing me to a very obscure bug in the code.
|
| My practice is to use -Wall and make sure that the code
| compiles without any warnings at all. Then I don't have a
| deluge of warnings to wade through.
| kps wrote:
| clang-tidy has an open issue to catch this.
| https://github.com/llvm/llvm-project/issues/56009
| layer8 wrote:
| This is one case where lining up values (and surrounding
| operators with spaces) can be good practice:
| tx_data[43] = utime >> 24; tx_data[44] = utime >> 16;
| tx_data[45] = utime >> 8; tx_data[46] = utime >> 0;
|
| Or even: tx_data[43] = utime >> (3 * 8);
| tx_data[44] = utime >> (2 * 8); tx_data[45] = utime >>
| (1 * 8); tx_data[46] = utime >> (0 * 8);
| mtlmtlmtlmtl wrote:
| Yep. Bugs like this just pop out at you this way.
|
| Having spent a lot of time digging into C code and doing
| microcontroller programming professionally, I have learned a
| certain disdain for the practice of using minimal whitespace
| in expressions, and the related practice of minimising LOC
| using C's very dense expressions syntax(unary dec/inc
| operators, the fact that assignment is itself an expression,
| etc).
|
| The dense expressions syntax is pretty much a holdover from
| the time C was invented by Dennis Ritchie for the purpose of
| porting Unix; when t had to be typed on a painfully slow
| electromechanical teletype. This is also why most Unix
| commands are 2-4 characters long.
|
| But it serves very little meaningful purpose today. It's
| important to remember that you might be able to turn 3 lines
| of code into 1, but it'll still be the same amount of machine
| code. And it will probably be harder to read, harder to
| change, and harder to reason about. And it can be a lot
| easier to miss some edgecase or even a typo like in this
| case. I have seen such a silly amount of off-by-one errors in
| C code due to some subtle misuse of unary increment/decrement
| that I stopped using those operators altogether. They don't
| improve your software in any meaningful way.
| calvinmorrison wrote:
| For many years I have recommended "Code" by Charles Petzold and
| published by microsoft. It's a great bathroom reader, reading on
| a road trip, or over a spaghetti dinner. It covers bottom up how
| computers work, starting out with - well what the heck are
| numbers and works up from there.
| jjgreen wrote:
| That's a nicely written piece, I wish it had been around when I
| was first struggling with bit-mangling.
| jalino23 wrote:
| I wish I have a use case for this as a web dev. I just haven't
| found the need for this so Ill just forget this again
| JohnFen wrote:
| Sounds like you need a hobby project to exercise those new
| muscles.
| amenghra wrote:
| A related puzzle: https://www.quaxio.com/know_your_bits/
|
| More such puzzles:
| http://www.cs.cmu.edu/afs/cs/academic/class/15213-f02/www/L1...
| and http://csapp.cs.cmu.edu/public/datalab.pdf
|
| Bit Twiddling Hacks:
| http://graphics.stanford.edu/~seander/bithacks.html
| layer8 wrote:
| Bitwise NOT (~) gives you a bijection between negative and
| nonnegative integers in two's complement representation (which
| negation doesn't).
|
| This is for example exploited by the return value of Java's
| binarySearch() function, which returns the (nonnegative) index of
| the search key when found, or else the (negative) bitwise NOT of
| the index where the key would have to be inserted [0]. In other
| words, it combines a nonnegative int value plus a flag into one
| int, while making the flag easily testable (< 0) and the value
| easily flippable (~). Strangely, the API doc doesn't mention
| bitwise NOT, but instead expresses it as numeric negation minus
| one (which is equivalent, as TFA explains).
|
| [0] As opposed to C's bsearch(), which only returns a position
| when the key was found.
| Someone wrote:
| FTA: _"Based on the above picture, another important observation
| is that to represent the number 1078 in binary, we need at least
| ten memory cells (bits) for it (look at the most significant
| power of 2 used, which is 10)"_
|
| As the picture shows, we need 11 bits, with powers ranging from
| zero to ten, both included (under the implicit assumption that we
| want to represent all integers between 0 and 1078. If all we want
| to represent is 1078, we can do with one or, pedantically, zero
| bits)
| philiplu wrote:
| I suppose the online list of hacks at the 1st reference in the OP
| [1] makes more sense, but I've got a fondness for the book
| "Hacker's Delight", which I've owned and browsed for many years
| now [2]. And checking on that, I see there was a 2nd edition
| issued 10 years ago. Hmm, might need to pick that up.
|
| [1] https://graphics.stanford.edu/~seander/bithacks.html [2]
| https://en.wikipedia.org/wiki/Hacker's_Delight
| _a_a_a_ wrote:
| There are quite a lot of bit hacks on the web, but Hackers
| Delight is where it took off for me.It was a massive eye-
| opener, what was possible and even better doing it the old-
| fashioned way.
|
| The second book is very, very heavy on division and as such
| it's not really as much 'fun' as the original, however I'd
| still recommend it!
|
| I shared the original Hackers delight with a work colleague,
| who had a mathematical bent and he contacted Henry Warren with
| a possible addition for the forthcoming second edition, Morton
| curves (https://en.wikipedia.org/wiki/Z-order_curve) which are
| extremely simple to calculate, much more so than the space
| filling curve given in Hackers Delight, but which despite the
| author's interested response to us, did not go into the second
| edition. My colleague was very disappointed. Me too. Morton
| curves are just interlace-the-bits so would have slotted in so
| well.
|
| Sadly there won't be a third edition as the author died.I found
| out when I contacted him to let him know his website was down
| (again!). His family let me know he had gone.
| morelisp wrote:
| The last time I reached for Hacker's Delight for "productive"
| use it was specifically for this chapter (I was aware of
| space-filling curves from some years in the games industry
| but had never implemented one), and I remember being
| disappointed at the lack of depth and options compared to the
| rest of the book.
| chongli wrote:
| _The second book is very, very heavy on division and as such
| it 's not really as much 'fun' as the original, however I'd
| still recommend it!_
|
| What do you mean by this? Did they merely add a ton of
| information on division? Or did they take out the 'fun' bits
| from the original and replace them with division stuff?
| radicalbyte wrote:
| I have the second edition, back around 2010 I made a bit of a
| splurge on books. Super handy when you need it; which in my
| case has only been one time. Don't spend much time on low-
| level code nowadays.
| detrites wrote:
| While it's based on C, a large chunk of the content is simply a
| great introduction to number representations in other bases such
| as binary and hexadecimal etc. Fundamental knowledge and very
| well explained, and a few insights I've not seen before.
| nomemory wrote:
| Thanks, I wanted to post it after it was finished and revised.
| I also wanted to add more sections, but somebody was faster
| than I expected (thanks for posting it tod!). I suppose Getting
| to hn will add some valuable feedback in the first place.
|
| I will probably add some more content in the coming weeks.
| tester756 wrote:
| >Fundamental knowledge
|
| yes? no? hard to say
|
| The number of times I've needed this knowledge during all years
| of formal education and then years of work would be probably
| around 3.
|
| Then I started working with C and close to hardware and it
| became something that I need everyday.
|
| It's feels like bit proficiency is only useful in some very
| specific domains.
|
| Thought: is HTTP foundational knowledge nowadays? after all
| whole world is built on it
|
| if not, when will it become?
| sumtechguy wrote:
| Like you say it really depends on what your job is. For me
| those bit algo's for a few years were my jam. I was moving
| data between 4 different CPU types and across 3 different
| network transports depending on the project and 4 different
| OS's. You need to know your bits when moving data between
| diff arch types and across different transport layers. These
| days most of that same sort of thing I did then? I would drop
| it into a json text stream and call it a day.
| JohnFen wrote:
| I think it's fundamental because understanding it goes hand-
| in-hand with understanding how computers actually work. When
| I interview applicants, I usually ask one or two questions
| about bitwise operations for this reason -- an engineer who
| knows how machines work at a very low level is valuable even
| if their work is not low level.
| dahfizz wrote:
| If you're a web developer and your product relies on HTTP,
| then HTTP is absolutely fundamental knowledge.
|
| Similarly, if you're a web developer and your language
| doesn't even have integers, then understanding twos
| compliment is probably not fundamental.
|
| That said, the people who proudly know the very least amount
| possible to perform their day job usually are not top
| performers.
| whateveracct wrote:
| I think if you work with UUIDs, it's worth learning about
| bitwise stuff. So backend developers in general. It's not a
| day-to-day skill, but it is one I've used at just about every
| job in the last decade to elegantly and efficiently solve
| hard problems regarding idempotency and randomness.
| JohnFen wrote:
| Or if you work directly with hardware, as I do. I use
| bitwise stuff constantly. Or if you're doing certain
| obscure kinds of highly optimized mathematical operations.
| plugin-baby wrote:
| When would you want to perform bitwise operations on a
| UUID?
___________________________________________________________________
(page generated 2023-03-03 23:00 UTC)