hngopher.com

       [HN Gopher] Demystifying bitwise operations, a gentle C tutorial
       ___________________________________________________________________
        
       Demystifying bitwise operations, a gentle C tutorial
        
       Author : todsacerdoti
       Score  : 217 points
       Date   : 2023-03-03 15:01 UTC (7 hours ago)
        
 (HTM) web link (www.andreinc.net)
 (TXT) w3m dump (www.andreinc.net)
        
       | justincredible wrote:
       | [dead]
        
       | whb07 wrote:
       | I've found significant code in c/c++ where bitwise operations are
       | done for things like division etc by shifting a certain way.
       | 
       | I Can imagine in the past, this was "faster", yet clang/gcc can
       | emit the same by just writing a basic A/B function.
       | 
       | Seems the win goes to readability by reducing some of these old
       | school hacks.
       | 
       | What say you, greybeards ?
        
         | nomemory wrote:
         | https://www.andreinc.net/2023/02/01/demystifying-bitwise-ops...
         | 
         | With -O1 it performed the optimisation.
        
         | dhosek wrote:
         | Oh definitely. Some of this goes back to my 6502 assembly days
         | when there was no hardware multiply instruction. So to multiply
         | by 40, for example. I would shift right 3 bits, store the
         | result, shift right 2 more bits and add the stored result.
         | 
         | Similarly, a fast divisibility test (we'll assume we're
         | dividing _n_ by some odd prime _p_ ):
         | 
         | 1. Shift the bits of _p_ right so that there is a 1 in the last
         | position.
         | 
         | 2. If _n_ = _p_ then _p_ | _n_ , if _n_ < _p_ then _p_ [?] _n_
         | , otherwise continue.
         | 
         | 3. Subtract _p_ and go back to step 1.
         | 
         | (One of my ADD habits during meetings is to find the prime
         | factors phone numbers or anything else very long. I do
         | something similar with the numbers in decimal, but for this,
         | I'll subtract multiples of the potential divisor to get 0s at
         | the end of the number in decimal. I remember co-workers
         | puzzling over a piece of paper with my notes I left behind in a
         | conference room trying to figure out what the numbers
         | represented and how the process worked.)
        
           | unwind wrote:
           | Left, you would (obviously, this is a typo) shift left. And 3
           | followed by 2 since 1<<3 is 8, and 1<<5 is 32 and 8+32 is 40.
        
         | mras0 wrote:
         | One thing to consider is that the compiler can't simply replace
         | a division by just a right shift for signed variables (it will
         | round towards -inf for negative numbers), so even today there's
         | a tiny bit of benefit of explicitly using shifts if you know
         | that the number can't be negative (and the compiler can't prove
         | it) or you don't care about the rounding
         | (https://godbolt.org/z/vTzYYxqz9).
         | 
         | Of course that tiny bit of extra work is usually negligible,
         | but might explain why the idiom has stuck around longer than
         | you might otherwise expect.
        
         | AlotOfReading wrote:
         | I only have a little grey in my beard so far, but like all
         | optimizations it heavily depends on context. My broad rule of
         | thumb is that if you only care what the code does, you should
         | let the compiler figure it out. If you care how the compiler
         | accomplishes that goal, you should specify that rather than
         | hoping things don't silently break in the future. This is a
         | fairly common thing in crypto and systems code.
         | 
         | But yes, fast inverse sqrt is obsolete.
        
         | cornstalks wrote:
         | Depends on the situation. The compiler is smart, but in a way
         | it's also dumb. It's very good at recognizing certain patterns
         | and optimizing them, but not all patterns are recognized, and
         | thus not all optimizations are applied, let alone consistently
         | applied.
         | 
         | For example, see the article and discussion on Bitwise Division
         | from two days ago:
         | https://news.ycombinator.com/item?id=34981027
        
         | JohnFen wrote:
         | > I've found significant code in c/c++ where bitwise operations
         | are done for things like division etc by shifting a certain
         | way.
         | 
         | Oh, yes. I used to do that sort of thing frequently because the
         | time savings was significant enough. As you say, though,
         | compilers have improved a great deal since then, so it's not
         | generally needed anymore.
         | 
         | If stupid bit tricks like that aren't necessary, they shouldn't
         | be used. They do bring a readability/mental load cost with
         | them.
        
         | zh3 wrote:
         | Nowaways you just write 'divout = divin/8192' and assume the
         | compiler is going to do the right thing (and very possibly do
         | something deeper than "divin>>12" at the assembler level).
         | 
         | Makes me wonder who pays attention to this sort of thing these
         | days :)
        
           | JohnFen wrote:
           | I do! When optimizing code that must run obscenely fast, I
           | look at the assembly the compiler spits out to make sure that
           | it can't be improved on, speed-wise.
           | 
           | Usually, it can't -- but sometimes...
        
       | wg0 wrote:
       | I'm trying to find some good resources or some book that covers
       | modern C.
       | 
       | For everything else from Python to Golangz Rust and everything in
       | between - there are tons of quality books and even their own
       | documentation in some cases is pretty decent. But not so with C.
       | 
       | If you read this and have something, please share. Just C (not
       | much interested in C++)
        
       | mannyv wrote:
       | I thought this was going to be ridiculous but it's actually a
       | really good. It's clear enough that you could extend it down to
       | showing how logic gates work.
       | 
       | The only thing missing is a little endian discussion; it assumes
       | big endian (network byte order), and that may be confusing for
       | all those x86 users out there.
        
         | riceart wrote:
         | > that may be confusing for all those x86 users out there.
         | 
         | It's pretty much _every_ user these days - even most MIPS based
         | network oriented devices run LE. BE lost many years ago.
        
           | mannyv wrote:
           | The Network is big-endian (network byte order).
        
             | edflsafoiewq wrote:
             | That's just a data serialization format, there are zillions
             | of those. The important thing is the endianness of the CPU.
        
         | nwallin wrote:
         | Endianness isn't relevant to any of the stuff discussed in the
         | article. It is equally applicable to both big endian and little
         | endian architectures.
        
       | alain94040 wrote:
       | This long article is a great example of why I believe GPT has a
       | strong future.
       | 
       | The article spends several pages to explain hexadecimal, bases,
       | etc. Probably some fraction of the audience already knows that.
       | In that case, the article should automatically adapt and skip
       | that section. Some people will react negatively to the
       | mathematical formulation, preferring the intuitive section that
       | follows instead.
       | 
       | But this is a static blog post, so it doesn't know how to adapt.
       | Imagine the same post, but with some toggles/sliders (you can
       | come up with even more sophisticated mechanisms): I indicate my
       | level of competency, and the article re-writes itself to match my
       | knowledge. Skip the boring math, show a lot of examples, etc. Or
       | the opposite. The point is that it adapts to you. GPT (or some
       | future variant) is really good at doing this.
        
         | ablob wrote:
         | Sure, let's just skip on the minor details that modern
         | computation rests on. Who needed that anyway.
         | 
         | Although I do agree that repeating the basics on every post is
         | cumbersome, it seems adequate for this post.
        
         | antegamisou wrote:
         | > Some people will react negatively to the mathematical
         | formulation, preferring the intuitive section that follows
         | instead.
         | 
         | They have no place in software engineering then.
        
         | Jtsummers wrote:
         | GPT isn't needed. Just hyperlinks, like exist at the top of the
         | document. Paired with the ability to expand sections either in
         | place or in an adjacent view. State the prerequisite knowledge,
         | then give a way to skip past the explanations or have the
         | explanations hidden by default. Permit further expansions up to
         | the limit of what the author cares to provide, can be added to
         | later.
        
         | nicoburns wrote:
         | The problem with GPT is there's a good chance that it
         | introduces information which is plain wrong while doing that
         | transformation.
        
         | cinntaile wrote:
         | "Hey chatGPT, paraphrase my comment without the passive
         | aggressive dismissal of its contents."
         | 
         | In all seriousness though, this blogpost does exactly what it
         | intends to do. Demistify bitwise operations and you can't skip
         | the math.
        
       | Dowwie wrote:
       | Some very interesting advent of code contributions written in
       | Rust, available on github, use bitwise operations. Shout out to
       | Tim Visee! https://github.com/timvisee/advent-of-
       | code-2021/blob/master/...
        
       | tapatio wrote:
       | The graphics are impressive. Specifically, I've never seen
       | someone fold a "bit surface" in half like that. I've also never
       | seen a ladder view. Assuming Andrei Ciobanu himself posted this,
       | how did you come up with those views? Bored on a Friday night?
        
       | CodeMage wrote:
       | Heads up: the gray_code function does not return the results
       | shown in the table of correspondence above it.
       | 
       | I suspect that this is because both the table and the code used
       | were sourced from Wikipedia, and they correspond to different
       | Gray codes. The table is for the BRGC, but the implementation
       | isn't.
        
       | sebastianconcpt wrote:
       | Articles like these seen today should make us value more the love
       | and effort that requires to train people. Because there is no
       | shortage of "love" and effort (and money) for training AI models.
        
       | cactusplant7374 wrote:
       | I don't understand why bit masking and manipulation is so popular
       | when it makes the code impossible to read. I like using Ruby,
       | string, and pack and unpack.
        
         | flohofwoe wrote:
         | C23 finally introduces binary literals, e.g. you can write now
         | "0b11110000" instead of "0xf0" for situations where binaries
         | are more readable (but with a bit of 'training', hexadecimal
         | works just as well).
         | 
         | Also, bitwise operations are essentially "SIMD for bits" and
         | important down on the machine code level, it makes a lot of
         | sense to have that same functionality also in higher level
         | languages (unfortunately not all common bitwise instructions -
         | like rotations - made it into high level languages).
         | 
         | (edited)
        
         | dahart wrote:
         | Well one reason is that bit operations are significantly faster
         | than other methods, it's one reason for example, why compilers
         | will turn a divide by a constant into bit manipulation.
         | 
         | Using a scripting language's string packing is hundreds of
         | times slower than doing a couple of bit operations, and if
         | there's a memory allocation involved, it can be thousands of
         | times slower. If you're curious about this, I recommend doing
         | some profiling to find out how fast your favorite algorithms
         | are when using C bitwise operators compared to Ruby string
         | packing.
         | 
         | FWIW, having the code be readable is mostly a familiarity
         | problem that you can resolve by practicing more bitwise
         | operations, if you want. If you spend your days in Ruby, then
         | there might not be strong reasons to, but if you're curious and
         | want to improve, you might have fun playing in C or C++. I
         | spend my days mostly in CUDA, and using bit manipulation is par
         | for the course, failure to use the best tricks can result in
         | much lower compute throughput and much higher power
         | consumption.
        
         | anthomtb wrote:
         | Do Ruby's pack and unpack routines let you access a field
         | smaller than a byte? What about a field that is split across a
         | byte boundary?
         | 
         | I agree that bit masking and manipulation is hard to read and
         | can be misused by ego-driven programmers. But the "popularity"
         | because because bit masking and manipulation are both
         | fundamental to computer science and an absolute necessity in
         | certain situations.
        
         | klyrs wrote:
         | Do you ever wonder how Ruby's pack and unpack are implemented?
         | 
         | https://github.com/ruby/ruby/blob/4ce642620f10ae18171b41e166...
        
       | sylware wrote:
       | integer promotion yuminess...
        
       | zh3 wrote:
       | Funnily enough just spent a day tracking down a weird problem in
       | an embedded system - some event timestamps were getting corrupted
       | in a weird way. The pattern wasn't obvious until in desperation I
       | dumped them in hex and found the second MSB was toggling between
       | 0 and 1 (whereas it should have been been part of a count
       | sequence). That told me exactly where to look - where the count
       | was re-assembled from four bytes and I found this upstream
       | (paraphrased):-                   uint8_t tx_data[64];
       | ....         tx_data[43] = utime>>24;         tx_data[44] =
       | utime>16;         tx_data[45] = utime>>8;         tx_data[46] =
       | utime;
        
         | Gibbon1 wrote:
         | This is a good example
         | 
         | Why C's implicit conversions are trash. Why big endian is
         | trash. Why you should do a reverse copy instead of stuff like
         | this.
        
         | zoomablemind wrote:
         | tx_data[44] = utime>16;
         | 
         | Good catch!
         | 
         | I wonder if this would have been flagged with -Wall?
        
           | edflsafoiewq wrote:
           | Nope. There's no warning for bool->int conversion:
           | https://stackoverflow.com/questions/28716391/gcc-forbid-
           | impl...
        
             | HarHarVeryFunny wrote:
             | Somewhat oddly even in C++, where bool is a distinct type,
             | g++ doesn't have any warning for implicit bool to int
             | conversion, although clang's clang-tidy "linter" tool does.
        
           | zh3 wrote:
           | Nope, code was clean with -Wall on arm-9 compiler (i.e. gcc).
           | 
           | Interesting thought now you say that (no warning) - only
           | found it because of seeing the pattern (apropos the article)
           | and from that having a good idea what was causing it (knowing
           | the int was assembled a byte at a time).
           | 
           | If I had a criticism of modern compilers, it's the blizzard
           | of uninteresting warnings ("strncmp takes const char star,
           | did you really mean to pass it unsigned char star") that make
           | people want to not use -Wall.
        
             | JohnFen wrote:
             | All warnings are uninteresting until they're not.
             | 
             | > ("strncmp takes const char star, did you really mean to
             | pass it unsigned char star")
             | 
             | This warning (with a different function) actually saved my
             | bacon once, pointing me to a very obscure bug in the code.
             | 
             | My practice is to use -Wall and make sure that the code
             | compiles without any warnings at all. Then I don't have a
             | deluge of warnings to wade through.
        
           | kps wrote:
           | clang-tidy has an open issue to catch this.
           | https://github.com/llvm/llvm-project/issues/56009
        
         | layer8 wrote:
         | This is one case where lining up values (and surrounding
         | operators with spaces) can be good practice:
         | tx_data[43] = utime >> 24;         tx_data[44] = utime >> 16;
         | tx_data[45] = utime >>  8;         tx_data[46] = utime >>  0;
         | 
         | Or even:                   tx_data[43] = utime >> (3 * 8);
         | tx_data[44] = utime >> (2 * 8);         tx_data[45] = utime >>
         | (1 * 8);         tx_data[46] = utime >> (0 * 8);
        
           | mtlmtlmtlmtl wrote:
           | Yep. Bugs like this just pop out at you this way.
           | 
           | Having spent a lot of time digging into C code and doing
           | microcontroller programming professionally, I have learned a
           | certain disdain for the practice of using minimal whitespace
           | in expressions, and the related practice of minimising LOC
           | using C's very dense expressions syntax(unary dec/inc
           | operators, the fact that assignment is itself an expression,
           | etc).
           | 
           | The dense expressions syntax is pretty much a holdover from
           | the time C was invented by Dennis Ritchie for the purpose of
           | porting Unix; when t had to be typed on a painfully slow
           | electromechanical teletype. This is also why most Unix
           | commands are 2-4 characters long.
           | 
           | But it serves very little meaningful purpose today. It's
           | important to remember that you might be able to turn 3 lines
           | of code into 1, but it'll still be the same amount of machine
           | code. And it will probably be harder to read, harder to
           | change, and harder to reason about. And it can be a lot
           | easier to miss some edgecase or even a typo like in this
           | case. I have seen such a silly amount of off-by-one errors in
           | C code due to some subtle misuse of unary increment/decrement
           | that I stopped using those operators altogether. They don't
           | improve your software in any meaningful way.
        
       | calvinmorrison wrote:
       | For many years I have recommended "Code" by Charles Petzold and
       | published by microsoft. It's a great bathroom reader, reading on
       | a road trip, or over a spaghetti dinner. It covers bottom up how
       | computers work, starting out with - well what the heck are
       | numbers and works up from there.
        
       | jjgreen wrote:
       | That's a nicely written piece, I wish it had been around when I
       | was first struggling with bit-mangling.
        
       | jalino23 wrote:
       | I wish I have a use case for this as a web dev. I just haven't
       | found the need for this so Ill just forget this again
        
         | JohnFen wrote:
         | Sounds like you need a hobby project to exercise those new
         | muscles.
        
       | amenghra wrote:
       | A related puzzle: https://www.quaxio.com/know_your_bits/
       | 
       | More such puzzles:
       | http://www.cs.cmu.edu/afs/cs/academic/class/15213-f02/www/L1...
       | and http://csapp.cs.cmu.edu/public/datalab.pdf
       | 
       | Bit Twiddling Hacks:
       | http://graphics.stanford.edu/~seander/bithacks.html
        
       | layer8 wrote:
       | Bitwise NOT (~) gives you a bijection between negative and
       | nonnegative integers in two's complement representation (which
       | negation doesn't).
       | 
       | This is for example exploited by the return value of Java's
       | binarySearch() function, which returns the (nonnegative) index of
       | the search key when found, or else the (negative) bitwise NOT of
       | the index where the key would have to be inserted [0]. In other
       | words, it combines a nonnegative int value plus a flag into one
       | int, while making the flag easily testable (< 0) and the value
       | easily flippable (~). Strangely, the API doc doesn't mention
       | bitwise NOT, but instead expresses it as numeric negation minus
       | one (which is equivalent, as TFA explains).
       | 
       | [0] As opposed to C's bsearch(), which only returns a position
       | when the key was found.
        
       | Someone wrote:
       | FTA: _"Based on the above picture, another important observation
       | is that to represent the number 1078 in binary, we need at least
       | ten memory cells (bits) for it (look at the most significant
       | power of 2 used, which is 10)"_
       | 
       | As the picture shows, we need 11 bits, with powers ranging from
       | zero to ten, both included (under the implicit assumption that we
       | want to represent all integers between 0 and 1078. If all we want
       | to represent is 1078, we can do with one or, pedantically, zero
       | bits)
        
       | philiplu wrote:
       | I suppose the online list of hacks at the 1st reference in the OP
       | [1] makes more sense, but I've got a fondness for the book
       | "Hacker's Delight", which I've owned and browsed for many years
       | now [2]. And checking on that, I see there was a 2nd edition
       | issued 10 years ago. Hmm, might need to pick that up.
       | 
       | [1] https://graphics.stanford.edu/~seander/bithacks.html [2]
       | https://en.wikipedia.org/wiki/Hacker's_Delight
        
         | _a_a_a_ wrote:
         | There are quite a lot of bit hacks on the web, but Hackers
         | Delight is where it took off for me.It was a massive eye-
         | opener, what was possible and even better doing it the old-
         | fashioned way.
         | 
         | The second book is very, very heavy on division and as such
         | it's not really as much 'fun' as the original, however I'd
         | still recommend it!
         | 
         | I shared the original Hackers delight with a work colleague,
         | who had a mathematical bent and he contacted Henry Warren with
         | a possible addition for the forthcoming second edition, Morton
         | curves (https://en.wikipedia.org/wiki/Z-order_curve) which are
         | extremely simple to calculate, much more so than the space
         | filling curve given in Hackers Delight, but which despite the
         | author's interested response to us, did not go into the second
         | edition. My colleague was very disappointed. Me too. Morton
         | curves are just interlace-the-bits so would have slotted in so
         | well.
         | 
         | Sadly there won't be a third edition as the author died.I found
         | out when I contacted him to let him know his website was down
         | (again!). His family let me know he had gone.
        
           | morelisp wrote:
           | The last time I reached for Hacker's Delight for "productive"
           | use it was specifically for this chapter (I was aware of
           | space-filling curves from some years in the games industry
           | but had never implemented one), and I remember being
           | disappointed at the lack of depth and options compared to the
           | rest of the book.
        
           | chongli wrote:
           | _The second book is very, very heavy on division and as such
           | it 's not really as much 'fun' as the original, however I'd
           | still recommend it!_
           | 
           | What do you mean by this? Did they merely add a ton of
           | information on division? Or did they take out the 'fun' bits
           | from the original and replace them with division stuff?
        
           | radicalbyte wrote:
           | I have the second edition, back around 2010 I made a bit of a
           | splurge on books. Super handy when you need it; which in my
           | case has only been one time. Don't spend much time on low-
           | level code nowadays.
        
       | detrites wrote:
       | While it's based on C, a large chunk of the content is simply a
       | great introduction to number representations in other bases such
       | as binary and hexadecimal etc. Fundamental knowledge and very
       | well explained, and a few insights I've not seen before.
        
         | nomemory wrote:
         | Thanks, I wanted to post it after it was finished and revised.
         | I also wanted to add more sections, but somebody was faster
         | than I expected (thanks for posting it tod!). I suppose Getting
         | to hn will add some valuable feedback in the first place.
         | 
         | I will probably add some more content in the coming weeks.
        
         | tester756 wrote:
         | >Fundamental knowledge
         | 
         | yes? no? hard to say
         | 
         | The number of times I've needed this knowledge during all years
         | of formal education and then years of work would be probably
         | around 3.
         | 
         | Then I started working with C and close to hardware and it
         | became something that I need everyday.
         | 
         | It's feels like bit proficiency is only useful in some very
         | specific domains.
         | 
         | Thought: is HTTP foundational knowledge nowadays? after all
         | whole world is built on it
         | 
         | if not, when will it become?
        
           | sumtechguy wrote:
           | Like you say it really depends on what your job is. For me
           | those bit algo's for a few years were my jam. I was moving
           | data between 4 different CPU types and across 3 different
           | network transports depending on the project and 4 different
           | OS's. You need to know your bits when moving data between
           | diff arch types and across different transport layers. These
           | days most of that same sort of thing I did then? I would drop
           | it into a json text stream and call it a day.
        
           | JohnFen wrote:
           | I think it's fundamental because understanding it goes hand-
           | in-hand with understanding how computers actually work. When
           | I interview applicants, I usually ask one or two questions
           | about bitwise operations for this reason -- an engineer who
           | knows how machines work at a very low level is valuable even
           | if their work is not low level.
        
           | dahfizz wrote:
           | If you're a web developer and your product relies on HTTP,
           | then HTTP is absolutely fundamental knowledge.
           | 
           | Similarly, if you're a web developer and your language
           | doesn't even have integers, then understanding twos
           | compliment is probably not fundamental.
           | 
           | That said, the people who proudly know the very least amount
           | possible to perform their day job usually are not top
           | performers.
        
           | whateveracct wrote:
           | I think if you work with UUIDs, it's worth learning about
           | bitwise stuff. So backend developers in general. It's not a
           | day-to-day skill, but it is one I've used at just about every
           | job in the last decade to elegantly and efficiently solve
           | hard problems regarding idempotency and randomness.
        
             | JohnFen wrote:
             | Or if you work directly with hardware, as I do. I use
             | bitwise stuff constantly. Or if you're doing certain
             | obscure kinds of highly optimized mathematical operations.
        
             | plugin-baby wrote:
             | When would you want to perform bitwise operations on a
             | UUID?
        
       ___________________________________________________________________
       (page generated 2023-03-03 23:00 UTC)