[HN Gopher] Zero or Sign Extend
___________________________________________________________________
Zero or Sign Extend
Author : todsacerdoti
Score : 101 points
Date : 2024-10-24 00:48 UTC (22 hours ago)
(HTM) web link (fgiesen.wordpress.com)
(TXT) w3m dump (fgiesen.wordpress.com)
| eqvinox wrote:
| > ... this explicitly relies on shifting something into the sign
| bit, which depending on the exact flavor of language standard
| you're using is either not allowed or at best fairly recently ..
|
| An unsigned has no sign bit, so the left shift just needs to be
| unsigned to make it "technically correct".
|
| (Remember not to use types smaller than int though, due to
| integer promotion issues: such operands get promoted to signed
| int before the shift.)
| fluoridation wrote:
| Yup. When you're twiddling bits you're better off using
| unsigned types in general anyway, and leaving converting to a
| signed type at the very end.
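A minimal sketch of the "shift as unsigned, convert to signed at the very end" approach described above. The function name `sign_extend_shift` is hypothetical and not taken from the article. It assumes that `int` is 32 bits wide (so `uint32_t` is not promoted), and that the compiler converts `uint32_t` to `int32_t` by wrapping and implements `>>` on negative values as an arithmetic shift. Mainstream compilers behave this way, but older C standards leave both implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of x (1 <= bits <= 32).
 * Bits above the field are discarded by the left shift. */
static int32_t sign_extend_shift(uint32_t x, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    unsigned shift = 32u - bits;

    /* Unsigned left shift: well-defined, there is no sign bit to shift into. */
    uint32_t up = x << shift;

    /* Assumption: the conversion wraps (two's complement) and the right
     * shift of a negative int32_t replicates the sign bit. */
    return (int32_t)up >> shift;
}
```

For example, `sign_extend_shift(0x7FF, 11)` yields -1 under these assumptions, because all eleven field bits are set.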
| jchw wrote:
| Of course doing the undefined thing works on almost any platform
| except DS9k, but that last formulation is quite elegant. It's a
| bit like byteswapping in that it's fairly simple to do but it's
| even simpler to _not_ do by just never relying on the machine
| endianness.
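To illustrate the byteswapping comparison: one way to never rely on the machine endianness is to assemble the value from individual bytes with shifts. This is a sketch, assuming the data on the wire is little-endian; the name `load_le32` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value from a byte buffer.
 * Produces the same result on little- and big-endian hosts,
 * so no conditional byteswap is ever needed. */
static uint32_t load_le32(const unsigned char *p)
{
    assert(p != NULL);
    /* Cast each byte to uint32_t before shifting, so the shift is
     * done on an unsigned 32-bit value rather than a promoted int. */
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```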
| Sesse__ wrote:
| Also shifts, especially variable-length shifts, are frequently
| slower than xor and add/sub (e.g., on x86, shl only works with
| cl and shlx has high latency), so that's another score for the
| xor variant.
| CalChris wrote:
| Maybe for _variable_ variable-length shifts but for
| _constant_ variable-length shifts, _SHL reg, imm8_ is single
| cycle on recent x86_64 microarchitectures.
| Sesse__ wrote:
| But xor and sub can go in way more ports, giving you higher
| throughput.
| dzaima wrote:
| "way more" is 2 vs 4 ports (-5 for >=alderlake); 1/cycle
| via shifts is probably good enough for most use-cases
| (though perhaps the more focused port pressure could be
| an issue with larger context).
|
| And with hard-coded immediates xor+sub also ends up at
| twice the code size of shl+shr, so there's some trade-off.
| (But yeah, if code size isn't a concern, xor+sub wins out.)
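The thread does not spell out the xor variant, so the following is a sketch of the commonly used xor+sub form rather than a quote from the article. The name `extend_xor_sub` and the `is_signed` parameter are hypothetical. It assumes `x` has already been masked to `bits` bits and that `int` is 32 bits wide. All arithmetic is unsigned, so nothing here depends on shifting into a sign bit.

```c
#include <assert.h>
#include <stdint.h>

/* Zero- or sign-extend a `bits`-wide field held in x (1 <= bits <= 32).
 * x must already be masked to `bits` bits. The signedness is chosen at
 * run time, e.g. from a format description. */
static uint32_t extend_xor_sub(uint32_t x, unsigned bits, int is_signed)
{
    assert(bits >= 1 && bits <= 32);
    assert(bits == 32 || (x >> bits) == 0);

    /* For an unsigned field the "sign bit" is 0 and the two
     * operations below do nothing. */
    uint32_t sign_bit = is_signed ? (UINT32_C(1) << (bits - 1)) : 0u;

    /* Flipping the sign bit and subtracting it again leaves non-negative
     * values unchanged and maps values with the bit set to x - 2^bits
     * (modulo 2^32). */
    return (x ^ sign_bit) - sign_bit;
}
```

The result is returned as `uint32_t` holding the two's complement bit pattern. With a fixed width, `sign_bit` becomes a constant, which corresponds to the hard-coded immediates in the code size comparison above.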
| Neywiny wrote:
| This is the perfect spot to use a bitfield. You can tell it
| signed or unsigned, and the compiler will deal with it all and
| optimize. No bit ops to get wrong or maintain. Very readable and
| scalable.
| edflsafoiewq wrote:
| But the width and signedness of a bitfield are defined at
| compile-time, while in this example they need to come from a
| format read at runtime.
| cryptonector wrote:
| So? The author knows a priori the size of the int on the
| wire.
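A sketch of the bitfield approach for the case where the width really is known at compile time. The 11-bit width, the struct `s11` and the function name are hypothetical. Storing a value that does not fit into a signed bit-field is implementation-defined in older C standards; the code assumes the wrapping behavior that GCC and Clang document. It cannot handle a width or signedness that is only known at run time, which is the objection raised above.

```c
#include <assert.h>
#include <stdint.h>

/* Explicitly `signed`: the signedness of a plain `int` bit-field
 * is implementation-defined. */
struct s11 {
    signed int v : 11;
};

/* Sign-extend the low 11 bits of x by storing them in a signed bit-field. */
static int sign_extend_bitfield11(uint32_t x)
{
    struct s11 s;
    /* Assumption: values >= 0x400 wrap to negative on assignment. */
    s.v = (int)(x & 0x7FFu);
    assert(s.v >= -1024 && s.v <= 1023);
    return s.v;
}
```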
| almostgotcaught wrote:
| > the compiler
|
| I love when people say this as if there's exactly one compiler
| with a fixed implementation for whatever opt pass.
| AlotOfReading wrote:
| That's not how this phrase is used. It usually encompasses
| any reasonably advanced compiler like clang, GCC, and
| sometimes MSVC.
| Joker_vD wrote:
| But not including any of the slightly broken C compilers
| that the embedded hardware manufacturers provide (nor ICC,
| either)?
| AlotOfReading wrote:
| I'm just providing examples, not excluding everything
| unmentioned.
| adgjlsfhk1 wrote:
| as of a few years ago, ICC is just LLVM with some tweaked
| settings
| monocasa wrote:
| Endianness of bit fields changes with arch. I.e., is the first
| bit field member the most or least significant bit range of the
| associated word?
| cryptonector wrote:
| Yes, first you have to swab, if you have to swab.
| epcoa wrote:
| Not in C or C++, at least, the bit and byte order is not
| defined.
| gpderetta wrote:
| At least GCC was very conservative in dealing with bitfields
| and, last time I bothered to check, generated suboptimal code.
| IshKebab wrote:
| Eh the author's suggestions only seem better because C++ is
| insane.
|
| The last one is definitely nice though!
| vlovich123 wrote:
| Can you post examples in other languages where this would be
| easier?
| IshKebab wrote:
| Sure, in Rust:
|
|     fn sign_extend_u11(x: u32) -> u32 {
|         (((x as i32) << (32-11)) >> (32-11)) as u32
|     }
|
| Doesn't have any of the C++ issues he mentions. And it will
| be faster than the alternative since it's just two
| instructions. (Ok this is never going to matter in practice
| but still...)
___________________________________________________________________
(page generated 2024-10-24 23:01 UTC)