[HN Gopher] The NSA Instruction (2019)
___________________________________________________________________
The NSA Instruction (2019)
Author : cjg
Score : 230 points
Date : 2021-06-11 08:06 UTC (14 hours ago)
(HTM) web link (vaibhavsagar.com)
(TXT) w3m dump (vaibhavsagar.com)
| st_goliath wrote:
| Some years back, I got myself a copy of Andrew Hodges "Alan
| Turing - The Enigma", a biography and IMO generally a good read,
| but also with some gems regarding very early computing history in
| it.
|
| Specifically, after WWII, Turing worked on the ACE 1 (later
| reduced to Pilot ACE) project to build an electronic computer,
| which didn't really progress due to management and bureaucracy
| overhead. He eventually went to Manchester, once they got their
| Manchester Mark 1 off the ground, which they tried to
| commercialize as "Ferranti Mark 1"
| (https://en.wikipedia.org/wiki/Ferranti_Mark_1).
|
| While employed for the University, Turing IIRC continued to work
| as an external consultant for whatever became of G.C. & C.S. on
| the side. According to the book, he convinced them to buy such a
| machine (presumably for crypt-analysis?) and, on the Manchester
| side of things, insisted on some modifications to be made,
| including a "horizontal adder", so it could count the number of
| bits set in a word with a single instruction, i.e. a popcount
| instruction. This would pre-date the IBM Stretch mentioned in the
| article.
| dwheeler wrote:
| Obviously using a dedicated instruction is fastest in normal
| cases.
|
| But if you need to implement popcount or many other bit
| manipulation algorithms in software, a good book to look at is
| "Hacker's Delight" by Henry S. Warren, Jr, 2003.
|
| "Hacker's Delight' page 65+ discuss "Counting 1-bits" (population
| counts). There are a lot of software algorithms to do this.
|
| One approach is to set each 2-bit field to the count of 2 1-bit
| fields, then each 4-bit field to the count of 2 2-bit fields,
| etc., like this: x = (x & 0x55555555) + ((x >>
| 1) & 0x55555555); x = (x & 0x33333333) + ((x >> 2) &
| 0x33333333); x = (x & 0x0f0f0f0f) + ((x >> 4) &
| 0x0f0f0f0f); x = (x & 0x00ff00ff) + ((x >> 8) &
| 0x00ff00ff); x = (x & 0x0000ffff) + (x >> 16);
|
| assuming x is 32 bits.
|
| I think this approach is a classic divide-and-conquer solution.
| dragontamer wrote:
| > But if you need to implement popcount or many other bit
| manipulation algorithms in software
|
| Power9, ARM, x86 BMI, Nvidia PTX, AMD GCN, and AMD RDNA all
| have a popcount instruction.
|
| Yeah, all mainstream CPUs and GPUs made in the past decade...
|
| Unfortunately, there's no system I can think of where you'd
| need the software solution anymore... Maybe if you wanted
| popcount on an Arduino??
| ncmncm wrote:
| Yet, practically all software running on 64-bit x86 machines
| is compiled without, because the original amd64 released in
| 2003 lacked it, and distributions still target that.
| Likewise, MSVC. There would be good reasons for Apple XCode
| not to, but that doesn't mean they don't.
|
| If you tell MSVC to issue a popcount instruction with
| "__popcnt64()" (etc.), it will. If you ask Gcc to issue a
| popcount instruction with "__builtin_popcount()", it will
| only do it if you have also told it to target an ISA that has
| one; otherwise it emulates.
|
| The only _portable_ way to get a popcount instruction, thus
| far, is to use C++ 's std::bitset::count() in circumstances
| where the compiler believes the instruction would work.
| Pleasingly, Gcc and Clang are both happy to hold an integer
| type and its std::bitset representation in the same register
| at the same time, so there is no runtime penalty for a round-
| trip through std::bitset.
|
| MSVC's standard library implementation of std::bitset does
| not use the popcount instruction.
| dlemire wrote:
| You can also go faster using SIMD instructions if you need to
| compute wider population counts (beyond 64 bits):
|
| Faster Population Counts Using AVX2 Instructions, Computer
| Journal, Volume 61, Issue 1, 2018
| https://arxiv.org/abs/1611.07612
| amalcon wrote:
| A neat one for large, sparse integers is:
| for(i=0; !x; ++i) { x = (x-1)&x } return i;
|
| Which runs only a number of iterations equal to the number of 1
| bits. This works because (x-1) actually just flips the
| rightmost 1 bit and all zeroes to its right, then the & zeroes
| all of those.
|
| It's not that fast unless your integer is really sparse (since
| it has a branch), but I've always liked the bit hack.
| pbsd wrote:
| When the integer is expected to be dense, you have the
| corresponding trick size_t count =
| sizeof(x) * 8; while(x != -1) { x |= x+1;
| --count; } return count;
| throw5away wrote:
| This is essentially equivalent to feeding the input through
| bitwise-NOT first. Unfortunately, there are far more
| integers that are neither sparse nor dense than integers
| that are sparse or dense.
| graderjs wrote:
| This is a great piece of writing. I wish more blogs tied together
| so many technical perspectives like this. Bravo auteur!
| torgoguys wrote:
| >You might be wondering, like I was, if there's more to this
| instruction, but that's all it does! This doesn't seem very
| useful, right?
|
| I have a hard time understanding how anybody who has done most
| any non-trivial amount of bit maniplation couldn't think of
| plenty of uses.
| mrobot wrote:
| Why did you write this comment?
| fuzzer37 wrote:
| Why did you write this one?
| torgoguys wrote:
| Simply because I found what I quoted from the article to be a
| weird comment. What I mean is that there probably isn't very
| much overlap in the two-circle Venn diagram of 1) people who
| already are low-level enough to care about native CPU
| instructions and 2) those who can't think of good uses for a
| population count.
|
| I'm wasn't trying to be a jerk to the author. I just found it
| strange and would be happy to find out that I'm an idiot and
| forgot about ____ programmers who fall into that category. My
| comment was inviting such comments. (After all, if the author
| meant what they wrote there, apparently they fall into that
| camp...)
| l33t2328 wrote:
| It's immediately relevant to the very beginning of the
| article.
| Alekhine wrote:
| But taken in the context of the whole article, it adds
| nothing of any value. The author literally spends the rest
| of the post describing how it _is_ useful. Saying 'doesn't
| seem useful, does it?' at the beginning is a rhetorical
| device. The author is assuming that the reader probably
| doesn't have experience using bit manipulations for complex
| problems.
|
| I am so tired of HN pedantry.
| pc86 wrote:
| How else are we supposed to feel intellectually superior
| to literally everyone if we're not pedantic assholes to
| each other?
| tbrake wrote:
| Put a little more blunt than I would have but there's
| nothing wrong there. Rarely a submission goes by where
| the first comments aren't people racing in to argue with
| the author.
|
| There's an adversarial air that exists around dissecting,
| criticizing, nit-picking etc ideas presented, both here
| and in the tech world at large. As if one's most valuable
| contribution to a conversation is assuming the role of
| smug contrarian.
|
| It's frankly tiring and obnoxious. I used to think non-
| geeks were just bad at communicating with us; that the
| fault was on their side somehow. But now I think we're
| just dicks.
| na85 wrote:
| Careful, pointing out smug asshole commenters is against
| the contribution guidelines.
| lurquer wrote:
| That's not the way 'literally' should be used.
| tptacek wrote:
| The consensus on the 1992 thread (including a really great
| comment from 'Animats) seems to be that `popcount` was generally
| not added to architectures at NSA's request --- that people
| familiar with those archs knew the actual reason `popcount` wound
| up in the ISA, and it preceded NSA purchases.
|
| https://groups.google.com/g/comp.arch/c/UXEi7G6WHuU/m/Z2z7fC...
| Animats wrote:
| The striking thing is that the IBM System/360 didn't have it.
| Nor does the System/370. Those were the standard mainframes for
| a generation.
|
| IBM Z-series machines do have population count, finally.
| ludamad wrote:
| My first thought "How else do you quickly count pieces on a
| bitboard?". Definitely chess programming caused me to never
| second guess the usefulness of `popcount`
| drichel wrote:
| Counting bits was the bottleneck in the genomic scan I co-
| authored (Kanoungi et al. 2020). popcnt resulted in insane
| perfomance gains comared to all other methods.
|
| However, we re-discovered the fact that some Intel CPUs,
| including the Nehalem mentioned in the article, have a bug that
| severly affects popcnt's performance, see for example here:
| https://github.com/komrad36/LATCH/issues/3#issuecomment-2671...
| 4gotunameagain wrote:
| Another interesting application of popcount is in computer
| vision, namely in matching keypoints that use binary descriptors
| for 3D reconstruction in SLAM/TRN etc
| jonatron wrote:
| Yep, I've used __builtin_popcountll for ORB from OpenCV (256
| bit binary descriptors).
| 4gotunameagain wrote:
| Looks like we've done similar things :)
|
| Horror story: I was once developing a TRN system for a
| spacecraft instrument which uses an ancient x86 processor
| that does not have popcnt, ended up using a look-up table
| instead...
| solarexplorer wrote:
| Did you know about HAKMEM 169? I guess it was/is not widely
| known since many people mention lookup tables as the only
| fast alternative to the popcnt instruction.
|
| http://www.hakmem.org/#item169
| rcgorton wrote:
| It is also incredibly useful for doing string scanning - look at
| strlen/strchr in various libc imp lementations
| chrchang523 wrote:
| Nitpick: it's the related "count trailing zeros" operation that
| is useful (in combination with movemask) there, not popcount
| itself.
| bsmith0 wrote:
| Here's a dumb question. If someone asked me to do it I'd probably
| write code like:
|
| while(x != 0) { c += x&1; x >>= 1; }
|
| Is this something that should be added to LLVM?
|
| Edit: flip the order
| secondcoming wrote:
| Both clang and gcc have __builtin_popcnt variants.
| sumtechguy wrote:
| I came across this long ago. But it shows some very nice ways
| to fiddle bits. It has a few different ways to do it. Which
| would be handy on systems that do not have a popcount.
|
| https://graphics.stanford.edu/~seander/bithacks.html
| [deleted]
| nickysielicki wrote:
| Popcount is easily recognized by llvm (and it's actually
| mentioned in the article...)
|
| In the case of the code you've posted, you're shifting out the
| LSB before you check the bit, so it's not quite right, but (in
| general) popcount is recognized and used when possible.
| bsmith0 wrote:
| Yep my bad! I think flipping the order should work still
| though.
|
| The two links in the article:
|
| https://lemire.me/blog/2016/05/23/the-surprising-
| cleverness-...
|
| And the LLVM source indicate to me it only picks up on
| x&(x-1) pattern, which would miss the popcount optimization
| on code like mine.
| nickysielicki wrote:
| Flipping the order works, except if the LSB on x is set.
|
| https://godbolt.org/z/qdWhxMPsf
|
| Note the run output under clang.
|
| edit:
|
| > And the LLVM source indicate to me it only picks up on
| x&(x-1) pattern, which would miss the popcount optimization
| on code like mine.
|
| Thanks for teaching me something this morning. That's
| annoying.
|
| I think the portable solution is std::popcount in C++ (or
| equivalent in Rust).
| vardump wrote:
| While it seems to be true gcc and clang don't recognize
| this pattern even when implemented correctly, your
| program becomes an infinite loop if the highest bit is
| set (negative), because 'i' will never become 0.
|
| Example with int8_t: int8_t i = -127; //
| 0b10000001 i >>= 1; // 0b11000000 i >>= 1; //
| 0b11100000 i >>= 1; // 0b11110000 i >>= 1; //
| 0b11111000 i >>= 1; // 0b11111100 i >>= 1; //
| 0b11111110 i >>= 1; // 0b11111111 i >>= 1; //
| 0b11111111 ad infinitum
|
| One needs to be careful when using >> (shift right) with
| signed integers.
|
| So your program is not equivalent to popcount.
| tialaramex wrote:
| > or equivalent in Rust
|
| https://doc.rust-lang.org/std/?search=count_ones
|
| Internally Rust actually just staples LLVM's
| implementation into your code, via an intrinsic - but if
| that were ever to change the standard library
| count_ones() methods will do whatever happens instead so
| you should use that.
| notacoward wrote:
| Back in the mid-2000s I worked at a company that made their own
| (MIPS-based) chips. NSA was one of our customers - supposedly the
| "defense" who could be considered the good side of NSA compared
| to the 10x larger "offense" but still. As we were planning for
| our second generation, they offered quite a bit of money if we'd
| implement a "sheep and goats" instruction. It would take two
| operands: an input and a mask. The masked-in bits of the input
| (the "sheep") would be packed toward the MSB of the output, while
| the masked-out bits (the "goats") would be packed toward the LSB.
| We had a lot of people on staff with serious chops in all sorts
| of math including cryptography, but none of them could identify
| an algorithm that would benefit from having such an instruction
| (as distinct from more conventional range-based bitfield
| instructions). Since the company went under shortly afterward, it
| remained a mystery. I still wonder about it.
| coolspot wrote:
| Is it possible that by some mistake the NSA your company was
| working with was National Sheepfarmers Association?
|
| Did representatives of the "NSA" have a New Zealand accent?
| dboreham wrote:
| I have also been around in a company that made CPUs that
| initially had no bit count instruction. Then at some point the
| instruction was added. At the time I heard that "men in black
| with mirrored sunglasses" had shown up and demanded that the
| instruction be added. Whether or not this was an accurate
| description of events, you can see the note on page 74 in this
| document (section 8.2) :
| http://www.transputer.net/iset/pdf/tis-acwg.pdf recording the
| instruction having been added.
|
| Edit, I see Roger Shepherd (one of the people in the know at
| above mentioned company) commented in the comp.arch thread
| (which I vaguely remember reading at the time) but no mention
| of MIB...
| pbsd wrote:
| That sounds like a decent primitive to accelerate arbitrary bit
| permutations in software. It's known as GRP in, e.g., [1].
|
| [1] http://palms.ee.princeton.edu/PALMSopen/shi00bit.pdf
| notacoward wrote:
| Very interesting. This was published in 2000, and the people
| we worked with were near Princeton, so this result - the
| specific utility of such an instruction, if not the semantics
| themselves - might very well be something that was known to
| some relevant people but not yet widely enough for any of our
| people to recognize it. Thanks!
| devit wrote:
| That's equivalent to !0 << popcount(!(x & mask)) (where left
| shift must saturate and not truncate the shift count, otherwise
| you need to special case x = 0) and seems much less useful than
| popcount.
| notacoward wrote:
| I think you're misunderstanding what the instruction (or
| similar ones that others have mentioned) would do. It's a
| specialized permutation function; every bit in the input is
| preserved, just in a different position. Your version doesn't
| have that property at all, and would indeed not be very
| useful.
| less_less wrote:
| Half of this instruction is present in AMD64's BMI2 extension
| as PEXT, and the reverse operation as PDEP. Unlike "sheep and
| goats", PEXT just extracts the sheep into the LSB and ignores
| the goats.
|
| If I recall the Knuth lecture correctly, given a "sheep and
| goats" instruction where one of the sets is packed in reverse
| order, you can implement any n-bit permutation in something
| like log2(n) instructions. I don't remember if this is true if
| they're both packed in forward order. But it would be nice for
| some hardware crypto designs, like DES or more recently GIFT.
|
| PEXT has at least two additional use cases I know of:
| manipulating bit indices for databases, and binary (GF2) matrix
| manipulation. I've used it in a (non-crypto) project to select
| a subset of columns from a binary matrix, to convert it to
| systematic form. This subroutine also used popcount.
|
| What I really wanted in that project was another "NSA
| instruction": bit matrix multiply. Cray supercomputers can
| multiply two 64x64 binary matrices in one instruction, though I
| have no idea how many cycles it takes. With AVX2, the best I
| could do is 6 instructions plus precomputation for 8x8 x 8x32,
| which is 1/128'th the work.
| Enginerrrd wrote:
| It's out of my depth, but my guess is on sething DES related.
| Here's a link to some possibly relevant discussion about it.
|
| http://www.icodeguru.com/Embedded/Hacker's-Delight/050.htm
| someguydave wrote:
| indeed, Cray famously said "If you were plowing a field,
| which would you rather use: two strong oxen or 1024
| chickens?"
|
| Unfortunately we only have 1024 chickens in modern computers.
| lowbloodsugar wrote:
| If you had to digest a million grains, which would you
| rather use?
| CalChris wrote:
| Yes, but those chickens now are as powerful as Cray's oxen
| were then.
| flavius29663 wrote:
| so, do you want 2 modern oxen or 1024 modern chickens?
| CalChris wrote:
| Gimme dem modern wide supercalar OOO cached chickens,
| please. Cray was right back then but he is no longer
| right now. If he were, the market would say so.
| lowbloodsugar wrote:
| These days, all the oxen are made up of chickens. The
| biggest one is 7,630,848 chickens.
|
| https://www.top500.org/lists/top500/2020/11/
| thomasmg wrote:
| Succinct (space-saving) data structures often need "rank" and
| "select" operations. Rank(n) is the number of 1 bits up to
| position n. Select(n) is the reverse: at which position is
| the n-th 1 bit.
|
| For "rank", the "popcount" instruction can be used.
| Interestingly, for "select", the "PDEP" instruction can be
| used: you can put the data array in the PDEP mask, and 1 << n
| in the value; basically flip the operands. I found this quite
| fascinating. For details, there is a short paper on this: "A
| Fast x86 Implementation of Select".
|
| I wonder if those succinct data structures are in any way
| related to what NSA is doing. I think not, but who knows.
| Bayart wrote:
| The paper, if anyone wants to save some clicks :
| https://arxiv.org/abs/1706.00990
| Sniffnoy wrote:
| Heh, so it's an instruction for INTERCAL's "select" (~)
| operator...
| mjevans wrote:
| The 'and goats' part leads me to conceptualizing the
| instruction more like:
|
| Bit Scrambler / Chutes - shuffle bits around in a way that
| divides a stream.
|
| This might also be useful in pre-filters for compression
| (entropy reduction) if you knew the content of the message.
| E.G. for ASCII text the upper 2-3 bits of each letter could be
| ranked to the side for better compression and a reduction of
| message size.
|
| As others have pointed out, modern CPUs ended up with 'half'
| that instruction, so I wonder if there were any other reasons
| for the full instruction.
| jhallenworld wrote:
| Compress and expand (from Hackers Delight) are like this, but
| only the selected bits are kept. These are quite useful
| instructions. One use is the hash function for perfect hash
| tables. The hash table includes a mask which picks the bits of
| the keys which actually change (between all keys) and
| compresses them all to the right as the hash index.
|
| Disclaimer: I contributed the "expand" algorithm shown in
| Hacker's Delight.
| dragontamer wrote:
| GPU-programmers use popcount-based programming all the time these
| days, but the abstractions are built on top and are hardware
| accelerated.
|
| CUDA's __activemask(); returns the 32-bit value of your current
| 32-wide EXEC mask. That is to say, if your current warp is:
| int foo = 0; if(threadIdx.x %= 2){ foo =
| __activemask(); }
|
| foo will be "0b01010101...." or 0x55555555. This __activemask()
| has a number of useful properties should you use __popc with it.
|
| popcount(__activemask()); returns the number of threads
| executing.
|
| lanemask_lt() returns "0b0000000000000001" for the 0th lane.
| 0b0000000000000011 for the 1st lane. 0b0000000000000111... for
| the 2nd lane... and 111111111...111 for the last 31st lane.
|
| popcount(__activemask() & lanemask_lt()); returns the "active
| lane count". All together now, we can make a parallel SIMD-stack
| that can push/pop together in parallel. int
| head = 0; char buffer[0x1000];
| while(fooBar()){ // Dynamic! We don't know who is, or is not
| active anymore int localPrefix =
| __popc(__activemask() & __lanemask_lt()); int
| totalWarpActive = __popc(__activemask());
| buffer[head + localPrefix] = generateValueThisThread();
| if(localPrefix == 0){ head += totalWarpActive; //
| Move the head forward, much like a "push" operation in single-
| thread land // Only one thread should move the
| head } __syncthreads(); // Thread
| barrier, make sure everyone is waiting on activeThread#0 before
| continuing. }
|
| ------------
|
| As such, you can dynamically load-balance between GPU threads
| (!!!) from a shared stack with minimal overheads.
|
| If you want to extend this larger than one 32-wide CUDA-warp,
| you'll need to use __shared __ memory to share the prefix with
| the rest of the block.
|
| It is a bad idea (too much overhead) to extend this much larger
| than a block, as there's no quick way to communicate outside of
| your block. Still though, having chunks of up to 1024 threads
| synchronized through a shared data-structure that only has
| nanoseconds of overhead is a nifty trick.
|
| -----------
|
| EDIT: Oh right, and this concept is now replicated very, very
| quickly in the dedicated __ballot_sync(...) function (which
| compiles down to just a few assembly instructions).
|
| Playing with the "Exec-mask" is a hugely efficient way to
| synchronously, and dynamically gather information across your
| warp. So lots of little tricks have been built around this.
| ncmncm wrote:
| It is appalling that, after _every_ other general-computing
| architecture in common use either started out with a popcount
| instruction, or had one added later at substantial expense,
| RISC-V came out without one.
|
| It still doesn't have any. The proposed B, "bitmanip" extension
| has it (along with a raft of trivial variations: count leading
| zeroes, count trailing ones, yada yada) but that is not ratified
| and not implemented in any chip I know of. Since B is a huge
| extension, we can expect it will be routinely omitted even after
| it's ratified, and compilers will need special prodding to
| produce any such instructions.
|
| It should have been in the base instruction set. We probably can
| blame its lack on the academic origins of the design. CS
| professors probably think of it as a thing not needed to
| implement Lisp, therefore not worth class time.
|
| (Some people say, "Oh, but you can trap and emulate it", which
| adds insult to injury. Trapping and emulating eliminates all the
| value the instruction offers.)
| carapace wrote:
| (I just want to add that this is the best thread on HN i've read
| in a while. Y'all bringing a little nerdy tear to my eye. <3 )
| FridayoLeary wrote:
| A bit off- topic but i want to know; is binary code (01etc) still
| used today in programming/coding? And for what applications?
| fortyrod wrote:
| Maybe not for general-purpose computing. I've used it for on-
| the-fly code generation (hacking display rotation into the
| Windows 3x BitBlt engine) and programming special-purpose media
| accelerators. In both cases you end up creating a bunch of
| convenience #defines or macros that generate the bits, which
| immediately takes you back into tiny language territory rather
| than pure machine code. The relative ease of creating new
| programmable hardware in FPGAs is another place this might
| occur.
| adrian_b wrote:
| It is possible that the "population count" instruction has been
| included in the instruction sets of most American supercomputers
| at the request of NSA, which was an important customer for them.
|
| Nevertheless, the first computer having this instruction was a
| British computer, the Ferranti Mark I (February 1951).
|
| The name used by Ferranti Mark I for this instruction was
| "sideways add".
|
| Also notable was that Ferranti Mark I had the equivalent of LZCNT
| (count leading zeroes) too.
|
| Both instructions are very useful and they are standard now for
| modern instruction sets, but they were omitted in most computers
| after Ferranti Mark I, except in expensive supercomputers.
| adrian_b wrote:
| Moreover, Ferranti Mark I included a hardware random number
| generator, another feature useful for cryptography, which was
| reintroduced only recently in modern CPUs.
| iib wrote:
| Hardware random number generators do have some security
| issues though. Linux devs were opposed to solely relying on
| them, because they can be compromised by the vendor [1]. So
| they are at best used in algorithms that they can not
| compromise (still in [1], but lower, in the comments).
|
| [1] https://web.archive.org/web/20180611180213/https://plus.g
| oog...
| adrian_b wrote:
| The security issues are not with hardware random number
| generators in general, but with those that are included
| inside complex devices like monolithic CPUs or TPMs, so
| that the owners of those devices cannot verify that the
| RNG's really do what they are claimed to do.
|
| Discrete hardware RNG's, like that of the Ferranti Mark I,
| are perfectly secure.
|
| For a modern device, the best way to implement a hardware
| RNG is to just include an ADC (analog-digital converter)
| input. Then you may connect externally on the PCB some
| noisy analog amplifier, e.g. one which has a noisy resistor
| or diode at its input. Digitizing the noise with the ADC
| will provide the random numbers and the ADC input can be
| isolated and tested separately at any time, so the user can
| verify that there is no hidden functionality.
|
| Most microcontrollers have ADC inputs, so it is easy to add
| a secure hardware RNG for them. The same could be done for
| a personal computer by making a noisy amplifier that can be
| plugged in the microphone input, or by making a USB device
| with a microcontroller.
| iib wrote:
| I remembered something, and I want to say as an aside,
| for anybody reading that at one point has to design a toy
| RNG from an ADC, as I had to some years ago, you should
| not take the last bits as they are--as was my first
| thought--, you should pass them through something like
| the von Neumann corrector [1].
|
| [1] https://everything2.com/title/von+Neumann+corrector
| ncmncm wrote:
| Indeed, AMD has more than once shipped CPUs in which the
| random-number instruction would always yield the same
| value, that had to be monkey-patched to yield apparently
| random numbers. A valuable hint.
| st_goliath wrote:
| I commented on that earlier, including that it probably also
| has a cryptanalysis background:
| https://news.ycombinator.com/item?id=27472900
|
| But yes, it definitely pre-dates the 1961 IBM machine in the
| article.
| pklausler wrote:
| Surely the Cray BMM (bit matrix multiplication) instructions have
| a better claim to that nickname.
| [deleted]
| oefrha wrote:
| Discussed at the time:
| https://news.ycombinator.com/item?id=20914479
| implements wrote:
| In this discussion someone offers:
|
| "I remember in one interview I was asked to write a function
| that returns true iff x is a power of two, so I wrote return 1
| == __builtin_popcount(x). They liked that."
|
| I'm no longer a programmer, but I wondered why it wasn't
| "return (__builtin_popcount(x) == 1)" - just out of interest.
| throw5away wrote:
| If you accidentally write "return x = 1" when x is a
| variable, you always return true. If you return "1 = x", you
| cause a syntax error. So some people have gotten into the
| habit of writing constants on the left, even if the return
| value of __builtin_popcount is not assignable.
| implements wrote:
| Thanks for explaining - that makes sense.
| NobodyReport wrote:
| I have always been a lurker and I am trying to add my account to
| my Keybase but it requires 2 karma. Could a few kind sirs give me
| the juice?
| NobodyReport wrote:
| Not gonna spam. Just this one comment to do and hope it works
| out. Edit- Thank you to the two who did but sadly someone down
| voted so my total is 1.
|
| I respect the community. I don't wish to spam just trying to
| take control of all online identities.
___________________________________________________________________
(page generated 2021-06-11 23:01 UTC)