https://retrocomputing.stackexchange.com/questions/29787/did-any-processor-implement-an-integer-square-root-instruction Stack Exchange Network Stack Exchange network consists of 183 Q&A communities including Stack Overflow, the largest, most trusted online community for developers to learn, share their knowledge, and build their careers. Visit Stack Exchange [ ] Loading... 1. + Tour Start here for a quick overview of the site + Help Center Detailed answers to any questions you might have + Meta Discuss the workings and policies of this site + About Us Learn more about Stack Overflow the company, and our products 2. 3. current community + Retrocomputing help chat + Retrocomputing Meta your communities Sign up or log in to customize your list. more stack exchange communities company blog 4. 5. Log in 6. Sign up Retrocomputing 1. 1. Home 2. Questions 3. Tags 4. 5. Users 6. Companies 7. Unanswered 2. Teams [teams-promo] Ask questions, find answers and collaborate at work with Stack Overflow for Teams. Explore Teams Create a free Team 3. Teams 4. Ask questions, find answers and collaborate at work with Stack Overflow for Teams. Explore Teams Teams Q&A for work Connect and share knowledge within a single location that is structured and easy to search. Learn more about Teams Did any processor implement an integer square root instruction? Ask Question Asked 2 days ago Modified 2 days ago Viewed 25k times 31 Has any processor ever implemented an integer square root instruction? Obviously, floating-point square root instructions are quite common, but I've never seen one specifically for integers. One close candidate I know of is the Nintendo DS, which contained a memory-mapped integer divider/square rooter. This was helpful for 3D calculations, given that its ARM processor had no FPU or hardware divider. However, it's not a native processor instruction. * history * instruction-set Share Improve this question Follow edited 2 days ago Raffzahn's user avatar Raffzahn 223k2222 gold badges631631 silver badges918918 bronze badges asked 2 days ago v-rob's user avatar v-robv-rob 76711 gold badge55 silver badges1010 bronze badges 7 * 2 tough one; square rooting is something that you don't want to do combinatorically in a single clock cycle, or else your maximum clock rate will go dooooown, and it uses the same algorithmic components that an integer multiply and division need. This makes it a prime candidate for implementation in iterative microcode. Now, what's the difference between a CPU that implements an isqrt instruction in microcode vs software that implements a isqrt function on a CPU that has no isqrt instruction, precisely? - Marcus Muller 2 days ago * 3 Non-iterative implementation for large input sizes get unhandy very quickly, and for small sizes, simple lookup tables might be quite performant. What is the minimum input size you would consider an implementation? - Marcus Muller 2 days ago * By "processor" do you mean microprocessor? Or do you intend to include mainframes, DSPs, etc? - DrSheldon 2 days ago * @DrSheldon I meant it in a pretty general sense of the word. If it's got an instruction set that includes an isqrt instruction, it'll probably do :) - v-rob 2 days ago * 1 I add the isqrt instruction only because I noticed that the bignum library had it as a primitive. It was obvious to me that it's not trivial to implement, particularly if someone were to need that for large integers. I went through the code, in that bignum library and noticed it was poor. I put fixes in to make it faster, and then due to the IKEA effect of having made it mine by working on it, I had to expose it in the language. - Kaz 5 hours ago | Show 2 more comments 3 Answers 3 Sorted by: Reset to default [Highest score (default) ] 38 Yes. The Harris RTX 2000 Forth CPU offered a multi-step square root instruction, as did its military-grade sibling, the RTX 2010. There may have been others, but that is the one I know of. See: https://users.ece.cmu.edu/~koopman/stack_computers/sec4_5.html Share Improve this answer Follow answered 2 days ago RichF's user avatar RichFRichF 9,20644 gold badges3131 silver badges5656 bronze badges 3 * Nice, do you know any more detailed description? - Raffzahn 2 days ago * 2 @Raffzahn Not off the top of my head. But I do see when searching, you should specify Harris as part of the search. Otherwise the results are mostly pages relating to a series of NVidia GPU boards. As far as I can tell, there is no commonality besides the general model number. - RichF 2 days ago * 5 Interesting; according to the reference manual (users.ece.cmu.edu /~koopman/forth/...), it's an iterated square root where you have to run 1 setup instruction and then 15 step instructions to get the final answer. I've seen that done for division on other processors. And of course it would be a Forth CPU that would have such an esoteric instruction, heh. - v-rob 2 days ago Add a comment | 32 Is 1946 retro enough? From Wikipedia: ENIAC used four of the accumulators (controlled by a special multiplier unit) to perform up to 385 multiplication operations per second; five of the accumulators were controlled by a special divider/square-rooter unit to perform up to 40 division operations per second or three square root operations per second. ENIAC accumulators operated on decimal integers. Share Improve this answer Follow answered 2 days ago John Doty's user avatar John DotyJohn Doty 2,45477 silver badges1313 bronze badges Add a comment | 17 It's not that easy. The most efficient method to calculate square root is to calculate inverse/reciprocal of the square root using Newton-Raphson iterations, and then multiply it with the original. This is best known as the "Quake method" (see also Where did Fast InvSqrt() come from?). The more modern version used by contemporary CPU and GPUs are generalized into two instructions, one for estimating the initial guess (e.g. frsqrte of ARMv8), another to run the following iterations (e.g. frsqrts of ARMv8). Single-instruction version of sqrt is a micro-coded or pseudo-instruction version of these two instructions. The prerequisite for all of this is a multiplier. If you want to calculate FP (i)sqrt, then you need a (fast) FP multiplier, which all FPUs have. If you want to calculate integer (i)sqrt, then you need a (fast) integer multiplier, which most CPUs don't have (historically). Otherwise it would be called a DSP. To make it better, you need a (fast) multiplier that is twice the width of your input to have sufficient precision, which most CPUs definitely don't have until "relatively" recently (relative to RetroComputing). And precision matters, or not? If you look at the "Quake method" closely, you notice that one of the iterations was commented out. There are a lot of use cases where the extreme precision isn't necessary and it'll be better to leave the choice of precision/speed trade off to programmers. isqrt was intentionally separated into fsqrte and fsqrts on ARMv8 exactly for this reason: so that the programmer can adjust the number of fsqrts for the desired speed and accuracy tradeoff. So I don't quite agree to the statement that single instruction sqrt is very common. It's there because the IEEEE754 and the C stand math library requires it (for the flag bits and exceptions), but that doesn't mean it's frequently used. Further reading * SPRC542 TI's math library for C64x DSP (8-issue VLIW CPU with two 32x32=64 multipliers). In this library _iq _IQisqrt is implemented using Newton-Raphson iterations and _iq _IQsqrt is calculated by multiplying the original with the isqrt. The source code is available on request. * SPRUGG9 TMS320C64x+ IQmath Library User's Guide.The user guide for SPRC542. * My implementation of square root using binary search, that doesn't depend on a multiplier. Only basic ALU instructions are used. It is vigorously undocumented. I have no idea what I wrote but it seems to work. . unsigned int usqrt(unsigned int x){ unsigned int a=0; unsigned int masksq=0,mask=0; unsigned int mask_shift=15; for(masksq=1u<<(mask_shift<<1),mask=1u<<(mask_shift); mask!=0; masksq=masksq>>2,mask=mask>>1,mask_shift--){ if(x>=masksq){ a=mask; break; } } x-=masksq;//masksq==a*a; mask=mask>>1; mask_shift--; while(mask>0){ unsigned int step=(mask<=step){ a|=mask; x-=step; } mask=mask>>1; mask_shift--; } return a; } Share Improve this answer Follow edited 2 days ago Stephen Kitt's user avatar Stephen Kitt 122k1717 gold badges505505 silver badges463463 bronze badges answered 2 days ago user3528438's user avatar user3528438user3528438 1,3791111 silver badges1111 bronze badges 11 * 3 What is recent for RetroComputing? The 8086 (from 1978) definitely did have an integer muliplier, even one that could multiply two 16 bit values into a 32 bit value. These instructions weren't really fast, though, and thus assembly books back then suggested to use the shift-register approach, as this was often faster than using the native imul/idiv instructions. - PMF 2 days ago * 2 @PMF An integer multiply instruction doesn't say anything about a integer multiplier. - user3528438 2 days ago * 2 In what sense do IEEE-754 and the C standard math library require a single instruction square root? As far as I know, all IEEE-754 cares about is things like memory layout and precision, and all C cares about is semantics. Unless there's something I'm missing, both will allow for the square root to be done in several instructions, a function call, or some memory-mapped funk like the one the Nintendo DS has. - Omar and Lorraine 2 days ago * 5 @OmarandLorraine you're right that nothing in IEEE754 requires an instruction. You're wrong about IEEE754 not defining semantics: the standard very much defines operations like multiplications, conversion to integer and square roots, along with their correct result. - Marcus Muller 2 days ago * @OmarandLorraine The IEEE 754 requirement for specific semantics of the operation (which are very much defined by the standard) heavily favors a single operation approach if not a single-instruction approach, because that helps ensure that programmers who need those semantics don't have to think about what they need to do to get those semantics. - Austin Hemmelgarn 2 days ago | Show 6 more comments You must log in to answer this question. Not the answer you're looking for? Browse other questions tagged * history * instruction-set . * The Overflow Blog * How do mixture-of-experts layers affect transformer models? * What a year building AI has taught Stack Overflow * Featured on Meta * New Focus Styles & Updated Styling for Button Groups * Upcoming initiatives on Stack Overflow and across the Stack Exchange network Linked 17 Where did Fast InvSqrt() come from? Related 14 Are there any articles elucidating the history of the POPCOUNT instruction? 12 Did any 360-compatible machine implement registers in core? 87 Why is the processor instruction called "move", not "copy"? 17 Has there ever been a instruction set architecture that did not require instruction decoding at all? 10 Did any core-memory computers have a read-and-erase instruction? 19 Have there been any instruction sets with an odd register width? 6 Were there any square monitors other than PLATO? 11 IMPI Instruction set: is there any reference? 11 Why did the VAX compatibility mode not implement the MARK instruction? Hot Network Questions * Would it be possible to convince a hungry alien species that human lives are valuable? * Is it possible to have a stable black hole that does not evaporate? * Could primordial black holes have been supermassive? * Why do people keep saying 'it is your responsibility' to make sure your supervisor uploads their letter of recommendation (or similar)? * Draw a car with TikZ * Why is LLC Converter optimal Lm Lr ratio around 10? * Sci-fi movie from the early 2000s with giant spiders, in which a lady goes into the jungle or forest looking for her lost partner or lost soldiers * Why do scientists use specialized units for distance when metric units are perfectly adequate? * Can I safely refinish 250 gallon propane tank? * Potential for LAMMPS Simulation * Any practical use for the underscore variable? * If lord krishna was vegetarian, why did he hunt animals and for what? * Syncthing on ZFS a good case for Deduplication? * Shading region under (or over) a curve in horizontal bands * How to list all file in a path with absolute path and spaces in a script * R' David Tzvi Hoffman's sun depression angle for nightfall * Alignment inside a equivalent statements proof * In this position, why is Nb4 better than b4? * Is the Umbrage Hill Quest in Dragon of Icespire Peak likely to kill 1st-level PCs? * Is infinity a concept or a word empty of meaning? * Must I declare all items in my vehicle when crossing from the U.S. to Canada, or only those inquired about? * Can definitions in the Oxford Dictionary of the English Language be considered definitive in informal philosophical presentations? * Masyu: Four Colours * Is there a dimensional multiplication operation? more hot questions Question feed Subscribe to RSS Question feed To subscribe to this RSS feed, copy and paste this URL into your RSS reader. [https://retrocomputi] * Retrocomputing * Tour * Help * Chat * Contact * Feedback Company * Stack Overflow * Teams * Advertising * Collectives * Talent * About * Press * Legal * Privacy Policy * Terms of Service * Cookie Settings * Cookie Policy Stack Exchange Network * Technology * Culture & recreation * Life & arts * Science * Professional * Business * API * Data * Blog * Facebook * Twitter * LinkedIn * Instagram Site design / logo (c) 2024 Stack Exchange Inc; user contributions licensed under CC BY-SA. rev 2024.4.5.7373