https://retrocomputing.stackexchange.com/questions/29787/did-any-processor-implement-an-integer-square-root-instruction

 
 

Stack Exchange Network

Stack Exchange network consists of 183 Q&A communities including
Stack Overflow, the largest, most trusted online community for
developers to learn, share their knowledge, and build their careers.

Visit Stack Exchange
[                    ]
Loading...
 1.  
      + Tour Start here for a quick overview of the site
      + Help Center Detailed answers to any questions you might have
      + Meta Discuss the workings and policies of this site
      + About Us Learn more about Stack Overflow the company, and our
        products
 2.  
 3. current community

      +  
        Retrocomputing
        help chat
      + 
         
        Retrocomputing Meta
   
    your communities

    Sign up or log in to customize your list.

    more stack exchange communities

    company blog
 4.
 5. Log in
 6. Sign up

Retrocomputing

 1.
     1.  
        Home
     2.  
        Questions
     3.  
        Tags
     4.
     5.  
        Users
     6.  
        Companies
     7.  
        Unanswered
 2. Teams
    [teams-promo]

    Ask questions, find answers and collaborate at work with Stack
    Overflow for Teams.

    Explore Teams Create a free Team
 3.  
    Teams
 4. Ask questions, find answers and collaborate at work with Stack
    Overflow for Teams. Explore Teams

Teams

Q&A for work

Connect and share knowledge within a single location that is
structured and easy to search.

Learn more about Teams

Did any processor implement an integer square root instruction?

Ask Question
Asked 2 days ago
Modified 2 days ago
Viewed 25k times
31
 

Has any processor ever implemented an integer square root
instruction? Obviously, floating-point square root instructions are
quite common, but I've never seen one specifically for integers.

One close candidate I know of is the Nintendo DS, which contained a
memory-mapped integer divider/square rooter. This was helpful for 3D
calculations, given that its ARM processor had no FPU or hardware
divider. However, it's not a native processor instruction.

  * history
  * instruction-set

Share
Improve this question
Follow 
edited 2 days ago
 
Raffzahn's user avatar
Raffzahn
223k2222 gold badges631631 silver badges918918 bronze badges
asked 2 days ago
 
v-rob's user avatar
v-robv-rob
76711 gold badge55 silver badges1010 bronze badges
7

  * 2
    tough one; square rooting is something that you don't want to do
    combinatorically in a single clock cycle, or else your maximum
    clock rate will go dooooown, and it uses the same algorithmic
    components that an integer multiply and division need. This makes
    it a prime candidate for implementation in iterative microcode.
    Now, what's the difference between a CPU that implements an isqrt
    instruction in microcode vs software that implements a isqrt
    function on a CPU that has no isqrt instruction, precisely?
    - Marcus Muller
    2 days ago
  * 3
    Non-iterative implementation for large input sizes get unhandy
    very quickly, and for small sizes, simple lookup tables might be
    quite performant. What is the minimum input size you would
    consider an implementation?
    - Marcus Muller
    2 days ago
  * 
    By "processor" do you mean microprocessor? Or do you intend to
    include mainframes, DSPs, etc?
    - DrSheldon
    2 days ago
  * 
    @DrSheldon I meant it in a pretty general sense of the word. If
    it's got an instruction set that includes an isqrt instruction,
    it'll probably do :)
    - v-rob
    2 days ago
  * 1
    I add the isqrt instruction only because I noticed that the
    bignum library had it as a primitive. It was obvious to me that
    it's not trivial to implement, particularly if someone were to
    need that for large integers. I went through the code, in that
    bignum library and noticed it was poor. I put fixes in to make it
    faster, and then due to the IKEA effect of having made it mine by
    working on it, I had to expose it in the language.
    - Kaz
    5 hours ago

  |  Show 2 more comments

3 Answers 3

Sorted by: Reset to default
[Highest score (default)     ]
38
 

Yes. The Harris RTX 2000 Forth CPU offered a multi-step square root
instruction, as did its military-grade sibling, the RTX 2010. There
may have been others, but that is the one I know of.

See: https://users.ece.cmu.edu/~koopman/stack_computers/sec4_5.html

Share
Improve this answer
Follow 
answered 2 days ago
 
RichF's user avatar
RichFRichF
9,20644 gold badges3131 silver badges5656 bronze badges
3

  * 
    Nice, do you know any more detailed description?
    - Raffzahn
    2 days ago
  * 2
    @Raffzahn Not off the top of my head. But I do see when
    searching, you should specify Harris as part of the search.
    Otherwise the results are mostly pages relating to a series of
    NVidia GPU boards. As far as I can tell, there is no commonality
    besides the general model number.
    - RichF
    2 days ago
  * 5
    Interesting; according to the reference manual (users.ece.cmu.edu
    /~koopman/forth/...), it's an iterated square root where you have
    to run 1 setup instruction and then 15 step instructions to get
    the final answer. I've seen that done for division on other
    processors. And of course it would be a Forth CPU that would have
    such an esoteric instruction, heh.
    - v-rob
    2 days ago

Add a comment  |   
32
 

Is 1946 retro enough?

From Wikipedia:

    ENIAC used four of the accumulators (controlled by a special
    multiplier unit) to perform up to 385 multiplication operations
    per second; five of the accumulators were controlled by a special
    divider/square-rooter unit to perform up to 40 division
    operations per second or three square root operations per second.

ENIAC accumulators operated on decimal integers.

Share
Improve this answer
Follow 
answered 2 days ago
 
John Doty's user avatar
John DotyJohn Doty
2,45477 silver badges1313 bronze badges

Add a comment  |   
17
 

It's not that easy.

The most efficient method to calculate square root is to calculate
inverse/reciprocal of the square root using Newton-Raphson
iterations, and then multiply it with the original.

This is best known as the "Quake method" (see also Where did Fast
InvSqrt() come from?). The more modern version used by contemporary
CPU and GPUs are generalized into two instructions, one for
estimating the initial guess (e.g. frsqrte of ARMv8), another to run
the following iterations (e.g. frsqrts of ARMv8). Single-instruction
version of sqrt is a micro-coded or pseudo-instruction version of
these two instructions.

The prerequisite for all of this is a multiplier.

If you want to calculate FP (i)sqrt, then you need a (fast) FP
multiplier, which all FPUs have.

If you want to calculate integer (i)sqrt, then you need a (fast)
integer multiplier, which most CPUs don't have (historically).
Otherwise it would be called a DSP.

To make it better, you need a (fast) multiplier that is twice the
width of your input to have sufficient precision, which most CPUs
definitely don't have until "relatively" recently (relative to
RetroComputing).

And precision matters, or not?

If you look at the "Quake method" closely, you notice that one of the
iterations was commented out.

There are a lot of use cases where the extreme precision isn't
necessary and it'll be better to leave the choice of precision/speed
trade off to programmers. isqrt was intentionally separated into
fsqrte and fsqrts on ARMv8 exactly for this reason: so that the
programmer can adjust the number of fsqrts for the desired speed and
accuracy tradeoff.

So I don't quite agree to the statement that single instruction sqrt
is very common. It's there because the IEEEE754 and the C stand math
library requires it (for the flag bits and exceptions), but that
doesn't mean it's frequently used.

Further reading

  * SPRC542 TI's math library for C64x DSP (8-issue VLIW CPU with two
    32x32=64 multipliers). In this library _iq _IQisqrt is
    implemented using Newton-Raphson iterations and _iq _IQsqrt is
    calculated by multiplying the original with the isqrt. The source
    code is available on request.
  * SPRUGG9 TMS320C64x+ IQmath Library User's Guide.The user guide
    for SPRC542.
  * My implementation of square root using binary search, that
    doesn't depend on a multiplier. Only basic ALU instructions are
    used. It is vigorously undocumented. I have no idea what I wrote
    but it seems to work.

.

unsigned int usqrt(unsigned int x){
    unsigned int a=0;
    unsigned int masksq=0,mask=0;
    unsigned int mask_shift=15;
    for(masksq=1u<<(mask_shift<<1),mask=1u<<(mask_shift);
        mask!=0;
        masksq=masksq>>2,mask=mask>>1,mask_shift--){
        if(x>=masksq){
            a=mask;
            break;
        }
    }
    x-=masksq;//masksq==a*a;
    mask=mask>>1;
    mask_shift--;
    while(mask>0){
        unsigned int step=(mask<<mask_shift)+(a<<(mask_shift+1));
        if(x>=step){
            a|=mask;
            x-=step;
        }
        mask=mask>>1;
        mask_shift--;
    }
    return a;
}

Share
Improve this answer
Follow 
edited 2 days ago
 
Stephen Kitt's user avatar
Stephen Kitt
122k1717 gold badges505505 silver badges463463 bronze badges
answered 2 days ago
 
user3528438's user avatar
user3528438user3528438
1,3791111 silver badges1111 bronze badges
11

  * 3
    What is recent for RetroComputing? The 8086 (from 1978)
    definitely did have an integer muliplier, even one that could
    multiply two 16 bit values into a 32 bit value. These
    instructions weren't really fast, though, and thus assembly books
    back then suggested to use the shift-register approach, as this
    was often faster than using the native imul/idiv instructions.
    - PMF
    2 days ago
  * 2
    @PMF An integer multiply instruction doesn't say anything about a
    integer multiplier.
    - user3528438
    2 days ago
  * 2
    In what sense do IEEE-754 and the C standard math library require
    a single instruction square root? As far as I know, all IEEE-754
    cares about is things like memory layout and precision, and all C
    cares about is semantics. Unless there's something I'm missing,
    both will allow for the square root to be done in several
    instructions, a function call, or some memory-mapped funk like
    the one the Nintendo DS has.
    - Omar and Lorraine
    2 days ago
  * 5
    @OmarandLorraine you're right that nothing in IEEE754 requires an
    instruction. You're wrong about IEEE754 not defining semantics:
    the standard very much defines operations like multiplications,
    conversion to integer and square roots, along with their correct
    result.
    - Marcus Muller
    2 days ago
  * 
    @OmarandLorraine The IEEE 754 requirement for specific semantics
    of the operation (which are very much defined by the standard)
    heavily favors a single operation approach if not a
    single-instruction approach, because that helps ensure that
    programmers who need those semantics don't have to think about
    what they need to do to get those semantics.
    - Austin Hemmelgarn
    2 days ago

  |  Show 6 more comments

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged

  * history
  * instruction-set

.

  * The Overflow Blog
  * 
    How do mixture-of-experts layers affect transformer models?
  * 
    What a year building AI has taught Stack Overflow
  * Featured on Meta
  * 
    New Focus Styles & Updated Styling for Button Groups
  * 
    Upcoming initiatives on Stack Overflow and across the Stack
    Exchange network

Linked

 
17
Where did Fast InvSqrt() come from?

Related

 
14
Are there any articles elucidating the history of the POPCOUNT
instruction?
 
12
Did any 360-compatible machine implement registers in core?
 
87
Why is the processor instruction called "move", not "copy"?
 
17
Has there ever been a instruction set architecture that did not
require instruction decoding at all?
 
10
Did any core-memory computers have a read-and-erase instruction?
 
19
Have there been any instruction sets with an odd register width?
 
6
Were there any square monitors other than PLATO?
 
11
IMPI Instruction set: is there any reference?
 
11
Why did the VAX compatibility mode not implement the MARK
instruction?

Hot Network Questions

  * 
    Would it be possible to convince a hungry alien species that
    human lives are valuable?
  * 
    Is it possible to have a stable black hole that does not
    evaporate?
  * 
    Could primordial black holes have been supermassive?
  * 
    Why do people keep saying 'it is your responsibility' to make
    sure your supervisor uploads their letter of recommendation (or
    similar)?
  * 
    Draw a car with TikZ
  * 
    Why is LLC Converter optimal Lm Lr ratio around 10?
  * 
    Sci-fi movie from the early 2000s with giant spiders, in which a
    lady goes into the jungle or forest looking for her lost partner
    or lost soldiers
  * 
    Why do scientists use specialized units for distance when metric
    units are perfectly adequate?
  * 
    Can I safely refinish 250 gallon propane tank?
  * 
    Potential for LAMMPS Simulation
  * 
    Any practical use for the underscore variable?
  * 
    If lord krishna was vegetarian, why did he hunt animals and for
    what?
  * 
    Syncthing on ZFS a good case for Deduplication?
  * 
    Shading region under (or over) a curve in horizontal bands
  * 
    How to list all file in a path with absolute path and spaces in a
    script
  * 
    R' David Tzvi Hoffman's sun depression angle for nightfall
  * 
    Alignment inside a equivalent statements proof
  * 
    In this position, why is Nb4 better than b4?
  * 
    Is the Umbrage Hill Quest in Dragon of Icespire Peak likely to
    kill 1st-level PCs?
  * 
    Is infinity a concept or a word empty of meaning?
  * 
    Must I declare all items in my vehicle when crossing from the
    U.S. to Canada, or only those inquired about?
  * 
    Can definitions in the Oxford Dictionary of the English Language
    be considered definitive in informal philosophical presentations?
  * 
    Masyu: Four Colours
  * 
    Is there a dimensional multiplication operation?

more hot questions
Question feed

Subscribe to RSS

Question feed

To subscribe to this RSS feed, copy and paste this URL into your RSS
reader.

[https://retrocomputi]
 
*

Retrocomputing

  * Tour
  * Help
  * Chat
  * Contact
  * Feedback

Company

  * Stack Overflow
  * Teams
  * Advertising
  * Collectives
  * Talent
  * About
  * Press
  * Legal
  * Privacy Policy
  * Terms of Service
  * Cookie Settings
  * Cookie Policy

Stack Exchange Network

  * Technology
  * Culture & recreation
  * Life & arts
  * Science
  * Professional
  * Business
  * API
  * Data

  * Blog
  * Facebook
  * Twitter
  * LinkedIn
  * Instagram

Site design / logo (c) 2024 Stack Exchange Inc; user contributions
licensed under CC BY-SA. rev 2024.4.5.7373