[HN Gopher] Posit floating point numbers: thin triangles and oth...
       ___________________________________________________________________
        
       Posit floating point numbers: thin triangles and other tricks
       (2019)
        
       Author : fanf2
       Score  : 43 points
       Date   : 2025-06-19 14:42 UTC (8 hours ago)
        
 (HTM) web link (marc-b-reynolds.github.io)
 (TXT) w3m dump (marc-b-reynolds.github.io)
        
       | antiquark wrote:
       | This seems to be related to the "type III unum":
       | 
       | https://en.wikipedia.org/wiki/Unum_(number_format)#Posit_(Ty...
        
         | andrepd wrote:
          | Posit is the name of the 3rd in a series of John Gustafson's
          | proposals for an alternative to IEEE floats.
        
       | andrepd wrote:
       | Great dive! I'm very interested in posits (and ieee float
       | replacements in general) and never read this post before. Tons of
       | insightful points.
        
       | adrian_b wrote:
       | The example where computing an expression with posits has much
       | better accuracy than when computing with IEEE FP32 is extremely
       | misleading.
       | 
        | Regardless of whether you use 32-bit posits or IEEE FP32, you
        | can represent only the same count of numbers, i.e. the same
        | number of points on the real number axis.
       | 
       | When choosing a representation format, you cannot change the
       | number of representable points, you can just choose to distribute
       | the points in different places.
       | 
       | The IEEE FP32 format distributes the points so that the relative
       | rounding error is approximately constant over the entire range.
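        | 
        | As a rough illustration of that (a minimal C sketch using
        | nextafterf from math.h; the sample values are arbitrary):
        | 
        |     #include <math.h>
        |     #include <stdio.h>
        | 
        |     /* Gap to the next representable FP32 value, relative to
        |        the value itself: it stays between 2^-24 and 2^-23
        |        across the whole normal range. */
        |     int main(void) {
        |         float xs[] = {1e-30f, 1e-10f, 1.0f, 1e10f, 1e30f};
        |         for (int i = 0; i < 5; i++) {
        |             float x = xs[i];
        |             float ulp = nextafterf(x, INFINITY) - x;
        |             printf("x = %-8g  rel gap = %g\n", x, ulp / x);
        |         }
        |         return 0;
        |     }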
       | 
        | Posits crowd the points into the region around one in
        | magnitude, obtaining a better rounding error there, at the
        | price that the regions far from one (much larger or much
        | smaller magnitudes) have very sparse points, i.e. very high
        | relative rounding errors.
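        | 
        | A back-of-the-envelope sketch of how that plays out for a
        | 32-bit posit with es = 2 (the standard choice);
        | posit32_fraction_bits is just an illustrative helper, not a
        | real library function:
        | 
        |     #include <stdio.h>
        | 
        |     /* For a value of magnitude ~2^m, a posit spends 1 sign
        |        bit, a regime run plus its terminating bit, and es = 2
        |        exponent bits; whatever is left over is fraction. */
        |     static int posit32_fraction_bits(int m) {
        |         /* regime value k = floor(m / 4) */
        |         int k = (m >= 0) ? m / 4 : -((-m + 3) / 4);
        |         /* regime run + terminating bit */
        |         int regime_bits = (k >= 0) ? k + 2 : -k + 1;
        |         int frac = 32 - 1 - regime_bits - 2;
        |         return frac > 0 ? frac : 0;
        |     }
        | 
        |     int main(void) {
        |         int ms[] = {0, 8, 40, 80, 120, -40, -80};
        |         for (int i = 0; i < 7; i++)
        |             printf("~2^%4d -> %2d fraction bits (FP32: 23)\n",
        |                    ms[i], posit32_fraction_bits(ms[i]));
        |         return 0;
        |     }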
       | 
       | Posits behave pretty much like a fixed-point format that has
       | gradual overflow instead of a sharp cut-off. For big numbers you
       | do not get an overflow exception that would stop the computation,
       | but the accuracy of the results becomes very bad. For small
       | numbers the accuracy is good, but not as good as for a fixed-
       | point number, because some bit patterns must be reserved for
       | representing the big numbers, in order to avoid overflow.
       | 
       | The example that demonstrates better accuracy for posits is
       | manufactured by choosing values in the range where posits have
       | better accuracy. It is trivial to manufacture an almost identical
       | example where posits have worse accuracy, by choosing values in
       | an interval where FP32 has better accuracy.
       | 
        | There are indeed problems where posits can outperform IEEE FP32,
        | but it is quite difficult to predict which problems those are,
        | because for a complex problem it can be very difficult to predict
        | what the ranges of the intermediate results will be. This is the
        | very reason why floating-point numbers are preferred over fixed-
        | point numbers: to avoid the need for such analyses.
       | 
       | While for IEEE formats it is possible to make estimates of the
       | relative errors of the results of a long computation, due to the
       | guaranteed bounds for the relative error of each operation, that
       | is pretty much impossible for posits, where the relative error is
       | a function of the values of the operands, so you cannot estimate
       | it without actually doing the computation.
       | 
        | For scientific and technical computations, posits are pretty much
        | useless: such computations have very wide value ranges for their
        | data, they need error estimates, and posits can have significant
        | advantages only for small number formats of 32 bits or less,
        | while those computations mostly need 64-bit numbers or even
        | bigger.
       | 
       | Nevertheless, for special problems that are very well
       | characterized, i.e. you know with certainty some narrow ranges
       | for the values of the input data and of the intermediate results,
       | posits could get much more accuracy than IEEE FP32, but they
       | could have good performance only if they were implemented in
       | hardware.
        
         | wat10000 wrote:
         | Isn't that pretty much the entire point of this article?
        
         | andrepd wrote:
         | > The example where computing an expression with posits has
         | much better accuracy than when computing with IEEE FP32 is
         | extremely misleading.
         | 
         | Did you not rtfa or am I missing something?
        
           | dnautics wrote:
           | practically speaking we did put this to the test by having
           | someone run a fluid dynamics simulation on fp64 vs posit64 vs
           | posit32 with no algorithm changes (using fp128 as a "ground
           | truth").
           | 
            | unsurprisingly, the posit64 results were closer to fp128
            | than the fp64 results were.
        
       | dnautics wrote:
       | One of the creators of posits here (I came up with the name and i
       | think ES is my idea, did the first full soft versions in julia,
       | and designed the first circuits, including a cool optimization
       | for addition). my personal stance is that posits are not great
       | for scientific work precisely because of the difficulties with
        | actually solving error propagation. Hopefully i can give a bit
        | more measured insight into why the "parlor tricks" appear in the
        | posit context.
       | 
        | John's background is in scientific compute/HPC, and he previously
        | advocated for unums (which do fully track errors); there is also
        | a version of posits (called valids) which does track errors,
        | encouraging the user to combine it with other techniques to cut
        | the error bounds using invariants, but that requires an
        | algorithmic shift. Alas, a lot of the examples were lifted from
        | the unums book and sort of square-peg/round-holed into posits.
        | you can see an example of that algorithmic shift in the matrix
        | multiplication demo in the stanford talk (that demo is me;
        | linked in OP).
       | 
        | as for me, i was much more interested in lower-bit
        | representations for ml applications, where you ~don't care about
        | error propagation. this also appears in the talk.
       | 
       | as it wound up, Facebook took some interest in it for AI but they
       | nih'd it and redid the mantissa as logarithmic (which i think was
       | a mistake).
       | 
        | and anyway, redoing your silicon turns out to be a pain in the
        | ass (quires only make sense from a burn-the-existing-world
        | perspective and are not so bad for training pipelines, where
        | iirc the kronecker product dominates), but the addition
        | operation takes up quite a bit more floorspace, and just
        | quantizing to int4 with grouped scaling factors is easier with
        | existing gpu pipelines, even on custom hardware.
       | 
        | fun side fact: Positron.ai was so named on the off chance that
        | using posits makes sense (you can see the through-line to science
        | fiction that i was attempting to manifest when i came up with the
        | name)
        
         | dnautics wrote:
         | turns out only the slides are linked in op. here is the live
         | recording:
         | 
         | https://youtu.be/aP0Y1uAA-2Y?feature=shared
        
         | andrepd wrote:
         | > and designed the first circuits, including a cool
         | optimization for addition
         | 
         | Curious, what trick? :)
         | 
          | Wishing for mainstream CPU support for anything but IEEE
          | numbers was always a pipe dream on anything but a decades-long
          | timescale, but I gotta be honest, I was hoping the current AI
          | hype wave would bring some custom silicon for alternative float
          | formats, Posits included.
         | 
          | > the addition operation takes up quite a bit more floorspace,
          | and just quantizing to int4 with grouped scaling factors is
          | easier with existing gpu pipelines
         | 
         | Can you elaborate on this part?
        
           | dnautics wrote:
           | > trick
           | 
           | Normally fp numbers have an "invisible leading one bit", so
           | an ieee fp is (1+0.nnnn) x 2^E where nnn is the mantissa and
           | E is the exponent.
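            | 
            | (A minimal sketch of that decoding for a normal FP32;
            | zero, denormals, inf and NaN omitted, and
            | fp32_decode_normal is just an illustrative name.)
            | 
            |     #include <math.h>
            |     #include <stdint.h>
            | 
            |     /* value = (-1)^sign * (1 + 0.nnn...) * 2^E */
            |     static double fp32_decode_normal(uint32_t bits) {
            |         int sign = bits >> 31;
            |         int E = (int)((bits >> 23) & 0xFF) - 127;
            |         double frac = (bits & 0x7FFFFF) / 8388608.0;
            |         return (sign ? -1.0 : 1.0)
            |                * (1.0 + frac) * exp2(E);
            |     }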
           | 
            | as posits are always two's complement, you can model negative
            | posits as having an "invisible leading two bit", aka
            | (2 - 0.nnnn) x 2^(E-1). this greatly simplifies the circuitry
            | for addition and multiplication.
           | 
           | > the addition operation takes up quite a bit more
           | floorspace,
           | 
            | I miswrote; it's multiplication. This one is simple: the
            | multiplier array needs floorspace on the order of n^2, where
            | n is the maximum width of the mantissa, and posits have more
            | mantissa resolution around unity than FP32 does.
           | 
            | we did floorspace estimates for fp32 versus p32, and it
            | looked like the extra space required for the mantissa did not
            | overtake the savings from the crazy logic that IEEE requires
            | for sign flipping, nans, and denormals, and that posits don't
            | need.
           | 
           | > I was hoping the current AI hype wave would bring some
           | custom silicon for alternative float formats, Posits included
           | 
           | i was hoping for this too!
        
             | adgjlsfhk1 wrote:
             | I really want a standards body to standardize a positized
             | version of floating point numbers. Everything about the
             | encoding of posits (i.e. NaR, no -0.0, exponent encoding)
             | is so much nicer. The tapered precision part of posits is
             | IMO the least interesting part.
        
               | dnautics wrote:
                | the two's complement thing is really elegant:
                | -p == (posit)(-(int)p), in C notation
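                | 
                | In code, roughly (posit32_raw is a made-up name
                | for the raw 32-bit pattern; unsigned negation
                | sidesteps the INT_MIN corner case):
                | 
                |     #include <stdint.h>
                | 
                |     typedef uint32_t posit32_raw;
                | 
                |     /* negation = two's complement of the bits;
                |        0 and NaR (0x80000000) map to themselves */
                |     static posit32_raw posit32_neg(posit32_raw p) {
                |         return (posit32_raw)(0u - p);
                |     }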
        
               | adgjlsfhk1 wrote:
               | relatedly, having the same ordering as signed ints just
               | makes everything so much easier. You can just use your
               | integer radix sort to sort your floats without having to
               | deal with sign and nans first!
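                | 
                | As a sketch (qsort instead of radix, same idea;
                | posit32_raw is again just a name for the raw bit
                | pattern):
                | 
                |     #include <stdint.h>
                |     #include <stdlib.h>
                | 
                |     /* posit bits order like two's-complement ints,
                |        so a plain signed compare works; NaR
                |        (0x80000000) sorts first */
                |     static int cmp_posit32(const void *a,
                |                            const void *b) {
                |         int32_t x = (int32_t)*(const uint32_t *)a;
                |         int32_t y = (int32_t)*(const uint32_t *)b;
                |         return (x > y) - (x < y);
                |     }
                | 
                |     /* qsort(buf, n, sizeof(uint32_t), cmp_posit32); */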
        
               | immibis wrote:
               | So do floats if you don't care about NaN
        
         | gsf_emergency_2 wrote:
         | 2018 blog post on the FB effort:
         | https://engineering.fb.com/2018/11/08/ai-research/floating-p...
        
       | burnt-resistor wrote:
       | Condensed IEEE-like formats cheat sheet I threw together and
       | tested:
       | 
       | https://pastebin.com/aYwiVNcA
        
       | mserdarsanli wrote:
       | A while ago I built an interactive tool to display posits (Also
       | IEEE floats etc.): https://mserdarsanli.github.io/FloatInfo/
       | 
        | It is hard to understand at first, but after playing with it a
        | bit it will make sense. As with everything, there are trade-offs
        | compared to IEEE floats, but having more precision when numbers
        | are close to 1 is pretty nice.
        
       | im3w1l wrote:
       | So my pet theory for this space is that a pure exponent format
       | could be interesting.
       | 
        | Like, a number would be s * 2 ^ (e / C), where s is a sign bit,
        | e is a signed int, and C is some constant, maybe 2^20-ish for a
        | 32-bit format. Special cases would be carved out for +-0, +-inf
        | and NaN.
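        | 
        | A minimal sketch of that idea (the struct layout and C = 2^20
        | are just the placeholders from above; zero/inf/NaN handling
        | omitted):
        | 
        |     #include <math.h>
        |     #include <stdint.h>
        | 
        |     #define LNS_C 1048576.0   /* the constant C = 2^20 */
        | 
        |     typedef struct { int sign; int32_t e; } lns;
        | 
        |     /* value = (-1)^sign * 2^(e / C) */
        |     static double lns_decode(lns x) {
        |         return (x.sign ? -1.0 : 1.0) * exp2(x.e / LNS_C);
        |     }
        | 
        |     static lns lns_encode(double v) {   /* v != 0 assumed */
        |         lns r = { v < 0.0, 0 };
        |         r.e = (int32_t)llround(log2(fabs(v)) * LNS_C);
        |         return r;
        |     }
        | 
        |     /* multiplication is just integer addition of exponents */
        |     static lns lns_mul(lns a, lns b) {
        |         lns r = { a.sign ^ b.sign, a.e + b.e };
        |         return r;
        |     }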
        
         | adgjlsfhk1 wrote:
         | https://en.wikipedia.org/wiki/Logarithmic_number_system this
         | has been tried before. The hard part is getting addition to
         | work fast enough.
        
           | im3w1l wrote:
           | Would addition be slower than for normal floats?
        
             | adgjlsfhk1 wrote:
              | much. To add 2 LNS numbers you essentially need to compute
              | log2(1 + 2^d), where d is the difference of the two
              | exponents. There isn't a good algorithm for that (except
              | lookup tables at low precision).
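              | 
              | Concretely, a reference version (real hardware would
              | use a table or polynomial for the log2(1 + 2^d)
              | term):
              | 
              |     #include <math.h>
              | 
              |     /* x = 2^a, y = 2^b, with a >= b:
              |        log2(x + y) = a + log2(1 + 2^(b - a)) */
              |     static double lns_add_exp(double a, double b) {
              |         double hi = fmax(a, b), lo = fmin(a, b);
              |         return hi + log2(1.0 + exp2(lo - hi));
              |     }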
        
           | immibis wrote:
           | Silly idea: what if we ran ML models on pure logarithms and
           | used some completely terrible approximation for addition
           | (maybe just the max function, or max plus a lookup table
           | based on the difference between the two numbers when it's
           | small)
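            | 
            | A toy version of that (arbitrary table size; the
            | correction term log2(1 + 2^-d) is at most 1, so even
            | the bare max is off by less than one exponent step):
            | 
            |     #include <math.h>
            | 
            |     #define CORR_N 16
            |     /* corr[d] ~ log2(1 + 2^-d) */
            |     static double corr[CORR_N];
            | 
            |     static void corr_init(void) {
            |         for (int d = 0; d < CORR_N; d++)
            |             corr[d] = log2(1.0 + exp2(-(double)d));
            |     }
            | 
            |     /* exponents a, b of the two log-domain numbers */
            |     static double lns_add_approx(double a, double b) {
            |         int d = (int)fabs(a - b);
            |         return fmax(a, b) + (d < CORR_N ? corr[d] : 0.0);
            |     }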
        
       | phkahler wrote:
       | My enthusiasm for posits was never about the precision. It was
       | due to not having signed zero, and not having NaN. Just give me
       | the nearest representable value please. Operations on numbers
       | should never yield results that are not numbers.
        
         | thfuran wrote:
         | What should happen if you divide 1 by 0?
        
           | dnautics wrote:
           | inf
        
             | thfuran wrote:
             | That's not really a number either. But what about 0/0?
        
         | spc476 wrote:
         | IEEE does define x/0 to be infinity and -x/0 as -infinity. But
         | what about 0/0? Or sqrt(-1)?
        
       ___________________________________________________________________
       (page generated 2025-06-19 23:00 UTC)