[HN Gopher] Posit floating point numbers: thin triangles and oth...
___________________________________________________________________
Posit floating point numbers: thin triangles and other tricks
(2019)
Author : fanf2
Score : 43 points
Date : 2025-06-19 14:42 UTC (8 hours ago)
(HTM) web link (marc-b-reynolds.github.io)
(TXT) w3m dump (marc-b-reynolds.github.io)
| antiquark wrote:
| This seems to be related to the "type III unum":
|
| https://en.wikipedia.org/wiki/Unum_(number_format)#Posit_(Ty...
| andrepd wrote:
| Posit is the name of the third in a series of John Gustafson's
| proposals for an alternative to IEEE floats.
| andrepd wrote:
| Great dive! I'm very interested in posits (and IEEE float
| replacements in general) and had never read this post before.
| Tons of insightful points.
| adrian_b wrote:
| The example where computing an expression with posits has much
| better accuracy than when computing with IEEE FP32 is extremely
| misleading.
|
| Regardless of whether you use 32-bit posits or IEEE FP32, you can
| represent only the same count of numbers, i.e. of points on the
| real number axis.
|
| When choosing a representation format, you cannot change the
| number of representable points; you can only choose to distribute
| them in different places.
|
| The IEEE FP32 format distributes the points so that the relative
| rounding error is approximately constant over the entire range.
|
| Posits crowd the points into the segment close to zero, obtaining
| there a better rounding error, at the price that the segments
| distant from zero have very sparse points, i.e. very high
| rounding errors.
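|
| The IEEE side of this is easy to check directly; a rough numpy
| sketch (np.spacing gives the gap from x to the next representable
| float32):
|
|     import numpy as np
|
|     # spacing(x)/x is the local relative resolution,
|     # roughly 2^-23 for any normal float32
|     for x in [1e-30, 1e-10, 1.0, 1e10, 1e30]:
|         x32 = np.float32(x)
|         print(f"{x:9.0e}  rel spacing ~ {np.spacing(x32)/x32:.3e}")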
|
| Posits behave pretty much like a fixed-point format that has
| gradual overflow instead of a sharp cut-off. For big numbers you
| do not get an overflow exception that would stop the computation,
| but the accuracy of the results becomes very bad. For small
| numbers the accuracy is good, but not as good as for a fixed-
| point number, because some bit patterns must be reserved for
| representing the big numbers, in order to avoid overflow.
|
| The example that demonstrates better accuracy for posits is
| manufactured by choosing values in the range where posits have
| better accuracy. It is trivial to manufacture an almost identical
| example where posits have worse accuracy, by choosing values in
| an interval where FP32 has better accuracy.
|
| There are indeed problems where posits can outperform IEEE FP32,
| but it is quite difficult to predict which those problems are,
| because for a complex problem it can be very hard to predict the
| ranges of the intermediate results. This is the very reason why
| floating-point numbers are preferred over fixed-point numbers: to
| avoid the necessity of such analyses.
|
| While for IEEE formats it is possible to estimate the relative
| error of the result of a long computation, thanks to the
| guaranteed bound on the relative error of each operation, that is
| pretty much impossible for posits: there the relative error is a
| function of the values of the operands, so you cannot estimate it
| without actually doing the computation.
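|
| Concretely, the per-operation guarantee is fl(x op y) =
| (x op y)(1 + d) with |d| <= 2^-53 for doubles (2^-24 for FP32). A
| small sketch that checks this for multiplication, using exact
| rational arithmetic as the reference:
|
|     import random
|     from fractions import Fraction
|
|     u = 2.0 ** -53            # unit roundoff for IEEE doubles
|     worst = 0.0
|     for _ in range(10000):
|         a = random.uniform(1e-8, 1e8)
|         b = random.uniform(1e-8, 1e8)
|         exact = Fraction(a) * Fraction(b)   # exact product
|         err = abs(Fraction(a * b) - exact) / exact
|         worst = max(worst, float(err))
|     print(worst, "<=", u)     # observed error stays below u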
|
| For scientific and technical computations, posits are pretty much
| useless: such computations have very wide value ranges for their
| data, they need error estimation, and posits can have significant
| advantages only for small number formats of 32 bits or less,
| while those computations mostly need 64-bit numbers or even
| bigger.
|
| Nevertheless, for special problems that are very well
| characterized, i.e. where you know with certainty narrow ranges
| for the values of the input data and of the intermediate results,
| posits could deliver much more accuracy than IEEE FP32, though
| they would have good performance only if they were implemented in
| hardware.
| wat10000 wrote:
| Isn't that pretty much the entire point of this article?
| andrepd wrote:
| > The example where computing an expression with posits has
| much better accuracy than when computing with IEEE FP32 is
| extremely misleading.
|
| Did you not rtfa or am I missing something?
| dnautics wrote:
| Practically speaking, we did put this to the test by having
| someone run a fluid dynamics simulation on fp64 vs posit64 vs
| posit32 with no algorithm changes (using fp128 as a "ground
| truth").
|
| Unsurprisingly, the posit64 results were closer to fp128 than the
| fp64 ones.
| dnautics wrote:
| One of the creators of posits here (I came up with the name and I
| think ES is my idea, did the first full soft versions in Julia,
| and designed the first circuits, including a cool optimization
| for addition). My personal stance is that posits are not great
| for scientific work, precisely because of the difficulties with
| actually solving error propagation. Hopefully I can give a bit
| more measured insight into why the "parlor tricks" appear in the
| posits context.
|
| John's background is in scientific compute/HPC, and he previously
| advocated for unums (which do fully track errors). There is also
| a version of posits (called valids) which does track errors,
| encouraging the user to combine it with other techniques to cut
| the error bounds using invariants, but that requires an
| algorithmic shift. Alas, a lot of examples were lifted from the
| unums book and sort of square-peg/round-holed into posits. You
| can see an example of algorithmic shift in the demo of matrix
| multiplication in the Stanford talk (that demo is me; linked in
| the OP).
|
| As for me, I was much more interested in lower-bit
| representations for ML applications, where you ~don't care about
| error propagation. This also appears in the talk.
|
| As it wound up, Facebook took some interest in it for AI, but
| they NIH'd it and redid the mantissa as logarithmic (which I
| think was a mistake).
|
| And anyway, redoing your silicon turns out to be a pain in the
| ass (quires only make sense in a burn-the-existing-world
| perspective, and are not so bad for training pipelines, where
| IIRC the Kronecker product dominates). The addition operation
| takes up quite a bit more floorspace, and just quantizing to int4
| with grouped scaling factors is easier with existing GPU
| pipelines, and even with custom hardware.
|
| Fun side fact: Positron.ai was so named on the off chance that
| using posits makes sense (you can see the through line to science
| fiction that I was attempting to manifest when I came up with the
| name).
| dnautics wrote:
| Turns out only the slides are linked in the OP. Here is the live
| recording:
|
| https://youtu.be/aP0Y1uAA-2Y?feature=shared
| andrepd wrote:
| > and designed the first circuits, including a cool
| optimization for addition
|
| Curious, what trick? :)
|
| Wishing for mainstream CPU support for anything but IEEE numbers
| was always a pipe dream on anything but a decades-long timescale,
| but I gotta be honest, I was hoping the current AI hype wave
| would bring some custom silicon for alternative float formats,
| Posits included.
|
| > the addition operation takes up quite a bit more floorspace,
| and just quantizing to int4 with grouped scaling factors is
| easier with existing GPU pipelines
|
| Can you elaborate on this part?
| dnautics wrote:
| > trick
|
| Normally FP numbers have an "invisible leading one bit", so an
| IEEE FP number is (1 + 0.nnnn) x 2^E, where nnnn is the mantissa
| and E is the exponent.
|
| As posits are always two's complement, you can model negative
| posits as having an "invisible leading two bit", aka
| (2 - 0.nnnn) x 2^(E-1). This greatly simplifies the circuitry for
| addition and multiplication.
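|
| If it helps, the whole decode fits in a few lines. A rough Python
| sketch of a Type III posit<n,es> decoder (es=1 just as an
| example); it takes the simple route of negating the word as an
| integer first, rather than the leading-two-bit trick above:
|
|     def decode_posit(p, n=16, es=1):
|         # sketch, not a reference implementation
|         mask = (1 << n) - 1
|         p &= mask
|         if p == 0:
|             return 0.0
|         if p == 1 << (n - 1):
|             return float("nan")          # NaR
|         sign = -1.0 if p >> (n - 1) else 1.0
|         if sign < 0:
|             p = (-p) & mask              # two's complement negate
|         bits = format(p, f"0{n}b")[1:]   # drop the cleared sign bit
|         r0 = bits[0]                     # regime: run of equal bits
|         run = len(bits) - len(bits.lstrip(r0))
|         k = run - 1 if r0 == "1" else -run
|         rest = bits[run + 1:]            # skip the regime terminator
|         e = int(rest[:es].ljust(es, "0"), 2) if es else 0
|         frac = rest[es:]
|         f = int(frac, 2) / (1 << len(frac)) if frac else 0.0
|         return sign * (1.0 + f) * 2.0 ** (k * (1 << es) + e)
|
|     # negating the integer encoding negates the posit value
|     for enc in (0x4800, 0x7021, 0x0013):
|         neg = (-enc) & 0xFFFF
|         print(decode_posit(enc), decode_posit(neg))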
|
| > the addition operation takes up quite a bit more
| floorspace,
|
| I miswrote; it's multiplication. This one is simple: a multiplier
| needs floorspace on the order of n^2, where n is the maximum
| resolution of the mantissa, and posits have more mantissa
| resolution around unity, so the multiplier has to be sized for
| that case.
|
| We did floorspace estimates for FP32 versus P32, and it looked
| like the extra space required for the mantissa did not overtake
| the savings from the crazy logic for sign flipping, NaNs, and
| denormals that IEEE requires and posits don't.
|
| > I was hoping the current AI hype wave would bring some
| custom silicon for alternative float formats, Posits included
|
| I was hoping for this too!
| adgjlsfhk1 wrote:
| I really want a standards body to standardize a positized
| version of floating point numbers. Everything about the
| encoding of posits (e.g. NaR, no -0.0, exponent encoding)
| is so much nicer. The tapered precision part of posits is
| IMO the least interesting part.
| dnautics wrote:
| The two's complement thing is really elegant: -p ==
| (posit)(-(int)p), in C notation.
| adgjlsfhk1 wrote:
| Relatedly, having the same ordering as signed ints just makes
| everything so much easier. You can just use your integer radix
| sort to sort your floats without having to deal with signs and
| NaNs first!
| immibis wrote:
| So do floats if you don't care about NaN
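|
| With the usual key transform for the sign-magnitude negatives,
| anyway; a quick sketch of the step floats need and posits get to
| skip:
|
|     import struct
|
|     def f32_sort_key(x):
|         # map float32 (NaN excluded) to an unsigned key whose
|         # integer order matches numeric order; posit encodings
|         # already sort correctly as plain two's complement ints
|         (b,) = struct.unpack("<I", struct.pack("<f", x))
|         return b | 0x80000000 if b < 0x80000000 else b ^ 0xFFFFFFFF
|
|     xs = [3.5, -0.0, 0.0, -7.25, 1e-40, -1e30, 2.0]
|     print(sorted(xs) == sorted(xs, key=f32_sort_key))   # True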
| gsf_emergency_2 wrote:
| 2018 blog post on the FB effort:
| https://engineering.fb.com/2018/11/08/ai-research/floating-p...
| burnt-resistor wrote:
| Condensed IEEE-like formats cheat sheet I threw together and
| tested:
|
| https://pastebin.com/aYwiVNcA
| mserdarsanli wrote:
| A while ago I built an interactive tool to display posits (also
| IEEE floats, etc.): https://mserdarsanli.github.io/FloatInfo/
|
| It is hard to understand at first, but after playing with it a
| bit it will make sense. As with everything, there are trade-offs
| compared to IEEE floats, but having more precision when numbers
| are close to 1 is pretty nice.
| im3w1l wrote:
| So my pet theory for this space is that a pure exponent format
| could be interesting.
|
| Like a number would be s * 2^(e / C), where s is a sign bit, e is
| a signed int, and C is some constant, maybe around 2^20 for a
| 32-bit format. Special cases would be carved out for +-0, +-inf,
| and NaN.
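|
| A tiny sketch of that idea (C = 2^20, special cases left out);
| the nice part is that multiplication becomes integer addition of
| the e fields:
|
|     import math
|
|     C = 1 << 20                  # scaling constant from above
|
|     def enc(x):
|         # (sign, e) with x ~ sign * 2**(e / C)
|         return (1 if x > 0 else -1), round(C * math.log2(abs(x)))
|
|     def dec(s, e):
|         return s * 2.0 ** (e / C)
|
|     a, b = enc(3.7), enc(0.01625)
|     prod = (a[0] * b[0], a[1] + b[1])   # multiply = add exponents
|     print(dec(*prod), 3.7 * 0.01625)    # ~0.060125 both ways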
| adgjlsfhk1 wrote:
| https://en.wikipedia.org/wiki/Logarithmic_number_system this
| has been tried before. The hard part is getting addition to
| work fast enough.
| im3w1l wrote:
| Would addition be slower than for normal floats?
| adgjlsfhk1 wrote:
| Much. To add two LNS numbers you essentially need to compute
| ln(1 + b), where b comes from the difference of the two
| exponents. There isn't a good algorithm to do so (except lookup
| tables for low precision).
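|
| Concretely, for two positive values 2^(p/C) and 2^(q/C), the
| sum's exponent is max(p, q) plus a correction C*log2(1 + 2^d)
| with d = -|p - q|/C, and that correction is the expensive part; a
| small sketch with libm standing in for the table or polynomial
| hardware would need:
|
|     import math
|
|     C = 1 << 20
|
|     def lns_add(p, q):
|         # exponents of two positive LNS values 2**(p/C), 2**(q/C)
|         p, q = max(p, q), min(p, q)
|         d = (q - p) / C                  # d <= 0
|         return p + round(C * math.log2(1.0 + 2.0 ** d))
|
|     enc = lambda x: round(C * math.log2(x))
|     dec = lambda e: 2.0 ** (e / C)
|     print(dec(lns_add(enc(3.0), enc(5.0))), 3.0 + 5.0)   # ~8.0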
| immibis wrote:
| Silly idea: what if we ran ML models on pure logarithms and
| used some completely terrible approximation for addition
| (maybe just the max function, or max plus a lookup table
| based on the difference between the two numbers when it's
| small)
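|
| A quick look at how rough that is, working directly in the log2
| domain; the 8-entry table here is just a made-up example,
| log2(1 + 2^-d) sampled at integer differences d:
|
|     import math
|
|     def exact(a, b):                 # log2(2**a + 2**b)
|         return max(a, b) + math.log2(1.0 + 2.0 ** -abs(a - b))
|
|     TABLE = [math.log2(1.0 + 2.0 ** -d) for d in range(8)]
|
|     def max_table(a, b):             # max plus coarse correction
|         return max(a, b) + TABLE[min(int(abs(a - b)), 7)]
|
|     for x, y in [(1.0, 1.0), (3.0, 2.0), (10.0, 0.5), (7.0, 6.9)]:
|         a, b = math.log2(x), math.log2(y)
|         print(x + y,                 # true sum
|               2 ** exact(a, b),      # exact log-domain add
|               2 ** max(a, b),        # max-only approximation
|               2 ** max_table(a, b))  # max + table approximation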
| phkahler wrote:
| My enthusiasm for posits was never about the precision. It was
| due to not having signed zero, and not having NaN. Just give me
| the nearest representable value please. Operations on numbers
| should never yield results that are not numbers.
| thfuran wrote:
| What should happen if you divide 1 by 0?
| dnautics wrote:
| inf
| thfuran wrote:
| That's not really a number either. But what about 0/0?
| spc476 wrote:
| IEEE does define x/0 to be infinity and -x/0 to be -infinity (for
| positive x). But what about 0/0? Or sqrt(-1)?
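|
| For reference, what IEEE arithmetic actually returns in those
| cases (via numpy, since plain Python raises on float division by
| zero):
|
|     import numpy as np
|
|     with np.errstate(divide="ignore", invalid="ignore"):
|         one, zero = np.float32(1), np.float32(0)
|         print(one / zero)      # inf
|         print(-one / zero)     # -inf
|         print(zero / zero)     # nan
|         print(np.sqrt(-one))   # nan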
___________________________________________________________________
(page generated 2025-06-19 23:00 UTC)