[HN Gopher] Addition Is All You Need for Energy-Efficient Langua...
___________________________________________________________________
Addition Is All You Need for Energy-Efficient Language Models
Author : InvisibleUp
Score : 282 points
Date : 2024-10-09 04:47 UTC (18 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| md_rumpf wrote:
| The return of the CPU?!
| anticensor wrote:
| The reign of Threadripper!
| visarga wrote:
| > can potentially reduce 95% energy cost by elementwise floating
| point tensor multiplications and 80% energy cost of dot products
|
| If this were about convolutional nets, then optimizing compute
| would be a much bigger deal. Transformers are lightweight on
| compute and heavy on memory. The weakest link in the chain is
| fetching the model weights into the cores. The 95% and 80% energy
| reductions cited are for the multiplication operations in
| isolation, not for the entire inference process.
| kendalf89 wrote:
| Maybe this technique can be used for training then since that
| is a lot more compute intensive?
| lifthrasiir wrote:
| I'm also sure that fp8 is small enough that multiplication can
| really be done in a much simpler circuit than larger fp
| formats. Even smaller formats like fp4 would be able to just
| use a lookup table, and that makes them more like sort-of-
| standardized quantization schemes.
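|
| For a sense of scale, here's a rough Python sketch of that fp4
| LUT idea, assuming the common E2M1 value set
| (+/-{0, 0.5, 1, 1.5, 2, 3, 4, 6}); a real design would bake the
| table into a small per-ALU ROM rather than software:
|
|     def fp4_decode(code: int) -> float:
|         # E2M1: 1 sign bit, 2 exponent bits (bias 1), 1 mantissa bit.
|         sign = -1.0 if code & 0b1000 else 1.0
|         exp = (code >> 1) & 0b11
|         man = code & 0b1
|         if exp == 0:               # subnormals: 0 and 0.5
|             return sign * 0.5 * man
|         return sign * (1.0 + 0.5 * man) * 2.0 ** (exp - 1)
|
|     # 16 x 16 = 256 entries cover every possible fp4 x fp4 product.
|     PRODUCT_LUT = [[fp4_decode(a) * fp4_decode(b) for b in range(16)]
|                    for a in range(16)]
|
|     def fp4_mul(a_code: int, b_code: int) -> float:
|         return PRODUCT_LUT[a_code][b_code]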
| tankenmate wrote:
| I suspect that you could do fp8 with log tables and interpolation
| if you really wanted to (compared to the memory required for the
| model, it's peanuts); it just turns into a LUT (log table lookup)
| and a bit shift (interpolation). So again, memory bandwidth is
| the limiting factor for transformers (as far as energy is
| concerned).
| lifthrasiir wrote:
| This time, though, the LUT exists in a circuit, which is much
| more efficient than a typical memory lookup. Such a LUT would
| have to exist per ALU though, so it can't be too large.
| brilee wrote:
| fp4/fp8 for neural networks don't work the way you think they
| do: they are merely compression formats. A set of, say, 256
| fp32 weights from one neuron is lossily turned into one max
| value (stored in fp32 precision) and 256 fp4/fp8 numbers.
| Those compressed numbers are multiplied by the fp32 scale at
| runtime to restore the original weights, and full fp32
| multiplications + additions are executed.
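|
| Concretely, a minimal sketch of that "compression format" view,
| using a generic absmax scheme with int8 codes for illustration
| (not any particular library's exact format):
|
|     import numpy as np
|
|     def compress_block(w_fp32):
|         scale = np.abs(w_fp32).max() / 127.0     # one fp32 per block
|         codes = np.round(w_fp32 / scale).astype(np.int8)
|         return scale, codes
|
|     def decompress_block(scale, codes):
|         # Expand back to fp32 before the usual fp32 multiply-adds.
|         return codes.astype(np.float32) * scale
|
|     w = np.random.randn(256).astype(np.float32)
|     scale, codes = compress_block(w)
|     w_hat = decompress_block(scale, codes)
|     print(np.abs(w - w_hat).max())               # small quantization error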
| lifthrasiir wrote:
| You are correct that the accumulation (i.e. additions in
| dot products) has to be done in a higher precision, however
| the multiplication can still be done via LUT. (Source: I
| currently work at a hardware-accelerated ML hardware
| startup.)
| imjonse wrote:
| With w8a8 quantization the hw (>= hopper) can do the heavy
| math in fp8 twice as fast as fp16.
| SuchAnonMuchWow wrote:
| The goal of this type of quantization is to move the
| multiplication by the fp32 rescale factor outside of the
| dot-product accumulation.
|
| So the multiplications+additions are done in
| fp8/int8/int4/whatever (when the hardware supports those
| operators, of course) and accumulated in fp32 or similar, and
| only the final accumulator is multiplied by the rescale factor
| in fp32.
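|
| For illustration, a minimal sketch of that flow using int8
| (chosen because NumPy has no fp8 type): the dot product runs on
| the quantized codes with a wide accumulator, and the scale
| factors are applied once at the end:
|
|     import numpy as np
|
|     def quantize_absmax(x):
|         scale = np.abs(x).max() / 127.0
|         return np.round(x / scale).astype(np.int8), np.float32(scale)
|
|     def quantized_dot(a, b):
|         qa, sa = quantize_absmax(a)
|         qb, sb = quantize_absmax(b)
|         acc = np.dot(qa.astype(np.int32), qb.astype(np.int32))
|         return acc * (sa * sb)        # single rescale after accumulation
|
|     x = np.random.randn(1024).astype(np.float32)
|     w = np.random.randn(1024).astype(np.float32)
|     print(np.dot(x, w), quantized_dot(x, w))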
| rajnathani wrote:
| That's how Nvidia's mixed precision training worked with
| FP32-FP16, but it isn't the case for Bfloat16 on TPUs and
| maybe (I'm not sure) FP8 training on Nvidia Hopper GPUs.
| bee_rider wrote:
| What is fp4? 3 bits of exponent and one of mantissa?
| wruza wrote:
| SEEM (sign, exp, mantissa)
| bee_rider wrote:
| Interesting... I guess it must be biased, m*2^ee would
| leave like half of the limited space wasted, so 1.m*2^ee?
|
| I always wonder with these tiny formats if 0 should even
| be represented...
| wruza wrote:
| I'm not a binary guy that much, but iirc all floats are
| 1.m*2^e -- "1." is always there except for subnormals.
| There's also SEEE FP4 which is basically +-2^([u?]int3).
|
| https://medium.com/@harrietfiagbor/floating-points-and-
| deep-...
| SuchAnonMuchWow wrote:
| It's worse than that: the claimed energy gains come from
| comparing against computations done in fp32, but for fp8 the
| multipliers are really tiny and the adders/shifters represent
| the largest part of the operators (energy-wise and area-wise),
| so this paper's technique will only yield small gains.
|
| For fp8, the estimated gate count is 296 for an fp8 multiplier
| vs. 157 with their technique, so the power gain on the
| multipliers will be much lower (50% would be a more reasonable
| estimate), and again for fp8 the additions in the dot products
| are a large part of the operations.
|
| Overall, it's really disingenuous to claim an 80% power gain
| and a small drop in accuracy, when the power gain is only for
| fp32 operations and the small drop in accuracy is only for fp8
| operators. They don't analyze the accuracy drop in fp32, and
| don't present the power saved for fp8 dot products.
| bobsyourbuncle wrote:
| I'm new to neural nets, when should one use fp8 vs fp16 vs
| fp32?
| reissbaker wrote:
| Basically no one uses FP32 at inference time. BF16/FP16 is
| typically considered unquantized, whereas FP8 is lightly
| quantized. That being said there's pretty minimal quality
| loss at FP8 compared to 16-bit typically; Llama 3.1 405b,
| for example, only benchmarks around ~1% worse when run at
| FP8: https://blog.vllm.ai/2024/07/23/llama31.html
|
| Every major inference provider other than Hyperbolic Labs
| runs Llama 3.1 405b at FP8, FWIW (e.g. Together, Fireworks,
| Lepton), so to compare against FP32 is misleading to say
| the least. Even Hyperbolic runs it at BF16.
|
| Pretraining is typically done in FP32, although some labs
| (e.g. Character AI, RIP) apparently train in INT8:
| https://research.character.ai/optimizing-inference/
| ericlewis wrote:
| The higher the precision, the better. Use what works within
| your memory constraints.
| jasonjmcghee wrote:
| With serious diminishing returns. At inference time there's no
| reason to use fp64, and you should probably use fp8 or less.
| The accuracy loss is far less than you'd expect. AFAIK Llama
| 3.2 3B at fp4 will outperform Llama 3.2 1B at fp32 in accuracy
| and speed, despite using 8x fewer bits per weight.
| imjonse wrote:
| That is true for single user/light inference only. For training
| and batch inference you can get compute bound fast enough.
| saagarjha wrote:
| That really depends on what you're doing. Trying to keep a
| tensor core fed is pretty hard; they're really fast.
| woadwarrior01 wrote:
| Pre-fill (even in the single batch case) and multi-batch
| decoding are still compute dominated. The oft repeated trope of
| "decoder only transformer inference is bottle-necked on memory
| bandwidth" is only strictly true in the single batch decoding
| case, because you're mostly doing vector matrix mults when the
| batch size is one.
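|
| A back-of-the-envelope sketch of why that is, counting FLOPs
| per byte of weights moved for a single hypothetical 8192x8192
| fp16 layer (illustrative numbers, not measurements):
|
|     d_in, d_out = 8192, 8192         # assumed layer size
|     bytes_per_weight = 2             # fp16
|
|     def flops_per_weight_byte(tokens: int) -> float:
|         flops = 2 * tokens * d_in * d_out          # multiply-adds
|         weight_bytes = d_in * d_out * bytes_per_weight
|         return flops / weight_bytes
|
|     print(flops_per_weight_byte(1))     # 1.0: batch-1 decode, memory bound
|     print(flops_per_weight_byte(2048))  # 2048.0: prefill, compute bound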
| ein0p wrote:
| Not even single batch. If you want reasonable latency per
| token (TPOT), even larger batches do not give you high compute
| utilization during extend. It's only when you don't care about
| TPOT at all, and your model is small enough to leave space for
| a large batch on an 8-GPU host, that you can get decent
| utilization. That's extend only; it's easy to get high
| utilization in prefill.
| mikewarot wrote:
| Imagine if you had a systolic array large enough that all the
| weights would only have to be loaded once at startup.
| Eliminating the memory-compute bottleneck of the von Neumann
| architecture could make this quite a bit more efficient.
| api wrote:
| Sounds like the awesome architecture for transformers would be
| colocation of memory and compute.
| Joker_vD wrote:
| Yes, that's why we generally run them on GPUs.
| moffkalast wrote:
| GPUs that pull a kilowatt when running, yes. This might
| actually work on an FPGA if the addition doesn't take too many
| clock cycles, compared to the matmuls, which were too slow.
| phkahler wrote:
| That's why we need a row of ALUs in RAM chips. Read a row
| of DRAM and use it in a vector operation. With the speed of
| row reading, the ALU could take many cycles per operation
| to limit area.
| api wrote:
| GPUs are better but I'm thinking of even tighter coupling,
| like an integrated architecture.
| cpldcpu wrote:
| It puzzles me that there does not seem to be a proper
| derivation and discussion of the error term in the paper. It's
| all treated indirectly by way of inference results.
| Lerc wrote:
| The paper has an odd feel about it to me too. Doing a gate
| estimation as a text explanation without a diagram makes it too
| easy to miss some required part. It wouldn't need to be a full
| gate-level explanation; blocks labeled 'adder' would have been
| enough.
|
| Seeing the name de Vries in the first paragraph didn't help my
| sense of confidence either.
| brcmthrowaway wrote:
| Because of the twisted mentat?
| Lerc wrote:
| No, more because of things like
|
| http://blog.zorinaq.com/bitcoin-electricity-consumption/
|
| It's a long read going over multiple years' worth of posts and
| comments, but it gives you a measure of the man.
| CGamesPlay wrote:
| I believe this reduces the compute required, but it still uses
| 8 bits per value, so it does not reduce the memory required to
| run inference, and it doesn't particularly make the models more
| accessible for inference. Is this method suitable for training?
| That could potentially be an interesting application.
| scotty79 wrote:
| All You Need is Considered Harmful.
| TaurenHunter wrote:
| We will need a paper titled '"Considered Harmful" Articles is
| All You Need' to complete that cycle.
| js8 wrote:
| Haven't read it, but isn't this just logarithmic tables in some
| form?
|
| I am asking not to dismiss it; I genuinely feel I don't
| understand logarithms on a fundamental level (of logic gates
| etc.). If multiplication can be replaced with table lookup and
| addition, then there has to be a circuit that gives you difficult
| addition and easy multiplication, or any combination of those
| tradeoffs.
| pclmulqdq wrote:
| Yes, this is logarithmic number systems at work.
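|
| For the curious, the classic trick is easy to demo in a few
| lines of Python: for positive normal floats, the IEEE 754 bit
| pattern is roughly 2^23 * (log2(x) + 127), so adding bit
| patterns approximately multiplies the values (a Mitchell-style
| approximation, worst-case error about 11%; the paper's L-Mul is
| a refinement of this idea, so treat the sketch as the general
| principle only):
|
|     import struct
|
|     def f2u(x: float) -> int:
|         return struct.unpack("<I", struct.pack("<f", x))[0]
|
|     def u2f(u: int) -> float:
|         return struct.unpack("<f", struct.pack("<I", u & 0xFFFFFFFF))[0]
|
|     BIAS = 127 << 23   # exponent bias, shifted into place
|
|     def approx_mul(a: float, b: float) -> float:
|         # Integer addition of bit patterns ~= addition of logarithms.
|         return u2f(f2u(a) + f2u(b) - BIAS)
|
|     for a, b in [(1.25, 3.0), (0.1, 7.5), (2.5, 2.5)]:
|         print(a * b, approx_mul(a, b))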
| ranguna wrote:
| I've seen this claim a few times over the last couple of years,
| and I have a pet theory why this isn't explored a lot:
|
| Nvidia funds most research around LLMs, and they also fund
| other companies that fund other research. If transformers were
| to use addition and remove all usage of floating point
| multiplication, there's a good chance the GPU would no longer
| be needed, or at the least, cheaper ones would be good enough.
| If that were to happen, no one would need Nvidia anymore and
| their trillion dollar empire would start to crumble.
|
| University labs get free GPUs from Nvidia -> University labs
| don't want to do research that would make said GPUs obsolete
| because Nvidia won't like that.
|
| If this were true, it would mean that we are stuck on an
| inefficient research path due to corporate greed. Imagine if
| this really was the next best thing, and we just don't explore
| it more because the ruling corporation doesn't want to lose
| their market cap.
|
| Hopefully I'm wrong.
| yieldcrv wrote:
| Alternatively, other people fund LLM research
| chpatrick wrote:
| It's still a massively parallel problem suited to GPUs;
| whether it's float or int, or addition or multiplication,
| doesn't really matter.
| teaearlgraycold wrote:
| NVidia GPUs support integer operations specifically for use
| with deep learning models.
| londons_explore wrote:
| If an addition-only LLM performed better, nvidia would probably
| still be the market leader.
|
| Next gen nvidia chips would have more adders and fewer
| multipliers.
| cpldcpu wrote:
| I have to disagree. Nvidia spent a lot of effort on researching
| improved numerical representations. You can see a summary in
| this talk:
|
| https://www.youtube.com/watch?v=gofI47kfD28
|
| A lot of their work was published but went unnoticed. In fact,
| the majority of the performance increase in their new
| architectures results from this work.
|
| Reading between the lines, it seems that they came to the
| conclusion that a 4-bit representation with a group exponent
| ("FP4") is the most efficient representation of weights for
| inference. Reducing the number of bits in weights has the
| biggest impact on LLM inference, since it is mostly memory
| bound. At these low bit widths, the impact of using
| multiplication or other approaches is not really significant
| anymore.
|
| (Multiplying a 4-bit weight by a larger activation is
| effectively 4 additions, barely more than what the paper
| proposes.)
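|
| That is, something like this shift-and-add sketch: at most one
| conditional add per bit of the 4-bit weight.
|
|     def mul_by_4bit_weight(activation: int, weight4: int) -> int:
|         assert 0 <= weight4 < 16
|         acc = 0
|         for bit in range(4):              # one conditional add per bit
|             if weight4 & (1 << bit):
|                 acc += activation << bit
|         return acc
|
|     print(mul_by_4bit_weight(1234, 0b1011), 1234 * 0b1011)  # both 13574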
| nayroclade wrote:
| "Good enough" for what? We're in the middle of an AI arms race.
| Why do you believe people would choose to run the same LLMs on
| cheaper equipment instead of using the greater efficiency to
| train and run even larger LLMs?
|
| Given LLM performance seems to scale with their size, this
| would result in more powerful models, which would grow the
| applicability, use and importance of AI, which would in turn
| grow the use and importance of Nvidia's hardware.
|
| So this theory doesn't really stack up for me.
| raincole wrote:
| > I have a pet theory
|
| You mean you have a conspiracy theory.
|
| Why wouldn't other companies that buy Nvidia GPUs fund this
| research? It would greatly cut their costs.
| yunohn wrote:
| Google & Apple already run custom chips, Meta and MS are
| deploying their own soon too. Your theory is that none of them
| have researched non-matrix-multiplication solutions before
| investing billions?
| miohtama wrote:
| There are several patents on this topic, so they have.
| twoodfin wrote:
| I'd estimate that the fraction of Nvidia's dominance that
| depends on distinctive advantages in kernel primitives (add vs.
| multiply) would be a rounding error in FP8.
|
| The CUDA tooling and ecosystem, VLSI architecture,
| organizational prowess... all matter orders of magnitude more.
| iamgopal wrote:
| No matter how fast CPUs, networks, and browsers have become,
| websites are still slow. We will run out of data to train on
| much earlier than people will stop inventing even larger
| models.
| WrongAssumption wrote:
| So let me get this straight. Universities don't want to show
| that Nvidia GPUs are obsolete, so they can keep receiving a
| steady stream of obsolete GPUs? For what possible reason? That
| doesn't make sense.
| concrete_head wrote:
| Just to add an alternative addition-based architecture into
| the mix.
|
| https://www.youtube.com/watch?v=VqXwmVpCyL0
| pjc50 wrote:
| "We recommend training and hosting L-Mul-based models on devices
| integrated with specialized architectural designs. Patent
| pending"
|
| (from footnote in method section)
| cpldcpu wrote:
| Bill Dally from Nvidia introduced a log representation that
| basically allows replacing a multiplication with an add,
| without loss of accuracy (in contrast to the proposal above).
|
| https://youtu.be/gofI47kfD28?t=2248
| Buttons840 wrote:
| Would using this neural network based on integer addition be
| faster? The paper does not claim it would be faster, so I'm
| assuming not?
|
| What about over time? If this L-Mul (the matrix operation based
| on integer addition) operation proved to be much more energy
| efficient and became popular, would new hardware be created that
| was faster?
| tantalor wrote:
| [2023] GradIEEEnt half decent: The hidden power of imprecise
| lines
|
| http://tom7.org/grad/murphy2023grad.pdf
|
| Also in video form: https://www.youtube.com/watch?v=Ae9EKCyI1xU
| indrora wrote:
| I had hoped that they would reference this in their paper as
| some kind of "supporting previous exploration" but no, alas.
| A4ET8a8uTh0 wrote:
| Uhh.. I hate to be the one to ask this question, but shouldn't
| we be focused on making LLMs work well first, and only then on
| the desired optimizations? To use everyone's car analogy, it is
| like making sure early cars used a smaller amount of coal. It
| is a fool's errand.
| Maken wrote:
| The optimizations described could easily work on other models,
| not just transformers. Following your analogy, this is
| optimizing the plumbing, pistons, and valves on steam engines;
| it could be useful for whatever follows.
| lukev wrote:
| Also, making neural networks faster/cheaper is a big part of
| how they advance.
|
| We've known about neural architectures since the 70s, but we
| couldn't build them big enough to be actually useful until the
| advent of the GPU.
|
| Similarly, the LLM breakthrough was because someone decided it
| was worth spending millions of dollars to train one. Efficiency
| improvements lower that barrier for all future development (or
| alternatively, allow us to build even bigger models for the
| same cost.)
| itishappy wrote:
| Coal (and even wood!) powered cars actually existed long before
| Ford, but didn't take off because they were too heavy and
| unwieldy. The Model T was the result of a century of
| optimization.
|
| https://en.wikipedia.org/wiki/Nicolas-Joseph_Cugnot
| spencerchubb wrote:
| Cheaper compute is basically a prerequisite to making better
| models. You can get some improvements on the margins by making
| algorithms better with current hardware, but not an order of
| magnitude improvement.
|
| When there is an order of magnitude improvement in hardware,
| the AI labs will figure out an algorithm to best take advantage
| of it.
| shrubble wrote:
| I remember that many years ago, when floating point computation
| was expensive for Intel CPUs to do, there were multiple ways that
| programmers used integer trickery to work around this.
|
| Chuck Moore of Forth fame demonstrated taking values, say 1.6
| multiplied by 4.1, doing all the intermediate calculations with
| integers (16 * 41), and then formatting the output by putting
| the decimal point back in the "right place"; this worked as
| long as the values, once multiplied by 10, stayed in a range
| where the products didn't exceed 65536 (16-bit integers), for
| instance. For embedded chips where you have, say, an analog
| reading with 10 bits of precision to compute with quickly,
| multiple times per second, this worked well.
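|
| A minimal Python sketch of that scaled-integer trick (one
| implied decimal place; the helper names are just for
| illustration):
|
|     SCALE = 10                        # one decimal place
|
|     def to_fixed(s: str) -> int:      # "1.6" -> 16
|         return int(s.replace(".", ""))    # assumes one decimal digit
|
|     def fixed_mul(a: int, b: int) -> int:
|         return (a * b) // SCALE       # 16 * 41 = 656 -> 65
|
|     def fmt(x: int) -> str:           # put the decimal point back
|         return f"{x // SCALE}.{x % SCALE}"
|
|     print(fmt(fixed_mul(to_fixed("1.6"), to_fixed("4.1"))))   # 6.5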
|
| I also recall talking many years ago with a Microsoft engineer
| who had worked with the Microsoft Streets and Trips program
| (https://archive.org/details/3135521376_qq_CD1 for a screenshot)
| and that they too had managed to fit what would normally be
| floating point numbers and the needed calculations into some kind
| of packed integer format with only the precision that was
| actually needed, that was faster on the CPUs of the day as well
| as more easily compressed to fit on the CDROM.
| dwattttt wrote:
| That particular trick is known as fixed point arithmetic (not
| to be confused with a fixed point of a function)
| candiddevmike wrote:
| AFAIK this is still the best way to handle money/financial
| numbers.
| amanda99 wrote:
| That's got nothing to do with perf tho.
| Maxatar wrote:
| Nothing to do with perf is a strong claim. If you genuinely
| don't care about performance you can use an arbitrary-
| precision rational number representation.
|
| But performance often matters, so you trade off precision
| for performance. I think people are wrong to dismiss
| floating point numbers in favor of fixed point arithmetic,
| and I've seen plenty of fixed point arithmetic that has
| failed spectacularly because people think if you use it, it
| magically solves all your problems...
|
| Whatever approach you take other than going all in with
| arbitrary precision fractions, you will need to have a good
| fundamental understanding of your representation and its
| trade-offs. For me personally I use floating point binary
| and adjust the decimal point so I can exactly represent any
| value to 6 decimal places. It's a good trade-off between
| performance, flexibility, and precision.
|
| It's also what the main Bitcoin implementation uses.
| fluoridation wrote:
| Huh? Bitcoin uses integers. The maximum supply of BTC in
| satoshis fits in 64 bits. JS implementations that need to
| handle BTC amounts use doubles, but only by necessity,
| since JS doesn't have an integer type. They still use the
| units to represent satoshis, which works because the
| maximum supply also fits in 53 bits, so effectively
| they're also using integers.
|
| Anyone who uses binary floating point operations on
| monetary values doesn't know what they're doing and is
| asking for trouble.
| wbl wrote:
| So if I want to price a barrier on a Bermudan rainbow via
| Monte Carlo, I should take the speed hit to avoid a few oddball
| double-rounding problems that amount to pennies?
| fluoridation wrote:
| I mean, you do you. People generally don't complain if
| you're a couple hundred nanoseconds (if that) late. They
| do complain if your accounts don't add up by a single
| penny.
| wbl wrote:
| The quoting of something exotic like this is not well
| defined to the penny. It's transactions where people
| really care about pennies.
| asadalt wrote:
| This is still true for many embedded projects. The Pi Pico
| (RP2040), for example, uses a table.
| dajoh wrote:
| What you're describing is called fixed point arithmetic, a
| super cool technique I wish more programmers knew about.
|
| Proper finance related code should use it, but in my experience
| in that industry it doesn't seem very common unless you're
| running mainframes.
|
| Funnily enough, I've seen a lot more fixed point arithmetic in
| software rasterizers than anywhere else. FreeType, GDI, WPF,
| WARP (D3D11 reference rasterizer) all use it heavily.
| kccqzy wrote:
| I have worked on firmware that has plenty of fixed point
| arithmetic. The firmware usually runs on processors without
| hardware floating point units. For example, certain Tesla ECUs
| use 32-bit integers divided into four bits of integer part and
| 28 bits of fractional part, so values are scaled by 2^28.
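|
| In code, that Q4.28 layout looks roughly like this (products
| need a 64-bit intermediate before shifting back down):
|
|     FRAC_BITS = 28
|     ONE = 1 << FRAC_BITS
|
|     def to_q4_28(x: float) -> int:
|         return int(round(x * ONE))
|
|     def q4_28_mul(a: int, b: int) -> int:
|         return (a * b) >> FRAC_BITS   # 64-bit product on real hardware
|
|     def to_float(q: int) -> float:
|         return q / ONE
|
|     print(to_float(q4_28_mul(to_q4_28(3.25), to_q4_28(-1.5))))  # -4.875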
| phkahler wrote:
| >> The firmware usually runs on processors without hardware
| floating point units.
|
| I'm working on control code on an ARM Cortex-M4F. I wrote it
| all in fixed point because I didn't trust an FPU to be faster,
| and I also like having a 32-bit accumulator instead of a 24-bit
| one. I recently converted it all to floating point since we
| have the M4F part (the F indicates an FPU), and it's a little
| slower now. I did get to remove some limit checking, since I
| can rely on the calculations staying inside the limits, but
| it's still a little slower than my fixed point implementation.
| sitkack wrote:
| The other great thing about going fixed point is that it
| doesn't expose you to device specific floating point
| bugs, making your embedded code way more portable and
| easier to test.
|
| 32b float on your embedded device doesn't necessarily match
| your 32b float running on your dev machine.
| bobmcnamara wrote:
| 32b float can match your desktop. It really just takes a few
| compiler flags (like avoiding -funsafe-math), setting rounding
| modes, and not using the 80-bit Intel mode (largely disused
| after the 64-bit transition).
| gatane wrote:
| Are there any good benchmarks for float vs fixed point,
| especially for ARM systems?
| aatd86 wrote:
| What do they use? Not float I hope. Plus given that some
| currencies have different precisions... Don't tell me it's
| rounding errors over trillion monies?! :o)
| fluoridation wrote:
| The industry standard in finance is decimal floating point.
| C# for example has 'decimal', with 128 bits of precision.
|
| On occasion I've seen people who didn't know any better use
| floats. One time I had to fix errors of single satoshis in
| a customer's database because their developer used 1.0 to
| represent 1 BTC.
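|
| The failure mode is easy to reproduce; integer satoshis stay
| exact where the binary float drifts:
|
|     SATS_PER_BTC = 100_000_000
|
|     as_float, as_sats = 0.0, 0
|     for _ in range(10):
|         as_float += 0.1               # "0.1 BTC" as a binary float
|         as_sats += 10_000_000         # the same amount in satoshis
|
|     print(as_float)                   # 0.9999999999999999
|     print(as_sats == SATS_PER_BTC)    # True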
| Maxatar wrote:
| As I indicate in another post, I work in finance and I use
| binary floats. So do a lot of others who work in the
| industry. I sympathize with people who think that IEEE
| floating points are some weird or error prone
| representation and that fixed point arithmetic solves every
| problem, but in my professional experience that isn't true
| and systems that start by using fixed point arithmetic
| eventually end up making a half-assed error prone and slow
| version of floating point arithmetic as soon as they need
| to handle more sophisticated use cases like handling
| multiple currencies, doing calculations involving
| percentages such as interest rates, etc etc...
|
| The IEEE 754 floating point standard is a very well thought
| out standard that is suitable for representing money as-is.
| If you have requirements such as
| compliance/legal/regulatory needs that mandate a minimum
| precision, then you can either opt to use decimal floating
| point or use binary floating point where you adjust the
| decimal place up to whatever legally required precision you
| are required to handle.
|
| For example the common complaint about binary floating
| point is that $1.10 can't be represented exactly so you
| should instead use a fixed integer representation in terms
| of cents and represent it as 110. But if your requirement
| is to be able to represent values exactly to the penny,
| then you can simply do the same thing but using a floating
| point to represent cents and represent $1.10 as the
| floating point 110.0. The fixed integer representation
| conveys almost no benefit over the floating point
| representation, and once you need to work with and mix
| currencies that are significantly out of proportion to one
| another, you begin to really appreciate the nuances and
| work that went into IEEE 754 for taking into account a
| great deal of corner cases that a fixed integer
| representation will absolutely and spectacularly fail to
| handle.
| vidarh wrote:
| It really depends on your need. In some countries e.g.
| VAT calculations used to specify rounding requirements
| that were a pain to guarantee with floats. I at one point
| had our CFO at the time breathing down my neck while I
| implemented the VAT calculations while clutching a
| printout of the relevant regulations on rounding because
| in _theory_ he could end up a defendant in a court case
| if I got it wrong (in _practice_ not so much, but it
| spooked him enough that it was the one time he paid
| attention to what I was up to). Many tax authorities are
| now more relaxed, as long as your results average out in
| their favour, but there's a reason for this advice.
| kbolino wrote:
| There are more problems with using floating-point for
| exact monetary quantities than just the inexact
| representations of certain quantities which are exact in
| base 10. For example, integers have all of the following
| advantages over floats:
|
| Integer arithmetic will never return NaN or infinity.
|
| Integer (a*b)*c will always equal a*(b*c).
|
| Integer (a+b)%n will always equal (a%n+b%n)%n, i.e. low-
| order bits are always preserved.
|
| IEEE 754 is not bad and shouldn't be feared, but it is
| not a universal solution to every problem.
|
| It's also not hard to multiply by fractions in fixed-
| point. You do a widening multiplication by the numerator
| followed by a narrowing division by the denominator. For
| percentages and interest rates etc., you can represent
| them using percentage points, basis points, or even
| parts-per-million depending on the precision you need.
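|
| For example, applying a rate in basis points to an integer
| amount of cents with that widen-then-divide pattern (names are
| illustrative; rounding here is half-up for nonnegative values):
|
|     BPS_DENOM = 10_000                # 1 basis point = 0.01%
|
|     def apply_bps(amount_cents: int, rate_bps: int) -> int:
|         scaled = amount_cents * rate_bps            # widening multiply
|         return (scaled + BPS_DENOM // 2) // BPS_DENOM   # narrowing divide
|
|     print(apply_bps(123_456, 525))    # 5.25% of $1,234.56 -> 6481 cents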
| bee_rider wrote:
| Are there cases where float could return a NaN or
| infinity, where you instead prefer the integer result?
| That seems a little odd to me.
| kbolino wrote:
| Integer division by zero will raise an exception in most
| modern languages.
|
| Integer overflow is more problematic. While some
| languages in some situations will raise exceptions, most
| don't. While it's easier to detect overflow that has
| already occurred with floats (though you'll usually have
| lost low-order bits long before you get infinity), it's
| easier to avoid overflow in the first place with
| integers.
| estebarb wrote:
| Most people would love their bank accounts to underflow.
| Maxatar wrote:
| >Integer arithmetic will never return NaN or infinity.
|
| I use C++ and what integer arithmetic will do in
| situations where floating point returns NaN is undefined
| behavior.
|
| I prefer the NaN over undefined behavior.
|
| >Integer (a*b)*c will always equal a*(b*c).
|
| In every situation where an integer will do that, a
| floating point will do that as well. Floating point
| numbers behave like integers for integer values, the only
| question is what do you do for non-integer values. My
| argument is that in many if not most cases you can apply
| the same solution you would have applied using integers
| to floating points and get an even more robust, flexible,
| and still high performance solution.
|
| >For percentages and interest rates etc., you can
| represent them using percentage points, basis points, or
| even parts-per-million depending on the precision you
| need.
|
| And this is precisely when people end up reimplementing
| their own ad-hoc floating point representation. You end
| up deciding and hardcoding what degree of precision you
| need to use depending on assumptions you make beforehand
| and having to switch between different fixed point
| representations and it just ends up being a matter of
| time before someone somewhere makes a mistake and mixes
| two close fixed point representations and ends up causing
| headaches.
|
| With floating point values, I do hardcode a degree of
| precision I want to guarantee, which in my case is 6
| decimal places, but in certain circumstances I might
| perform operations or work with data that needs more than
| 6 decimal places and using floating point values will
| still accommodate that to a very high degree whereas the
| fixed arithmetic solution will begin to fail
| catastrophically.
| fluoridation wrote:
| >I use C++ and what integer arithmetic will do in
| situations where floating point returns NaN is undefined
| behavior. I prefer the NaN over undefined behavior.
|
| Really? IME it's much more difficult to debug where a NaN
| value came from, since it's irreversible and infectious.
| And although the standard defines which integer
| operations should have undefined behavior, _usually_ the
| compiler just generates code that behaves reasonably.
| Like, you can take INT_MAX and then increment and
| decrement it and get INT_MAX back.
|
| (That does mean that you're left with a broken program
| that works by accident, but hey, the program works.)
| kbolino wrote:
| C++ is no excuse; it has value types and operator
| overloading. You can write your own types and define your
| own behavior, or use those already provided by others.
| Even if you insist on using raw ints (or just want a
| safety net), there's compiler flags to define that
| undefined behavior.
|
| Putting everything into floats as integers defeats the
| purpose of using floats. Obviously you will want some
| fractions at some point and then you will have to deal
| with that issue, and the denominator of those fractions
| being a power of 2 and not a power of 10. Approximation
| is good enough for some things, but not others. Accounts
| and ledgers are definitely in the latter category, even
| if lots of other financial math isn't.
|
| You need always be mindful of your operating precision
| and scale. Even double-precision floats have finite
| precision, though this won't be a huge issue until you've
| compounded the results of many operations. If you use
| fixed-point and have different denominators all over the
| place, then it's probably time to break out rational
| numbers or use the type system to your advantage. You
| will know the precision and scale of types called
| BasisPoints or PartsPerMillion or Fixed6 because it's in
| the name and is automatically handled as part of the
| operations between types.
| bobmcnamara wrote:
| > if your requirement is to be able to represent values
| exactly to the penny, then you can simply do the same
| thing but using a floating point to represent cents and
| represent $1.10 as the floating point 110.0.
|
| Not if you need to represent more than about 170 kilo
| dollars.
| myst wrote:
| Every half-competent software engineer knows about fixed
| point arithmetic, my friend.
| phkahler wrote:
| >> Every half-competent software engineer...
|
| You meant 8192/16384 right? I like q14.
| EGreg wrote:
| Smart contracts on EVM and other blockchains all use fixed
| point, for the simple reason that all machines have to get
| exactly the same result.
| andrewla wrote:
| I recall playing with FRACTINT, which was a fractal generator
| that existed before floating point coprocessors were common,
| that used fixed point math to calculate and display fractals.
| That was back when fractals were super cool and everyone wanted
| to be in the business of fractals, and all the Nobel Prizes
| were given out to fractal researchers.
| kragen wrote:
| Sure, FRACTINT is called FRACTINT because it uses fixed-point
| ("integer") math. And fixed-point math is still standard in
| Forth; you can do your example in GForth like this:
|
|     : organize; gforth
|     Gforth 0.7.3, Copyright (C) 1995-2008 Free Software Foundation, Inc.
|     Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license'
|     Type `bye' to exit
|     : %* d>s 10 m*/ ;  : %. <# # [char] . hold #s #> type ;  ok
|     1.6 4.1 %* %. 6.5  ok
|
|
| Note that the correct answer is 6.56, so the result 6.5 is
| incorrectly rounded. Here's how this works.
|
| (If you're not familiar with Forth, Forth's syntax is that
| words are separated by spaces. "ok" is the prompt, ":" defines
| a subroutine terminated with ";", and you use RPN, passing
| parameters and receiving results on a stack.)
|
| In standard Forth, putting a decimal point in a number makes it
| a double-precision number, occupying two cells on the stack,
| and in most Forths the number of digits after the decimal point
| is stored (until the next number) in the non-standardized
| variable _dpl_ , decimal point location. Here I've just decided
| that all my numbers are going to have one decimal place. This
| means that after a multiplication I need to divide by 10, so I
| define a subroutine called %* to do this operation. (Addition
| and subtraction can use the standard d+ and d- subroutines; I
| didn't implement division, but it would need to pre-multiply
| the dividend by the scale factor 10.)
|
| "%*" is defined in terms of the standard subroutine m*/, which
| multiplies a double-precision number by a single-precision
| number and divides the result by a divisor, and the standard
| subroutine d>s, which converts a double-precision number to a
| single-precision number. (There's probably a better way to do
| %*. I'm no Forth expert.)
|
| I also need to define a way to print out such numbers, so I
| define a subroutine called "%.", using Forth's so-called
| "pictured numeric output", which prints out an unsigned double-
| precision number inserting a decimal point in the right place
| with "hold", after printing out the least significant digit.
| (In PNO we write the format backwards, starting from the least
| significant digit.) The call to "type" types out the formatted
| number from the hold space used by PNO.
|
| Then I invoked %* on 1.6 and 4.1 and %. on its result, and it
| printed out 6.5 before giving me the "ok" prompt.
|
| If you want to adapt this to use two decimal places:
|
|     : %* d>s 100 m*/ ;  : %. <# # # [char] . hold #s #> type ;
|     redefined %*  redefined %.  ok
|     1.60 4.10 %* %. 6.56  ok
|
|
| Note, however, that a fixed-point multiplication still involves
| a _multiplication_ , requiring potentially many additions, not
| just an addition. The paper, which I haven't read yet, is about
| how to approximate a _floating-point_ multiplication by using
| an addition, presumably because in multiplication you add the
| mantissas, or maybe using a table of logarithms.
|
| Forth's approach to decimal numbers was a clever hack for the
| 01970s and 01980s on sub-MIPS machines with 8-bit and 16-bit
| ALUs, where you didn't want to be invoking 32-bit arithmetic
| casually, and you didn't have floating-point hardware. Probably
| on 32-bit machines it was already the wrong approach (a double-
| precision number on a 32-bit Forth is 64 bits, which is about
| 19 decimal digits) and clearly it is on 64-bit machines, where
| you don't even get out of the first 64-bit word until that many
| digits:
|
|     0 1 %. 184467440737095516.16  ok
|
| GForth and other modern standard Forths do support floating-
| point, but for backward compatibility, they treat input with
| decimal points as double-precision integers.
| touisteur wrote:
| Ozaki has been doing fp64 matrix-multiplication using int8
| tensor cores
|
| https://arxiv.org/html/2306.11975v4
|
| Interesting AF.
| ein0p wrote:
| More than 10x the amount of energy is spent moving bytes
| around. Compute efficiency is not as big of an issue as people
| think. It's just that the compute is in the wrong place now: it
| needs to be right next to the memory cells, bypassing the
| memory bus, at least for the initial aggregations that go into
| dot products.
| entropicdrifter wrote:
| This could still be useful for battery constrained devices,
| right?
| ein0p wrote:
| It's even worse on battery constrained devices; they tend to
| also be memory constrained and run with batch size 1 during
| extend. IOW the entire model (or parts thereof, if the model is
| MoE) gets read for every generated token. Utilization of
| compute is truly abysmal in that case, and almost all energy is
| spent pushing bytes through the memory bus, which on battery
| powered devices doesn't have high throughput.
| presspot wrote:
| From my experience, the absolute magicians in fixed point math
| were the 8-bit and 16-bit video game designers. I was in awe of
| the optimizations they did. They made it possible to calculate 3D
| matrix maths in real time, for example, in order to make the
| first flight simulators and first person shooter games.
| hinkley wrote:
| Redefining degrees to be 2pi = 256 was a pretty clever trick.
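|
| The same trick in a few lines: with 2*pi mapped to 256 units,
| an angle fits in a byte, wraparound is a mask instead of a mod,
| and sin/cos become a 256-entry table (a rough sketch):
|
|     import math
|
|     SIN_TABLE = [round(127 * math.sin(2 * math.pi * i / 256))
|                  for i in range(256)]
|
|     def bsin(angle: int) -> int:
|         return SIN_TABLE[angle & 0xFF]          # wrap via mask
|
|     def bcos(angle: int) -> int:
|         return SIN_TABLE[(angle + 64) & 0xFF]   # 90 degrees = 64 units
|
|     print(bsin(64), bcos(0))          # both 127 (~1.0 in this scaling)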
| m3kw9 wrote:
| So instead of say 2x3 you go 2+2+2?
___________________________________________________________________
(page generated 2024-10-09 23:00 UTC)