[HN Gopher] Addition Is All You Need for Energy-Efficient Langua...
       ___________________________________________________________________
        
       Addition Is All You Need for Energy-Efficient Language Models
        
       Author : InvisibleUp
       Score  : 282 points
       Date   : 2024-10-09 04:47 UTC (18 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | md_rumpf wrote:
       | The return of the CPU?!
        
         | anticensor wrote:
         | The reign of Threadripper!
        
       | visarga wrote:
       | > can potentially reduce 95% energy cost by elementwise floating
       | point tensor multiplications and 80% energy cost of dot products
       | 
        | If this were about convolutional nets then optimizing compute
       | would be a much bigger deal. Transformers are lightweight on
       | compute and heavy on memory. The weakest link in the chain is
       | fetching the model weights into the cores. The 95% and 80% energy
       | reductions cited are for the multiplication operations in
       | isolation, not for the entire inference process.
        
         | kendalf89 wrote:
         | Maybe this technique can be used for training then since that
         | is a lot more compute intensive?
        
         | lifthrasiir wrote:
         | I'm also sure that fp8 is small enough that multiplication can
         | really be done in a much simpler circuit than larger fp
         | formats. Even smaller formats like fp4 would be able to just
         | use a lookup table, and that makes them more like sort-of-
         | standardized quantization schemes.
        
           | tankenmate wrote:
            | I suspect that you could do fp8 with log tables and
            | interpolation if you really wanted to (compared to the memory
            | required for the model it's peanuts); it just turns into a
            | LUT (log table look up) and bit shift (interpolation). So
            | again, memory bandwidth is the limiting factor for
            | transformers (as far as energy is concerned).
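            | 
            | As a toy Python sketch of that log-table idea (the table
            | sizes and fixed-point scale below are arbitrary choices,
            | nothing from the paper):
            | 
            |     import math
            | 
            |     # Toy log-table multiply: look up log2, add, then look
            |     # up the inverse. Only shifts, adds and table reads.
            |     LOG_BITS = 12                # fixed-point fraction bits
            |     SCALE = 1 << LOG_BITS
            | 
            |     # value -> round(log2(value) * SCALE), for 1..255
            |     log_lut = [0] + [round(math.log2(i) * SCALE)
            |                      for i in range(1, 256)]
            |     # fractional log -> round(2**frac * SCALE)
            |     exp_lut = [round((2.0 ** (f / SCALE)) * SCALE)
            |                for f in range(SCALE)]
            | 
            |     def lut_mul(a, b):
            |         """Approximate a*b for 1 <= a, b <= 255."""
            |         s = log_lut[a] + log_lut[b]        # add the logs
            |         ip, fp = s >> LOG_BITS, s & (SCALE - 1)
            |         return (exp_lut[fp] << ip) >> LOG_BITS
            | 
            |     for a, b in [(3, 7), (100, 200), (255, 255)]:
            |         print(a, b, a * b, lut_mul(a, b))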
        
             | lifthrasiir wrote:
              | This time though the LUT exists in a circuit, which is much
              | more efficient than a typical memory lookup. Such a LUT
              | would have to exist per ALU though, so it can't be too
              | large.
        
           | brilee wrote:
           | fp4/fp8 for neural networks don't work the way you think they
           | do - they are merely compression formats - a set of, say, 256
            | fp32 weights from 1 neuron is lossily turned into 1 max
           | value (stored in fp32 precision) and 256 fp4/fp8 numbers.
           | Those compressed numbers are multiplied by the fp32 number at
           | runtime to restore the original weights and full fp32
           | multiplication + additions are executed.
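            | 
            | A minimal numpy sketch of that kind of block quantization
            | (int8 codes and an absmax scale here, just to keep it short;
            | real fp4/fp8 block formats differ in detail):
            | 
            |     import numpy as np
            | 
            |     def quantize_block(w):
            |         # one fp32 scale per block of weights
            |         scale = max(np.abs(w).max() / 127.0, 1e-12)
            |         q = np.clip(np.round(w / scale), -127, 127)
            |         return np.float32(scale), q.astype(np.int8)
            | 
            |     def dequantize_block(scale, q):
            |         # restore approximate weights at runtime
            |         return scale * q.astype(np.float32)
            | 
            |     rng = np.random.default_rng(0)
            |     w = rng.normal(size=256).astype(np.float32)   # one block
            |     scale, q = quantize_block(w)
            |     print(np.abs(w - dequantize_block(scale, q)).max())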
        
             | lifthrasiir wrote:
             | You are correct that the accumulation (i.e. additions in
             | dot products) has to be done in a higher precision, however
             | the multiplication can still be done via LUT. (Source: I
             | currently work at a hardware-accelerated ML hardware
             | startup.)
        
             | imjonse wrote:
              | With w8a8 quantization the hardware (>= Hopper) can do the
              | heavy math in fp8 twice as fast as fp16.
        
             | SuchAnonMuchWow wrote:
             | The goal of this type of quantization is to move the
             | multiplication by the fp32 rescale factor outside of the
             | dot-product accumulation.
             | 
             | So the multiplications+additions are done on
              | fp8/int8/int4/whatever (when the hardware supports those
              | operators, of course) and accumulated in an fp32 or similar,
             | and only the final accumulator is multiplied by the rescale
             | factor in fp32.
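              | 
              | A small numpy sketch of that pattern (int8 operands with an
              | int32 accumulator is one common variant; the point is the
              | single rescale at the end):
              | 
              |     import numpy as np
              | 
              |     def quantized_dot(x_q, w_q, x_scale, w_scale):
              |         # integer multiplies, wide integer accumulation
              |         acc = np.dot(x_q.astype(np.int32),
              |                      w_q.astype(np.int32))
              |         # one fp32 rescale, outside the accumulation
              |         return float(acc) * x_scale * w_scale
              | 
              |     rng = np.random.default_rng(1)
              |     x = rng.normal(size=64).astype(np.float32)
              |     w = rng.normal(size=64).astype(np.float32)
              |     xs, ws = np.abs(x).max() / 127, np.abs(w).max() / 127
              |     x_q = np.round(x / xs).astype(np.int8)
              |     w_q = np.round(w / ws).astype(np.int8)
              |     print(np.dot(x, w), quantized_dot(x_q, w_q, xs, ws))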
        
             | rajnathani wrote:
             | That's how Nvidia's mixed precision training worked with
             | FP32-FP16, but it isn't the case for Bfloat16 on TPUs and
             | maybe (I'm not sure) FP8 training on Nvidia Hopper GPUs.
        
           | bee_rider wrote:
           | What is fp4? 3 bits of exponent and one of mantissa?
        
             | wruza wrote:
             | SEEM (sign, exp, mantissa)
        
               | bee_rider wrote:
               | Interesting... I guess it must be biased, m*2^ee would
               | leave like half of the limited space wasted, so 1.m*2^ee?
               | 
               | I always wonder with these tiny formats if 0 should even
               | be represented...
        
               | wruza wrote:
               | I'm not a binary guy that much, but iirc all floats are
               | 1.m*2^e -- "1." is always there except for subnormals.
               | There's also SEEE FP4 which is basically +-2^([u?]int3).
               | 
               | https://medium.com/@harrietfiagbor/floating-points-and-
               | deep-...
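                | 
                | For concreteness, a decode of one common SEEM
                | layout (E2M1, bias 1, subnormals at exponent 0;
                | other fp4 variants exist, so this is only an
                | assumption):
                | 
                |     def decode_fp4(code):
                |         sign = -1.0 if code & 0b1000 else 1.0
                |         exp = (code >> 1) & 0b11
                |         man = code & 0b1
                |         if exp == 0:
                |             # subnormal: no implicit "1."
                |             val = 0.5 * man
                |         else:
                |             # implicit "1." plus half-step
                |             val = (1.0 + 0.5 * man) * 2.0 ** (exp - 1)
                |         return sign * val
                | 
                |     # positives: 0, 0.5, 1, 1.5, 2, 3, 4, 6
                |     print([decode_fp4(c) for c in range(16)])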
        
         | SuchAnonMuchWow wrote:
          | It's worse than that: the energy gains are when comparing
          | computations made with fp32, but for fp8 the multipliers are
          | really tiny and the adders/shifters represent the largest part
          | of the operators (energy-wise and area-wise), so this paper
          | will only see small gains.
         | 
         | On fp8, the estimated gate count of fp8 multipliers is 296 vs.
         | 157 with their technique, so the power gain on the multipliers
         | will be much lower (50% would be a more reasonable estimation),
         | but again for fp8 the additions in the dot products are a large
         | part of the operations.
         | 
          | Overall, it's really disingenuous to claim 80% power gain and
         | small drop in accuracy, when the power gain is only for fp32
         | operations and the small drop in accuracy is only for fp8
         | operators. They don't analyze the accuracy drop in fp32, and
         | don't present the power saved for fp8 dot product.
        
           | bobsyourbuncle wrote:
           | I'm new to neural nets, when should one use fp8 vs fp16 vs
           | fp32?
        
             | reissbaker wrote:
             | Basically no one uses FP32 at inference time. BF16/FP16 is
             | typically considered unquantized, whereas FP8 is lightly
             | quantized. That being said there's pretty minimal quality
             | loss at FP8 compared to 16-bit typically; Llama 3.1 405b,
             | for example, only benchmarks around ~1% worse when run at
             | FP8: https://blog.vllm.ai/2024/07/23/llama31.html
             | 
             | Every major inference provider other than Hyperbolic Labs
             | runs Llama 3.1 405b at FP8, FWIW (e.g. Together, Fireworks,
             | Lepton), so to compare against FP32 is misleading to say
             | the least. Even Hyperbolic runs it at BF16.
             | 
             | Pretraining is typically done in FP32, although some labs
             | (e.g. Character AI, RIP) apparently train in INT8:
             | https://research.character.ai/optimizing-inference/
        
             | ericlewis wrote:
             | Higher the precision the better. Use what works within your
             | memory constraints.
        
               | jasonjmcghee wrote:
                | With serious diminishing returns. At inference time
                | there's no reason to use fp64, and you should probably
                | use fp8 or less. The accuracy loss is far less than you'd
                | expect. AFAIK Llama 3.2 3B at fp4 will outperform Llama
                | 3.2 1B at fp32 in accuracy and speed, despite 8x lower
                | precision.
        
         | imjonse wrote:
         | That is true for single user/light inference only. For training
         | and batch inference you can get compute bound fast enough.
        
           | saagarjha wrote:
           | That really depends on what you're doing. Trying to feed a
            | tensor core is pretty hard; they're really fast.
        
         | woadwarrior01 wrote:
         | Pre-fill (even in the single batch case) and multi-batch
         | decoding are still compute dominated. The oft repeated trope of
         | "decoder only transformer inference is bottle-necked on memory
         | bandwidth" is only strictly true in the single batch decoding
         | case, because you're mostly doing vector matrix mults when the
         | batch size is one.
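          | 
          | A back-of-envelope sketch of why (square layer, 16-bit weights
          | assumed; purely illustrative numbers):
          | 
          |     # FLOPs per weight byte moved, for one d x d layer
          |     def flops_per_weight_byte(d, batch, bytes_per_w=2):
          |         flops = 2 * d * d * batch   # one multiply-add per
          |                                     # weight per token
          |         weight_bytes = d * d * bytes_per_w
          |         return flops / weight_bytes
          | 
          |     for batch in (1, 16, 256, 4096):
          |         print(batch, flops_per_weight_byte(8192, batch))
          | 
          | At batch size 1 that's about one FLOP per byte of weights, far
          | below what it takes to keep the math units busy; prefill and
          | large batches multiply that ratio by the number of tokens.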
        
           | ein0p wrote:
           | Not even single batch. If you want reasonable latency per
           | token (TPOT) even larger batches do not give you high compute
           | utilization during extend. It's only when you don't care
           | about TPOT at all, and your model is small enough to leave
           | space for a large batch on an 8 GPU host, that's when you
           | could get decent utilization. That's extend only - it's easy
           | to get high utilization in prefill.
        
         | mikewarot wrote:
         | Imagine if you had a systolic array large enough that all the
         | weights would only have to be loaded once at startup.
         | Eliminating the memory-compute bottleneck of the von Neumann
         | architecture could make this quite a bit more efficient.
        
         | api wrote:
         | Sounds like the awesome architecture for transformers would be
         | colocation of memory and compute.
        
           | Joker_vD wrote:
           | Yes, that's why we generally run them on GPUs.
        
             | moffkalast wrote:
              | GPUs that pull a kilowatt when running, yes. This might
              | actually work on an FPGA if the addition doesn't take too
              | many clock cycles, unlike the matmuls, which were too slow.
        
             | phkahler wrote:
             | That's why we need a row of ALUs in RAM chips. Read a row
             | of DRAM and use it in a vector operation. With the speed of
             | row reading, the ALU could take many cycles per operation
             | to limit area.
        
             | api wrote:
             | GPUs are better but I'm thinking of even tighter coupling,
             | like an integrated architecture.
        
       | cpldcpu wrote:
       | It puzzles me that there does not seem to be a proper derivation
        | and discussion of the error term in the paper. It's all treated
        | indirectly via inference results.
        
         | Lerc wrote:
         | The paper has an odd feel about it to me too. Doing a gate
          | estimation as a text explanation without a diagram makes it too
          | easy to miss some required part. It wouldn't need to be a full
          | gate-level explanation; even blocks labeled 'adder' would help.
         | 
         | Seeing the name de Vries in the first paragraph didn't help my
         | sense of confidence either.
        
           | brcmthrowaway wrote:
           | Because of the twisted mentat?
        
             | Lerc wrote:
              | No, more because of things like
             | 
             | http://blog.zorinaq.com/bitcoin-electricity-consumption/
             | 
             | It's a long read to go over multiple years worth of posts
             | and comments but gives you a measure of the man.
        
       | CGamesPlay wrote:
        | I believe this reduces the compute required, but still uses 8
        | bits per value, so it does not reduce the memory required to run
        | inference; it doesn't particularly make the models more
        | accessible for inference. Is this storage method
       | suitable for training? That could potentially be an interesting
       | application.
        
       | scotty79 wrote:
       | All You Need is Considered Harmful.
        
         | TaurenHunter wrote:
         | We will need a paper titled '"Considered Harmful" Articles is
         | All You Need' to complete that cycle.
        
       | js8 wrote:
       | Haven't read it, but isn't this just logarithmic tables in some
       | form?
       | 
       | I am asking not to dismiss it, I genuinely feel I don't
       | understand logarithms on a fundamental level (of logic gates
       | etc.). If multiplication can be replaced with table lookup and
       | addition, then there has to be a circuit that gives you difficult
       | addition and easy multiplication, or any combination of those
       | tradeoffs.
        
         | pclmulqdq wrote:
         | Yes, this is logarithmic number systems at work.
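          | 
          | A toy illustration of the log-domain idea on ordinary fp32
          | (the classic "add the bit patterns" trick, not the paper's
          | exact L-Mul operation):
          | 
          |     import struct
          | 
          |     # For positive normal floats the bit pattern is roughly a
          |     # fixed-point log2, so adding patterns (minus one bias)
          |     # approximates multiplying the values.
          |     BIAS = 127 << 23
          | 
          |     def f2b(x):
          |         return struct.unpack("<I", struct.pack("<f", x))[0]
          | 
          |     def b2f(b):
          |         return struct.unpack("<f", struct.pack("<I", b))[0]
          | 
          |     def approx_mul(a, b):
          |         # one integer addition, no multiplier
          |         return b2f(f2b(a) + f2b(b) - BIAS)
          | 
          |     for a, b in [(1.6, 4.1), (3.14, 2.72), (123.4, 0.0079)]:
          |         print(a * b, approx_mul(a, b))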
        
       | ranguna wrote:
        | I've seen this claim a few times across the last couple of years and
       | I have a pet theory why this isn't explored a lot:
       | 
       | Nvidia funds most research around LLMs, and they also fund other
       | companies that fund other research. If transformers were to use
        | addition and remove all usage of floating point multiplication,
        | there's a good chance the GPU would no longer be needed, or at
       | the least, cheaper ones would be good enough. If that were to
       | happen, no one would need nvidia anymore and their trillion
       | dollar empire would start to crumble.
       | 
       | University labs get free gpus from nvidia -> University labs
       | don't want to do research that would make said gpus obsolete
       | because nvidia won't like that.
       | 
        | If this were true, it would mean that we are stuck on an
        | inefficient research path due to corporate greed. Imagine if this
       | really was the next best thing, and we just don't explore it more
       | because the ruling corporation doesn't want to lose their market
       | cap.
       | 
       | Hopefully I'm wrong.
        
         | yieldcrv wrote:
         | Alternatively, other people fund LLM research
        
         | chpatrick wrote:
          | It's still a massively parallel problem suited to GPUs;
          | whether it's float or int, addition or multiplication, doesn't
          | really matter.
        
         | teaearlgraycold wrote:
         | NVidia GPUs support integer operations specifically for use
         | with deep learning models.
        
         | londons_explore wrote:
         | If an addition-only LLM performed better, nvidia would probably
         | still be the market leader.
         | 
         | Next gen nvidia chips would have more adders and fewer
         | multipliers.
        
         | cpldcpu wrote:
         | I have to disagree. Nvidia spent a lot of effort on researching
         | improved numerical representations. You can see a summary in
         | this talk:
         | 
         | https://www.youtube.com/watch?v=gofI47kfD28
         | 
          | A lot of their work was published but went unnoticed. But in
          | fact the majority of the performance increase in their new
          | architectures results from this work.
         | 
         | Reading between the lines, it seems that they came to the
         | conclusion that a 4 bit representation with a group exponent
         | ("FP4") is the most efficient representation of weights for
         | inference. Reducing the number of bits in weights has the
         | biggest impact on LLMs inference, since they are mostly memory
         | bound. At these low bit numbers, the impact of using
         | multiplication or other approaches is not really significiant
         | anymore.
         | 
          | (multiplying a 4 bit weight with a larger activation is
         | effectively 4 additions, barely more than what the paper
         | proposes)
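          | 
          | In integer terms that is just schoolbook shift-and-add, e.g.
          | (a sketch with plain Python ints, unsigned 4-bit weight):
          | 
          |     def mul_by_4bit_weight(activation, weight4):
          |         acc = 0
          |         for bit in range(4):
          |             # at most one conditional add per weight bit
          |             if (weight4 >> bit) & 1:
          |                 acc += activation << bit
          |         return acc
          | 
          |     for act, w in [(1000, 0b1011), (37, 0b1111)]:
          |         print(act * w, mul_by_4bit_weight(act, w))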
        
         | nayroclade wrote:
         | "Good enough" for what? We're in the middle of an AI arms race.
         | Why do you believe people would choose to run the same LLMs on
         | cheaper equipment instead of using the greater efficiency to
         | train and run even larger LLMs?
         | 
         | Given LLM performance seems to scale with their size, this
         | would result in more powerful models, which would grow the
         | applicability, use and importance of AI, which would in turn
         | grow the use and importance of Nvidia's hardware.
         | 
         | So this theory doesn't really stack up for me.
        
         | raincole wrote:
         | > I have a pet theory
         | 
         | You mean you have a conspiracy theory.
         | 
          | Why wouldn't other companies that buy Nvidia GPUs fund this
          | research? It would greatly cut their costs.
        
         | yunohn wrote:
         | Google & Apple already run custom chips, Meta and MS are
         | deploying their own soon too. Your theory is that none of them
         | have researched non-matrix-multiplication solutions before
         | investing billions?
        
           | miohtama wrote:
           | There are several patents on this topic so they have
        
         | twoodfin wrote:
          | I'd estimate the fraction of Nvidia's dominance that's
         | dependent on their distinctive advantages in kernel primitives
         | (add vs. multiply) would be a rounding error in FP8.
         | 
         | The CUDA tooling and ecosystem, VLSI architecture,
         | organizational prowess... all matter at multiple orders of
         | magnitude more.
        
         | iamgopal wrote:
          | No matter how fast CPUs, networks and browsers have become,
          | websites are still slow. We will run out of data to train on
          | much earlier than people will stop inventing even larger models.
        
         | WrongAssumption wrote:
         | So let me get this straight. Universities don't want to show
          | that Nvidia GPUs are obsolete, so they can receive a steady
          | stream of obsolete GPUs? For what possible reason? That doesn't
          | make sense.
        
       | concrete_head wrote:
        | Just to add an alternative addition-based architecture into the
        | mix.
       | 
       | https://www.youtube.com/watch?v=VqXwmVpCyL0
        
       | pjc50 wrote:
       | "We recommend training and hosting L-Mul-based models on devices
       | integrated with specialized architectural designs. Patent
       | pending"
       | 
       | (from footnote in method section)
        
       | cpldcpu wrote:
       | Bill Dally from nvidia introduced a log representation that
        | basically allows replacing a multiplication with an add, without
        | loss of accuracy (in contrast to the proposal above)
       | 
       | https://youtu.be/gofI47kfD28?t=2248
        
       | Buttons840 wrote:
       | Would using this neural network based on integer addition be
       | faster? The paper does not claim it would be faster, so I'm
       | assuming not?
       | 
       | What about over time? If this L-Mul (the matrix operation based
       | on integer addition) operation proved to be much more energy
       | efficient and became popular, would new hardware be created that
       | was faster?
        
       | tantalor wrote:
       | [2023] GradIEEEnt half decent: The hidden power of imprecise
       | lines
       | 
       | http://tom7.org/grad/murphy2023grad.pdf
       | 
       | Also in video form: https://www.youtube.com/watch?v=Ae9EKCyI1xU
        
         | indrora wrote:
         | I had hoped that they would reference this in their paper as
         | some kind of "supporting previous exploration" but no, alas.
        
       | A4ET8a8uTh0 wrote:
       | Uhh.. I hate to be the one to ask this question, but shouldn't we
       | be focused on making LLMs work well first and then focused on
       | desired optimizations? Using everyone's car analogy, it is like
        | making sure early cars used a lower amount of coal. It is a
       | fool's errand.
        
         | Maken wrote:
         | The optimizations described could easily work on other models,
         | not just transformers. Following your analogy, this is
         | optimizing plumbing, pistons and valves on steam engines, it
         | could be useful for whatever follows.
        
         | lukev wrote:
         | Also, making neural networks faster/cheaper is a big part of
         | how they advance.
         | 
         | We've known about neural architectures since the 70s, but we
         | couldn't build them big enough to be actually useful until the
         | advent of the GPU.
         | 
         | Similarly, the LLM breakthrough was because someone decided it
         | was worth spending millions of dollars to train one. Efficiency
         | improvements lower that barrier for all future development (or
         | alternatively, allow us to build even bigger models for the
         | same cost.)
        
         | itishappy wrote:
         | Coal (and even wood!) powered cars actually existed long before
         | Ford, but didn't take off because they were too heavy and
          | unwieldy. The Model T was the result of a century of
         | optimization.
         | 
         | https://en.wikipedia.org/wiki/Nicolas-Joseph_Cugnot
        
         | spencerchubb wrote:
         | Cheaper compute is basically a prerequisite to making better
         | models. You can get some improvements on the margins by making
         | algorithms better with current hardware, but not an order of
         | magnitude improvement.
         | 
         | When there is an order of magnitude improvement in hardware,
         | the AI labs will figure out an algorithm to best take advantage
         | of it.
        
       | shrubble wrote:
       | I remember that many years ago, when floating point computation
       | was expensive for Intel CPUs to do, there were multiple ways that
       | programmers used integer trickery to work around this.
       | 
       | Chuck Moore of Forth fame demonstrated taking the value, say 1.6
       | multiplied by 4.1 and doing all the intermediate calculations via
       | integers (16 * 41) and then formatting the output by putting the
        | decimal point back in the "right place"; this worked as long
        | as the values, once multiplied by 10, stayed within 65536 (16-bit
        | integers), for instance. For embedded chips where, say, you have
        | an analog reading with 10 bits of precision to process multiple
        | times per second, this worked well.
       | 
       | I also recall talking many years ago with a Microsoft engineer
       | who had worked with the Microsoft Streets and Trips program
       | (https://archive.org/details/3135521376_qq_CD1 for a screenshot)
       | and that they too had managed to fit what would normally be
       | floating point numbers and the needed calculations into some kind
       | of packed integer format with only the precision that was
        | actually needed, which was faster on the CPUs of the day as well
        | as more easily compressed to fit on the CD-ROM.
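        | 
        | That trick in a few lines of Python (one implied decimal digit
        | throughout, truncating the extra digit after the multiply):
        | 
        |     SCALE = 10                     # tenths
        | 
        |     def fixed_mul(a, b):
        |         return (a * b) // SCALE    # 16 * 41 = 656 -> 65
        | 
        |     def show(x):
        |         return f"{x // SCALE}.{x % SCALE}"
        | 
        |     a, b = 16, 41                  # 1.6 and 4.1 held in tenths
        |     print(show(a), "*", show(b), "=", show(fixed_mul(a, b)))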
        
         | dwattttt wrote:
         | That particular trick is known as fixed point arithmetic (not
         | to be confused with a fixed point of a function)
        
         | candiddevmike wrote:
         | AFAIK this is still the best way to handle money/financial
         | numbers.
        
           | amanda99 wrote:
           | That's got nothing to do with perf tho.
        
             | Maxatar wrote:
             | Nothing to do with perf is a strong claim. If you genuinely
             | don't care about performance you can use an arbitrary-
             | precision rational number representation.
             | 
             | But performance often matters, so you trade off precision
             | for performance. I think people are wrong to dismiss
             | floating point numbers in favor of fixed point arithmetic,
             | and I've seen plenty of fixed point arithmetic that has
             | failed spectacularly because people think if you use it, it
             | magically solves all your problems...
             | 
             | Whatever approach you take other than going all in with
             | arbitrary precision fractions, you will need to have a good
             | fundamental understanding of your representation and its
             | trade-offs. For me personally I use floating point binary
             | and adjust the decimal point so I can exactly represent any
             | value to 6 decimal places. It's a good trade-off between
             | performance, flexibility, and precision.
             | 
             | It's also what the main Bitcoin implementation uses.
        
               | fluoridation wrote:
               | Huh? Bitcoin uses integers. The maximum supply of BTC in
               | satoshis fits in 64 bits. JS implementations that need to
               | handle BTC amounts use doubles, but only by necessity,
               | since JS doesn't have an integer type. They still use the
               | units to represent satoshis, which works because the
               | maximum supply also fits in 53 bits, so effectively
               | they're also using integers.
               | 
               | Anyone who uses binary floating point operations on
               | monetary values doesn't know what they're doing and is
               | asking for trouble.
        
               | wbl wrote:
               | So if I want to price a barrier in Bermudan rainbow via
               | Monte Carlo I should take the speed hit for a few oddball
               | double rounding problems that are pennies?
        
               | fluoridation wrote:
               | I mean, you do you. People generally don't complain if
               | you're a couple hundred nanoseconds (if that) late. They
               | do complain if your accounts don't add up by a single
               | penny.
        
               | wbl wrote:
               | The quoting of something exotic like this is not well
               | defined to the penny. It's transactions where people
               | really care about pennies.
        
         | asadalt wrote:
          | This is still true for many embedded projects; the Pi Pico
          | (RP2040), for example, uses a table.
        
         | dajoh wrote:
         | What you're describing is called fixed point arithmetic, a
         | super cool technique I wish more programmers knew about.
         | 
         | Proper finance related code should use it, but in my experience
         | in that industry it doesn't seem very common unless you're
         | running mainframes.
         | 
         | Funnily enough, I've seen a lot more fixed point arithmetic in
         | software rasterizers than anywhere else. FreeType, GDI, WPF,
         | WARP (D3D11 reference rasterizer) all use it heavily.
        
           | kccqzy wrote:
           | I have worked on firmware that has plenty of fixed point
           | arithmetic. The firmware usually runs on processors without
           | hardware floating point units. For example certain Tesla ECUs
           | use 32-bit integers where they divide it into four bits of
           | integer part and 28 bits of fractional part. So values are
           | scaled by 2^28.
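            | 
            | A minimal sketch of that layout (Python ints standing in for
            | the 32-bit registers; overflow and saturation ignored):
            | 
            |     FRAC_BITS = 28             # Q4.28: 4 int, 28 frac bits
            |     ONE = 1 << FRAC_BITS
            | 
            |     def to_q(x):
            |         return int(round(x * ONE))
            | 
            |     def from_q(q):
            |         return q / ONE
            | 
            |     def q_mul(a, b):
            |         # widening multiply, then shift the scale back out
            |         return (a * b) >> FRAC_BITS
            | 
            |     print(from_q(q_mul(to_q(1.5), to_q(2.25))))   # 3.375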
        
             | phkahler wrote:
             | >> The firmware usually runs on processors without hardware
             | floating point units.
             | 
              | I'm working on control code on an ARM Cortex-M4F. I wrote
              | it all in fixed point because I don't trust an FPU to be
              | faster, and I also like to have a 32-bit accumulator
              | instead of a 24-bit one. I recently converted it all to
              | floating point since we have the M4F part (the F indicates
              | an FPU), and it's a
             | little slower now. I did get to remove some limit checking
             | since I can rely on the calculations being inside the
             | limits but it's still a little slower than my fixed point
             | implementation.
        
               | sitkack wrote:
               | The other great thing about going fixed point is that it
               | doesn't expose you to device specific floating point
               | bugs, making your embedded code way more portable and
               | easier to test.
               | 
                | 32b float on your embedded device doesn't necessarily
                | match your 32b float running on your dev machine.
        
               | bobmcnamara wrote:
                | 32b float can match your desktop. Really just takes a few
                | compiler flags (like avoiding -funsafe-math), setting
                | rounding modes, and not using the 80-bit Intel mode
                | (largely disused after the 64-bit transition).
        
               | gatane wrote:
               | Are there any good benchmarks for float vs fixed point,
               | specially for ARM systems?
        
           | aatd86 wrote:
           | What do they use? Not float I hope. Plus given that some
           | currencies have different precisions... Don't tell me it's
           | rounding errors over trillion monies?! :o)
        
             | fluoridation wrote:
             | The industry standard in finance is decimal floating point.
             | C# for example has 'decimal', with 128 bits of precision.
             | 
             | On occasion I've seen people who didn't know any better use
             | floats. One time I had to fix errors of single satoshis in
             | a customer's database because their developer used 1.0 to
             | represent 1 BTC.
        
             | Maxatar wrote:
             | As I indicate in another post, I work in finance and I use
             | binary floats. So do a lot of others who work in the
             | industry. I sympathize with people who think that IEEE
             | floating points are some weird or error prone
             | representation and that fixed point arithmetic solves every
             | problem, but in my professional experience that isn't true
             | and systems that start by using fixed point arithmetic
             | eventually end up making a half-assed error prone and slow
             | version of floating point arithmetic as soon as they need
             | to handle more sophisticated use cases like handling
             | multiple currencies, doing calculations involving
             | percentages such as interest rates, etc etc...
             | 
             | The IEEE 754 floating point standard is a very well thought
             | out standard that is suitable for representing money as-is.
             | If you have requirements such as
             | compliance/legal/regulatory needs that mandate a minimum
             | precision, then you can either opt to use decimal floating
             | point or use binary floating point where you adjust the
             | decimal place up to whatever legally required precision you
             | are required to handle.
             | 
             | For example the common complaint about binary floating
             | point is that $1.10 can't be represented exactly so you
             | should instead use a fixed integer representation in terms
             | of cents and represent it as 110. But if your requirement
             | is to be able to represent values exactly to the penny,
             | then you can simply do the same thing but using a floating
             | point to represent cents and represent $1.10 as the
             | floating point 110.0. The fixed integer representation
             | conveys almost no benefit over the floating point
             | representation, and once you need to work with and mix
             | currencies that are significantly out of proportion to one
             | another, you begin to really appreciate the nuances and
             | work that went into IEEE 754 for taking into account a
             | great deal of corner cases that a fixed integer
             | representation will absolutely and spectacularly fail to
             | handle.
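              | 
              | A two-line illustration of that point with plain Python
              | floats:
              | 
              |     print(1.10 + 2.20 == 3.30)     # False: binary dollars
              |     print(110.0 + 220.0 == 330.0)  # True: whole cents are
              |                                    # exact up to 2**53
              | 
              | Integer-valued doubles add exactly until sums approach
              | 2**53, which is why the scaled representation behaves like
              | the integer one for ordinary ledger amounts.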
        
               | vidarh wrote:
               | It really depends on your need. In some countries e.g.
               | VAT calculations used to specify rounding requirements
               | that were a pain to guarantee with floats. I at one point
               | had our CFO at the time breathing down my neck while I
               | implemented the VAT calculations while clutching a
               | printout of the relevant regulations on rounding because
               | in _theory_ he could end up a defendant in a court case
               | if I got it wrong (in _practice_ not so much, but it
               | spooked him enough that it was the one time he paid
               | attention to what I was up to). Many tax authorities are
               | now more relaxed, as long as your results average out in
                | their favour, but there's a reason for this advice.
        
               | kbolino wrote:
               | There are more problems with using floating-point for
               | exact monetary quantities than just the inexact
               | representations of certain quantities which are exact in
               | base 10. For example, integers have all of the following
               | advantages over floats:
               | 
               | Integer arithmetic will never return NaN or infinity.
               | 
               | Integer (a*b)*c will always equal a*(b*c).
               | 
               | Integer (a+b)%n will always equal (a%n+b%n)%n, i.e. low-
               | order bits are always preserved.
               | 
               | IEEE 754 is not bad and shouldn't be feared, but it is
               | not a universal solution to every problem.
               | 
               | It's also not hard to multiply by fractions in fixed-
               | point. You do a widening multiplication by the numerator
               | followed by a narrowing division by the denominator. For
               | percentages and interest rates etc., you can represent
               | them using percentage points, basis points, or even
               | parts-per-million depending on the precision you need.
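                | 
                | For instance (a sketch; round-half-up is just one
                | choice of rounding rule):
                | 
                |     # interest in basis points applied to cents;
                |     # Python ints make the widening implicit, in C
                |     # you'd multiply into a 64-bit temporary
                |     def apply_bps(amount_cents, rate_bps):
                |         prod = amount_cents * rate_bps
                |         return (prod + 5_000) // 10_000
                | 
                |     balance = 123_456_789   # $1,234,567.89
                |     print(apply_bps(balance, 525))  # 5.25% ->
                |                                     # 6481481 cents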
        
               | bee_rider wrote:
               | Are there cases where float could return a NaN or
               | infinity, where you instead prefer the integer result?
               | That seems a little odd to me.
        
               | kbolino wrote:
               | Integer division by zero will raise an exception in most
               | modern languages.
               | 
               | Integer overflow is more problematic. While some
               | languages in some situations will raise exceptions, most
               | don't. While it's easier to detect overflow that has
               | already occurred with floats (though you'll usually have
               | lost low-order bits long before you get infinity), it's
               | easier to avoid overflow in the first place with
               | integers.
        
               | estebarb wrote:
               | Most people would love their bank accounts to underflow.
        
               | Maxatar wrote:
               | >Integer arithmetic will never return NaN or infinity.
               | 
               | I use C++ and what integer arithmetic will do in
               | situations where floating point returns NaN is undefined
               | behavior.
               | 
               | I prefer the NaN over undefined behavior.
               | 
                | >Integer (a*b)*c will always equal a*(b*c).
               | 
               | In every situation where an integer will do that, a
               | floating point will do that as well. Floating point
               | numbers behave like integers for integer values, the only
               | question is what do you do for non-integer values. My
               | argument is that in many if not most cases you can apply
               | the same solution you would have applied using integers
               | to floating points and get an even more robust, flexible,
               | and still high performance solution.
               | 
               | >For percentages and interest rates etc., you can
               | represent them using percentage points, basis points, or
               | even parts-per-million depending on the precision you
               | need.
               | 
               | And this is precisely when people end up reimplementing
               | their own ad-hoc floating point representation. You end
               | up deciding and hardcoding what degree of precision you
               | need to use depending on assumptions you make beforehand
               | and having to switch between different fixed point
               | representations and it just ends up being a matter of
               | time before someone somewhere makes a mistake and mixes
               | two close fixed point representations and ends up causing
               | headaches.
               | 
               | With floating point values, I do hardcode a degree of
               | precision I want to guarantee, which in my case is 6
               | decimal places, but in certain circumstances I might
               | perform operations or work with data that needs more than
               | 6 decimal places and using floating point values will
               | still accommodate that to a very high degree whereas the
               | fixed arithmetic solution will begin to fail
               | catastrophically.
        
               | fluoridation wrote:
               | >I use C++ and what integer arithmetic will do in
               | situations where floating point returns NaN is undefined
               | behavior. I prefer the NaN over undefined behavior.
               | 
               | Really? IME it's much more difficult to debug where a NaN
               | value came from, since it's irreversible and infectious.
               | And although the standard defines which integer
               | operations should have undefined behavior, _usually_ the
               | compiler just generates code that behaves reasonably.
               | Like, you can take INT_MAX and then increment and
               | decrement it and get INT_MAX back.
               | 
               | (That does mean that you're left with a broken program
               | that works by accident, but hey, the program works.)
        
               | kbolino wrote:
               | C++ is no excuse; it has value types and operator
               | overloading. You can write your own types and define your
               | own behavior, or use those already provided by others.
               | Even if you insist on using raw ints (or just want a
               | safety net), there's compiler flags to define that
               | undefined behavior.
               | 
               | Putting everything into floats as integers defeats the
               | purpose of using floats. Obviously you will want some
               | fractions at some point and then you will have to deal
               | with that issue, and the denominator of those fractions
               | being a power of 2 and not a power of 10. Approximation
               | is good enough for some things, but not others. Accounts
               | and ledgers are definitely in the latter category, even
               | if lots of other financial math isn't.
               | 
               | You need always be mindful of your operating precision
               | and scale. Even double-precision floats have finite
               | precision, though this won't be a huge issue until you've
               | compounded the results of many operations. If you use
               | fixed-point and have different denominators all over the
               | place, then it's probably time to break out rational
               | numbers or use the type system to your advantage. You
               | will know the precision and scale of types called
               | BasisPoints or PartsPerMillion or Fixed6 because it's in
               | the name and is automatically handled as part of the
               | operations between types.
        
               | bobmcnamara wrote:
               | > if your requirement is to be able to represent values
               | exactly to the penny, then you can simply do the same
               | thing but using a floating point to represent cents and
               | represent $1.10 as the floating point 110.0.
               | 
               | Not if you need to represent more than about 170 kilo
               | dollars.
        
           | myst wrote:
           | Every half-competent software engineer knows about fixed
           | point arithmetic, my friend.
        
             | phkahler wrote:
             | >> Every half-competent software engineer...
             | 
             | You meant 8192/16384 right? I like q14.
        
           | EGreg wrote:
           | Smart contracts on EVM and other blockchains all use fixed
           | point, for the simple reason that all machines have to get
           | exactly the same result.
        
         | andrewla wrote:
         | I recall playing with FRACTINT, which was a fractal generator
         | that existed before floating point coprocessors were common,
         | that used fixed point math to calculate and display fractals.
         | That was back when fractals were super cool and everyone wanted
         | to be in the business of fractals, and all the Nobel Prizes
         | were given out to fractal researchers.
        
         | kragen wrote:
         | Sure, FRACTINT is called FRACTINT because it uses fixed-point
         | ("integer") math. And fixed-point math is still standard in
         | Forth; you can do your example in GForth like this:
          | 
          |     : organize; gforth
          |     Gforth 0.7.3, Copyright (C) 1995-2008 Free Software
          |     Foundation, Inc.
          |     Gforth comes with ABSOLUTELY NO WARRANTY; for details
          |     type `license'
          |     Type `bye' to exit
          |     : %* d>s 10 m*/ ;  : %. <# # [char] . hold #s #> type ;  ok
          |     1.6 4.1 %* %. 6.5 ok
         | 
         | Note that the correct answer is 6.56, so the result 6.5 is
         | incorrectly rounded. Here's how this works.
         | 
         | (If you're not familiar with Forth, Forth's syntax is that
         | words are separated by spaces. "ok" is the prompt, ":" defines
         | a subroutine terminated with ";", and you use RPN, passing
         | parameters and receiving results on a stack.)
         | 
         | In standard Forth, putting a decimal point in a number makes it
         | a double-precision number, occupying two cells on the stack,
         | and in most Forths the number of digits after the decimal point
         | is stored (until the next number) in the non-standardized
         | variable _dpl_ , decimal point location. Here I've just decided
         | that all my numbers are going to have one decimal place. This
         | means that after a multiplication I need to divide by 10, so I
         | define a subroutine called %* to do this operation. (Addition
         | and subtraction can use the standard d+ and d- subroutines; I
         | didn't implement division, but it would need to pre-multiply
         | the dividend by the scale factor 10.)
         | 
         | "%*" is defined in terms of the standard subroutine m*/, which
         | multiplies a double-precision number by a single-precision
         | number and divides the result by a divisor, and the standard
         | subroutine d>s, which converts a double-precision number to a
         | single-precision number. (There's probably a better way to do
         | %*. I'm no Forth expert.)
         | 
         | I also need to define a way to print out such numbers, so I
         | define a subroutine called "%.", using Forth's so-called
         | "pictured numeric output", which prints out an unsigned double-
         | precision number inserting a decimal point in the right place
         | with "hold", after printing out the least significant digit.
         | (In PNO we write the format backwards, starting from the least
         | significant digit.) The call to "type" types out the formatted
         | number from the hold space used by PNO.
         | 
         | Then I invoked %* on 1.6 and 4.1 and %. on its result, and it
         | printed out 6.5 before giving me the "ok" prompt.
         | 
         | If you want to adapt this to use two decimal places:
          | 
          |     : %* d>s 100 m*/ ;  : %. <# # # [char] . hold #s #> type ;
          |     redefined %*  redefined %.   ok
          |     1.60 4.10 %* %. 6.56 ok
         | 
         | Note, however, that a fixed-point multiplication still involves
         | a _multiplication_ , requiring potentially many additions, not
         | just an addition. The paper, which I haven't read yet, is about
         | how to approximate a _floating-point_ multiplication by using
         | an addition, presumably because in multiplication you add the
         | mantissas, or maybe using a table of logarithms.
         | 
         | Forth's approach to decimal numbers was a clever hack for the
         | 01970s and 01980s on sub-MIPS machines with 8-bit and 16-bit
         | ALUs, where you didn't want to be invoking 32-bit arithmetic
         | casually, and you didn't have floating-point hardware. Probably
         | on 32-bit machines it was already the wrong approach (a double-
         | precision number on a 32-bit Forth is 64 bits, which is about
         | 19 decimal digits) and clearly it is on 64-bit machines, where
         | you don't even get out of the first 64-bit word until that many
          | digits:
          | 
          |     0 1 %. 184467440737095516.16 ok
         | 
         | GForth and other modern standard Forths do support floating-
         | point, but for backward compatibility, they treat input with
         | decimal points as double-precision integers.
        
         | touisteur wrote:
         | Ozaki has been doing fp64 matrix-multiplication using int8
         | tensor cores
         | 
         | https://arxiv.org/html/2306.11975v4
         | 
         | Interesting AF.
        
       | ein0p wrote:
       | More than 10x the amount of energy is spent moving bytes around.
       | Compute efficiency is not as big of an issue as people think.
       | It's just that the compute is in the wrong place now - it needs
       | to be right next to memory cells, bypassing the memory bus, at
       | least in the initial aggregations that go into dot products.
        
         | entropicdrifter wrote:
         | This could still be useful for battery constrained devices,
         | right?
        
           | ein0p wrote:
           | It's even worse in battery constrained devices - they tend to
           | also be memory constrained and run with batch size 1 during
           | extend. IOW the entire model (or parts thereof, if the model
            | is MoE) gets read for every generated token. Utilization of
            | compute is truly abysmal in that case and almost all energy
            | is spent pushing bytes through the memory bus, which on
            | battery powered devices doesn't have high throughput.
        
       | presspot wrote:
       | From my experience, the absolute magicians in fixed point math
       | were the 8-bit and 16-bit video game designers. I was in awe of
       | the optimizations they did. They made it possible to calculate 3D
       | matrix maths in real time, for example, in order to make the
       | first flight simulators and first person shooter games.
        
         | hinkley wrote:
         | Redefining degrees to be 2pi = 256 was a pretty clever trick.
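          | 
          | Roughly like this (a sketch; scaling the output to +-127 is
          | just one common choice):
          | 
          |     import math
          | 
          |     # 256 units per turn: angles fit in a byte, wrap for
          |     # free, and sine becomes a 256-entry table.
          |     SIN = [round(math.sin(2 * math.pi * i / 256) * 127)
          |            for i in range(256)]
          | 
          |     def bsin(a):
          |         return SIN[a & 0xFF]        # & 0xFF wraps around
          | 
          |     def bcos(a):
          |         return SIN[(a + 64) & 0xFF] # 64 = quarter turn
          | 
          |     print(bsin(64), bcos(0))   # both 127 (~1.0 scaled)
          |     print(bsin(300), bsin(44)) # equal: 300 wraps to 44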
        
       | m3kw9 wrote:
       | So instead of say 2x3 you go 2+2+2?
        
       ___________________________________________________________________
       (page generated 2024-10-09 23:00 UTC)