https://www.tomshardware.com/tech-industry/artificial-intelligence/ai-engineers-build-new-algorithm-for-ai-processing-replace-complex-floating-point-multiplication-with-integer-addition

AI engineers claim new algorithm reduces AI power consumption by 95% -- replaces complex floating-point multiplication with integer addition

By Jowi Morales, published 17 October 2024

Addition is simpler than multiplication, after all.

Engineers from BitEnergy AI, a firm specializing in AI inference technology, have developed a method of artificial intelligence processing that replaces floating-point multiplication (FPM) with integer addition. The new method, called Linear-Complexity Multiplication (L-Mul), comes close to the results of FPM using a far simpler algorithm, yet still maintains the high accuracy and precision FPM is known for. As TechXplore reports, the method could cut the power consumption of AI systems by up to 95%, which would make it a crucial development for our AI future.

Because this is a new process, popular and readily available hardware on the market, such as Nvidia's upcoming Blackwell GPUs, isn't designed to handle this algorithm. So even if BitEnergy AI's algorithm is confirmed to perform at the same level as FPM, we would still need systems that can run it. That might give a few AI companies pause, especially ones that have just invested millions, or even billions, of dollars in AI hardware. Nevertheless, a 95% reduction in power consumption would likely tempt even the biggest tech companies to jump ship, especially if AI chip makers build application-specific integrated circuits (ASICs) that take advantage of the algorithm.

Power is now the primary constraint on AI development: the data center GPUs sold last year alone consume more power in a year than one million homes. Even Google has put its climate targets in the back seat because of AI's power demands, with its greenhouse gas emissions rising 48% from 2019 instead of declining year-on-year as planned. The company's former CEO has even suggested dropping climate goals, opening the floodgates for power production, and using more advanced AI to solve the global warming problem. But if AI processing can be made far more power efficient, we may still get advanced AI technologies without sacrificing the planet. Beyond that, a 95% drop in energy use would also lighten the burden these massive data centers place on the national grid, reducing the pressure to rush new power plants into service.
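To make the core trick concrete, here is a minimal Python sketch of L-Mul as described in the team's arXiv paper (2410.00907, linked in the comments below). An exact floating-point multiply of x = (1 + x_m) * 2^(x_e) and y = (1 + y_m) * 2^(y_e) yields (1 + x_m + y_m + x_m * y_m) * 2^(x_e + y_e); L-Mul replaces the mantissa product x_m * y_m with a constant offset 2^(-l(m)), leaving only additions. The function name, the float-level emulation, and the minimal handling of signs and special values are illustrative assumptions, not BitEnergy AI's implementation, which would operate directly on integer bit patterns in hardware.

```python
import math

def l_mul(x: float, y: float, mantissa_bits: int = 3) -> float:
    """Approximate x * y per the L-Mul scheme sketched in arXiv:2410.00907:
    quantize the mantissas, then replace the mantissa product with a
    constant offset 2**-l(m), so no mantissa multiplication occurs.
    Float-level emulation for illustration; zeros handled, inf/NaN not."""
    if x == 0.0 or y == 0.0:
        return 0.0
    sign = math.copysign(1.0, x) * math.copysign(1.0, y)

    # Decompose |v| = (1 + m) * 2**e with m in [0, 1).
    # math.frexp returns |v| = f * 2**e with f in [0.5, 1).
    xf, xe = math.frexp(abs(x))
    xm, xe = xf * 2.0 - 1.0, xe - 1
    yf, ye = math.frexp(abs(y))
    ym, ye = yf * 2.0 - 1.0, ye - 1

    # Quantize both mantissas to the chosen bit width.
    scale = 2.0 ** mantissa_bits
    xm = math.floor(xm * scale) / scale
    ym = math.floor(ym * scale) / scale

    # Offset l(m) from the paper: m if m <= 3, 3 if m == 4, 4 if m > 4.
    m = mantissa_bits
    l = m if m <= 3 else (3 if m == 4 else 4)

    # Exact product: (1 + xm + ym + xm*ym) * 2**(xe + ye).
    # L-Mul swaps the xm*ym term for the constant 2**-l -- additions only.
    return sign * (1.0 + xm + ym + 2.0 ** -l) * 2.0 ** (xe + ye)

# The approximation tracks the true product closely:
print(l_mul(1.75, 2.5), 1.75 * 2.5)  # 4.25 vs 4.375
```

In hardware, the mantissa and exponent sums above would become integer additions over the operands' bit patterns, which is where the claimed energy saving comes from: an integer add costs a small fraction of the energy of a full floating-point multiply.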
While most of us are amazed by the extra power each new generation of AI chips brings, true advancement comes only when those processors grow more powerful and more efficient at the same time. So, if L-Mul works as advertised, humanity could have its AI cake and eat it, too.

Comments (10)

* yahrightthere: Seeing is believing -- where's the white paper on this? I could not find it. As for the load on the grid, there have been many reports of data centers inking deals to bring various nuclear sites back up and running, as well as adding new nuclear sites, including small modular reactors, which would add to the grid's infrastructure and reduce the load. It's understood that all this will take time, money, and effort from all facets to accomplish.

* ekio: If that can apply to ClosedAI, Meta, Google, and co., it would be a game changer -- but without proof, no beliefs.

* nitrium: "Potentially up to 95%" -- that's corporate speak for anywhere from 0% to 95%. The "up to" number is not something anyone cares about. What's the average saving for typical AI workloads?

* Mama Changa: They don't say at what level of precision. Is it fp4, fp16, etc.? Also, have they never heard of fixed-point math?

* Li Ken-un: "Work smarter, not harder." The operating cost of feeding the power-hungry algorithms should convince them, if the 95% reduction is true. What's the relatively fixed cost of investing in the hardware and nuclear power plants compared to the ongoing cost of feeding the less efficient algorithms?

* JTWrenn: Not sure if this is promising or just a flare fired in the hope of capital investment. The hedged wording and the apparent lack of a fully working product scream "please invest in us" to me.

* AkroZ: Here is the paper: https://arxiv.org/html/2410.00907v2 -- I have read it, and it's interesting, but it lists only the advantages and not the drawbacks; basically, this is a paper asking for investment. They demonstrate higher precision than FP8 with theoretically less costly operations, but their implementation is FP32, meaning it uses four times more memory, and they do not calculate the potential energy drain of those memory operations. This is not considered for inference but only for the execution of models (as memory is the main limiting factor), notably for AI processing units.
* bit_user:

  AkroZ said: "Here is the paper: https://arxiv.org/html/2410.00907v2"

  Thanks for this! @yahrightthere, take note!

  AkroZ said: "it lists only the advantages and not the drawbacks; basically, this is a paper asking for investment."

  They do list its limitations.

  AkroZ said: "their implementation is FP32, meaning it uses four times more memory, and they do not calculate the potential energy drain of those memory operations."

  They merely prototyped it on existing hardware -- Nvidia GPUs, to be precise -- and Nvidia doesn't support general arithmetic on data types of lower precision than that. From briefly skimming the paper, I think they're actually proposing to implement it at 16 bits, but they also work out the implementation cost at 8 bits.

  AkroZ said: "This is not considered for inference but only for the execution of models"

  "Inference" is the term for what I think you mean by "execution of models". Here's what the abstract says: "We further show that replacing all floating point multiplications with 3-bit mantissa L-Mul in a transformer model achieves equivalent precision as using float8_e4m3 as accumulation precision in both fine-tuning and inference." So they claim it's applicable to both inference and a subset of training work (i.e., fine-tuning).

* bit_user:

  Mama Changa said: "They don't say at what level of precision."

  The paper mostly focuses on comparing it against different fp8 number formats.

  Mama Changa said: "Also, have they never heard of fixed-point math?"

  What good would that do? The expensive part of fp multiplication is the mantissa product, and the mantissa is actually cheaper to multiply than a full fixed-point value, since it has fewer bits.

* ex_bubblehead:

  yahrightthere said: "Seeing is believing -- where's the white paper on this? [...]"

  Billions of $$ and decades to implement. I'm not holding my breath.
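A back-of-the-envelope illustration of bit_user's adder-versus-multiplier point above: an n-bit ripple-carry adder needs on the order of n full-adder cells, while an n-by-n array multiplier needs roughly n*(n-1) of them plus n^2 AND gates. These are generic textbook circuit estimates, not figures from the paper, and the bit widths below are just representative significand sizes.

```python
# Textbook circuit-size estimates (illustrative assumptions, not from
# the L-Mul paper): adders scale linearly in bit width, array
# multipliers roughly quadratically.

def adder_cells(n: int) -> int:
    return n                  # ripple-carry: one full adder per bit

def multiplier_cells(n: int) -> int:
    return n * (n - 1)        # ~n-1 rows of n-bit partial-product adds

for n in (4, 11, 24):         # fp8 e4m3, fp16, fp32 significand widths
    print(f"{n:2d}-bit: adder ~{adder_cells(n):3d} cells, "
          f"multiplier ~{multiplier_cells(n):4d} cells")
```

That quadratic growth in multiplier area (and energy) with mantissa width is what L-Mul aims to sidestep by replacing the mantissa product with additions.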