https://spectrum.ieee.org/deep-learning-computational-cost

Deep Learning's Diminishing Returns

The cost of improvement is becoming unsustainable

By Neil C. Thompson, Kristjan Greenewald, Keeheon Lee, and Gabriel F. Manso
Illustration: a robot arm being pushed down by a giant dollar sign. Eddie Guy

Deep learning is now being used to translate between languages, predict how proteins fold, analyze medical scans, and play games as complex as Go, to name just a few applications of a technique that is now becoming pervasive. Success in those and other realms has brought this machine-learning technique from obscurity in the early 2000s to dominance today.

Although deep learning's rise to fame is relatively recent, its origins are not. In 1958, back when mainframe computers filled rooms and ran on vacuum tubes, knowledge of the interconnections between neurons in the brain inspired Frank Rosenblatt at Cornell to design the first artificial neural network, which he presciently described as a "pattern-recognizing device." But Rosenblatt's ambitions outpaced the capabilities of his era--and he knew it. Even his inaugural paper was forced to acknowledge the voracious appetite of neural networks for computational power, bemoaning that "as the number of connections in the network increases...the burden on a conventional digital computer soon becomes excessive."

This article is part of our special report on AI, "The Great AI Reckoning."

Fortunately for such artificial neural networks--later rechristened "deep learning" when they included extra layers of neurons--decades of Moore's Law and other improvements in computer hardware yielded a roughly 10-million-fold increase in the number of computations that a computer could do in a second. So when researchers returned to deep learning in the late 2000s, they wielded tools equal to the challenge.

These more-powerful computers made it possible to construct networks with vastly more connections and neurons and hence greater ability to model complex phenomena. Researchers used that ability to break record after record as they applied deep learning to new tasks.

While deep learning's rise may have been meteoric, its future may be bumpy. Like Rosenblatt before them, today's deep-learning researchers are nearing the frontier of what their tools can achieve. To understand why this will reshape machine learning, you must first understand why deep learning has been so successful and what it costs to keep it that way.

Deep learning is a modern incarnation of the long-running trend in artificial intelligence that has been moving from streamlined systems based on expert knowledge toward flexible statistical models. Early AI systems were rule based, applying logic and expert knowledge to derive results. Later systems incorporated learning to set their adjustable parameters, but these were usually few in number.

Today's neural networks also learn parameter values, but those parameters are part of such flexible computer models that--if they are big enough--they become universal function approximators, meaning they can fit any type of data. This unlimited flexibility is the reason why deep learning can be applied to so many different domains.

The flexibility of neural networks comes from taking the many inputs to the model and having the network combine them in myriad ways. This means the outputs won't be the result of applying simple formulas but instead immensely complicated ones.
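To get a feel for that flexibility, consider a toy network with just a handful of units, sketched below in Python. This is a hypothetical illustration, not code from any system discussed in this article; it simply shows how even a tiny network turns every output into a nested, nonlinear mixture of every input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy network: 3 inputs -> 16 hidden units -> 1 output.
W1, b1 = rng.normal(size=(16, 3)), rng.normal(size=16)
W2, b2 = rng.normal(size=(1, 16)), rng.normal(size=1)

def tiny_net(x):
    hidden = np.tanh(W1 @ x + b1)  # every hidden unit blends all of the inputs
    return W2 @ hidden + b2        # the output blends all of the hidden units

print(tiny_net(np.array([0.2, -1.0, 0.5])))
```

Real deep-learning systems do the same thing, only with millions or billions of parameters rather than a few dozen.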
For example, when the cutting-edge image-recognition system Noisy Student converts the pixel values of an image into probabilities for what the object in that image is, it does so using a network with 480 million parameters. The training to ascertain the values of such a large number of parameters is even more remarkable because it was done with only 1.2 million labeled images--which may understandably confuse those of us who remember from high school algebra that we are supposed to have more equations than unknowns. Breaking that rule turns out to be the key.

Deep-learning models are overparameterized, which is to say they have more parameters than there are data points available for training. Classically, this would lead to overfitting, where the model not only learns general trends but also the random vagaries of the data it was trained on. Deep learning avoids this trap by initializing the parameters randomly and then iteratively adjusting sets of them to better fit the data using a method called stochastic gradient descent. Surprisingly, this procedure has been proven to ensure that the learned model generalizes well.

The success of flexible deep-learning models can be seen in machine translation. For decades, software has been used to translate text from one language to another. Early approaches to this problem used rules designed by grammar experts. But as more textual data became available in specific languages, statistical approaches--ones that go by such esoteric names as maximum entropy, hidden Markov models, and conditional random fields--could be applied. Initially, the approaches that worked best for each language differed based on data availability and grammatical properties. For example, rule-based approaches to translating languages such as Urdu, Arabic, and Malay outperformed statistical ones--at first. Today, all these approaches have been outpaced by deep learning, which has proven itself superior almost everywhere it's applied.

So the good news is that deep learning provides enormous flexibility. The bad news is that this flexibility comes at an enormous computational cost. This unfortunate reality has two parts.

Charts: projected error rate on ImageNet (top) and computations, in billions of floating-point operations (bottom). Extrapolating the gains of recent years might suggest that by 2025 the error level in the best deep-learning systems designed for recognizing objects in the ImageNet data set should be reduced to just 5 percent [top]. But the computing resources and energy required to train such a future system would be enormous, leading to the emission of as much carbon dioxide as New York City generates in one month [bottom]. SOURCE: N.C. THOMPSON, K. GREENEWALD, K. LEE, G.F. MANSO

The first part is true of all statistical models: To improve performance by a factor of k, at least k^2 more data points must be used to train the model. The second part of the computational cost comes explicitly from overparameterization. Once accounted for, this yields a total computational cost for improvement of at least k^4. That little 4 in the exponent is very expensive: A 10-fold improvement, for example, would require at least a 10,000-fold increase in computation.
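Spelled out in code, those lower bounds look like this. The sketch below is purely illustrative arithmetic; the exponents (k^2 for data, k^4 for total computation) are the ones from the analysis just described, and the printed numbers are lower bounds, not measurements.

```python
# Lower-bound scaling for improving a flexible, overparameterized model
# by a factor k: data grows at least as k^2, computation at least as k^4.
def min_data_factor(k: float) -> float:
    return k ** 2

def min_compute_factor(k: float) -> float:
    return k ** 4

for k in (2, 3, 10):
    print(f"{k}-fold improvement: >= {min_data_factor(k):,.0f}x data, "
          f">= {min_compute_factor(k):,.0f}x computation")
# A 10-fold improvement needs at least 100x the data and 10,000x the computation.
```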
To make the flexibility-computation trade-off more vivid, consider a scenario where you are trying to predict whether a patient's X-ray reveals cancer. Suppose further that the true answer can be found if you measure 100 details in the X-ray (often called variables or features). The challenge is that we don't know ahead of time which variables are important, and there could be a very large pool of candidate variables to consider.

The expert-system approach to this problem would be to have people who are knowledgeable in radiology and oncology specify the variables they think are important, allowing the system to examine only those. The flexible-system approach is to test as many of the variables as possible and let the system figure out on its own which are important, requiring more data and incurring much higher computational costs in the process.

Models for which experts have established the relevant variables are able to learn quickly what values work best for those variables, doing so with limited amounts of computation--which is why they were so popular early on. But their ability to learn stalls if an expert hasn't correctly specified all the variables that should be included in the model. In contrast, flexible models like deep learning are less efficient, taking vastly more computation to match the performance of expert models. But, with enough computation (and data), flexible models can outperform ones for which experts have attempted to specify the relevant variables.

Clearly, you can get improved performance from deep learning if you use more computing power to build bigger models and train them with more data. But how expensive will this computational burden become? Will costs become sufficiently high that they hinder progress?

To answer these questions in a concrete way, we recently gathered data from more than 1,000 research papers on deep learning, spanning the areas of image classification, object detection, question answering, named-entity recognition, and machine translation. Here, we will only discuss image classification in detail, but the lessons apply broadly.

Over the years, reducing image-classification errors has come with an enormous expansion in computational burden. For example, in 2012 AlexNet, the model that first showed the power of training deep-learning systems on graphics processing units (GPUs), was trained for five to six days using two GPUs. By 2018, another model, NASNet-A, had cut the error rate of AlexNet in half, but it used more than 1,000 times as much computing to achieve this.

Our analysis of this phenomenon also allowed us to compare what's actually happened with theoretical expectations. Theory tells us that computing needs to scale with at least the fourth power of the improvement in performance. In practice, the actual requirements have scaled with at least the ninth power. This ninth power means that to halve the error rate, you can expect to need more than 500 times the computational resources. That's a devastatingly high price. There may be a silver lining here, however. The gap between what's happened in practice and what theory predicts might mean that there are still undiscovered algorithmic improvements that could greatly improve the efficiency of deep learning.

As we noted, Moore's Law and other hardware advances have provided massive increases in chip performance. Does this mean that the escalation in computing requirements doesn't matter? Unfortunately, no. Of the 1,000-fold difference in the computing used by AlexNet and NASNet-A, only a six-fold improvement came from better hardware; the rest came from using more processors or running them longer, incurring higher costs.
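Those two exponents (a fourth power in theory, roughly a ninth power in practice) can be sanity-checked with a few lines of Python. The sketch below uses only the figures quoted in this article, AlexNet versus NASNet-A, and is a rough illustration, not a reproduction of the full analysis of 1,000-plus papers described above.

```python
import math

# AlexNet -> NASNet-A: error rate halved (improvement factor k = 2)
# while computation grew by more than 1,000x.
k, compute_growth = 2, 1_000
implied_exponent = math.log(compute_growth) / math.log(k)
print(f"implied scaling exponent: ~{implied_exponent:.1f}")  # ~10, consistent with "at least the ninth power"

# What the two scaling laws predict for halving the error rate:
print(f"theory   (k^4): {k**4:,}x more computation")  # 16x
print(f"practice (k^9): {k**9:,}x more computation")  # 512x, i.e. "more than 500 times"
```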
Having estimated the computational cost-performance curve for image recognition, we can use it to estimate how much computation would be needed to reach even more impressive performance benchmarks in the future. For example, achieving a 5 percent error rate would require 10^19 billion floating-point operations.

Important work by scholars at the University of Massachusetts Amherst allows us to understand the economic cost and carbon emissions implied by this computational burden. The answers are grim: Training such a model would cost US $100 billion and would produce as much carbon emissions as New York City does in a month. And if we estimate the computational burden of a 1 percent error rate, the results are considerably worse.

Is extrapolating out so many orders of magnitude a reasonable thing to do? Yes and no. Certainly, it is important to understand that the predictions aren't precise, although with such eye-watering results, they don't need to be to convey the overall message of unsustainability. Extrapolating this way would be unreasonable if we assumed that researchers would follow this trajectory all the way to such an extreme outcome. We don't. Faced with skyrocketing costs, researchers will either have to come up with more efficient ways to solve these problems, or they will abandon working on these problems and progress will languish.

On the other hand, extrapolating our results is not only reasonable but also important, because it conveys the magnitude of the challenge ahead. The leading edge of this problem is already becoming apparent. When Google subsidiary DeepMind trained its system to play Go, it was estimated to have cost $35 million. When DeepMind's researchers designed a system to play the StarCraft II video game, they purposefully didn't try multiple ways of architecting an important component, because the training cost would have been too high.

At OpenAI, an important machine-learning think tank, researchers recently designed and trained a much-lauded deep-learning language system called GPT-3 at the cost of more than $4 million. Even though they made a mistake when they implemented the system, they didn't fix it, explaining simply in a supplement to their scholarly publication that "due to the cost of training, it wasn't feasible to retrain the model."

Even businesses outside the tech industry are now starting to shy away from the computational expense of deep learning. A large European supermarket chain recently abandoned a deep-learning-based system that markedly improved its ability to predict which products would be purchased. The company executives dropped that attempt because they judged that the cost of training and running the system would be too high.

Faced with rising economic and environmental costs, the deep-learning community will need to find ways to increase performance without causing computing demands to go through the roof. If they don't, progress will stagnate. But don't despair yet: Plenty is being done to address this challenge.

One strategy is to use processors designed specifically to be efficient for deep-learning calculations. This approach was widely used over the last decade, as CPUs gave way to GPUs and, in some cases, field-programmable gate arrays and application-specific ICs (including Google's Tensor Processing Unit). Fundamentally, all of these approaches sacrifice the generality of the computing platform for the efficiency of increased specialization. But such specialization faces diminishing returns.
So longer-term gains will require adopting wholly different hardware frameworks--perhaps hardware that is based on analog, neuromorphic, optical, or quantum systems. Thus far, however, these wholly different hardware frameworks have yet to have much impact.

Another approach to reducing the computational burden focuses on generating neural networks that, when implemented, are smaller. This tactic lowers the cost each time you use them, but it often increases the training cost (what we've described so far in this article). Which of these costs matters most depends on the situation. For a widely used model, running costs are the biggest component of the total sum invested. For other models--for example, those that frequently need to be retrained--training costs may dominate. In either case, the total cost must be larger than just the training on its own. So if the training costs are too high, as we've shown, then the total costs will be, too.

And that's the challenge with the various tactics that have been used to make implementation smaller: They don't reduce training costs enough. For example, one allows for training a large network but penalizes complexity during training. Another involves training a large network and then "pruning" away unimportant connections. Yet another finds as efficient an architecture as possible by optimizing across many models--something called neural-architecture search. While each of these techniques can offer significant benefits for implementation, the effects on training are muted--certainly not enough to address the concerns we see in our data. And in many cases they make the training costs higher.

One up-and-coming technique that could reduce training costs goes by the name meta-learning. The idea is that the system learns on a variety of data and then can be applied in many areas. For example, rather than building separate systems to recognize dogs in images, cats in images, and cars in images, a single system could be trained on all of them and used multiple times.

Unfortunately, recent work by Andrei Barbu of MIT has revealed how hard meta-learning can be. He and his coauthors showed that even small differences between the original data and where you want to use it can severely degrade performance. They demonstrated that current image-recognition systems depend heavily on things like whether the object is photographed at a particular angle or in a particular pose. So even the simple task of recognizing the same objects in different poses causes the accuracy of the system to be nearly halved.

Benjamin Recht of the University of California, Berkeley, and others made this point even more starkly, showing that even with novel data sets purposely constructed to mimic the original training data, performance drops by more than 10 percent. If even small changes in data cause large performance drops, the data needed for a comprehensive meta-learning system might be enormous. So the great promise of meta-learning remains far from being realized.
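To see how sensitive a learned model can be to even modest shifts in its data, here is a toy, self-contained illustration in Python using scikit-learn. It has nothing to do with the specific image benchmarks those researchers studied; it simply trains a simple classifier on synthetic 2D data and then evaluates it on data whose distribution has been nudged, which is enough to noticeably degrade accuracy.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_blobs(n, shift=0.0):
    """Two Gaussian classes; `shift` nudges the data away from the training distribution."""
    x0 = rng.normal(loc=[0.0 + shift, 0.0], scale=1.0, size=(n, 2))
    x1 = rng.normal(loc=[2.0 + shift, 2.0], scale=1.0, size=(n, 2))
    X = np.vstack([x0, x1])
    y = np.array([0] * n + [1] * n)
    return X, y

X_train, y_train = make_blobs(2000)
clf = LogisticRegression().fit(X_train, y_train)

for shift in (0.0, 0.75, 1.5):
    X_test, y_test = make_blobs(2000, shift=shift)
    print(f"shift={shift:.2f}  accuracy={clf.score(X_test, y_test):.3f}")
# Accuracy degrades as the test data drifts away from the training data,
# even though the underlying task has not changed.
```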
Another possible strategy to evade the computational limits of deep learning would be to move to other, perhaps as-yet-undiscovered or underappreciated types of machine learning. As we described, machine-learning systems constructed around the insight of experts can be much more computationally efficient, but their performance can't reach the same heights as deep-learning systems if those experts cannot distinguish all the contributing factors. Neuro-symbolic methods and other techniques are being developed to combine the power of expert knowledge and reasoning with the flexibility often found in neural networks.

Like the situation that Rosenblatt faced at the dawn of neural networks, deep learning is today becoming constrained by the available computational tools. Faced with computational scaling that would be economically and environmentally ruinous, we must either adapt how we do deep learning or face a future of much slower progress. Clearly, adaptation is preferable. A clever breakthrough might find a way to make deep learning more efficient or computer hardware more powerful, which would allow us to continue to use these extraordinarily flexible models. If not, the pendulum will likely swing back toward relying more on experts to identify what needs to be learned.

Special Report: The Great AI Reckoning

Read next: "How the U.S. Army Is Turning Robots Into Team Players," or see the full report for more articles on the future of AI.

Neil C. Thompson is a research scientist at MIT's Computer Science and Artificial Intelligence Laboratory. Kristjan Greenewald is a member of the MIT-IBM Watson AI Lab research staff. Keeheon Lee is an assistant professor at Yonsei University, in Seoul. Gabriel F. Manso is a student at the University of Brasilia.

Making 3D-Printed Objects Feel

3D-printing technique lets objects sense forces applied onto them for new interactive applications

By Charles Q. Choi

Charles Q. Choi is a science reporter who contributes regularly to IEEE Spectrum. He has written for Scientific American, The New York Times, Wired, and Science, among others.

Image: Researchers from MIT have developed a method to integrate sensing capabilities into 3D printable structures comprised of repetitive cells, which enables designers to rapidly prototype interactive input devices. MIT
Some varieties of 3D-printed objects can now "feel," using a new technique that builds sensors directly into their materials. This research could lead to novel interactive devices such as intelligent furniture, a new study finds.

The new technique 3D-prints objects made from metamaterials--substances made of grids of repeating cells. When force is applied to a flexible metamaterial, some of its cells may stretch or compress. Electrodes incorporated within these structures can detect the magnitude and direction of these changes in shape, as well as rotation and acceleration.

In the new study, researchers manufactured objects made of flexible plastic and electrically conductive filaments. These had cells as small as 5 millimeters wide. Each cell had two opposing walls made of conductive filament and nonconductive plastic, with the conductive walls serving as electrodes. Forces applied onto the objects change the distance and overlapping area between the opposing electrodes, generating electric signals that reveal details about the applied forces.

In this manner, this new technique can "seamlessly and unobtrusively integrate sensing into the printed objects," says study co-author Jun Gong, a research scientist at Apple.

The researchers suggest these metamaterials could help designers quickly create and tweak flexible input devices for a computer. For instance, they created a music controller using these metamaterials that was designed to conform to a person's hand. When a user squeezes one of the flexible buttons, the resulting electric signals help control a digital synthesizer.

Video: This flexible input device has been 3D printed in one piece with copper-colored sensing electrodes integrated into its structure. MIT

The scientists also fabricated a metamaterial joystick to play a game of Pac-Man. By understanding how people apply forces onto this joystick, a designer could prototype unique handle shapes and sizes for people with limited grip strength in certain directions.

"We can sense movement in any 3D-printed object," says study co-author Cedric Honnet, an embedded systems engineer at MIT. "From musical to game interfaces, the potential is really exciting."

The researchers also created 3D editing software, known as MetaSense, to help users build interactive devices using these metamaterials. It simulates how 3D-printed objects will deform when different forces are applied and calculates which cells change the most and are the best to use for electrodes.

"MetaSense allows designers to 3D print structures with built-in sensing capability in one go. This allows for super quick prototyping of devices, such as joysticks, for example, that can be customized for individuals with different accessibility needs," says study co-author Olivia Seow, a creative machine learning engineer at MIT.

Embedding hundreds or thousands of sensor cells into an object could help enable high-resolution, real-time analysis of how users interact with it, Gong says. For instance, a smart chair made with such metamaterials could detect a user's body and then switch on the light or TV, or collect data for later analysis such as detecting and correcting body posture. These metamaterials may also find use in wearable applications, Honnet says.
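The sensing principle described above, conductive walls whose separation and overlap change under force, can be approximated with a simple parallel-plate capacitor model. The Python sketch below is a back-of-the-envelope illustration under that assumption; the numbers are made up for illustration, and the actual MetaSense cell geometry and readout electronics may work differently.

```python
# Rough parallel-plate model of one metamaterial cell: two conductive walls
# act as electrodes, and squeezing the cell changes their separation,
# which changes the capacitance the readout electronics would measure.
EPSILON_0 = 8.854e-12  # vacuum permittivity, F/m

def cell_capacitance(overlap_area_m2: float, gap_m: float, relative_permittivity: float = 3.0) -> float:
    """Capacitance of one cell, treated as an ideal parallel-plate capacitor."""
    return relative_permittivity * EPSILON_0 * overlap_area_m2 / gap_m

# A 5 mm x 5 mm wall pair with a 2 mm gap, before and after a squeeze
# that narrows the gap by 20 percent (illustrative values only).
c_rest     = cell_capacitance(overlap_area_m2=25e-6, gap_m=2.0e-3)
c_squeezed = cell_capacitance(overlap_area_m2=25e-6, gap_m=1.6e-3)
print(f"at rest:  {c_rest * 1e12:.3f} pF")
print(f"squeezed: {c_squeezed * 1e12:.3f} pF ({c_squeezed / c_rest - 1:+.0%} change)")
```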
The scientists will detail their findings in October at the Association for Computing Machinery Symposium on User Interface Software and Technology.

Benchmark Shows AIs Are Getting Speedier

MLPerf stats show some systems have doubled performance this year, competing benchmark coming

By Samuel K. Moore

Samuel K. Moore is the senior editor at IEEE Spectrum in charge of semiconductors coverage. An IEEE member, he has a bachelor's degree in biomedical engineering from Brown University and a master's degree in journalism from New York University.

This week, AI industry group MLCommons released a new set of results for AI performance. The new list, MLPerf Version 1.1, follows the first official set of benchmarks by five months and includes more than 1,800 results from 20 organizations, with 350 measurements of energy efficiency. The majority of systems improved by between 5 and 30 percent from earlier this year, with some more than doubling their previous performance stats, according to MLCommons. The new results come on the heels of the announcement, last week, of a new machine-learning benchmark, called TPCx-AI.

In MLPerf's inferencing benchmarks, systems made up of combinations of CPUs and GPUs or other accelerator chips are tested on up to six neural networks performing a variety of common functions--image classification, object detection, speech recognition, 3D medical imaging, natural language processing, and recommendation. Commercially available datacenter-based systems were tested under two conditions: a simulation of real datacenter activity, where queries arrive in bursts, and "offline" activity, where all the data is available at once. Computers meant to work onsite instead of in the data center--what MLPerf calls the edge--were measured in the offline state and as if they were receiving a single stream of data, such as from a security camera.

Although there were datacenter-class submissions from Dell, HPE, Inspur, Intel, LTech Korea, Lenovo, Nvidia, Neuchips, Qualcomm, and others, all but those from Qualcomm and Neuchips used Nvidia AI accelerator chips. Intel used no accelerator chip at all, instead demonstrating the performance of its CPUs alone. Neuchips only participated in the recommendation benchmark, as their accelerator, the RecAccel, is designed specifically to speed up recommender systems--which are used for recommending e-commerce items and for ranking search results.

Chart: MLPerf tests six common AIs under several conditions. NVIDIA

For the results Nvidia submitted itself, the company used software improvements alone to eke out as much as a 50 percent performance improvement over the past year. The systems tested were usually made up of one or two CPUs along with as many as eight accelerators. On a per-accelerator basis, systems with Nvidia A100 accelerators showed about double or more the performance of those using the lower-power Nvidia A30.
A30-based computers edged out systems based on Qualcomm's Cloud AI 100 in four of six tests in the server scenario. However, Qualcomm senior director of product management John Kehrli points out that his company's accelerators were deliberately limited to a datacenter-friendly 75-watt power envelope per chip, but in the offline image recognition task they still managed to speed past some Nvidia A100-based computers with accelerators that had peak thermal designs of 400 W each.

Nvidia senior product manager for AI inferencing Dave Salvator pointed to two other outcomes for the company's accelerators: First, for the first time Nvidia A100 accelerators were paired with server-class Arm CPUs instead of x86 CPUs. The results were nearly identical between Arm and x86 systems across all six benchmarks. "That's an important milestone for Arm," says Salvator. "It's also a statement about the readiness of our software stack to be able to run the Arm architecture in a datacenter environment."

Chart: Comparing MLPerf 0.7 to MLPerf 1.1 on the Nvidia A100; speedups over v0.7 submissions range from 101 to 150 percent. Nvidia has made gains in AI using only software improvements. NVIDIA

Separately from the formal MLPerf benchmarks, Nvidia showed off a software technique called Multi-Instance GPU (MIG), which allows a single GPU to act as if it's seven separate chips from the point of view of software. When the company ran all six benchmarks simultaneously plus an extra instance of object detection (just as a flex, I assume), the results were 95 percent of the single-instance value.

Nvidia A100-based systems also cleaned up in the edge server category, where systems are designed for places like stores and offices. These computers were tested on most of the same six benchmarks but with the recommender system swapped out for a low-res version of object detection. But in this category, there was a wider range of accelerators on offer, including Centaur's AI Integrated Coprocessor, Qualcomm's AI 100, Edgecortix's DNA-F200 v2, Nvidia's Jetson Xavier, and FuriosaAI's Warboy.

Chart: Inference power efficiency. Qualcomm's Cloud AI 100 PCIe scores 197.40, well above the others, which range from 48.22 to 112.03. Qualcomm topped the efficiency ranking for a machine vision test. Qualcomm

With six tests under two conditions each in two commercial categories, using systems that vary in the number of CPUs and accelerators, MLPerf performance results don't really lend themselves to the kind of simple ordered list that Top500.org achieves with supercomputing. The parts that come closest are the efficiency tests, which can be boiled down to inferences per second per watt for the offline component. Qualcomm systems were tested for efficiency on object recognition, object detection, and natural language processing in both the datacenter and edge categories. In terms of inferences per second per watt, they beat the Nvidia-backed systems at the machine vision tests, but not on language processing. Nvidia-accelerated systems took all the rest of the spots.

In seeming opposition to MLPerf's multidimensional nature, a new benchmark was introduced last week that aims for a single number.
The Transaction Processing Performance Council says the TPCx-AI benchmark:

* Generates and processes large volumes of data
* Trains preprocessed data to produce realistic machine learning models
* Conducts accurate insights for real-world customer scenarios based on the generated models
* Can scale to large distributed configurations
* Allows for flexibility in configuration changes to meet the demands of the dynamic AI landscape

The benchmark is meant to capture the complete end-to-end process of machine learning and AI, explains Hamesh Patel, chair of the TPCx-AI committee and principal engineer at Intel. That includes parts of the process that aren't included in MLPerf, such as preparing the data and optimization. "There was no benchmark that emulates an entire data science pipeline," he says. "Customers have said it can take a week to prep [the data] and two days to train" a neural network.

Big differences between MLPerf and TPCx-AI include the latter's dependence on synthetic data--data that resembles real data but is generated on the fly. MLPerf uses sets of real data for both training and inference, and MLCommons executive director David Kanter was skeptical about the value of results from synthetic data.

Membership among MLCommons and TPC has a lot of overlap, so it remains to be seen which, if either, of the two benchmarks gains credibility over the other. MLPerf certainly has the advantage for the moment, and computer system makers are already being asked for MLPerf data as part of requests for proposals, at least two MLPerf participants report.
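To make the "entire data science pipeline" idea concrete, here is a minimal Python sketch of the kind of end-to-end flow TPCx-AI describes: generate synthetic data, preprocess it, train a model, and score new records. The stages, features, and numbers are purely illustrative and are not taken from the actual benchmark specification.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# 1. Generate synthetic data "on the fly" (here: made-up customer records).
n = 10_000
spend   = rng.gamma(shape=2.0, scale=50.0, size=n)   # past spend
visits  = rng.poisson(lam=5.0, size=n)               # visits per month
churned = (rng.random(n) < 1 / (1 + np.exp(0.02 * spend + 0.3 * visits - 3))).astype(int)
X = np.column_stack([spend, visits])

# 2. Preprocess: split and scale (the "week of prep" step, in miniature).
X_train, X_test, y_train, y_test = train_test_split(X, churned, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_train)

# 3. Train a model on the preprocessed data.
model = LogisticRegression().fit(scaler.transform(X_train), y_train)

# 4. Serve "insights": score held-out records, as a deployed system would.
print(f"held-out accuracy: {model.score(scaler.transform(X_test), y_test):.3f}")
```

The point of an end-to-end benchmark is that every one of these stages, not just step 3, counts toward the measured time.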