https://spectrum.ieee.org/on-beyond-moores-law-4-new-laws-of-computing

IEEE Spectrum | Computing | Analysis

Moore's Not Enough: 4 New Laws of Computing
Moore's and Metcalfe's conjectures are taught in classrooms every day--these four deserve consideration, too

By Adenekan Dedeke | 04 Feb 2022 | 8 min read

I teach technology and information-systems courses at Northeastern University, in Boston.
The two most popular laws that we teach there--and, one presumes, in most other academic departments that offer these subjects--are Moore's Law and Metcalfe's Law. Moore's Law, as everyone by now knows, predicts that the number of transistors on a chip will double every two years. One of the practical values of Intel cofounder Gordon Moore's legendary law is that it helps managers and professionals decide how long they should keep their computers. It also helps software developers anticipate, broadly speaking, how much bigger their software releases should be.

Metcalfe's Law is similar to Moore's Law in that it, too, enables one to predict the direction of growth for a phenomenon. Robert Metcalfe, co-inventor of Ethernet and a pioneering innovator in the early days of the Internet, postulated that the value of a network grows in proportion to the square of the number of its users. One limitation of this law is that a network's value is difficult to quantify. Furthermore, it is unclear that the value of every network actually grows quadratically. Nevertheless, this law, like Moore's Law, remains a centerpiece of both the IT industry and academic computer-science research. Both provide tremendous power to explain and predict the behavior of seemingly incomprehensible systems and phenomena in the sometimes inscrutable information-technology world.

I contend, moreover, that there are still other regularities in the field of computing that could be formulated in a fashion similar to Moore's and Metcalfe's relationships. I would like to propose four such laws.

Law 1.
Yule's Law of Complementarity

I named this law after George Udny Yule, the statistician who in 1912 proposed the seminal equation for describing the relationship between two attributes. I formulate the law as follows: If two attributes or products are complements, the value or demand of one complement will be inversely related to the price of the other. In other words, if the price of one complement is reduced, the demand for the other will increase.

There are a few historical examples of this law. One of the most famous is the marketing of razor blades. The legendary King Camp Gillette gained market domination by applying this rule: He reduced the price of the razors, and the demand for razor blades increased.

The history of IT contains numerous examples of this phenomenon, too. The case of the Atari 2600 is one notable example. Atari video games consisted of the console system hardware and the read-only memory cartridges that contained a game's software. When the product was released, Atari Inc. marketed three products: the Atari Video Computer System (VCS) hardware and the two games it had created, the arcade shooter Jet Fighter and Tank, a heavy-artillery combat title involving, not surprisingly, tanks. Crucially, Atari's engineers decided to use an off-the-shelf microprocessor for the VCS instead of a custom chip. They also made sure that any programmer hoping to create a new game for the VCS would be able to access and use all the inner workings of the system's hardware. And that was exactly what happened. In other words, the designers reduced the barriers and the cost necessary for other players to develop VCS game cartridges. More than 200 such games have since been developed for the VCS--helping to spawn today's sprawling US $170 billion global video game industry.

A similar law of complementarity exists with computer printers.
The more affordable a printer is kept, the higher the demand for that printer's ink cartridges. Managing complementary components well was also crucial to Apple's winning the MP3-player wars of the early 2000s with its now-iconic iPod. From a strategic point of view, technology firms ultimately need to know which complementary element of their product to sell at a low price--and which complement to sell at a higher price. And, as the economist Bharat Anand points out in his celebrated 2016 book The Content Trap, proprietary complements tend to be more profitable than nonproprietary ones.

Law 2. Hoff's Law of Scalability

This law is named after Marcian Edward (Ted) Hoff Jr.--the engineer who convinced the CEO of Intel to apply the law of scalability to the design and development of processors. Certainly, the phenomenon of scalability was well known in the automobile industry before it made a significant impact on computing. Henry Ford's company was perhaps the first to apply the law on a grand scale. Ford produced the Model T, the first mass-produced car. At the core of Ford's achievement was the design of an automobile made for mass production. Ford's engineers broke the assembly of the Model T down into 84 discrete steps. The company standardized all the tasks and assigned each worker to do just one, thus standardizing the work each worker performed as well. Ford further built machines that could stamp out parts automatically. Together with Ford's innovative development of the first moving assembly line, this production system cut the time to build a car from about 12 hours to roughly 1.5 hours. The Model T is probably the paradigmatic example of how standardization enables designing processes for scalability.
Intel also mastered the law of scalability early in its history. In 1969, Busicom, a Japanese company, approached Intel about building custom chips for use in its programmable calculators. Gordon Moore was not interested in a custom chip, because he knew that it would not be scalable. It was the quest to create a scalable product that led Intel's Ted Hoff to partition the design into a general-purpose logic processor chip and a separate read-only memory (ROM) chip that stored the application program. As Albert Yu recounts in his history of Intel, Creating the Digital Future, the fledgling semiconductor company's general-purpose processor, the 4004, was scalable and pretty much bequeathed to the world the hardware architecture of the modern computer. And it was Hoff who redesigned the 4004 to scale.

Hoff's Law of Scalability could thus be stated as follows: The potential for scalability of a technology product is inversely proportional to its degree of customization and directly proportional to its degree of standardization. In sum, the law predicts that a technology component or process with a high degree of customization and/or a low degree of standardization will be a poor candidate for scaling.

Law 3. Evans's Law of Modularity

This law derives its name from Bob Overton Evans, the engineer who in the early 1960s persuaded IBM's chairman, Thomas J. Watson Jr., to discontinue IBM's then-current design approach, which had produced a hodgepodge of incompatible computers. Evans advocated that IBM instead develop a family of modular computers that would share peripherals, instructions, and common interfaces. IBM's first product family under this new design rubric was called System/360.
Prior to this era, IBM and other mainframe manufacturers produced systems that were unique: Each had its own distinct operating system, processor, peripherals, and application software. After the purchase of a new IBM computer, customers had to rewrite all their existing code. Evans convinced Watson that a line of computers should instead be designed to share many of the same instructions and interfaces.

This new approach of modular design meant that IBM's engineers developed a common architecture (a specification of which functions and modules would be part of the system), common interfaces (a description of how the modules would interact, fit together, connect, and communicate), and common standards (a definition of the shared rules and methods used to achieve common functions and tasks). This bold move on Big Blue's part created a new family of computers that revolutionized the computer industry. Customers could now protect their investments, because the instructions, software, and peripherals were reusable and compatible within each computer family.

Evans's Law could be formulated as follows: The inflexibilities, incompatibilities, and rigidities of complex and/or monolithically structured technologies can be reduced by modularizing the technology's structures (and processes). This law predicts that applying modularization will reduce incompatibilities and complexities.

One further example of Evans's Law can be seen in the software industry's shift from the "waterfall" to the agile software-development methodology. The former is a linear, sequential model stipulating that each project phase can begin only after the previous phase has ended.
(The name comes from the fact that water flows in only one direction down a waterfall.) By contrast, the agile approach applies the law of modularization to software design and to the development process itself. Agile software projects tend to be more flexible, more responsive, and faster. In other words, modularizing software projects and the development process makes such endeavors more efficient. As outlined in a helpful 2016 Harvard Business Review article, the preconditions for an agile methodology are as follows: The problem to be solved is complex; the solutions are initially unknown, with product requirements evolving; the work can be modularized; and close collaboration with end users is feasible.

Law 4. The Law of Digitiplication

The term digitiplication combines two concepts: digitalization and multiplication. The law stems from my own study and observations of what happens when a resource is digitized or a process is digitalized. The law of digitiplication stipulates that whenever a resource or process is digitalized, its potential value grows in a multiplicative manner. For example, if a paper document is copied four times, one can share the resource with five people. But digitize the document and the value-creation opportunities become multiplicative rather than additive.

Consider the example of a retail store. The store's sales reps, tasked with selling physical products to individual people, can serve only one customer at a time. Put the same retail environment online, however, and many customers can view the store's products and services at once. Digital text can also easily be transformed into an audio format, providing a different kind of value to customers. Search functionality within the store's inventory of course adds another layer of value. The store's managers can also monitor how many customers are viewing the pages of the store's website, and for how long.
All of these enhancements to the customer's (and the retailer's) experience provide different kinds of value. As these examples show, the digitalization of a resource, asset, or process creates multiplicative rather than additive value. As a further example, Amazon founder Jeff Bezos first began digitizing data about books as a way to facilitate more and greater book sales online. Bezos quickly transformed Amazon into a digitiplication engine by making it a data-centric e-commerce company. The company now benefits from the multiplicative effects of digitalized processes and digitized information. Amazon's search, selection, and purchase functions also allow the company to record and produce data that can be leveraged to predict what a customer wants to buy--and thus to select which products it should show to customers. The digitization of customer feedback, seller ratings, and seller feedback creates its own dimension of multiplicative value.

Conclusion

These four laws can help engineers and designers pose useful questions as they begin to develop a product. For example: Do customer requirements lend themselves to a product design that could be scaled (or mass-produced)? Might the functional requirements be satisfied through a modular product design? Could Yule's Law of Complementarity provide cues toward mass-production or modular-design alternatives? Could product complements be developed in-house or outsourced? Software engineers might likewise be led toward productive questions about how data could be digitized, or how specific processes could be digitalized, to leverage the law of digitiplication.

The fields of IT and electrical engineering and computer science (EECS) have become critical disciplines of the digital age.
To pass along the most succinct and relevant formulations of accumulated knowledge to the next generation, it is incumbent on academics and thought leaders in these essential technical fields to translate lessons learned into more formalized sets of theorems and laws. Such formulations would, I hope, enable current and future generations of IT and EECS professionals to develop the most useful, relevant, impactful, and indeed sometimes even disruptive technologies. I hope that the four laws proposed in this article help trigger a larger discussion about the need for, and relevance of, new laws for our disciplines.

Adenekan (Nick) Dedeke is an Executive Professor of Supply Chain and Information Management at Northeastern University, Boston. His work has been published in IEEE Software, Computer, IEEE Security & Privacy, and other academic journals.

The Conversation (3)

Jeremy Chabot, 04 Feb 2022: I'm not sure the modding community would at all agree with the formulation of this 'law of scalability'. Extensible platforms definitely need careful standardization to be successful, but I would argue their entire premise is that they break this 'law of scalability'. The most famous example would be Minecraft; however, there are many other heavily modded platforms which are far more customizable than Minecraft, even down at a systems level--for example, Warcraft III, Civilization IV, and Starcraft 2, in that order.

Duncan Walker, 05 Feb 2022: Law 3 most overlaps with Dave Parnas' Information Hiding Principle, used throughout software development.
In System/360, it was used in the computer architecture and the hardware interfaces to hide the wide variation in system implementations.

Ashok Deobhakta, 10 Feb 2022: Nice learning!

The Future of Deep Learning Is Photonic

Computing with light could slash the energy needs of neural networks

By Ryan Hamerly | 29 Jun 2021 | 10 min read

[Figure: A computer rendering depicts the pattern on a photonic chip that the author and his colleagues have devised for performing neural-network calculations using light.]

Think of the many tasks to which computers are being applied that in the not-so-distant past required human intuition. Computers routinely identify objects in images, transcribe speech, translate between languages, diagnose medical conditions, play complex games, and drive cars. The technique that has empowered these stunning developments is called deep learning, a term that refers to mathematical models known as artificial neural networks.
Deep learning is a subfield of machine learning, a branch of computer science based on fitting complex models to data. While machine learning has been around a long time, deep learning has taken on a life of its own lately. The reason for that has mostly to do with the increasing amounts of computing power that have become widely available--along with the burgeoning quantities of data that can easily be harvested and used to train neural networks.

The amount of computing power at people's fingertips started growing in leaps and bounds at the turn of the millennium, when graphical processing units (GPUs) began to be harnessed for nongraphical calculations, a trend that has become increasingly pervasive over the past decade. But the computing demands of deep learning have been rising even faster. This dynamic has spurred engineers to develop electronic hardware accelerators specifically targeted to deep learning, Google's Tensor Processing Unit (TPU) being a prime example.

Here, I will describe a very different approach to this problem--using optical processors to carry out neural-network calculations with photons instead of electrons. To understand how optics can serve here, you need to know a little bit about how computers currently carry out neural-network calculations. So bear with me as I outline what goes on under the hood.

Almost invariably, artificial neurons are constructed using special software running on digital electronic computers of some sort. That software provides a given neuron with multiple inputs and one output. The state of each neuron depends on the weighted sum of its inputs, to which a nonlinear function, called an activation function, is applied. The result, the output of this neuron, then becomes an input for various other neurons. For computational efficiency, these neurons are grouped into layers, with neurons connected only to neurons in adjacent layers.
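The weighted-sum-plus-activation computation just described can be sketched in a few lines of plain Python. The layer sizes, weights, and the ReLU activation below are illustrative choices, not details from the article:

```python
def relu(z):
    # A common nonlinear activation function: max(0, z).
    return max(0.0, z)

def neuron_output(inputs, weights, bias):
    # The state of a neuron: a weighted sum of its inputs (a series of
    # multiply-and-accumulate operations), passed through a nonlinear
    # activation function.
    total = bias
    for x, w in zip(inputs, weights):
        total += x * w          # multiply, then accumulate
    return relu(total)

def layer_forward(inputs, weight_rows, biases):
    # A layer: every neuron sees the same inputs, and the layer's
    # outputs become the inputs of the next layer.
    return [neuron_output(inputs, w, b)
            for w, b in zip(weight_rows, biases)]

inputs = [0.5, -1.0, 2.0]
weights = [[0.1, 0.2, 0.3], [-0.4, 0.5, -0.6]]  # 2 neurons, 3 inputs each
biases = [0.0, 0.1]
print(layer_forward(inputs, weights, biases))
```

Nearly all the arithmetic here lives in the inner loop of `neuron_output`--exactly the multiply-and-accumulate work that dominates deep learning and that the optical hardware discussed below aims to speed up.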
The benefit of arranging things that way, as opposed to allowing connections between any two neurons, is that it allows certain mathematical tricks of linear algebra to be used to speed the calculations. While they are not the whole story, these linear-algebra calculations are the most computationally demanding part of deep learning, particularly as the size of the network grows. This is true for both training (the process of determining what weights to apply to the inputs of each neuron) and for inference (when the neural network provides the desired results).

What are these mysterious linear-algebra calculations? They aren't so complicated, really. They involve operations on matrices, which are just rectangular arrays of numbers--spreadsheets, if you will, minus the descriptive column headers you might find in a typical Excel file. This is great news because modern computer hardware has been very well optimized for matrix operations, which were the bread and butter of high-performance computing long before deep learning became popular. The relevant matrix calculations for deep learning boil down to a large number of multiply-and-accumulate operations, whereby pairs of numbers are multiplied together and their products are added up.

[Figure: Multiplying with light. Two beams whose electric fields are proportional to the numbers to be multiplied, x and y, impinge on a beam splitter (blue square). The beams leaving the beam splitter shine on photodetectors (ovals), which provide electrical signals proportional to these electric fields squared. Inverting one photodetector signal and adding it to the other then results in a signal proportional to the product of the two inputs.]

Over the years, deep learning has required an ever-growing number of these multiply-and-accumulate operations. Consider LeNet, a pioneering deep neural network designed to do image classification.
In 1998 it was shown to outperform other machine-learning techniques for recognizing handwritten letters and numerals. But by 2012, AlexNet, a neural network that crunched through about 1,600 times as many multiply-and-accumulate operations as LeNet, was able to recognize thousands of different types of objects in images.

Advancing from LeNet's initial success to AlexNet required almost 11 doublings of computing performance. Over the 14 years that advance took, Moore's Law provided much of the increase. The challenge has been to keep this trend going now that Moore's Law is running out of steam. The usual solution is simply to throw more computing resources--along with time, money, and energy--at the problem. As a result, training today's large neural networks often has a significant environmental footprint. One 2019 study found, for example, that training a certain deep neural network for natural-language processing produced five times the CO2 emissions typically associated with driving an automobile over its lifetime.

Improvements in digital electronic computers allowed deep learning to blossom, to be sure. But that doesn't mean that the only way to carry out neural-network calculations is with such machines. Decades ago, when digital computers were still relatively primitive, some engineers tackled difficult calculations using analog computers instead. As digital electronics improved, those analog computers fell by the wayside. But it may be time to pursue that strategy once again, in particular when the analog computations can be done optically.

It has long been known that optical fibers can support much higher data rates than electrical wires. That's why all long-haul communication lines went optical, starting in the late 1970s. Since then, optical data links have replaced copper wires for shorter and shorter spans, all the way down to rack-to-rack communication in data centers. Optical data communication is faster and uses less power.
Optical computing promises the same advantages. But there is a big difference between communicating data and computing with it. And this is where analog optical approaches hit a roadblock. Conventional computers are based on transistors, which are highly nonlinear circuit elements--meaning that their outputs aren't just proportional to their inputs, at least when used for computing. Nonlinearity is what lets transistors switch on and off, allowing them to be fashioned into logic gates. This switching is easy to accomplish with electronics, for which nonlinearities are a dime a dozen. But photons follow Maxwell's equations, which are annoyingly linear, meaning that the output of an optical device is typically proportional to its inputs.

The trick is to use the linearity of optical devices to do the one thing that deep learning relies on most: linear algebra. To illustrate how that can be done, I'll describe here a photonic device that, when coupled to some simple analog electronics, can multiply two matrices together. Such multiplication combines the rows of one matrix with the columns of the other. More precisely, it multiplies pairs of numbers from these rows and columns and adds their products together--the multiply-and-accumulate operations I described earlier. My MIT colleagues and I published a paper in 2019 about how this could be done, and we are now working to build such an optical matrix multiplier.

The basic computing unit in this device is an optical element called a beam splitter. Although its makeup is in fact more complicated, you can think of it as a half-silvered mirror set at a 45-degree angle. If you send a beam of light into it from the side, the beam splitter will allow half that light to pass straight through it, while the other half is reflected from the angled mirror, causing it to bounce off at 90 degrees from the incoming beam.
Now shine a second beam of light, perpendicular to the first, into this beam splitter so that it impinges on the other side of the angled mirror. Half of this second beam will similarly be transmitted and half reflected at 90 degrees. The two output beams will combine with the two outputs from the first beam. So this beam splitter has two inputs and two outputs.

To use this device for matrix multiplication, you generate two light beams with electric-field intensities that are proportional to the two numbers you want to multiply. Let's call these field intensities x and y. Shine those two beams into the beam splitter, which will combine them. This particular beam splitter does so in a way that produces two outputs whose electric fields have values of (x + y)/√2 and (x - y)/√2.

In addition to the beam splitter, this analog multiplier requires two simple electronic components--photodetectors--to measure the two output beams. They don't measure the electric-field intensity of those beams, though. They measure the power of a beam, which is proportional to the square of its electric-field intensity.

Why is that relation important? To understand that requires some algebra--but nothing beyond what you learned in high school. Recall that when you square (x + y)/√2 you get (x^2 + 2xy + y^2)/2. And when you square (x - y)/√2, you get (x^2 - 2xy + y^2)/2. Subtracting the latter from the former gives 2xy.

Pause now to contemplate the significance of this simple bit of math. It means that if you encode a number as a beam of light of a certain intensity and another number as a beam of another intensity, send them through such a beam splitter, measure the two outputs with photodetectors, and negate one of the resulting electrical signals before summing them together, you will have a signal proportional to the product of your two numbers.
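This beam-splitter arithmetic is easy to check numerically. The sketch below is a toy model of the idealized signal chain just described--encode x and y as field amplitudes, square at the photodetectors, negate one signal and sum--not a model of any real device:

```python
import math

def beam_splitter(x, y):
    # Ideal 50/50 beam splitter: two input field amplitudes in,
    # two output field amplitudes out.
    return (x + y) / math.sqrt(2), (x - y) / math.sqrt(2)

def photodetector(field):
    # A photodetector measures power, which is proportional to the
    # square of the electric-field amplitude.
    return field ** 2

def optical_multiply(x, y):
    # Negate one detector signal and add it to the other: the result
    # is 2xy, a signal proportional to the product of the two inputs.
    plus, minus = beam_splitter(x, y)
    return photodetector(plus) - photodetector(minus)

print(optical_multiply(3.0, 4.0))  # proportional to 2 * 3 * 4
```

The factor of 2 is a fixed scale, so it drops out once the electronics are calibrated; what matters is that a purely linear optical element plus square-law detectors yields a multiplication.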
[Figure: Simulations of the integrated Mach-Zehnder interferometer found in Lightmatter's neural-network accelerator show three different conditions whereby light traveling in the two branches of the interferometer undergoes different relative phase shifts (0 degrees in a, 45 degrees in b, and 90 degrees in c).]

My description has made it sound as though each of these light beams must be held steady. In fact, you can briefly pulse the light in the two input beams and measure the output pulse. Better yet, you can feed the output signal into a capacitor, which will accumulate charge for as long as the pulse lasts. Then you can pulse the inputs again for the same duration, this time encoding two new numbers to be multiplied together. Their product adds some more charge to the capacitor. You can repeat this process as many times as you like, each time carrying out another multiply-and-accumulate operation. Using pulsed light in this way allows you to perform many such operations in rapid-fire sequence.

The most energy-intensive part of all this is reading the voltage on that capacitor, which requires an analog-to-digital converter. But you don't have to do that after each pulse--you can wait until the end of a sequence of, say, N pulses. That means that the device can perform N multiply-and-accumulate operations using the same amount of energy to read the answer whether N is small or large. Here, N corresponds to the number of neurons per layer in your neural network, which can easily number in the thousands. So this strategy uses very little energy.

Sometimes you can save energy on the input side of things, too. That's because the same value is often used as an input to multiple neurons. Rather than converting that number into light multiple times--consuming energy each time--it can be transformed just once, and the light beam that is created can be split into many channels.
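The pulse-and-accumulate scheme amounts to computing a dot product: each pulse pair contributes one product to the charge on the capacitor, and the expensive readout happens only once at the end. Again a toy simulation of the idealized math, not of real detector electronics:

```python
import math

def pulsed_product(x, y):
    # One pulsed multiply: encode x and y as field amplitudes, pass
    # them through a 50/50 beam splitter, square at two photodetectors,
    # negate one signal and sum. The result is 2xy.
    plus = (x + y) / math.sqrt(2)
    minus = (x - y) / math.sqrt(2)
    return plus ** 2 - minus ** 2

def accumulate_on_capacitor(xs, ys):
    # N pulses in sequence: each product adds charge to the capacitor.
    # The analog-to-digital readout happens once at the end, regardless
    # of how many pulses contributed.
    charge = 0.0
    for x, y in zip(xs, ys):
        charge += pulsed_product(x, y)
    return charge  # proportional to the dot product of xs and ys

xs = [0.5, -1.0, 2.0, 0.25]
ys = [1.0, 0.5, -0.75, 4.0]
print(accumulate_on_capacitor(xs, ys))  # 2 * (xs . ys)
```

A dot product of a weight row with an input vector is exactly one neuron's weighted sum, so one capacitor readout per neuron per layer covers the linear algebra of inference.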
In this way, the energy cost of input conversion is amortized over many operations. Splitting one beam into many channels requires nothing more complicated than a lens, but lenses can be tricky to put onto a chip. So the device we are developing to perform neural-network calculations optically may well end up being a hybrid that combines highly integrated photonic chips with separate optical elements.

I've outlined here the strategy my colleagues and I have been pursuing, but there are other ways to skin an optical cat. Another promising scheme is based on something called a Mach-Zehnder interferometer, which combines two beam splitters and two fully reflecting mirrors. It, too, can be used to carry out matrix multiplication optically. Two MIT-based startups, Lightmatter and Lightelligence, are developing optical neural-network accelerators based on this approach. Lightmatter has already built a prototype that uses an optical chip it has fabricated. And the company expects to begin selling an optical accelerator board that uses that chip later this year.

Another startup using optics for computing is Optalysis, which hopes to revive a rather old concept. One of the first uses of optical computing, back in the 1960s, was for the processing of synthetic-aperture radar data. A key part of the challenge was to apply to the measured data a mathematical operation called the Fourier transform. Digital computers of the time struggled with such things. Even now, applying the Fourier transform to large amounts of data can be computationally intensive. But a Fourier transform can be carried out optically with nothing more complicated than a lens, which for some years was how engineers processed synthetic-aperture data. Optalysis hopes to bring this approach up to date and apply it more widely.
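To see why a Mach-Zehnder interferometer can serve as a matrix-multiplication building block, it helps to write down its transfer matrix: two 50/50 beam splitters sandwiching a phase shifter in one arm. The sketch below assumes the same beam-splitter convention used earlier in this article; real devices add further phase shifters, and the function name is my own.

```python
import numpy as np

def mzi_transfer(theta):
    """Transfer matrix of a Mach-Zehnder interferometer: two 50/50 beam
    splitters with a relative phase shift theta between the two arms."""
    bs = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # 50/50 beam splitter
    phase = np.diag([np.exp(1j * theta), 1.0])     # phase shifter in one arm
    return bs @ phase @ bs

# Tuning the phase tunes how much light couples from each input to each
# output -- the knob that lets meshes of MZIs realize arbitrary matrices.
for deg in (0, 45, 90):
    U = mzi_transfer(np.radians(deg))
    # Each setting is unitary, so optical power is conserved.
    assert np.allclose(U.conj().T @ U, np.eye(2))
```

At zero phase shift the device passes each input straight through; at 180 degrees it swaps them completely; intermediate phases give intermediate splitting ratios, which is what encodes a matrix element.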
There is also a company called Luminous, spun out of Princeton University, which is working to create spiking neural networks based on something it calls a laser neuron. Spiking neural networks more closely mimic how biological neural networks work and, like our own brains, are able to compute using very little energy. Luminous's hardware is still in the early phase of development, but the promise of combining two energy-saving approaches--spiking and optics--is quite exciting.

There are, of course, still many technical challenges to be overcome. One is to improve the accuracy and dynamic range of the analog optical calculations, which are nowhere near as good as what can be achieved with digital electronics. That's because these optical processors suffer from various sources of noise and because the digital-to-analog and analog-to-digital converters used to get the data in and out are of limited accuracy. Indeed, it's difficult to imagine an optical neural network operating with more than 8 to 10 bits of precision. While 8-bit electronic deep-learning hardware exists (the Google TPU is a good example), this industry demands higher precision, especially for neural-network training.

There is also the difficulty of integrating optical components onto a chip. Because those components are tens of micrometers in size, they can't be packed nearly as tightly as transistors, so the required chip area adds up quickly. A 2017 demonstration of this approach by MIT researchers involved a chip that was 1.5 millimeters on a side. Even the biggest chips are no larger than several square centimeters, which places limits on the sizes of matrices that can be processed in parallel this way.

There are many additional questions on the computer-architecture side that photonics researchers tend to sweep under the rug. What's clear, though, is that, at least theoretically, photonics has the potential to accelerate deep learning by several orders of magnitude.
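The precision limit is easy to get a feel for numerically. The toy model below (my own illustration, not a model of any particular chip) rounds the inputs of a dot product to 8-bit converter levels and compares the result against full-precision arithmetic: the answer is close, but the residual error is exactly the kind of budget an analog accelerator has to live within.

```python
import numpy as np

def quantize(a, bits=8):
    """Model a data converter of limited precision: round values in
    [-1, 1] to the nearest of 2**bits uniformly spaced levels."""
    levels = 2 ** (bits - 1)
    return np.round(np.clip(a, -1, 1) * levels) / levels

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 1000)  # activations for one layer
w = rng.uniform(-1, 1, 1000)  # weights for one neuron

exact = x @ w                          # full-precision dot product
approx = quantize(x) @ quantize(w)     # dot product through 8-bit converters
# The two results agree closely but not exactly; per-sample rounding
# error is bounded by half a quantization step (1/256 here).
```

Comparing `exact` and `approx` shows a small residual discrepancy; for inference that is often tolerable, but training tends to demand the higher precision the article mentions.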
Based on the technology that's currently available for the various components (optical modulators, detectors, amplifiers, analog-to-digital converters), it's reasonable to think that the energy efficiency of neural-network calculations could be made 1,000 times better than that of today's electronic processors. Making more aggressive assumptions about emerging optical technology, that factor might be as large as a million. And because electronic processors are power-limited, these improvements in energy efficiency will likely translate into corresponding improvements in speed.

Many of the concepts in analog optical computing are decades old. Some even predate silicon computers. Schemes for optical matrix multiplication, and even for optical neural networks, were first demonstrated in the 1970s. But this approach didn't catch on. Will this time be different? Possibly, for three reasons. First, deep learning is genuinely useful now, not just an academic curiosity. Second, we can't rely on Moore's Law alone to continue improving electronics. And finally, we have a new technology that was not available to earlier generations: integrated photonics. These factors suggest that optical neural networks will arrive for real this time--and the future of such computations may indeed be photonic.