https://spectrum.ieee.org/fei-fei-li-world-labs

AI Godmother Fei-Fei Li Has a Vision for Computer Vision

Her startup, World Labs, is giving machines 3D spatial intelligence

Eliza Strickland

Eliza Strickland is a senior editor at IEEE Spectrum covering AI and
biomedical engineering.

AI pioneer Fei-Fei Li says to unlock visual intelligence, we need to
respect the fact that "the world is 3D."

Andria Lo

Stanford University professor Fei-Fei Li has already earned her place
in the history of AI. She played a major role in the deep learning
revolution by laboring for years to create the ImageNet dataset and
competition, which challenged AI systems to recognize objects and
animals across 1,000 categories. In 2012, a neural network called
AlexNet sent shockwaves through the AI research community when it
resoundingly outperformed all other types of models and won the
ImageNet contest. From there, neural networks took off, powered by
the vast amounts of free training data now available on the Internet
and GPUs that deliver unprecedented compute power.

In the 12 years since AlexNet's win, computer vision researchers mastered
object recognition and moved on to image and video generation. Li
cofounded Stanford's Institute for Human-Centered AI (HAI) and
continued to push the boundaries of computer vision. Just this year
she launched a startup, World Labs, which generates 3D scenes that
users can explore. World Labs is dedicated to giving AI "spatial
intelligence," or the ability to generate, reason within, and
interact with 3D worlds. Li delivered a keynote yesterday at NeurIPS,
the massive AI conference, about her vision for machine vision, and
she gave IEEE Spectrum an exclusive interview before her talk.

Why did you title your talk "Ascending the Ladder of Visual
Intelligence"?

Fei-Fei Li: I think it's intuitive that intelligence has different
levels of complexity and sophistication. In the talk, I want to
deliver the sense that over the past decades, especially the past
10-plus years of the deep learning revolution, the things we have
learned to do with visual intelligence are just breathtaking. We are
becoming more and more capable with the technology. And I was also
inspired by Judea Pearl's "ladder of causality" [in his 2018 book The
Book of Why].

The talk also has a subtitle, "From Seeing to Doing." This is
something that people don't appreciate enough: that seeing is closely
coupled with interaction and doing things, both for animals as well
as for AI agents. And this is a departure from language. Language is
fundamentally a communication tool that's used to get ideas across.
In my mind, these are very complementary, but equally profound,
modalities of intelligence.

Do you mean that we instinctively respond to certain sights?

Li: I'm not just talking about instinct. If you look at the evolution
of perception and the evolution of animal intelligence, it's deeply,
deeply intertwined. Every time we're able to get more information
from the environment, the evolutionary force pushes capability and
intelligence forward. If you don't sense the environment, your
relationship with the world is very passive; whether you eat or
become eaten is a very passive act. But as soon as you are able to
take cues from the environment through perception, the evolutionary
pressure really heightens, and that drives intelligence forward.

Do you think that's how we're creating deeper and deeper machine
intelligence? By allowing machines to perceive more of the
environment?

Li: I don't know if "deep" is the adjective I would use. I think
we're creating more capabilities. I think it's becoming more complex,
more capable. I think it's absolutely true that tackling the problem
of spatial intelligence is a fundamental and critical step towards
full-scale intelligence.

I've seen the World Labs demos. Why do you want to research spatial
intelligence and build these 3D worlds?

Li: I think spatial intelligence is where visual intelligence is
going. If we are serious about cracking the problem of vision and
also connecting it to doing, there's an extremely simple,
laid-out-in-the-daylight fact: The world is 3D. We don't live in a
flat world. Our physical agents, whether they're robots or devices,
will live in the 3D world. Even the virtual world is becoming more
and more 3D. If you talk to artists, game developers, designers,
architects, doctors, even when they are working in a virtual world,
much of this is 3D. If you just take a moment and recognize this
simple but profound fact, there is no question that cracking the
problem of 3D intelligence is fundamental.

I'm curious about how the scenes from World Labs maintain object
permanence and compliance with the laws of physics. That feels like
an exciting step forward, since video-generation tools like Sora
still fumble with such things.

Li: Once you respect the 3D-ness of the world, a lot of this is
natural. For example, in one of the videos that we posted on social
media, basketballs are dropped into a scene. Because it's 3D, it
allows you to have that kind of capability. If the scene is just
2D-generated pixels, the basketball will go nowhere.

Or, like in Sora, it might go somewhere but then disappear. What are
the biggest technical challenges that you're dealing with as you try
to push that technology forward?

Li: No one has solved this problem, right? It's very, very hard. You
can see [in a World Labs demo video] that we have taken a Van Gogh
painting and generated the entire scene around it in a consistent
style: the artistic style, the lighting, even what kind of buildings
that neighborhood would have. If you turn around and it becomes
skyscrapers, it would be completely unconvincing, right? And it has
to be 3D. You have to navigate into it. So it's not just pixels.

Can you say anything about the data you've used to train it?

Li: A lot.

Do you have technical challenges regarding compute burden?

Li: It is a lot of compute. It's the kind of compute that the public
sector cannot afford. This is part of the reason I feel excited to
take this sabbatical, to do this in the private sector way. And it's
also part of the reason I have been advocating for public sector
compute access because my own experience underscores the importance
of innovation with an adequate amount of resourcing.

It would be nice to empower the public sector, since it's usually
more motivated by gaining knowledge for its own sake and knowledge
for the benefit of humanity.

Li: Knowledge discovery needs to be supported by resources, right? In
the times of Galileo, it was the best telescope that let the
astronomers observe new celestial bodies. It's Hooke who realized
that magnifying glasses can become microscopes and discovered cells.
Every time there is new technological tooling, it helps
knowledge-seeking. And now, in the age of AI, technological tooling
involves compute and data. We have to recognize that for the public
sector.

What would you like to happen on a federal level to provide
resources?

Li: This has been the work of Stanford HAI for the past five years.
We have been working with Congress, the Senate, the White House,
industry, and other universities to create NAIRR, the National AI
Research Resource.

Assuming that we can get AI systems to really understand the 3D
world, what does that give us?

Li: It will unlock a lot of creativity and productivity for people. I
would love to design my house in a much more efficient way. I know
that lots of medical usages involve understanding a very particular
3D world, which is the human body. We always talk about a future
where humans will create robots to help us, but robots navigate in a
3D world, and they require spatial intelligence as part of their
brain. We also talk about virtual worlds that will allow people to
visit places or learn concepts or be entertained. And those use 3D
technology, especially the hybrids, what we call AR [augmented
reality]. I would love to walk through a national park with a pair of
glasses that give me information about the trees, the path, the
clouds. I would also love to learn different skills through the help
of spatial intelligence.

What kind of skills?

Li: My lame example is if I have a flat tire on the highway, what do
I do? Right now, I open a "how to change a tire" video. But if I
could put on glasses and see what's going on with my car and then be
guided through that process, that would be cool. But that's a lame
example. You can think about cooking, you can think about
sculpting--fun things.

How far do you think we're going to get with this in our lifetime?

Li: Oh, I think it's going to happen in our lifetime because the pace
of technology progress is really fast. You have seen what the past 10
years have brought. It's definitely an indication of what's coming
next.

Eliza Strickland

Eliza Strickland is a senior editor at IEEE Spectrum, where she
covers AI, biomedical engineering, and other topics. She holds a
master's degree in journalism from Columbia University.
