https://spectrum.ieee.org/fei-fei-li-world-labs

IEEE Spectrum: AI Godmother Fei-Fei Li Has a Vision for Computer Vision

(c) Copyright 2024 IEEE. All rights reserved.
Her startup, World Labs, is giving machines 3D spatial intelligence

Eliza Strickland | 5 min read

Eliza Strickland is a senior editor at IEEE Spectrum covering AI and biomedical engineering.

AI pioneer Fei-Fei Li says to unlock visual intelligence, we need to respect the fact that "the world is 3D."
Photo: Andria Lo

Stanford University professor Fei-Fei Li has already earned her place in the history of AI. She played a major role in the deep learning revolution by laboring for years to create the ImageNet dataset and competition, which challenged AI systems to recognize objects and animals across 1,000 categories. In 2012, a neural network called AlexNet sent shockwaves through the AI research community when it resoundingly outperformed all other types of models and won the ImageNet contest. From there, neural networks took off, powered by the vast amounts of free training data now available on the Internet and GPUs that deliver unprecedented compute power.

In the 12 years since then, computer vision researchers have mastered object recognition and moved on to image and video generation. Li cofounded Stanford's Institute for Human-Centered AI (HAI) and continued to push the boundaries of computer vision. Just this year she launched a startup, World Labs, which generates 3D scenes that users can explore. World Labs is dedicated to giving AI "spatial intelligence," or the ability to generate, reason within, and interact with 3D worlds.

Li delivered a keynote yesterday at NeurIPS, the massive AI conference, about her vision for machine vision, and she gave IEEE Spectrum an exclusive interview before her talk.

Why did you title your talk "Ascending the Ladder of Visual Intelligence"?

Fei-Fei Li: I think it's intuitive that intelligence has different levels of complexity and sophistication. In the talk, I want to deliver the sense that over the past decades, especially the past 10-plus years of the deep learning revolution, the things we have learned to do with visual intelligence are just breathtaking. We are becoming more and more capable with the technology. And I was also inspired by Judea Pearl's "ladder of causality" [in his 2018 book The Book of Why]. The talk also has a subtitle, "From Seeing to Doing."
This is something that people don't appreciate enough: that seeing is closely coupled with interaction and doing things, both for animals as well as for AI agents. And this is a departure from language. Language is fundamentally a communication tool that's used to get ideas across. In my mind, these are very complementary, but equally profound, modalities of intelligence.

Do you mean that we instinctively respond to certain sights?

Li: I'm not just talking about instinct. If you look at the evolution of perception and the evolution of animal intelligence, it's deeply, deeply intertwined. Every time we're able to get more information from the environment, the evolutionary force pushes capability and intelligence forward. If you don't sense the environment, your relationship with the world is very passive; whether you eat or become eaten is a very passive act. But as soon as you are able to take cues from the environment through perception, the evolutionary pressure really heightens, and that drives intelligence forward.

Do you think that's how we're creating deeper and deeper machine intelligence? By allowing machines to perceive more of the environment?

Li: I don't know if "deep" is the adjective I would use. I think we're creating more capabilities. I think it's becoming more complex, more capable. I think it's absolutely true that tackling the problem of spatial intelligence is a fundamental and critical step towards full-scale intelligence.

I've seen the World Labs demos. Why do you want to research spatial intelligence and build these 3D worlds?

Li: I think spatial intelligence is where visual intelligence is going. If we are serious about cracking the problem of vision and also connecting it to doing, there's an extremely simple, laid-out-in-the-daylight fact: The world is 3D. We don't live in a flat world. Our physical agents, whether they're robots or devices, will live in the 3D world. Even the virtual world is becoming more and more 3D.
If you talk to artists, game developers, designers, architects, doctors, even when they are working in a virtual world, much of this is 3D. If you just take a moment and recognize this simple but profound fact, there is no question that cracking the problem of 3D intelligence is fundamental.

I'm curious about how the scenes from World Labs maintain object permanence and compliance with the laws of physics. That feels like an exciting step forward, since video-generation tools like Sora still fumble with such things.

Li: Once you respect the 3D-ness of the world, a lot of this is natural. For example, in one of the videos that we posted on social media, basketballs are dropped into a scene. Because it's 3D, it allows you to have that kind of capability. If the scene is just 2D-generated pixels, the basketball will go nowhere. Or, like in Sora, it might go somewhere but then disappear.

What are the biggest technical challenges that you're dealing with as you try to push that technology forward?

Li: No one has solved this problem, right? It's very, very hard. You can see [in a World Labs demo video] that we have taken a Van Gogh painting and generated the entire scene around it in a consistent style: the artistic style, the lighting, even what kind of buildings that neighborhood would have. If you turn around and it becomes skyscrapers, it would be completely unconvincing, right? And it has to be 3D. You have to navigate into it. So it's not just pixels.

Can you say anything about the data you've used to train it?

Li: A lot.

Do you have technical challenges regarding compute burden?

Li: It is a lot of compute. It's the kind of compute that the public sector cannot afford. This is part of the reason I feel excited to take this sabbatical, to do this in the private sector way. And it's also part of the reason I have been advocating for public sector compute access because my own experience underscores the importance of innovation with an adequate amount of resourcing.
It would be nice to empower the public sector, since it's usually more motivated by gaining knowledge for its own sake and knowledge for the benefit of humanity.

Li: Knowledge discovery needs to be supported by resources, right? In the times of Galileo, it was the best telescope that let the astronomers observe new celestial bodies. It's Hooke who realized that magnifying glasses can become microscopes and discovered cells. Every time there is new technological tooling, it helps knowledge-seeking. And now, in the age of AI, technological tooling involves compute and data. We have to recognize that for the public sector.

What would you like to happen on a federal level to provide resources?

Li: This has been the work of Stanford HAI for the past five years. We have been working with Congress, the Senate, the White House, industry, and other universities to create NAIRR, the National AI Research Resource.

Assuming that we can get AI systems to really understand the 3D world, what does that give us?

Li: It will unlock a lot of creativity and productivity for people. I would love to design my house in a much more efficient way. I know that lots of medical usages involve understanding a very particular 3D world, which is the human body. We always talk about a future where humans will create robots to help us, but robots navigate in a 3D world, and they require spatial intelligence as part of their brain. We also talk about virtual worlds that will allow people to visit places or learn concepts or be entertained. And those use 3D technology, especially the hybrids, what we call AR [augmented reality]. I would love to walk through a national park with a pair of glasses that give me information about the trees, the path, the clouds. I would also love to learn different skills through the help of spatial intelligence.

What kind of skills?

Li: My lame example is if I have a flat tire on the highway, what do I do? Right now, I open a "how to change a tire" video.
But if I could put on glasses and see what's going on with my car and then be guided through that process, that would be cool. But that's a lame example. You can think about cooking, you can think about sculpting: fun things.

How far do you think we're going to get with this in our lifetime?

Li: Oh, I think it's going to happen in our lifetime because the pace of technology progress is really fast. You have seen what the past 10 years have brought. It's definitely an indication of what's coming next.

Eliza Strickland is a senior editor at IEEE Spectrum, where she covers AI, biomedical engineering, and other topics. She holds a master's degree in journalism from Columbia University.