Spatial Intelligence: Dr. Fei-Fei Li emphasizes the importance of spatial intelligence as the next frontier in AI.
World Models: World models predict physical environments' next states, contrasting with language models focused on text.
Data Challenges: Acquiring spatial data is difficult and scarce, presenting a significant barrier to developing spatial AI.
Gradual Rollout: The emergence of spatial AI is likely to be gradual and domain-specific, differing from language AI.
Preparation Urgency: Organizations should begin preparing for spatial AI's impact now; the window for groundwork is shorter than it appears.
The AI tools that have reshaped knowledge work over the past three years share a common limitation: they exist entirely in the realm of language.
They read, write, summarize, and generate. What they cannot do is understand the physical world: the geometry of a warehouse floor, the spatial logic of a surgical procedure, or the three-dimensional dynamics of a manufacturing line.
That gap is where Dr. Fei-Fei Li has staked her career.
Speaking at HumanX yesterday, the Stanford computer scientist and co-founder of World Labs made the case that spatial intelligence (the ability of machines to perceive, reason, and act in three-dimensional space) represents the next meaningful frontier in AI development. It is not a replacement for language models, she was careful to note. It is a different category of problem entirely.
"Human intelligence is not just linguistic," Li said. "Think about everything we do in our daily life as well as in our work. Everything involves the 3D world, involves space, involves movements, involves interaction."
For business leaders who have spent the last two years reorganizing workflows around large language models, Li's argument is a useful reset. The productivity gains from AI text generation are real, but they represent a narrow slice of what intelligence, human or machine, actually does.
The harder and more consequential work remains largely unsolved: physically navigating environments, interpreting spatial data, and operating in the world rather than describing it.
What World Models Do
Li draws a clear distinction between language models and what she calls "world models."
Where a language model predicts the next token in a sequence, a world model predicts the next state of a physical environment. A tennis player returning a 120-mph serve is doing something closer to the latter: reading the current state of the ball and body and computing what comes next in milliseconds.
"State prediction or state generation is fundamental" to spatial intelligence, Li said.
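The contrast between the two prediction problems can be sketched in a few lines. This is a toy illustration only: the names are hypothetical, and a simple ballistic-motion step stands in for the learned dynamics a real world model would use.

```python
import numpy as np

def predict_next_token(logits: np.ndarray) -> int:
    """A language model's core step: choose the next token from a
    score distribution over a vocabulary (greedy decoding here)."""
    return int(np.argmax(logits))

def predict_next_state(position: np.ndarray,
                       velocity: np.ndarray,
                       dt: float = 0.001,
                       gravity: float = 9.81):
    """A world model's core step: given the current physical state,
    predict the state an instant later. Simple projectile physics
    stands in for a learned model of the environment."""
    next_position = position + velocity * dt
    next_velocity = velocity + np.array([0.0, 0.0, -gravity]) * dt
    return next_position, next_velocity

# A 120-mph serve is roughly 53.6 m/s. Advance the ball's state by
# one millisecond, the timescale a returning player operates on.
pos = np.array([0.0, 0.0, 1.0])   # meters
vel = np.array([53.6, 0.0, 0.0])  # meters per second
pos, vel = predict_next_state(pos, vel)
```

The point of the sketch is the difference in state: a token index in one case, a continuous physical configuration (position and velocity) in the other.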
Her company's model, called Marble, generates true three-dimensional worlds: not video or flat images, but persistent 3D environments that can be navigated, modified, and used to train downstream applications.
The immediate applications she cited are instructive for leaders trying to read where this is headed.
Robotics training is one: labs are already using generated 3D environments as synthetic data to train physical robots, reducing dependence on costly and slow real-world data collection. Radiology is another. Diagnosing conditions from imaging data is fundamentally a spatial problem. A lung nodule exists in three dimensions; an AI that only processes two-dimensional images is working with incomplete information.
Self-driving cars are the most visible example already in the market. Li pointed to Tesla and Waymo as companies that have already built functional world models within a specialized domain.
"From that point of view, in this very critical but specialized domain, we already have spatial intelligence," she said.
The Data Problem
Li was direct about the primary constraint on progress. Compute is expensive but available. Model architecture is advancing. Data is the hard part.
Language models benefited from essentially unlimited training material: the accumulated text of the internet, digitized books, transcribed conversations. Spatial data has no equivalent corpus. Three-dimensional representations of physical environments are scarce, expensive to produce, and difficult to standardize.
"If you think that 3D world data is lacking," Li said, "robotics data is even more lacking."
This is not an abstract research problem. For any organization planning to integrate physical AI, whether in manufacturing, healthcare, logistics, or facilities management, data infrastructure will be the rate-limiting factor.
The companies that are already capturing spatial data from their physical operations, through sensors, imaging systems, or digital twins, are building an asset they may not fully appreciate yet.
No ChatGPT Moment on the Horizon
One thing Li declined to promise is a watershed moment comparable to the release of ChatGPT, when a single consumer product brought a new class of AI capability into mass awareness overnight.
"Chat is such a ubiquitous consumer behavior," she said, "and when there is such a ubiquitous consumer behavior, you have a watershed moment."
She's skeptical that spatial intelligence will arrive the same way, because it may not have a single, simple consumer behavior to anchor it. There is no obvious equivalent to typing a question into a chat box.
That means the rollout of spatial AI is likely to be domain-specific and gradual rather than sudden and universal. Robotics labs, medical imaging companies, game developers, and VFX studios will encounter it well before it becomes a fixture of mainstream enterprise software.
For CHROs and COOs, that timeline matters. The pressure to act on spatial AI is not the same as the pressure to act on language AI was in early 2023. But the preparation window is also shorter than it appears.
The foundational work takes longer than adopting a new SaaS product: understanding where physical intelligence could change your operations, inventorying your spatial data assets, and building literacy in your leadership team.
Li described the current moment as a convergence: transformer model architectures developed for language are now meeting advances in computer vision and 3D computing that have been building for years.
"For the first time, it gives us the opportunity to really conquer some of the most fundamental problems in AI," she said.
Treating that convergence as someone else's problem to watch will lead many organizations to repeat the mistake made when ChatGPT launched: assuming they had more runway than they did.
