Physical AI is already here. But what is it?

Physical AI refers to any AI system that is designed to interact with the environment, typically through specialized sensors. Photo by Matthew Modoono/Northeastern University

You may have seen it: that humanoid robot moonwalking across a stage to Michael Jackson’s “Billie Jean,” executing those deft slides and heel pivots before slipping several times on a set of steps and lying motionless. The video, filmed in Shenzhen, China, has taken social media by storm this week.

The scene may well augur a future in which robots imitate, perform and work alongside humans. That’s because it is an early example of what many experts are calling the next phase of the artificial intelligence boom: physical AI. 

What is Physical AI?

The term “physical AI” is largely attributed to Jensen Huang, the CEO of NVIDIA, to refer to AI’s evolution away from the digital screen into the real world. In a recent company blog post, Huang suggested that the “ChatGPT moment for general robotics is just around the corner.”

Physical AI refers to any AI system that is designed to interact with the environment, typically through specialized sensors, said Yanzhi Wang, professor of electrical and computer engineering.

Examples of physical AI systems extend beyond robotics, and include medical devices, autonomous vehicles, smart manufacturing systems and AI-powered drones. 

What does it mean for AI to interact with its environment? 

Experts describe that interaction in terms of a system’s ability to “perceive, reason and learn” from the environment around it. These AI systems would be able to learn the laws of physics with some degree of autonomy and adaptability. 

Sarah Ostadabbas, associate professor of electrical and computer engineering, said that in addition to “sensing and learning from the world,” physical AI systems should, in theory, also be able to “act independently” based on the information they take in from their surroundings.

But to bridge the gap between the simulated or virtual world and the real world, “you need to reason about what you have seen, or what you have perceived,” she added. “So this reasoning component is really important.”

Ostadabbas explained that, for now, this reasoning component relies heavily on text, or an understanding of language. Language-based systems reason from descriptions and patterns rather than through direct interaction with the physical world. “We hope that this reasoning component in these systems eventually is derived from the actual physics of the world,” she said.

At Northeastern University’s Physical AI Research Initiative, or PAIR, Ostadabbas and her colleagues are attempting to establish a framework that would help guide the development of physical AI systems.  

One emerging template for physical AI systems is the “vision-language-action” model, or VLA, Wang said. A vision-language-action model is any system that unifies visual perception and language processing to make decisions and act on them. Early models, such as NVIDIA’s GR00T N1 and Google DeepMind’s RT-1, are designed to help robots interpret their surroundings and carry out complex physical tasks.
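
As a rough illustration of that idea, the Python sketch below shows the general shape of a vision-language-action interface: a visual observation and a text instruction go in, and a motor command comes out. Every name here (`Observation`, `Action`, `ToyVLAPolicy`) is hypothetical, and the hand-written rules stand in for what would be a learned neural network in a real system. It is not how GR00T N1, RT-1 or any other model is implemented.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Observation:
    """Hypothetical camera output: detected object labels mapped to (x, y) positions."""
    objects: Dict[str, Tuple[float, float]]


@dataclass
class Action:
    """Hypothetical motor command: move the gripper to a target, optionally closing it."""
    target: Tuple[float, float]
    close_gripper: bool


class ToyVLAPolicy:
    """Rule-based stand-in for a learned model mapping (vision, language) to an action."""

    def act(self, obs: Observation, instruction: str) -> Optional[Action]:
        words = instruction.lower().split()
        # "Language" step: find which visible object the instruction mentions.
        for label, position in obs.objects.items():
            if label in words:
                # "Action" step: reach for the object; grasp only if asked to pick it up.
                return Action(target=position, close_gripper="pick" in words)
        return None  # Nothing in view matches the instruction.


obs = Observation(objects={"cup": (0.4, 0.1), "box": (0.9, 0.5)})
policy = ToyVLAPolicy()
action = policy.act(obs, "pick up the cup")
print(action)
```

The point of the sketch is the interface rather than the logic: perception and language are combined in a single step that produces an action, instead of being handled by separate, hand-wired stages.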

What are some of Physical AI’s applications?

Physical AI is already being deployed in some sectors, including manufacturing, Wang said. The most recognizable examples include robotic arms that assemble products on factory lines, and autonomous warehouse robots that help transport inventory, sort packages and perform other rote tasks with minimal human intervention.  

Unlike traditional industrial robots, which are typically programmed to repeat the same fixed motions, physical AI systems are designed so they can adapt to changing conditions, identify objects and independently navigate spaces. 
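
The contrast can be made concrete with a toy example. In the Python sketch below, a fixed routine replays the same programmed waypoints no matter what is in the way, while an adaptive routine checks each step against sensed obstacles and sidesteps when blocked. The grid world, the function names and the sidestep rule are all illustrative assumptions. The adaptive routine is also hand-coded, not learned, so it captures only the "react to the environment" part of what physical AI systems are meant to do.

```python
from typing import Iterable, List, Set, Tuple

Cell = Tuple[int, int]


def fixed_routine(waypoints: Iterable[Cell]) -> List[Cell]:
    """Traditional approach: replay the programmed path, ignoring the environment."""
    return list(waypoints)


def adaptive_routine(start: Cell, goal_x: int, obstacles: Set[Cell]) -> List[Cell]:
    """Head right toward column goal_x; move up one row whenever the next cell is blocked."""
    x, y = start
    path = [(x, y)]
    while x < goal_x:
        if (x + 1, y) not in obstacles:
            x += 1  # The way ahead is clear.
        elif (x, y + 1) not in obstacles:
            y += 1  # Blocked ahead: sidestep to the next row.
        else:
            raise RuntimeError("No free cell ahead or above; the toy rule gives up.")
        path.append((x, y))
    return path


obstacles = {(2, 0)}  # A box left on the factory floor.
fixed_path = fixed_routine([(0, 0), (1, 0), (2, 0), (3, 0)])
adaptive_path = adaptive_routine(start=(0, 0), goal_x=3, obstacles=obstacles)

print("fixed path hits obstacle:", any(cell in obstacles for cell in fixed_path))
print("adaptive path:", adaptive_path)
```

Here the fixed path runs straight through the blocked cell, while the adaptive path detours around it and still reaches the target column.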

Wang noted that physical AI systems could transform manufacturing and other industries by allowing machines to operate in less predictable environments, where they can learn and adapt. 

How far does Physical AI still have to go?

Physical AI is still largely theoretical. Ostadabbas said there are many hurdles to realizing the kind of physical AI systems she and her colleagues are attempting to define.

One difficulty, she said, is the dynamic and unpredictable nature of the real world. The visual and physical data these systems would rely on is often “unclean” or “dirty,” referring to the ways environments shift or contain obstacles and other unexpected variables. 

Safety is another concern. Physical AI systems operating around people must avoid causing harm and must engender trust, Ostadabbas said, which raises a slew of technical and legal questions.

China’s dancing robot seemed harmless enough. But in other contexts, such systems could pose dangers to human beings. 

“How can we make sure that this action is safe, trustworthy, verifiable and something that is robust?” Ostadabbas said. “That is the final pillar in our framework.”

Wang thinks physical AI designs may soon be implemented at scale, provided innovation and development continue at their current pace.

“I think it will become more mainstream, but it is still a far way to go,” Wang said. “But … based on the current progress of this generation of tools, maybe after two or three years there will be a big breakthrough.”

Tanner Stening is an assistant news editor at Northeastern Global News. Email him at t.stening@northeastern.edu. Follow him on X/Twitter @tstening90.