Generative AI and robotics are bringing us closer to the day when we can request an object and have it created within minutes. Researchers at MIT have developed Speech to Reality, an AI-driven workflow that uses voice input to a robotic arm to "speak objects into existence," creating furniture and more in as little as five minutes.
A speech recognition system allows a table-mounted robotic arm to receive voice input from a human (such as "I want a simple stool") and build an object from modular components. So far, researchers have used the system to create furniture such as stools, shelves, chairs, and small tables, as well as decorative items like a dog statue.
"We are marrying natural language processing, 3D generative AI, and robotic assembly," says Alexander Htet Kyaw, MIT graduate student and Morningside Academy for Design (MAD) fellow. "These are rapidly evolving areas of research that have never before been put together in a way that allows you to actually create physical objects from just simple audio prompts."
Video: Speech to Reality: On-demand production and individualized robot assembly using 3D generative AI
The idea began when Kyaw, a graduate student in architecture, electrical engineering, and computer science, took Professor Neil Gershenfeld's course "How To Make Almost Anything." In that class, he built a system that turns speech into physical reality. He continued working on the project at the MIT Center for Bits and Atoms (CBA), which Gershenfeld directs, in collaboration with Se Hwan Jeon, a graduate student in the Department of Mechanical Engineering, and CBA's Miana Smith.
The Speech to Reality system starts with speech recognition, which uses a large language model to process the user's request. Next, 3D generative AI creates a digital mesh representation of the object, and a voxelization algorithm breaks the 3D mesh down into assembly components.
Geometric processing then modifies the AI-generated design to account for real-world manufacturing and physical constraints, such as the number of available components, overhangs, and geometric connectivity. The system then creates an executable assembly sequence and automated path planning for the robotic arm, which assembles the physical object from the user's prompt.
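The two stages described above, voxelization and assembly sequencing, can be sketched in code. Everything here is illustrative: the function names, the coarse occupancy grid, and the bottom-up ordering rule are assumptions for the sake of the sketch, not the team's actual implementation.

```python
# Hypothetical sketch of two pipeline stages: voxelization (mesh ->
# modular cubes) and assembly sequencing (an order a robotic arm could
# follow without creating unsupported overhangs).

Voxel = tuple[int, int, int]  # integer grid coordinates of one modular cube

def voxelize(mesh_occupancy, grid: int = 8) -> list[Voxel]:
    """Sample the generated mesh on a coarse grid; each occupied cell
    becomes one modular component in the assembly."""
    return [(x, y, z)
            for x in range(grid)
            for y in range(grid)
            for z in range(grid)
            if mesh_occupancy(x, y, z)]

def assembly_sequence(voxels: list[Voxel]) -> list[Voxel]:
    """Order cubes bottom-up so every cube rests on the table (z == 0)
    or on an already-placed cube -- one simple way to respect the
    overhang constraint mentioned above."""
    placed: set[Voxel] = set()
    order: list[Voxel] = []
    for v in sorted(voxels, key=lambda v: v[2]):  # lowest layer first
        x, y, z = v
        if z == 0 or (x, y, z - 1) in placed:
            placed.add(v)
            order.append(v)
    return order
```

For example, voxelizing a flat 2-by-2 slab yields four cubes, and sequencing a two-cube tower guarantees the bottom cube is placed before the one above it.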
By leveraging natural language, the system makes design and manufacturing accessible to those without expertise in 3D modeling or robot programming. And unlike 3D printing, which can take hours or days, this system assembles an object within minutes.
“This project is an interface between humans, AI and robots to co-create the world around us,” says Kyaw. “Imagine a scenario where you say, ‘I want a chair,’ and within five minutes a physical chair appears in front of you.”
The research team has immediate plans to improve the furniture's load-bearing capacity by changing how the cubes connect, from magnets to more robust connectors.
“We have also developed a pipeline to convert voxel structures into feasible assembly sequences for small distributed mobile robots, which will help translate this work to structures of any size,” says Smith.
Using modular components avoids the waste of creating single-purpose physical objects: an assembly can be taken apart and reassembled into something else. Your sofa, for example, could be rebuilt into a bed when you no longer need it.
Kyaw has prior experience using gesture recognition and augmented reality to interact with robots in manufacturing, so he is now working on incorporating both voice and gesture control into the Speech to Reality system.
Kyaw shares his vision, recalling the replicators from the Star Trek series and the robots from the animated film Big Hero 6.
"We want to enable people to create physical objects quickly, easily, and sustainably," he says. "I'm working toward a future where we can truly control the very nature of matter, where we can generate reality on demand."
The team presented their paper, "Speech to Reality: On-Demand Manufacturing with Natural Language, 3D Generative AI, and Discrete Robotic Assembly," at the Association for Computing Machinery (ACM) Symposium on Computational Fabrication (SCF '25) held at MIT on November 21.
