While everyone's training robot brains on YouTube videos, DAIMON just released 10,000 hours of what it feels like to actually touch things.

The Summary

  • DAIMON Robotics released Daimon-Infinity, what they call the largest omni-modal robotic dataset for physical AI, featuring high-resolution tactile sensing data across tasks from laundry folding to factory assembly
  • The Hong Kong startup's tactile sensor packs 110,000+ sensing units into a fingertip-sized module, and they're open-sourcing 10,000 hours of touch data to accelerate embodied AI deployment
  • Co-founder Michael Yu Wang pioneered the Vision-Tactile-Language-Action (VTLA) architecture, which elevates touch to the same status as vision in robot learning, a direct challenge to the Vision-Language-Action models most of the field is using

The Signal

The robot manipulation problem has always been a data problem disguised as a hardware problem. You can watch a thousand hours of humans folding shirts on video, but that doesn't teach a robot what fabric tension feels like or how much pressure breaks an egg. DAIMON's release of Daimon-Infinity is one of the first serious attempts to fix this at dataset scale.

Their core insight: Vision-Language-Action models are fundamentally incomplete for physical tasks. You can't learn to fold a shirt, assemble electronics, or handle fragile objects from vision alone—humans close their eyes and still know what they're touching. DAIMON's Vision-Tactile-Language-Action architecture treats touch as a first-class citizen alongside vision, which sounds obvious until you realize almost nobody else is doing it at scale.
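
To make the distinction concrete, here is a minimal Python sketch of what treating touch as a first-class input could look like. It is illustrative only: the `Observation` class, the `fuse` function, and the feature sizes are hypothetical and are not DAIMON's data format or model code. The point is simply that the tactile features enter the policy input next to the vision and language features instead of being left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    """One timestep of hypothetical multimodal input (all fields are illustrative)."""
    vision: List[float]    # stand-in for an image embedding
    tactile: List[float]   # stand-in for a fingertip pressure-map embedding
    language: List[float]  # stand-in for an instruction embedding


def fuse(obs: Observation, use_tactile: bool = True) -> List[float]:
    """Concatenate modality features into a single policy input vector.

    With use_tactile=False this mimics a vision-language-only input;
    with use_tactile=True touch sits alongside the other modalities.
    """
    parts = [obs.vision, obs.language]
    if use_tactile:
        # Order becomes vision, tactile, language.
        parts.insert(1, obs.tactile)
    return [x for part in parts for x in part]


obs = Observation(vision=[0.2, 0.7], tactile=[0.9, 0.1, 0.4], language=[1.0])
vla_input = fuse(obs, use_tactile=False)  # vision + language only
vtla_input = fuse(obs)                    # vision + tactile + language
print(len(vla_input), len(vtla_input))
```

Real systems typically use learned encoders and attention-based fusion rather than plain concatenation; concatenation is used here only to keep the sketch short.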

"The company has open-sourced 10,000 hours of data and can generate millions of hours annually through its distributed collection network."

The numbers reveal the ambition. Most robotics datasets are measured in hundreds of hours. DAIMON's sensor hardware, with 110,000+ sensing units per fingertip, generates orders of magnitude more tactile information than previous solutions. The company says its distributed data collection network can produce millions of hours yearly, which would put it on a trajectory to create tactile datasets comparable in scale to the vision datasets that powered the current AI boom.

What makes this interesting is the partner list: Google DeepMind, Northwestern, National University of Singapore. These aren't PR partnerships. DeepMind has been vocal about embodied AI being the next frontier. Northwestern has serious robotics pedigree. The collaboration suggests DAIMON's approach has research credibility beyond just hardware specs.

The competitive angle: DAIMON is a two-and-a-half-year-old startup going after a problem the big foundation model companies haven't solved. The large AI labs are racing toward general-purpose robots, but their training regimes are vision-heavy. If tactile sensing turns out to be non-optional for real-world manipulation (which physics suggests it is), DAIMON has a serious moat.

The Implication

Open-sourcing 10,000 hours of tactile data is a land grab for mindshare. DAIMON wants their sensors to become the standard input device for the next generation of manipulation models—the same way cameras became the default for vision models. If researchers and companies start building on DAIMON's data format and sensor specifications, the company wins even if competitors copy the hardware.

For anyone building physical AI agents—warehouse automation, manufacturing, home robotics—the question is whether your training data includes what things actually feel like. If not, you're training one-handed.

Sources

IEEE Spectrum AI