The gig economy is about to teach robots what humans already know: context is everything.
The Summary
- Spark Capital's Nabeel Hyatt says physical AI needs real-world training data diversity that the internet can't provide, and gig workers may become the collectors
- Instawork, a gig-work platform, launched a robotics division and built Instacore, a data-collection device targeting the "100,000-year problem" of robot training data
- The shift from software AI (trained on scraped internet data) to physical AI (trained on messy real-world environments) is creating new job categories in the gig economy
The Signal
Hyatt's thesis is simple but consequential: robots can't learn from Reddit threads. They need video of actual humans doing actual work in actual environments. Kitchens in New York hotels don't look like catering facilities in Houston. Warehouse layouts vary. Even the way someone chops an onion changes by context. The internet gave us abundant text and images, but physical AI faces what researchers call the "100,000-year problem": gathering enough diverse real-world training data for robots to generalize across environments.
This is where Instawork's pivot gets interesting. The company built its business connecting gig workers to short-term shifts. Warehouses, kitchens, events. Now it's building Instacore, a data-collection device designed to capture the kind of granular, contextual footage robot companies desperately need. The workers already on-site doing the work become the data collectors. They're not being replaced by robots yet. They're teaching them.
"The real world is messy. There's no standardization on what 'good' video data even looks like yet."
Hyatt sits on Instawork's board. He was an early backer of Discord and Cruise, companies that reshaped online communities and autonomous driving, respectively. His pattern recognition here matters. He's watching gig platforms evolve from labor marketplaces into data infrastructure for the agent economy. The shift isn't subtle:
- Gig workers move from pure task execution to task execution plus data generation
- Platforms that own access to diverse real-world work environments become critical AI infrastructure
- The "training data supply chain" becomes a new category, sitting between human labor and robot deployment
The economic logic is straightforward. OpenAI and other foundation model builders scraped the internet for free. Physical AI companies can't do that. Real-world data requires real people in real places capturing real variance. That costs money. It also creates a new job category that didn't exist 18 months ago.
The Implication
If Hyatt and Instawork are right, gig platforms become dual-purpose: labor supply today, training data supply tomorrow. Workers who understand they're contributing to both streams have leverage. They're not just doing a shift. They're generating the dataset that might automate their role in three years. That's not dystopian if the platforms share upside. It's dystopian if they don't.
Watch how Instawork and similar platforms structure compensation for data contribution. If data becomes a more valuable output than labor, the pricing should reflect that. If it doesn't, someone will build a worker-owned alternative that does.