Mati Staniszewski, co-founder and CEO of ElevenLabs, says voice is becoming the next major interface for AI – the way people will increasingly interact with machines as models move beyond text and screens.
Speaking to TechCrunch at Web Summit in Doha, Staniszewski said that speech models like those developed by ElevenLabs have recently gone beyond simply imitating human speech – including emotion and intonation – to work in tandem with the reasoning capabilities of large language models. The result, he claims, is a change in how people interact with technology.
In the coming years, he said, “we hope that all our phones will return to our pockets and we will be able to immerse ourselves in the real world around us, with voice as a mechanism to control technology.”
This vision helped fuel ElevenLabs’ $500 million raise this week at an $11 billion valuation, and it is increasingly shared across the AI industry. OpenAI and Google have both made voice a central part of their next-generation models, while Apple appears to be quietly building voice-adjacent, always-on technologies through acquisitions like Q.ai. As AI spreads to wearables, cars and other new hardware, control becomes less about touching screens and more about speaking, making voice a key battleground for the next phase of AI development.
Seth Pierrepont, general partner at Iconiq Capital, echoed this view on stage at Web Summit, saying that while screens will continue to be important for gaming and entertainment, traditional input methods like keyboards are starting to feel “outdated.”
And as AI systems become more agent-like, Pierrepont said, the interaction itself will also change, with models benefiting from guardrails, integrations and accumulated context that let them respond with less explicit prompting from users.
Staniszewski highlighted this agentic shift as one of the biggest changes taking place. Rather than spelling out every instruction, he said future voice systems will increasingly rely on persistent memory and context built over time, making interactions more natural and requiring less effort from users.
This development, he added, will influence how voice models are deployed. While high-quality audio models largely live in the cloud, Staniszewski said ElevenLabs is working on a hybrid approach combining cloud and on-device processing, one aimed at supporting new hardware, including headphones and other wearables, where voice becomes a constant companion rather than a feature you choose when to engage with.
ElevenLabs already partners with Meta to bring its voice technology to products like Instagram and Horizon Worlds, the company’s virtual reality platform. Staniszewski said he would also be open to working with Meta on its Ray-Ban smart glasses as voice interfaces expand into new form factors.
But as voice becomes more persistent and integrated into everyday hardware, it raises serious concerns about privacy, surveillance, and the amount of personal data voice systems will store as they move closer to users’ daily lives – an area where companies like Google have already faced accusations of abuse.