Cartesia, a company whose state-space models (SSMs) are transforming the next wave of innovation in generative AI, announced $22 million in new funding led by Index Ventures, bringing the total capital raised to $27 million. Also joining in this round are venture funds like A* Capital, Conviction, General Catalyst, Lightspeed, and SV Angel, along with 90 prominent angel investors, including the founders of Abridge, Airtable, Captions, Cognition, Cohere, Databricks, Datadog, Hugging Face, HubSpot, Infinitus, LlamaIndex, Mercury, Mistral, Okta, Perplexity, Pika, Pinterest, Postman, Ramp, RunwayML, Snorkel, Sonos, Together AI, Tripledot Studios, Typeface, Vercel, Weaviate, Weights and Biases, and Zapier.
The new funding will enable Cartesia to expand and accelerate its mission of building real-time, multimodal intelligence available on any device. Cartesia’s SSM technology allows developers to create highly efficient AI applications for various verticals like customer service, sales & marketing, robotics, healthcare, transportation, education, gaming, defense, security, and more.
Created by PhD researchers from Stanford’s AI lab, Cartesia’s SSM architecture offers clear advantages over transformers: SSMs scale linearly with sequence length and enable cheap, high-throughput inference. Although transformers have advanced AI and support many applications today, they scale quadratically in context length, which leads to slower inference. In comparison, Cartesia’s models are highly efficient, with better long-term memory, lower latency, and the ability to run locally on any device.
While transformers attend to every past token, SSMs update the model’s state and discard previous tokens as they stream in, making them the ideal architecture for real-time inference. The widely cited Mamba architecture from Cartesia’s founding team demonstrates that SSMs can already match transformer performance with fewer resources, making them a more efficient and cost-effective alternative for developers building real-time AI applications.
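To make the contrast concrete, here is a minimal Python sketch of the general idea, not Cartesia’s or Mamba’s actual implementation. It assumes a toy diagonal linear SSM with fixed parameters; the names `DiagonalSSM`, `ssm_step_count`, and `attention_pair_count` are hypothetical and chosen only for illustration. Each incoming value updates a fixed-size state and is then discarded, so the work per step stays constant, whereas causal attention compares each new token with every token before it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DiagonalSSM:
    """Toy diagonal linear state-space layer (hypothetical, illustration only)."""

    a: List[float]  # per-channel state decay
    b: List[float]  # per-channel input weight
    c: List[float]  # per-channel output weight
    state: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Start from an all-zero state with one entry per channel.
        if not self.state:
            self.state = [0.0] * len(self.a)

    def step(self, x: float) -> float:
        # h_t = a * h_{t-1} + b * x_t; the input x is not kept afterwards.
        self.state = [a * h + b * x for a, b, h in zip(self.a, self.b, self.state)]
        # y_t = sum over channels of c * h_t
        return sum(c * h for c, h in zip(self.c, self.state))


def ssm_step_count(seq_len: int) -> int:
    """State updates needed to stream seq_len tokens: one per token."""
    return seq_len


def attention_pair_count(seq_len: int) -> int:
    """Token pairs scored by causal attention: token t attends to t tokens."""
    return seq_len * (seq_len + 1) // 2


if __name__ == "__main__":
    model = DiagonalSSM(a=[0.5], b=[1.0], c=[1.0])
    print([model.step(x) for x in [1.0, 1.0, 1.0]])
    for n in (1_000, 10_000):
        print(n, ssm_step_count(n), attention_pair_count(n))
```

In this sketch the memory held between steps is just `state`, whose size does not depend on how many tokens have been seen, while the attention pair count grows roughly with the square of the sequence length.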
Earlier this year, Cartesia released Sonic, its low-latency voice model that generates expressive, lifelike speech, showcasing the power of its SSM architecture for real-time AI use cases. Besides being the fastest text-to-speech model with less than 90 ms latency to first audio, Sonic outperforms the best existing models on the market on voice quality, stability, and accuracy when compared head-to-head in blind human preference tests run by third-party evaluators like Labelbox.
Due to the underlying SSM architecture, Sonic has introduced never-before-seen features, such as an on-device product that can run locally without an internet connection, and advanced controllability features like emotion, speed, and prompting. Developed in just a few months, the Sonic API already supports a variety of real-time use cases—customer service, debt collection, interview screening, voiceovers, and interactive character voices—with hundreds of customers ranging from new startups to public companies.
Sonic is especially suited for a new wave of startups building real-time voice agents. The interactive voice response (IVR) market alone is worth $6 billion and is expected to grow fourfold in the near term due to improvements pioneered by emerging AI models like Sonic. This is just one sliver of Sonic’s current customer base.
Cartesia plans to build on the success of Sonic with a long-term roadmap that includes developing multimodal AI models capable of ingesting and processing different inputs, such as text, audio, video, images, and time-series data, with the goal of creating real-time intelligence that can reason over massive contexts across a wide range of applications.
By making the next wave of foundation models with long-term memory and low latency, Cartesia aims to transform industries ranging from healthcare to robotics to gaming, paving the way for ubiquitous, interactive, and real-time AI available to anyone on any device.
Cartesia is headed by a group of Stanford researchers, including co-founder and CEO Karan Goel; his former labmates Albert Gu (named one of Time’s 100 most influential people in AI), Arjun Desai, and Brandon Yang; and former professor Chris Ré.
Recognized globally for their development of SSMs, the team is situated at the epicenter of a rich ecosystem of talented PhDs and academic partners. Ré’s Stanford lab, in particular, has served as a hotbed of research and has produced multiple billion-dollar startups in recent years, such as SambaNova, Snorkel AI, and Together AI. They’re joined by a diverse and well-rounded product team that brings experience from companies like DoorDash, Salesforce, Meta, Scale AI, Microsoft, Google Brain, and Zoom, ensuring that Cartesia can deliver real-world value to businesses across various industries.
KEY QUOTES:
“It’s well-known that today’s foundation models fall far short of the standard set by human intelligence. Not only do these models lack the depth of understanding that humans possess, they’re slow and computationally expensive in a way that restricts their development and use to only the largest companies. At Cartesia, we believe the next generation of AI requires a phase shift in how we think about model architectures and machine learning. That includes SSMs that bring intelligence directly to the device, where it can operate efficiently, in real-time, without reliance on data centers.”
- Karan Goel, Cartesia’s co-founder and CEO
“Transformers have provided a step-change in model performance and fueled much of the recent AI mania, but given their limitations there is opportunity for a fundamentally new and different architecture to unlock the next wave of AI innovation. We believe Cartesia’s SSMs can be that new architecture, allowing developers to build real-time applications that benefit users on any device. We’re excited to support this team of incredible researchers and engineers who are not only redefining AI performance but also making it more accessible and scalable for businesses of all sizes.”
- Mike Volpi, Partner at Index Ventures