AssemblyAI: $50 Million Raised To Build Superhuman Speech AI Models

By Amit Chowdhry ● Dec 6, 2023

AssemblyAI recently announced it raised $50 million in Series C funding in a round led by Accel, which also led the company's Series A round, with participation from Keith Block and Smith Point Capital, Insight Partners, Daniel Gross and Nat Friedman, and Y Combinator.

AssemblyAI founder and CEO Dylan Fox noted that this brings AssemblyAI’s total funds raised to $115 million — 90% of which the company raised in the last 22 months. This funding round comes at a time when organizations across virtually every industry have raced to embed Speech AI capabilities into their products, systems, and workflows.

In the last six months, AssemblyAI has also been hard at work on its next-generation Universal model, a new state-of-the-art solution for several multilingual Speech AI tasks. The model is being trained on more than 10 million hours of voice data (roughly 1 petabyte) using Google's new TPU chips, a 1,250x increase in training data compared to the first model AssemblyAI made available back in 2019.

Incredibly capable LLMs now exist that can ingest accurately recognized speech and generate summaries, insights, takeaways, and classifications, enabling entirely new products and workflows to be built on voice data for the first time. This LLM technology underpins the company's popular Audio Intelligence models, such as Auto Chapters and Content Moderation, which power brand safety and content moderation workloads at scale for leading enterprise companies, as well as its latest product, LeMUR, which can perform text generation tasks over recognized speech.
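For readers curious what this looks like in practice, here is a minimal sketch using AssemblyAI's Python SDK. The API key, audio URL, and prompt are placeholders, and the exact SDK surface may have changed since this article was written; treat it as an illustration rather than a definitive reference.

    # Sketch: transcription with Audio Intelligence features, then a LeMUR task.
    # Placeholder key and URL; SDK details may differ from current releases.
    import assemblyai as aai

    aai.settings.api_key = "YOUR_API_KEY"  # placeholder

    # Enable Auto Chapters and Content Moderation alongside speech-to-text
    config = aai.TranscriptionConfig(auto_chapters=True, content_safety=True)
    transcript = aai.Transcriber().transcribe("https://example.com/meeting.mp3", config)

    # Chapter summaries generated over the recognized speech
    for chapter in transcript.chapters:
        print(chapter.headline)

    # LeMUR: a text generation task over the transcript
    result = transcript.lemur.task(prompt="Summarize the key takeaways from this meeting.")
    print(result.response)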

The combination of these new capabilities has enabled thousands of fast-growing organizations to build powerful Speech AI features into their products and workflows on top of the company's models. AssemblyAI now regularly serves over 25 million inference calls and processes over 10 TB of voice data daily through its API for customers that include industry-leading startups like Fireflies.ai, Veed, Typeform, Close, Loop Media, and CallRail. And with more than 10,000 new organizations signing up for the API every month, the company believes we are just scratching the surface of the new voice-powered AI applications that will enter the market over the next year.

This new funding will support the company's ambitious research plans, new model development, training compute, and market expansion, as well as help build out the team. AssemblyAI believes the best way to continue innovating is to bring together some of the best minds in AI, and it points to an impressive roster of research leaders and scientists from DeepMind, Microsoft, Google, Amazon, and Meta who have joined over the past year.

KEY QUOTES:

“We founded AssemblyAI with the vision of creating superhuman Speech AI models that would unlock an entirely new class of AI applications to be built leveraging voice data. There is a tremendous amount of information embedded within human speech. Think of all the knowledge that exists within a company’s virtual meetings, for example. Or podcast and video data on the internet, phone calls into a small business or large contact centers, or even the ability to interact with machines using your voice. Being able to accurately understand, interpret, and build on top of voice data unlocks a tremendous amount of new opportunities for organizations across every industry.”

“Over the past two years, we’ve seen the combination of bigger datasets, better compute, and new neural network architectures like the Transformer make possible the significant advancement of AI models across nearly every modality — and make our vision of building superhuman Speech AI models more achievable than ever before.”

“Take our latest Conformer-2 model, for example. This model was trained on 1.1 million hours of voice data, and achieved industry-leading accuracy and robustness for various tasks like speech-to-text and speaker identification upon release. Conformer-2 makes up to 43% fewer errors on noisy data compared to other models and demonstrated a nearly 50% accuracy improvement compared to our previous generation of models. This greater level of accuracy and capability has helped our customers like Fireflies.ai to offer far more useful and reliable AI-meeting notes to their millions of users.”

— Dylan Fox
