WellSaid Labs: How This Company Is Using AI To Drive Human Parity In Voice

By Amit Chowdhry ● Sep 16, 2024

WellSaid Labs is a leading AI text-to-speech technology company and the first synthetic media service to achieve human parity in voice. Pulse 2.0 interviewed WellSaid Labs co-founder Matt Hocking to learn more about the company.

Formation Of WellSaid Labs

How did the idea for the company come together? Hocking said:

“Since the beginning of my career, I have always wanted to develop businesses with real solutions for people. I have a background in design and technology, which nurtured a long-time interest in the creative fields, and I wanted to find a new way to connect that to technology. I started in the early-stage startup community in New York, working alongside other entrepreneurs trying to get their ideas off the ground.”

“In 2018, I joined The Allen Institute for Artificial Intelligence (AI2) as an Entrepreneur in Residence and was inspired by the ways different founders were making AI research and solutions more practical and scalable. While exploring business ideas at AI2, I was introduced to Michael Petrochuk (WellSaid Labs co-founder), who was working on deep learning research for voice-centric healthcare avatars. Michael’s idea set off a lot of alarm bells for me – in terms of how much time and money this technology could save – in a way that has never been done before. Since then, we’ve pieced together the progression from an AI text-to-speech (TTS) generator to an audio foundation model that not only receives the prompt but renders it in a way comparable to a human voice recording. We wanted to take the concept of a human-quality recording and deliver it in a basic, easy-to-use studio experience so users can audition virtual voice actors to make a recording of their script instantly.”

Core Solutions

What are the company’s core solutions and features? Hocking explained:

“WellSaid Labs offers two main products: WellSaid Studio and API.”

“WellSaid Studio is a creative space where teams across any enterprise organization explore ethically sourced diverse voices tailored to their creative vision. With a spectrum of 110+ voice styles, Studio is designed for creatives, allowing teams to make edits and record retakes on-demand in a shared creative environment. Companies have the ability to get a customized voice avatar as well for their brand needs.”

“The WellSaid API grants brands seamless access to cutting-edge AI voice technology in any product or application. We handle hosting, scaling, and infrastructure upgrades, empowering brands to focus on content creation while ensuring top-tier voice capabilities. Brands can add AI voices to all things digital, creating more natural, yet engaging, voiceover experiences at scale, on-demand, and across the globe.”

“It’s important to note that we work with companies in every sector and many teams within organizations as well. From insurance to media companies, each one can have multiple teams using our voice platform, including marketing, product, and even corporate training teams. These teams are getting real business value from our AI voices, which we are quite proud of. Teams use our platform for things like e-learning content, advertising, video production, voice-guided products and experiences, and narrating wiki docs as well as editorial content, offering high-quality customizable AI voices for producing and editing content at scale and across the world. For corporate training and e-learning, WellSaid voices elevate internal modules to ensure employee engagement and comprehension. Advertising and video production projects benefit from localization, personalization, and increased engagement across a spectrum of content, from marketing clips to extensive cinematic projects. We seamlessly integrate our TTS technology to bring product roadmaps to life for voice-guided products and experiences, as well as cater to the needs of brands and creators for crafting specific narratives.”

Challenges Faced

What challenges have Hocking and the team faced in building the company? Hocking acknowledged:

“The development of AI voice technology has created an entirely new set of obstacles for both developers and consumers. Many organizations are attempting to cash in on the new short-term AI voiceover hype and buzz, but we try to avoid getting caught up in the noise that floods the AI space. At WellSaid Labs, we want to provide a voice for everyone at enterprise organizations, guided by central ethical principles and policies.”

“These principles are represented as Accountability, Transparency, Privacy and Security, and Fairness. Our prioritization of these principles can often delay the development and deployment of our technologies but solidify the safety and security of WellSaid voices and data.”

“Accountability: We maintain strict standards for appropriate content, prohibiting the use of our voices for content that is harmful, hateful, fraudulent, or intended to incite violence. Our Trust & Safety team upholds these standards with a rigorous content moderation program, blocking and removing users who attempt to violate our Terms of Service.”

“Transparency: We require explicit consent before building a synthetic voice with someone’s voice data. Users are not able to upload voice data from politicians, celebrities, or anyone else to create a clone of their voice unless we have that person’s explicit, written consent.”

“Privacy and Security: We protect the identities of our voice actors by using stock images and aliases to represent the synthetic voices. We also encourage them to exercise caution about how and with whom they share their association with WellSaid Labs or other synthetic voice companies to reduce the opportunity for misuse of their voice.”

“Fairness: We compensate all voice actors who provide voice data for our platform, and we provide them with ongoing revenue share for the use of the synthetic voice we build with their data.”

“An additional challenge of building the WellSaid platform was developing specific consent guidelines to ensure that our voice talent is aware and informed. To address this, we seek out collaborative, long-term partnerships and contribute closely to voiceover development to increase our accountability and transparency, as well as user security. We seek partnerships with voice talent from all kinds of backgrounds, organizations, and experiences to ensure that WellSaid Labs’ voice library reflects the voices of its creators and audiences. These processes are designed to be intentional and detail-oriented to ensure our technology is being used safely and ethically.”

Significant Milestones

What have been some of the company’s most significant milestones? Hocking cited:

“One of our most significant recent milestones was releasing Highly Intuitive Naturally Tailored Speech (HINTS), a new model architecture that allows for precise control of model output. Generative AI models have brought about a major shift in content production, but ensuring these models answer users’ specific creative preferences is still a challenge. Until now, the prevailing method for solving this challenge and controlling generative models has been using natural language descriptions, but many artistic preferences are nuanced and difficult to capture within a single prompt. HINTS is a significant breakthrough for synthetic speech and speech synthesis, as well as the field of generative modeling. Our release of HINTS addresses the need for more customizable and creator-focused AI voice tools, which uncover possibilities for expressive, voice-based content across audiobooks, training narrations, marketing materials, and more.”

“Another significant milestone for the company was our partnership with Oxford Languages to improve our Respelling system, which powers our voices and allows users to customize and shape pronunciation. We partnered with Oxford to increase our volume of training data and gain more context around each word, which helps bring our voices closer to human parity. By incorporating Oxford’s data into our Respelling system, we can train our models on a larger and more accurate dataset, a key factor in ensuring users’ Respelling rules are consistently and reliably adhered to. We also included industry-specific terminology in the dataset, in sectors like medicine, where terms can be uncommon and complex. Produced in collaboration with Oxford, our improved Respelling capabilities support our users in approaching technical words and producing a more precise outcome. This partnership was significant for us because TTS models have historically struggled to integrate Respelling systems, due to their lack of precision and consistency. WellSaid Labs’ Respelling capabilities address these challenges and offer full pronunciation control, allowing various preferences to be integrated into the digital content.”

Customer Success Stories

When asked about customer success stories, Hocking highlighted:

“One of our most exciting stories is when we teamed up with NPR’s Planet Money for a three-part series to create an entire podcast produced with AI – from the interview research and questions to the episode script and voiceover. We made an AI voice of Planet Money’s host, Robert Smith, using recordings from previous episodes, training the model from a static and unintelligible mumble to an indistinguishable replica of Smith’s voice. Our proprietary generative AI technology was able to capture the specific qualities that make a human voice unique and achieve that human parity. The collaboration came to fruition to underscore the social, ethical, and commercial responsibility that comes with voice cloning. We focused on educating Planet Money listeners on why and how WellSaid Labs builds AI voices ethically and responsibly, and the importance of explicit consent and ethical considerations during all steps of the process. This collaboration was a significant milestone for our company, but it also made history as the first-ever NPR editorial production made by AI. After its release, the series surpassed its expectations in educating audiences about responsible AI practices, reaching over 500,000 listeners.”

“We have also worked with Five9 to develop custom AI voices for customer support solutions. We created custom voices leveraging vocal data from real humans who consented to provide this data and integrated them into Five9’s Studio platform, allowing businesses to build and manage virtual agents within their contact centers – with exceptional voice quality. This marked a significant advancement in enhancing the customer support experience using AI. As the dynamics of the contact center evolve to accommodate new technologies, the role of TTS becomes increasingly important. We believe an increasing number of requirements will be handled by virtual agents, emphasizing the importance of our advanced and practical TTS solutions.”

Differentiation From The Competition

What differentiates the company from its competition? Hocking affirmed:

“WellSaid Labs is a leading player in the AI voice space in terms of exceptionally high-quality voice performance and is committed to safety, trust, and transparency. The latter is demonstrated through our in-house models trained on ethically sourced data, explicit consent requirements, content moderation program, and commitment to brand protection. WellSaid Labs was founded with ethical principles in mind; we continue to lean on the pillars of Responsible AI – Accountability, Transparency, Privacy and Security, and Fairness – to shape our decisions and designs, which extend to the use of our voices.”

“In addition to these principles, we also strictly respect intellectual property. WellSaid does not claim ownership over any content provided to us by users or voice actors. Our commitment to responsible innovation and developing AI voices with ethics in mind sets us apart from competitors seeking to capitalize on a new, unregulated industry by any unethical means. Our early investments and commitments to ethics, safety, and privacy have established trust between our company, voice actors, and customers, who seek ethically created products and services from companies at the forefront of innovation.”

“At WellSaid Labs, our primary measure of voice quality is human naturalness. Speech perfection is a mechanical concept that leads to a robotically flawless, unnatural output. We train our models on authentic human voice data that we source by partnering with voice actors, recording audio in the studio and forming a long-term partnership with these individuals. Our voice talent reads their scripts authentically and engagingly, and as a result, the AI voices we create perform engaging narrations.”

Additional Thoughts

Any other topics you would like to discuss? Hocking concluded:

“It is clear that AI voices are already impacting the careers of professional voice actors, as demonstrated by numerous union strikes; however, AI voices can serve as a tool for voice actors that can supplement their work and aid their careers. At WellSaid Labs, our voice talent is our partners. Creating WellSaid AI voice avatars requires explicit, informed consent from voice talent. Our partnerships with voice actors mean that they retain full visibility into the projects their voices are used for, as well as full control over their voices, with the option to be removed from the platform if desired.”

“By creating an AI voice, actors can find new opportunities to generate passive income. Voice actors can monetize their expertise and open up additional commercial opportunities without needing to worry about scaling and workflow challenges like technical terminologies and scheduling availability. Additionally, human voice actors play a pivotal role in training the AI models and ensuring that emotional nuances and authenticity are properly conveyed. Human efforts in training AI voice models will not lead to complete actor replacement but rather provide the actors with a virtual version of their own voice. As AI voices continue to advance and become more realistic, companies developing the technology and voice actors must find a balance between innovating and maintaining human creativity and connection in storytelling.”

“We regularly update and add new vocal styles and accents to our avatar library to ensure that WellSaid represents the voices of its community; however, we don’t think AI will completely replace the work of individual voice actors. Our technology supplements voice talent, enabling actors to expand their reach and generate revenue.”
