Modulate: Interview With Co-Founder & CEO Mike Pappas About The Voice Intelligence Platform

Modulate develops AI-powered voice intelligence and moderation platforms, like ToxMod, that analyze the nuance and tone of human voices to detect toxicity and promote safe, inclusive online spaces. Pulse 2.0 interviewed Modulate co-founder and CEO Mike Pappas to learn more.

Mike Pappas’ Background

Could you tell me more about your background? Pappas said:

“I’m an MIT-trained mathematician and physicist whose first two jobs out of school were in cloud computing and then machine learning. So obviously I run Modulate’s business team today! That forced me to rapidly learn the basics of marketing, sales, and customer success (while hiring an amazing team), but also allowed me to connect the dots between different parts of my background.”

“At Bridgewater Associates, where I was a cloud engineer, I also experienced a unique culture of feedback that helps me think about effective leadership today. At Lola Travel, working under serial entrepreneur Paul English (co-founder of Kayak.com), I wasn’t just an ML engineer but also got to watch a startup grow from the inside, and was the engineer spending the most time with our customer support team. And today, while our amazing team drives the day to day, I bring together my technical background with my business acumen to act as a thought leader and guide for our customers and other industry leaders trying to understand how AI is evolving, and how to use it safely and predictably.”

Formation Of The Company

How did the idea for the company come together? Pappas shared:

“I met Carter in college, and it didn’t take long for us to realize we wanted to build a company together. Our first interaction was a little awkward: I spotted someone working through an interesting physics problem on a whiteboard and, as a freshman, ended up hovering over what turned out to be Carter’s shoulder for a few minutes. Eventually I noticed a missing step in the solution and pointed it out. After that, we became fast friends. We spent much of our undergraduate years experimenting with different company ideas (a social media platform, a scheduling app, a robotic delivery system, and plenty more), but none of them really stuck. It wasn’t until after we graduated that inspiration finally struck. Carter went on to work at NASA’s Jet Propulsion Lab, where he was exposed to cutting-edge research on highly efficient machine learning models. Through him, I got pulled into what started as a hobbyist interest in audio AI systems, which eventually led to our first real breakthrough: a neural network capable of real-time voice changing, long before the term ‘deepfakes’ was even part of the conversation.”

“We brought that technology to a number of different industries, but gaming showed the most enthusiasm. What stood out wasn’t the novelty, it was the potential to protect players from toxicity based on age or gender. That insight pushed us deeper into content moderation and ultimately led to ToxMod, which now powers voice moderation for games like Call of Duty, Grand Theft Auto, and many others. The same core AI innovations behind ToxMod also unlocked broader conversational analysis. Since then, we’ve expanded the technology to support Fortune 500 enterprises across additional use cases, including preventing voice-based fraud, monitoring customer experiences, and reducing hallucinations in voice AI systems.”

Favorite Memory

What has been your favorite memory working for the company so far? Pappas reflected:

“About a year into Modulate’s history, when the team was only about 6 people, Carter and I needed to attend some meetings in San Francisco and made the mistake of booking flights for a day trip from Boston – a red-eye in, meetings through the day, and a red-eye back out. Arriving back in Boston the next morning, both of us were exhausted, but, being founders, we were committed to getting into the office to get some work done. We arrived at the office, and just as we were about to open the door, Carter stopped me and pointed through the window inside – where we could see our colleagues energetically talking through an AI architecture problem at the whiteboard. It was the first time it truly struck me that Modulate had become bigger than Carter and me, and that the energy and potential for what we were building were limitless.”

Core Products

What are the company’s core products and features? Pappas explained:

“Modulate is the pioneer in the next generation of more efficient AI infrastructure. Our core product is centered on this groundbreaking new architecture we’ve introduced to the industry: Ensemble Listening Models (ELMs).”

While most companies rely on pipelines that feed automatic speech recognition (ASR) transcripts into an LLM, an approach that is slow, expensive, and prone to hallucination, Modulate’s ELM-based stack is purpose-built for real-world voice intelligence. It delivers faster, more cost-efficient, and more reliable analysis than text-only LLM systems, unlocking insights that traditional models simply cannot access.

The release of ELMs represents a fundamental shift in the industry. Where LLMs struggle with subtle audio cues, hallucination risk, and high compute costs, our Ensemble Listening Model is:

Faster: ELMs run specialized components optimized for streaming audio, dramatically reducing inference costs and latency.
More Cost Efficient: By replacing general-purpose LLMs with smaller, task-specific detectors like Modulate’s, enterprises can scale voice intelligence across every call without incurring LLM-scale compute bills.
More Accurate and More Reliable: The Modulate platform processes acoustic, behavioral, and contextual signals that LLMs never see, unlocking superior performance in fraud detection, escalation prediction, emotion understanding, and safety monitoring.
More Explainable: With time-aligned model outputs and the Conversation Fingerprint, customers get a level of transparency impossible with end-to-end black-box systems.

Our flagship product is Modulate’s implementation of an Ensemble Listening Model. Unlike monolithic LLMs, our platform:

Runs dozens of specialized acoustic, semantic, behavioral, and risk-focused detectors in parallel
Operates in real time or batch mode, making it viable for both live fraud detection and post-call CX analytics
Produces high-precision, time-aligned signals across emotion, frustration, urgency, fraud patterns, AI-speech detection, safety violations, escalation risk, and more
Offers transparent, auditable reasoning, enabling teams to understand why a prediction was made—critical in regulated or high-stakes environments

This architecture is not just more accurate than text-only LLM pipelines—it is inherently more efficient, since each model is specialized, lightweight, and optimized for the type of signal it analyzes. Customers gain LLM-level sophistication at a fraction of the operational cost, with significantly lower hallucination rates.
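
To make the ensemble idea concrete, here is a minimal Python sketch of several small detectors running in parallel over segments of a call and returning time-aligned signals, each carrying a human-readable reason. It is an illustration only and not Modulate's implementation or API: the names Segment, Signal, frustration_detector, urgency_detector, and run_ensemble are hypothetical, as are the loudness feature, the 0.8 threshold, and the phrase list.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Segment:
    """One slice of a call: start/end in seconds plus toy features (all hypothetical)."""
    start: float
    end: float
    text: str
    loudness: float  # stand-in acoustic feature in [0, 1]


@dataclass(frozen=True)
class Signal:
    """A time-aligned detector output that can be traced back to the audio."""
    detector: str
    start: float
    end: float
    score: float
    reason: str


Detector = Callable[[Segment], List[Signal]]


def frustration_detector(seg: Segment) -> List[Signal]:
    # Toy acoustic rule (assumption): treat loud segments as frustration.
    if seg.loudness > 0.8:
        return [Signal("frustration", seg.start, seg.end, seg.loudness, "loudness above 0.8")]
    return []


def urgency_detector(seg: Segment) -> List[Signal]:
    # Toy semantic rule (assumption): flag pressure phrases in the transcript.
    phrases = ("right now", "immediately")
    hits = [p for p in phrases if p in seg.text.lower()]
    if hits:
        return [Signal("urgency", seg.start, seg.end, 1.0, f"matched phrase: {hits[0]}")]
    return []


def run_ensemble(segments: List[Segment], detectors: List[Detector]) -> List[Signal]:
    """Run every detector on every segment in parallel; return signals ordered by time."""
    jobs = [(det, seg) for seg in segments for det in detectors]
    with ThreadPoolExecutor() as pool:
        batches = pool.map(lambda job: job[0](job[1]), jobs)
        signals = [sig for batch in batches for sig in batch]
    return sorted(signals, key=lambda s: (s.start, s.detector))


call = [
    Segment(0.0, 4.0, "Hi, I have a question about my bill.", 0.3),
    Segment(4.0, 9.0, "I need this transferred RIGHT NOW.", 0.9),
]
signals = run_ensemble(call, [frustration_detector, urgency_detector])
for sig in signals:
    print(f"{sig.start:>4.1f}-{sig.end:<4.1f} {sig.detector:<12} {sig.score:.2f}  {sig.reason}")
```

Because each signal records its own time span and reason, a reviewer can jump to the exact stretch of audio behind a flag, and changing what the system looks for amounts to passing a different list of detectors rather than retraining a single large model.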

Key features include:

Multi-layer visualization of emotion, fraud cues, safety signals, and conversational structure
Click-to-audio alignment for rapid audit and QA workflows
Configurable signal layers for domain-specific use cases
APIs for batch and streaming call analysis
Configurable behavior packs for fraud, CX, trust & safety, and AI-agent governance

For analysts, supervisors, fraud teams, and product leaders, Modulate transforms unseen dynamics into actionable insight. It’s a leap forward in interpretability, one that aligns with Modulate’s commitment to transparent AI.

Challenges Faced

Have you faced any challenges in your sector of work recently? Pappas acknowledged:

“Deepfakes and Synthetic Voices

A major challenge has been the rise of sophisticated audio deepfakes (both text-to-speech and speech-to-speech) and synthetic voices used for fraud. At Modulate, our technology overcomes this by combining passive voice ID, far more difficult to spoof than active passphrases, with context-aware analysis that evaluates how a person speaks, responds, and interacts. This layer of conversational voice intelligence helps us reliably differentiate a fraudster from a legitimate customer who may be using a synthetic voice aid.”

AI Distrust

“While AI is, in concept, extremely exciting to most enterprises, many have attempted trials and found challenges actually getting reliable and clear results. At first, this sense of skepticism impacted Modulate as well, but it’s since actually become a moat for us. Given Modulate’s history in content moderation and the unique AI architecture we use, one of our greatest strengths is the absence of issues like hallucinations, and the fact that our analyses are actually transparent and can be clearly understood by businesses looking to build trust in their tools. As such, we’ve found that our pitch has grown more and more resonant with prospective customers as we’ve made these features clearer.”

Evolution Of The Company’s Technology

How has the company’s technology evolved since launching? Pappas noted:

“At Modulate, our technology has evolved from an early focus on gaming to a sophisticated platform for voice understanding in any application. The first unique model we created, way back in 2015, was real-time voice changing tech; we then repurposed many of the core insights into emotion-, nuance-, and context-understanding tools that we’ve kept on the leading edge through today. The first application for this tech was in some sense the hardest – real-time content moderation in online games, rich with friendly banter that should NOT be punished, complex emotions and jargon, and rapid transitions. After proving our chops with some of the largest titles in the world, including Call of Duty and Grand Theft Auto, we then expanded to support enterprises on their own concerns, including identifying social engineering and deepfake fraud, monitoring AI agents for hallucinations or problematic behavior, and much more.”

Significant Milestones

What have been some of the company’s most significant milestones? Pappas cited:

“Scale of Analysis

Modulate has processed over 300 million hours of audio and protected nearly 415 million people across 18 languages. Of course, our first large-scale deployment with Call of Duty was particularly exciting, as was our first Fortune 500 enterprise deployment in 2024.

Synthetic Voice Detection

In a time when many providers are exiting the space, no longer confident they can deliver reliable synthetic voice detection, Modulate was able to develop our first-generation model in less than two months – and immediately shot to the top of the leaderboards. Not only is this a hugely valuable feature, but it also underscores the value of the data and expertise we’ve developed over time.

Launch of Modulate Platform

Modulate launched a category-defining advancement that unified all of the company’s conversational voice-intelligence capabilities into a single engine designed to deliver precise, real-time answers to critical business questions.”

Total Addressable Market (TAM)

What total addressable market (TAM) size is the company pursuing? Pappas assessed:

“Modulate’s goal is to augment every digital conversation. That means helping individuals express themselves and understand each other, helping platforms monitor for fraud and abuse, helping human and AI agents better support their customers, and of course helping AIs in particular engage more intuitively, with higher EQ, with people in every setting. We tend to think more about the impact we’re having on a per-person or per-conversation basis, but if you try to translate that into economic impact, there’s no question the TAM exceeds hundreds of billions.”

Differentiation From The Competition

What differentiates the company from its competition? Pappas affirmed:

“Unique AI Architecture

Rather than a black-box AI model, Modulate uses hundreds of component models which are woven together dynamically depending on the customer’s needs. This allows customers to understand Modulate’s logic when we claim e.g. a conversation is fraudulent (we can highlight exactly what we saw), and makes it trivial for customers to change what they are looking for (e.g. we’re good on fraud for now, let’s focus on negative customer experience) without requiring retraining or delicate prompting.

Voice-Native Analysis

Where many competitors rely heavily on transcription or keyword filtering, our models analyze acoustic cues, emotional tone, conversational dynamics, and intent, enabling far more complete understanding of each conversation. This emphasis on nuance has been proven in arguably the toughest arena of all – catching the minor differences between friendly banter and harmful attacks in competitive online games.

Built for Speed, Scale, and Accuracy

Modulate’s systems power hundreds of millions of conversations today, escalating important events in seconds so the platform can react before the call completes. By focusing precisely on conversational understanding and building a unique “triage”-based architecture, Modulate is able to deliver all these insights with industry-leading accuracy while being orders of magnitude less expensive than LLMs or other AI solutions.

Data Quality

We train our models on noisy, real-world conversational audio, including mumbling, overlapping voices, and background noise, rather than synthetic datasets. This gives the technology a far better understanding of how people actually speak in live environments, making it significantly more accurate and reliable in real-use scenarios. Combined with technical rigor, real-time performance, and ability to interpret the subtle nuances of human speech, this approach allows our technology to solve challenges competitors often oversimplify or fail to address, ultimately positioning us as a leader in voice intelligence.”
