Inferless is a company that has developed serverless GPU inference technology to scale machine learning inference without the hassle of managing servers, making it easy to deploy complicated and custom models. Pulse 2.0 interviewed Inferless co-founder and CEO Aishwarya Goel to learn more about the company.
Aishwarya Goel’s Background
What is Aishwarya Goel’s background? Goel said:
“When I was in college, I started my entrepreneurial journey with a project to help young kids learn science. That fueled my passion for the art and science of building companies.”
“As a repeat founder, I have more than a decade of experience in the tech space. Prior to establishing Inferless, I built a foodtech company on my own and sold it in an all-cash deal. I then worked in the fintech space for more than five years and was an early member of the business team at PhonePe, one of the largest fintech unicorns in Asia.”
Formation Of The Company
How did the idea for the company come together? Goel shared:
“Several years ago, when I was building an AI-based digital coaching platform to help managers become leaders, there were a number of deployment challenges: developer experience, cost and time. While we were able to scale the service arm to $900K ARR and acquire more than 35 customers, we couldn’t find product-market fit. That’s when my co-founder, Nilesh Agarwal, and I pivoted to create Inferless. His extensive machine learning infrastructure background gave him insight into how to solve these issues. Together, we’re building Inferless to make deploying models easy for developers.”
Favorite Memory
What has been your favorite memory working for the company so far? Goel reflected:
“Since day one, we have operated as a very lean team. We’re a group of individual contributors focused on our specific goals, and I enjoy seeing how our small, focused team can do wonders when we’re all aligned and can act quickly on things without too many decision layers.”
“For example, every few months, we conduct internal hackathons to build tools and solutions for the challenges reported by our customers. In fact, some of our best features have come out of these hackathons: debugging logs, reducing model import time and a quick deploy feature. We even launched AI-based error-fixing suggestions within user emails that helped us reduce model import time to five minutes.”
Core Products
What are the company’s core products and features? Goel explained:
“Our core product is a serverless GPU platform that helps developers deploy machine learning models.”
“Deploying AI models comes with a number of issues that slow developers down and impact inference performance: low GPU utilization, long cold-start times and complex deployment workflows. Inferless’ infrastructure-aware load balancer increases GPU utilization by up to 100 percent, and our infrastructure platform cuts cold-start times to mere seconds by enabling containers to run efficiently, ensuring seamless, high-performance AI inference at scale.”
“Features include NFS-like volumes, custom runtimes, advanced monitoring, full CI/CD integration, dynamic batching and remote code execution.”
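To make the “deploy a model as a serverless endpoint” workflow concrete, here is a minimal sketch of the kind of entry point such platforms typically expect: a class that loads the model once per container start (the cold-start cost the platform tries to minimize) and then serves individual requests while warm. The ModelHandler class and its method names are illustrative assumptions, not Inferless’s actual API.

```python
# Illustrative sketch only: ModelHandler is a hypothetical stand-in for the
# kind of handler serverless GPU platforms expect; it is not Inferless's API.
from transformers import pipeline


class ModelHandler:
    def initialize(self):
        # Runs once per container start; this is the "cold start" cost,
        # dominated by downloading and loading model weights.
        self.generator = pipeline("text-generation", model="gpt2")

    def infer(self, inputs: dict) -> dict:
        # Runs per request once the container is warm.
        prompt = inputs["prompt"]
        output = self.generator(prompt, max_new_tokens=50)
        return {"generated_text": output[0]["generated_text"]}

    def finalize(self):
        # Runs when the platform scales the container back down to zero.
        self.generator = None


if __name__ == "__main__":
    handler = ModelHandler()
    handler.initialize()
    print(handler.infer({"prompt": "Serverless GPU inference"}))
```

The split between initialize() and infer() is what makes scale-to-zero economics work: weights load once per container rather than once per request, so the platform only pays the cold-start penalty when it spins up a new replica.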
Challenges Faced
What challenges have Goel and the team faced in building the company? Goel acknowledged:
“From the outside, the serverless inference space looks really competitive; plenty of startups have raised a lot of money, but very few are focused on actually solving hard technical issues around model loading time, developer experience and autoscaling.”
“Developers like to try tools before they fully move workloads, but because they often have bad scaling experiences with young AI infrastructure companies, it’s difficult to convince them to try an alternative. But once they try Inferless, they are hooked. One client, in particular, was able to scale from zero to hundreds of GPUs with consistent cold starts, ultimately providing delightful experiences to their end-users.”
Evolution Of The Company’s Technology
How has the company’s technology evolved since launching? Goel noted:
“From the very early days, our focus was on solving cold-start issues and providing a better developer experience. But getting there wasn’t a linear process. We had to go through multiple iterations and reevaluate our architecture choices based on customer feedback, performance issues, model import time and process, integrations and managing timeouts, for example.”
“But after all that iteration, we’ve finally built something special; in the last few weeks, we’ve onboarded more new users than we did over the course of an entire year! Also, the model import process has improved significantly, with consistent cold starts and seamless scaling. It’s a great sign that developers are using our platform without any hassles.”
Significant Milestones
What have been some of the company’s most significant milestones? Goel cited:
“We recently had our first Product Hunt launch and ranked number one, which resulted in an overwhelming response from users. In addition, we’ve signed more than two hundred new users since then.”
Customer Success Stories
When asked about customer success stories, Goel highlighted:
“Cleanlab, which helps enterprises clean data and labels by automatically detecting issues in an ML dataset, launched its Trustworthy Language Model to add a trustworthiness score to every LLM response. It’s designed for high-quality outputs and enhanced reliability, which is critical for enterprise applications to prevent unchecked hallucinations.”
“However, the company experienced increased GPU costs due to GPUs running even when they weren’t actively being used. Cleanlab’s problems were typical for traditional cloud GPU providers: high latency, inefficient cost management and a complex environment to manage.”
“With serverless inference from Inferless, the company cut costs by 90 percent while maintaining performance levels. More importantly, they went live within two weeks with no additional engineering overhead costs.”
Total Addressable Market
What total addressable market (TAM) size is the company pursuing? Goel assessed:
“The AI inference market is evolving and growing very quickly, and serverless GPU inference is emerging as a critical segment of it. The overall AI inference market was valued at $15.8 billion in 2023 and is projected to reach $90.6 billion by 2030, a CAGR of over 22 percent, according to Verified Market Research.”
Differentiation From The Competition
What differentiates the company from its competition? Goel affirmed:
“We compete with major hyperscalers like AWS and Azure as well as with young startups. Our key focus is providing extremely low model loading times with consistent autoscaling so developers can run their workflows in a true serverless manner. A great developer experience is very important to us; we want it to be super-easy to deploy models.”
Future Company Goals
What are some of the future company goals? Goel concluded:
“In the next few quarters, we are looking to go upmarket with enterprises to help them deploy models in their own cloud. We’re also strengthening our core serverless inference offering with support for more kinds of GPUs, new features, enterprise offerings and more.”