Refuel AI – a platform to automate dataset creation and labeling for every business and every use case – recently announced that it has exited stealth with $5.2 million in seed funding led by General Catalyst and XYZ Ventures.
Angel investors who have been executives and leaders at OpenAI, Slack, Meta, Datadog, Upstart, and Red Hat also joined. The funding will be used to grow the team and launch the platform's capabilities.
The company was launched by Stanford grads Nihit Desai and Rishabh Bhargava. In one of Bhargava’s previous roles, he had to train dozens of NLP classifiers, and everyone at the company (from engineers to senior directors) spent weeks just labeling data. The process was slow and painful, and the final labels were not of high quality either, because it is difficult to share context and write accurate labeling guidelines.
Building new AI use cases requires data collection and annotation, a process that often takes weeks before a team can even train its first model. Every machine learning team has felt the pain of waiting in a perpetual cycle for enough labeled data.
Desai led ML teams for content integrity at Meta. Imagine the massive volume of content uploaded to Meta every day. Labeling just 0.01% of this data requires an army of hundreds of people. However, this labeling work is critical to building performant and adaptive AI models for flagging and taking down harmful content.
“With the rise of LLMs, the need for labeling has gone up dramatically – whether you look at the large number of contractors hired by OpenAI or the months-long effort from Databricks employees to label data for Dolly 2.0. We need better solutions to make AI work for everyone,” said Bhargava in a statement. “With the immense costs and time constraints involved in data labeling, it’s no wonder that for every AI project that is successful, there are hundreds that don’t even get off the ground. This is the problem that we will tackle first.”
Using LLMs, the company’s users are able to create large, clean, and diverse datasets in less than an hour instead of weeks. Along with the platform it is building, the company is also releasing Autolabel, an open-source library for labeling data using LLMs. Early users and internal benchmarks show a 25-100x speedup for dataset creation and labeling, with accuracy on par with or better than that of human labelers.
Once clean data is no longer a bottleneck, every company will be able to train its own models and LLMs. For the company, this is the first step towards its mission of ushering in the era of AI abundance. Prior to Refuel, Desai and Bhargava spent more than a decade building large-scale AI and data systems at Meta, Primer.ai, LinkedIn, Cloudera, and Stanford. The early engineering team has previously worked at companies like Amazon, Apple, Google DeepMind, Lyft, and Uber.