Datasaur, a leading natural language processing (NLP) platform that helps annotators train AI algorithms, recently announced the closing of a $4 million seed funding round and launched a new feature called Datasaur Dinamic, which allows users to train custom NLP models efficiently. The funding round was led by Initialized Capital with participation from HNVR, Gold House Ventures, and TenOneTen. The round brought Datasaur’s total funding to date to $7.9 million. The latest investment will be used to democratize access to the latest advancements in NLP and LLM technology.
As NLP model training processes and platform capabilities have advanced and converged, proprietary datasets increasingly power the unique capabilities of the resulting models. Datasaur has spent the last four years building an intuitive and efficient platform that enables companies to label their data, transforming raw data into valuable AI datasets.
With Datasaur’s new product, Dinamic, users can take this labeled data one step further and train a custom NLP model with a single click. As more data is labeled, the model automatically learns and becomes more accurate and powerful.
Using a streamlined process, teams can quickly build and iterate on models. Dinamic turns a complex, multi-step process spanning multiple platforms and technologies into a simple two-step process. Companies can now annotate their data based on business requirements and automatically receive a fully trained NLP model, saving millions of dollars in data science costs.
With OpenAI’s president Greg Brockman as an early investor, Datasaur has helped companies such as Spotify, Google, and Qualtrics label a vast array of data, ranging from Word documents to PDFs to audio clips. The platform employs state-of-the-art techniques such as weak supervision and LLM-labeling to save customers up to 80% of their time and costs. Datasaur’s workforce management platform and Conflict Review mode also help teams scale their efforts and apply best practices to identify errors in their training datasets.
Datasaur built the NLP industry’s most efficient data labeling tool and will leverage that foundation to expand into a full-fledged, all-in-one NLP platform. The company’s mission has been to increase accessibility to NLP technologies and support NLP development in international languages for a global audience. Datasaur Dinamic allows non-technical teams to build and develop their own proprietary NLP solutions.
KEY QUOTES:
“I’ve long observed that the primary differentiating factor between NLP models is the underlying training data. We initially founded Datasaur with a focus on the labeling platform because that was the most painful, complex, and time-consuming step in the NLP development cycle. We’ve built a configurable and comprehensive interface for labeling the petabytes of raw text and audio data companies have accumulated. Today we are in a perfect storm between the dizzying advancements in LLM technology alongside renewed vigor from business stakeholders in translating AI into cost savings and accelerated revenue generation. At this key inflection point, we’re excited to accelerate our product development and help our customers tap into the full potential of NLP.”
— Ivan Lee, CEO and founder of Datasaur
“The NLP space is clearly primed for growth. We’re seeing companies in every industry and vertical rushing to discover how to apply ChatGPT-like technology to their own processes. Over the last few years, we’ve been impressed by the Datasaur team’s ability to take complex technical workflows and condense them into an intuitive experience for data scientists and non-technical annotators alike. The current LLM space is highly fragmented and evolving rapidly. Products like Datasaur Dinamic simplify and standardize the process for those new to the NLP space. We saw the potential in the NLP space in 2020 when we first invested in this team, and the time is ripe to capture the rapidly growing market.”
— Brett Gibson, Managing Partner at Initialized Capital