Unstructured: LLM Data Preprocessing Solutions Company Raises $25 Million

By Dan Anderson • Aug 3, 2023

Unstructured recently announced the closing of its Seed and Series A funding rounds, raising $25 million. The Series A round was led by Madrona, with participation from the seed lead, Bain Capital Ventures, as well as M12 Ventures, Mango Capital, MongoDB Ventures, and Shield Capital. Notable angel investors Harrison Chase of LangChain, Bob van Luijt of Weaviate, and Josh Lefkowitz of Flashpoint also participated. As part of the financing, Madrona Managing Director Karan Mehandru and Bain Capital Ventures Partner Enrique Salem joined the board of directors.

Unstructured has rapidly emerged as a leader in data transformation, making it easy for enterprises to use their natural language data in conjunction with large language models (LLMs), regardless of file type, document layout, or location. Accessing and transforming this data matters because over 80% of enterprise data resides in documents and other unstructured files. With over 700,000 downloads and integrations in more than 2,400 GitHub repos, Unstructured has established itself as a leading provider of LLM data preprocessing solutions, enabling organizations to leverage their unstructured data with previously unimaginable speed and ease.

The company also released a major product update: a single API that further accelerates users’ ability to leverage their natural language data in conjunction with LLMs. Users can now point any file containing natural language at Unstructured’s API and receive back data in a format ready for vector databases, LangChain, and LLMs. The company has also introduced more than 15 production-grade data connectors, making it possible to connect to natural language data wherever it is stored. Enterprises can use these connectors to build a data pipeline that can be continuously updated. In the last six months, over $40 billion has been invested in AI startups, and many of these companies are building solutions on top of LLMs. The introduction of Unstructured’s API and data connectors will further accelerate companies’ ability to connect, transform, and stage data for use with LLMs.

Unstructured developed its technology in partnership with the open-source community, commercial enterprises, and select U.S. Government defense and intelligence organizations. The company has been awarded one Phase I and two Phase II Small Business Innovation Research (SBIR) contracts by the U.S. Air Force and U.S. Space Force. U.S. Special Operations Command (SOCOM) established a Cooperative Research and Development Agreement with Unstructured and has served as a critical design partner since the company’s infancy. This past winter, Unstructured partnered with SOCOM to help deploy the first use of an LLM on a stand-alone system and in conjunction with mission-relevant data.

The company also welcomed retired General Michael Groen to its advisory board. Groen is the former Director of the Joint Artificial Intelligence Center at the Pentagon, where he built and deployed machine learning and analytics solutions across the DoD. Joining Groen on the advisory board are Mike Brown, former Director of the Defense Innovation Unit and the lead for Shield Capital’s investment in Unstructured, and Ryan Lewis, an In-Q-Tel and AWS National Security veteran.

KEY QUOTES:

“Organizations generate vast amounts of unstructured data daily, which, when combined with LLMs, can supercharge productivity. However, this data is often scattered across numerous databases, file formats, and document layouts. By automating the preprocessing of natural language data, Unstructured eliminates the need for laborious manual preprocessing, removing one of the most time-consuming and expensive bottlenecks data scientists encounter in deploying LLM-based solutions across their organizations.”

— Brian Raymond, Founder and CEO of Unstructured

“In today’s digital age, the world runs on documents. From research reports and memos to quarterly filings and plans of action, documents are the unit of information that organizations depend on. And yet, most of this information is trapped in inaccessible formats, and organizations have long struggled to unlock this data, leading to information silos, inefficient decision-making, and repetitive work. With the advent of Large Language Models (LLMs) and now with Unstructured, we believe that enterprises can finally realize the untapped potential of document data. We have been inspired by the early success Unstructured is experiencing with large customers in the commercial and government sectors and the open source adoption of the product. We are thrilled to partner with Brian and his team and help them build Unstructured to be an iconic company in the modern data stack.”

— Karan Mehandru from Madrona

“Unstructured’s rapid growth from an idea to working with more than 100 companies in a year is proof that there’s a real market need to eliminate data silos. We led the seed round and doubled down on our investment because we believe in Unstructured’s unique approach to LLM data preprocessing, which will help companies leverage their proprietary data and be able to effectively ingest it. Unstructured will become a must-have product for every business.”

— Enrique Salem, partner at Bain Capital Ventures