Datavolo: AI-Based Data Pipeline Company Raises $21 Million

By Amit Chowdhry • Updated April 25, 2024

Datavolo, a leader in multimodal data pipelines for AI, announced that it has raised over $21 million in financing, led by General Catalyst, with participation from notable investors, including Citi Ventures, Human Capital, Rob Bearden, and MVP Ventures. The total includes the company’s seed and Series A funding rounds.

Organizations are looking for ways to use GenAI to transform their businesses and create customer value while increasing revenue and reducing costs. As AI models rapidly advance, their effectiveness is constrained by their ability to access timely, secure, and complete data sets.

Most of the data generated by organizations is unstructured. However, enterprises depend on data pipeline software that cannot handle the unstructured data necessary to fully unlock GenAI potential. Datavolo was built to address this issue.

Datavolo is powered by Apache NiFi, which was created at the National Security Agency (NSA) specifically to handle secure pipelines of multimodal data. Over the last decade, NiFi has also evolved to handle the structured data needs of modern enterprises and is used by thousands of the most secure corporations and agencies. For the Datavolo team, however, multimodal data pipelines for GenAI are something of a homecoming: the use case returns the product to its original purpose and to what differentiates it in the market.

Datavolo’s founders have a deep history as leaders in the data and analytics space. Joe Witt, CEO, created the project that became Apache NiFi while working at the NSA in 2006. He also founded Onyara, which was acquired by Hortonworks in 2015, and most recently served as Corporate Vice President of Engineering for the Data-In-Motion portfolio at Cloudera. Luke Roquet, COO, has been a senior sales and marketing executive in the data and analytics industry since 2007 at innovative companies such as Oracle, Hortonworks, Unravel Data, AWS, and Cloudera. Joe and Luke have worked with the world’s largest trailblazing companies to solve their data and AI challenges.


“When AI systems become the backbone of daily business operations, it will be built on a data architecture which is multimodal and real time. Joe and Luke are not just building another data platform; they’re setting the stage for a future where data isn’t merely handled but intelligently harnessed to fulfill the evolving requirements driven by AI. We believe Datavolo has one of the best open-source teams out there and has the product and partners in place to make this vision a reality.”

– Quentin Clark, Managing Director of General Catalyst

“At Citi Ventures, we have been investing in artificial intelligence and machine learning companies for over a decade. When we approached Datavolo, we were particularly excited by their ability to meet the needs of large enterprises like Citi. Their scalable, flexible and secure multimodal data pipeline platform enables users to ingest, process, govern, schedule and track unstructured data from beginning to end, establishing a chain of custody for mission-critical generative AI retrieval-augmented generation (RAG) applications. These are key requirements for regulated and security-sensitive industries such as banking. Our investment in Datavolo is part of a commitment to exploring new generative AI products that may benefit the bank and its customers around the world.”

– Vibhor Rastogi, Head of AI Investments at Citi Ventures

“Luke and I feel fortunate to collaborate with exceptional investors and advisors, assembling an extraordinary team with deep enterprise expertise. Every team member is dedicated to the mission of advancing Generative AI applications tailored to the data-intensive needs of our customers.”

– Datavolo co-founder Joe Witt