DataPelago: Universal Data Processing Engine And $47 Million Funding Revealed

By Amit Chowdhry • Oct 9, 2024

DataPelago has unveiled its Universal Data Processing Engine, which is designed to accelerate any data processing engine, including open source engines, on any hardware and with any data type. The company is also launching from stealth with $47 million in funding from Eclipse, Taiwania Capital, Qualcomm Ventures, Alter Venture Partners, Nautilus Venture Partners, and Silicon Valley Bank (a division of First Citizens Bank).

DataPelago’s engine enables organizations to extract value from data at unprecedented price/performance for their GenAI and analytics workloads. Traditional processing solutions based on CPUs and software architectures cannot keep up with the complexity and volume of data, which is doubling every two years, with unstructured data now accounting for 90% of all data created. The surge of GenAI and its dependence on huge volumes of unstructured data compound the processing challenge. To overcome these performance, cost, and scalability limitations, DataPelago is creating a new data processing standard for the accelerated computing era.

DataPelago’s Universal Data Processing Engine is available as an end-to-end solution or as an integration with Substrait-based open source frameworks, bringing accelerated computing to Spark and Trino. It gives customers price/performance advantages without any changes to applications or workflows. DataPelago integrates with existing data stores and lakehouse platforms, eliminating the need for data migration and avoiding vendor lock-in.

DataPelago’s engine has an architecture composed of three layers that together process data one to two orders of magnitude faster than today’s query engines.

1.) DataVM – The industry’s first virtual machine with a domain-specific Instruction Set Architecture (ISA) for data operators, providing a common abstraction for execution on accelerated computing elements spanning CPU, GPU, FPGA, and custom silicon.

2.) DataOS – The operating system layer that maps data operations to heterogeneous accelerated computing elements and manages them dynamically to optimize performance at scale.

3.) DataApp – A pluggable layer that enables integration with platforms including Spark and Trino to deliver acceleration capabilities to these engines.
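
To make the division of labor among the three layers concrete, here is a minimal toy sketch in Python. It is not DataPelago's code or API: every name in it (`Op`, `Backend`, `Scheduler`, `plan_from_query`, the `cost_per_row` figures) is a hypothetical stand-in, and all "devices" simply run on the CPU. It only illustrates the idea of a device-neutral operator set (the DataVM role), a scheduler that maps operators to heterogeneous backends (the DataOS role), and a front end that turns a query into operators (the DataApp role).

```python
from dataclasses import dataclass
from typing import Any, List, Set, Tuple


# Hypothetical illustration only: none of these names come from DataPelago.
@dataclass(frozen=True)
class Op:
    """One device-neutral data operator (stands in for a DataVM instruction)."""
    name: str        # "filter", "project", or "sum"
    arg: Any = None  # predicate or mapping function, if the operator needs one


def _run(op: Op, rows: Any) -> Any:
    """Reference semantics shared by every backend in this toy model."""
    if op.name == "filter":
        return [r for r in rows if op.arg(r)]
    if op.name == "project":
        return [op.arg(r) for r in rows]
    if op.name == "sum":
        return sum(rows)
    raise ValueError(f"unknown operator: {op.name}")


class Backend:
    """A compute element (CPU, GPU, ...) that supports a subset of operators."""

    def __init__(self, name: str, supported: Set[str], cost_per_row: float):
        self.name = name
        self.supported = supported
        self.cost_per_row = cost_per_row  # made-up relative cost

    def execute(self, op: Op, rows: Any) -> Any:
        return _run(op, rows)


class Scheduler:
    """DataOS-like role: place each operator on the cheapest capable backend."""

    def __init__(self, backends: List[Backend]):
        self.backends = backends

    def place(self, op: Op) -> Backend:
        candidates = [b for b in self.backends if op.name in b.supported]
        if not candidates:
            raise ValueError(f"no backend supports {op.name}")
        return min(candidates, key=lambda b: b.cost_per_row)

    def run(self, plan: List[Op], rows: Any) -> Tuple[Any, List[Tuple[str, str]]]:
        placements = []
        data = rows
        for op in plan:
            backend = self.place(op)
            placements.append((op.name, backend.name))
            data = backend.execute(op, data)
        return data, placements


def plan_from_query(min_amount: float) -> List[Op]:
    """DataApp-like role: turn a front-end request into generic operators."""
    return [
        Op("filter", lambda r: r["amount"] >= min_amount),
        Op("project", lambda r: r["amount"]),
        Op("sum"),
    ]


cpu = Backend("cpu", {"filter", "project", "sum"}, cost_per_row=1.0)
gpu = Backend("gpu", {"filter", "project"}, cost_per_row=0.1)
scheduler = Scheduler([cpu, gpu])

rows = [{"amount": 5}, {"amount": 20}, {"amount": 30}]
total, placements = scheduler.run(plan_from_query(10), rows)
print(total)       # sum of the amounts that pass the filter
print(placements)  # which backend each operator was placed on
```

In this sketch the query layer never names a device; the scheduler alone decides placement, which mirrors the article's point that acceleration is meant to arrive without changes to applications or workflows.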

DataPelago’s engine is suited to resource-intensive use cases, such as analyzing billions of transactions while ensuring data freshness, supporting AI-driven models that detect threats at wire-line speeds across millions of consumer and data center endpoints, and providing a scalable platform for the rapid deployment of training, fine-tuning, and RAG inference pipelines.

Co-founder and CEO Rajan Goyal has 20+ years of experience building accelerated computing solutions across domains such as security, data movement, and data storage. With DataPelago, Goyal has assembled a multidisciplinary team with decades of experience across system, architecture, data analytics, cloud SaaS, open source development, and more to break through the limits that data processing faces today in performance, cost, and scalability.

KEY QUOTES:

“Today, organizations are faced with an insurmountable barrier to unlocking breakthrough intelligence and innovation: processing an endless sea of data. We created DataPelago to address this critical need. By applying nonlinear thinking to overcome data processing’s current limits, we’ve built an engine capable of processing exponentially increasing volumes of complex data across varied formats, making it possible for organizations to truly realize the value of their data.” 

-DataPelago co-founder and CEO Rajan Goyal

“The exponential growth of semi-structured and unstructured data along with rapid Gen AI/AI adoption is driving innovation, not only in AI, but in data management and data processing. McAfee has been proud to partner with DataPelago on the design of their technology that shows promising results, including significant performance and cost improvements on certain workloads.”

-Steve Grobman, Executive VP and CTO, McAfee, a DataPelago design partner

“Partnering with DataPelago exemplifies our dedication to innovating for exceptional customer service. DataPelago’s engine allows us to unify our GenAI and data analytics pipelines by processing structured, semi-structured, and unstructured data on the same pipeline while reducing our costs by more than 50%.”

-André Fichel, CTO at Akad Seguros, an early DataPelago customer

“When data can be extracted as quickly as it’s generated, businesses can harness insights to make better decisions and operate more efficiently. DataPelago’s universal data processing engine represents a paradigm shift that will unlock new possibilities in the worlds of supply chain, sustainable energy, the medical field, and beyond.”

-Lior Susan, CEO and Founding Partner at Eclipse and a DataPelago board member

“DataPelago’s foresight to cleverly architect their engine to be processing unit agnostic, including GPUs, positions them as an undisputed leader in data acceleration. DataPelago has a visionary founder, a top-notch team, and a track record of proven results to support their claims at every stage of their journey in the new Data + AI world.”

-Cheng Wu, General Partner at Taiwania Capital Management and a DataPelago board member