NVIDIA Unveils RAPIDS, A GPU Acceleration Platform For Data Science And Machine Learning

By Amit Chowdhry • Oct 11, 2018

NVIDIA has announced a new open-source GPU acceleration platform for large-scale data analytics and machine learning called RAPIDS. This platform will enable the largest companies to analyze massive amounts of data and make accurate business predictions at impressive speeds.

With the performance boost that the RAPIDS open-source software provides, data scientists can build tools for predicting credit card fraud, forecasting retail inventory, and understanding purchasing behaviors. NVIDIA partners supporting RAPIDS include Hewlett Packard Enterprise, IBM, Oracle, Databricks, and Anaconda.

“Data analytics and machine learning are the largest segments of the high-performance computing market that have not been accelerated — until now,” said NVIDIA founder and CEO Jensen Huang during a keynote address at the GPU Technology Conference. “The world’s largest industries run algorithms written by machine learning on a sea of servers to sense complex patterns in their market and environment, and make fast, accurate predictions that directly impact their bottom line.”

Huang also pointed out that the RAPIDS GPU acceleration platform seamlessly integrates with the most popular data science libraries and workflows for speeding up machine learning. “We are turbocharging machine learning like we have done with deep learning,” added Huang.

RAPIDS has been developed over the past two years by NVIDIA’s engineers in collaboration with key open-source contributors. In benchmarks that used the XGBoost machine learning algorithm for training on an NVIDIA DGX-2 system, RAPIDS showed speedups of up to 50 times compared with CPU-only systems. This means data scientists can reduce typical training times from days to hours, or from hours to minutes, depending on the dataset size.
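To put the "days to hours, or hours to minutes" claim in concrete terms, here is a minimal back-of-the-envelope sketch. The CPU-only training times below are hypothetical and chosen only for illustration, and the 50x factor is the best case quoted above ("up to"), not a guaranteed result; the function name `accelerated_time` is likewise made up for this example.

```python
def accelerated_time(cpu_hours: float, speedup: float) -> float:
    """Return the training time in hours after applying a speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return cpu_hours / speedup


# Hypothetical CPU-only training times (illustrative, not measured).
for label, cpu_hours in [("two days", 48.0), ("five hours", 5.0)]:
    gpu_hours = accelerated_time(cpu_hours, speedup=50.0)
    print(f"{label}: {cpu_hours:.0f} h on CPU -> {gpu_hours * 60:.0f} min at 50x")
```

Under these assumptions, a two-day job drops to just under an hour and a five-hour job to about six minutes. Real-world gains would depend on the dataset and workload.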

According to TechCrunch, RAPIDS is based on Python and has interfaces similar to the pandas and scikit-learn data analysis and machine learning libraries. It is designed to accelerate data science end to end, from data preparation through machine learning and deep learning.

RAPIDS is being integrated into Apache Spark, which is the leading open-source framework for analytics and data science. “At Databricks, we are excited about RAPIDS’ potential to accelerate Apache Spark workloads,” said Matei Zaharia, co-founder and chief technologist of Databricks and the original creator of Apache Spark. “We have multiple ongoing projects to integrate Spark better with native accelerators, including Apache Arrow support and GPU scheduling with Project Hydrogen. We believe that RAPIDS is an exciting new opportunity to scale our customers’ data science and AI workloads.”

Kaustubh Das, the VP of Product Management at Cisco’s Data Center Group, said that his team is collaborating with NVIDIA on AI/ML software stacks on NVIDIA GPU-optimized Cisco UCS platforms for simplifying and accelerating AI/ML workload deployment. Das said that his team is excited to learn that NVIDIA is expanding its GPU applicability with accelerated software stacks for addressing traditional machine learning and big data analytics through RAPIDS.

The Georgia Institute of Technology (Georgia Tech) will also contribute to the RAPIDS graph libraries. Professor David Bader said that the school’s contributions will help data scientists gain “meaningful knowledge from ever-growing datasets.”