NVIDIA's RAPIDS cuDF Enhances pandas Performance by Up to 30x on Large Datasets

Felix Pinkston  Aug 10, 2024 10:42  UTC 02:42


NVIDIA has unveiled new features in RAPIDS cuDF that significantly improve the performance of the pandas library on large and text-heavy datasets. According to the NVIDIA Technical Blog, the enhancements let data scientists accelerate their workloads by up to 30x.

RAPIDS cuDF and pandas

RAPIDS is a suite of open-source GPU-accelerated data science and AI libraries, and cuDF is its Python GPU DataFrame library for loading, joining, aggregating, and filtering data. pandas, a widely used data analysis and manipulation library for Python, slows down and becomes less efficient as dataset sizes grow, particularly on CPU-only systems.

At GTC 2024, NVIDIA announced that RAPIDS cuDF could accelerate pandas by nearly 150x without requiring code changes. Google later revealed that RAPIDS cuDF is available by default on Google Colab, making it more accessible to data scientists.

Tackling Limitations

User feedback on the initial release of cuDF highlighted several limitations, particularly with the size and type of datasets that could benefit from acceleration:

  • To maximize acceleration, datasets needed to fit within GPU memory, limiting the data size and complexity of operations that could be performed.
  • Text-heavy datasets faced constraints, with the original cuDF release supporting only up to 2.1 billion characters in a column.

To address these issues, the latest release of RAPIDS cuDF includes:

  • Optimized CUDA unified memory, allowing for up to 30x speedups on larger datasets and more complex workloads.
  • Expanded string support from 2.1 billion characters in a column to 2.1 billion rows of tabular text data.

Accelerated Data Processing with Unified Memory

cuDF relies on CPU fallback to ensure a seamless experience. When memory requirements exceed GPU capacity, cuDF transfers data into CPU memory and uses pandas for processing. However, to avoid frequent CPU fallback, datasets should ideally fit within GPU memory.
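
The fallback behavior described above can be pictured with a toy sketch. This is only an illustration of the general "try the GPU path, fall back to the CPU path" pattern, not cuDF's actual implementation: the function names, the 16 GB device size, and the use of MemoryError as the failure signal are all assumptions made for the example.

```python
# Toy illustration of a "try the GPU path, fall back to the CPU path" pattern.
# All names here are hypothetical; this is not cuDF's internal code.

GPU_MEMORY_BYTES = 16 * 1024**3  # assumed 16 GB device


def gpu_sum(values, bytes_needed):
    """Pretend GPU path: fails when the request exceeds device memory."""
    if bytes_needed > GPU_MEMORY_BYTES:
        raise MemoryError("request exceeds GPU memory")
    return sum(values)


def cpu_sum(values, bytes_needed):
    """Pretend CPU path: always succeeds, standing in for plain pandas."""
    return sum(values)


def run_with_fallback(values, bytes_needed):
    """Return (result, backend), preferring the GPU path."""
    try:
        return gpu_sum(values, bytes_needed), "gpu"
    except MemoryError:
        return cpu_sum(values, bytes_needed), "cpu"


# A 4 GB request fits on the assumed device; a 32 GB request does not.
small = run_with_fallback([1, 2, 3], bytes_needed=4 * 1024**3)
large = run_with_fallback([1, 2, 3], bytes_needed=32 * 1024**3)
print(small, large)
```

Both calls return the same result, but the second one takes the slower CPU route, which is why frequent fallback erodes the speedup.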

With CUDA unified memory, cuDF can now scale pandas workloads beyond GPU memory. Unified memory provides a single address space spanning CPUs and GPUs, enabling virtual memory allocations larger than available GPU memory and migrating data as needed. This helps maximize performance, although datasets should still be sized to fit in GPU memory for peak acceleration.

Benchmarks show that using cuDF for data joins on a 10 GB dataset with a GPU that has 16 GB of memory can achieve up to 30x speedups compared to CPU-only pandas. This is a significant improvement, especially for datasets larger than 4 GB, which previously suffered degraded performance due to GPU memory constraints.

Processing Tabular Text Data at Scale

The original cuDF release's 2.1 billion character limit in a column posed challenges for large datasets. With the new release, cuDF can now handle up to 2.1 billion rows of tabular text data, making pandas a viable tool for data preparation in generative AI pipelines.
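
A quick back-of-the-envelope calculation shows why the change matters. The sketch below assumes that "2.1 billion" refers to the largest signed 32-bit integer (2^31 - 1), and it uses a hypothetical average of 100 characters per row; neither figure is taken from the NVIDIA announcement.

```python
# Assumption: the "2.1 billion" limit corresponds to the signed 32-bit maximum.
INT32_MAX = 2**31 - 1  # 2,147,483,647


def max_rows_under_char_limit(avg_chars_per_row, char_limit=INT32_MAX):
    """Rows that fit in one column when the limit applies to total characters."""
    return char_limit // avg_chars_per_row


# Hypothetical text column averaging 100 characters per row.
AVG_CHARS = 100

rows_with_char_limit = max_rows_under_char_limit(AVG_CHARS)
rows_with_row_limit = INT32_MAX  # the limit now applies to rows instead

print(f"Character-limited column: {rows_with_char_limit:,} rows")
print(f"Row-limited column:       {rows_with_row_limit:,} rows")
```

Under these assumptions, a character-limited column tops out at roughly 21 million rows, while a row-limited column can hold about 100 times as many.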

These improvements make pandas code execution much faster, especially for text-heavy datasets like product reviews, customer service logs, and datasets with substantial location or user ID data.

Get Started

All these features are available in RAPIDS 24.08, which can be installed by following the RAPIDS Installation Guide. Note that the unified memory feature is only supported on Linux-based systems.
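
As a minimal sketch of the zero-code-change workflow, the snippet below enables the cudf.pandas accelerator when it is available and otherwise runs on plain pandas. The sample data and variable names are made up for illustration, and the broad exception handler is an assumption chosen so the script still runs on machines without cuDF or a usable GPU.

```python
# Sketch: enable cudf.pandas when available, otherwise run on plain pandas.
try:
    import cudf.pandas  # requires RAPIDS cuDF and a supported NVIDIA GPU

    cudf.pandas.install()  # must be called before pandas is imported
    accelerated = True
except Exception:
    # ImportError when cuDF is not installed; other errors are possible
    # when no usable GPU is found. Either way, continue on the CPU.
    accelerated = False

import pandas as pd

# Hypothetical sample data; ordinary pandas code needs no changes.
reviews = pd.DataFrame(
    {
        "product": ["a", "b", "a", "c"],
        "rating": [5, 3, 3, 4],
    }
)
mean_rating = reviews.groupby("product")["rating"].mean()

print("GPU acceleration enabled:", accelerated)
print(mean_rating)
```

In a Jupyter or Colab notebook, the same effect comes from running `%load_ext cudf.pandas` before importing pandas, and an existing script can be run unmodified with `python -m cudf.pandas script.py`.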


