NVIDIA CUDA Dive Using Python
Written by Nikos Vaggalis   
Thursday, 15 May 2025

NVIDIA adds native Python support to CUDA, making it more accessible to developers at large.

CUDA is, of course, NVIDIA's toolkit and programming model, which provides a development environment for speeding up computing applications by harnessing the power of GPUs. It's not easy to conquer, since it requires code to be written in C++, and as C++ is neither user-friendly nor easy to master, these properties rub off on the toolkit itself.

Back in 2021, we looked at an alternative way of accessing CUDA using the most user-friendly language there is - Python.
This was Triton, an open-source, Python-like programming language which enables researchers with no CUDA experience to write highly efficient GPU code, most of the time on par with what an expert would be able to produce. And, surprise surprise, Triton is developed by OpenAI.
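
To get a flavor of Triton, here is the canonical vector-addition kernel from its tutorials - a minimal sketch, with names such as add_kernel and BLOCK_SIZE chosen purely for illustration:

import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one contiguous block of elements
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard against out-of-bounds accesses
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

Note how the kernel reads like ordinary Python: the blocking and masking that a CUDA C++ programmer would handle by hand are expressed in a few lines.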

CuPy, the open-source array library for GPU-accelerated computing with Python, was another option - see the sketch below. But the time has come for NVIDIA to realize that having Python as a first-class citizen is very beneficial for the toolkit's adoption, both by developers and by other communities such as scientists. Adoption is one thing; the other is that, with the rise of AI, GPU programming is in demand and NVIDIA wants everybody working on its chips.
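
CuPy's appeal is that it is a near drop-in replacement for NumPy whose arrays live on the GPU. A minimal sketch, assuming a CUDA-capable machine with CuPy installed:

import cupy as cp

x = cp.arange(6, dtype=cp.float32).reshape(2, 3)  # array allocated on the GPU
y = x.sum(axis=1)                                 # reduction runs on the GPU
print(cp.asnumpy(y))                              # copy the result back to the host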

As such, we get the emergence of CUDA Python - Pythonic access to NVIDIA’s CUDA platform from Python. While the project has existed for a couple of years, it is version 12.9 that makes it really usable. Native support means brand new APIs and components:

  • cuda.core: Pythonic access to CUDA runtime and other core functionalities
  • cuda.bindings: Low-level Python bindings to CUDA C APIs
  • cuda.cooperative: A Python package providing CCCL’s reusable block-wide and warp-wide device primitives for use within Numba CUDA kernels
  • cuda.parallel: A Python package for easy access to CCCL’s highly efficient and customizable parallel algorithms, like sort, scan, reduce and transform, that are callable on the host
  • numba.cuda: Numba’s target for CUDA GPU programming, directly compiling a restricted subset of Python code into CUDA kernels and device functions following the CUDA execution model (see the sketch after this list)
  • nvmath-python: Pythonic access to NVIDIA CPU & GPU Math Libraries
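
To make numba.cuda concrete, a minimal kernel might look like the following sketch - the kernel name, data and launch sizes are illustrative only:

from numba import cuda
import numpy as np

@cuda.jit
def double_elements(arr):
    # Each thread processes the element at its global index
    i = cuda.grid(1)
    if i < arr.size:
        arr[i] *= 2

data = np.arange(1024, dtype=np.float32)
d_data = cuda.to_device(data)                       # copy input to the GPU
threads_per_block = 256
blocks = (data.size + threads_per_block - 1) // threads_per_block
double_elements[blocks, threads_per_block](d_data)  # launch the kernel
result = d_data.copy_to_host()                      # copy the result back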

A simple example of the new core API in action, enumerating the device's properties in Python code, follows:
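Something along these lines - a minimal sketch, assuming the cuda.core.experimental namespace, with the queried property names taken as assumptions from the standard CUDA device attributes:

from cuda.core.experimental import Device

dev = Device()     # the default (first) CUDA device
dev.set_current()  # make it current for this thread

print(f"GPU name: {dev.name}")
print(f"Compute capability: {dev.compute_capability}")

# DeviceProperties mirrors the CUDA device attributes;
# the attribute names below are assumed - check the docs
props = dev.properties
print(f"Multiprocessors: {props.multiprocessor_count}")
print(f"Max threads per block: {props.max_threads_per_block}")
print(f"Warp size: {props.warp_size}")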

As you can see, the API is experimental, but upon stabilization it will be moved out of the experimental namespace.

cuda.core supports Python 3.9 - 3.13 on Linux (x86-64, arm64) and Windows (x86-64). Of course, to run CUDA Python you'll need the CUDA Toolkit installed on a system with CUDA-capable GPUs. If you don't have a CUDA-capable GPU, you can access one from cloud service providers such as Amazon AWS and Microsoft Azure.
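
Installation is a pip install away. At the time of writing the core package is published on PyPI as cuda-core, with an extra selecting the CUDA major version - for example, pip install cuda-core[cu12] - but check the project's documentation for the current packaging.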

With that said, just use Python for everything, even on the GPU.

 

More Information

CUDA Python

Related Articles

Understanding GPU Architecture With Cornell

Program Deep Learning on the GPU with Triton

 
