Multi Device GPU Code Generation with LLVM

General-purpose GPU (GPGPU) computing is becoming increasingly relevant as a way to leverage massively parallel processing power, but writing code for the highly heterogeneous space of GPU architectures is often cumbersome. We propose a solution that enables the automated generation of device-independent GPGPU code in the compiler infrastructure project LLVM. A number of tools exist for generating GPGPU code from a given set of source languages, but they are typically limited to one GPU architecture or support only a few source languages. We offer a way to use the modularity of LLVM and standards like OpenCL to escape such limitations by extending LLVM’s polyhedral optimizer, Polly, with an OpenCL runtime and SPIR code generation, allowing execution of GPGPU code on any compatible device. Because the same code can then be executed on multiple platforms, we can build performance models that tell us where our code runs most efficiently, and potentially execute GPGPU code on multiple devices in parallel. Additionally, moving code from one architecture to another does not require it to be rewritten, which greatly reduces the time investment of an architectural change.

The complete work can be found here; slides from the defense talk given in the corresponding research group are available here.

Contributions made to LLVM and LLVM/Polly with respect to this work:

Provide Polly with an OpenCL Runtime, building a future-proof base for different GPGPU architectures. (Related commits in chronological order)
Provide Polly with SPIR Code Generation, specifically for Intel’s Beignet driver, which allows execution on Intel GPUs.
- [Polly][GPGPU] Added SPIR Code Generation and Corresponding Runtime Support for Intel
Provide Polly with AMD GPU Code Generation.
- [GPGPU][PPCGCodeGen][GPUJIT] Added AMD support to GPGPU code generation
Bugfixes
- [Polly][GPGPU] Fixed undefined reference for CUDA's managed memory in Runtime library