Multi Device GPU Code Generation with LLVM
General-purpose GPU computing is becoming increasingly relevant as a way to leverage massively parallel processing power, but writing code for the heterogeneous space of GPU architectures is often cumbersome. We propose a solution that enables the automated generation of device-independent GPGPU code in the compiler infrastructure project LLVM. A number of tools already generate GPGPU code from a given set of source languages, but they are typically limited to one GPU architecture or work with only a few source languages. We use the modularity of LLVM and standards like OpenCL to escape such limitations: we extend LLVM’s polyhedral optimizer with an OpenCL runtime and SPIR code generation, allowing execution of GPGPU code on any compatible device. Because the same code can then be executed on multiple different platforms, we can build performance models that tell us where our code runs most efficiently, and it potentially becomes possible to execute GPGPU code on multiple different devices in parallel. Additionally, moving code from one architecture to another does not require the code to be rewritten, which greatly reduces the time invested in an architectural change.
The complete work can be found here; the slides from the defense talk given to the corresponding research group are here.
Contributions made to LLVM and LLVM/Polly with respect to this work:
- Provide Polly with an OpenCL Runtime, building a future-proof base for different GPGPU architectures. (Related commits in chronological order)
  - Updated PPCG Code Generation for OpenCL compatibility
  - [Polly] [GPUJIT] Moved error prints to stderr
  - [Polly] [GPUJIT] Adapted argument capitalization to fit standard
  - [Polly] Added OpenCL Runtime to GPURuntime Library for GPGPU CodeGen
  - [Polly][PPCGCodeGen] OpenCL now gets kernel argument size from PPCG CodeGen
  - [Polly][GPUJIT] Fixed OpenCL 2.0 min requirement for Error codes
  - [Polly][GPUJIT] Disabled gcc's -Wpedantic for use of dlsym
  - [PPCG] Only add Kernel argument sizes for OpenCL, not CUDA runtime
- Provide Polly with SPIR Code Generation, specifically for Intel’s Beignet driver, which allows execution on Intel GPUs.
- Provide Polly with AMD GPU Code Generation.
- Bugfixes
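
One of the commits above mentions disabling gcc's -Wpedantic for the use of dlsym. As a rough illustration of why that can come up, the following is a minimal sketch, not the code in Polly's GPURuntime library, of resolving an OpenCL entry point at run time. The helper name `count_opencl_platforms`, the typedef, and the library path passed to it are assumptions made for this example; only `dlopen`, `dlsym`, `dlclose`, and the OpenCL function `clGetPlatformIDs` are real interfaces.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdint.h>

/* Signature of clGetPlatformIDs, spelled with plain C types so the sketch
 * does not need the OpenCL headers: cl_int is a 32-bit signed integer,
 * cl_uint a 32-bit unsigned integer, cl_platform_id an opaque pointer. */
typedef int32_t (*get_platform_ids_fn)(uint32_t, void **, uint32_t *);

/* Hypothetical helper: returns the number of OpenCL platforms reported by
 * the given library, or -1 if the library or the symbol cannot be loaded
 * or the query fails. */
static int count_opencl_platforms(const char *library_path) {
  void *handle = dlopen(library_path, RTLD_LAZY);
  if (handle == NULL)
    return -1;

  /* ISO C does not define the conversion from void * to a function
   * pointer (POSIX does), so gcc warns about this cast under -Wpedantic. */
  get_platform_ids_fn get_platform_ids =
      (get_platform_ids_fn)dlsym(handle, "clGetPlatformIDs");
  if (get_platform_ids == NULL) {
    dlclose(handle);
    return -1;
  }

  /* Query only the platform count: zero entries, no output array.
   * The handle is deliberately left open; a runtime would keep the
   * library loaded for as long as it uses the resolved functions. */
  uint32_t num_platforms = 0;
  int32_t status = get_platform_ids(0, NULL, &num_platforms);
  return status == 0 ? (int)num_platforms : -1;
}
```

Calling `count_opencl_platforms("libOpenCL.so")` would report the installed platforms on a machine with an OpenCL loader and return -1 otherwise. Resolving symbols this way means the host program does not have to link against a vendor library at build time; on older glibc versions the sketch needs `-ldl` at link time.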