The increasingly heterogeneous computer architecture landscape makes writing fast and energy-efficient scientific high-performance computing applications an ever more complex yet important task. This has given rise to dedicated performance engineers, whose job is to optimize domain scientists' software for different hardware architectures, but the interaction between the two roles is complex and inefficient, and tool integration for performance engineers is lacking. We aim to better understand the needs of performance engineers through a qualitative interview study, and we provide an integrated development environment for domain scientists and performance engineers alike, with the goal of enabling fast and efficient optimization workflows. Our environment provides easy access to visual feedback from new static and dynamic analysis passes geared toward understanding an application's performance, while also allowing for simple exploratory optimization. We show that our environment allows domain scientists with little to no computer science background to optimize their code to a considerable extent, while cleanly separating the concerns of program definition and optimization with a visual intermediate language.
General purpose GPU computing is becoming increasingly relevant as a way to leverage massively parallel processing power, but writing code for the very heterogeneous space of GPU architectures is often cumbersome. We propose a solution that enables the automated generation of device-independent GPGPU code in the compiler infrastructure project LLVM. A number of tools exist for generating GPGPU code from given source languages, but they are typically limited to one GPU architecture or support only a few source languages. We offer a way to use the modular power of LLVM and standards like OpenCL to escape such limitations by extending LLVM's polyhedral optimizer with an OpenCL runtime and SPIR code generation, allowing GPGPU code to execute on any compatible device. Because the same code can then run on multiple platforms, we can build performance models that tell us where our code runs most efficiently, and potentially execute GPGPU code on multiple devices in parallel. Additionally, porting code from one architecture to another no longer requires a rewrite, greatly reducing the time investment of an architectural change.
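The following is not the LLVM/Polly pipeline described above, but a minimal Python sketch of the device independence that OpenCL provides as the common runtime layer: the same kernel source runs unchanged on whichever OpenCL device is available. It assumes the pyopencl and numpy packages; names such as the kernel `scale` are illustrative only.

```python
# Illustration of OpenCL device independence (not the LLVM/Polly pipeline
# from the paper): the same kernel source runs on any device exposed by an
# installed OpenCL implementation. Requires pyopencl and numpy.
import numpy as np
import pyopencl as cl

KERNEL_SRC = """
__kernel void scale(__global const float *in, __global float *out, const float factor) {
    int gid = get_global_id(0);
    out[gid] = in[gid] * factor;
}
"""

ctx = cl.create_some_context()          # picks any available platform/device
queue = cl.CommandQueue(ctx)
program = cl.Program(ctx, KERNEL_SRC).build()

host_in = np.arange(16, dtype=np.float32)
mf = cl.mem_flags
dev_in = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=host_in)
dev_out = cl.Buffer(ctx, mf.WRITE_ONLY, host_in.nbytes)

# Launch one work item per element; works the same on GPU, CPU, or accelerator.
program.scale(queue, host_in.shape, None, dev_in, dev_out, np.float32(2.0))

host_out = np.empty_like(host_in)
cl.enqueue_copy(queue, host_out, dev_out)
print(host_out)   # doubled values, regardless of which device ran the kernel
```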
The administration mode panels of VirtaMed AG's surgical simulators are meant to house all the settings for the simulator. However, the panels are disorganized and unsatisfying to use for both experts and novices. This study uses card sorting to find categories that work for novices as well as experts. Novices who had never seen the system and experts who had used it for extended periods both categorized 30 cards from the panels in an open card-sorting exercise. The experts also rated the importance of the cards. Based on the results of these card-sorting tasks, a structure of five categories is proposed for implementation in VirtaMed AG's future software.
In this paper we implement and compare four distributed algorithms for solving the Minimum Spanning Tree (MST) problem: edge partitioning, vertex partitioning, and two versions of parallel Prim. The main use case for these algorithms is graphs whose entirety does not fit into the main memory of a single machine. The basic assumption for all of them is that only the graph's vertices fit into the memory of each individual compute node. The idea is thus to scale linearly in problem size with the number of distributed compute nodes running the algorithms.
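As a point of reference for the parallel variants, here is a minimal sketch of sequential Prim's algorithm, the building block the Prim-based versions distribute; the distributed implementations in the paper instead spread the edges or vertices across compute nodes, which this sketch does not show.

```python
# Sequential Prim's algorithm (reference sketch, not the distributed version).
import heapq
from collections import defaultdict

def prim_mst(n, edges):
    """n: number of vertices (0..n-1); edges: list of (u, v, weight) tuples."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((w, v))
        adj[v].append((w, u))

    in_tree = [False] * n
    mst = []
    heap = [(0, 0, 0)]                  # (weight, vertex, parent); start at vertex 0
    while heap and len(mst) < n - 1:
        w, v, parent = heapq.heappop(heap)
        if in_tree[v]:
            continue
        in_tree[v] = True
        if v != parent:                 # skip the artificial start entry
            mst.append((parent, v, w))
        for wu, u in adj[v]:
            if not in_tree[u]:
                heapq.heappush(heap, (wu, u, v))
    return mst

# Example: small weighted graph with 4 vertices
print(prim_mst(4, [(0, 1, 1), (1, 2, 2), (0, 2, 4), (2, 3, 3)]))
```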
A simple Python 3 script that uses the stats.grok.se service to retrieve the traffic to Wikipedia articles and tabulate it in CSV format.
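A rough sketch of the approach such a script can take is shown below. It is not the original script; the URL scheme (http://stats.grok.se/json/&lt;lang&gt;/&lt;YYYYMM&gt;/&lt;article&gt;) and the "daily_views" response field are assumptions about the stats.grok.se JSON API, which has since been shut down.

```python
# Sketch only: fetch monthly per-day view counts for a Wikipedia article from
# stats.grok.se (endpoint format and "daily_views" field are assumptions) and
# write them out as CSV.
import csv
import json
import sys
import urllib.request

def fetch_monthly_views(article, lang="en", month="201501"):
    url = "http://stats.grok.se/json/{}/{}/{}".format(lang, month, article)
    with urllib.request.urlopen(url) as response:
        data = json.loads(response.read().decode("utf-8"))
    return data.get("daily_views", {})   # assumed shape: {"2015-01-01": 1234, ...}

def write_csv(article, daily_views, out=sys.stdout):
    writer = csv.writer(out)
    writer.writerow(["article", "date", "views"])
    for date in sorted(daily_views):
        writer.writerow([article, date, daily_views[date]])

if __name__ == "__main__":
    article = sys.argv[1] if len(sys.argv) > 1 else "Python_(programming_language)"
    write_csv(article, fetch_monthly_views(article))
```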