In today's computing landscape, where multi-core is the norm and parallelism is clearly here to stay, we have to find higher-level abstractions in C++ that allow us to easily write applications which utilize all of the available parallelism. Ideally, those abstractions provide uniform access to the various types of parallelism: local (shared memory), remote (distributed memory), and heterogeneous (accelerators and coprocessors). The current work being done in the context of the C++ standardization effort provides a good starting point, but it is not sufficient to uniformly cover the bulk of these use cases. This is in contrast to existing parallel programming models such as OpenMP or MPI, which tend to be unable to fully exploit all available hardware resources. Especially on modern hardware architectures, and even more so on the architectures to come, this poses a big challenge for application developers who want to fully employ the available hardware resources.

This talk gives an introduction to a reference implementation of a heterogeneous, extensible future. We will walk through its various components and showcase the general structure of our library, outlining specific implementation techniques ranging from general API design to serialization and high-speed network support. We will present results from our work on a new programming model targeting the best possible application scalability. This programming model is closely aligned with the current C++ standard and the related standards proposals, which ensures an easy learning curve for new programmers. We will demonstrate that this programming model makes it possible to write applications which consistently out-scale and out-perform existing ones, and that the resulting codes achieve performance portability towards future architectures.
Slides