End-to-end Performance Modeling of Distributed GPU Applications
Jaemin Choi | David Richards | Laxmikant Kale | Abhinav Bhatele
International Conference on Supercomputing (ICS) 2020
Publication Type: Paper
Repository URL:
Abstract
With the growing number of GPU-based supercomputing platforms and GPU-enabled applications, the ability to accurately model the performance of such applications is becoming increasingly important. Most current performance models for GPU-enabled applications are limited to single-node performance. In this work, we propose a methodology for end-to-end performance modeling of distributed GPU applications. Our work strives to create performance models that are both accurate and easily applicable to any distributed GPU application. We combine trace-driven simulation of MPI communication based on the TraceR-CODES framework with a profiling-based roofline model for GPU kernels. We make substantial modifications to these models to capture the complex effects of both on-node and off-node networks in today's multi-GPU supercomputers. We validate our model against empirical data from GPU platforms and also vary tunable parameters of our model to observe how they affect application performance.
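For readers unfamiliar with the roofline model mentioned in the abstract, the following is a minimal sketch of the textbook roofline bound, not the modified, profiling-based model developed in the paper. The function names and all numeric values are hypothetical and chosen only for illustration.

```python
def roofline_time(flops, bytes_moved, peak_flops, peak_bandwidth):
    """Lower-bound kernel time in seconds under the classic roofline model.

    flops          -- floating-point operations performed by the kernel
    bytes_moved    -- bytes transferred to/from device memory
    peak_flops     -- peak compute throughput (FLOP/s)
    peak_bandwidth -- peak memory bandwidth (bytes/s)
    """
    if peak_flops <= 0 or peak_bandwidth <= 0:
        raise ValueError("peak rates must be positive")
    compute_time = flops / peak_flops           # time if compute-bound
    memory_time = bytes_moved / peak_bandwidth  # time if memory-bound
    # The kernel cannot finish faster than the slower of the two limits.
    return max(compute_time, memory_time)


def attainable_flops(arithmetic_intensity, peak_flops, peak_bandwidth):
    """Attainable FLOP/s for a kernel with the given FLOP/byte ratio."""
    return min(peak_flops, arithmetic_intensity * peak_bandwidth)


# Hypothetical kernel and device numbers (assumptions, not measured data).
t = roofline_time(flops=2e9, bytes_moved=8e9,
                  peak_flops=1e12, peak_bandwidth=1e11)
rate = attainable_flops(arithmetic_intensity=2e9 / 8e9,
                        peak_flops=1e12, peak_bandwidth=1e11)
```

In this hypothetical case the kernel is memory-bound, since moving its data takes longer than executing its arithmetic. An end-to-end model of the kind the abstract describes would combine per-kernel estimates like this with simulated communication time.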