Graphics processing units today are up to a hundred times faster at executing floating-point operations than current-generation multicore processors. While systems for general-purpose programming on the GPU are becoming available, many programming tasks on the GPU remain difficult because of the limited programmability of GPUs and the relative immaturity of the field. In particular, tools for using a large number of GPUs as co-processors in clusters and supercomputers are lacking. In this work we present the Hybrid Application Programming Interface (Hybrid API), an extension to NVIDIA's Compute Unified Device Architecture (CUDA) that enables writing parallel Charm++ applications for execution on a large number of hybrid CPU/GPU nodes. Hybrid API features a clean model for organizing and scheduling work on the GPU when the device is shared by multiple parallel objects. We also propose an alternative methodology, Asynchronous API, which allows fine-grained management of work on the GPU.










