Release Highlights

  • Calls to entry methods taking a single fixed-size parameter can now automatically be aggregated and routed through the TRAM library by marking them with the [aggregate] attribute.
  • Calls to parameter-marshalled entry methods with large array arguments can ask for asynchronous zero-copy send behavior with a 'nocopy' tag in the parameter's declaration.

What's new in 6.8.0

Over 900 commits (bugfixes + improvements + cleanups) have been applied across the entire system. Major changes are described below:

  • Charm++ Features
    1. Calls to entry methods taking a single fixed-size parameter can now automatically be aggregated and routed through the TRAM library by marking them with the [aggregate] attribute.
    2. Calls to parameter-marshalled entry methods with large array arguments can ask for asynchronous zero-copy send behavior with a 'nocopy' tag in the parameter's declaration.
    3. The runtime system now integrates an OpenMP runtime library so that code using OpenMP parallelism will dispatch work to idle worker threads within the Charm++ process.
    4. Applications can ask the runtime system to perform automatic high-level end-of-run performance analysis by linking with the '-tracemode perfReport' option.
    5. Added a new dynamic remapping/load-balancing strategy, GreedyRefineLB, that offers high result quality and well-bounded execution time.
    6. Improved and expanded topology-aware spanning tree generation strategies, including support for runs on a torus with holes, such as Blue Waters and other Cray XE/XK systems.
    7. Charm++ programs can now define their own main() function, rather than using a generated implementation from a mainmodule/mainchare combination. This extends the existing Charm++/MPI interoperation feature.
    8. Improvements to Sections:
      1. Array sections API has been simplified, with array sections being automatically delegated to CkMulticastMgr (the most efficient implementation in Charm++). Changes are reflected in Chapter 14 of the manual.
      2. Group sections can now be delegated to CkMulticastMgr for better performance than the default implementation. Unlike array sections, they must be delegated manually. Documentation is in Chapter 14 of the Charm++ manual.
      3. Group section reductions are now supported for delegated sections via CkMulticastMgr.
      4. Improved performance of section creation in CkMulticastMgr.
      5. CkMulticastMgr uses the improved spanning tree strategies. See above.
    9. GPU manager now creates one instance per OS process and scales the pre-allocated memory pool size according to the GPU memory size and number of GPU manager instances on a physical node.
    10. Several GPU Manager API changes including:
      1. Replaced references to global variables in the GPU manager API with calls to functions.
      2. The user is no longer required to specify a bufferID in dataInfo struct.
      3. Replaced calls to kernelSelect with direct invocation of functions passed via the work request object (allows CUDA to be built with all programs).
    11. Added support for malleable jobs that can dynamically shrink and expand the set of compute nodes hosting Charm++ processes.
    12. Greatly expanded and improved reduction operations:
      1. Added built-in reductions for all logical and bitwise operations on integer and boolean input.
      2. Reductions over groups and chare arrays that apply commutative, associative operations (e.g. MIN, MAX, SUM, AND, OR, XOR) are now processed in a streaming fashion. This reduces the memory footprint of reductions. User-defined reductions can opt into this mode as well.
      3. Added a new 'Tuple' reducer that allows combining multiple reductions of different input data and operations from a common set of source objects to a single target callback.
      4. Added a new 'Summary Statistics' reducer that provides count, mean, and standard deviation using a numerically-stable streaming algorithm.
    13. Added a '++quiet' option to suppress charmrun and charm++ non-error messages at startup.
    14. Calls to chare array element entry methods with the [inline] tag now avoid copying their arguments when the called method takes its parameters by const&, offering a substantial reduction in overhead in those cases.
    15. Synchronous entry methods that block until completion (marked with the [sync] attribute) can now return any type that defines a PUP method, rather than only message types.
  • AMPI Features
    1. More efficient implementations of message matching infrastructure, multiple completion routines, and all varieties of reductions and gathers.
    2. Support for user-defined non-commutative reductions, MPI_BOTTOM, cancelling receive requests, MPI_THREAD_FUNNELED, PSCW synchronization for RMA, and more.
    3. Fixes to AMPI's extensions for load balancing and to Isomalloc on SMP builds.
    4. More robust derived datatype support, optimizations for truly contiguous types.
    5. ROMIO is now built on AMPI and linked in by ampicc by default.
    6. A version of HDF5 v1.10.1 that builds and runs on AMPI with virtualization is now available at https://charm.cs.illinois.edu/gerrit/#/admin/projects/hdf5-ampi
    7. Improved support for performance analysis and visualization with Projections.
  • Platforms and Portability
    1. The runtime system code now requires compiler support for C++11 R-value references and move constructors. All currently supported compilers are expected to meet this requirement.
    2. The next feature release (anticipated to be 6.9.0 or 7.0) will require full C++11 support from the compiler and standard library.
    3. Added support for IBM POWER8 systems with the PAMI communication API, such as development/test platforms for the upcoming Sierra and Summit supercomputers at LLNL and ORNL. Contributed by Sameer Kumar of IBM.
    4. Mac OS (darwin) builds now default to the modern libc++ standard library instead of the older libstdc++.
    5. Blue Gene/Q build targets have been added for the 'bgclang' compiler.
    6. Charm++ can now be built on Cray's CCE 8.5.4+.
    7. Charm++ will now build without custom configuration on Arch Linux.
    8. Charmrun can automatically detect rank and node count from Slurm/srun environment variables.
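
The 'Summary Statistics' reducer listed under the reduction improvements above is described as using a numerically-stable streaming algorithm. The sketch below shows one well-known technique of that kind in plain C++: Welford's single-value update combined with the pairwise merge of Chan et al. It is not taken from the Charm++ source, and the `RunningStats` type and its member names are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical accumulator; not the actual layout used by the Charm++ reducer.
struct RunningStats {
    std::size_t count = 0;
    double mean = 0.0;
    double m2 = 0.0;  // sum of squared deviations from the current mean

    // Welford's update: fold one value in without storing earlier values.
    void add(double x) {
        ++count;
        const double delta = x - mean;
        mean += delta / static_cast<double>(count);
        m2 += delta * (x - mean);
    }

    // Combine two partial results, as a reduction tree would at each step.
    void merge(const RunningStats& other) {
        if (other.count == 0) return;
        if (count == 0) { *this = other; return; }
        const double na = static_cast<double>(count);
        const double nb = static_cast<double>(other.count);
        const double n = na + nb;
        const double delta = other.mean - mean;
        mean += delta * nb / n;
        m2 += other.m2 + delta * delta * na * nb / n;
        count += other.count;
    }

    // Population standard deviation of everything seen so far.
    double stddev() const {
        assert(count > 0);
        return std::sqrt(m2 / static_cast<double>(count));
    }
};
```

Because `merge` only needs the three fields of each partial result, intermediate nodes never have to hold the raw contributions, which is what makes this style of algorithm suitable for streaming reductions.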

The complete list of issues that have been merged/resolved in 6.8.0 can be found here. The associated git commits can be viewed here.
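
The 6.8.0 notes above mention that charmrun can detect rank and node count from Slurm/srun environment variables. As a rough illustration of that idea, the following C++ sketch reads `SLURM_NTASKS` and `SLURM_NNODES`, two variables that Slurm sets for a job step. Which variables charmrun actually consults is not stated here, so this choice is an assumption, and `env_count`, `SlurmLayout`, and `detect_slurm_layout` are hypothetical names rather than part of charmrun.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: read a non-negative integer from an environment
// variable. Returns -1 when it is unset, empty, or not a plain number.
long env_count(const char* name) {
    assert(name != nullptr);
    const char* raw = std::getenv(name);
    if (raw == nullptr || *raw == '\0') return -1;
    char* end = nullptr;
    const long value = std::strtol(raw, &end, 10);
    if (*end != '\0' || value < 0) return -1;
    return value;
}

// Hypothetical aggregate describing what a launcher could infer.
struct SlurmLayout {
    long tasks;  // from SLURM_NTASKS, or -1 if unavailable
    long nodes;  // from SLURM_NNODES, or -1 if unavailable
};

// Assumption: these two variables are the ones of interest.
SlurmLayout detect_slurm_layout() {
    SlurmLayout layout;
    layout.tasks = env_count("SLURM_NTASKS");
    layout.nodes = env_count("SLURM_NNODES");
    return layout;
}
```

A launcher following this pattern would fall back to explicit command-line arguments whenever either field comes back as -1.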

6.7.1

Changes in this release are primarily bug fixes for 6.7.0. The major exception is AMPI. A brief list of changes follows:

  • Charm++ Bug Fixes
    1. Startup and exit sequences are more robust
    2. Error and warning messages are generally more informative
    3. CkMulticast's set and concat reducers work correctly
  • Adaptive MPI Features
    1. AMPI's extensions have been renamed to use the prefix 'AMPI_' and to follow MPI's naming conventions
    2. AMPI_Migrate(MPI_Info) is now used for both dynamic load balancing and all fault tolerance schemes
    3. AMPI now officially supports MPI-2.2, and has support for MPI-3.1's nonblocking and neighborhood collectives
  • Platforms and Portability
    1. Cray regularpages build has been fixed
    2. Clang compiler target for BlueGene/Q systems added
    3. Communication thread tracing for SMP mode added
    4. AMPI compiler wrappers are easier to use with autoconf and cmake

The complete list of issues that have been merged/resolved in 6.7.1 can be found here. The associated git commits can be viewed here.

6.7.0

Here is a list of significant changes in this release relative to version 6.6.1:

  • Features
    1. New API for efficient formula-based distributed sparse array creation.
    2. Missing MPI-2.0 API additions to AMPI.
    3. Out-of-tree build is now supported.
    4. New target: multicore-linux-arm7
    5. PXSHM auto detects the node size.
    6. Added support for ++mpiexec with poe.
    7. Add new API related to migration in AMPI.
    8. CkLoop is now built by default.
    9. Scalable startup is now the default behavior when launching a job using charmrun.

    Over 120 bug fixes were applied across the entire system. The major fixes are listed below:

  • Bug Fixes
    1. Bug fix to handle CUDA threads correctly at exit.
    2. Bug fix in the recovery code on a node failure.
    3. Bug fixes in AMPI functions - MPI_Comm_create, MPI_Testall.
    4. Disable ASLR on Darwin builds to fix multi-node executions.
    5. Add flags to enable compilation of Charm++ on newer Cray compilers with C++11 support.
  • Deprecations and Deletions
    1. CommLib has been deleted.
    2. The +nodesize option for PXSHM is deprecated.
    3. CmiBool has been dropped in favor of C++'s bool.
    4. CBase_Foo::pup need not be called from Foo::pup.

The complete list of issues that have been merged/resolved in 6.7.0 can be found here. The associated git commits can be viewed here.

6.6.1

Changes in this release are primarily bug fixes for 6.6.0. A concise list of affected components follows:

  1. CkIO
  2. Reductions with syncFT
  3. mpicxx based MPI builds
  4. Increased support for macros in CI file
  5. GNI + RDMA related communication
  6. MPI_STATUSES_IGNORE support for AMPIF
  7. Restart on different node count with chkpt
  8. Immediate msgs on multicore builds

A complete listing of features added and bugs fixed can be seen in our issue tracker here.

6.6.0

  • Machine target files for Cray XC systems ('gni-crayxc') have been added
  • Interoperability with MPI code using native communication interfaces on Blue Gene Q (PAMI) and Cray XE/XK/XC (uGNI) systems, in addition to the universal MPI communication interface
  • Support for partitioned jobs on all machine types, including TCP/IP and IB Verbs networks using 'netlrts' and 'verbs' machine layers
  • A substantially improved version of our asynchronous library, CkIO, for parallel output of large files
  • Narrowing the circumstances in which the runtime system will send overhead-inducing ReductionStarting messages
  • A new fully distributed load balancing strategy, DistributedLB, that produces high quality results with very low latency
  • An API for applications to feed custom per-object data to specialized load balancing strategies (e.g. physical simulation coordinates)
  • SMP builds on LRTS-based machine layers (pamilrts, gni, mpi, netlrts, verbs) support tracing messages through communication threads
  • Thread affinity mapping with +pemap now supports Intel's Hyperthreading more conveniently
  • After restarting from a checkpoint, thread affinity will use new +pemap/+commap arguments
  • Queue order randomization options were added to assist in debugging race conditions in application and runtime code
  • The full runtime code and associated libraries can now compile under the C11 and C++11/14 standards.
  • Numerous bug fixes, performance enhancements, and smaller improvements in the provided runtime facilities
  • Deprecations
    • The long-unsupported FEM library has been deprecated in favor of ParFUM
    • The CmiBool typedefs have been deleted, as C++ bool has long been universal
    • Future versions of the runtime system and libraries will require some degree of support for C++11 features from compilers
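
The queue order randomization options mentioned in the 6.6.0 list above help expose code that silently depends on one particular message delivery order. The C++ sketch below illustrates the general idea only: pending work is reordered with a seeded generator so that a failing order can be replayed. It does not reproduce the runtime's scheduler or its option names, and `randomized_order` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Hypothetical helper: return the pending messages in a pseudo-random order
// determined entirely by `seed`, so a problematic order can be reproduced.
template <typename Message>
std::vector<Message> randomized_order(const std::vector<Message>& pending,
                                      unsigned seed) {
    std::vector<Message> shuffled = pending;
    std::mt19937 rng(seed);
    std::shuffle(shuffled.begin(), shuffled.end(), rng);
    // Randomization only reorders; it never drops or duplicates a message.
    assert(std::is_permutation(shuffled.begin(), shuffled.end(), pending.begin()));
    return shuffled;
}
```

Running the same test under several seeds and recording the seed of any failure is one simple way to turn an intermittent race into a repeatable one.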
Binaries:

Binary tarballs with 'devel' in their name include support for debugging and tracing, and are compiled without optimization. Tarballs with 'production' in their name are optimized, omit assertion checks, and avoid the overhead of debugging and tracing support.

The latest development version of Charm++ can be downloaded directly from our source archive. The Git version control system is used, which is available from here.

This development version may not be as portable or robust as the released versions. Therefore, it may be prudent to keep a backup of old copies of Charm++.

  1. Check out the latest development version of Charm++ from the repository:

    $ git clone https://charm.cs.illinois.edu/gerrit/charm

  2. This will create a directory named charm. Move to this directory:

    $ cd charm

    To obtain the current stable release, 6.8.0, switch to branch charm-6.8:

    $ git checkout charm-6.8
  3. And now build Charm (netlrts-linux example):

    $ ./build charm++ netlrts-linux-x86_64 [ --with-production | -g ]

This will create a netlrts-linux-x86_64 directory containing bin, include, lib, and other subdirectories.

Nightly Charm Binaries:

These binaries are compiled every night from the version control system and tested on every platform, so you will always find a working version here. Every precompiled binary also contains the entire source tree, which is guaranteed to compile on the target architecture. Previous nightly build versions of Charm++ are also available.


The latest development version of Projections can be downloaded directly from our source archive. The Git version control system is used, which is available from here. To build Projections, you will also need gradle.

  1. Check out Projections from the repository:

    $ git clone http://charm.cs.uiuc.edu/gerrit/projections

  2. This will create a directory named projections. Move to this directory:

    $ cd projections

  3. And now build Projections:

    $ make

The latest development version of Charm Debug can be downloaded directly from our source archive. The Git version control system is used, which is available from here.

  1. Check out Charm Debug from the repository:

    $ git clone http://charm.cs.uiuc.edu/gerrit/ccs_tools

  2. This will create a directory named ccs_tools. Move to this directory:

    $ cd ccs_tools

  3. And now build Charm Debug:

    $ ant