The Message Passing Interface (MPI) Standard defines a message passing library, which serves as the basis for many high-performance computing applications today. It provides portable, scalable functions for data exchange in parallel computations on various parallel computing architectures. Originally, application programming interfaces were defined for C and Fortran as well as for C++. With the 2012 update, known as MPI-3, however, the C++ bindings were removed from the MPI standard. Over its various revisions the MPI standard became quite complex, and dropping one of the three language bindings may have helped to keep the standard maintainable as a whole. Furthermore, the C++ bindings were not very well designed. Although object-oriented techniques had been applied, the MPI C++ bindings did not come close to a well-designed C++ library by today’s standards. What happened to the C++ bindings is explained in more detail in a blog post.
Alternative C++ bindings to the MPI standard are provided by Boost MPI and OOMPI, the latter being an early attempt to bring MPI 1 functionality to C++ in an object-oriented way. Boost MPI uses rather modern C++ programming techniques to provide a very nice interface to the MPI standard’s core functionality. With Boost MPI, programs become more type-safe (when sending data of a particular C++ type, the corresponding MPI data type is deduced by the compiler), and sending data held in user-defined structures or classes becomes much easier than with the MPI standard’s C or C++ bindings; a short sketch of this typed interface follows the list below. Although Boost MPI is a huge improvement over the deprecated C++ bindings of the MPI standard, it also has its limitations.
- It is no longer actively maintained.
- Sending data of complex classes and structures is based on Boost serialization, which may reduce performance and does not work in heterogeneous environments (different endianness, etc.).
- Boost MPI provides no equivalent to derived MPI data types (strided vectors, submatrices, etc.).
- Although Boost MPI supports the more general graph communicators, there are no functions for Cartesian communicators.
- Boost MPI is based on C++03; it does not benefit from new C++11 features.
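To illustrate the type safety mentioned above, here is a minimal sketch of point-to-point communication with Boost MPI. It is my own illustration rather than code taken from any of the libraries discussed here and assumes a standard Boost installation.

#include <cstdlib>
#include <iostream>
#include <boost/mpi.hpp>

int main(int argc, char *argv[]) {
  boost::mpi::environment env(argc, argv);  // initializes and finalizes MPI
  boost::mpi::communicator world;           // wraps MPI_COMM_WORLD
  if (world.size()<2)
    return EXIT_FAILURE;
  if (world.rank()==0) {
    double pi=3.14;
    world.send(1, 0, pi);  // destination, tag, data; MPI_DOUBLE is deduced
  } else if (world.rank()==1) {
    double pi=0;
    world.recv(0, 0, pi);  // source, tag, data
    std::cout << "got : " << pi << '\n';
  }
  return EXIT_SUCCESS;
}

The compiler derives MPI_DOUBLE from the C++ type of pi, which is exactly the kind of convenience MPL also aims for.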
Because C++ was dropped from the MPI standard and because Boost MPI does not fulfill all my needs for a flexible, easy-to-use C++ message passing library, I started to write my own message passing library on top of MPI, simply called MPL (Message Passing Library), see my GitHub account. Note that MPL will neither bring all functions of the C language API to C++ nor provide a direct mapping of the C API to some C++ functions and classes. Its focus is on the MPI core functions, ease of use, type safety, and elegance. It uses C++11 features wherever reasonable, e.g., lambda functions as custom reduction operations.
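To appreciate what passing a lambda as a reduction operation buys, compare it with what the plain C API requires for a custom reduction. The following sketch is my own illustration, not code from MPL: it registers a user-defined operation via MPI_Op_create and uses it in an all-reduce.

#include <cstdlib>
#include <iostream>
#include <mpi.h>

// user-defined reduction function with the signature required by MPI_Op_create
void my_add(void *in, void *inout, int *len, MPI_Datatype *) {
  const double *a=static_cast<const double *>(in);
  double *b=static_cast<double *>(inout);
  for (int i=0; i<*len; ++i)
    b[i]+=a[i];
}

int main(int argc, char *argv[]) {
  MPI_Init(&argc, &argv);
  int c_rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &c_rank);
  MPI_Op op;
  MPI_Op_create(my_add, 1 /* commutative */, &op);
  double x=c_rank, sum=0;
  MPI_Allreduce(&x, &sum, 1, MPI_DOUBLE, op, MPI_COMM_WORLD);
  if (c_rank==0)
    std::cout << "sum of ranks: " << sum << '\n';
  MPI_Op_free(&op);
  MPI_Finalize();
  return EXIT_SUCCESS;
}

In MPL the same reduction logic fits into a single lambda handed to the reduction call, so all of this registration boilerplate disappears.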
MPL relies heavily on templates and template metaprogramming and comes just as a set of header files. Documentation is still missing and is only available in the form of the source code and a few sample programs. If you are familiar with MPI, however, the transition to MPL will not be difficult. Let us start with a hello-world type program:
#include <cstdlib>
#include <iostream>
#include <mpl/mpl.hpp>

int main() {
  const mpl::communicator &comm_world(mpl::environment::comm_world());
  std::cout << "Hello world! I am running on \""
            << mpl::environment::processor_name()
            << "\". My rank is "
            << comm_world.rank()
            << " out of "
            << comm_world.size()
            << " processes.\n";
  return EXIT_SUCCESS;
}
Similar to MPI_COMM_WORLD in MPI, MPL has a global communicator that contains all processes belonging to a parallel computation. Each communicator has a rank (the number of the process within the communicator) and a size (the total number of processes). The program shown above just prints, for each process, its rank, the size of the world communicator, and the name of the computer on which the process runs. Note that with MPL it is not required to initialize or to finalize the message passing library (MPI_Init and MPI_Finalize are called implicitly by some compiler magic).
Let us look at a less trivial example and see how messages are sent and received. A very elementary example using the C language bindings of MPI and C++11 may look like this:
#include <cstdlib>
#include <complex>
#include <iostream>
#include <mpi.h>

int main(int argc, char *argv[]) {
  MPI_Init(&argc, &argv);
  int c_size, c_rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &c_rank);
  MPI_Comm_size(MPI_COMM_WORLD, &c_size);
  if (c_size<2) {
    MPI_Finalize();
    return EXIT_FAILURE;
  }
  // send and receive a single floating point number
  if (c_rank==0) {
    double pi=3.14;
    MPI_Send(&pi,             // pointer to memory
             1,               // number of data items
             MPI_DOUBLE,      // data type
             1,               // destination
             0,               // tag
             MPI_COMM_WORLD   // communicator
             );
    std::cout << "sent: " << pi << '\n';
  } else if (c_rank==1) {
    double pi=0;
    MPI_Recv(&pi,               // pointer to memory
             1,                 // number of data items
             MPI_DOUBLE,        // data type
             0,                 // source
             0,                 // tag
             MPI_COMM_WORLD,    // communicator
             MPI_STATUS_IGNORE  // ignore the receive status
             );
    std::cout << "got : " << pi << '\n';
  }
  MPI_Finalize();
  return EXIT_SUCCESS;
}
Here the standard MPI functions MPI_Send and MPI_Recv are employed. Their signatures require a lot of parameters: a pointer to a buffer, the number of items to be sent or received, the data type, a source or destination, a tag, and finally the communicator. With MPL this simplifies a lot. MPL assumes that only one data item is sent or received at a time, thus the number of data items does not need to be specified. Furthermore, the underlying MPI datatype will be deduced automatically at compile time by the compiler. This eliminates a typical error in MPI programs, e.g., passing a pointer to an integer while specifying MPI_DOUBLE as the data type. The tag, which may be used to distinguish between different kinds of messages, becomes an argument with a default value in MPL, so it is optional. Thus, in MPL only the communicator, a reference to the data, and a source or destination have to be given to the send and receive functions. The MPL equivalent to the MPI program shown above may look like this:
#include <cstdlib>
#include <complex>
#include <iostream>
#include <mpl/mpl.hpp>

int main() {
  const mpl::communicator &comm_world=mpl::environment::comm_world();
  if (comm_world.size()<2)
    return EXIT_FAILURE;
  // send and receive a single floating point number
  if (comm_world.rank()==0) {
    double pi=3.14;
    comm_world.send(pi, 1);  // send to rank 1
    std::cout << "sent: " << pi << '\n';
  } else if (comm_world.rank()==1) {
    double pi=0;
    comm_world.recv(pi, 0);  // receive from rank 0
    std::cout << "got : " << pi << '\n';
  }
  return EXIT_SUCCESS;
}
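The compile-time type deduction is not limited to built-in arithmetic types. The sketch below is my own variation of the program above, not one of MPL’s sample programs; it sends a std::complex<double> with exactly the same calls and assumes that MPL’s datatype traits cover std::complex, which the <complex> include in the examples suggests.

#include <cstdlib>
#include <complex>
#include <iostream>
#include <mpl/mpl.hpp>

int main() {
  const mpl::communicator &comm_world=mpl::environment::comm_world();
  if (comm_world.size()<2)
    return EXIT_FAILURE;
  // send and receive a single complex number; the MPI datatype is deduced
  // from std::complex<double> (assuming MPL provides a trait for it)
  if (comm_world.rank()==0) {
    std::complex<double> z(3.14, 2.72);
    comm_world.send(z, 1);  // send to rank 1
    std::cout << "sent: " << z << '\n';
  } else if (comm_world.rank()==1) {
    std::complex<double> z;
    comm_world.recv(z, 0);  // receive from rank 0
    std::cout << "got : " << z << '\n';
  }
  return EXIT_SUCCESS;
}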
Of course, sending and receiving single data items will not be sufficient for a message passing library. This is why MPL introduces the concept of data layouts. Data layouts specify the memory layout of a set of data to be sent or received (similar to derived data types in MPI). The layout may be contiguous, a strided vector, etc. The data layout is provided as an additional parameter to the sending and receiving functions and, in contrast to the case of single data items, the data is passed via a pointer. The following example may give an idea of how data layouts are used with MPL:
#include <cstdlib>
#include <complex>
#include <iostream>
#include <vector>
#include <mpl/mpl.hpp>

int main() {
  const mpl::communicator &comm_world=mpl::environment::comm_world();
  if (comm_world.size()<2)
    return EXIT_FAILURE;
  std::vector<double> v(8);
  mpl::contiguous_layout<double> v_layout(v.size());
  // send and receive a vector of floating point numbers
  if (comm_world.rank()==0) {
    double init=0;
    for (double &x : v) {
      x=init;
      ++init;
    }
    comm_world.send(v.data(), v_layout, 1);  // send to rank 1
    std::cout << "sent: ";
    for (double &x : v)
      std::cout << x << ' ';
    std::cout << '\n';
  } else if (comm_world.rank()==1) {
    comm_world.recv(v.data(), v_layout, 0);  // receive from rank 0
    std::cout << "got : ";
    for (double &x : v)
      std::cout << x << ' ';
    std::cout << '\n';
  }
  return EXIT_SUCCESS;
}
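Non-contiguous layouts follow the same pattern. The following sketch is my own illustration, not one of MPL’s sample programs; it assumes a strided vector layout class named strided_vector_layout whose parameters mirror MPI_Type_vector (count, block length, stride) and uses it to transfer every third element of a buffer.

#include <cstdlib>
#include <iostream>
#include <vector>
#include <mpl/mpl.hpp>

int main() {
  const mpl::communicator &comm_world=mpl::environment::comm_world();
  if (comm_world.size()<2)
    return EXIT_FAILURE;
  std::vector<double> v(12, 0);
  // 4 blocks of 1 element each, separated by a stride of 3 elements,
  // i.e., the elements at positions 0, 3, 6 and 9
  // (class name and parameter order assumed to mirror MPI_Type_vector)
  mpl::strided_vector_layout<double> v_layout(4, 1, 3);
  if (comm_world.rank()==0) {
    for (std::size_t i=0; i<v.size(); ++i)
      v[i]=i;
    comm_world.send(v.data(), v_layout, 1);  // send to rank 1
  } else if (comm_world.rank()==1) {
    comm_world.recv(v.data(), v_layout, 0);  // receive from rank 0
    for (double &x : v)
      std::cout << x << ' ';  // only every third position has been filled
    std::cout << '\n';
  }
  return EXIT_SUCCESS;
}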
Addendum: Besides MPL, Boost MPI, and OOMPI, there is also MPP, a further library that attempts to bring MPI to modern C++.
Hi, I found your library on GitHub. I really like the idea of distributed_grid. I wonder if this library supports CUDA. Additionally, it seems the data is owned by the distributed_grid. Is it possible to make it “mdspan”-like? That way it would be easier to manage memory manually, especially between device and host memory.