# Revisiting the named parameter idiom in C++14

Some programming languages provide functions with named parameters. Named parameters are distinguished by name rather than by position, so the programmer can pass them to a function in any order, explicitly give all required parameters, and rely on default values without worrying about the order used in the function declaration. In C++, however, functions have positional arguments. This means that, for functions with many parameters, the programmer has to remember their types and the order in which to pass them. Furthermore, default values can only be given to the last parameters, so it is not possible to specify one of the later parameters while using the default values for earlier ones.

Named parameters might become a feature of future C++ versions but cannot yet be used directly in C++. Thus, C++ programmers have devised various techniques to emulate named parameters, e.g., method chaining. C++14 offers some new features that allow an implementation of named parameters with a very neat syntax, as I will demonstrate in the following. The presented implementation ensures that several possible errors, which might occur in the context of named parameters, are detected at compile time:

• Each named parameter can be given only once per function call.
• Named parameters without a default value must be given.
• Only allowed named parameters can be passed to a function.

Let us have a look at a specific example of how this implementation can be used. Parameter names and values are represented by structures with a single data member called value; in the following example these are name, buffersize and writemode. These structures are passed as anonymous objects, constructed via aggregate initialization, to a variadic template function, which forwards the function arguments (or the default values) in the correct order to the actual implementation, an ordinary function with a fixed number of positional arguments.

 Source code
#include <iostream>
#include <string>
#include "named_parameter.hpp"

// some structures representing parameter names and values
struct name {
  const std::string &value;
};

struct buffersize {
  const int &value;
};

struct writemode {
  const bool value;
};

constexpr writemode writable{true};

struct invalid {
  const int &value;
};

// a function with many parameters
void open_file_impl(const std::string &name, int buffersize, bool writable) {
  std::cout << "name       : " << name << std::endl;
  std::cout << "buffersize : " << buffersize << std::endl;
  std::cout << "writable   : " << writable << std::endl;
  std::cout << std::endl;
}

// template function taking named parameters which
// are passed to the actual function implementation
template<typename... Ts>
void open_file(const Ts &...args) {
  static_assert(named_para::is_valid<named_para::parameter_set<name, buffersize, writemode>, Ts...>(),
                "invalid named parameter argument");
  auto t=std::make_tuple(args...);
  open_file_impl(named_para::get<name>(t),
                 named_para::get<buffersize>(t, 1024),
                 named_para::get<writemode>(t, false));
}

int main() {
  open_file(name{"file.dat"});
  open_file(buffersize{8192}, writable, name{"file_name.dat"});
  // next line will not compile because of missing mandatory parameter
  // open_file(buffersize{1234});
  // next line will not compile because of invalid parameter
  // open_file(buffersize{1234}, name{"another_file_name.dat"}, invalid{1});
  // next line will not compile because of repeated parameter
  // open_file(buffersize{1234}, name{"another_file_name.dat"}, name{"foo"});
}

The implementation of the named parameters is given in the following header file. The whole magic of this implementation relies on variadic templates and on the template class std::tuple, which in C++14 allows a specific element to be accessed via its type through the function std::get. In the application code shown above, the arguments to the variadic template function are put into a std::tuple, and the function named_para::get returns the desired function argument from the tuple, or a given default value if it is not present. In this way, named function arguments can be put into the right order to pass them to a function with positional arguments. In order to decide whether the default argument has to be employed, it has to be checked whether a certain type, which represents a parameter name, is present in the set of given arguments. This is accomplished by the template class has_type, which has either std::true_type or std::false_type as base class, depending on whether the parameter name is present in the function call or not. Finally, the helper class is_valid can be used to check whether a set of given parameter names contains only valid types (parameter names).

 Source code
#ifndef NAMED_PARAMETER_HPP
#define NAMED_PARAMETER_HPP

#include <tuple>
#include <type_traits>

namespace named_para {

// compile-time check if type T is in tuple of type Tuple
template<typename T, typename Tuple>
struct has_type;

// empty tuple cannot contain T
template<typename T>
struct has_type<T, std::tuple<>> : std::false_type {
};

// 1st type in tuple is not T, T may be in other elements
template<typename T, typename U, typename... Ts>
struct has_type<T, std::tuple<U, Ts...>> : has_type<T, std::tuple<Ts...>> {
};

// 1st type in tuple is T, thus tuple contains an element of type T
template<typename T, typename... Ts>
struct has_type<T, std::tuple<T, Ts...>> : std::true_type {
};

// compile-time check if type pack Ts contains only valid types given in tuple
template<typename Tuple, typename... Ts>
class is_valid;

// check if T is in Tuple if type pack has only a single type T
template<typename Tuple, typename T>
struct is_valid<Tuple, T> : has_type<T, Tuple> {
};

// recursive definition of is_valid
// type pack is valid if 1st element is valid and remaining types are valid
template<typename Tuple, typename T, typename... Ts>
struct is_valid<Tuple, T, Ts...> : std::integral_constant<bool, is_valid<Tuple, T>() and is_valid<Tuple, Ts...>()> {
};

template<typename T, typename Tup, typename Enable=void>
class get_impl {
public:
  // type T not present in tuple, return default value
  static auto get(const Tup &tup, const decltype(T::value) &def) -> decltype(T::value) {
    return def;
  }
};

template<typename T, typename Tup>
class get_impl<T, Tup, typename std::enable_if<has_type<T, Tup>::value>::type> {
public:
  // type T present in tuple, return T::value from tuple, ignore 2nd parameter
  static auto get(const Tup &tup, const decltype(T::value) &) -> decltype(T::value) {
    return std::get<T>(tup).value;
  }
  // type T present in tuple, return T::value from tuple
  static auto get(const Tup &tup) -> decltype(T::value) {
    return std::get<T>(tup).value;
  }
};

// get T::value if element of type T present in tuple or default parameter otherwise
template<typename T, typename Tup>
auto get(const Tup &tup, const decltype(T::value) &def) -> decltype(get_impl<T, Tup>::get(tup, def)) {
  return get_impl<T, Tup>::get(tup, def);
}

// get T::value from tuple
template<typename T, typename Tup>
auto get(const Tup &tup) -> decltype(get_impl<T, Tup>::get(tup)) {
  return get_impl<T, Tup>::get(tup);
}

template<typename ...Ts> using parameter_set = std::tuple<Ts...>;
}

#endif

# New TRNG release

A new version of TRNG (Tina’s Random Number Generator Library) has been released. TRNG can be utilized in sequential as well as in parallel Monte Carlo simulations and does not depend on a specific parallelization technique, such as POSIX threads or MPI. The new version 4.18 is a bug-fix and maintenance release and can be downloaded from the TRNG webpage. It also introduces support for a new discrete probability distribution, the zero-truncated Poisson distribution.

Poisson distribution vs. zero-truncated Poisson distribution with parameter $\lambda=3$ in both cases.

# MPL – Data types

In earlier posts I gave an introduction to the Message Passing Library (MPL). With MPL, processes can send and receive messages containing data of various types. Which data types does MPL support? MPL supports the same kinds of elementary data types as MPI does. These are:

• the integer types char, short int, int, long int, long long int as well as the unsigned variants thereof.
• the boolean type bool.
• the floating point types float, double and long double.
• the complex types std::complex<float>, std::complex<double>, std::complex<long double>.

The following program sends and receives data of each of these data types.

 Sending and receiving data of elementary types.
#include <cstdlib>
#include <complex>
#include <iostream>
#include <mpl/mpl.hpp>

int main() {
  const mpl::communicator &comm_world=mpl::environment::comm_world();
  if (comm_world.size()<2)
    comm_world.abort(EXIT_FAILURE);
  if (comm_world.rank()==0) {
    // send a data item of each standard type
    // note "char" might equal "signed char" or "unsigned char" depending on the compiler
    char x1='A';                               comm_world.send(x1, 1);
    signed char x2='B';                        comm_world.send(x2, 1);
    unsigned char x3='C';                      comm_world.send(x3, 1);
    signed short int x4=-1;                    comm_world.send(x4, 1);
    unsigned short int x5=1;                   comm_world.send(x5, 1);
    signed int x6=-10;                         comm_world.send(x6, 1);
    unsigned int x7=10;                        comm_world.send(x7, 1);
    signed long int x8=-100;                   comm_world.send(x8, 1);
    unsigned long int x9=100;                  comm_world.send(x9, 1);
    signed long long int x10=-1000;            comm_world.send(x10, 1);
    unsigned long long int x11=1000;           comm_world.send(x11, 1);
    bool x12=true;                             comm_world.send(x12, 1);
    float x13=1.2345;                          comm_world.send(x13, 1);
    double x14=2.3456;                         comm_world.send(x14, 1);
    long double x15=3.4567;                    comm_world.send(x15, 1);
    std::complex<float> x16(1.2, -1.2);        comm_world.send(x16, 1);
    std::complex<double> x17(2.3, -2.3);       comm_world.send(x17, 1);
    std::complex<long double> x18(3.4, -3.4);  comm_world.send(x18, 1);
  }
  if (comm_world.rank()==1) {
    // receive a data item of each standard type
    char x1;                        comm_world.recv(x1, 0);   std::cout << "x1 = " << x1 << '\n';
    signed char x2;                 comm_world.recv(x2, 0);   std::cout << "x2 = " << x2 << '\n';
    unsigned char x3;               comm_world.recv(x3, 0);   std::cout << "x3 = " << x3 << '\n';
    signed short int x4;            comm_world.recv(x4, 0);   std::cout << "x4 = " << x4 << '\n';
    unsigned short int x5;          comm_world.recv(x5, 0);   std::cout << "x5 = " << x5 << '\n';
    signed int x6;                  comm_world.recv(x6, 0);   std::cout << "x6 = " << x6 << '\n';
    unsigned int x7;                comm_world.recv(x7, 0);   std::cout << "x7 = " << x7 << '\n';
    signed long int x8;             comm_world.recv(x8, 0);   std::cout << "x8 = " << x8 << '\n';
    unsigned long int x9;           comm_world.recv(x9, 0);   std::cout << "x9 = " << x9 << '\n';
    signed long long int x10;       comm_world.recv(x10, 0);  std::cout << "x10 = " << x10 << '\n';
    unsigned long long int x11;     comm_world.recv(x11, 0);  std::cout << "x11 = " << x11 << '\n';
    bool x12;                       comm_world.recv(x12, 0);  std::cout << "x12 = " << x12 << '\n';
    float x13;                      comm_world.recv(x13, 0);  std::cout << "x13 = " << x13 << '\n';
    double x14;                     comm_world.recv(x14, 0);  std::cout << "x14 = " << x14 << '\n';
    long double x15;                comm_world.recv(x15, 0);  std::cout << "x15 = " << x15 << '\n';
    std::complex<float> x16;        comm_world.recv(x16, 0);  std::cout << "x16 = " << x16 << '\n';
    std::complex<double> x17;       comm_world.recv(x17, 0);  std::cout << "x17 = " << x17 << '\n';
    std::complex<long double> x18;  comm_world.recv(x18, 0);  std::cout << "x18 = " << x18 << '\n';
  }
  return EXIT_SUCCESS;
}

MPL would not offer a great advantage over MPI if it supported only these elementary data types, as MPI does. However, MPL also comes with some support for user-defined data types. To be able to exchange data of custom types, a message passing library must have some knowledge about the internal representation of user-defined data types. Because C++ has very limited type introspection capabilities, this knowledge cannot be obtained automatically by the message passing library. Usually, information about the internal structure of user-defined types (structures and classes) has to be exposed explicitly to the message passing library. Therefore, MPL supports message exchange of data where information about the internal representation can be obtained automatically, and it introduces a mechanism to expose the internal representation of custom types to MPL where this is not possible.

The data types whose internal representation MPL can infer are enumeration types, C arrays of constant size and the template classes std::array, std::pair and std::tuple of the C++ Standard Library. The only limitation is that the C array and the mentioned template classes must hold data elements of types that can themselves be sent or received by MPL. This rule can be applied recursively, which allows quite complex data structures to be built. For example, one can send and receive data of type std::pair<int, double>, because int and double can be sent or received. But also std::array<std::pair<int, double>, 8>, which represents 8 pairs of int and double, can be used. Enumeration types are internally dealt with like integers. Note that MPL automatically determines which integer type is chosen by the compiler (or programmer) to represent an enumeration type. Examples are given below.

 Sending and receiving arrays, pairs and tuples.
#include <cstdlib>
#include <complex>
#include <iostream>
#include <array>
#include <utility>
#include <tuple>
#include <mpl/mpl.hpp>

int main() {
  const mpl::communicator &comm_world=mpl::environment::comm_world();
  if (comm_world.size()<2)
    comm_world.abort(EXIT_FAILURE);
  const int N=8;
  int x1[N];  // C array of constant size (known at compile time)
  std::array<double, N> x2;  // STL array
  std::pair<double, int> x3;  // STL pair
  std::tuple<double, int, char> x4;  // STL tuple
  if (comm_world.rank()==0) {
    // send a C array of ints
    for (int i=0; i<N; ++i)
      x1[i]=i;
    comm_world.send(x1, 1);
    // send an STL array of doubles
    for (int i=0; i<N; ++i)
      x2[i]=i+0.01*i;
    comm_world.send(x2, 1);
    // send an STL pair of double and int
    x3=std::make_pair(3.14, 77);
    comm_world.send(x3, 1);
    // send an STL tuple of double, int and char
    x4=std::make_tuple(3.14, 77, 'Q');
    comm_world.send(x4, 1);
  }
  if (comm_world.rank()==1) {
    // receive a C array of ints
    comm_world.recv(x1, 0);
    std::cout << "x1 = [";
    for (auto i : x1)
      std::cout << i << ", ";
    std::cout << "]\n";
    // receive an STL array of doubles
    comm_world.recv(x2, 0);
    std::cout << "x2 = [";
    for (auto i : x2)
      std::cout << i << ", ";
    std::cout << "]\n";
    // receive an STL pair of double and int
    comm_world.recv(x3, 0);
    std::cout << "x3 = [" << x3.first << ", " << x3.second << "]\n";
    // receive an STL tuple of double, int and char
    comm_world.recv(x4, 0);
    std::cout << "x4 = [" << std::get<0>(x4) << ", " << std::get<1>(x4) << ", " << std::get<2>(x4) << "]\n";
  }
  return EXIT_SUCCESS;
}

 Sending and receiving enumeration types.
#include <cstdlib>
#include <complex>
#include <iostream>
#include <mpl/mpl.hpp>
#include <type_traits>

enum class colors : long long {red, green, blue=0x7fffffffffffffffll};

enum numbers_enum {one=1, two, three, four};

int main() {
  const mpl::communicator &comm_world=mpl::environment::comm_world();
  if (comm_world.size()<2)
    comm_world.abort(EXIT_FAILURE);
  if (comm_world.rank()==0) {
    // send data items of enum types
    colors c=colors::blue;  comm_world.send(c, 1);
    numbers_enum n=three;   comm_world.send(n, 1);
  }
  if (comm_world.rank()==1) {
    // receive data items of enum types
    colors c;        comm_world.recv(c, 0);  std::cout << static_cast<long long>(c) << '\n';
    numbers_enum n;  comm_world.recv(n, 0);  std::cout << static_cast<int>(n) << '\n';
  }
  return EXIT_SUCCESS;
}

User-defined data structures usually come as structures or classes. Provided that these classes hold only non-static, non-const data members of types that MPL is able to send or receive, it is possible to expose these data members to MPL via template specialization of the class struct_builder, such that messages containing objects of these classes can be exchanged. Template specialization of the class struct_builder is illustrated in the following program. The specialized template has to be derived from base_struct_builder, and the internal data representation of the user-defined class is exposed to MPL in the constructor.

 Sending and receiving classes.
#include <cstdlib>
#include <vector>
#include <iostream>
#include <mpl/mpl.hpp>

// a class with some non-const non-static data members
class structure {
private:
  double d;
  int i[8];
public:
  // constructor
  template<typename... T>
  structure(double d=0, T... i) : d(d), i{i...} {
  }
  friend std::ostream& operator<<(std::ostream& stream,
                                  const structure &str) {
    stream << "d = " << str.d << ", i = [";
    for (int j=0; j<7; ++j)
      stream << str.i[j] << ", ";
    stream << str.i[7] << "]";
    return stream;
  }
  friend class mpl::struct_builder<structure>;
};

namespace mpl {

// expose internal data representation to MPL via template specialization
template<>
class struct_builder<structure> : public base_struct_builder<structure> {
  struct_layout<structure> layout;
public:
  struct_builder() {
    structure str;
    // register structure or class
    layout.register_struct(str);
    // register each non-const non-static data member
    // MPL must be able to send and receive types of all data members
    layout.register_element(str.d);
    layout.register_element(str.i);
    // finally, expose data layout to MPL
    define_struct(layout);
  }
};

}

int main() {
  const mpl::communicator &comm_world=mpl::environment::comm_world();
  if (comm_world.size()<2)
    comm_world.abort(EXIT_FAILURE);
  structure str;
  // send a single structure
  if (comm_world.rank()==0) {
    str=structure(1.2, 1, 77, 8, 5, 2, 9, 6, 4);
    comm_world.send(str, 1);
  }
  if (comm_world.rank()==1) {
    comm_world.recv(str, 0);
    std::cout << "got : " << str << '\n';
  }
  return EXIT_SUCCESS;
}

# MPL – Collective communication

In a previous post I gave a very short introduction to the Message Passing Library (MPL), a C++ message passing library based on MPI. Although it does not provide a direct mapping of the C API to C++, it comes with all modes of collective communication as defined by the MPI standard, e.g., barrier, broadcast, gather, scatter, reduce and so on. The following program does not perform any meaningful calculation but illustrates some modes of collective communication.

 Collective communication with MPL.
#include <cstdlib>
#include <complex>
#include <iostream>
#include <vector>
#include <mpl/mpl.hpp>

int main() {
  const mpl::communicator &comm_world=mpl::environment::comm_world();
  std::vector<int> v;
  if (comm_world.rank()==0)
    for (int i=0; i<comm_world.size(); ++i)
      v.push_back(i);
  int x;
  // rank 0 scatters data to all processes
  comm_world.scatter(0, v.data(), x);
  std::cout << "rank " << comm_world.rank() << " got " << x << '\n';
  // wait until all processes have reached this point
  comm_world.barrier();
  x*=2;
  // rank 0 gathers data from all processes
  comm_world.gather(0, x, v.data());
  if (comm_world.rank()==0)
    for (int i=0; i<comm_world.size(); ++i)
      std::cout << "got " << v[i] << " from rank " << i << '\n';
  // wait until all processes have reached this point
  comm_world.barrier();
  // calculate global sum and pass result to rank 0
  if (comm_world.rank()==0) {
    int sum;
    comm_world.reduce(mpl::plus<int>(), 0, x, sum);
    std::cout << "sum = " << sum << '\n';
  } else
    comm_world.reduce(mpl::plus<int>(), 0, x);
  // wait until all processes have reached this point
  comm_world.barrier();
  // calculate global sum and pass result to all
  comm_world.allreduce(mpl::plus<int>(), x);
  std::cout << "sum = " << x << '\n';
  return EXIT_SUCCESS;
}

Note that the reduction operation (addition) in the example above is specified as an anonymous function object. Besides addition, MPL provides multiplication, the logical operations »and« and »or«, the bitwise operations »and«, »or« and »xor« as well as minimum and maximum. A reduction operation must take two arguments of the same type and produce a result of that type. With MPL it becomes very easy to define custom reduction operations, as the following example shows. Note that the reduction operation is required to be implemented by a class that is derived from std::function and has no member variables.

 Custom reduction operations.
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <functional>
#include <vector>
#include <mpl/mpl.hpp>

// calculate least common multiple of two arguments
template<typename T>
class lcm : public std::function<T (T, T)> {
  // helper: calculate greatest common divisor
  T gcd(T a, T b) {
    T zero=T(), t;
    if (a<zero) a=-a;
    if (b<zero) b=-b;
    while (b>zero) {
      t=a%b;  a=b;  b=t;
    }
    return a;
  }
public:
  T operator()(T a, T b) {
    T zero=T();
    T t((a/gcd(a, b))*b);
    if (t<zero)
      return -t;
    return t;
  }
};

int main() {
  const mpl::communicator &comm_world=mpl::environment::comm_world();
  // generate data
  std::srand(std::time(0)*comm_world.rank());  // random seed
  int v=std::rand()%12+1;
  // calculate least common multiple and send result to rank 0
  if (comm_world.rank()==0) {
    int result;
    // calculate least common multiple
    comm_world.reduce(lcm<int>(), 0, v, result);
    // display data from all ranks
    std::cout << "Arguments:\n";
    for (int r=0; r<comm_world.size(); ++r) {
      if (r>0)
        comm_world.recv(v, r);
      std::cout << v << '\n';
    }
    // display results of global reduction
    std::cout << "\nResult:\n";
    std::cout << result << '\n';
  } else {
    // calculate least common multiple
    comm_world.reduce(lcm<int>(), 0, v);
    // send data to rank 0 for display
    comm_world.send(v, 0);
  }
  return EXIT_SUCCESS;
}

MPI provides the reduction operations MINLOC and MAXLOC, which determine the minimum or maximum together with its location. In MPL, however, such specific operations are not needed: minimum and maximum operations on pairs of some scalar type and an integer do the same job, as the following example shows.

 Simultaneous determination of the minimum and its location.
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <iomanip>
#include <vector>
#include <utility>
#include <mpl/mpl.hpp>

typedef std::pair<double, int> pair_t;

int main() {
  const mpl::communicator &comm_world=mpl::environment::comm_world();
  // generate data
  std::srand(std::time(0)*comm_world.rank());  // random seed
  const int n=8;
  std::vector<pair_t> v(n);
  for (pair_t &i : v)
    i=std::make_pair(static_cast<double>(std::rand())/RAND_MAX, comm_world.rank());
  // calculate minimum and its location and send result to rank 0
  mpl::contiguous_layout<pair_t> layout(n);
  if (comm_world.rank()==0) {
    std::vector<pair_t> result(n);
    // calculate minimum
    comm_world.reduce(mpl::min<pair_t>(), 0, v.data(), result.data(), layout);
    // display data from all ranks
    std::cout << "Arguments:\n";
    for (int r=0; r<comm_world.size(); ++r) {
      if (r>0)
        comm_world.recv(v.data(), layout, r);
      for (pair_t i : v)
        std::cout << std::fixed << std::setprecision(5) << i.first << ' ' << i.second << '\t';
      std::cout << '\n';
    }
    // display results of global reduction
    std::cout << "\nResults:\n";
    for (pair_t i : result)
      std::cout << std::fixed << std::setprecision(5) << i.first << ' ' << i.second << '\t';
    std::cout << '\n';
  } else {
    // calculate minimum and its location and send result to rank 0
    comm_world.reduce(mpl::min<pair_t>(), 0, v.data(), layout);
    // send data to rank 0 for display
    comm_world.send(v.data(), layout, 0);
  }
  return EXIT_SUCCESS;
}

# MPL – A message passing library

The Message Passing Interface (MPI) Standard defines a message passing library, which serves as the basis for many high-performance computing applications today. It provides portable, scalable functions for data exchange in parallel computations on various parallel computing architectures. Originally, application programming interfaces had been defined for C and Fortran as well as for C++. With the MPI-3 update of the standard, however, the C++ bindings have been removed. During its various revisions the MPI standard became quite complex, and dropping one of the three language bindings may have helped to keep the standard maintainable as a whole. Furthermore, the C++ bindings were not very well designed. Although object-oriented techniques had been applied, the MPI C++ bindings did not come close to a well designed C++ library by today's standards. What happened to the C++ bindings is explained in more detail in a blog post on the subject.

Alternative C++ bindings to the MPI Standard are provided by Boost MPI and by OOMPI, an early attempt to bring MPI 1 functionality to C++ in an object-oriented way. Boost MPI uses rather modern C++ programming techniques to provide a very nice interface to the MPI standard’s core functionality. With Boost MPI, programs become more type-safe (when sending data of a particular C++ type, the corresponding MPI data type is deduced by the compiler) and sending data given by user-defined structures or classes becomes much easier than with the MPI standard’s C or C++ bindings. Although Boost MPI is a huge improvement over the deprecated C++ bindings of the MPI standard, it also has its limitations.

• It is no longer actively maintained.
• Sending data of complex classes and structures is based on Boost serialization, which may reduce performance and does not work in heterogeneous environments (different endianness etc.).
• Boost MPI provides no equivalent to derived MPI data types (strided vectors, sub matrices, etc.).
• Although Boost MPI supports the more general graph communicators, there are no functions for Cartesian communicators.
• Boost MPI is based on C++03 and does not benefit from new C++11 features.

Because C++ was dropped from the MPI standard and because Boost MPI does not fulfill all my needs for a flexible, easy-to-use C++ message passing library, I started to write my own message passing library on top of MPI, simply called MPL (Message Passing Library); see my GitHub account. Note that MPL will neither bring all functions of the C language API to C++ nor provide a direct mapping of the C API to some C++ functions and classes. Its focus lies on the MPI core functions, ease of use, type safety, and elegance. It uses C++11 features wherever reasonable, e.g., lambda functions as custom reduction operations.

MPL relies heavily on templates and template metaprogramming and comes just as a bunch of header files. Documentation is still missing and is only available in the form of the source code and a few sample programs. If you are familiar with MPI, however, the transition to MPL will not be difficult. Let us start with a hello-world-type program:

 Hello (parallel) world!
#include <cstdlib>
#include <iostream>
#include <mpl/mpl.hpp>

int main() {
  const mpl::communicator &comm_world(mpl::environment::comm_world());
  std::cout << "Hello world! I am running on \""
            << mpl::environment::processor_name()
            << "\". My rank is "
            << comm_world.rank()
            << " out of "
            << comm_world.size() << " processes.\n";
  return EXIT_SUCCESS;
}

Similar to MPI_COMM_WORLD in MPI, MPL has a global communicator that contains all processes belonging to a parallel computation. Within a communicator each process has a rank (the number of the process within the communicator), and the communicator has a size (the total number of processes). The program shown above just prints, for each process, its rank, the size of the world communicator and the name of the computer where the process runs. Note that with MPL it is not required to initialize or to finalize the message passing library (MPI_Init and MPI_Finalize are called implicitly by some compiler magic).

Let us look at a less trivial example and see how messages are sent and received. A very elementary example using the C language bindings of MPI and C++11 may look like this:

#include <cstdlib>
#include <complex>
#include <iostream>
#include <mpi.h>

int main(int argc, char *argv[]) {
  MPI_Init(&argc, &argv);
  int c_size, c_rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &c_rank);
  MPI_Comm_size(MPI_COMM_WORLD, &c_size);
  if (c_size<2) {
    MPI_Finalize();
    return EXIT_FAILURE;
  }
  // send and receive a single floating point number
  if (c_rank==0) {
    double pi=3.14;
    MPI_Send(&pi,            // pointer to memory
             1,              // number of data items
             MPI_DOUBLE,     // data type
             1,              // destination
             0,              // tag
             MPI_COMM_WORLD  // communicator
             );
    std::cout << "sent: " << pi << '\n';
  } else if (c_rank==1) {
    double pi=0;
    MPI_Recv(&pi,               // pointer to memory
             1,                 // number of data items
             MPI_DOUBLE,        // data type
             0,                 // source
             0,                 // tag
             MPI_COMM_WORLD,    // communicator
             MPI_STATUS_IGNORE  // ignore the receive status
             );
    std::cout << "got : " << pi << '\n';
  }
  MPI_Finalize();
  return EXIT_SUCCESS;
}

Here the standard MPI functions MPI_Send and MPI_Recv are employed. Their signatures require a lot of parameters: a pointer to a buffer, the number of items to be sent or received, the data type, a source or destination, a tag, and finally the communicator. With MPL this simplifies a lot. MPL assumes that only one data item is sent or received at a time, so the number of data items need not be specified. Furthermore, the underlying MPI data type is deduced automatically at compile time by the compiler. This eliminates a typical error in MPI programs, e.g., passing a pointer to an integer while specifying MPI_DOUBLE as the data type. The tag, which may be used to distinguish between different kinds of messages, becomes in MPL an argument with a default value, so it is optional. Thus, in MPL only the communicator, a reference to the data and a source or destination have to be given to the send and receive functions. The MPL equivalent to the MPI program shown above may look as follows:

#include <cstdlib>
#include <complex>
#include <iostream>
#include <mpl/mpl.hpp>

int main() {
  const mpl::communicator &comm_world=mpl::environment::comm_world();
  if (comm_world.size()<2)
    return EXIT_FAILURE;
  // send and receive a single floating point number
  if (comm_world.rank()==0) {
    double pi=3.14;
    comm_world.send(pi, 1);  // send to rank 1
    std::cout << "sent: " << pi << '\n';
  } else if (comm_world.rank()==1) {
    double pi=0;
    comm_world.recv(pi, 0);  // receive from rank 0
    std::cout << "got : " << pi << '\n';
  }
  return EXIT_SUCCESS;
}

Of course, sending and receiving single data items will not be sufficient for a message passing library. This is why MPL introduces the concept of data layouts. Data layouts specify the memory layout of a set of data to be sent or received (similar to derived data types in MPI). The layout may be contiguous, a strided vector etc. The data layout is provided as an additional parameter to the sending or receiving functions and, in contrast to the case of single data items, the data is passed via a pointer. The following example may give an idea of how data layouts are used with MPL:

 Sending and receiving a contiguous vector of floating point numbers in MPL.
#include <cstdlib>
#include <complex>
#include <iostream>
#include <vector>
#include <mpl/mpl.hpp>

int main() {
  const mpl::communicator &comm_world=mpl::environment::comm_world();
  if (comm_world.size()<2)
    return EXIT_FAILURE;
  std::vector<double> v(8);
  mpl::contiguous_layout<double> v_layout(v.size());
  // send and receive a vector of floating point numbers
  if (comm_world.rank()==0) {
    double init=0;
    for (double &x : v) {
      x=init;
      ++init;
    }
    comm_world.send(v.data(), v_layout, 1);  // send to rank 1
    std::cout << "sent: ";
    for (double &x : v)
      std::cout << x << ' ';
    std::cout << '\n';
  } else if (comm_world.rank()==1) {
    comm_world.recv(v.data(), v_layout, 0);  // receive from rank 0
    std::cout << "got : ";
    for (double &x : v)
      std::cout << x << ' ';
    std::cout << '\n';
  }
  return EXIT_SUCCESS;
}

# Cell phone supercomputers

In “Physics of the Future” Michio Kaku states “Today your cell phone has more computer power than all of NASA back in 1969, when it placed two astronauts on the moon.”, and in the fictional movie “Iron Sky” the space ship “Götterdämmerung” is operated by a smart phone. So what is actually the computational power of modern cell phones? To get an answer to this question I ran the Linpack benchmark on my Jolla (with a Qualcomm Snapdragon dual-core CPU at 1.4 GHz). Indeed, according to this benchmark my cell phone has a single-core performance on the order of a Cray supercomputer of the early 1990s, more than 300 Mflops, which is about an order of magnitude slower than a modern desktop CPU.

The Linpack benchmark solves a large dense system of linear equations in order to determine the number of floating point operations executed per second. The screenshot shows results for a Jolla smart phone.

# C++11/14 for scientific computing V

## $\boldsymbol{\lambda}$ functions

Anonymous functions, often called $\lambda$ functions, are a common feature of scientific programming languages, e.g., Maple and Matlab. They are particularly useful when functions are employed that take other functions as arguments. For example, a numerical root finding algorithm requires a function as an input parameter.

C++11 introduced $\lambda$ functions to C++, and in C++14 they have been extended and have become slightly more flexible.  $\lambda$ functions can be assigned to variables similarly to functors.  The main advantage of $\lambda$ functions over functors is that they can be defined right where they are needed.  Furthermore, $\lambda$ functions capture variables from the current scope, thus they represent a closure.

The syntax of a $\lambda$ function has basically the form

[ capture-list ] ( params ) -> ret { body }

The possibly empty capture-list specifies which variables are captured and how they are captured (by value or by reference).  Then the arguments and the return type are specified followed by the definition of the function body.  The return type may be omitted if the function body consists of nothing but a single return statement with an expression.

The use of $\lambda$ functions is probably best explained by providing a specific example. In the following, a generic root finding algorithm (regula falsi) is implemented, which searches a given interval for a root of some function.

 Source code
#include <iostream>
#include <iomanip>
#include <cstdlib>
#include <cmath>
#include <limits>

template<typename T, typename F>
T regula_falsi(T s, T t, F f) {
  typedef enum { none, left, right } side_t;
  side_t side(none);
  // starting values at endpoints of interval
  T fs=f(s);
  T ft=f(t);
  T r;
  for (int n=0; n<std::numeric_limits<T>::digits+2; ++n) {
    r=(fs*t - ft*s)/(fs - ft);
    if (std::abs(t-s)<std::numeric_limits<T>::epsilon()*std::abs(t+s))
      break;
    T fr=f(r);
    if (fr*ft>0) { // fr and ft have same sign, copy r to t
      t=r;
      ft=fr;
      if (side==left)
        fs/=2;
      side=left;
    } else if (fs*fr>0) { // fr and fs have same sign, copy r to s
      s=r;
      fs=fr;
      if (side==right)
        ft/=2;
      side=right;
    } else { // fr very small (looks like zero)
      break;
    }
  }
  return r;
}

int main() {
  // assign lambda function to variable and use afterwards
  auto f = [](double x){ return std::cos(x)-x*x*x; };
  double x0=regula_falsi(0., 1., f);
  // specify lambda function directly as an argument
  double x1=regula_falsi(0., 3., [](double x){ return std::cos(x); });
  std::cout << std::setprecision(12)
            << x0 << '\n'
            << x1 << '\n';
  return 0;
}

# C++11/14 for scientific computing IV

## Random numbers

C++98 inherited from C the standard functions rand and srand and the macro RAND_MAX for generating pseudo-random numbers. These functions suffer from several problems. For example, it has never been specified which algorithm rand uses to produce pseudo-random numbers. Thus, in C and in C++98 the outcome of a Monte Carlo simulation depends on the employed implementation of the standard library if rand is used. Furthermore, in C++98 it is hard to generate random numbers from a non-uniform distribution.

Therefore, in C++11 a portable random number generator facility has been introduced. All classes related to random number generation are declared in the header random. The random number generator facility consists of engines and distributions. Engines produce streams of pseudo-random bit patterns and may implement various algorithms for random number generation, e.g., linear congruences or the Mersenne Twister. Distributions consume an engine’s bits to generate random numbers, which are distributed according to a specific distribution, e.g., uniform, exponential or Gaussian. The following code gives a simple example. A more detailed explanation of the C++11 random number generator facility can be found in Random Number Generation in C++11 and in the video rand() Considered Harmful.

 Source code
#include <iostream>
#include <random>

int main() {
  // Mersenne Twister with default seed
  std::mt19937 engine;
  // exponential distribution with mean 1
  std::exponential_distribution<> distribution(1);
  for (int i(0); i<100; ++i)
    std::cout << distribution(engine) << '\n';
}

# C++11/14 for scientific computing III

## Mathematical functions

C++11/14 introduce several new mathematical functions, which are all overloaded for the types float, double and long double and are defined in the header file cmath. The following table summarizes the new functions.

##### Basic operations
| function | description |
|---|---|
| remainder | signed remainder of the division operation |
| remquo | signed remainder as well as the three last bits of the division operation |
| fmax | larger of two floating point values |
| fmin | smaller of two floating point values |
| fdim | positive difference of two floating point values ($\max(0, x-y)$) |
| nan, nanf, nanl | not-a-number (NaN) |

##### Exponential functions
| function | description |
|---|---|
| exp2 | returns $2$ raised to the given power ($2^x$) |
| expm1 | returns $e$ raised to the given power, minus one ($\exp x -1$) |
| log2 | base-2 logarithm of the given number ($\log_2x$) |
| log1p | natural logarithm of 1 plus the given number ($\ln(1+x)$) |

##### Power functions
| function | description |
|---|---|
| cbrt | computes the cubic root ($\sqrt[3]{x}$) |
| hypot | computes the square root of the sum of the squares of two given numbers ($\sqrt{x^2+y^2}$) |

##### Hyperbolic functions
| function | description |
|---|---|
| asinh | computes the inverse hyperbolic sine |
| acosh | computes the inverse hyperbolic cosine |
| atanh | computes the inverse hyperbolic tangent |

##### Error and gamma functions
| function | description |
|---|---|
| erf | error function |
| erfc | complementary error function |
| tgamma | gamma function |
| lgamma | natural logarithm of the gamma function |

##### Nearest integer floating point operations
| function | description |
|---|---|
| trunc | nearest integer not greater in magnitude than the given value |
| round, lround, llround | nearest integer, rounding away from zero in halfway cases |
| nearbyint | nearest integer using current rounding mode |
| rint, lrint, llrint | nearest integer using current rounding mode with exception if the result differs |

##### Floating point manipulation functions
| function | description |
|---|---|
| scalbn, scalbln | multiplies a number by FLT_RADIX raised to a power |
| ilogb | extracts the exponent of the number as an integer |
| logb | extracts the exponent of the number as a floating point value |
| nextafter, nexttoward | next representable floating point value towards the given value |
| copysign | copies the sign of a floating point value |

##### Classification and comparison
| function | description |
|---|---|
| fpclassify | categorizes the given floating point value |
| isfinite | checks if the given number has a finite value |
| isinf | checks if the given number is infinite |
| isnan | checks if the given number is NaN |
| isnormal | checks if the given number is normal |
| signbit | checks if the given number is negative |
| isgreater | checks if the first floating-point argument is greater than the second |
| isgreaterequal | checks if the first floating-point argument is greater than or equal to the second |
| isless | checks if the first floating-point argument is less than the second |
| islessequal | checks if the first floating-point argument is less than or equal to the second |
| islessgreater | checks if the first floating-point argument is less or greater than the second |
| isunordered | checks if two floating-point values are unordered |

## Minimum maximum functions

The header file algorithm defines template functions for determining the minimum or the maximum of two comparable objects of the same type, e.g., numbers. In C++11 overloaded versions of these functions have been introduced that allow determining the minimum or the maximum of an arbitrary number of objects, which are passed via an initializer list. Furthermore, minmax determines both the minimum and the maximum in a single scan. The following code gives an illustrative example.

 Source code
#include <iostream>
#include <algorithm>

int main() {
  int i0(0), i1(1), i2(2), i3(3), i4(4);
  // determine minimum and maximum separately
  std::cout << "min : " << std::min({i3, i1, i2, i0, i4}) << '\n'
            << "max : " << std::max({i3, i1, i2, i0, i4}) << '\n';
  // determine minimum and maximum in a single scan
  auto min_max=std::minmax({i3, i1, i2, i0, i4});
  std::cout << "min : " << min_max.first << '\n'
            << "max : " << min_max.second << '\n';
}

# C++11/14 for scientific computing II

## Complex numbers

The template class std::complex and the functions for complex numbers (defined in the header file complex) have been extended in C++11/14. The new function std::proj returns the projection of the complex number $z$ onto the Riemann sphere. The functions std::asin, std::acos and std::atan calculate the inverses of the sine, cosine and tangent for complex arguments, and similarly std::asinh, std::acosh and std::atanh compute the inverses of the complex hyperbolic functions. The member functions real and imag are overloaded in C++11. It is not only possible to get the real and the imaginary part of a complex number, the real and the imaginary part may now also be set by these functions as illustrated below.

 Source code
#include <iostream>
#include <complex>

int main() {
  std::complex<double> z;
  // set real and imaginary part
  z.real(1);
  z.imag(2);
  // get real and imaginary part
  std::cout << "z = " << z.real() << " + " << z.imag() << "i\n";
}

C++14 introduces the literals if, i and il, which represent pure imaginary numbers in single, double and extended precision, respectively. These literals are declared in the inline namespace std::literals::complex_literals and make complex expressions in source code easier to write and read, as the following example shows.

 Source code
#include <iostream>
#include <complex>

int main() {
  using namespace std::literals;

  double pi=std::acos(-1.);
  std::complex<double> z=std::exp(1i*pi); // Euler's formula
  std::cout << "exp(i*pi) = " << z << '\n';
}