C++11/14 for scientific computing IV

Random numbers

C++98 inherited from C the standard functions rand and srand and the macro RAND_MAX for generating pseudo-random numbers. These functions suffer several problems. For example, it has been never specified which algorithm is used in rand to produce pseudo-random numbers. Thus, in C and in C++98 the outcome of a Monte Carlo simulation depends on the employed implementation of the standard library if rand is used. Furthermore, in C++98 it is hard to generate random numbers from some non-uniform distribution.

Therefore, in C++11 a portable random number generator facility has been introduced. All classes related to random number generation are declared in the header random. The random number generator facility consists of engines and distributions. Engines produce streams of pseudo random bit patterns and may implement various algorithms for random number generation, e.g., linear congruences or the Mersenne Twister. Distributions consume an engine’s bits to generate random numbers, which are distributed according to a specific distribution, e.g., uniform, exponential or Gaussian. The following code gives a simple example. A more detailed explanation of the C++11 random number generator facility can be found in Random Number Generation in C++11 and in the video rand() Considered Harmful.

 Source code
#include <iostream>
#include <random>

int main() {
// Mersenne Twister with default seed
std::mt19937 engine;
// exponential distribution with mean 1
std::exponential_distribution<> distribution(1);
for (int i(0); i<100; ++i)
std::cout << distribution(engine) << '\n';
}

C++11/14 for scientific computing III

Mathematical functions

C++11/14 introduces several new mathematical functions, which are all overloaded for the types float, double and long double and that are defined in the header file cmath. The following table summerizes the new functions.

Basic operations
remainder signed remainder of the division operation
remquo signed remainder as well as the three last bits of the division operation
fmax larger of two floating point values
fmin smaller of two floating point values
fdim positive difference of two floating point values ($\max(0, x-y)$)
nan, nanf, nanl not-a-number (NaN)
Exponential functions
exp2 returns $2$ raised to the given power ($2^x$)
expm1 returns e raised to the given power, minus one ($\exp x -1$)
log2 base 2 logarithm of the given number ($\log_2x$)
log1p natural logarithm of 1 plus the given number ($\ln(1+x)$)
Power functions
cbrt computes cubic root ($\sqrt[3]{x}$)
hypot computes square root of the sum of the squares of two given numbers ($\sqrt{x^2+y^2}$)
Hyperbolic functions
asinh computes the inverse hyperbolic sine
acosh computes the inverse hyperbolic cosine
atanh computes the inverse hyperbolic tangent
Error and gamma functions
erf error function
erfc complementary error function
tgamma gamma function
lgamma natural logarithm of the gamma function
Nearest integer floating point operations
trunc nearest integer not greater in magnitude than the given value
round, lround, llround nearest integer, rounding away from zero in halfway cases
nearbyint nearest integer using current rounding mode
rint, lrint, llrint nearest integer using current rounding mode with exception if the result differs
Floating point manipulation functions
scalbn, scalbln multiplies a number by FLT_RADIX raised to a power
ilogb extracts exponent of the number
logb extracts exponent of the number
nextafter, nexttoward next representable floating point value towards the given value
copysign copies the sign of a floating point value
Classification and comparison
fpclassify categorizes the given floating point value
isfinite checks if the given number has finite value
isinf checks if the given number is infinite
isnan checks if the given number is NaN
isnormal checks if the given number is normal
signbit checks if the given number is negative
isgreater checks if the first floating-point argument is greater than the second
isgreaterequal checks if the first floating-point argument is greater or equal than the second
isless checks if the first floating-point argument is less than the second
islessequal checks if the first floating-point argument is less or equal than the second
islessgreater checks if the first floating-point argument is less or greater than the second
isunordered checks if two floating-point values are unordered

Minimum maximum functions

The header file algorithm defines template functions for determining the minimum or the maximum of two comparable objects of the same type, e.g., numbers. In C++11 overloaded versions of these functions have been introduced that allow to determine the minimum or the maximum of an arbitrary number of objects, which are passed via an initializer list. Furthermore, minmax determines both the minimum and the maximum in a single scan. The following code gives an illustrative example.

 Source code
#include <iostream>
#include <algorithm>

int main() {
int i0(0), i1(1), i2(2), i3(3), i4(4);
// determine minimum and maximum seperately
std::cout << "min : " << std::min({i3, i1, i2, i0, i4}) << '\n'
<< "max : " << std::max({i3, i1, i2, i0, i4}) << '\n';
// determine minimum and maximum in a single scan
auto min_max=std::minmax({i3, i1, i2, i0, i4});
std::cout << "min : " << min_max.first << '\n'
<< "max : " << min_max.second << '\n';
}

C++11/14 for scientific computing II

Complex numbers

The template class std::complex and functions for complex numbers (defined in the header file complex) have been extended in C++11/14. The new function std::proj returns the projection of the complex number $z$ onto the Riemann sphere. The functions std::asin, std::acos and std::atan calculate the inverse of the sine, cosine and tangent for complex arguments and similarly std::ahsin, std::ahcos and std::ahtan compute the inverses of the complex hyperbolic trigonometric functions. The member functions real and imag are overloaded in C++11. In C++11 it is not only possible to get the real and the imaginary part of a complex number, the real and the imaginary may now also be set by these functions as illustrated below.

 Source code
#include <iostream>
#include <complex>

int main() {
std::complex<double> z;
// set real and imaginary part
z.real(1);
z.imag(2);
// get real and imaginary part
std::cout << "z = " << z.real() << " + " << z.imag() << "i\n";
}

C++14 introduces the literals if, i and id, which represent pure imaginary numbers in single, double and extended precision, respectively. These literals are declared in the inline namespace std::literals::complex_literals and make complex expressions in source code more easy to write and read as the following example shows.

 Source code
#include <iostream>
#include <complex>

int main() {
using namespace std::literals;

double pi=std::acos(-1.);
std::complex<double> z=std::exp(1i*pi); // Euler's formula
std::cout << "exp(i, pi) = " << z << '\n';
}

C++11/14 for scientific computing I

In 2011 a new standard for the C++ programming language has been published, which is commonly referred to as C++11. This new standard introduces many new language features and standard library functions. Many of these new features have been introduced to make C++ more flexible and to make C++ more easy to use.  Bjarne Stroustrup (the creator of C++) even thinks «C++11 feels like a new language: The pieces just fit together better than they used to and I find a higher-level style of programming more natural than before and as efficient as ever.» This year another new version of C++ (C++14) has been defined, which brings some minor enhancements and clarifications compared to C++11. As C++ is a general purpose programing language the new features of the revised versions are not specifically designed for the needs of scientific computing. Nevertheless C++11/14 adds several new tools to the computational scientist’s toolbox. In the following I will present some of them.

New integer types

C++ does not define which are the minimal and maximal values that the integer types int and long can hold. The language standard requires only lower/upper bounds on these values. As a consequence int may be a 16-bit, 32-bit or a 64-bit integer or even something different. In C++11/14 the standard header file cstdint defines several new integer types with specific bit-width.

 int8_t int16_t int32_t int64_t signed integer type with width of exactly 8, 16, 32 and 64 bits respectively with no padding bits and using 2’s complement for negative values (provided only if the implementation directly supports the type) int_fast8_t int_fast16_t int_fast32_t int_fast64_t fastest signed integer type with width of at least 8, 16, 32 and 64 bits respectively int_least8_t int_least16_t int_least32_t int_least64_t smallest signed integer type with width of at least 8, 16, 32 and 64 bits respectively intmax_t maximum width integer type intptr_t integer type capable of holding a pointer uint8_t uint16_t uint32_t uint64_t unsigned integer type with width of exactly 8, 16, 32 and 64 bits respectively (provided only if the implementation directly supports the type) uint_fast8_t uint_fast16_t uint_fast32_t uint_fast64_t fastest unsigned integer type with width of at least 8, 16, 32 and 64 bits respectively uint_least8_t uint_least16_t uint_least32_t uint_least64_t smallest unsigned integer type with width of at least 8, 16, 32 and 64 bits respectively uintmax_t maximum width unsigned integer type uintptr_t unsigned integer type capable of holding a pointer

The header file cstdint defines also several macros for the maximal and minimal values of the integer types shown above. These values, however, are more conveniently accessed through the template class std::numeric_limits.

Numeric limits

The class template std::numeric_limits, which is defined in the header file limits provides a standardized way to query various properties of arithmetic types (e.g., the largest possible value for type int is std::numeric_limits<int>::max). This information is provided via specializations of the numeric_limits template. Since C++1 the members of std::numeric_limits are declared as static constexpr. Thus, their return values can be consumed by operations that require constant expressions, such as an integer template argument. Furthermore, the new members max_digits10 and lowest have been introduced in C++11, which give the number of decimal digits necessary to differentiate all values of this type and the lowest finite value of the given type.

Integer overflow

YouTube Error: An integer overflow occurred.

On YouTube a video has been watched more than 2,147,483,647 times, a number that can not been represented by a (signed) 32-bit integer.

Calculating the Hermite functions

The Hermite functions appear as the solutions of the quantum mechanical harmonic oscillator. But they have applications in many other fields and applications, e.g., pseudospectral methods. The Hermite functions $h_n(x)$ are defined as

\label{eq:h}
h_n(x) = \frac{1}{\sqrt{\sqrt{\pi} 2^n n!}} \mathrm{e}^{-x^2/2} H_n(x) \,,
where $H_n(x)$ denotes the $n$th Hermite polynomial defined via the recurrence relation

H_{n}(x) = 2xH_{n-1}(x)-2(n-1)H_{n-2}(x)
with the initial values $H_0(x)=1$ and $H_1(x)=2x$. Although the Hermite functions are quite well behaved, they are rather difficult to calculate especially for large $n$ and/or large $x$.

Calculating the Hermite functions via the definition \eqref{eq:h} fails easily due to numerical overflow or underflow that is caused by the rapid growth or decrease of the individual factors $\frac{1}{\sqrt{\sqrt{\pi} 2^n n!}} \mathrm{e}^{-x^2/2}$ and $H_n(x)$. A partial solution to this problem is to introduce the modified Hermite polynomials defined via the recurrence relation

\label{eq:h2}
\tilde H_{n}(x) = \sqrt{\frac{2}{n}}x\tilde H_{n-1}(x)-2\sqrt{\frac{n-1}{n}}\tilde H_{n-2}(x)
with the initial values $\tilde H_0(x)=1/\sqrt[4]{\pi}$ and $\tilde H_1(x)=\sqrt{2}\,x/\sqrt[4]{\pi}$. With these polynomials the Hermite functions become

h_n(x) = \mathrm{e}^{-x^2/2} \tilde H_n(x) \,.
Because the modified Hermite polynomials $\tilde H_{n}(x)$ grow much more slowly than the standard Hermite polynomials we got rid of the normalizing factor $\frac{1}{\sqrt{\sqrt{\pi} 2^n n!}}$.

But this does not solve all problems of underflow and overflow. For $x>38$ the factor $\mathrm{e}^{-x^2/2}$ is smaller than any number that can be represented as a double precision floating point number. A very robust remedy to numerical underflow was sketched in BIT Numerical Mathematics, Vol. 49, pp 281-295. The basic idea is to keep the magnitude of the modified Hermite polynomials on the order of one during their calculation via the recurrence relation \eqref{eq:h2} by introducing a suitable normalizing factor. During the calculation one has to keep track of the sum of the logarithms of these normalizing factors. Then the factor $\mathrm{e}^{-x^2/2}$ has to be modified by this sum when the value of the Hermite function is calculated. A Python implementation of this algorithm is shown below. This algorithm allows to calculate the values of high-order Hermite functions even for quite large arguments.

 Source code
from numpy import *
from pylab import *

def h(n, x):
if n==0:
return ones_like(x)*pi**(-0.25)*exp(-x**2/2)
if n==1:
return sqrt(2.)*x*norm*pi**(-0.25)*exp(-x**2/2)
h_i_2=ones_like(x)*pi**(-0.25)
h_i_1=sqrt(2.)*x*pi**(-0.25)
sum_log_scale=zeros_like(x)
for i in range(2, n+1):
h_i=sqrt(2./i)*x*h_i_1-sqrt((i-1.)/i)*h_i_2
h_i_2, h_i_1=h_i_1, h_i
log_scale=log(abs(h_i)).round()
scale=exp(-log_scale)
h_i=h_i*scale
h_i_1=h_i_1*scale
h_i_2=h_i_2*scale
sum_log_scale+=log_scale
return h_i*exp(-x**2/2+sum_log_scale)

The Hermite function $h_{800}(x)$. A direct calculation via the definition \eqref{eq:h2} of the Hermite function in terms of the modified Hermite polynomials $\tilde H_n(x)$ fails for $x>39$ due to numerical underflow as shown in the upper part. The shown Python code, which avoids underflow, has been utilized to produce the lower part.

I would like to thank Randolf Beerwerth for drawing my attention to this problem.

The Lanczos algorithm

Finding the eigenvalues and eigenvectors of large hermitian matrices is a key problem of (numerical) quantum mechanics. Often, however, the matrices of interest are much too large to employ exact methods. A popular and powerful approximation method is based on the Lanczos algorithm. The Lanczos algorithm determines an orthonormal basis of the Kyrlov sub-space $\mathcal{K}_k(\Psi, \hat H)$. The Kyrlov sub-space $\mathcal{K}_k(\Psi, \hat H)$ is the linear space that is spanned by the vectors $\Psi$, $\hat H\Psi$, ${\hat H}^2\Psi$, …, ${\hat H}^{k-1}\Psi$ with $k\le \dim\hat H$. Furthermore, the Lanczos algorithm constructs a real symmetric tridiagonal matrix $\hat H’$ with $\dim\hat H’=k$ that is an approximation of $\hat H$ in the sense that the eigenvalues of $\hat H’$ are close to some eigenvalues of $\hat H$. The eigenvectors of $\hat H$ can be approximated via the eigenvectors of $\hat H’$. Thus, the Lanczos algorithm reduces the problem of matrix diagonalization of large hermitian matrices to the diagonalization of a (usually) much smaller real symmetric tridiagonal matrix, which is a much simpler task.

The Lanczos algorithm has been applied to many problems of nonrelativistic quantum mechanics, in particular bound state calculations and time propagation. In a recent work (arXiv:1407.7370) we evaluated the Lanczos algorithm  for solving the time-independent as well as the time-dependent relativistic Dirac equation with arbitrary electromagnetic fields. We demonstrate that the Lanczos algorithm can yield very precise eigenenergies and allows very precise time propagation of relativistic wave packets. The Dirac Hamiltonian’s property of not being bounded does not hinder the applicability of the Lanczos algorithm. As the Lanczos algorithm requires only matrix-vector and inner products, which both can be efficiently parallelized, it is an ideal method for large-scale calculations. The excellent parallelization capabilities are demonstrated by a parallel implementation of the Dirac Lanczos propagator utilizing the Message Passing Interface standard.

The following python code shows how to solve the time-dependent free one-dimensional Dirac equation via the Lanczos algorithm. The Hamiltonian

\hat H = c \begin{pmatrix}
0 & 1 \\ 1 & 0
\end{pmatrix} \left(-\mathrm{i}\frac{\partial}{\partial x} \right) + \begin{pmatrix}
1 & 0 \\ 0 & -1
\end{pmatrix}m_0c^2
is approximated via second order finite differences. A detailed description of the Lanczos algorithm and its application to the Dirac equation is given in arXiv:1407.7370.

 Source code
#!/usr/bin/env python
# -*- coding: utf-8 -*-

# import useful modules
import matplotlib
from math import factorial
from numpy import *
from pylab import *
from numpy.polynomial.hermite import *

# use LaTeX, choose nice some looking fonts and tweak some settings
matplotlib.rc('font', family='serif')
matplotlib.rc('font', size=16)
matplotlib.rc('legend', fontsize=16)
matplotlib.rc('legend', numpoints=1)
matplotlib.rc('legend', handlelength=1.5)
matplotlib.rc('legend', frameon=False)
matplotlib.rc('lines', lw=1.5)
matplotlib.rc('text', usetex=True)
matplotlib.rc('text.latex',
preamble=[r'\usepackage[T1]{fontenc}',
r'\usepackage{amsmath}',
r'\usepackage{txfonts}',
r'\usepackage{textcomp}'])

close('all')
figure(figsize=(6, 4.5))

c=137.0359998  # speed of light in a.u.
N=1024+512

# the 1d Dirac Hamiltonian
def H(Psi, x, t, dt):
Psi=reshape(Psi, (N, 2))
dx=x[1]-x[0]
Psi_new=empty_like(Psi)
Psi_new[1:-1, 0]=-1j*c*(Psi[2:, 1]-Psi[0:-2, 1])/(2*dx) + c**2*Psi[1:-1, 0]
Psi_new[1:-1, 1]=-1j*c*(Psi[2:, 0]-Psi[0:-2, 0])/(2*dx) - c**2*Psi[1:-1, 1]
Psi_new[0, 0]=Psi_new[0, 1]=0
Psi_new[-1, 0]=Psi_new[-1, 1]=0
Psi_new*=dt
return reshape(Psi_new, 2*N)

# the Lanczos algorithm
def Lanczos(Psi, x, t, dt, H, m):
Psi_=Psi.copy()
# run Lanczos algorithm to calculate basis of Krylov space
V_j, V_j_1=[], []
A=zeros((m, m))
norms=zeros(m)
for j in range(0, m):
norms[j]=norm(Psi_)
V_j=Psi_/norms[j]
Psi_=H(V_j, x, t, dt)
if j>0:
A[j-1, j]=A[j, j-1]=vdot(V_j_1, Psi_).real
Psi_-=A[j-1, j]*V_j_1
A[j, j]=vdot(V_j, Psi_).real
Psi_-=A[j, j]*V_j
V_j_1, V_j=V_j, V_j_1
# diagonalize A
l, v=eig(A)
# calculate matrix exponential in Krylov space
c=dot(v, dot(diag(exp(-1j*l)), v[0, :]*norms[0]))
# run Lanczos algorithm 2nd time to transform result into original space
Psi_, Psi=Psi, zeros_like(Psi)
for j in range(0, m):
V_j=Psi_/norms[j]
Psi+=c[j]*V_j
Psi_=H(V_j, x, t, dt)
if j>0:
A[j-1, j]=A[j, j-1]=vdot(V_j_1, Psi_).real
Psi_-=A[j-1, j]*V_j_1
A[j, j]=vdot(V_j, Psi_).real
Psi_-=A[j, j]*V_j
V_j_1, V_j=V_j, V_j_1
return Psi

# define computational grid
x0, x1=-0.5, 0.5
x=linspace(x0, x1, N)
dx=x[1]-x[0]       # size of spatial grid spacing
dt=4./c**2         # temporal step size

# constuct momentum grid
dp=2*pi/(N*dx)
p=(arange(0, N)-0.5*(N-1))*dp
# choose initial condition
p_mu=75.       # mean momentum
sigma_p=50.    # momentum width
x_mu=-0.05     # mean position
# upper and lower components of free particle states,
# see e.g. Thaller »Advanced visual quantum mechanics«
d_p=sqrt(0.5+0.5/sqrt(1+p**2/c**2))
d_m=sqrt(0.5-0.5/sqrt(1+p**2/c**2))
d_m[p<0]*=-1
# initial condition in momentum space,
# gaussian wave packet of positive energy states
rho=(2*pi*sigma_p**2)**(-0.25)*exp(-(p-p_mu)**2/(4*sigma_p**2) - 1j*p*x_mu)
Psi=zeros((N, 2), dtype='complex')
Psi[:, 0]=d_p*rho
Psi[:, 1]=d_m*rho
# transform into real space with correct complex phases
Psi[:, 0]*=exp(1j*x[0]*dp*arange(0, N))
Psi[:, 1]*=exp(1j*x[0]*dp*arange(0, N))
Psi=ifft(Psi, axis=0)
Psi[:, 0]*=dp*N/sqrt(2*pi)*exp(1j*x*p[0])
Psi[:, 1]*=dp*N/sqrt(2*pi)*exp(1j*x*p[0])

# propagate
for k in range(0, 20):
# plot wave function
clf()
plot(x, Psi[:, 0].real**2+Psi[:, 0].imag**2+
Psi[:, 1].real**2+Psi[:, 1].imag**2,
color='#266bbd', label=r'$|\Psi(x)|^2$')
gca().set_xlim(x0, x1)
gca().set_ylim(-1, 16)
xlabel(r'$x$')
legend(loc='upper left')
tight_layout()
draw()
show()
pause(0.05)
Psi=reshape(Lanczos(reshape(Psi, 2*N), x, 0, dt, H, 128), (N, 2))

Evolution of a Dirac wave packet as calculated by the program shown above. The wave packet has Gaussian distribution in momentum space and is initially at $x=-0.2$. Due to the nonlinear relativistic relation between momentum an velocity the wave packet becomes asymmetric in position space.

New TRNG release

A new version of TRNG (Tina’s Random Number Generator Library) has been released. TRNG may be utilized in sequential as well as in parallel Monte Carlo simulations. It does not depend on a specific parallelization technique, e.g., POSIX threads, MPI and others. The new version 4.17 is a bug fix and maintenance release.

Lost in translation

Recently I found myself unexpectedly lost in translation.  As a “native speaker” of several languages of the C family I was facing the problem of calling various functions of a FORTRAN library from my C/C++ program. Although, I had done such kind of mixed-language programming in the past this time something went seriously wrong.  To call a FORTRAN function from C/C++ one has to provide a C/C++ function prototype for each FORTRAN function.  Thus the one-million dollar question was how does, for example, the FORTRAN function declaration

 Source code
DOUBLE COMPLEX FUNCTION ZDOTC(N, ZX, INCX, ZY, INCY)
*     .. Scalar Arguments ..
INTEGER INCX, INCY, N
*     .. Array Arguments ..
DOUBLE COMPLEX ZX(*), ZY(*)
translate to a C/C++ function prototype?  (This BLAS function calculates the scalar product of two complex vectors.)

Though there is no standard that defines inter-language operability between C/C++ and FORTRAN the answer is: It depends.  It depends on the employed FORTRAN and C/C++ compilers and possibly on the operating system.

First, one has to figure out which C/C++ data types are binary compatible to the various FORTRAN data types.  As mentioned before this is a compiler-dependent issue.  Typically the following mapping applies

• INTEGER -> signed int
• REAL -> float
• DOUBLE PRECISION -> double
• COMPLEX -> struct { float z[2]; } or float complex (with C99 and header file complex.h) or std::complex<float> (in C++ with header file complex)
• DOUBLE COMPLEX -> struct { double z[2]; } or double complex (with C99 and header file complex.h) or std::complex<double> (in C++ with header file complex)
• LOGICAL -> int (Only specific values may be valid representations of  true and false.)

Secondly, the FORTRAN function’s name in C/C++ representation has to be determined.  Again there is no general rule.  Most compilers, however, down-case the FORTRAN function name and add one or two underscores. Furthermore, FORTRAN function arguments are usually passed by pointers. Thus, the C++ function prototype for the ZDOTC FORTRAN function shown above may be

 Source code
extern "C" {
std::complex<double> zdotc_(const int *n,
const std::complex<double> *zx, const int *incx,
const std::complex<double> *zy, const int *incy);
}
The ZDOTC function does not modify all its input parameters (so-called in-parameters).  Therefore, the pointers in the C++ prototype have been declared as const.  For parameters that may be modified by a FORTRAN function (so-called in-out-parameters) the corresponding C/C++ pointers have to be declared as non-const.

With the explanations given, the translation form the FORTRAN function declaration to the C++ function prototype of ZDOTC looks straight forward.  Some FORTRAN compilers, however, do something quite unexpected, which finally let me feel a little bit lost in translation after getting strange run-time errors when calling ZDOTC from C++ as illustrated above.

Some FORTRAN compilers, in particular g77 and ifort, generate code that requires special treatment for complex return values.  These are returned via an extra argument in the calling sequence that points to where to store the return value.  Thus, the C++ function becomes

 Source code
extern "C" {
void zdotc_(std::complex<double> *res, const int *n,
const std::complex<double> *zx, const int *incx,
const std::complex<double> *zy, const int *incy);
}
with possibly a second underscore in the function name.

All these different calling conventions make it hard to write portable C/C++ programs that interact with FORTRAN libraries.  Some FORTRAN compilers even allow to specify on compile-time how to deal with complex return values or to change the byte-width of INTEGER and REAL variables.  This has to be taken into account when specifying the C/C++ function prototypes of FORTRAN functions.