For several decades, developers of high performance computing applications could rely on Moore's law: the number of transistors that can be placed inexpensively on an integrated circuit doubles approximately every two years. This exponential growth translated into ever-increasing computing power of successive CPU generations. Thus, high performance computing applications became faster simply by trusting in Moore's law and waiting for faster CPUs.
However, this has changed dramatically with the introduction of multi-core architectures. »The free lunch is over«, as Herb Sutter wrote in his Dr. Dobb's Journal column. Transistor counts continue to grow exponentially, but today hardware manufacturers use the abundance of transistors to put more cores on a single chip rather than to make individual cores faster. Clock rates and execution optimizations have reached their limits.
In order to benefit from today's multi-core architectures, one has to write multi-threaded programs. In fact, I believe the transition to multi-core architectures forces the most important change in the software development business of this decade, because parallel programming differs greatly from traditional sequential programming. It is hard, error-prone, and requires a different way of algorithmic thinking that the average software developer is not used to.
However, since IBM introduced the world's first (non-embedded) multi-core processor, the POWER4 CPU, in 2001, concurrent programming techniques have started to make their way into mainstream applications and, even more importantly, into standard libraries. Thus, the free lunch might be over, but for programmers who hesitate to employ parallel programming techniques there is at least a free appetizer: parallelized standard libraries. Just to mention a few, there are
- the OpenMP Multi-Threaded Template Library and
- the Parallel Mode of the GNU C++ Standard Library, two parallelized replacements for the C++ Standard Template Library,
- the Intel Math Kernel Library and
- the AMD Core Math Library, both providing parallel BLAS3 and parallel LAPACK routines.
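To give a concrete impression of the latter pair, here is a minimal sketch that delegates a matrix-matrix product to the BLAS3 routine DGEMM through the CBLAS interface. When linked against a threaded BLAS implementation such as the Intel Math Kernel Library or the AMD Core Math Library, the multiplication is parallelized inside the library; header names and link flags differ between vendors, so the include used below is an assumption.

```cpp
#include <vector>
#include <cblas.h>   // generic CBLAS header; vendor-specific headers may differ

int main()
{
    const int n = 1024;                    // square matrices of order n
    std::vector<double> A(n * n, 1.0);     // A and B filled with constants for illustration
    std::vector<double> B(n * n, 2.0);
    std::vector<double> C(n * n, 0.0);     // result: C = 1.0 * A * B + 0.0 * C

    // DGEMM, a BLAS3 routine: a threaded BLAS parallelizes this call internally,
    // without any change to the calling code.
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n,
                1.0, A.data(), n,
                     B.data(), n,
                0.0, C.data(), n);
    return 0;
}
```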
Parallelized standard libraries turn sequential codes into parallel applications that harness the power of modern multi-core CPUs while requiring little or no change to existing source code. This is the good news. The bad news, however, is that this approach only works if one is able to delegate all computationally expensive operations to these libraries, which is not always possible. Optimal performance still requires designing multi-threaded software from scratch. Nevertheless, parallelized standard libraries are a good starting point for multi-core programming.
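As an illustration of how little existing source code has to change, the following sketch sorts a large vector with plain std::sort. Compiled with g++ -fopenmp -D_GLIBCXX_PARALLEL, the Parallel Mode of the GNU C++ Standard Library substitutes a multi-threaded sort; without these flags, the very same source compiles and runs sequentially.

```cpp
#include <algorithm>
#include <random>
#include <vector>

int main()
{
    // Fill a large vector with pseudo-random numbers.
    std::vector<double> v(10000000);
    std::mt19937 gen(42);
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    for (double& x : v)
        x = dist(gen);

    // Unchanged sequential code: compiled with
    //   g++ -O2 -fopenmp -D_GLIBCXX_PARALLEL sort.cpp
    // the GNU Parallel Mode replaces std::sort by a multi-threaded implementation.
    std::sort(v.begin(), v.end());
    return 0;
}
```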