Parallelizing Particle-In-Cell Codes with OpenMP and MPI
Today's supercomputers often consist of clusters of SMP nodes. Both OpenMP and MPI are programming paradigms that can be used to parallelize codes for such architectures. OpenMP uses shared memory and is therefore viewed as a simpler programming paradigm than MPI, which is primarily a distributed-memory paradigm. However, OpenMP applications may not scale beyond one SMP node. On the other hand, if we use only MPI, we might introduce overhead in intra-node communication. In this thesis we explore the trade-offs between using OpenMP, MPI, and a mix of both paradigms for the same application. In particular, we look at a physics simulation and parallelize it with both OpenMP and MPI for large-scale simulations on modern supercomputers. We implement a parallel SOR (successive over-relaxation) solver with OpenMP and MPI and measure the effects of such hybrid code. We also use the FFTW library, which includes both system-optimized serial implementations and a parallel OpenMP FFT implementation. These solvers are used to make our existing Particle-In-Cell codes more scalable and compatible with current programming paradigms and supercomputer architectures. We demonstrate that, compared with equivalent MPI implementations, the communication overhead in OpenMP loops on an SMP node is significant and increases with the number of CPUs participating in the execution of the loop. To analyze this result, we also present a simple model for estimating the communication overhead in OpenMP loops. Our results are surprising and should be of great interest for a large class of parallel applications.
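
As an illustration of the kind of solver referred to above, the following is a minimal serial sketch of SOR with red-black ordering for the 2D Poisson equation on a uniform grid with zero Dirichlet boundaries. The abstract does not state which ordering, equation, or boundary conditions the thesis solver uses, so the red-black ordering, the function name sor_red_black, and all parameters here are assumptions made for illustration only. The point of the ordering is that all points of one color depend only on points of the other color, so each color sweep is a loop with independent iterations, which is the sort of loop that could be shared among OpenMP threads or split across MPI ranks with a boundary exchange between sweeps.

```python
def sor_red_black(f, h, omega=1.5, tol=1e-8, max_iter=10_000):
    """Solve the discrete Poisson equation laplace(u) = f with u = 0 on the
    boundary, using SOR with red-black ordering (illustrative sketch only).

    f      : 2D list of right-hand-side values on the grid
    h      : grid spacing (same in both directions)
    omega  : relaxation factor, 0 < omega < 2
    Returns (u, iterations_used).
    """
    n, m = len(f), len(f[0])
    u = [[0.0] * m for _ in range(n)]

    for it in range(1, max_iter + 1):
        max_change = 0.0
        # Sweep one color at a time. Within a color, every update reads
        # only points of the other color, so the updates are independent.
        for color in (0, 1):
            for i in range(1, n - 1):
                # First interior column j with (i + j) % 2 == color.
                j_start = 1 + ((i + 1 + color) % 2)
                for j in range(j_start, m - 1, 2):
                    # Gauss-Seidel value from the 5-point stencil.
                    gs = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                 + u[i][j - 1] + u[i][j + 1]
                                 - h * h * f[i][j])
                    new = (1.0 - omega) * u[i][j] + omega * gs
                    max_change = max(max_change, abs(new - u[i][j]))
                    u[i][j] = new
        # Stop when the largest update in a full sweep is below tol.
        if max_change < tol:
            return u, it
    return u, max_iter
```

In a shared-memory version the loop over i inside each color sweep would be the natural candidate for work sharing, with a synchronization point between the two colors; in a distributed version each process would own a block of rows and exchange its edge rows with its neighbors after each color sweep.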