Iterative Preconditioned Solvers on Multi- and Many-core Platforms
The Faculty of Informatics is pleased to announce a seminar given by Dimitar Lukarski
DATE: Wednesday, July 25th 2012
PLACE: University of Lugano, room SI-008, Informatics building (Via G. Buffi 13)
In this talk we consider two types of solvers: out-of-the-box solvers such as preconditioned Krylov subspace methods (e.g. CG, BiCGStab, GMRES), and problem-aware solvers such as geometric, matrix-based multi-grid methods. Most of these solvers can be written in terms of sparse matrix-vector and vector-vector operations, which can be performed in parallel. The focus is on parallel, generic, and portable preconditioners suitable for multi-core and many-core devices. We study additive (e.g. Gauss-Seidel, SOR), multiplicative (ILU factorization with or without fill-ins), and approximate inverse preconditioners. The preconditioners can also serve as smoothing schemes in the multi-grid methods via a preconditioned defect-correction step. We treat the additive splitting schemes with a multi-coloring technique to provide the necessary level of parallelism. To control the fill-in entries of the ILU factorization, we propose a novel method which we call the power(q)-pattern method. This algorithm produces a new matrix structure with diagonal blocks containing only diagonal entries.
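To illustrate the multi-coloring idea mentioned above, here is a minimal greedy-coloring sketch (function names and the example matrix are invented for illustration, not taken from the talk): rows that receive the same color have no nonzero coupling between them, so a Gauss-Seidel or SOR sweep can update all rows of one color simultaneously.

```python
# Hypothetical sketch of greedy multi-coloring for a sparse matrix.
# Rows of the same color carry no mutual dependencies and can therefore
# be updated in parallel within a Gauss-Seidel/SOR sweep.

def multi_color(adjacency):
    """Greedy coloring: adjacency[i] is the set of rows coupled to row i
    via nonzero off-diagonal entries of a symmetric sparse matrix."""
    colors = {}
    for row in range(len(adjacency)):
        neighbor_colors = {colors[j] for j in adjacency[row] if j in colors}
        c = 0
        while c in neighbor_colors:  # smallest color unused by neighbors
            c += 1
        colors[row] = c
    return colors

# Tridiagonal-like coupling on a 1-D chain: row i couples to i-1 and i+1.
adj = {i: {j for j in (i - 1, i + 1) if 0 <= j < 6} for i in range(6)}
colors = multi_color(adj)
# A chain is 2-colorable (the classical red-black ordering).
print(colors)  # {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}
```

For a tridiagonal matrix this reproduces the familiar red-black ordering; for general sparsity patterns the same greedy pass yields a small number of color classes, each of which forms one parallel update step.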
With these techniques we can perform the forward and backward substitutions of the preconditioning step in parallel. By formulating the algorithm in block-matrix form, we can execute the sweeps solely by performing matrix-vector multiplications. Thus, we express the data-parallelism in the sweeps without any reference to the underlying hardware or programming model.
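A minimal sketch of this block formulation, under the assumption stated above that each diagonal block contains only diagonal entries (the helper below is invented for illustration, not the speaker's implementation): solving L x = b color block by color block then needs only elementwise divisions for the diagonal blocks, while all inter-block coupling is a plain matrix-vector product — exactly the kernel every parallel backend already provides.

```python
# Toy block forward substitution L x = b where rows are grouped by color
# so that every diagonal block of L is itself diagonal. The intra-block
# solve is elementwise division; inter-block coupling is a matvec.

def block_forward_substitution(blocks, diag, b):
    """blocks[k][j]: dense off-diagonal block coupling color j < k into
    color k; diag[k]: diagonal entries of block (k, k); b[k]: right-hand
    side slice for color k. Returns x as a list of per-color slices."""
    x = []
    for k in range(len(diag)):
        rhs = list(b[k])
        for j in range(k):                       # rhs -= L[k][j] @ x[j]
            for r, row in enumerate(blocks[k][j]):
                rhs[r] -= sum(a * v for a, v in zip(row, x[j]))
        x.append([r / d for r, d in zip(rhs, diag[k])])  # diagonal solve
    return x

# Two colors, two unknowns each:
# L = [[2,0,0,0], [0,2,0,0], [1,0,4,0], [0,1,0,4]]
diag = [[2.0, 2.0], [4.0, 4.0]]
blocks = {1: {0: [[1.0, 0.0], [0.0, 1.0]]}}
b = [[2.0, 4.0], [5.0, 6.0]]
x = block_forward_substitution(blocks, diag, b)
print(x)  # [[1.0, 2.0], [1.0, 1.0]]
```

The sequential loop over colors is short (one iteration per color class), while all the work inside each iteration — the matvec and the elementwise division — is data-parallel.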
In object-oriented languages, an abstraction separates an object's behavior from its implementation. Building on this abstraction, we propose a new sparse linear algebra library that supports several platforms, such as multi-core CPUs, GPUs, and accelerators. The various backends (sequential, OpenMP, CUDA, OpenCL) consist of optimized, platform-specific matrix and vector routines. Through unified interfaces across all platforms, the library allows users to build linear solvers and preconditioners without any information about the underlying hardware. With this technique, we can write our solvers and preconditioners in a single source code for all platforms. Furthermore, we can extend the library with new platforms without modifying the existing solvers and preconditioners.
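The separation described above can be sketched as follows (all class and function names here are invented placeholders, not the library's actual API): the solver is written once against an abstract vector interface, and each backend supplies its own platform-specific kernels behind that interface.

```python
# Toy illustration of writing a solver against an abstract interface,
# independent of the backend that implements the kernels.

class Vector:
    """Abstract interface: only the operations the solver needs."""
    def axpy(self, alpha, other): raise NotImplementedError
    def dot(self, other): raise NotImplementedError

class SequentialVector(Vector):
    """One concrete backend; an OpenMP/CUDA/OpenCL backend would
    implement the same two methods with platform-specific kernels."""
    def __init__(self, data):
        self.data = list(data)
    def axpy(self, alpha, other):            # self += alpha * other
        self.data = [s + alpha * o for s, o in zip(self.data, other.data)]
    def dot(self, other):
        return sum(s * o for s, o in zip(self.data, other.data))

def relaxation_step(x, r, omega):
    """One step x += omega * r, written with no knowledge of whether
    x lives on a CPU, a GPU, or an accelerator."""
    x.axpy(omega, r)
    return x

x = SequentialVector([0.0, 0.0])
r = SequentialVector([1.0, 2.0])
relaxation_step(x, r, 0.5)
print(x.data)  # [0.5, 1.0]
```

Adding a new platform then means adding one more subclass; `relaxation_step`, and every solver written in the same style, remains untouched.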
To show the efficiency of these parallel techniques we consider two scenarios: preconditioned Krylov subspace methods and matrix-based multi-grid methods. We demonstrate speed-ups in two directions: first, the preconditioners/smoothers reduce the total solution time by decreasing the number of iterations; second, the preconditioning/smoothing phase is efficiently executed in parallel, providing good scalability across several parallel architectures. We present numerical experiments and performance analyses on several platforms, including multi-core CPU and GPU devices. Furthermore, we show the viability and benefit of the proposed preconditioning schemes and software approach.
Dimitar Lukarski holds a Bachelor's degree from the Technical University of Sofia, Bulgaria, a Master's degree from the Technical University of Karlsruhe, Germany, and a doctoral degree from the Karlsruhe Institute of Technology (KIT), Germany. He is currently a postdoctoral researcher at the Uppsala Programming for Multicore Architectures Research Center (UPMARC) / Department of Information Technology at Uppsala University, Sweden, working on interdisciplinary topics in the area of parallel numerical methods and emerging hardware such as GPUs and multi-core CPUs. His research focuses on robust, fine-grained parallel iterative solvers and preconditioners, with implementations on stream-based platforms such as CUDA.
HOST: Prof. Olaf Schenk