Heterogeneous Parallelization of the Cholesky Factorization
The graph below shows the performance of the Cholesky factorization on a multicore host with multiple GPU accelerators, using the FLAME/C API and the SuperMatrix runtime system.
Details
- MKL 10.0 spotrf on two Intel Xeon QuadCore (2.2 GHz)
Dual Intel Xeon QuadCore E5405 @ 2.0 GHz (8 cores total).
- Algorithm-by-blocks on Tesla S870
Dual Intel Xeon QuadCore E5405 @ 2.0 GHz (8 cores total).
Connected via two PCI Express x16 links to an NVIDIA Tesla S870 (4 GPUs).
- Algorithm-by-blocks on Tesla S1070
Dual Intel Xeon QuadCore E5440 @ 2.83 GHz (8 cores total).
Connected via two PCI Express x16 links to an NVIDIA Tesla S1070 (4 GPUs).
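The "algorithm-by-blocks" entries above refer to a formulation in which the matrix is viewed as a grid of square blocks and the factorization is expressed as a sequence of operations on individual blocks, which a runtime system can then dispatch to the available cores or GPUs. The following is a minimal sequential sketch of that idea in plain Python, not the FLAME/C or SuperMatrix code. The function names (potrf, trsm, update, to_blocks, cholesky_by_blocks) and the returned task list are assumptions made for illustration, and the matrix dimension is assumed to be a multiple of the block size.

```python
import math

def potrf(a):
    """Unblocked Cholesky of one diagonal block, in place (lower triangular)."""
    n = len(a)
    for j in range(n):
        a[j][j] = math.sqrt(a[j][j] - sum(a[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            a[i][j] = (a[i][j] - sum(a[i][k] * a[j][k] for k in range(j))) / a[j][j]
        for i in range(j):
            a[i][j] = 0.0  # clear the strictly upper part of the block

def trsm(l, b):
    """Overwrite b with the solution X of X * l^T = b, with l lower triangular."""
    for row in b:
        for c in range(len(l)):
            row[c] = (row[c] - sum(row[k] * l[c][k] for k in range(c))) / l[c][c]

def update(c, a, b):
    """c -= a * b^T (a symmetric rank-k update when a is b, a general product otherwise)."""
    for p in range(len(c)):
        for q in range(len(c[p])):
            c[p][q] -= sum(a[p][k] * b[q][k] for k in range(len(a[p])))

def to_blocks(m, bs):
    """Split a square matrix (list of lists) into a grid of bs x bs blocks."""
    nb = len(m) // bs
    return [[[row[j * bs:(j + 1) * bs] for row in m[i * bs:(i + 1) * bs]]
             for j in range(nb)] for i in range(nb)]

def cholesky_by_blocks(blocks):
    """Factor the lower block triangle in place; return the tasks in issue order.

    Blocks above the diagonal are never read or written.
    """
    nb = len(blocks)
    tasks = []
    for k in range(nb):
        potrf(blocks[k][k])
        tasks.append(("POTRF", k, k))
        for i in range(k + 1, nb):
            trsm(blocks[k][k], blocks[i][k])
            tasks.append(("TRSM", i, k))
        for i in range(k + 1, nb):
            for j in range(k + 1, i + 1):
                update(blocks[i][j], blocks[i][k], blocks[j][k])
                tasks.append(("SYRK" if i == j else "GEMM", i, j))
    return tasks
```

For a matrix split into a 2 x 2 grid of blocks, the sketch issues the tasks POTRF, TRSM, SYRK, POTRF in that order. Here they run one after another; in a runtime-driven setting, tasks whose input blocks are already final (for example, all TRSM tasks within one iteration) are independent and could execute concurrently on different devices.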
More Information
Read
- Gregorio Quintana-Ortí, Francisco D. Igual, Enrique S. Quintana-Ortí, Robert van de Geijn. "Solving Dense Linear Algebra Problems on Platforms with Multiple Hardware Accelerators." Proceedings of the 2009 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Raleigh, North Carolina, February 2009.
Available from http://www.cs.utexas.edu/users/flame/publications/
Acknowledgments
We gratefully acknowledge support from NVIDIA for this work.
