FLAME Performance on various systems
Libraries timed
- LAPACK: LAPACK 3.0 from netlib
- GotoBLAS: GotoBLAS 1.09
- FLAME: FLAME 1.0
- ACML: ACML 3.6.0, AMD's scientific library
- MKL: MKL 9.0, Intel's scientific library
Notes:
- We also timed LAPACK 3.1 and saw no performance improvement over LAPACK 3.0.
- Kazushige Goto attends our weekly meetings and continuously tunes his library to incorporate ideas and insights that come up during our group's discussions.
The order in which libraries are linked matters. In particular, the GotoBLAS library includes some LAPACK-level routines, such as getrf. Thus, if one links the above libraries in the order FLAME+GotoBLAS+LAPACK, any LAPACK routines that GotoBLAS also provides are taken from GotoBLAS, and the remaining routines are linked from LAPACK.
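The effect of link order can be illustrated with a toy model. The sketch below is not a real linker; it only assumes that each routine is taken from the first library in the list that provides it. The library contents are small illustrative subsets: only the presence of getrf in GotoBLAS is stated above, `flame_routine` is a hypothetical placeholder name, and the absence of dgeqrf from GotoBLAS is an assumption made for the example.

```python
def resolve(symbols, libraries):
    """Map each symbol to the first library (in link order) that provides it.

    `libraries` is an ordered list of (library_name, set_of_symbols) pairs.
    Symbols that no library provides map to None.
    """
    resolved = {}
    for sym in symbols:
        resolved[sym] = None
        for lib_name, provided in libraries:
            if sym in provided:
                resolved[sym] = lib_name
                break
    return resolved


# Illustrative subsets only; these are assumptions, not real library inventories.
FLAME = ("FLAME", {"flame_routine"})        # hypothetical placeholder symbol
GOTOBLAS = ("GotoBLAS", {"dgemm", "dgetrf"})
LAPACK = ("LAPACK", {"dgetrf", "dgeqrf"})

needed = ["dgemm", "dgetrf", "dgeqrf"]

# Order FLAME+GotoBLAS+LAPACK: dgetrf comes from GotoBLAS.
goto_first = resolve(needed, [FLAME, GOTOBLAS, LAPACK])

# Order FLAME+LAPACK+GotoBLAS: dgetrf comes from LAPACK instead.
lapack_first = resolve(needed, [FLAME, LAPACK, GOTOBLAS])

print(goto_first)
print(lapack_first)
```

In this model, swapping the last two libraries changes which getrf implementation is timed, which is why the link order has to be fixed before comparing results.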
HP 4 CPU Itanium2 server
- The system has 4 Itanium2 processors.
- Each processor is a 1.5GHz Itanium2 processor.
- Each processor can perform up to four floating-point operations per cycle. Thus, peak floating-point performance per CPU is 6 GFLOPS (24 GFLOPS total).
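The peak figures above follow from multiplying the clock rate by the number of floating-point operations per cycle and by the number of processors. A minimal sketch of that arithmetic, where the function name `peak_gflops` is our own choice for illustration:

```python
def peak_gflops(clock_ghz, flops_per_cycle, units=1):
    """Theoretical peak in GFLOPS: clock rate x operations per cycle x units."""
    return clock_ghz * flops_per_cycle * units


per_cpu = peak_gflops(1.5, 4)           # one 1.5GHz Itanium2 processor
total = peak_gflops(1.5, 4, units=4)    # all four processors

print(f"per CPU: {per_cpu:.1f} GFLOPS, total: {total:.1f} GFLOPS")
# prints "per CPU: 6.0 GFLOPS, total: 24.0 GFLOPS"
```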
AMD Opteron 8 cores
- The system has 4 processor sockets.
- Installed in each socket is a 2.4GHz dual-core Opteron 880 processor.
- Each core can perform up to two floating-point operations per cycle. Thus, peak floating-point performance per core is 4.8 GFLOPS (38.4 GFLOPS total).
- Each core has 1MB of L2 cache.
- The system has 16GB of main memory installed.
HP ProLiant DL580 G4 Rack Server (8 cores)
Note: We are still tuning the environment settings on this system, so these performance numbers are preliminary.
- The system has 4 processor sockets.
- Installed in each socket is a 2.6GHz dual-core Intel Xeon 7110M processor.
- Each core can perform up to two floating-point operations per cycle. Thus, peak floating-point performance per core is 5.2 GFLOPS (41.6 GFLOPS total).
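For comparison, the theoretical peaks of the three systems on this page can be tabulated side by side. The sketch below uses only the clock rates, operations per cycle, and core counts listed above; the short system labels and the function name `peak_table` are our own.

```python
# (label, clock in GHz, floating-point operations per cycle, CPUs or cores)
SYSTEMS = [
    ("Itanium2 (4 CPUs)", 1.5, 4, 4),
    ("Opteron 880 (8 cores)", 2.4, 2, 8),
    ("Xeon 7110M (8 cores)", 2.6, 2, 8),
]


def peak_table(systems):
    """Return (label, per-unit peak, total peak) in GFLOPS for each system."""
    rows = []
    for label, clock_ghz, flops_per_cycle, units in systems:
        per_unit = clock_ghz * flops_per_cycle
        rows.append((label, per_unit, per_unit * units))
    return rows


rows = peak_table(SYSTEMS)
print(f"{'System':<24}{'Per unit':>10}{'Total':>8}")
for label, per_unit, total_peak in rows:
    print(f"{label:<24}{per_unit:>10.1f}{total_peak:>8.1f}")
```

These are upper bounds, not measurements; the per-unit column is per CPU for the Itanium2 system and per core for the other two.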
