libflame download page
Contents
- Source code
- Reference Guide
- What is libflame?
- What's provided by libflame?
- What's new in libflame?
- What's in the latest snapshot?
- Status of operation support
- LAPACK compatibility in libflame
- System and software requirements
- Building and installing libflame
- Linking your LAPACK-dependent application to libflame
- Examples
- Beyond LAPACK
- Give us feedback!
Source code
libflame is provided as free software, licensed under the GNU Lesser General Public License (LGPL) in two forms:
Nightly snapshots. We provide nightly snapshots of the libflame source tree, identified by their Subversion revision numbers. We strongly encourage interested users to download the latest nightly snapshot instead of the most recent milestone release. Snapshots provide the latest functionality and bug fixes, but may occasionally contain new, short-lived bugs that the most recent stable release does not have, because a snapshot can capture recently introduced bugs or other breakage before a developer has identified and corrected the problem. Nevertheless, we make every effort to keep interim revisions functional. If you think you've found a bug, please send us feedback!
Previous milestone releases. The most recent milestone release of libflame is version 5.0. This and other milestone releases may be found here. Note: Even the latest milestone release lacks some of our most recent bug fixes and will be SIGNIFICANTLY MORE BUG-PRONE than the nightly snapshots. Please use a nightly snapshot!
Reference Guide
We strongly encourage our users to refer to the latest copy of the libflame user's guide for installation instructions and API reference.
What is libflame?
FLAME is a methodology for developing dense linear algebra libraries that is radically different from the LINPACK/LAPACK approach, which dates back to the 1970s. By libflame we denote the library that has resulted from this project. For additional information, visit the FLAME home page.
What's provided by libflame?
The following libflame features benefit both basic and advanced users, as well as library developers:
A solution based on fundamental computer science. The FLAME project advocates a new approach to developing linear algebra libraries. Algorithms are obtained systematically through formal derivation, a process grounded in fundamental theorems of computer science that guarantees the resulting algorithms are correct. In addition, the FLAME methodology uses a new, more stylized notation for expressing loop-based linear algebra algorithms. This notation closely resembles how algorithms are naturally illustrated with pictures. (See Figure 1 and Figure 2 (left).)
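For a concrete illustration of what formal derivation looks like (a standard textbook example chosen here, not reproduced from Figures 1 or 2), consider the Cholesky factorization A = L L^T of a real symmetric positive definite matrix. Partitioning A and L conformally and equating the blocks of A = L L^T yields the updates of an unblocked algorithm directly; \star marks the upper triangle, which is never referenced:

```latex
\begin{aligned}
&A = \begin{pmatrix} \alpha_{11} & \star \\ a_{21} & A_{22} \end{pmatrix},
\qquad
L = \begin{pmatrix} \lambda_{11} & 0 \\ l_{21} & L_{22} \end{pmatrix},
\qquad
A = L L^{T} \\[4pt]
&\Longrightarrow\;
\begin{pmatrix} \alpha_{11} & \star \\ a_{21} & A_{22} \end{pmatrix}
=
\begin{pmatrix} \lambda_{11}^{2} & \star \\ \lambda_{11}\, l_{21} & l_{21} l_{21}^{T} + L_{22} L_{22}^{T} \end{pmatrix} \\[4pt]
&\Longrightarrow\;
\lambda_{11} = \sqrt{\alpha_{11}}, \qquad
l_{21} = a_{21} / \lambda_{11}, \qquad
L_{22} L_{22}^{T} = A_{22} - l_{21} l_{21}^{T}
\end{aligned}
```

The last equation states that L22 is the Cholesky factor of the updated trailing matrix A22 - l21 l21^T, so repeating the same step on that matrix gives a loop whose correctness follows from the derivation itself. The Hermitian case is identical with ^T replaced by ^H.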
Object-based abstractions and API. The BLAS, LAPACK, and ScaLAPACK projects treat backward compatibility as a high priority, which hinders the adoption of modern software engineering principles such as object abstraction. libflame is built around opaque structures that hide implementation details of matrices, such as leading dimensions, and exports object-based programming interfaces to operate upon these structures. Likewise, FLAME algorithms are expressed (and coded) in terms of smaller operations on sub-partitions of the matrix operands. This abstraction facilitates programming without array or loop indices, which lets the user avoid painful index-related programming errors altogether. Figure 2 compares the coding styles of libflame and LAPACK, highlighting the elegance of FLAME code and its close resemblance to the corresponding FLAME algorithm shown in Figure 1. This similarity is intentional: it preserves the clarity of the original algorithm as it would be illustrated on a whiteboard or in a publication.
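The following is a minimal sketch, in plain C, of the idea behind index-free programming; it does not use libflame. The View type and the functions view_create, view_sub, dot_row_col, and trsv_lower are hypothetical names invented for this illustration, and libflame's own opaque objects are considerably richer. A view records where a submatrix begins and the leading dimension of the underlying column-major array, so the body of this lower-triangular solve L x = b manipulates only named pieces of the partitioned operands (l10^T, lambda11, x0, beta1). The single integer it tracks is k, the current size of the already-solved top part.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lightweight matrix view (NOT libflame's FLA_Obj): it records
 * where a submatrix starts, its size, and the leading dimension of the
 * underlying column-major array, so algorithm code never touches ld. */
typedef struct {
    double *buf; /* address of element (0,0) of the view */
    int m, n;    /* dimensions of the view */
    int ld;      /* leading dimension of the underlying storage */
} View;

static View view_create(double *buf, int m, int n, int ld) {
    View v = { buf, m, n, ld };
    return v;
}

/* The m x n submatrix of A whose top-left element is A(i, j). */
static View view_sub(View A, int i, int j, int m, int n) {
    View v = { A.buf + i + (size_t)j * A.ld, m, n, A.ld };
    return v;
}

/* Dot product of a 1 x k row view r with a k x 1 column view c. */
static double dot_row_col(View r, View c) {
    double s = 0.0;
    for (int p = 0; p < c.m; p++)
        s += r.buf[(size_t)p * r.ld] * c.buf[p];
    return s;
}

/* Solve L x = b for lower-triangular L; b is overwritten with x.
 * At the start of each iteration the operands are viewed as
 *   L = [ L_TL  0    ]    b = [ x_T ]
 *       [ L_BL  L_BR ]        [ b_B ]
 * where L_TL is k x k and x_T already holds the solved part. */
static void trsv_lower(View L, View b) {
    assert(L.m == L.n && b.m == L.m && b.n == 1);
    for (int k = 0; k < L.m; k++) {
        /* Repartition: expose l10^T and lambda11 from L, x0 and beta1 from b. */
        View l10t     = view_sub(L, k, 0, 1, k);
        View lambda11 = view_sub(L, k, k, 1, 1);
        View x0       = view_sub(b, 0, 0, k, 1);
        View beta1    = view_sub(b, k, 0, 1, 1);

        /* beta1 := ( beta1 - l10^T x0 ) / lambda11 */
        *beta1.buf = (*beta1.buf - dot_row_col(l10t, x0)) / *lambda11.buf;
    }
}
```

For example, with L stored column-major as {2, 1, 4, 0, 3, -1, 0, 0, 5} (ld = 3) and b = {2, 7, 17}, trsv_lower overwrites b with the solution {1, 2, 3}.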
Educational value. Aside from the potential to introduce students to formal algorithm derivation, FLAME serves as an excellent vehicle for teaching linear algebra algorithms in a classroom setting. The clean abstractions afforded by the API also make FLAME well suited for teaching high-performance linear algebra courses at the undergraduate and graduate levels. Robert van de Geijn routinely uses FLAME in his linear algebra and numerical analysis courses. Some colleagues of the FLAME project, including Timothy Mattson of Intel Corporation, have even begun to use the notation to teach classes elsewhere around the country. Historically, the BLAS/LAPACK style of coding has been used in these settings. However, coding in this manner tends to obscure the algorithms; students often get bogged down debugging the frustrating errors that result from indexing directly into the arrays that represent the matrices. (See Figure 2.)
A complete dense linear algebra framework. Like LAPACK, libflame provides ready-made implementations of common linear algebra operations. The implementations found in libflame mirror many of those found in the BLAS and LAPACK packages. However, unlike LAPACK, libflame provides a framework for building complete custom linear algebra codes. We believe such an environment is more useful because it allows users to quickly prototype a linear algebra solution that fits the needs of their application. We are currently writing a complete user's guide for libflame. In the meantime, users may browse the full list of routines available in libflame through our online doxygen documentation.
High performance. In our publications and performance graphs, we do our best to dispel the myth that user- and programmer-friendly linear algebra codes cannot yield high performance. Our FLAME implementations of operations such as Cholesky factorization and triangular inversion often outperform the corresponding implementations available in the LAPACK library. Figure 3 shows an example of the performance increase possible by using libflame instead of LAPACK. In many cases, libflame's performance advantage results from the fact that LAPACK provides only one variant (algorithm) of every operation, while libflame provides all known variants. This allows the user and/or library developer to choose the algorithmic variant that is most appropriate for a given situation. libflame relies only on the presence of a core set of highly optimized unblocked routines to perform the small sub-problems found in FLAME algorithm codes.
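To make "algorithmic variant" concrete, here is a hedged sketch in plain C (not libflame code; the function names are hypothetical) of two loop orderings for unblocked LU factorization without pivoting. Both overwrite a column-major n x n matrix with the unit lower-triangular factor L (strictly below the diagonal) and the upper-triangular factor U, and they compute the same values; they differ only in when each entry is updated and therefore in how they traverse memory. Because there is no pivoting, the sketch assumes every pivot is nonzero (for example, a diagonally dominant matrix). Which ordering runs faster depends on the machine and the problem size, which is why having a choice matters.

```c
#include <assert.h>
#include <stddef.h>

/* Element (i, j) of the column-major array a with leading dimension lda. */
#define A_(i, j) a[(i) + (size_t)(j) * lda]

/* Right-looking ordering: at step k, scale column k below the diagonal to
 * form l21, then immediately update the trailing submatrix with a rank-1
 * update (A22 := A22 - l21 * u12^T). */
static void lu_nopiv_right_looking(int n, double *a, int lda) {
    assert(n >= 0 && lda >= n);
    for (int k = 0; k < n; k++) {
        for (int i = k + 1; i < n; i++)
            A_(i, k) /= A_(k, k);
        for (int j = k + 1; j < n; j++)
            for (int i = k + 1; i < n; i++)
                A_(i, j) -= A_(i, k) * A_(k, j);
    }
}

/* Dot-product ordering: at step k, finish row k of U and then column k of L,
 * each from previously computed rows of U and columns of L. Entries in the
 * trailing submatrix are not modified until their own step. */
static void lu_nopiv_dot(int n, double *a, int lda) {
    assert(n >= 0 && lda >= n);
    for (int k = 0; k < n; k++) {
        for (int j = k; j < n; j++)          /* u(k, j) */
            for (int p = 0; p < k; p++)
                A_(k, j) -= A_(k, p) * A_(p, j);
        for (int i = k + 1; i < n; i++) {    /* l(i, k) */
            for (int p = 0; p < k; p++)
                A_(i, k) -= A_(i, p) * A_(p, k);
            A_(i, k) /= A_(k, k);
        }
    }
}
```

For A = [2 1 1; 4 5 4; -2 8 9] stored column-major, both functions produce L = [1 0 0; 2 1 0; -1 3 1] and U = [2 1 1; 0 3 2; 0 0 4].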
Dependency-aware multithreaded parallelism. Until recently, the authors of the BLAS and LAPACK advocated obtaining shared-memory parallelism from LAPACK routines by simply linking to a multithreaded BLAS. This low-level solution requires no changes to LAPACK code, but it is sharply limited in efficiency and scalability for small- and medium-sized matrix problems. The fundamental obstacle to introducing parallelism directly within many algorithms is the web of data dependencies that inevitably exists between sub-problems. The libflame project has developed a runtime system, SuperMatrix, that detects and analyzes dependencies found within FLAME algorithms-by-blocks (algorithms whose sub-problems operate only on block operands). Once the dependencies are known, the system schedules sub-operations to independent threads of execution. This system is completely abstracted from the algorithm being parallelized and requires virtually no change to the algorithm code, yet it exposes abundant high-level parallelism. We have observed that this method increases performance for a range of small- and medium-sized problems, as shown in Figure 4. The most recent version of LAPACK does not offer any similar mechanism.
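The kind of dependency analysis described above can be sketched in a few lines of C. This is a simplified illustration under stated assumptions, not SuperMatrix's implementation: the Task type, the block identifiers, and must_wait are hypothetical, each task is assumed to write exactly one block, and dependencies are detected purely by comparing the blocks each task reads and writes. The task list is a Cholesky factorization by blocks of a 3 x 3 blocked lower-triangular matrix, in sequential program order.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_READS 3

/* Hypothetical task record for an algorithm-by-blocks: the blocks a task
 * reads (including the block it updates in place) and the one block it writes. */
typedef struct {
    const char *name;
    int reads[MAX_READS];
    int nreads;
    int writes;
} Task;

/* Block identifiers for the lower triangle of a 3 x 3 blocked matrix. */
enum { A00, A10, A20, A11, A21, A22 };

/* Cholesky-by-blocks on that matrix, in program (sequential) order. */
static const Task chol3x3[] = {
    { "Chol(A00)",           { A00 },           1, A00 },
    { "Trsm(A00, A10)",      { A00, A10 },      2, A10 },
    { "Trsm(A00, A20)",      { A00, A20 },      2, A20 },
    { "Syrk(A10, A11)",      { A10, A11 },      2, A11 },
    { "Gemm(A20, A10, A21)", { A20, A10, A21 }, 3, A21 },
    { "Syrk(A20, A22)",      { A20, A22 },      2, A22 },
    { "Chol(A11)",           { A11 },           1, A11 },
    { "Trsm(A11, A21)",      { A11, A21 },      2, A21 },
    { "Syrk(A21, A22)",      { A21, A22 },      2, A22 },
    { "Chol(A22)",           { A22 },           1, A22 },
};

static bool task_reads(const Task *t, int blk) {
    assert(t->nreads <= MAX_READS);
    for (int i = 0; i < t->nreads; i++)
        if (t->reads[i] == blk)
            return true;
    return false;
}

/* True if `later` (which comes after `earlier` in program order) must wait:
 * it reads a block `earlier` writes (read-after-write), writes a block
 * `earlier` reads (write-after-read), or writes the same block
 * (write-after-write). Tasks with no such conflict may run concurrently. */
static bool must_wait(const Task *earlier, const Task *later) {
    return task_reads(later, earlier->writes)
        || task_reads(earlier, later->writes)
        || earlier->writes == later->writes;
}
```

With this task list, both Trsm tasks must wait for Chol(A00) but not for each other, and Syrk(A10, A11) may run concurrently with Gemm(A20, A10, A21) because both merely read A10. A scheduler built on this test could dispatch any task whose predecessors have all completed.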
Support for hierarchical storage-by-blocks. Storing matrices by blocks, a concept advocated years ago by Fred Gustavson of IBM, often yields performance gains through improved spatial locality. Instead of representing matrices as a single linear array of data with a prescribed leading dimension as legacy libraries require (for column- or row-major order), the storage scheme is encoded into the matrix object. Here, internal elements refer recursively to child objects that represent sub-matrices. Currently, libflame provides a subset of the conventional API that supports hierarchical matrices, allowing users to create and manage such matrix objects as well as convert between storage-by-blocks and conventional "flat" storage schemes.
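The index arithmetic behind a single level of storage-by-blocks can be sketched in plain C. The sketch makes simplifying assumptions that are not in the text above: one level of blocking, a block size b that divides both dimensions, and all blocks packed into one contiguous buffer (libflame's hierarchical objects instead refer to child objects, possibly over several levels). The function names are hypothetical. The layout illustrates the locality benefit: every b x b block occupies b*b consecutive doubles, instead of b columns separated by a leading dimension.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i, j) of an m x n matrix in a one-level
 * storage-by-blocks layout: b x b blocks ordered column-major by block,
 * and each block stored contiguously in column-major order. */
static size_t blocked_offset(int m, int b, int i, int j) {
    size_t mb = (size_t)(m / b);                        /* number of block rows */
    size_t bi = (size_t)(i / b), bj = (size_t)(j / b);  /* which block */
    size_t ii = (size_t)(i % b), jj = (size_t)(j % b);  /* position in block */
    return (bi + bj * mb) * (size_t)b * (size_t)b + ii + jj * (size_t)b;
}

/* Copy a conventional column-major ("flat") matrix into blocked storage. */
static void flat_to_blocked(int m, int n, int b, const double *flat, int ld,
                            double *blocked) {
    assert(b > 0 && m % b == 0 && n % b == 0 && ld >= m);
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++)
            blocked[blocked_offset(m, b, i, j)] = flat[i + (size_t)j * ld];
}

/* Inverse of flat_to_blocked: copy blocked storage back to flat storage. */
static void blocked_to_flat(int m, int n, int b, const double *blocked,
                            double *flat, int ld) {
    assert(b > 0 && m % b == 0 && n % b == 0 && ld >= m);
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++)
            flat[i + (size_t)j * ld] = blocked[blocked_offset(m, b, i, j)];
}
```

For a 4 x 4 matrix whose column-major buffer holds 0, 1, ..., 15 and b = 2, flat_to_blocked produces {0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15}, and blocked_to_flat restores the original buffer.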
Advanced build system. From its early revisions, libflame distributions have been bundled with a robust build system, featuring automatic makefile creation and a configuration script conforming to GNU standards (allowing the user to run the ./configure; make; make install sequence common to many open source software projects). Without any user input, the configure script searches for and chooses compilers based on a pre-defined preference order for each architecture. The user may request specific compilers via the configure interface, or enable other non-default features of libflame such as custom memory alignment, multithreading (via POSIX threads or OpenMP), compiler options (debugging symbols, warnings, optimizations), and memory leak detection. The reference BLAS and LAPACK libraries provide no configuration support and require the user to manually modify a makefile with appropriate references to compilers and compiler options depending on the host architecture.
Windows support. While libflame was originally developed for GNU/Linux and UNIX environments, over the course of its development we have had the opportunity to port the library to Microsoft Windows. The Windows port features a separate build system implemented with Python and nmake, the Microsoft analogue of the make utility found in UNIX-like environments. As of this writing, the port is still very new and should therefore be considered experimental. However, we feel libflame for Windows is close to usable for many in our audience, particularly those who consider themselves experts. We invite interested users to try the software and, of course, we welcome feedback to help improve our Windows support and libflame in general.
Independence from Fortran and LAPACK. The libflame development team is pleased to offer a high-performance linear algebra solution that is 100% Fortran-free. libflame is a C-only implementation and does not depend on any external Fortran libraries, such as LAPACK. That said, we happily provide an optional backward compatibility layer, lapack2flame, that maps legacy LAPACK routine invocations to their corresponding native C implementations in libflame. This allows legacy applications to start taking advantage of libflame with virtually no changes to their source code. Furthermore, we understand that some users wish to leverage highly-optimized implementations that conform to the LAPACK interface, such as Intel's Math Kernel Library (MKL). As such, we allow those users to configure libflame such that their external LAPACK implementation is called for the small, performance-sensitive unblocked subproblems that arise within libflame's blocked algorithms and algorithms-by-blocks.
What's new in libflame?
We've added lots of functionality since libflame 4.0 was released on February 13, 2010. Here is a basic summary of what's new in libflame 5.0:
Library API and implementations:
- Implemented new operations:
  - Reduction to upper Hessenberg form, via FLA_Hess_UT().
  - Reduction to tridiagonal form, via FLA_Tridiag_UT(). (lower triangular storage only)
  - Reduction to bidiagonal form, via FLA_Bidiag_UT(). (upper bidiagonal case only, i.e., m >= n)
  - Reduction of a symmetric/Hermitian-definite generalized eigenproblem to standard form. Sequential and hierarchical interfaces are available via FLA_Eig_gest() and FLASH_Eig_gest(), respectively.
- Changed UT Householder transform functions so that alpha is no longer constrained to be in the real domain, allowing tau to be complex. This allows the transform to retain the property of being a reflector in the complex domain.
- Various improvements to the SuperMatrix runtime system, including support for GPU execution if GPUs are available and the operation consists of sub-tasks that are supported via CUBLAS.
- Various improvements to the control tree implementation, including better error checking to prevent the user from running with control trees that execute non-existent variants.
- Fixed some bugs with beta scaling in gemm, hemm, symm, trmm, and trsm when beta is non-unit.
- Various improvements to the BLIS.
  - Implemented the following new operations:
    - bli_trmvsx (trmv to different vector)
    - bli_trsvsx (trsv to different vector)
    - bli_fnorm (Frobenius norm)
    - bli_setv (set vector to scalar)
    - bli_setm (set matrix to scalar)
    - bli_setmr (set triangular matrix to scalar)
    - bli_setdiag (set diagonal to scalar)
  - Implemented missing row-major support for syr2k and her2k.
  - Added various wrapper routines to map calls to Hermitian operations to symmetric equivalents for real datatypes.
  - Rewrote BLIS macros to look and act more like functions.
  - Changed semantics of axpymt, axpysmt, copymt, and swapmt to match BLAS/LAPACK.
  - Fixed a subtle scaling bug in herk and her2k.
- Many minor interface and implementation improvements.
- Many other bug fixes and cleanups.
Build system:
- Added configure-time switches to allow the building of static and/or dynamic libraries (--enable-static-build, --enable-dynamic-build). (GNU/Linux only)
- Added the ability to disable the Fortran-77 underscoring (--disable-autodetect-f77-underscoring) and LDFLAGS (--disable-autodetect-f77-ldflags) queries at configure time, which allows one to build libflame in an environment that has no Fortran compiler at all.
- Added a configure-time option to enable SuperMatrix visualization (--enable-supermatrix-visualization), which outputs DAG information that can be fed into Graphviz.
- Added the ability for the user to specify a supplementary set of compiler flags (--with-extra-cflags).
What's in the latest snapshot?
Here is a list of features and changes we've made since 5.0 that you can enjoy right now by downloading the latest snapshot.
- (not yet listed; please use a nightly snapshot!)
Status of operation support
libflame contains implementations of many operations that are provided by the BLAS and LAPACK libraries. However, not all FLAME implementations support every datatype. Also, in many cases our routine names follow a different naming convention. The following table summarizes which routines are supported within libflame and also provides their corresponding netlib names for reference.
operation name | netlib routine name | libflame routine name | FLAME/C | FLASH | GPU support | type support | lapack2flame support | Elemental
libflame routine prefix | | | FLA_ | FLASH_* | FLASH_# | | |
Level-3 BLAS | | | | | | | |
general matrix-matrix multiply | ?gemm | Gemm | y | y | y | sdcz | N/A | y
hermitian matrix-matrix multiply | ?hemm | Hemm | y | y | y | sdcz | N/A | y
hermitian rank-k update | ?herk | Herk | y | y | y | sdcz | N/A | y
hermitian rank-2k update | ?her2k | Her2k | y | y | y | sdcz | N/A | y
symmetric matrix-matrix multiply | ?symm | Symm | y | y | y | sdcz | N/A | y
symmetric rank-k update | ?syrk | Syrk | y | y | y | sdcz | N/A | y
symmetric rank-2k update | ?syr2k | Syr2k | y | y | y | sdcz | N/A | y
triangular matrix multiply | ?trmm | Trmm | y | y | y | sdcz | N/A | y
triangular solve with multiple right-hand sides | ?trsm | Trsm | y | y | y | sdcz | N/A | y
LAPACK-level | | | | | | | |
Cholesky factorization | ?potrf | Chol | y | y | y | sdcz | sdcz | y
LU factorization with no pivoting | ~ | LU_nopiv | y | y | y | | |
LU factorization with partial pivoting | ?getrf | LU_piv | y | y | y | sdcz | sdcz | y
LU factorization with incremental pivoting | ~ | LU_incpiv | | y | | sdcz | | N/A
QR factorization (via UT Householder transforms) | ?geqrf | QR_UT | y | y | y | sdcz | sdcz | y
QR factorization (via incremental UT Householder transforms) | ~ | QR_UT_inc | | y | | sdcz | | N/A
LQ factorization (via UT Householder transforms) | ?gelqf | LQ_UT | y | y | y | sdcz | sdcz |
Up-and-Downdate Cholesky/QR factor | ~ | UDdate_UT | y | | | sdcz | |
Up-and-Downdate Cholesky/QR factor (via incremental UT Householder-like transforms) | ~ | UDdate_UT_inc | | y | | sdcz | | N/A
Triangular matrix inversion | ?trtri | Trinv | y | y | y | sdcz | sdcz | y
Triangular-transpose matrix multiply | ?lauum | Ttmm | y | y | y | sdcz | sdcz |
SPD/HPD inversion | ?potri+ | SPDinv | y | y | y | sdcz | sdcz | y
Triangular Sylvester equation solve | ?trsyl^ | Sylv | y | y | y | sdcz | sdcz |
Triangular Lyapunov equation solve | ~ | Lyap | y | y | y | | |
Reduction of Hermitian-positive definite eigenproblem to standard form | [sd]sygst, [cz]hegst | Eig_gest | y | y | y | sdcz | sdcz | y
Reduction to upper Hessenberg form | ?gehrd | Hess_UT | y | | | sdcz | sdcz |
Reduction to tridiagonal form | [sd]sytrd, [cz]hetrd | Tridiag_UT | y | | | sdcz | sdcz | y
Reduction to bidiagonal form | ?gebrd | Bidiag_UT | y | | | sdcz | sdcz |
Symmetric/Hermitian Eigenvalue Decomposition | [sd]syev, [cz]heev | Hevd | y | | | dz | | y
Generalized Symmetric/Hermitian Eigenvalue Decomposition | [sd]sygvx, [cz]hegvx | Soon! | | | | | | y
Skew Symmetric/Hermitian Eigenvalue Decomposition | ~ | Soon! | | | | | | y
Singular Value Decomposition | ?gesvd | Svd | y | | | dz | |
Notes:
- y These routines are provided by libflame.
- ? Expands to one of {sdcz}.
- ~ These routines are not provided by LAPACK.
- + The LAPACK routine ?potri() differs from FLA_SPDinv() and FLASH_SPDinv() in that ?potri() requires the user to compute the Cholesky factorization (with ?potrf()) beforehand and pass the result in as input, whereas the FLAME implementations perform the Cholesky factorization internally and automatically.
- ^ LAPACK provides only an unblocked implementation of the triangular Sylvester equation solver. The lapack2flame compatibility interface maps invocations of ?trsyl() to the blocked implementation in libflame.
- * Invocations of routines with the FLASH_ prefix use SuperMatrix by default. If SuperMatrix was not enabled at configure time, or was disabled at runtime with FLASH_Queue_disable(), FLASH_ routines execute sequentially, though they still use hierarchical storage.
- # GPU support must be enabled at configure-time and then invoked with FLASH_Queue_enable_gpu().
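Note + can be read through the following identity, which connects three rows of the table above (shown for a lower-triangular factor). LAPACK's ?potri() starts from a factor L that the user has already computed with ?potrf() and performs the last two steps, while FLA_SPDinv() and FLASH_SPDinv() also perform the factorization itself:

```latex
\begin{aligned}
A &= L L^{H} && \text{(Chol: Cholesky factorization)} \\
R &:= L^{-1} && \text{(Trinv: triangular matrix inversion)} \\
A^{-1} &= \left(L L^{H}\right)^{-1} = L^{-H} L^{-1} = R^{H} R && \text{(Ttmm: triangular-transpose matrix multiply)}
\end{aligned}
```

For real matrices, ^H reduces to ^T.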
LAPACK compatibility in libflame
We provide an interface, lapack2flame, which allows legacy codes that link to LAPACK to use libflame without any code changes. However, lapack2flame does not provide interfaces to all routines within LAPACK. The "lapack2flame support" column in the above table shows which datatypes are supported for each operation.
System and software requirements
Please see the libflame user's guide for the latest system requirements for both GNU/Linux and UNIX, and Windows platforms.
Building and installing libflame
Please see the libflame user's guide for the latest instructions on downloading, configuring, compiling, and installing libflame.
Linking your LAPACK-dependent application to libflame
Please see the libflame user's guide for the latest instructions on linking your legacy, LAPACK-dependent application to libflame.
Examples
We have plenty of example code that is ready to run:
- We offer a step-by-step walkthrough for running two example programs included in the libflame source distribution: the first executes a sequential Cholesky factorization with conventional ("flat") matrix storage; the second executes a multithreaded Cholesky factorization using SuperMatrix and hierarchical storage.
- Intermediate and advanced users may refer to the top-level test suite driver to find examples of how to run libflame routines. To run the test suite, configure, build, and install libflame as you normally would, change into the top-level test directory, and edit the makefile so that it references your BLAS library. Then run make, and finally run the resulting executable.
- Potential users may also browse the code examples provided at our linear algebra wiki.
Beyond LAPACK
We have functionality beyond LAPACK. For example, we have routines for updating an LU factorization with pivoting. Adding additional operations is not our top priority at the moment. However, if you have an operation that you would like to see supported, it doesn't hurt to contact us with your request!
Give us feedback!
Questions? Comments? Suggestions? Please email us at flame@cs.utexas.edu !
