Differences between revisions 176 and 177

Deletions are marked like this. Additions are marked like this.
Line 113: Line 113:
= Future architectures =

While this wiki is primarily meant as a pedagogical tool, the techniques that are discussed, developed as part of the FLAME project,
are actually very powerful. Below we report performance of the Cholesky factorization when multiple GPUs are employed.
{{{#!frame align=float:left,width=10%,background=yellow
[:Cholesky_MultiGPU:Learn more]
}}}
    attachment:mtgpu_cholesky_comparison.png
{{{#!frame align=clear
}}}

Purpose

This wiki is an ever-growing repository of information related to linear algebra theory, practice, and experience. It is our hope that this makes the material more accessible to expert and novice alike. We hope it will become the Wikipedia of linear algebra.

Navigating the Wiki

This Wiki can be used in a number of different ways:

  • Search the Wiki for specific information by clicking on the FindPage tab at the top of this page.

  • Follow a suggested course through some of the basic material on linear algebra.

  • Access electronic documents that are linked to the information stored in the Wiki

Notation

Our exploration of linear algebra theory and practice includes explanations that deviate from more traditional expositions in the notation that is used. This is previewed by the following algorithm for computing the LU factorization (without pivoting), which is really Gaussian elimination in disguise (more):

Learn more

    • \resetsteps      % Reset all the commands to create a blank worksheet  

% Define the operation to be computed

\renewcommand{\routinename}{ A := \mbox{\sc LU\_unb\_var5}( A ) }

% Step 3: Loop-guard 

\renewcommand{\guard}{
  m( A_{TL} ) < m( A ) \mbox{ and } n( A_{TL} ) < n( A )
}

% Step 4: Define Initialize 

\renewcommand{\partitionings}{
  $
  A \rightarrow
  \FlaTwoByTwo{A_{TL}}{A_{TR}}
              {A_{BL}}{A_{BR}}
  $
}

\renewcommand{\partitionsizes}{
$ A_{TL} $ is $ 0 \times 0 $
}

% Step 5a: Repartition the operands 

\renewcommand{\repartitionings}{
$
  \FlaTwoByTwo{A_{TL}}{A_{TR}}
              {A_{BL}}{A_{BR}}
  \rightarrow
  \FlaThreeByThreeBR{A_{00}}{a_{01}}{A_{02}}
                    {a_{10}^T}{\alpha_{11}}{a_{12}^T}
                    {A_{20}}{a_{21}}{A_{22}}
$}

\renewcommand{\repartitionsizes}{
  $ \alpha_{11} $ is $ 1 \times 1 $
}

% Step 5b: Move the double lines 

\renewcommand{\moveboundaries}{
$
  \FlaTwoByTwo{A_{TL}}{A_{TR}}
              {A_{BL}}{A_{BR}}
  \leftarrow
  \FlaThreeByThreeTL{A_{00}}{a_{01}}{A_{02}}
                    {a_{10}^T}{\alpha_{11}}{a_{12}^T}
                    {A_{20}}{a_{21}}{A_{22}}
$}

% Step 8: Insert the updates required to change the 
%         state from that given in Step 6 to that given in Step 7
% Note: The below needs editing!!!

\renewcommand{\update}{
$
  \begin{array}{l @{\hspace{2pt}} l }
    a_{21} & \becomes a_{21} / \alpha_{11} \\ 
    A_{22} & \becomes A_{22} - a_{21} a_{12}^T
  \end{array}
$
}

\FlaAlgorithmNarrow


APIs

When appropriate, implementations are given using a number of Application Programming Interfaces (APIs) that allow code to closely mirror the algorithms as presented in the FLAME notation.

FLAME@lab

FLAME@lab is the API for Octave's Mscript (free Matlab!), LabView's MathScript, and Matlab Mscript. Using this API, the above LU factorization is coded as

Try it!


Learn more

    • function [ A_out ] = LU_unb_var5( A )
      
        [ ATL, ATR, ...
          ABL, ABR ] = FLA_Part_2x2( A, 0, 0, 'FLA_TL' );
      
        while ( size( ATL, 1 ) < size( A, 1 ) & size( ATL, 2 ) < size( A, 2 ) )
      
          [ A00,  a01,     A02,  ...
            a10t, alpha11, a12t, ...
            A20,  a21,     A22 ] = FLA_Repart_2x2_to_3x3( ATL, ATR, ...
                                                          ABL, ABR, 1, 1, 'FLA_BR' );
          %------------------------------------------------------------%
      
          a21 = a21 / alpha11;
          A22 = A22 - a21 * a12t;
      
          %------------------------------------------------------------%
          [ ATL, ATR, ...
            ABL, ABR ] = FLA_Cont_with_3x3_to_2x2( A00,  a01,     A02,  ...
                                                   a10t, alpha11, a12t, ...
                                                   A20,  a21,     A22, 'FLA_TL' );
        end
      
        A_out = [ ATL, ATR
                  ABL, ABR ];
      
      return
      

FLAME/C

A similar API for the C programming language supports the following code for LU factorization:

Try it!


Learn more

    • #include "FLAME.h"
      
      FLA_Error LU_unb_var5( FLA_Obj A )
      {
        FLA_Obj ATL,   ATR,      A00,  a01,     A02, 
                ABL,   ABR,      a10t, alpha11, a12t,
                                 A20,  a21,     A22;
      
        FLA_Part_2x2( A,    &ATL, &ATR,
                            &ABL, &ABR,     0, 0, FLA_TL );
      
        while ( FLA_Obj_length( ATL ) < FLA_Obj_length( A ) ){
      
          FLA_Repart_2x2_to_3x3( ATL, /**/ ATR,       &A00,  /**/ &a01,     &A02,
                              /* ************* */   /* *************************** */
                                                      &a10t, /**/ &alpha11, &a12t,
                                 ABL, /**/ ABR,       &A20,  /**/ &a21,     &A22,
                                 1, 1, FLA_BR );
      
          /*------------------------------------------------------------*/
      
          /* a21 = a21 / alpha11; */
          FLA_Inv_scal( alpha11, a21 );
      
          /* A22 = A22 - a21 * a12t; */
          FLA_Ger( FLA_MINUS_ONE, a21, a12t, A22 );
      
          /*------------------------------------------------------------*/
      
          FLA_Cont_with_3x3_to_2x2( &ATL, /**/ &ATR,       A00,  a01,     /**/ A02,
                                                           a10t, alpha11, /**/ a12t,
                                  /* ************** */  /* ************************* */
                                    &ABL, /**/ &ABR,       A20,  a21,     /**/ A22,
                                    FLA_TL );
        }
      
        return FLA_SUCCESS;
      }
      


FLAME/F

You asked for it, you got it! An API for FORTRAN supports the following code for LU factorization:

    •       SUBROUTINE LU_unb_var1_f( A, ierror )
            implicit none
            include "FLAMEf.h"
      !
            INTEGER A( FLA_OBJ_SIZE ), IERROR
      !
            INTEGER 
           &  ATL( FLA_OBJ_SIZE),   ATR( FLA_OBJ_SIZE),   
           &  ABL( FLA_OBJ_SIZE),   ABR( FLA_OBJ_SIZE),
      !      
           &  A00( FLA_OBJ_SIZE), a01( FLA_OBJ_SIZE),    A02( FLA_OBJ_SIZE), 
           &  a10t( FLA_OBJ_SIZE),alpha11( FLA_OBJ_SIZE),a12t( FLA_OBJ_SIZE),
           &  A20( FLA_OBJ_SIZE), a21( FLA_OBJ_SIZE),    A22( FLA_OBJ_SIZE)
      !
      !
      !
      !
      !
            call FLA_Part_2x2_f( A,    ATL, ATR,
           &                           ABL, ABR,     0, 0, FLA_TL, ierror )
      !
            do while ( FLA_Obj_length_f( ATL ) .lt. FLA_Obj_length_f( A ) ) 
               call FLA_Repart_2x2_to_3x3_f( 
           &               ATL, ATR,      A00,  a01,     A02,
           &                              a10t, alpha11, a12t,
           &               ABL, ABR,      A20,  a21,     A22,
           &               1, 1, FLA_BR, ierror )
      !    ------------------------------------------------------------
      !
      !        a21 = a21 / alpha11
      !
               call FLA_Inv_scal_f( alpha11, a21, ierror );
      !
      !        A22 = A22 - a21 * a12t; 
      !
               call FLA_Ger_f( FLA_MINUS_ONE, a21, a12t, A22, ierror );
      !
      !    ------------------------------------------------------------
               call FLA_Cont_with_3x3_to_2x2_f( 
           &              ATL, ATR,      A00,  a01,     A02,
           &                             a10t, alpha11, a12t,
           &              ABL, ABR,      A20,  a21,     A22,
           &              FLA_TL, ierror );
            enddo
      !
            ierror = FLA_SUCCESS
      !
            return
            end
      


Performance

We place a lot of emphasis on the fact that there are typically multiple algorithms for computing a given linear algebra operation. One reason is that under different circumstances different algorithms may perform better. Here "perform" may mean that they yield more accurate answers in the presense of round-off errors and/or that they produce the answer in less time.

The following graph illustrates the different rates of computation achieved by different algorithmic variants for LU factorization.

Learn more


Future architectures

While this wiki is primarily meant as a pedagogical tool, the techniques that are discussed, developed as part of the FLAME project, are actually very powerful. Below we report performance of the Cholesky factorization when multiple GPUs are employed.

Learn more


Sponsors

This wiki is sponsored in part by

  • The National Science Foundation (Awards CCF-0540926 and CCF-0342369);

    Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).

  • An unrestricted grant from Dr. James Truchard; and
  • An equipment grant from Hewlett-Packard.

Contact Us

flame@cs.utexas.edu


This wiki is powered by MoinMoin.

LinearAlgebraWiki: FrontPage (last edited 2009-01-23 17:56:18 by RobertVanDeGeijn)