1.8
User Documentation for MADlib

In linear algebra, the singular value decomposition (SVD) is a factorization of a real or complex matrix, with many useful applications in signal processing and statistics.

Let \(A\) be a \(mxn\) matrix, where \(m \ge n\). Then \(A\) can be decomposed as follows:

\[ A = U \Sigma V^T, \]

where \(U\) is a \(m \times n\) orthonormal matrix, \(\Sigma\) is a \(n \times n\) diagonal matrix, and \(V\) is an \(n \times n\) orthonormal matrix. The diagonal elements of \(\Sigma\) are called the {singular values}.

SVD Functions

SVD factorizations are provided for dense matrices, sparse matrices, and block matrices. In addition, a native implementation is provided for sparse matrices for improved performance.

SVD Function for Dense Matrices

svd( source_table,
     output_table_prefix,
     row_id,
     k,
     n_iterations,
     result_summary_table
);

Arguments

source_table

TEXT. Source table name (dense matrix).

The table contains a row_id column that identifies each row. Further, the other columns are assumed to be the data for the matrix represented in two possible forms, illustrated by the following 2x2 matrix example:

  1.             row_id     col1     col2
    row1         1           1         0
    row2         2           0         1
        
  2.         row_id     row_vec
    row1        1       {1, 0}
    row2        2       {0, 1}
        
output_table_prefix
TEXT. Prefix for output tables. See Output Tables.
row_id
TEXT. ID for each row.
k
INTEGER. Number of singular vectors to compute.
n_iterations (optional)
INTEGER. Number of iterations to run.
result_summary_table (optional)
TEXT. The name of the table to store result summary.

SVD Function for Sparse Matrix input

Use this function for matrices that are represented in the sparse-matrix format (example below). The input matrix is converted to a dense matrix before the SVD operation.

svd_sparse( source_table,
            output_table_prefix,
            row_id,
            col_id,
            value,
            row_dim,
            col_dim,
            k,
            n_iterations,
            result_summary_table
          );

Arguments

source_table

TEXT. Source table name (sparse matrix).

An example sparse matrix representation is given below:

       row_id    col_id    value
row1      1         1        2
row2      2         1        1
row3      3         2        1

The row_id represents the row number, col_id represents the column number and the value represents the matrix value at [row_id, col_id]. The row_id and col_id values are indexed starting from 0. Thus the row_id ranges from 1 to row_dim, while the col_id ranges from 1 to col_dim

output_table_prefix
TEXT. Prefix for output tables. See Output Tables.
row_id
TEXT. Name of the column containing the row index for each entry in sparse matrix.
col_id
TEXT. Name of the column containing the column index for each entry in sparse matrix.
value
TEXT. Name of column containing the non-zero values of the sparse matrix.
row_dim
INTEGER. Number of rows in matrix.
col_dim
INTEGER. Number of columns in matrix.
k
INTEGER. Number of singular vectors to compute.
n_iterations (optional)
INTEGER. Number of iterations to run.
result_summary_table (optional)
TEXT, default: NULL. The name of the table to store a summary of the results.

Native implementation for sparse matrix

Use this function for matrices that are represented in the sparse-matrix format (example below). This function use the native sparse representation while computing the SVD. This function should be favored if the matrix is highly sparse.

svd_sparse_native( source_table,
                   output_table_prefix,
                   row_id,
                   col_id,
                   value,
                   row_dim,
                   col_dim,
                   k,
                   n_iterations,
                   result_summary_table
                 );

Arguments

source_table
TEXT. Source table name (sparse matrix - see example above).
output_table_prefix
TEXT. Prefix for output tables. See Output Tables.
row_id
TEXT. ID for each row.
col_id
TEXT. ID for each column.
value
TEXT. Non-zero values of the sparse matrix.
row_dim
INTEGER. Row dimension of sparse matrix.
col_dim
INTEGER. Col dimension of sparse matrix.
k
INTEGER. Number of singular vectors to compute.
n_iterations (optional)
INTEGER. Number of iterations to run.
result_summary_table (optional)
TEXT. Table name to store result summary.

Block matrices

svd_block( source_table,
           output_table_prefix,
           k,
           n_iterations,
           result_summary_table
         );

Arguments

source_table
TEXT. Source table name (block matrix).
output_table_prefix
TEXT. Prefix for output tables. See Output Tables.
k
INTEGER. Number of singular vectors to compute.
n_iterations (optional)
INTEGER. Number of iterations to run.
result_summary_table (optional)
TEXT. Table name to store result summary.

Output Tables

Output for eigen vectors/values is in the following three tables:

The singular vector tables are of the format:

row_id INTEGER. The ID corresponding to each eigen value (in decreasing order).
row_vec FLOAT8[]. Singular vector elements for this row_id. Each array is of size k.

The singular values table is in a sparse table format, since only the diagonal elements of the matrix are non-zero:

row_id INTEGER. i for ith eigen value.
col_id INTEGER. i for ith eigen value (same as row_id).
value FLOAT8. Eigen Value.

All row_id (and col_id) in the above tables start from 0.

The result summary table has the following columns:

rows_used INTEGER. Number of rows used for SVD calculation.
exec_time FLOAT8. Total time for executing SVD.
iter INTEGER. Total number of iterations run.
recon_error FLOAT8. Total quality score (i.e. approximation quality) for this set of orthonormal basis.
relative_recon_error FLOAT8. relative quality score.

In the result summary table, the reconstruction error is computed as \( \sqrt{mean((X - USV^T)_{ij}^2)} \), where the average is over all elements of the matrices. The relative reconstruction error is then computed as ratio of the reconstruction error and \( \sqrt{mean(X_{ij}^2)} \).

Examples
  1. View online help for the SVD function.
    SELECT madlib.svd();
    
  2. Create an input dataset (dense matrix).
    CREATE TABLE mat (
        row_id integer,
        row_vec double precision[]
    );
    COPY mat (row_id, row_vec) FROM stdin delimiter '|';
    1|{396,840,353,446,318,886,15,584,159,383}
    2|{691,58,899,163,159,533,604,582,269,390}
    3|{293,742,298,75,404,857,941,662,846,2}
    4|{462,532,787,265,982,306,600,608,212,885}
    5|{304,151,337,387,643,753,603,531,459,652}
    6|{327,946,368,943,7,516,272,24,591,204}
    7|{877,59,260,302,891,498,710,286,864,675}
    8|{458,959,774,376,228,354,300,669,718,565}
    9|{824,390,818,844,180,943,424,520,65,913}
    10|{882,761,398,688,761,405,125,484,222,873}
    11|{528,1,860,18,814,242,314,965,935,809}
    12|{492,220,576,289,321,261,173,1,44,241}
    13|{415,701,221,503,67,393,479,218,219,916}
    14|{350,192,211,633,53,783,30,444,176,932}
    15|{909,472,871,695,930,455,398,893,693,838}
    16|{739,651,678,577,273,935,661,47,373,618}
    \.
    
  3. Run SVD function for a dense matrix.
    DROP TABLE IF EXISTS svd_u;
    DROP TABLE IF EXISTS svd_v;
    DROP TABLE IF EXISTS svd_s;
    SELECT madlib.svd( 'mat',
                       'svd',
                       'row_id',
                       10
                     );
    
  4. Create a sparse matrix by running the matrix_sparsify() utility function on the dense matrix.
    DROP TABLE IF EXISTS mat_sparse;
    SELECT madlib.matrix_sparsify( 'mat',
                                   'mat_sparse',
                                   FALSE
                                 );
    DROP TABLE IF EXISTS svd_u;
    DROP TABLE IF EXISTS svd_v;
    DROP TABLE IF EXISTS svd_s;
    
  5. Run the SVD function for a sparse matrix.
    SELECT madlib.svd_sparse( 'mat_sparse',
                              'svd',
                              'row_id',
                              'col_id',
                              'value',
                              10
                            );
    

Technical Background
In linear algebra, the singular value decomposition (SVD) is a factorization of a real or complex matrix, with many useful applications in signal processing and statistics. Let \(A\) be a \(m \times n\) matrix, where \(m \ge n\). Then \(A\) can be decomposed as follows:

\[ A = U \Sigma V^T, \]

where \(U\) is a \(m \times n\) orthonormal matrix, \(\Sigma\) is a \(n \times n\) diagonal matrix, and \(V\) is an \(n \times n\) orthonormal matrix. The diagonal elements of \(\Sigma\) are called the {singular values}. It is possible to formulate the problem of computing the singular triplets ( \(\sigma_i, u_i, v_i\)) of \(A\) as an eigenvalue problem involving a Hermitian matrix related to \(A\). There are two possible ways of achieving this:
  1. With the cross product matrix, \(A^TA\) and \(AA^T\)
  2. With the cyclic matrix

    \[ H(A) = \begin{bmatrix} 0 & A\\ A^* & 0 \end{bmatrix} \]

    The singular values are the nonnegative square roots of the eigenvalues of the cross product matrix. This approach may imply a severe loss of accuracy in the smallest singular values. The cyclic matrix approach is an alternative that avoids this problem, but at the expense of significantly increasing the cost of the computation. Computing the cross product matrix explicitly is not recommended, especially in the case of sparse A. Bidiagonalization was proposed by Golub and Kahan [citation?] as a way of tridiagonalizing the cross product matrix without forming it explicitly. Consider the following decomposition

    \[ A = P B Q^T, \]

    where \(P\) and \(Q\) are unitary matrices and \(B\) is an \(m \times n\) upper bidiagonal matrix. Then the tridiagonal matrix \(B*B\) is unitarily similar to \(A*A\). Additionally, specific methods exist that compute the singular values of \(B\) without forming \(B*B\). Therefore, after computing the SVD of B,

    \[ B = X\Sigma Y^T, \]

    it only remains to compute the SVD of the original matrix with \(U = PX\) and \(V = QY\).