1.8
User Documentation for MADlib

This module provides a set of basic matrix operations for matrices that are too big to fit in memory. Two storage formats are supported for a matrix: dense and sparse.

For dense matrices, a 'row' column (named row_id in the example table) gives the row number of each row, and a 'val' column (named row_vec in the example table) holds each row as an array. The row column should contain the integers from 1 to N with no duplicates, where N is the number of rows.

For sparse matrices, the row and col columns together should not contain a duplicate entry, and the val column should be of a scalar (non-array) data type.
For comparison, the dense representation of this matrix is shown below. Note that the dense matrix is 4 x 7 because the maximum values of row and col are 4 and 7 respectively, which leads to all zeros in the last row and last column.

 row_id |         row_vec
--------+-------------------------
   1    | {9,0,0,0,6,6,0}
   2    | {8,0,0,0,0,0,0}
   3    | {3,9,0,0,0,0,0}
   4    | {0,0,0,0,0,0,0}
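The relationship between the two formats can be sketched in plain Python. This only illustrates the semantics and is not MADlib code: the function name `densify` is hypothetical, and the sparse entries are an assumption read back from the dense table above. They include an explicit zero at row 4, column 7, which is one way the 4 x 7 dimensionality could arise.

```python
def densify(entries):
    """Convert sparse (row, col, val) triples with 1-based indices to dense rows.

    The dimensions are taken from the largest row and col values present.
    """
    n_rows = max(r for r, _, _ in entries)
    n_cols = max(c for _, c, _ in entries)
    dense = [[0] * n_cols for _ in range(n_rows)]
    for r, c, v in entries:
        # Shift from 1-based matrix indices to 0-based Python indices.
        dense[r - 1][c - 1] = v
    return dense


# Assumed sparse entries, reconstructed from the dense table.
sparse_entries = [
    (1, 1, 9), (1, 5, 6), (1, 6, 6),
    (2, 1, 8),
    (3, 1, 3), (3, 2, 9),
    (4, 7, 0),  # explicit zero that pins the size at 4 x 7
]

dense_rows = densify(sparse_entries)
for row_id, row_vec in enumerate(dense_rows, start=1):
    print(row_id, row_vec)
```

Without the (4, 7, 0) entry, the same function would infer a 3 x 6 matrix, which is why the largest row and col values matter in the sparse format.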
Note
Unless otherwise noted, the functions below support several numeric types, including SMALLINT, INTEGER, BIGINT, DOUBLE PRECISION (FLOAT8), and NUMERIC (internally cast to FLOAT8, so loss of precision can occur).

Matrix Operations

Given below are the supported matrix operations. The meanings of the arguments and other terms are common to all functions and are provided in a glossary at the end of the list.

Glossary

The table below provides a glossary of the terms used in the matrix operations.

matrix_in, matrix_a, matrix_b

TEXT. Name of the table containing the input matrix.

  • For functions accepting one matrix, matrix_in denotes the input matrix.
  • For functions accepting two matrices, matrix_a denotes the first matrix and matrix_b denotes the second matrix. These two matrices can independently be in either dense or sparse format.

in_args, a_args, b_args

TEXT. A comma-delimited string containing multiple named arguments of the form "name=value". This argument is used as a container for multiple parameters related to a single matrix.

The following parameters are supported for this string argument:

row (Default: 'row_num') Name of the column containing row index of the matrix.
col (Default: 'col_num') Name of the column containing column index of the matrix.
val (Default: 'val') Name of the column containing the entries of the matrix.
trans (Default: False) Boolean flag to indicate if the matrix should be transposed before the operation. This is currently functional only for matrix_mult.

For example, the string argument with default values will be 'row=row_num, col=col_num, val=val, trans=False'. Alternatively, the string argument can be set to NULL or be blank ('') if default values are to be used.
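To make the "name=value" convention concrete, the following is a minimal Python sketch of how such a string could be interpreted. It is not MADlib's parser: the function name `parse_matrix_args` is hypothetical, and the choices to reject unknown names and to treat only the text "true" (case-insensitive) as a true flag are assumptions made for the illustration.

```python
# Defaults as listed in the glossary above.
DEFAULTS = {"row": "row_num", "col": "col_num", "val": "val", "trans": False}


def parse_matrix_args(args):
    """Parse a comma-delimited 'name=value' string, filling in defaults."""
    params = dict(DEFAULTS)
    # NULL (None here) or a blank string means "use the defaults".
    if args is None or not args.strip():
        return params
    for item in args.split(","):
        name, sep, value = item.partition("=")
        name, value = name.strip(), value.strip()
        if not sep or name not in DEFAULTS:
            # Assumption: unknown or malformed items are treated as errors.
            raise ValueError(f"unsupported argument: {item.strip()!r}")
        if name == "trans":
            # Assumption: only the text 'true' (any case) enables the flag.
            params[name] = value.lower() == "true"
        else:
            params[name] = value
    return params


print(parse_matrix_args("row=row_id, val=row_vec"))
print(parse_matrix_args(None))
```

Parameters that are not named in the string keep their defaults, so a dense matrix stored with row_id and row_vec columns only needs to override row and val.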

matrix_out

TEXT. Name of the table to store the result matrix.

out_args

TEXT. A comma-delimited string containing named arguments of the form "name=value". This is an optional parameter and the default value is set as follows:

  • For functions with one input matrix, default = in_args
  • For functions with two input matrices, default = a_args.

The following parameters are supported for this string argument:

row Name of the column containing row index of the matrix.
col Name of the column containing column index of the matrix.
val Name of the column containing the entries of the matrix.

index

INTEGER. An integer representing a row or column index of the matrix. Should be a number from 1 to N, where N is the size of the corresponding dimension.
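Because the index is 1-based, it is offset by one from the indexing used in most programming languages. The Python sketch below illustrates the convention on the dense example matrix; the helpers `get_row` and `get_col` are hypothetical names for this illustration and are not MADlib functions.

```python
# Dense example matrix (4 x 7), one inner list per row.
matrix = [
    [9, 0, 0, 0, 6, 6, 0],
    [8, 0, 0, 0, 0, 0, 0],
    [3, 9, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]


def get_row(dense_rows, index):
    """Return the row at a 1-based index, checking 1 <= index <= N."""
    if not 1 <= index <= len(dense_rows):
        raise IndexError(f"row index {index} out of range 1..{len(dense_rows)}")
    return list(dense_rows[index - 1])


def get_col(dense_rows, index):
    """Return the column at a 1-based index, checking 1 <= index <= M."""
    n_cols = len(dense_rows[0])
    if not 1 <= index <= n_cols:
        raise IndexError(f"column index {index} out of range 1..{n_cols}")
    return [row[index - 1] for row in dense_rows]


print(get_row(matrix, 3))
print(get_col(matrix, 1))
```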

dim

INTEGER. Should either be 1 or 2. This value indicates the dimension to operate along for the reduction/aggregation operations. The value of dim should be interpreted as the dimension to be flattened i.e. whose length reduces to 1 in the result.
For dim=1, a reduction function on an NxM matrix operates on successive elements in each column and returns a single vector with M elements (i.e. matrix with just 1 row and M columns).
For dim=2, a single vector is returned with N elements (i.e. matrix with just 1 column and N rows).
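The effect of dim can be illustrated with a sum reduction over the 4 x 7 dense example matrix. This is a plain-Python sketch of the semantics described above, not MADlib code, and the function name `reduce_sum` is hypothetical.

```python
# Dense example matrix: N = 4 rows, M = 7 columns.
matrix = [
    [9, 0, 0, 0, 6, 6, 0],
    [8, 0, 0, 0, 0, 0, 0],
    [3, 9, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]


def reduce_sum(dense_rows, dim):
    """Sum along the dimension that is flattened to length 1."""
    if dim == 1:
        # Flatten the rows: one sum per column, giving M elements.
        return [sum(col) for col in zip(*dense_rows)]
    if dim == 2:
        # Flatten the columns: one sum per row, giving N elements.
        return [sum(row) for row in dense_rows]
    raise ValueError("dim must be 1 or 2")


print(reduce_sum(matrix, 1))  # 7 column sums
print(reduce_sum(matrix, 2))  # 4 row sums
```

For this matrix, dim=1 yields the seven column sums and dim=2 yields the four row sums, matching the M-element and N-element results described above.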

Examples

Related Topics

File array_ops.sql_in, documenting the array operations (see Array Operations).

File matrix_ops.sql_in for list of functions and usage.