Contents

This module provides a set of basic matrix operations for matrices that are too big to fit in memory. We provide two storage formats for a matrix:

Dense: The matrix is represented as a distributed collection of 1-D arrays. An example 3x10 matrix would be the below table:

 row_id |         row_vec
--------+-------------------------
   1    | {9,6,5,8,5,6,6,3,10,8}
   2    | {8,2,2,6,6,10,2,1,9,9}
   3    | {3,9,9,9,8,6,3,9,5,6}

A 'row' column (called as row_id above) provides the row number of each row and a 'val' column (called as row_vec above) provides each row as an array. The row column should contain a series of integers from 1 to N with no duplicates, where N is the row dimensionality.

Sparse: The matrix is represented using the row and column indices for each non-zero entry of the matrix. This representation is useful for sparse matrices, containing multiple zero elements. Given below is an example of a sparse 4x7 matrix with just 6 out of 28 entries being non-zero. The dimensionality of the matrix is inferred using the max value in row and col columns. Note the last entry is included (even though it is 0) to provide the dimensionality of the matrix (indicating that the 4th row and 7th column contain all zeros).
```
 row_id | col_id | value
--------+--------+-------
      1 |      1 |     9
      1 |      5 |     6
      1 |      6 |     6
      2 |      1 |     8
      3 |      1 |     3
      3 |      2 |     9
      4 |      7 |     0
(6 rows)
```

For sparse matrices, the row and col columns together should not contain a duplicate entry and the val column should be of scalar (non-array) data type.
For comparison, the dense representation of this matrix is shown below. Note the dimensionality of the dense matrix is 4 x 7 since the max value of row and col is 4 and 7 respectively, leading to all zeros in the last row and last column.

 row_id |         row_vec
--------+-------------------------
   1    | {9,0,0,0,6,6,0}
   2    | {8,0,0,0,0,0,0}
   3    | {3,9,0,0,0,0,0}
   4    | {0,0,0,0,0,0,0}

Note: The functions below support several numeric types (unless otherwise noted) including SMALLINT, INTEGER, BIGINT, DOUBLE PRECISION (FLOAT8), NUMERIC (internally casted into FLOAT8, loss of precision can happen).

Matrix Operations

Given below are the supported matrix operations. The meaning of the arguments and other terms are common to all functions and provided at the end of the list as a glossary.

Representation

-- Convert to sparse representation
  matrix_sparsify( matrix_in, in_args, matrix_out, out_args)
 
-- Convert to dense representation
  matrix_densify( matrix_in, in_args, matrix_out, out_args)

Mathematical operations

-- Matrix transposition
  matrix_trans( matrix_in, in_args, matrix_out, out_args)
 
-- Matrix addition
  matrix_add( matrix_a, a_args, matrix_b, b_args, matrix_out, out_args)
 
-- Matrix subtraction
  matrix_sub( matrix_a, a_args, matrix_b, b_args, matrix_out, out_args)
 
-- Matrix multiplication
  matrix_mult( matrix_a, a_args, matrix_b, b_args, matrix_out, out_args)
 
-- Element-wise matrix multiplication
  matrix_elem_mult( matrix_a, a_args, matrix_b, b_args, matrix_out, out_args)
 
-- Multiply matrix with scalar.
  matrix_scalar_mult( matrix_in, in_args, scalar, matrix_out, out_args)
 
-- Multiply matrix with vector.
  matrix_vec_mult( matrix_in, in_args, vector)

Extraction/visitor methods

-- Extract row from matrix given row index
  matrix_extract_row( matrix_in, in_args, index)
 
-- Extract column from matrix given column index
  matrix_extract_col( matrix_in, in_args, index)

Reduction operations (aggregate across specific dimension)

-- Get max value along dim. Also returns corresponding index if fetch_index = True
  matrix_max( matrix_in, in_args, dim, matrix_out, fetch_index)
 
-- Get min value along dim. Also returns corresponding index if fetch_index = True
  matrix_min( matrix_in, in_args, dim, matrix_out, fetch_index)
 
-- Get sum value along dimension from matrix.
  matrix_sum( matrix_in, in_args, dim)
 
-- Get mean value along dimension from matrix.
  matrix_mean( matrix_in, in_args, dim)

Glossary

The table below provides a glossary of the terms used in the matrix operations.

matrix_in, matrix_a, matrix_b

TEXT. Name of the table containing the input matrix.

For functions accepting one matrix, matrix_in denotes the input matrix.
For functions accepting two matrices, matrix_a denotes the first matrix and matrix_b denotes the second matrix. These two matrices can independently be in either dense or sparse format.

in_args, a_args, b_args

TEXT. A comma-delimited string containing multiple named arguments of the form "name=value". This argument is used as a container for multiple parameters related to a single matrix.

The following parameters are supported for this string argument:

row	(Default: 'row_num') Name of the column containing row index of the matrix.
col	(Default: 'col_num') Name of the column containing column index of the matrix.
val	(Default: 'val') Name of the column containing the entries of the matrix.
trans	(Default: False) Boolean flag to indicate if the matrix should be transposed before the operation. This is currently functional only for matrix_mult.

For example, the string argument with default values will be 'row=row_num, col=col_num, val=val, trans=False'. Alternatively, the string argument can be set to NULL or be blank ('') if default values are to be used.

matrix_out

TEXT. Name of the table to store the result matrix.

out_args

TEXT. A comma-delimited string containing named arguments of the form "name=value". This is an optional parameter and the default value is set as follows:

For functions with one input matrix, default = in_args
For functions with two input matrices, default = a_args.

The following parameters are supported for this string argument:

row	Name of the column containing row index of the matrix.
col	Name of the column containing column index of the matrix.
val	Name of the column containing the entries of the matrix.

index

INTEGER. An integer representing a row or column index of the matrix. Should be a number from 1 to N, where N is the maximum size of the dimension.

dim

INTEGER. Should either be 1 or 2. This value indicates the dimension to operate along for the reduction/aggregation operations. The value of dim should be interpreted as the dimension to be flattened i.e. whose length reduces to 1 in the result.
For dim=1, a reduction function on an NxM matrix operates on successive elements in each column and returns a single vector with M elements (i.e. matrix with just 1 row and M columns).
For dim=2, a single vector is returned with N elements (i.e. matrix with just 1 column and N rows).

Examples

Create some random data tables in dense format.

CREATE TABLE "mat_A" (
    row_id integer,
    row_vec integer[]
);
INSERT INTO "mat_A" (row_id, row_vec) VALUES (1, '{9,6,5,8,5,6,6,3,10,8}');
INSERT INTO "mat_A" (row_id, row_vec) VALUES (2, '{8,2,2,6,6,10,2,1,9,9}');
INSERT INTO "mat_A" (row_id, row_vec) VALUES (3, '{3,9,9,9,8,6,3,9,5,6}');
INSERT INTO "mat_A" (row_id, row_vec) VALUES (4, '{6,4,2,2,2,7,8,8,0,7}');
INSERT INTO "mat_A" (row_id, row_vec) VALUES (5, '{6,8,9,9,4,6,9,5,7,7}');
INSERT INTO "mat_A" (row_id, row_vec) VALUES (6, '{4,10,7,3,9,5,9,2,3,4}');
INSERT INTO "mat_A" (row_id, row_vec) VALUES (7, '{8,10,7,10,1,9,7,9,8,7}');
INSERT INTO "mat_A" (row_id, row_vec) VALUES (8, '{7,4,5,6,2,8,1,1,4,8}');
INSERT INTO "mat_A" (row_id, row_vec) VALUES (9, '{8,8,8,5,2,6,9,1,8,3}');
INSERT INTO "mat_A" (row_id, row_vec) VALUES (10, '{4,6,3,2,6,4,1,2,3,8}');

CREATE TABLE "mat_B" (
    row_id integer,
    vector integer[]
);
INSERT INTO "mat_B" (row_id, vector) VALUES (1, '{9,10,2,4,6,5,3,7,5,6}');
INSERT INTO "mat_B" (row_id, vector) VALUES (2, '{5,3,5,2,8,6,9,7,7,6}');
INSERT INTO "mat_B" (row_id, vector) VALUES (3, '{0,1,2,3,2,7,7,3,10,1}');
INSERT INTO "mat_B" (row_id, vector) VALUES (4, '{2,9,0,4,3,6,8,6,3,4}');
INSERT INTO "mat_B" (row_id, vector) VALUES (5, '{3,8,7,7,0,5,3,9,2,10}');
INSERT INTO "mat_B" (row_id, vector) VALUES (6, '{5,3,1,7,6,3,5,3,6,4}');
INSERT INTO "mat_B" (row_id, vector) VALUES (7, '{4,8,4,4,2,7,10,0,3,3}');
INSERT INTO "mat_B" (row_id, vector) VALUES (8, '{4,6,0,1,3,1,6,6,9,8}');
INSERT INTO "mat_B" (row_id, vector) VALUES (9, '{6,5,1,7,2,7,10,6,0,6}');
INSERT INTO "mat_B" (row_id, vector) VALUES (10, '{1,4,4,4,8,5,2,8,5,5}');

Transpose a matrix

SELECT madlib.matrix_trans('"mat_B"', 'row=row_id, val=vector',
                           'mat_r');
SELECT * FROM mat_r ORDER BY row_id;

-- Note the result matrix has inherited 'vector' as the name of value column
 row_id |         vector
--------+-------------------------
      1 | {9,5,0,2,3,5,4,4,6,1}
      2 | {10,3,1,9,8,3,8,6,5,4}
      3 | {2,5,2,0,7,1,4,0,1,4}
      4 | {4,2,3,4,7,7,4,1,7,4}
      5 | {6,8,2,3,0,6,2,3,2,8}
      6 | {5,6,7,6,5,3,7,1,7,5}
      7 | {3,9,7,8,3,5,10,6,10,2}
      8 | {7,7,3,6,9,3,0,6,6,8}
      9 | {5,7,10,3,2,6,3,9,0,5}
      10 | {6,6,1,4,10,4,3,8,6,5}
(10 rows)

Add the two matrices

SELECT madlib.matrix_add('"mat_A"', 'row=row_id, val=row_vec',
                         '"mat_B"', 'row=row_id, val=vector',
                         'mat_r', 'val=vector');
SELECT * FROM mat_r ORDER BY row_id;

 row_id |            vector
--------+-------------------------------
      1 | {18,16,7,12,11,11,9,10,15,14}
      2 | {13,5,7,8,14,16,11,8,16,15}
      3 | {3,10,11,12,10,13,10,12,15,7}
      4 | {8,13,2,6,5,13,16,14,3,11}
      5 | {9,16,16,16,4,11,12,14,9,17}
      6 | {9,13,8,10,15,8,14,5,9,8}
      7 | {12,18,11,14,3,16,17,9,11,10}
      8 | {11,10,5,7,5,9,7,7,13,16}
      9 | {14,13,9,12,4,13,19,7,8,9}
      10 | {5,10,7,6,14,9,3,10,8,13}
(10 rows)

Multiply the two matrices

DROP TABLE IF EXISTS mat_r;
SELECT madlib.matrix_mult('"mat_A"', 'row=row_id, val=row_vec',
                          '"mat_B"', 'row=row_id, val=vector, trans=true',
                          'mat_r');
SELECT * FROM mat_r ORDER BY row_id;

 row_id |                  row_vec
--------+-------------------------------------------
     1  | {380,373,251,283,341,303,302,309,323,281}
     2  | {318,318,222,221,269,259,236,249,264,248}
     3  | {382,366,216,300,397,276,277,270,313,338}
     4  | {275,284,154,244,279,183,226,215,295,204}
     5  | {381,392,258,319,394,298,342,302,360,300}
     6  | {321,333,189,276,278,232,300,236,281,250}
     7  | {443,411,282,365,456,318,360,338,406,330}
     8  | {267,240,150,186,270,194,210,184,233,193}
     9  | {322,328,234,264,291,245,317,253,291,219}
     10 | {246,221,109,173,222,164,167,185,181,189}
(10 rows)

Extract row and column from matrix given index

SELECT madlib.matrix_extract_row('"mat_A"', 'row=row_id, val=row_vec', 2) as row,
       madlib.matrix_extract_col('"mat_A"', 'row=row_id, val=row_vec', 3) as col;

          row           |          col
------------------------+-----------------------
 {8,2,2,6,6,10,2,1,9,9} | {5,2,9,2,9,7,7,5,8,3}
(1 rows)

Get min and max values along a specific dimension as well as corresponding indicies. Note that dim=2 implies that the min and max is computed on each row returning a column vector i.e. the column (dim=2) is flattened.

SELECT madlib.matrix_max('"mat_A"', 'row=row_id, val=row_vec', 2, 'mat_max_r', true),
       madlib.matrix_min('"mat_A"', 'row=row_id, val=row_vec', 2, 'mat_min_r', true);
SELECT * from mat_max_r;
SELECT * from mat_min_r;

         index         |            max
-----------------------+---------------------------
 {8,5,1,6,2,1,1,5,6,9} | {10,10,9,8,9,10,10,8,9,8}
(1 rows)
 
         index         |          min
-----------------------+-----------------------
 {7,7,0,8,4,7,4,6,7,6} | {3,1,3,0,4,2,1,1,1,1}
(1 rows)

Element-wise multiplication between two matrices

SELECT madlib.matrix_elem_mult('"mat_A"', 'row=row_id, val=row_vec',
                               '"mat_B"', 'val=vector',
                               'mat_r', 'val=vector');
SELECT * FROM mat_r ORDER BY row_id;

 row_id |             vector
--------+---------------------------------
     1  | {81,60,10,32,30,30,18,21,50,48}
     2  | {40,6,10,12,48,60,18,7,63,54}
     3  | {0,9,18,27,16,42,21,27,50,6}
     4  | {12,36,0,8,6,42,64,48,0,28}
     5  | {18,64,63,63,0,30,27,45,14,70}
     6  | {20,30,7,21,54,15,45,6,18,16}
     7  | {32,80,28,40,2,63,70,0,24,21}
     8  | {28,24,0,6,6,8,6,6,36,64}
     9  | {48,40,8,35,4,42,90,6,0,18}
     10 | {4,24,12,8,48,20,2,16,15,40}

Get sum values along dimension. Sum is computed for each row (i.e. column is flattened since dim=2)

SELECT madlib.matrix_sum('"mat_A"', 'row=row_id, val=row_vec', 2);

           matrix_sum
---------------------------------
 {66,55,67,46,70,56,76,46,58,39}
(1 rows)

Multiply matrix with a scalar

DROP TABLE IF EXISTS mat_r;
SELECT madlib.matrix_scalar_mult('"mat_A"', 'row=row_id, val=row_vec', 3, 'mat_r');
SELECT * FROM mat_r ORDER BY row_id;

 row_id |             row_vec
--------+---------------------------------
      0 | {27,18,15,24,15,18,18,9,30,24}
      1 | {24,6,6,18,18,30,6,3,27,27}
      2 | {9,27,27,27,24,18,9,27,15,18}
      3 | {18,12,6,6,6,21,24,24,0,21}
      4 | {18,24,27,27,12,18,27,15,21,21}
      5 | {12,30,21,9,27,15,27,6,9,12}
      6 | {24,30,21,30,3,27,21,27,24,21}
      7 | {21,12,15,18,6,24,3,3,12,24}
      8 | {24,24,24,15,6,18,27,3,24,9}
      9 | {12,18,9,6,18,12,3,6,9,24}
(10 rows)

Multiply matrix with a vector

SELECT madlib.matrix_vec_mult('"mat_A"', 'row=row_id, val=row_vec',
                              array[1,2,3,4,5,6,7,8,9,10]);

              matrix_vec_mult
-------------------------------------------
 {365,325,358,270,377,278,411,243,287,217}
(10 rows)

Examples with sparse representation. The function calls are the same as before.

SELECT madlib.matrix_sparsify('"mat_B"', 'row=row_id, val=vector',
                              '"mat_B_sparse"', 'col=col_id, val=val');

Create a matrix in sparse format.

CREATE TABLE "mat_A_sparse" (
    "rowNum" integer,
    col_num integer,
    entry integer
);
INSERT INTO "mat_A_sparse" ("rowNum", col_num, entry) VALUES (1, 1, 9);
INSERT INTO "mat_A_sparse" ("rowNum", col_num, entry) VALUES (1, 2, 6);
INSERT INTO "mat_A_sparse" ("rowNum", col_num, entry) VALUES (1, 7, 3);
INSERT INTO "mat_A_sparse" ("rowNum", col_num, entry) VALUES (1, 8, 10);
INSERT INTO "mat_A_sparse" ("rowNum", col_num, entry) VALUES (1, 9, 8);
INSERT INTO "mat_A_sparse" ("rowNum", col_num, entry) VALUES (2, 1, 8);
INSERT INTO "mat_A_sparse" ("rowNum", col_num, entry) VALUES (2, 2, 2);
INSERT INTO "mat_A_sparse" ("rowNum", col_num, entry) VALUES (2, 3, 6);
INSERT INTO "mat_A_sparse" ("rowNum", col_num, entry) VALUES (3, 5, 6);
INSERT INTO "mat_A_sparse" ("rowNum", col_num, entry) VALUES (3, 6, 3);
INSERT INTO "mat_A_sparse" ("rowNum", col_num, entry) VALUES (7, 1, 7);
INSERT INTO "mat_A_sparse" ("rowNum", col_num, entry) VALUES (8, 2, 8);
INSERT INTO "mat_A_sparse" ("rowNum", col_num, entry) VALUES (8, 3, 5);
INSERT INTO "mat_A_sparse" ("rowNum", col_num, entry) VALUES (9, 1, 6);
INSERT INTO "mat_A_sparse" ("rowNum", col_num, entry) VALUES (9, 2, 3);
INSERT INTO "mat_A_sparse" ("rowNum", col_num, entry) VALUES (10, 10, 0);

Transpose a matrix in sparse format

-- Note the double quotes for '"rowNum"' required per PostgreSQL rules
SELECT madlib.matrix_trans('"mat_A_sparse"', 'row="rowNum", val=entry',
                           'matrix_r_sparse');
SELECT "rowNum", col_num, entry FROM matrix_r_sparse ORDER BY col_num;

 rowNum | col_num | entry
--------+---------+-------
      1 |       1 |     9
      2 |       1 |     6
      7 |       1 |     3
      8 |       1 |    10
      9 |       1 |     8
      1 |       2 |     8
      2 |       2 |     2
      3 |       2 |     6
      5 |       3 |     6
      6 |       3 |     3
      1 |       7 |     7
      2 |       8 |     8
      3 |       8 |     5
      1 |       9 |     6
      2 |       9 |     3
     10 |      10 |     0
(16 rows)

Add two sparse matrices

SELECT madlib.matrix_add('"mat_A_sparse"', 'row="rowNum", val=entry',
                         '"mat_B_sparse"', 'row=row_id, col=col_id, val=val',
                         'matrix_r_sparse', 'col=col_out');
SELECT madlib.matrix_densify('matrix_r_sparse', 'row="rowNum", col=col_out, val=entry',
                             'matrix_r');
SELECT * FROM matrix_r ORDER BY "rowNum";

 rowNum |           entry
--------+---------------------------
      1 | {18,16,2,4,6,5,6,17,13,6}
      2 | {13,5,11,2,8,6,9,7,7,6}
      3 | {0,1,2,3,8,10,7,3,10,1}
      4 | {2,9,0,4,3,6,8,6,3,4}
      5 | {3,8,7,7,0,5,3,9,2,10}
      6 | {5,3,1,7,6,3,5,3,6,4}
      7 | {11,8,4,4,2,7,10,0,3,3}
      8 | {4,14,5,1,3,1,6,6,9,8}
      9 | {12,8,1,7,2,7,10,6,0,6}
     10 | {1,4,4,4,8,5,2,8,5,5}
(10 rows)

Multiply two sparse matrices

SELECT madlib.matrix_mult('"mat_A_sparse"', 'row="rowNum", col=col_num, val=entry',
                          '"mat_B_sparse"', 'row=row_id, col=col_id, val=val, trans=true',
                          'matrix_r');
SELECT * FROM matrix_r ORDER BY "rowNum";

 rowNum |                   entry
--------+-------------------------------------------
      1 | {260,216,137,180,190,156,138,222,174,159}
      2 | {104,76,14,34,82,52,72,44,64,40}
      3 | {51,66,33,36,15,45,33,21,33,63}
      4 | {0,0,0,0,0,0,0,0,0,0}
      5 | {0,0,0,0,0,0,0,0,0,0}
      6 | {0,0,0,0,0,0,0,0,0,0}
      7 | {63,35,0,14,21,35,28,28,42,7}
      8 | {90,49,18,72,99,29,84,48,45,52}
      9 | {84,39,3,39,42,39,48,42,51,18}
     10 | {0,0,0,0,0,0,0,0,0,0}
(10 rows)

Related Topics

File array_ops.sql_in documenting the array operations Array Operations

File matrix_ops.sql_in for list of functions and usage.