This module provides a set of basic matrix operations for matrices that are too big to fit in memory. We provide two storage formats for a matrix:
row_id | row_vec --------+------------------------- 1 | {9,6,5,8,5,6,6,3,10,8} 2 | {8,2,2,6,6,10,2,1,9,9} 3 | {3,9,9,9,8,6,3,9,5,6}
A 'row' column (called as row_id above) provides the row number of each row and a 'val' column (called as row_vec above) provides each row as an array. The row column should contain a series of integers from 1 to N with no duplicates, where N is the row dimensionality.
row_id | col_id | value --------+--------+------- 1 | 1 | 9 1 | 5 | 6 1 | 6 | 6 2 | 1 | 8 3 | 1 | 3 3 | 2 | 9 4 | 7 | 0 (6 rows)
For sparse matrices, the row and col columns together should not contain a duplicate entry and the val column should be of scalar (non-array) data type.
For comparison, the dense representation of this matrix is shown below. Note the dimensionality of the dense matrix is 4 x 7 since the max value of row and col is 4 and 7 respectively, leading to all zeros in the last row and last column.
row_id | row_vec --------+------------------------- 1 | {9,0,0,0,6,6,0} 2 | {8,0,0,0,0,0,0} 3 | {3,9,0,0,0,0,0} 4 | {0,0,0,0,0,0,0}
Given below are the supported matrix operations. The meaning of the arguments and other terms are common to all functions and provided at the end of the list as a glossary.
-- Convert to sparse representation matrix_sparsify( matrix_in, in_args, matrix_out, out_args) -- Convert to dense representation matrix_densify( matrix_in, in_args, matrix_out, out_args)
-- Matrix transposition matrix_trans( matrix_in, in_args, matrix_out, out_args) -- Matrix addition matrix_add( matrix_a, a_args, matrix_b, b_args, matrix_out, out_args) -- Matrix subtraction matrix_sub( matrix_a, a_args, matrix_b, b_args, matrix_out, out_args) -- Matrix multiplication matrix_mult( matrix_a, a_args, matrix_b, b_args, matrix_out, out_args) -- Element-wise matrix multiplication matrix_elem_mult( matrix_a, a_args, matrix_b, b_args, matrix_out, out_args) -- Multiply matrix with scalar. matrix_scalar_mult( matrix_in, in_args, scalar, matrix_out, out_args) -- Multiply matrix with vector. matrix_vec_mult( matrix_in, in_args, vector)
-- Extract row from matrix given row index matrix_extract_row( matrix_in, in_args, index) -- Extract column from matrix given column index matrix_extract_col( matrix_in, in_args, index)
-- Get max value along dim. Also returns corresponding index if fetch_index = True matrix_max( matrix_in, in_args, dim, matrix_out, fetch_index) -- Get min value along dim. Also returns corresponding index if fetch_index = True matrix_min( matrix_in, in_args, dim, matrix_out, fetch_index) -- Get sum value along dimension from matrix. matrix_sum( matrix_in, in_args, dim) -- Get mean value along dimension from matrix. matrix_mean( matrix_in, in_args, dim)
The table below provides a glossary of the terms used in the matrix operations.
TEXT. Name of the table containing the input matrix.
TEXT. A comma-delimited string containing multiple named arguments of the form "name=value". This argument is used as a container for multiple parameters related to a single matrix.
The following parameters are supported for this string argument:
row | (Default: 'row_num') Name of the column containing row index of the matrix. |
---|---|
col | (Default: 'col_num') Name of the column containing column index of the matrix. |
val | (Default: 'val') Name of the column containing the entries of the matrix. |
trans | (Default: False) Boolean flag to indicate if the matrix should be transposed before the operation. This is currently functional only for matrix_mult. |
For example, the string argument with default values will be 'row=row_num, col=col_num, val=val, trans=False'. Alternatively, the string argument can be set to NULL or be blank ('') if default values are to be used.
TEXT. Name of the table to store the result matrix.
TEXT. A comma-delimited string containing named arguments of the form "name=value". This is an optional parameter and the default value is set as follows:
The following parameters are supported for this string argument:
row | Name of the column containing row index of the matrix. |
---|---|
col | Name of the column containing column index of the matrix. |
val | Name of the column containing the entries of the matrix. |
INTEGER. An integer representing a row or column index of the matrix. Should be a number from 1 to N, where N is the maximum size of the dimension.
INTEGER. Should either be 1 or 2. This value indicates the dimension to operate along for the reduction/aggregation operations. The value of dim should be interpreted as the dimension to be flattened i.e. whose length reduces to 1 in the result.
For dim=1, a reduction function on an NxM matrix operates on successive elements in each column and returns a single vector with M elements (i.e. matrix with just 1 row and M columns).
For dim=2, a single vector is returned with N elements (i.e. matrix with just 1 column and N rows).
CREATE TABLE "mat_A" ( row_id integer, row_vec integer[] ); INSERT INTO "mat_A" (row_id, row_vec) VALUES (1, '{9,6,5,8,5,6,6,3,10,8}'); INSERT INTO "mat_A" (row_id, row_vec) VALUES (2, '{8,2,2,6,6,10,2,1,9,9}'); INSERT INTO "mat_A" (row_id, row_vec) VALUES (3, '{3,9,9,9,8,6,3,9,5,6}'); INSERT INTO "mat_A" (row_id, row_vec) VALUES (4, '{6,4,2,2,2,7,8,8,0,7}'); INSERT INTO "mat_A" (row_id, row_vec) VALUES (5, '{6,8,9,9,4,6,9,5,7,7}'); INSERT INTO "mat_A" (row_id, row_vec) VALUES (6, '{4,10,7,3,9,5,9,2,3,4}'); INSERT INTO "mat_A" (row_id, row_vec) VALUES (7, '{8,10,7,10,1,9,7,9,8,7}'); INSERT INTO "mat_A" (row_id, row_vec) VALUES (8, '{7,4,5,6,2,8,1,1,4,8}'); INSERT INTO "mat_A" (row_id, row_vec) VALUES (9, '{8,8,8,5,2,6,9,1,8,3}'); INSERT INTO "mat_A" (row_id, row_vec) VALUES (10, '{4,6,3,2,6,4,1,2,3,8}'); CREATE TABLE "mat_B" ( row_id integer, vector integer[] ); INSERT INTO "mat_B" (row_id, vector) VALUES (1, '{9,10,2,4,6,5,3,7,5,6}'); INSERT INTO "mat_B" (row_id, vector) VALUES (2, '{5,3,5,2,8,6,9,7,7,6}'); INSERT INTO "mat_B" (row_id, vector) VALUES (3, '{0,1,2,3,2,7,7,3,10,1}'); INSERT INTO "mat_B" (row_id, vector) VALUES (4, '{2,9,0,4,3,6,8,6,3,4}'); INSERT INTO "mat_B" (row_id, vector) VALUES (5, '{3,8,7,7,0,5,3,9,2,10}'); INSERT INTO "mat_B" (row_id, vector) VALUES (6, '{5,3,1,7,6,3,5,3,6,4}'); INSERT INTO "mat_B" (row_id, vector) VALUES (7, '{4,8,4,4,2,7,10,0,3,3}'); INSERT INTO "mat_B" (row_id, vector) VALUES (8, '{4,6,0,1,3,1,6,6,9,8}'); INSERT INTO "mat_B" (row_id, vector) VALUES (9, '{6,5,1,7,2,7,10,6,0,6}'); INSERT INTO "mat_B" (row_id, vector) VALUES (10, '{1,4,4,4,8,5,2,8,5,5}');
SELECT madlib.matrix_trans('"mat_B"', 'row=row_id, val=vector', 'mat_r'); SELECT * FROM mat_r ORDER BY row_id;
-- Note the result matrix has inherited 'vector' as the name of value column row_id | vector --------+------------------------- 1 | {9,5,0,2,3,5,4,4,6,1} 2 | {10,3,1,9,8,3,8,6,5,4} 3 | {2,5,2,0,7,1,4,0,1,4} 4 | {4,2,3,4,7,7,4,1,7,4} 5 | {6,8,2,3,0,6,2,3,2,8} 6 | {5,6,7,6,5,3,7,1,7,5} 7 | {3,9,7,8,3,5,10,6,10,2} 8 | {7,7,3,6,9,3,0,6,6,8} 9 | {5,7,10,3,2,6,3,9,0,5} 10 | {6,6,1,4,10,4,3,8,6,5} (10 rows)
SELECT madlib.matrix_add('"mat_A"', 'row=row_id, val=row_vec', '"mat_B"', 'row=row_id, val=vector', 'mat_r', 'val=vector'); SELECT * FROM mat_r ORDER BY row_id;
row_id | vector --------+------------------------------- 1 | {18,16,7,12,11,11,9,10,15,14} 2 | {13,5,7,8,14,16,11,8,16,15} 3 | {3,10,11,12,10,13,10,12,15,7} 4 | {8,13,2,6,5,13,16,14,3,11} 5 | {9,16,16,16,4,11,12,14,9,17} 6 | {9,13,8,10,15,8,14,5,9,8} 7 | {12,18,11,14,3,16,17,9,11,10} 8 | {11,10,5,7,5,9,7,7,13,16} 9 | {14,13,9,12,4,13,19,7,8,9} 10 | {5,10,7,6,14,9,3,10,8,13} (10 rows)
DROP TABLE IF EXISTS mat_r; SELECT madlib.matrix_mult('"mat_A"', 'row=row_id, val=row_vec', '"mat_B"', 'row=row_id, val=vector, trans=true', 'mat_r'); SELECT * FROM mat_r ORDER BY row_id;
row_id | row_vec --------+------------------------------------------- 1 | {380,373,251,283,341,303,302,309,323,281} 2 | {318,318,222,221,269,259,236,249,264,248} 3 | {382,366,216,300,397,276,277,270,313,338} 4 | {275,284,154,244,279,183,226,215,295,204} 5 | {381,392,258,319,394,298,342,302,360,300} 6 | {321,333,189,276,278,232,300,236,281,250} 7 | {443,411,282,365,456,318,360,338,406,330} 8 | {267,240,150,186,270,194,210,184,233,193} 9 | {322,328,234,264,291,245,317,253,291,219} 10 | {246,221,109,173,222,164,167,185,181,189} (10 rows)
SELECT madlib.matrix_extract_row('"mat_A"', 'row=row_id, val=row_vec', 2) as row, madlib.matrix_extract_col('"mat_A"', 'row=row_id, val=row_vec', 3) as col;
row | col ------------------------+----------------------- {8,2,2,6,6,10,2,1,9,9} | {5,2,9,2,9,7,7,5,8,3} (1 rows)
SELECT madlib.matrix_max('"mat_A"', 'row=row_id, val=row_vec', 2, 'mat_max_r', true), madlib.matrix_min('"mat_A"', 'row=row_id, val=row_vec', 2, 'mat_min_r', true); SELECT * from mat_max_r; SELECT * from mat_min_r;
index | max -----------------------+--------------------------- {8,5,1,6,2,1,1,5,6,9} | {10,10,9,8,9,10,10,8,9,8} (1 rows) index | min -----------------------+----------------------- {7,7,0,8,4,7,4,6,7,6} | {3,1,3,0,4,2,1,1,1,1} (1 rows)
SELECT madlib.matrix_elem_mult('"mat_A"', 'row=row_id, val=row_vec', '"mat_B"', 'val=vector', 'mat_r', 'val=vector'); SELECT * FROM mat_r ORDER BY row_id;
row_id | vector --------+--------------------------------- 1 | {81,60,10,32,30,30,18,21,50,48} 2 | {40,6,10,12,48,60,18,7,63,54} 3 | {0,9,18,27,16,42,21,27,50,6} 4 | {12,36,0,8,6,42,64,48,0,28} 5 | {18,64,63,63,0,30,27,45,14,70} 6 | {20,30,7,21,54,15,45,6,18,16} 7 | {32,80,28,40,2,63,70,0,24,21} 8 | {28,24,0,6,6,8,6,6,36,64} 9 | {48,40,8,35,4,42,90,6,0,18} 10 | {4,24,12,8,48,20,2,16,15,40}
SELECT madlib.matrix_sum('"mat_A"', 'row=row_id, val=row_vec', 2);
matrix_sum --------------------------------- {66,55,67,46,70,56,76,46,58,39} (1 rows)
DROP TABLE IF EXISTS mat_r; SELECT madlib.matrix_scalar_mult('"mat_A"', 'row=row_id, val=row_vec', 3, 'mat_r'); SELECT * FROM mat_r ORDER BY row_id;
row_id | row_vec --------+--------------------------------- 0 | {27,18,15,24,15,18,18,9,30,24} 1 | {24,6,6,18,18,30,6,3,27,27} 2 | {9,27,27,27,24,18,9,27,15,18} 3 | {18,12,6,6,6,21,24,24,0,21} 4 | {18,24,27,27,12,18,27,15,21,21} 5 | {12,30,21,9,27,15,27,6,9,12} 6 | {24,30,21,30,3,27,21,27,24,21} 7 | {21,12,15,18,6,24,3,3,12,24} 8 | {24,24,24,15,6,18,27,3,24,9} 9 | {12,18,9,6,18,12,3,6,9,24} (10 rows)
SELECT madlib.matrix_vec_mult('"mat_A"', 'row=row_id, val=row_vec', array[1,2,3,4,5,6,7,8,9,10]);
matrix_vec_mult ------------------------------------------- {365,325,358,270,377,278,411,243,287,217} (10 rows)
SELECT madlib.matrix_sparsify('"mat_B"', 'row=row_id, val=vector', '"mat_B_sparse"', 'col=col_id, val=val');
CREATE TABLE "mat_A_sparse" ( "rowNum" integer, col_num integer, entry integer ); INSERT INTO "mat_A_sparse" ("rowNum", col_num, entry) VALUES (1, 1, 9); INSERT INTO "mat_A_sparse" ("rowNum", col_num, entry) VALUES (1, 2, 6); INSERT INTO "mat_A_sparse" ("rowNum", col_num, entry) VALUES (1, 7, 3); INSERT INTO "mat_A_sparse" ("rowNum", col_num, entry) VALUES (1, 8, 10); INSERT INTO "mat_A_sparse" ("rowNum", col_num, entry) VALUES (1, 9, 8); INSERT INTO "mat_A_sparse" ("rowNum", col_num, entry) VALUES (2, 1, 8); INSERT INTO "mat_A_sparse" ("rowNum", col_num, entry) VALUES (2, 2, 2); INSERT INTO "mat_A_sparse" ("rowNum", col_num, entry) VALUES (2, 3, 6); INSERT INTO "mat_A_sparse" ("rowNum", col_num, entry) VALUES (3, 5, 6); INSERT INTO "mat_A_sparse" ("rowNum", col_num, entry) VALUES (3, 6, 3); INSERT INTO "mat_A_sparse" ("rowNum", col_num, entry) VALUES (7, 1, 7); INSERT INTO "mat_A_sparse" ("rowNum", col_num, entry) VALUES (8, 2, 8); INSERT INTO "mat_A_sparse" ("rowNum", col_num, entry) VALUES (8, 3, 5); INSERT INTO "mat_A_sparse" ("rowNum", col_num, entry) VALUES (9, 1, 6); INSERT INTO "mat_A_sparse" ("rowNum", col_num, entry) VALUES (9, 2, 3); INSERT INTO "mat_A_sparse" ("rowNum", col_num, entry) VALUES (10, 10, 0);
-- Note the double quotes for '"rowNum"' required per PostgreSQL rules SELECT madlib.matrix_trans('"mat_A_sparse"', 'row="rowNum", val=entry', 'matrix_r_sparse'); SELECT "rowNum", col_num, entry FROM matrix_r_sparse ORDER BY col_num;
rowNum | col_num | entry --------+---------+------- 1 | 1 | 9 2 | 1 | 6 7 | 1 | 3 8 | 1 | 10 9 | 1 | 8 1 | 2 | 8 2 | 2 | 2 3 | 2 | 6 5 | 3 | 6 6 | 3 | 3 1 | 7 | 7 2 | 8 | 8 3 | 8 | 5 1 | 9 | 6 2 | 9 | 3 10 | 10 | 0 (16 rows)
SELECT madlib.matrix_add('"mat_A_sparse"', 'row="rowNum", val=entry', '"mat_B_sparse"', 'row=row_id, col=col_id, val=val', 'matrix_r_sparse', 'col=col_out'); SELECT madlib.matrix_densify('matrix_r_sparse', 'row="rowNum", col=col_out, val=entry', 'matrix_r'); SELECT * FROM matrix_r ORDER BY "rowNum";
rowNum | entry --------+--------------------------- 1 | {18,16,2,4,6,5,6,17,13,6} 2 | {13,5,11,2,8,6,9,7,7,6} 3 | {0,1,2,3,8,10,7,3,10,1} 4 | {2,9,0,4,3,6,8,6,3,4} 5 | {3,8,7,7,0,5,3,9,2,10} 6 | {5,3,1,7,6,3,5,3,6,4} 7 | {11,8,4,4,2,7,10,0,3,3} 8 | {4,14,5,1,3,1,6,6,9,8} 9 | {12,8,1,7,2,7,10,6,0,6} 10 | {1,4,4,4,8,5,2,8,5,5} (10 rows)
SELECT madlib.matrix_mult('"mat_A_sparse"', 'row="rowNum", col=col_num, val=entry', '"mat_B_sparse"', 'row=row_id, col=col_id, val=val, trans=true', 'matrix_r'); SELECT * FROM matrix_r ORDER BY "rowNum";
rowNum | entry --------+------------------------------------------- 1 | {260,216,137,180,190,156,138,222,174,159} 2 | {104,76,14,34,82,52,72,44,64,40} 3 | {51,66,33,36,15,45,33,21,33,63} 4 | {0,0,0,0,0,0,0,0,0,0} 5 | {0,0,0,0,0,0,0,0,0,0} 6 | {0,0,0,0,0,0,0,0,0,0} 7 | {63,35,0,14,21,35,28,28,42,7} 8 | {90,49,18,72,99,29,84,48,45,52} 9 | {84,39,3,39,42,39,48,42,51,18} 10 | {0,0,0,0,0,0,0,0,0,0} (10 rows)
File array_ops.sql_in documenting the array operations Array Operations
File matrix_ops.sql_in for list of functions and usage.