The linalg module consists of useful utility functions for basic linear algebra operations. Several of these functions can be used while implementing new algorithms. These functions operate on vectors (1-D FLOAT8 array) and matrices (2-D FLOAT8 array). Note that other array types may need to be casted into FLOAT8[] before calling the functions.
Refer to the linalg.sql_in file for documentation on each of the utility functions.
norm1() | 1-norm of a vector, \(\|\vec{a}\|_1\). |
---|---|
norm2() | 2-norm of a vector, \(\|\vec{a}\|_2\). |
dist_norm1() | 1-norm of the difference between two vectors, \(\|\vec{a} - \vec{b}\|_1\). |
dist_norm2() | 2-norm of the difference between two vectors, \(\|\vec{a} - \vec{b}\|_2\). |
dist_pnorm() | Generic p-norm of the difference between two vectors, \(\|\vec{a} - \vec{b}\|_p, p > 0\). |
dist_inf_norm() | Infinity-norm of the difference between two vectors, \(\|\vec{a} - \vec{b}\|_\infty\). |
squared_dist_norm2() | Squared 2-norm of the difference between two vectors, \(\|\vec{a} - \vec{b}\|_2^2\). |
cosine_similarity() | Cosine score between two vectors, \(\frac{\vec{a} \cdot \vec{b}}{\|\vec{a}\|_2 \|\vec{b}\|_2}\). |
dist_angle() | Angle between two vectors in an Euclidean space, \(\cos^{-1}(\frac{\vec{a} \cdot \vec{b}}{\|\vec{a}\|_2 \|\vec{b}\|_2})\). |
dist_tanimoto() | Tanimoto distance between two vectors. [1] |
dist_jaccard() | Jaccard distance between two varchar vectors treated as sets. |
get_row() | Return the indexed row of a matrix (2-D array). |
get_col() | Return the indexed col of a matrix (2-D array). |
avg() | Compute the average of vectors. |
normalized_avg() | Compute the normalized average of vectors (unit vector in an Euclidean space). |
matrix_agg() | Combine vectors to a matrix. |
Vector Norms and Distances
CREATE TABLE two_vectors( id integer, a float8[], b float8[]);
INSERT INTO two_vectors VALUES (1, '{3,4}', '{4,5}'), (2, '{1,1,0,-4,5,3,4,106,14}', '{1,1,0,6,-3,1,2,92,2}');
SELECT id, madlib.norm1(a), madlib.norm2(a) FROM two_vectors;Result:
id | norm1 | norm2 ----+-------+------------------ 1 | 7 | 5 2 | 138 | 107.238052947636 (2 rows)
SELECT id, madlib.dist_norm1(a, b), madlib.dist_norm2(a, b), madlib.dist_pnorm(a, b, 5) AS norm5, madlib.dist_inf_norm(a, b), madlib.squared_dist_norm2(a, b) AS sq_dist_norm2, madlib.cosine_similarity(a, b), madlib.dist_angle(a, b), madlib.dist_tanimoto(a, b), madlib.dist_jaccard(a::text[], b::text[]) FROM two_vectors;Result:
id | dist_norm1 | dist_norm2 | norm5 | dist_inf_norm | sq_dist_norm2 | cosine_similarity | dist_angle | dist_tanimoto | dist_jaccard ----+------------+------------------+------------------+---------------+---------------+-------------------+--------------------+--------------------+------------------- 1 | 2 | 1.4142135623731 | 1.14869835499704 | 1 | 2 | 0.999512076087079 | 0.0312398334302684 | 0.0588235294117647 | 0.666666666666667 2 | 48 | 22.6274169979695 | 15.585086360695 | 14 | 512 | 0.985403348449008 | 0.17106899659286 | 0.0498733684005455 | 0.833333333333333 (2 rows)
Matrix Functions
CREATE TABLE matrix( id integer, m float8[]);
INSERT INTO matrix VALUES (1, '{{4,5},{3,5},{9,0}}');
SELECT madlib.get_row(m, 1) AS row_1, madlib.get_row(m, 2) AS row_2, madlib.get_row(m, 3) AS row_3, madlib.get_col(m, 1) AS col_1, madlib.get_col(m, 2) AS col_2 FROM matrix;Result:
row_1 | row_2 | row_3 | col_1 | col_2 -------+-------+-------+---------+--------- {4,5} | {3,5} | {9,0} | {4,3,9} | {5,5,0} (1 row)
Aggregate Functions
CREATE TABLE vector( id integer, v float8[]);
INSERT INTO vector VALUES (1, '{4,3}'), (2, '{8,6}'), (3, '{12,9}');
SELECT madlib.avg(v), madlib.normalized_avg(v), madlib.matrix_agg(v) FROM vector;Result:
avg | normalized_avg | matrix_agg -------+----------------+---------------------- {8,6} | {0.8,0.6} | {{4,3},{8,6},{12,9}} (1 row)
[1] http://en.wikipedia.org/wiki/Jaccard_index#Tanimoto_similarity_and_distance