# Arrays and Matrices

These modules provide basic mathematical operations on arrays and matrices.

For a distributed system, a matrix cannot simply be represented as a 2D array of numbers in memory. **We provide two forms of distributed representation of a matrix**:

- Dense: The matrix is represented as a distributed collection of 1-D arrays. An example 3x10 matrix would be stored as the following table:

  | row_id | row_vec |
  |--------|---------|
  | 1 | {9,6,5,8,5,6,6,3,10,8} |
  | 2 | {8,2,2,6,6,10,2,1,9,9} |
  | 3 | {3,9,9,9,8,6,3,9,5,6} |

- Sparse: The matrix is represented using the row and column indices for each non-zero entry of the matrix. Example:

  | row_id | col_id | value |
  |--------|--------|-------|
  | 1 | 1 | 9 |
  | 1 | 5 | 6 |
  | 1 | 6 | 6 |
  | 2 | 1 | 8 |
  | 3 | 1 | 3 |
  | 3 | 2 | 9 |
  | 4 | 7 | 0 |

All matrix operations work with either form of representation.
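To make the two layouts concrete, here is a minimal Python sketch that holds the sparse example above as a list of `(row_id, col_id, value)` triples and converts it to a dense `{row_id: row_vec}` mapping and back. The helper names `sparse_to_dense` and `dense_to_sparse` are hypothetical and are not MADlib functions. The sketch also assumes that the matrix dimensions are the largest `row_id` and `col_id` present, so the explicit zero at (4, 7) is what makes the result 4x7.

```python
# Sparse form: (row_id, col_id, value) triples with 1-based indices.
sparse = [
    (1, 1, 9), (1, 5, 6), (1, 6, 6),
    (2, 1, 8),
    (3, 1, 3), (3, 2, 9),
    (4, 7, 0),
]


def sparse_to_dense(triples):
    """Return {row_id: row_vec}, filling unspecified entries with 0.

    Assumption: the dimensions are the largest row_id and col_id present.
    """
    n_rows = max(r for r, _, _ in triples)
    n_cols = max(c for _, c, _ in triples)
    dense = {r: [0] * n_cols for r in range(1, n_rows + 1)}
    for r, c, v in triples:
        dense[r][c - 1] = v  # col_id is 1-based, list index is 0-based
    return dense


def dense_to_sparse(dense):
    """Return triples for the non-zero entries of {row_id: row_vec}."""
    return [
        (r, c, v)
        for r, vec in sorted(dense.items())
        for c, v in enumerate(vec, start=1)
        if v != 0
    ]


dense = sparse_to_dense(sparse)
round_trip = dense_to_sparse(dense)
```

Converting back keeps only the non-zero entries, so the shape of the matrix has to be carried separately in this sketch.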

In many cases, a matrix function can be **decomposed into vector operations applied independently to each row of a matrix (or to corresponding rows of two matrices)**. We also provide access to these internal vector operations (Array Operations) for greater flexibility. Matrix operations like *matrix_add* use the corresponding vector operation (*array_add*) and add validation and formatting on top of it. Other functions like *matrix_mult* are more complex and combine such vector operations with other SQL operations.
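The row-wise decomposition can be sketched in plain Python. The two functions below borrow the names `array_add` and `matrix_add` for readability, but they are illustrative stand-ins, not the MADlib SQL functions, and the specific checks are an assumption about what the validation step might cover.

```python
def array_add(a, b):
    """Element-wise sum of two equal-length 1-D arrays."""
    return [x + y for x, y in zip(a, b)]


def matrix_add(m_a, m_b):
    """Add two dense matrices stored as {row_id: row_vec}."""
    # Validation layer: both matrices need the same rows and row lengths.
    if m_a.keys() != m_b.keys():
        raise ValueError("matrices must have the same row ids")
    for row_id in m_a:
        if len(m_a[row_id]) != len(m_b[row_id]):
            raise ValueError(f"row {row_id}: column counts differ")
    # Each pair of corresponding rows is handled independently.
    return {row_id: array_add(m_a[row_id], m_b[row_id]) for row_id in m_a}


a = {1: [9, 6, 5], 2: [8, 2, 2]}
b = {1: [1, 1, 1], 2: [2, 2, 2]}
result = matrix_add(a, b)
```

Because no row depends on any other row, the per-row calls could run in any order or on different nodes, which is what makes this decomposition attractive for a distributed table.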

**It's important to note** that these array functions are only available for the dense format representation of the matrix. In general, the scope of a single array function invocation is limited to one array (1-dimensional or 2-dimensional) that fits in memory. When such a function is executed on a table of arrays, it is called multiple times: once for each array (or pair of arrays). In contrast, the scope of a single matrix function invocation is the complete matrix stored as a distributed table.
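The difference in scope can be illustrated with a small Python sketch. Both function names are hypothetical: `sum_one_array` stands in for an array-scoped function that only ever sees one array, and `transpose_matrix` stands in for a matrix-scoped function that needs the whole table. A plain dict plays the role of the distributed table here.

```python
# A dense matrix as a table of (row_id -> row_vec) entries.
table = {
    1: [9, 6, 5],
    2: [8, 2, 2],
    3: [3, 9, 9],
}

calls = []  # records one entry per array-scoped invocation


def sum_one_array(vec):
    """Array-scoped: sees only the single array it is handed."""
    calls.append(len(vec))
    return sum(vec)


def transpose_matrix(matrix):
    """Matrix-scoped: every output row draws on all input rows."""
    row_ids = sorted(matrix)
    n_cols = len(matrix[row_ids[0]])
    return {c + 1: [matrix[r][c] for r in row_ids] for c in range(n_cols)}


# The array function runs once per row of the table.
row_sums = {row_id: sum_one_array(vec) for row_id, vec in table.items()}

# The matrix function runs once over the complete matrix.
transposed = transpose_matrix(table)
```

Here `sum_one_array` is invoked three times, once per row, while `transpose_matrix` is invoked once and cannot produce any output row without reading every input row.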

## Modules

- Array Operations: Provides fast array operations supporting other MADlib modules.
- Matrix Operations: Provides fast matrix operations supporting other MADlib modules.
- Matrix Factorization: Linear algebra methods that factorize a matrix into a product of matrices.
- Norms and Distance Functions: Provides utility functions for basic linear algebra operations.
- Sparse Vectors: Implements a sparse vector data type that provides compressed storage of vectors that may have many duplicate elements.