2.1.0
User Documentation for Apache MADlib

This module provides a set of basic array operations implemented in C and SQL. It is a support module for several machine learning algorithms that require fast array operations.

Implementation Notes

All functions (except normalize() and array_filter()) described in this module work with 2-D arrays.

These functions support several numeric types:

Additionally, array_unnest_2d_to_1d() supports other data types such as TEXT or VARCHAR.

Several of the function require NO NULL VALUES, while others omit NULLs and return results. See details in description of individual functions.

Array Operations
array_add()

Adds two arrays. It requires that all the values are NON-NULL. Return type is the same as the input type.

sum()

Aggregate, sums vector element-wisely. It requires that all the values are NON-NULL. Return type is the same as the input type.

array_sub()

Subtracts two arrays. It requires that all the values are NON-NULL. Return type is the same as the input type.

array_mult()

Element-wise product of two arrays. It requires that all the values are NON-NULL. Return type is the same as the input type.

array_div()

Element-wise division of two arrays. It requires that all the values are NON-NULL. Return type is the same as the input type.

array_dot()

Dot-product of two arrays. It requires that all the values are NON-NULL. Return type is the same as the input type.

array_contains()

Checks whether one array contains the other. This function returns TRUE if each non-zero element in the right array equals to the element with the same index in the left array.

array_max()

This function finds the maximum value in the array. NULLs are ignored. Return type is the same as the input type.

array_max_index()

This function finds the maximum value and corresponding index in the array. NULLs are ignored. Return type is array in format [max, index], and its element type is the same as the input type.

array_min()

This function finds the minimum value in the array. NULLs are ignored. Return type is the same as the input type.

array_min_index()

This function finds the minimum value and corresponding index in the array. NULLs are ignored. Return type is array in format [min, index], and its element type is the same as the input type.

array_sum()

This function finds the sum of the values in the array. NULLs are ignored. Return type is the same as the input type.

array_sum_big()

This function finds the sum of the values in the array. NULLs are ignored. Return type is always FLOAT8 regardless of input. This function is meant to replace array_sum() in cases when a sum may overflow the element type.

array_abs_sum()

This function finds the sum of abs of the values in the array. NULLs are ignored. Return type is the same as the input type.

array_abs()

This function takes an array as the input and finds abs of each element in the array, returning the resulting array. It requires that all the values are NON-NULL.

array_mean()

This function finds the mean of the values in the array. NULLs are ignored.

array_stddev()

This function finds the standard deviation of the values in the array. NULLs are ignored.

array_of_float()

This function creates an array of set size (the argument value) of FLOAT8, initializing the values to 0.0.

array_of_bigint()

This function creates an array of set size (the argument value) of BIGINT, initializing the values to 0.

array_fill()

This functions set every value in the array to some desired value (provided as the argument).

array_filter()

This function takes an array as the input and keep only elements that satisfy the operator on specified scalar. It requires that the array is 1-D and all the values are NON-NULL. Return type is the same as the input type. By default, this function removes all zeros.

array_scalar_mult()

This function takes an array as the input and executes element-wise multiplication by the scalar provided as the second argument, returning the resulting array. It requires that all the values are NON-NULL. Return type is the same as the input type.

array_scalar_add()

This function takes an array as the input and executes element-wise addition of the scalar provided as the second argument, returning the resulting array. It requires that all the values are NON-NULL. Return type is the same as the input type.

array_sqrt()

This function takes an array as the input and finds square root of each element in the array, returning the resulting array. It requires that all the values are NON-NULL.

array_pow()

This function takes an array and a float8 as the input and finds power of each element in the array, returning the resulting array. It requires that all the values are NON-NULL.

array_square()

This function takes an array as the input and finds square of each element in the array, returning the resulting array. It requires that all the values are NON-NULL.

normalize()

This function normalizes an array as sum of squares to be 1. It requires that the array is 1-D and all the values are NON-NULL.

array_unnest_2d_to_1d()

This function takes a 2-D array as the input and unnests it by one level. It returns a set of 1-D arrays that correspond to rows of the input array as well as an ID column with values corresponding to row positions occupied by those 1-D arrays within the 2-D array.

Examples
  1. Create a database table with two integer array columns and add some data.
    CREATE TABLE array_tbl ( id integer,
                             array1 integer[],
                             array2 integer[]
                           );
    INSERT INTO array_tbl VALUES
                           ( 1, '{1,2,3,4,5,6,7,8,9}', '{9,8,7,6,5,4,3,2,1}' ),
                           ( 2, '{1,1,0,1,1,2,3,99,8}','{0,0,0,-5,4,1,1,7,6}' );
    
  2. Find the minimum, maximum, mean, and standard deviation of the array1 column.
    SELECT id, madlib.array_min(array1), madlib.array_max(array1),
               madlib.array_min_index(array1), madlib.array_max_index(array1),
               madlib.array_mean(array1), madlib.array_stddev(array1)
    FROM array_tbl;
    
    Result:
    id | array_min | array_max | array_min_index | array_max_index |    array_mean    |   array_stddev
    ----+-----------+-----------+---------------+---------------+------------------+------------------
      1 |         1 |         9 | {1,1}         | {9,9}         |                5 | 2.73861278752583
      2 |         0 |        99 | {0,3}         | {99,8}        | 12.8888888888889 | 32.3784050118457(2 rows)
    
  3. Perform array addition and subtraction.
    SELECT id, madlib.array_add(array1,array2),
              madlib.array_sub(array1,array2)
    FROM array_tbl;
    
    Result:
     id |          array_add           |        array_sub
     ---+------------------------------+-------------------------
      2 | {1,1,0,-4,5,3,4,106,14}      | {1,1,0,6,-3,1,2,92,2}
      1 | {10,10,10,10,10,10,10,10,10} | {-8,-6,-4,-2,0,2,4,6,8}
    (2 rows)
    
  4. Perform element-wise array multiplication and division. The row with id=2 is excluded because the divisor array contains zero, which would cause a divide-by-zero error.
    SELECT id, madlib.array_mult(array1,array2),
               madlib.array_div(array1,array2)
    FROM array_tbl
    WHERE 0 != ALL(array2);
    Result:
     id |         array_mult         |      array_div
     ---+----------------------------+---------------------
      1 | {9,16,21,24,25,24,21,16,9} | {0,0,0,0,1,1,2,4,9}
    (1 row)
    
  5. Calculate the dot product of the arrays.
    SELECT id, madlib.array_dot(array1, array2)
    FROM array_tbl;
    
    Result:
     id | array_dot
     ---+----------
      2 |       745
      1 |       165
    (2 rows)
    
  6. Multiply an array by a scalar 3.
    SELECT id, madlib.array_scalar_mult(array1,3)
    FROM array_tbl;
    
    Result:
     id |     array_scalar_mult
     ---+--------------------------
      1 | {3,6,9,12,15,18,21,24,27}
      2 | {3,3,0,3,3,6,9,297,24}
    (2 rows)
    
  7. Construct a nine-element array with each element set to the value 1.3.
    SELECT madlib.array_fill(madlib.array_of_float(9), 1.3::float);
    
    Result:
                  array_fill
     --------------------------------------
     {1.3,1.3,1.3,1.3,1.3,1.3,1.3,1.3,1.3}
    (1 row)
    
  8. Unnest a column of 2-D arrays into sets of 1-D arrays.
    SELECT id, (madlib.array_unnest_2d_to_1d(val)).*
    FROM (
      SELECT 1::INT AS id, ARRAY[[1.3,2.0,3.2],[10.3,20.0,32.2]]::FLOAT8[][] AS val
      UNION ALL
      SELECT 2, ARRAY[[pi(),pi()/2],[2*pi(),pi()],[pi()/4,4*pi()]]::FLOAT8[][]
    ) t
    ORDER BY 1,2;
    
    Result:
     id | unnest_row_id |            unnest_result
    ----+---------------+--------------------------------------
      1 |             1 | {1.3,2,3.2}
      1 |             2 | {10.3,20,32.2}
      2 |             1 | {3.14159265358979,1.5707963267949}
      2 |             2 | {6.28318530717959,3.14159265358979}
      2 |             3 | {0.785398163397448,12.5663706143592}
    (5 rows)
    
    If the function is called without the .* notation then it will return a composite record type with two attributes: the row ID and corresponding unnested array result.

Related Topics

File array_ops.sql_in for list of functions and usage.