MADlib
1.0
User Documentation
SQL functions for logistic regression.
Functions
void | logregr_train (varchar tbl_source, varchar tbl_output, varchar dep_col, varchar ind_col, varchar grouping_col, integer max_iter, varchar optimizer, float8 tolerance, boolean verbose) |
Compute logistic-regression coefficients and diagnostic statistics.
float8 | logistic (float8 x) |
Evaluate the usual logistic function in an under-/overflow-safe way.
Definition in file logistic.sql_in.
float8 logistic (float8 x)
x |
Evaluating the logistic function \( \frac{1}{1 + \exp(-x)} \) directly can lead to under- or overflows. This function performs the evaluation in a safe manner, making use of the following observations:
For the outcome of \( \exp(x) \) to lie between the minimum positive double-precision number (i.e., \( 2^{-1074} \)) and the maximum positive double-precision number (i.e., \( (1 + (1 - 2^{-52})) \cdot 2^{1023} \)), \( x \) has to lie between the natural logarithms of these numbers, so roughly between -744 and 709. However, \( 1 + \exp(x) \) will just evaluate to 1 if \( \exp(x) \) is less than the machine epsilon (i.e., \( 2^{-52} \)) or, equivalently, if \( x \) is less than the natural logarithm of that; this is the case whenever \( x \) is less than -37. Note that taking the reciprocal of the largest double-precision number will not cause an underflow. Hence, no further checks are necessary.
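The observations above can be turned into a small branch-based evaluation. The following Python sketch is illustrative only and is not taken from logistic.sql_in; the name `safe_logistic` and the exact cut-offs 37 and 709 are assumptions chosen to match the bounds discussed above.

```python
import math


def safe_logistic(x: float) -> float:
    """Evaluate 1 / (1 + exp(-x)) without overflow errors (illustrative sketch)."""
    if x > 37:
        # exp(-x) is below the machine epsilon, so 1 + exp(-x) rounds to 1.
        return 1.0
    if x < -709:
        # exp(-x) would exceed the largest double; the quotient is treated as 0.
        return 0.0
    # In this range exp(-x) is representable and the division is safe.
    return 1.0 / (1.0 + math.exp(-x))
```

Without the second branch, `math.exp(-x)` raises `OverflowError` in Python for arguments such as `x = -1000`; the first branch is only a shortcut, since the direct formula already rounds to 1 there.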
Definition at line 602 of file logistic.sql_in.
void logregr_train (varchar tbl_source, varchar tbl_output, varchar dep_col, varchar ind_col, varchar grouping_col, integer max_iter, varchar optimizer, float8 tolerance, boolean verbose)
To include an intercept in the model, set one coordinate in the independentVariables
array to 1.
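The effect of the constant coordinate can be sketched outside the database. In the Python snippet below, the helper names `with_intercept` and `linear_predictor` and the sample numbers are hypothetical; it only illustrates that the coefficient paired with the constant 1 acts as the intercept of the linear predictor.

```python
def with_intercept(features):
    """Return a copy of the feature list with a constant 1.0 prepended."""
    return [1.0] + [float(v) for v in features]


def linear_predictor(coef, x):
    """Dot product of the coefficient vector and the independent variables."""
    if len(coef) != len(x):
        raise ValueError("coef and x must have the same length")
    return sum(c * v for c, v in zip(coef, x))


# Hypothetical values: coef[0] multiplies the constant 1 and is the intercept.
row = with_intercept([2.0, 3.0])
eta = linear_predictor([0.5, 1.0, -1.0], row)
```

Placing the constant first is only a convention for this sketch; the text above requires just that one coordinate be 1.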
tbl_source | Name of the source relation containing the training data |
tbl_output | Name of the output relation to store the model results. Columns of the output relation are as follows:
|
dep_col | Name of the dependent column (of type BOOLEAN) |
ind_col | Name of the independent column (of type DOUBLE PRECISION[]) |
grouping_col | Comma delimited list of column names to group-by |
max_iter | The maximum number of iterations |
optimizer | The optimizer to use (either 'irls'/'newton' for iteratively reweighted least squares or 'cg' for conjugate gradient) |
tolerance | The difference between log-likelihood values in successive iterations that should indicate convergence. This value should be non-negative. A value of zero disables the convergence criterion, so execution stops only after max_iter iterations. |
verbose | If true, any error or warning message will be printed to the console (irrespective of the 'client_min_messages' set by server). If false, no error/warning message is printed to console. |
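The interaction between max_iter and tolerance described in the table above can be sketched as a generic stopping rule. This Python snippet is an assumption-laden illustration, not MADlib's implementation: `step` is a hypothetical callable that performs one optimizer iteration and returns the current log-likelihood, and the strict `<` comparison is a guess at the exact test.

```python
def run_until_converged(step, max_iter, tolerance):
    """Iterate until successive log-likelihoods differ by less than tolerance.

    Returns (iterations_run, last_log_likelihood). A tolerance of 0 disables
    the convergence test, so exactly max_iter iterations are run.
    Assumes max_iter >= 1.
    """
    previous = None
    for iteration in range(1, max_iter + 1):
        current = step()  # hypothetical: one optimizer iteration
        converged = (
            tolerance > 0
            and previous is not None
            and abs(current - previous) < tolerance
        )
        if converged:
            return iteration, current
        previous = current
    return max_iter, previous
```

For example, with log-likelihoods -10, -5, -4, -3.99 and a tolerance of 0.05, the loop stops at the fourth iteration; with a tolerance of 0 it always runs max_iter iterations.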
SELECT logregr_train('sourceName', 'outName', 'dependentVariable', 'independentVariables');
SELECT * FROM outName;
SELECT coef FROM outName;
SELECT coef, log_likelihood, p_values FROM outName;
Definition at line 499 of file logistic.sql_in.