SQL functions for statistical hypothesis tests. More...
Functions | |
float8 [] | t_test_one_transition (float8[] state, float8 value) |
float8 [] | t_test_merge_states (float8[] state1, float8[] state2) |
t_test_result | t_test_one_final (float8[] state) |
f_test_result | f_test_final (float8[] state) |
aggregate float8 [] | t_test_one (float8 value) |
Perform one-sample or dependent paired Student t-test. More... | |
float8 [] | t_test_two_transition (float8[] state, boolean first, float8 value) |
t_test_result | t_test_two_pooled_final (float8[] state) |
aggregate float8 [] | t_test_two_pooled (boolean first, float8 value) |
Perform two-sample pooled (i.e., equal variances) Student t-test. More... | |
t_test_result | t_test_two_unpooled_final (float8[] state) |
aggregate float8 [] | t_test_two_unpooled (boolean first, float8 value) |
Perform unpooled (i.e., unequal variances) t-test (also known as Welch's t-test) More... | |
aggregate float8 [] | f_test (boolean first, float8 value) |
Perform Fisher F-test. More... | |
float8 [] | chi2_gof_test_transition (float8[] state, bigint observed, float8 expected, bigint df) |
float8 [] | chi2_gof_test_transition (float8[] state, bigint observed, float8 expected) |
float8 [] | chi2_gof_test_transition (float8[] state, bigint observed) |
float8 [] | chi2_gof_test_merge_states (float8[] state1, float8[] state2) |
chi2_test_result | chi2_gof_test_final (float8[] state) |
aggregate float8 [] | chi2_gof_test (bigint observed, float8 expected=1, bigint df=0) |
Perform Pearson's chi-squared goodness-of-fit test. More... | |
aggregate float8 [] | chi2_gof_test (bigint observed, float8 expected) |
aggregate float8 [] | chi2_gof_test (bigint observed) |
float8 [] | ks_test_transition (float8[] state, boolean first, float8 value, bigint numFirst, bigint numSecond) |
ks_test_result | ks_test_final (float8[] state) |
float8 [] | mw_test_transition (float8[] state, boolean first, float8 value) |
Perform Kolmogorov-Smirnov test. More... | |
mw_test_result | mw_test_final (float8[] state) |
float8 [] | wsr_test_transition (float8[] state, float8 value, float8 precision) |
Perform Mann-Whitney test. More... | |
float8 [] | wsr_test_transition (float8[] state, float8 value) |
wsr_test_result | wsr_test_final (float8[] state) |
float8 [] | one_way_anova_transition (float8[] state, integer group, float8 value) |
Perform Wilcoxon-Signed-Rank test. More... | |
float8 [] | one_way_anova_merge_states (float8[] state1, float8[] state2) |
one_way_anova_result | one_way_anova_final (float8[] state) |
aggregate float8 [] | one_way_anova (integer group, float8 value) |
Perform one-way analysis of variance. More... | |
aggregate float8 [] chi2_gof_test | ( | bigint | observed, |
float8 | expected = 1 , |
||
bigint | df = 0 |
||
) |
Let be a realization of a (vector) random variable that follows the multinomial distribution with parameters and . Test the null hypothesis .
observed | Number of observations of the current event/row |
expected | Expected number of observations of current event/row. This number is not required to be normalized. That is, will be taken as expected divided by sum(expected) . Hence, if this parameter is not specified, chi2_test() will by default use , i.e., test that is a discrete uniform distribution. |
df | Degrees of freedom. This is the number of events reduced by the degree of freedom lost by using the observed numbers for defining the expected number of observations. If this parameter is 0, the degree of freedom is taken as . |
statistic FLOAT8
- Statistic The corresponding random variable is approximately chi-squared distributed with
df
degrees of freedom.df BIGINT
- Degrees of freedomp_value FLOAT8
- Approximate p-value, i.e., . Computed as (1.0 - chi_squared_cdf(statistic))
.phi FLOAT8
- Phi coefficient, i.e., contingency_coef FLOAT8
- Contingency coefficient, i.e., SELECT (chi2_gof_test(observed, 1, NULL)).* FROM source
var1
, var2
, observed
. SELECT (chi2_gof_test(observed, expected, deg_freedom)).* FROM ( SELECT observed, sum(observed) OVER (PARTITION BY var1)::DOUBLE PRECISION * sum(observed) OVER (PARTITION BY var2) AS expected FROM source ) p, ( SELECT (count(DISTINCT var1) - 1) * (count(DISTINCT var2) - 1) AS deg_freedom FROM source ) q;
aggregate float8 [] chi2_gof_test | ( | bigint | observed, |
float8 | expected | ||
) |
aggregate float8 [] chi2_gof_test | ( | bigint | observed | ) |
chi2_test_result chi2_gof_test_final | ( | float8 [] | state | ) |
float8 [] chi2_gof_test_merge_states | ( | float8 [] | state1, |
float8 [] | state2 | ||
) |
float8 [] chi2_gof_test_transition | ( | float8 [] | state, |
bigint | observed, | ||
float8 | expected, | ||
bigint | df | ||
) |
float8 [] chi2_gof_test_transition | ( | float8 [] | state, |
bigint | observed, | ||
float8 | expected | ||
) |
float8 [] chi2_gof_test_transition | ( | float8 [] | state, |
bigint | observed | ||
) |
aggregate float8 [] f_test | ( | boolean | first, |
float8 | value | ||
) |
Given realizations and of i.i.d. random variables and with unknown parameters and , test the null hypotheses and .
first | Indicator whether value is from first sample (if TRUE ) or from second sample (if FALSE ) |
value | Value of random variate or |
statistic FLOAT8
- Statistic The corresponding random variable is F-distributed with degrees of freedom in the numerator and degrees of freedom in the denominator.
df1 BIGINT
- Degrees of freedom in the numerator df2 BIGINT
- Degrees of freedom in the denominator p_value_one_sided FLOAT8
- Lower bound on one-sided p-value. In detail, the result is , which is a lower bound on . Computed as (1.0 - fisher_f_cdf(statistic))
.p_value_two_sided FLOAT8
- Two-sided p-value, i.e., where . Computed as (min(p_value_one_sided, 1. - p_value_one_sided))
.SELECT (f_test(first, value)).* FROM source
f_test_result f_test_final | ( | float8 [] | state | ) |
ks_test_result ks_test_final | ( | float8 [] | state | ) |
float8 [] ks_test_transition | ( | float8 [] | state, |
boolean | first, | ||
float8 | value, | ||
bigint | numFirst, | ||
bigint | numSecond | ||
) |
mw_test_result mw_test_final | ( | float8 [] | state | ) |
float8 [] mw_test_transition | ( | float8 [] | state, |
boolean | first, | ||
float8 | value | ||
) |
Given realizations and of i.i.d. random variables and i.i.d. , respectively, test the null hypothesis that the underlying distributions function are identical, i.e., .
first | Determines whether the value belongs to the first (if TRUE ) or the second sample (if FALSE ) |
value | Value of random variate or |
m | Size of the first sample. See usage instructions below. |
n | Size of the second sample. See usage instructions below. |
statistic FLOAT8
- Kolmogorov–Smirnov statistic where and (defined likewise) are the empirical distribution functions.
k_statistic FLOAT8
- Kolmogorov statistic where and is the statistic. Then is approximately Kolmogorov distributed.p_value FLOAT8
- Approximate p-value, i.e., an approximate value for . Computed as (1.0 - kolmogorov_cdf(k_statistic))
.SELECT (ks_test(first, value, (SELECT count(value) FROM source WHERE first), (SELECT count(value) FROM source WHERE NOT first) ORDER BY value )).* FROM source
ORDER BY value
) and will raise an exception if values are not ordered. aggregate float8 [] one_way_anova | ( | integer | group, |
float8 | value | ||
) |
Given realizations of i.i.d. random variables with unknown parameters and , test the null hypotheses .
group | Group which value is from. Note that group can assume arbitary value not limited to a continguous range of integers. |
value | Value of random variate |
sum_squares_between DOUBLE PRECISION
- sum of squares between the group means, i.e., sum_squares_within DOUBLE PRECISION
- sum of squares within the groups, i.e., df_between BIGINT
- degree of freedom for between-group variation df_within BIGINT
- degree of freedom for within-group variation mean_squares_between DOUBLE PRECISION
- mean square between groups, i.e., mean_squares_within DOUBLE PRECISION
- mean square within groups, i.e., statistic DOUBLE PRECISION
- Statistic computed as This statistic is Fisher F-distributed with degrees of freedom in the numerator and degrees of freedom in the denominator.
p_value DOUBLE PRECISION
- p-value, i.e., .SELECT (one_way_anova(group, value)).* FROM source
one_way_anova_result one_way_anova_final | ( | float8 [] | state | ) |
float8 [] one_way_anova_merge_states | ( | float8 [] | state1, |
float8 [] | state2 | ||
) |
float8 [] one_way_anova_transition | ( | float8 [] | state, |
integer | group, | ||
float8 | value | ||
) |
Given realizations of i.i.d. random variables with unknown mean , test the null hypotheses and .
value | Value of random variate or . Values of 0 are ignored (i.e., they do not count towards ). |
precision | The precision with which value is known. The precision determines the handling of ties. The current value is regarded a tie with the previous value if . If precision is negative, then it will be treated as value * 2^(-52) . (Note that is the machine epsilon for type DOUBLE PRECISION .) |
statistic FLOAT8
- statistic computed as follows. Let and be the signed rank sums where The Wilcoxon signed-rank statistic is .
rank_sum_pos FLOAT8
- rank sum of all positive values, i.e., rank_sum_neg FLOAT8
- rank sum of all negative values, i.e., num BIGINT
- number of non-zero valuesz_statistic FLOAT8
- z-statistic where is the number of values with absolute value equal to . The corresponding random variable is approximately standard normally distributed.
p_value_one_sided FLOAT8
- One-sided p-value i.e., . Computed as (1.0 - normal_cdf(z_statistic))
.p_value_two_sided FLOAT8
- Two-sided p-value, i.e., . Computed as (2 * normal_cdf(-abs(z_statistic)))
.SELECT (wsr_test(value - mu_0 ORDER BY abs(value))).* FROM source
SELECT (wsr_test(first - second - mu_0 ORDER BY abs(first - second))).* FROM sourceIf correctly determining ties is important (e.g., you may want to do so when comparing to software products that take
first
, second
, and mu_0
as individual parameters), supply the precision parameter. This can be done as follows: SELECT (wsr_test( first - second - mu_0, 3 * 2^(-52) * greatest(first, second, mu_0) ORDER BY abs(first - second) )).* FROM sourceHere is the machine epsilon, which we scale to the magnitude of the input data and multiply with 3 because we have a sum with three terms.
ORDER BY abs(value
)) and will raise an exception if the absolute values are not ordered. float8 [] t_test_merge_states | ( | float8 [] | state1, |
float8 [] | state2 | ||
) |
aggregate float8 [] t_test_one | ( | float8 | value | ) |
Given realizations of i.i.d. random variables with unknown parameters and , test the null hypotheses and .
value | Value of random variate |
statistic FLOAT8
- Statistic The corresponding random variable is Student-t distributed with degrees of freedom.
df FLOAT8
- Degrees of freedom p_value_one_sided FLOAT8
- Lower bound on one-sided p-value. In detail, the result is , which is a lower bound on . Computed as (1.0 - students_t_cdf(statistic))
.p_value_two_sided FLOAT8
- Two-sided p-value, i.e., . Computed as (2 * students_t_cdf(-abs(statistic)))
.SELECT (t_test_one(value - mu_0)).* FROM source
SELECT (t_test_one(first - second - mu_0)).* FROM source
t_test_result t_test_one_final | ( | float8 [] | state | ) |
float8 [] t_test_one_transition | ( | float8 [] | state, |
float8 | value | ||
) |
aggregate float8 [] t_test_two_pooled | ( | boolean | first, |
float8 | value | ||
) |
Given realizations and of i.i.d. random variables and with unknown parameters and , test the null hypotheses and .
first | Indicator whether value is from first sample (if TRUE ) or from second sample (if FALSE ) |
value | Value of random variate or |
statistic FLOAT8
- Statistic where
is the pooled variance. The corresponding random variable is Student-t distributed with degrees of freedom.
df FLOAT8
- Degrees of freedom p_value_one_sided FLOAT8
- Lower bound on one-sided p-value. In detail, the result is , which is a lower bound on . Computed as (1.0 - students_t_cdf(statistic))
.p_value_two_sided FLOAT8
- Two-sided p-value, i.e., . Computed as (2 * students_t_cdf(-abs(statistic)))
.SELECT (t_test_pooled(first, value)).* FROM source
t_test_result t_test_two_pooled_final | ( | float8 [] | state | ) |
float8 [] t_test_two_transition | ( | float8 [] | state, |
boolean | first, | ||
float8 | value | ||
) |
aggregate float8 [] t_test_two_unpooled | ( | boolean | first, |
float8 | value | ||
) |
Given realizations and of i.i.d. random variables and with unknown parameters and , test the null hypotheses and .
first | Indicator whether value is from first sample (if TRUE ) or from second sample (if FALSE ) |
value | Value of random variate or |
statistic FLOAT8
- Statistic The corresponding random variable is approximately Student-t distributed with
degrees of freedom (Welch–Satterthwaite formula).
df FLOAT8
- Degrees of freedom (as above)p_value_one_sided FLOAT8
- Lower bound on one-sided p-value. In detail, the result is , which is a lower bound on . Computed as (1.0 - students_t_cdf(statistic))
.p_value_two_sided FLOAT8
- Two-sided p-value, i.e., . Computed as (2 * students_t_cdf(-abs(statistic)))
.SELECT (t_test_unpooled(first, value)).* FROM source
t_test_result t_test_two_unpooled_final | ( | float8 [] | state | ) |
wsr_test_result wsr_test_final | ( | float8 [] | state | ) |
float8 [] wsr_test_transition | ( | float8 [] | state, |
float8 | value, | ||
float8 | precision | ||
) |
Given realizations and of i.i.d. random variables and i.i.d. , respectively, test the null hypothesis that the underlying distributions are equal, i.e., .
first | Determines whether the value belongs to the first (if TRUE ) or the second sample (if FALSE ) |
value | Value of random variate or |
statistic FLOAT8
- Statistic where is the u-statistic computed as follows. The z-statistic is approximately standard normally distributed.
u_statistic FLOAT8
- Statistic where where
is defined as the rank of in the combined list of all observations. For ties, the average rank of all equal values is used.
p_value_one_sided FLOAT8
- Approximate one-sided p-value, i.e., an approximate value for . Computed as (1.0 - normal_cdf(z_statistic))
.p_value_two_sided FLOAT8
- Approximate two-sided p-value, i.e., an approximate value for . Computed as (2 * normal_cdf(-abs(z_statistic)))
.SELECT (mw_test(first, value ORDER BY value)).* FROM source
ORDER BY value
) and will raise an exception if values are not ordered. float8 [] wsr_test_transition | ( | float8 [] | state, |
float8 | value | ||
) |