|
float8 [] | t_test_one_transition (float8[] state, float8 value) |
|
float8 [] | t_test_merge_states (float8[] state1, float8[] state2) |
|
t_test_result | t_test_one_final (float8[] state) |
|
f_test_result | f_test_final (float8[] state) |
|
aggregate float8 [] | t_test_one (float8 value) |
| Perform one-sample or dependent paired Student t-test.
|
|
float8 [] | t_test_two_transition (float8[] state, boolean first, float8 value) |
|
t_test_result | t_test_two_pooled_final (float8[] state) |
|
aggregate float8 [] | t_test_two_pooled (boolean first, float8 value) |
| Perform two-sample pooled (i.e., equal variances) Student t-test.
|
|
t_test_result | t_test_two_unpooled_final (float8[] state) |
|
aggregate float8 [] | t_test_two_unpooled (boolean first, float8 value) |
| Perform unpooled (i.e., unequal variances) t-test (also known as Welch's t-test).
|
|
aggregate float8 [] | f_test (boolean first, float8 value) |
| Perform Fisher F-test.
|
|
float8 [] | chi2_gof_test_transition (float8[] state, bigint observed, float8 expected, bigint df) |
|
float8 [] | chi2_gof_test_transition (float8[] state, bigint observed, float8 expected) |
|
float8 [] | chi2_gof_test_transition (float8[] state, bigint observed) |
|
float8 [] | chi2_gof_test_merge_states (float8[] state1, float8[] state2) |
|
chi2_test_result | chi2_gof_test_final (float8[] state) |
|
aggregate float8 [] | chi2_gof_test (bigint observed, float8 expected=1, bigint df=0) |
| Perform Pearson's chi-squared goodness-of-fit test.
|
|
aggregate float8 [] | chi2_gof_test (bigint observed, float8 expected) |
|
aggregate float8 [] | chi2_gof_test (bigint observed) |
|
float8 [] | ks_test_transition (float8[] state, boolean first, float8 value, bigint numFirst, bigint numSecond) |
|
ks_test_result | ks_test_final (float8[] state) |
|
float8 [] | mw_test_transition (float8[] state, boolean first, float8 value) |
| Perform Kolmogorov-Smirnov test.
|
|
mw_test_result | mw_test_final (float8[] state) |
|
float8 [] | wsr_test_transition (float8[] state, float8 value, float8 precision) |
| Perform Mann-Whitney test.
|
|
float8 [] | wsr_test_transition (float8[] state, float8 value) |
|
wsr_test_result | wsr_test_final (float8[] state) |
|
float8 [] | one_way_anova_transition (float8[] state, integer group, float8 value) |
| Perform Wilcoxon-Signed-Rank test.
|
|
float8 [] | one_way_anova_merge_states (float8[] state1, float8[] state2) |
|
one_way_anova_result | one_way_anova_final (float8[] state) |
|
aggregate float8 [] | one_way_anova (integer group, float8 value) |
| Perform one-way analysis of variance.
|
|
aggregate float8 [] chi2_gof_test (bigint observed, float8 expected = 1, bigint df = 0)
Let \( n_1, \dots, n_k \) be a realization of a (vector) random variable \( N = (N_1, \dots, N_k) \) that follows the multinomial distribution with parameters \( k \) and \( p = (p_1, \dots, p_k) \). Test the null hypothesis \( H_0 : p = p^0 \).
- Parameters
-
observed | Number \( n_i \) of observations of the current event/row |
expected | Expected number of observations of the current event/row. This number is not required to be normalized. That is, \( p^0_i \) will be taken as expected divided by sum(expected). Hence, if this parameter is not specified, chi2_gof_test() will by default use \( p^0 = (\frac 1k, \dots, \frac 1k) \), i.e., test that \( p \) is a discrete uniform distribution. |
df | Degrees of freedom. This is the number of events reduced by the degrees of freedom lost by using the observed numbers for defining the expected number of observations. If this parameter is 0, the degrees of freedom are taken as \( (k - 1) \). |
- Returns
- A composite value as follows. Let \( n = \sum_{i=1}^k n_i \).
- Usage
- Test null hypothesis that all possible outcomes of a categorical variable are equally likely:
SELECT (chi2_gof_test(observed, 1, NULL)).* FROM source
- Test null hypothesis that two categorical variables are independent. Such data is often shown in a contingency table (also known as a crosstab). A crosstab is a matrix where possible values for the first variable correspond to rows and values for the second variable to columns. The matrix elements are the observation frequencies of the joint occurrence of the respective values. chi2_gof_test() assumes that the crosstab is stored in normalized form, i.e., there are three columns var1, var2, observed.
SELECT (chi2_gof_test(observed, expected, deg_freedom)).*
FROM (
SELECT
observed,
sum(observed) OVER (PARTITION BY var1)::DOUBLE PRECISION
* sum(observed) OVER (PARTITION BY var2) AS expected
FROM source
) p, (
SELECT
(count(DISTINCT var1) - 1) * (count(DISTINCT var2) - 1) AS deg_freedom
FROM source
) q;
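The independence query above passes unnormalized expected values (row total times column total) and relies on the division by sum(expected) described under Parameters. The following is a minimal Python sketch of that arithmetic, assuming the textbook Pearson statistic \( \sum_i (n_i - n p^0_i)^2 / (n p^0_i) \); the function name `chi2_independence` and the sample crosstab are hypothetical and not part of the library.

```python
from collections import defaultdict


def chi2_independence(rows):
    """Pearson chi-squared statistic for a crosstab in normalized form.

    rows: iterable of (var1, var2, observed) tuples (hypothetical layout
    mirroring the three columns described above).
    """
    rows = list(rows)
    row_sum = defaultdict(int)
    col_sum = defaultdict(int)
    for var1, var2, observed in rows:
        row_sum[var1] += observed
        col_sum[var2] += observed
    n = sum(observed for _, _, observed in rows)

    # Unnormalized expected weights, as in the SQL: row total * column total.
    weights = [row_sum[var1] * col_sum[var2] for var1, var2, _ in rows]
    total_weight = sum(weights)

    statistic = 0.0
    for (_, _, observed), weight in zip(rows, weights):
        expected = n * weight / total_weight  # n * p0_i after normalization
        statistic += (observed - expected) ** 2 / expected

    # Same degrees of freedom as the deg_freedom subquery.
    df = (len(row_sum) - 1) * (len(col_sum) - 1)
    return statistic, df


crosstab = [("a", "x", 10), ("a", "y", 20), ("b", "x", 30), ("b", "y", 40)]
statistic, df = chi2_independence(crosstab)
print(statistic, df)
```

For a complete crosstab the weights sum to \( n^2 \), so the normalized expected count is the familiar row total times column total divided by \( n \).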
aggregate float8 [] f_test (boolean first, float8 value)
Given realizations \( x_1, \dots, x_m \) and \( y_1, \dots, y_n \) of i.i.d. random variables \( X_1, \dots, X_m \sim N(\mu_X, \sigma_X^2) \) and \( Y_1, \dots, Y_n \sim N(\mu_Y, \sigma_Y^2) \) with unknown parameters \( \mu_X, \mu_Y, \sigma_X^2, \) and \( \sigma_Y^2 \), test the null hypotheses \( H_0 : \sigma_X < \sigma_Y \) and \( H_0 : \sigma_X = \sigma_Y \).
- Parameters
-
first | Indicates whether value is from the first sample \( x_1, \dots, x_m \) (if TRUE ) or from the second sample \( y_1, \dots, y_n \) (if FALSE ) |
value | Value of random variate \( x_i \) or \( y_i \) |
- Returns
- A composite value as follows. We denote by \( \bar x, \bar y \) the sample means and by \( s_X^2, s_Y^2 \) the sample variances.
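The fields of the composite are not listed here. As a rough illustration only, the sketch below computes the textbook F statistic \( s_X^2 / s_Y^2 \) with \( (m - 1, n - 1) \) degrees of freedom from rows shaped like the aggregate's arguments. That the aggregate reports exactly this ratio is an assumption, and `f_statistic` is a hypothetical helper name.

```python
from statistics import variance


def f_statistic(rows):
    """rows: iterable of (first, value) pairs, as passed to the aggregate."""
    rows = list(rows)
    xs = [value for first, value in rows if first]
    ys = [value for first, value in rows if not first]
    # statistics.variance is the unbiased sample variance (divides by size - 1).
    return variance(xs) / variance(ys), len(xs) - 1, len(ys) - 1


sample = [(True, 1.0), (True, 2.0), (True, 3.0), (True, 4.0),
          (False, 2.0), (False, 4.0), (False, 6.0)]
f, df_x, df_y = f_statistic(sample)
print(f, df_x, df_y)
```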
- Usage
-
float8 [] mw_test_transition (float8 [] state, boolean first, float8 value)
Given realizations \( x_1, \dots, x_m \) and \( y_1, \dots, y_n \) of i.i.d. random variables \( X_1, \dots, X_m \) and i.i.d. \( Y_1, \dots, Y_n \), respectively, test the null hypothesis that the underlying distribution functions \( F_X, F_Y \) are identical, i.e., \( H_0 : F_X = F_Y \).
- Parameters
-
first | Determines whether the value belongs to the first (if TRUE ) or the second sample (if FALSE ) |
value | Value of random variate \( x_i \) or \( y_i \) |
m | Size \( m \) of the first sample. See usage instructions below. |
n | Size of the second sample. See usage instructions below. |
- Returns
- A composite value.
- Usage
- Test null hypothesis that two samples stem from the same distribution:
SELECT (ks_test(first, value,
(SELECT count(value) FROM source WHERE first),
(SELECT count(value) FROM source WHERE NOT first)
ORDER BY value
)).* FROM source
- Note
- This aggregate must be used as an ordered aggregate (ORDER BY value) and will raise an exception if values are not ordered.
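To see why the sample sizes are passed in up front and why the input must be ordered, consider the textbook two-sample Kolmogorov-Smirnov statistic \( \max_t |F_X(t) - F_Y(t)| \) over the empirical distribution functions. The sketch below evaluates it in a single pass; it is an illustration under that assumption, not the library's transition function, and `ks_statistic` is a hypothetical name.

```python
from itertools import groupby


def ks_statistic(ordered_rows, m, n):
    """Maximum gap between the two empirical CDFs.

    ordered_rows: (first, value) pairs sorted by value.
    m, n: sizes of the first and second sample, known before the pass starts.
    """
    seen_first = seen_second = 0
    d = 0.0
    previous = float("-inf")
    # Group ties so the CDFs are compared only after all equal values are counted.
    for value, group in groupby(ordered_rows, key=lambda row: row[1]):
        if value < previous:
            raise ValueError("rows must be ordered by value")
        previous = value
        for first, _ in group:
            if first:
                seen_first += 1
            else:
                seen_second += 1
        d = max(d, abs(seen_first / m - seen_second / n))
    return d


rows = sorted(
    [(True, 1.0), (True, 2.0), (True, 3.0), (False, 2.5), (False, 3.5), (False, 4.5)],
    key=lambda row: row[1],
)
print(ks_statistic(rows, 3, 3))
```

Because each running count is divided by its sample size at every step, a streaming computation cannot start without knowing \( m \) and \( n \), which matches the two count subqueries in the usage example.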
aggregate float8 [] one_way_anova (integer group, float8 value)
Given realizations \( x_{1,1}, \dots, x_{1, n_1}, x_{2,1}, \dots, x_{2,n_2}, \dots, x_{k,n_k} \) of i.i.d. random variables \( X_{i,j} \sim N(\mu_i, \sigma^2) \) with unknown parameters \( \mu_1, \dots, \mu_k \) and \( \sigma^2 \), test the null hypothesis \( H_0 : \mu_1 = \dots = \mu_k \).
- Parameters
-
group | Group that value belongs to. Note that group can assume arbitrary values and is not limited to a contiguous range of integers. |
value | Value of random variate \( x_{i,j} \) |
- Returns
- A composite value as follows. Let \( n := \sum_{i=1}^k n_i \) be the total size of all samples. Denote by \( \bar x \) the grand mean, by \( \overline{x_i} \) the group sample means, and by \( s_i^2 \) the group sample variances.
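The quantities named above (grand mean, group means, group variances) combine into the usual one-way ANOVA F statistic. The following sketch assumes the textbook definition, between-group mean square over within-group mean square with \( (k - 1, n - k) \) degrees of freedom; `one_way_anova_f` is a hypothetical helper, and the group labels are deliberately non-contiguous.

```python
from collections import defaultdict
from statistics import fmean


def one_way_anova_f(rows):
    """rows: iterable of (group, value) pairs with arbitrary integer group labels."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)

    n = sum(len(values) for values in groups.values())
    k = len(groups)
    grand_mean = fmean(v for values in groups.values() for v in values)

    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(values) * (fmean(values) - grand_mean) ** 2 for values in groups.values()
    )
    # Variation of each observation around its own group mean.
    ss_within = sum(
        (v - fmean(values)) ** 2 for values in groups.values() for v in values
    )

    df_between, df_within = k - 1, n - k
    statistic = (ss_between / df_between) / (ss_within / df_within)
    return statistic, df_between, df_within


data = [(7, 1.0), (7, 2.0), (7, 3.0),
        (42, 2.0), (42, 3.0), (42, 4.0),
        (-5, 5.0), (-5, 6.0), (-5, 7.0)]
print(one_way_anova_f(data))
```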
- Usage
-
aggregate float8 [] t_test_two_pooled (boolean first, float8 value)
Given realizations \( x_1, \dots, x_n \) and \( y_1, \dots, y_m \) of i.i.d. random variables \( X_1, \dots, X_n \sim N(\mu_X, \sigma^2) \) and \( Y_1, \dots, Y_m \sim N(\mu_Y, \sigma^2) \) with unknown parameters \( \mu_X, \mu_Y, \) and \( \sigma^2 \), test the null hypotheses \( H_0 : \mu_X \leq \mu_Y \) and \( H_0 : \mu_X = \mu_Y \).
- Parameters
-
first | Indicates whether value is from the first sample \( x_1, \dots, x_n \) (if TRUE ) or from the second sample \( y_1, \dots, y_m \) (if FALSE ) |
value | Value of random variate \( x_i \) or \( y_i \) |
- Returns
- A composite value as follows. We denote by \( \bar x, \bar y \) the sample means and by \( s_X^2, s_Y^2 \) the sample variances.
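As an illustration of how the sample means and variances enter the pooled test, the sketch below uses the textbook formulas: pooled variance \( s_p^2 = \frac{(n-1) s_X^2 + (m-1) s_Y^2}{n + m - 2} \) and \( t = \frac{\bar x - \bar y}{s_p \sqrt{1/n + 1/m}} \) with \( n + m - 2 \) degrees of freedom. That the aggregate returns exactly these values is an assumption; `pooled_t` is a hypothetical name.

```python
from math import sqrt
from statistics import fmean, variance


def pooled_t(rows):
    """rows: iterable of (first, value) pairs, as passed to the aggregate."""
    rows = list(rows)
    xs = [value for first, value in rows if first]
    ys = [value for first, value in rows if not first]
    n, m = len(xs), len(ys)
    df = n + m - 2
    # Both samples are assumed to share one variance, so their estimates are pooled.
    pooled_var = ((n - 1) * variance(xs) + (m - 1) * variance(ys)) / df
    t = (fmean(xs) - fmean(ys)) / sqrt(pooled_var * (1 / n + 1 / m))
    return t, df


sample = [(True, 1.0), (True, 2.0), (True, 3.0), (True, 4.0),
          (False, 2.0), (False, 4.0), (False, 6.0)]
print(pooled_t(sample))
```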
- Usage
-
aggregate float8 [] t_test_two_unpooled (boolean first, float8 value)
Given realizations \( x_1, \dots, x_n \) and \( y_1, \dots, y_m \) of i.i.d. random variables \( X_1, \dots, X_n \sim N(\mu_X, \sigma_X^2) \) and \( Y_1, \dots, Y_m \sim N(\mu_Y, \sigma_Y^2) \) with unknown parameters \( \mu_X, \mu_Y, \sigma_X^2, \) and \( \sigma_Y^2 \), test the null hypotheses \( H_0 : \mu_X \leq \mu_Y \) and \( H_0 : \mu_X = \mu_Y \).
- Parameters
-
first | Indicates whether value is from the first sample \( x_1, \dots, x_n \) (if TRUE ) or from the second sample \( y_1, \dots, y_m \) (if FALSE ) |
value | Value of random variate \( x_i \) or \( y_i \) |
- Returns
- A composite value as follows. We denote by \( \bar x, \bar y \) the sample means and by \( s_X^2, s_Y^2 \) the sample variances.
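For comparison with the pooled variant, here is a sketch of the textbook Welch computation: \( t = \frac{\bar x - \bar y}{\sqrt{s_X^2/n + s_Y^2/m}} \), with degrees of freedom from the Welch-Satterthwaite approximation. It assumes the aggregate follows these standard formulas; `welch_t` is a hypothetical name.

```python
from math import sqrt
from statistics import fmean, variance


def welch_t(rows):
    """rows: iterable of (first, value) pairs, as passed to the aggregate."""
    rows = list(rows)
    xs = [value for first, value in rows if first]
    ys = [value for first, value in rows if not first]
    n, m = len(xs), len(ys)
    # Per-sample variance of the mean; the two variances are NOT pooled.
    vx = variance(xs) / n
    vy = variance(ys) / m
    t = (fmean(xs) - fmean(ys)) / sqrt(vx + vy)
    # Welch-Satterthwaite approximation; generally not an integer.
    df = (vx + vy) ** 2 / (vx ** 2 / (n - 1) + vy ** 2 / (m - 1))
    return t, df


sample = [(True, 1.0), (True, 2.0), (True, 3.0), (True, 4.0),
          (False, 2.0), (False, 4.0), (False, 6.0)]
print(welch_t(sample))
```

The approximate degrees of freedom always lie between \( \min(n, m) - 1 \) and \( n + m - 2 \), so the unpooled test is never less conservative than the pooled one in that respect.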
- Usage
-
float8 [] wsr_test_transition (float8 [] state, float8 value, float8 precision)
Given realizations \( x_1, \dots, x_m \) and \( y_1, \dots, y_n \) of i.i.d. random variables \( X_1, \dots, X_m \) and i.i.d. \( Y_1, \dots, Y_n \), respectively, test the null hypothesis that the underlying distributions are equal, i.e., \( H_0 : \forall i,j: \Pr[X_i > Y_j] + \frac{\Pr[X_i = Y_j]}{2} = \frac 12 \).
- Parameters
-
first | Determines whether the value belongs to the first (if TRUE ) or the second sample (if FALSE ) |
value | Value of random variate \( x_i \) or \( y_i \) |
- Returns
- A composite value.
statistic FLOAT8
- Statistic
\[ z = \frac{u - \frac{mn}{2}}{\sqrt{\frac{mn(m+n+1)}{12}}} \]
where \( u \) is the u-statistic defined below and \( \frac{mn}{2} \) is its mean under \( H_0 \). The z-statistic is approximately standard normally distributed.
u_statistic FLOAT8
- Statistic \( u = \min \{ u_x, u_y \} \) where
\[ u_x = mn + \binom{m+1}{2} - \sum_{i=1}^m r_{x,i} \]
where
\[ r_{x,i} = |\{ j \mid x_j < x_i \}| + |\{ j \mid y_j < x_i \}| + \frac{|\{ j \mid x_j = x_i \}| + |\{ j \mid y_j = x_i \}| + 1}{2} \]
is the rank of \( x_i \) in the combined list of all \( m+n \) observations. For ties, the average rank of all equal values is used.
p_value_one_sided FLOAT8
- Approximate one-sided p-value, i.e., an approximate value for \( \Pr[Z \geq z \mid H_0] \). Computed as (1.0 - normal_cdf(z_statistic)).
p_value_two_sided FLOAT8
- Approximate two-sided p-value, i.e., an approximate value for \( \Pr[|Z| \geq |z| \mid H_0] \). Computed as (2 * normal_cdf(-abs(z_statistic))).
- Usage
-
- Note
- This aggregate must be used as an ordered aggregate (ORDER BY value) and will raise an exception if values are not ordered.
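The rank, u-statistic, and p-value definitions above can be checked on a small sample with the sketch below. It follows those formulas directly rather than the library's streaming implementation, assumes the z-statistic is centered at the null mean \( mn/2 \), and defines \( u_y \) symmetrically to \( u_x \); the names `mann_whitney` and `normal_cdf` are local to the example.

```python
from math import comb, erf, sqrt


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def mann_whitney(rows):
    """rows: iterable of (first, value) pairs. Returns (u, z, two-sided p-value)."""
    rows = list(rows)
    xs = [value for first, value in rows if first]
    ys = [value for first, value in rows if not first]
    m, n = len(xs), len(ys)
    combined = xs + ys

    def rank(v):
        # Count of smaller values plus the average position among ties.
        less = sum(1 for w in combined if w < v)
        equal = sum(1 for w in combined if w == v)  # includes v itself
        return less + (equal + 1) / 2

    u_x = m * n + comb(m + 1, 2) - sum(rank(x) for x in xs)
    u_y = m * n + comb(n + 1, 2) - sum(rank(y) for y in ys)
    u = min(u_x, u_y)

    z = (u - m * n / 2) / sqrt(m * n * (m + n + 1) / 12)
    p_two_sided = 2 * normal_cdf(-abs(z))
    return u, z, p_two_sided


sample = [(True, 1.0), (True, 2.0), (True, 3.0),
          (False, 3.0), (False, 4.0), (False, 5.0), (False, 6.0)]
print(mann_whitney(sample))
```

In this sample the value 3.0 appears in both groups, so both occurrences receive the average rank 3.5, and \( u_x + u_y = mn \) as expected.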