
org.apache.commons.math.stat.inference.TTest Maven / Gradle / Ivy


The Math project is a library of lightweight, self-contained mathematics and statistics components addressing the most common practical problems not immediately available in the Java programming language or commons-lang.

/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.commons.math.stat.inference;

import org.apache.commons.math.MathException;
import org.apache.commons.math.stat.descriptive.StatisticalSummary;

/**
 * An interface for Student's t-tests.
 * <p>
 * Tests can be:</p>
 * <ul>
 * <li>One-sample or two-sample</li>
 * <li>One-sided or two-sided</li>
 * <li>Paired or unpaired (for two-sample tests)</li>
 * <li>Homoscedastic (equal variance assumption) or heteroscedastic
 * (for two-sample tests)</li>
 * <li>Fixed significance level (boolean-valued) or returning p-values.</li>
 * </ul>
 * <p>
 * Test statistics are available for all tests.  Methods including "Test" in
 * their names perform tests; all other methods return t-statistics.  Among
 * the "Test" methods, <code>double</code>-valued methods return p-values;
 * <code>boolean</code>-valued methods perform fixed significance level tests.
 * Significance levels are always specified as numbers between 0 and 0.5
 * (e.g. tests at the 95% level use <code>alpha=0.05</code>).</p>
 * <p>
 * Input to tests can be either <code>double[]</code> arrays or
 * {@link StatisticalSummary} instances.</p>
 *
 * @version $Revision: 811786 $ $Date: 2009-09-06 11:36:08 +0200 (dim. 06 sept. 2009) $
 */
public interface TTest {
    /**
     * Computes a paired, 2-sample t-statistic based on the data in the input
     * arrays.  The t-statistic returned is equivalent to what would be returned by
     * computing the one-sample t-statistic {@link #t(double, double[])}, with
     * <code>mu = 0</code> and the sample array consisting of the (signed)
     * differences between corresponding entries in <code>sample1</code> and
     * <code>sample2</code>.
     * <p>
     * <strong>Preconditions</strong>:</p>
     * <ul>
     * <li>The input arrays must have the same length and their common length
     * must be at least 2.</li>
     * </ul>
     *
     * @param sample1 array of sample data values
     * @param sample2 array of sample data values
     * @return t statistic
     * @throws IllegalArgumentException if the precondition is not met
     * @throws MathException if the statistic cannot be computed due to a
     *         convergence or other numerical error.
     */
    double pairedT(double[] sample1, double[] sample2)
        throws IllegalArgumentException, MathException;

    /**
     * Returns the observed significance level, or
     * p-value, associated with a paired, two-sample, two-tailed t-test
     * based on the data in the input arrays.
     * <p>
     * The number returned is the smallest significance level
     * at which one can reject the null hypothesis that the mean of the paired
     * differences is 0 in favor of the two-sided alternative that the mean paired
     * difference is not equal to 0. For a one-sided test, divide the returned
     * value by 2.</p>
     * <p>
     * This test is equivalent to a one-sample t-test computed using
     * {@link #tTest(double, double[])} with <code>mu = 0</code> and the sample
     * array consisting of the signed differences between corresponding elements of
     * <code>sample1</code> and <code>sample2</code>.</p>
     * <p>
     * <strong>Usage Note:</strong><br>
     * The validity of the p-value depends on the assumptions of the parametric
     * t-test procedure, as discussed here.</p>
     * <p>
     * <strong>Preconditions</strong>:</p>
     * <ul>
     * <li>The input array lengths must be the same and their common length must
     * be at least 2.</li>
     * </ul>
     *
     * @param sample1 array of sample data values
     * @param sample2 array of sample data values
     * @return p-value for t-test
     * @throws IllegalArgumentException if the precondition is not met
     * @throws MathException if an error occurs computing the p-value
     */
    double pairedTTest(double[] sample1, double[] sample2)
        throws IllegalArgumentException, MathException;

    /**
     * Performs a paired t-test evaluating the null hypothesis that the
     * mean of the paired differences between <code>sample1</code> and
     * <code>sample2</code> is 0 in favor of the two-sided alternative that the
     * mean paired difference is not equal to 0, with significance level
     * <code>alpha</code>.
     * <p>
     * Returns <code>true</code> iff the null hypothesis can be rejected with
     * confidence <code>1 - alpha</code>.  To perform a 1-sided test, use
     * <code>alpha * 2</code>.</p>
     * <p>
     * <strong>Usage Note:</strong><br>
     * The validity of the test depends on the assumptions of the parametric
     * t-test procedure, as discussed here.</p>
     * <p>
     * <strong>Preconditions</strong>:</p>
     * <ul>
     * <li>The input array lengths must be the same and their common length
     * must be at least 2.</li>
     * <li><code>0 &lt; alpha &lt; 0.5</code></li>
     * </ul>
     *
     * @param sample1 array of sample data values
     * @param sample2 array of sample data values
     * @param alpha significance level of the test
     * @return true if the null hypothesis can be rejected with
     *         confidence 1 - alpha
     * @throws IllegalArgumentException if the preconditions are not met
     * @throws MathException if an error occurs performing the test
     */
    boolean pairedTTest(double[] sample1, double[] sample2, double alpha)
        throws IllegalArgumentException, MathException;

    /**
     * Computes a t statistic given observed values and a comparison constant.
     * <p>
     * This statistic can be used to perform a one-sample t-test for the mean.</p>
     * <p>
     * <strong>Preconditions</strong>:</p>
     * <ul>
     * <li>The observed array length must be at least 2.</li>
     * </ul>
     *
     * @param mu comparison constant
     * @param observed array of values
     * @return t statistic
     * @throws IllegalArgumentException if input array length is less than 2
     */
    double t(double mu, double[] observed)
        throws IllegalArgumentException;

    /**
     * Computes a t statistic to use in comparing the mean of the dataset
     * described by <code>sampleStats</code> to <code>mu</code>.
     * <p>
     * This statistic can be used to perform a one-sample t-test for the mean.</p>
     * <p>
     * <strong>Preconditions</strong>:</p>
     * <ul>
     * <li><code>sampleStats.getN() &gt;= 2</code>.</li>
     * </ul>
     *
     * @param mu comparison constant
     * @param sampleStats StatisticalSummary holding sample summary statistics
     * @return t statistic
     * @throws IllegalArgumentException if the precondition is not met
     */
    double t(double mu, StatisticalSummary sampleStats)
        throws IllegalArgumentException;

    /**
     * Computes a 2-sample t statistic, under the hypothesis of equal
     * subpopulation variances.  To compute a t-statistic without the
     * equal variances hypothesis, use {@link #t(double[], double[])}.
     * <p>
     * This statistic can be used to perform a (homoscedastic) two-sample
     * t-test to compare sample means.</p>
     * <p>
     * The t-statistic is</p>
     * <p>
     * &nbsp;&nbsp;<code>t = (m1 - m2) / (sqrt(1/n1 + 1/n2) sqrt(var))</code></p>
     * <p>
     * where <code>n1</code> is the size of the first sample;
     * <code>n2</code> is the size of the second sample;
     * <code>m1</code> is the mean of the first sample;
     * <code>m2</code> is the mean of the second sample;
     * and <code>var</code> is the pooled variance estimate:</p>
     * <p>
     * &nbsp;&nbsp;<code>var = ((n1 - 1)var1 + (n2 - 1)var2) / ((n1 - 1) + (n2 - 1))</code></p>
     * <p>
     * with <code>var1</code> the variance of the first sample and
     * <code>var2</code> the variance of the second sample.</p>
     * <p>
     * <strong>Preconditions</strong>:</p>
     * <ul>
     * <li>The observed array lengths must both be at least 2.</li>
     * </ul>
     *
     * @param sample1 array of sample data values
     * @param sample2 array of sample data values
     * @return t statistic
     * @throws IllegalArgumentException if the precondition is not met
     */
    double homoscedasticT(double[] sample1, double[] sample2)
        throws IllegalArgumentException;

    /**
     * Computes a 2-sample t statistic, without the hypothesis of equal
     * subpopulation variances.  To compute a t-statistic assuming equal
     * variances, use {@link #homoscedasticT(double[], double[])}.
     * <p>
     * This statistic can be used to perform a two-sample t-test to compare
     * sample means.</p>
     * <p>
     * The t-statistic is</p>
     * <p>
     * &nbsp;&nbsp;<code>t = (m1 - m2) / sqrt(var1/n1 + var2/n2)</code></p>
     * <p>
     * where <code>n1</code> is the size of the first sample;
     * <code>n2</code> is the size of the second sample;
     * <code>m1</code> is the mean of the first sample;
     * <code>m2</code> is the mean of the second sample;
     * <code>var1</code> is the variance of the first sample;
     * and <code>var2</code> is the variance of the second sample.</p>
     * <p>
     * <strong>Preconditions</strong>:</p>
     * <ul>
     * <li>The observed array lengths must both be at least 2.</li>
     * </ul>
     *
     * @param sample1 array of sample data values
     * @param sample2 array of sample data values
     * @return t statistic
     * @throws IllegalArgumentException if the precondition is not met
     */
    double t(double[] sample1, double[] sample2)
        throws IllegalArgumentException;

    /**
     * Computes a 2-sample t statistic, comparing the means of the datasets
     * described by two {@link StatisticalSummary} instances, without the
     * assumption of equal subpopulation variances.  Use
     * {@link #homoscedasticT(StatisticalSummary, StatisticalSummary)} to
     * compute a t-statistic under the equal variances assumption.
     * <p>
     * This statistic can be used to perform a two-sample t-test to compare
     * sample means.</p>
     * <p>
     * The returned t-statistic is</p>
     * <p>
     * &nbsp;&nbsp;<code>t = (m1 - m2) / sqrt(var1/n1 + var2/n2)</code></p>
     * <p>
     * where <code>n1</code> is the size of the first sample;
     * <code>n2</code> is the size of the second sample;
     * <code>m1</code> is the mean of the first sample;
     * <code>m2</code> is the mean of the second sample;
     * <code>var1</code> is the variance of the first sample;
     * and <code>var2</code> is the variance of the second sample.</p>
     * <p>
     * <strong>Preconditions</strong>:</p>
     * <ul>
     * <li>The datasets described by the two StatisticalSummary instances must
     * each contain at least 2 observations.</li>
     * </ul>
     *
     * @param sampleStats1 StatisticalSummary describing data from the first sample
     * @param sampleStats2 StatisticalSummary describing data from the second sample
     * @return t statistic
     * @throws IllegalArgumentException if the precondition is not met
     */
    double t(StatisticalSummary sampleStats1, StatisticalSummary sampleStats2)
        throws IllegalArgumentException;

    /**
     * Computes a 2-sample t statistic, comparing the means of the datasets
     * described by two {@link StatisticalSummary} instances, under the
     * assumption of equal subpopulation variances.  To compute a t-statistic
     * without the equal variances assumption, use
     * {@link #t(StatisticalSummary, StatisticalSummary)}.
     * <p>
     * This statistic can be used to perform a (homoscedastic) two-sample
     * t-test to compare sample means.</p>
     * <p>
     * The t-statistic returned is</p>
     * <p>
     * &nbsp;&nbsp;<code>t = (m1 - m2) / (sqrt(1/n1 + 1/n2) sqrt(var))</code></p>
     * <p>
     * where <code>n1</code> is the size of the first sample;
     * <code>n2</code> is the size of the second sample;
     * <code>m1</code> is the mean of the first sample;
     * <code>m2</code> is the mean of the second sample;
     * and <code>var</code> is the pooled variance estimate:</p>
     * <p>
     * &nbsp;&nbsp;<code>var = ((n1 - 1)var1 + (n2 - 1)var2) / ((n1 - 1) + (n2 - 1))</code></p>
     * <p>
     * with <code>var1</code> the variance of the first sample and
     * <code>var2</code> the variance of the second sample.</p>
     * <p>
     * <strong>Preconditions</strong>:</p>
     * <ul>
     * <li>The datasets described by the two StatisticalSummary instances must
     * each contain at least 2 observations.</li>
     * </ul>
     *
     * @param sampleStats1 StatisticalSummary describing data from the first sample
     * @param sampleStats2 StatisticalSummary describing data from the second sample
     * @return t statistic
     * @throws IllegalArgumentException if the precondition is not met
     */
    double homoscedasticT(StatisticalSummary sampleStats1, StatisticalSummary sampleStats2)
        throws IllegalArgumentException;

    /**
     * Returns the observed significance level, or
     * p-value, associated with a one-sample, two-tailed t-test
     * comparing the mean of the input array with the constant <code>mu</code>.
     * <p>
     * The number returned is the smallest significance level
     * at which one can reject the null hypothesis that the mean equals
     * <code>mu</code> in favor of the two-sided alternative that the mean
     * is different from <code>mu</code>. For a one-sided test, divide the
     * returned value by 2.</p>
     * <p>
     * <strong>Usage Note:</strong><br>
     * The validity of the test depends on the assumptions of the parametric
     * t-test procedure, as discussed here.</p>
     * <p>
     * <strong>Preconditions</strong>:</p>
     * <ul>
     * <li>The observed array length must be at least 2.</li>
     * </ul>
     *
     * @param mu constant value to compare sample mean against
     * @param sample array of sample data values
     * @return p-value
     * @throws IllegalArgumentException if the precondition is not met
     * @throws MathException if an error occurs computing the p-value
     */
    double tTest(double mu, double[] sample)
        throws IllegalArgumentException, MathException;

    /**
     * Performs a two-sided t-test evaluating the null hypothesis that the mean
     * of the population from which <code>sample</code> is drawn equals
     * <code>mu</code>.
     * <p>
     * Returns <code>true</code> iff the null hypothesis can be
     * rejected with confidence <code>1 - alpha</code>.  To
     * perform a 1-sided test, use <code>alpha * 2</code>.</p>
     * <p>
     * <strong>Examples:</strong></p>
     * <ol>
     * <li>To test the (2-sided) hypothesis <code>sample mean = mu</code> at
     * the 95% level, use <br><code>tTest(mu, sample, 0.05)</code></li>
     * <li>To test the (one-sided) hypothesis <code>sample mean &lt; mu</code>
     * at the 99% level, first verify that the measured sample mean is less
     * than <code>mu</code> and then use
     * <br><code>tTest(mu, sample, 0.02)</code></li>
     * </ol>
     * <p>
     * <strong>Usage Note:</strong><br>
     * The validity of the test depends on the assumptions of the one-sample
     * parametric t-test procedure, as discussed here.</p>
     * <p>
     * <strong>Preconditions</strong>:</p>
     * <ul>
     * <li>The observed array length must be at least 2.</li>
     * </ul>
     *
     * @param mu constant value to compare sample mean against
     * @param sample array of sample data values
     * @param alpha significance level of the test
     * @return true if the null hypothesis can be rejected with
     *         confidence 1 - alpha
     * @throws IllegalArgumentException if the precondition is not met
     * @throws MathException if an error occurs performing the test
     */
    boolean tTest(double mu, double[] sample, double alpha)
        throws IllegalArgumentException, MathException;

    /**
     * Returns the observed significance level, or
     * p-value, associated with a one-sample, two-tailed t-test
     * comparing the mean of the dataset described by <code>sampleStats</code>
     * with the constant <code>mu</code>.
     * <p>
     * The number returned is the smallest significance level
     * at which one can reject the null hypothesis that the mean equals
     * <code>mu</code> in favor of the two-sided alternative that the mean
     * is different from <code>mu</code>. For a one-sided test, divide the
     * returned value by 2.</p>
     * <p>
     * <strong>Usage Note:</strong><br>
     * The validity of the test depends on the assumptions of the parametric
     * t-test procedure, as discussed here.</p>
     * <p>
     * <strong>Preconditions</strong>:</p>
     * <ul>
     * <li>The sample must contain at least 2 observations.</li>
     * </ul>
     *
     * @param mu constant value to compare sample mean against
     * @param sampleStats StatisticalSummary describing sample data
     * @return p-value
     * @throws IllegalArgumentException if the precondition is not met
     * @throws MathException if an error occurs computing the p-value
     */
    double tTest(double mu, StatisticalSummary sampleStats)
        throws IllegalArgumentException, MathException;

    /**
     * Performs a two-sided t-test evaluating the null hypothesis that the mean
     * of the population from which the dataset described by
     * <code>sampleStats</code> is drawn equals <code>mu</code>.
     * <p>
     * Returns <code>true</code> iff the null hypothesis can be rejected with
     * confidence <code>1 - alpha</code>.  To perform a 1-sided test, use
     * <code>alpha * 2</code>.</p>
     * <p>
     * <strong>Examples:</strong></p>
     * <ol>
     * <li>To test the (2-sided) hypothesis <code>sample mean = mu</code> at
     * the 95% level, use <br><code>tTest(mu, sampleStats, 0.05)</code></li>
     * <li>To test the (one-sided) hypothesis <code>sample mean &lt; mu</code>
     * at the 99% level, first verify that the measured sample mean is less
     * than <code>mu</code> and then use
     * <br><code>tTest(mu, sampleStats, 0.02)</code></li>
     * </ol>
     * <p>
     * <strong>Usage Note:</strong><br>
     * The validity of the test depends on the assumptions of the one-sample
     * parametric t-test procedure, as discussed here.</p>
     * <p>
     * <strong>Preconditions</strong>:</p>
     * <ul>
     * <li>The sample must include at least 2 observations.</li>
     * </ul>
     *
     * @param mu constant value to compare sample mean against
     * @param sampleStats StatisticalSummary describing sample data values
     * @param alpha significance level of the test
     * @return true if the null hypothesis can be rejected with
     *         confidence 1 - alpha
     * @throws IllegalArgumentException if the precondition is not met
     * @throws MathException if an error occurs performing the test
     */
    boolean tTest(double mu, StatisticalSummary sampleStats, double alpha)
        throws IllegalArgumentException, MathException;

    /**
     * Returns the observed significance level, or
     * p-value, associated with a two-sample, two-tailed t-test
     * comparing the means of the input arrays.
     * <p>
     * The number returned is the smallest significance level
     * at which one can reject the null hypothesis that the two means are
     * equal in favor of the two-sided alternative that they are different.
     * For a one-sided test, divide the returned value by 2.</p>
     * <p>
     * The test does not assume that the underlying population variances are
     * equal and it uses approximated degrees of freedom computed from the
     * sample data to compute the p-value.  The t-statistic used is as defined in
     * {@link #t(double[], double[])} and the Welch-Satterthwaite approximation
     * to the degrees of freedom is used, as described here.  To perform the
     * test under the assumption of equal subpopulation variances, use
     * {@link #homoscedasticTTest(double[], double[])}.</p>
     * <p>
     * <strong>Usage Note:</strong><br>
     * The validity of the p-value depends on the assumptions of the parametric
     * t-test procedure, as discussed here.</p>
     * <p>
     * <strong>Preconditions</strong>:</p>
     * <ul>
     * <li>The observed array lengths must both be at least 2.</li>
     * </ul>
     *
     * @param sample1 array of sample data values
     * @param sample2 array of sample data values
     * @return p-value for t-test
     * @throws IllegalArgumentException if the precondition is not met
     * @throws MathException if an error occurs computing the p-value
     */
    double tTest(double[] sample1, double[] sample2)
        throws IllegalArgumentException, MathException;

    /**
     * Returns the observed significance level, or
     * p-value, associated with a two-sample, two-tailed t-test
     * comparing the means of the input arrays, under the assumption that
     * the two samples are drawn from subpopulations with equal variances.
     * To perform the test without the equal variances assumption, use
     * {@link #tTest(double[], double[])}.
     * <p>
     * The number returned is the smallest significance level
     * at which one can reject the null hypothesis that the two means are
     * equal in favor of the two-sided alternative that they are different.
     * For a one-sided test, divide the returned value by 2.</p>
     * <p>
     * A pooled variance estimate is used to compute the t-statistic.  See
     * {@link #homoscedasticT(double[], double[])}. The sum of the sample sizes
     * minus 2 is used as the degrees of freedom.</p>
     * <p>
     * <strong>Usage Note:</strong><br>
     * The validity of the p-value depends on the assumptions of the parametric
     * t-test procedure, as discussed here.</p>
     * <p>
     * <strong>Preconditions</strong>:</p>
     * <ul>
     * <li>The observed array lengths must both be at least 2.</li>
     * </ul>
     *
     * @param sample1 array of sample data values
     * @param sample2 array of sample data values
     * @return p-value for t-test
     * @throws IllegalArgumentException if the precondition is not met
     * @throws MathException if an error occurs computing the p-value
     */
    double homoscedasticTTest(double[] sample1, double[] sample2)
        throws IllegalArgumentException, MathException;

    /**
     * Performs a two-sided t-test evaluating the null hypothesis that
     * <code>sample1</code> and <code>sample2</code> are drawn from populations
     * with the same mean, with significance level <code>alpha</code>.  This
     * test does not assume that the subpopulation variances are equal.  To
     * perform the test assuming equal variances, use
     * {@link #homoscedasticTTest(double[], double[], double)}.
     * <p>
     * Returns <code>true</code> iff the null hypothesis that the means are
     * equal can be rejected with confidence <code>1 - alpha</code>.  To
     * perform a 1-sided test, use <code>alpha * 2</code>.</p>
     * <p>
     * See {@link #t(double[], double[])} for the formula used to compute the
     * t-statistic.  Degrees of freedom are approximated using the
     * Welch-Satterthwaite approximation.</p>
     * <p>
     * <strong>Examples:</strong></p>
     * <ol>
     * <li>To test the (2-sided) hypothesis <code>mean 1 = mean 2</code> at
     * the 95% level, use
     * <br><code>tTest(sample1, sample2, 0.05)</code>.</li>
     * <li>To test the (one-sided) hypothesis <code>mean 1 &lt; mean 2</code>
     * at the 99% level, first verify that the measured mean of
     * <code>sample 1</code> is less than the mean of <code>sample 2</code>
     * and then use
     * <br><code>tTest(sample1, sample2, 0.02)</code></li>
     * </ol>
     * <p>
     * <strong>Usage Note:</strong><br>
     * The validity of the test depends on the assumptions of the parametric
     * t-test procedure, as discussed here.</p>
     * <p>
     * <strong>Preconditions</strong>:</p>
     * <ul>
     * <li>The observed array lengths must both be at least 2.</li>
     * <li><code>0 &lt; alpha &lt; 0.5</code></li>
     * </ul>
     *
     * @param sample1 array of sample data values
     * @param sample2 array of sample data values
     * @param alpha significance level of the test
     * @return true if the null hypothesis can be rejected with
     *         confidence 1 - alpha
     * @throws IllegalArgumentException if the preconditions are not met
     * @throws MathException if an error occurs performing the test
     */
    boolean tTest(double[] sample1, double[] sample2, double alpha)
        throws IllegalArgumentException, MathException;

    /**
     * Performs a two-sided t-test evaluating the null hypothesis that
     * <code>sample1</code> and <code>sample2</code> are drawn from populations
     * with the same mean, with significance level <code>alpha</code>,
     * assuming that the subpopulation variances are equal.  Use
     * {@link #tTest(double[], double[], double)} to perform the test without
     * the assumption of equal variances.
     * <p>
     * Returns <code>true</code> iff the null hypothesis that the means are
     * equal can be rejected with confidence <code>1 - alpha</code>.  To
     * perform a 1-sided test, use <code>alpha * 2</code>.</p>
     * <p>
     * A pooled variance estimate is used to compute the t-statistic. See
     * {@link #homoscedasticT(double[], double[])} for the formula. The sum of
     * the sample sizes minus 2 is used as the degrees of freedom.</p>
     * <p>
     * <strong>Examples:</strong></p>
     * <ol>
     * <li>To test the (2-sided) hypothesis <code>mean 1 = mean 2</code> at
     * the 95% level, use
     * <br><code>homoscedasticTTest(sample1, sample2, 0.05)</code>.</li>
     * <li>To test the (one-sided) hypothesis <code>mean 1 &lt; mean 2</code>
     * at the 99% level, first verify that the measured mean of
     * <code>sample 1</code> is less than the mean of <code>sample 2</code>
     * and then use
     * <br><code>homoscedasticTTest(sample1, sample2, 0.02)</code></li>
     * </ol>
     * <p>
     * <strong>Usage Note:</strong><br>
     * The validity of the test depends on the assumptions of the parametric
     * t-test procedure, as discussed here.</p>
     * <p>
     * <strong>Preconditions</strong>:</p>
     * <ul>
     * <li>The observed array lengths must both be at least 2.</li>
     * <li><code>0 &lt; alpha &lt; 0.5</code></li>
     * </ul>
     *
     * @param sample1 array of sample data values
     * @param sample2 array of sample data values
     * @param alpha significance level of the test
     * @return true if the null hypothesis can be rejected with
     *         confidence 1 - alpha
     * @throws IllegalArgumentException if the preconditions are not met
     * @throws MathException if an error occurs performing the test
     */
    boolean homoscedasticTTest(double[] sample1, double[] sample2, double alpha)
        throws IllegalArgumentException, MathException;

    /**
     * Returns the observed significance level, or
     * p-value, associated with a two-sample, two-tailed t-test
     * comparing the means of the datasets described by two StatisticalSummary
     * instances.
     * <p>
     * The number returned is the smallest significance level
     * at which one can reject the null hypothesis that the two means are
     * equal in favor of the two-sided alternative that they are different.
     * For a one-sided test, divide the returned value by 2.</p>
     * <p>
     * The test does not assume that the underlying population variances are
     * equal and it uses approximated degrees of freedom computed from the
     * sample data to compute the p-value.  To perform the test assuming
     * equal variances, use
     * {@link #homoscedasticTTest(StatisticalSummary, StatisticalSummary)}.</p>
     * <p>
     * <strong>Usage Note:</strong><br>
     * The validity of the p-value depends on the assumptions of the parametric
     * t-test procedure, as discussed here.</p>
     * <p>
     * <strong>Preconditions</strong>:</p>
     * <ul>
     * <li>The datasets described by the two StatisticalSummary instances must
     * each contain at least 2 observations.</li>
     * </ul>
     *
     * @param sampleStats1 StatisticalSummary describing data from the first sample
     * @param sampleStats2 StatisticalSummary describing data from the second sample
     * @return p-value for t-test
     * @throws IllegalArgumentException if the precondition is not met
     * @throws MathException if an error occurs computing the p-value
     */
    double tTest(StatisticalSummary sampleStats1, StatisticalSummary sampleStats2)
        throws IllegalArgumentException, MathException;

    /**
     * Returns the observed significance level, or
     * p-value, associated with a two-sample, two-tailed t-test
     * comparing the means of the datasets described by two StatisticalSummary
     * instances, under the hypothesis of equal subpopulation variances.  To
     * perform a test without the equal variances assumption, use
     * {@link #tTest(StatisticalSummary, StatisticalSummary)}.
     * <p>
     * The number returned is the smallest significance level
     * at which one can reject the null hypothesis that the two means are
     * equal in favor of the two-sided alternative that they are different.
     * For a one-sided test, divide the returned value by 2.</p>
     * <p>
     * See {@link #homoscedasticT(double[], double[])} for the formula used to
     * compute the t-statistic. The sum of the sample sizes minus 2 is used as
     * the degrees of freedom.</p>
     * <p>
     * <strong>Usage Note:</strong><br>
     * The validity of the p-value depends on the assumptions of the parametric
     * t-test procedure, as discussed here.</p>
     * <p>
     * <strong>Preconditions</strong>:</p>
     * <ul>
     * <li>The datasets described by the two StatisticalSummary instances must
     * each contain at least 2 observations.</li>
     * </ul>
     *
     * @param sampleStats1 StatisticalSummary describing data from the first sample
     * @param sampleStats2 StatisticalSummary describing data from the second sample
     * @return p-value for t-test
     * @throws IllegalArgumentException if the precondition is not met
     * @throws MathException if an error occurs computing the p-value
     */
    double homoscedasticTTest(StatisticalSummary sampleStats1, StatisticalSummary sampleStats2)
        throws IllegalArgumentException, MathException;

    /**
     * Performs a two-sided t-test evaluating the null hypothesis that
     * <code>sampleStats1</code> and <code>sampleStats2</code> describe
     * datasets drawn from populations with the same mean, with significance
     * level <code>alpha</code>.  This test does not assume that the
     * subpopulation variances are equal.  To perform the test under the equal
     * variances assumption, use
     * {@link #homoscedasticTTest(StatisticalSummary, StatisticalSummary)}.
     * <p>
     * Returns <code>true</code> iff the null hypothesis that the means are
     * equal can be rejected with confidence <code>1 - alpha</code>.  To
     * perform a 1-sided test, use <code>alpha * 2</code>.</p>
     * <p>
     * See {@link #t(double[], double[])} for the formula used to compute the
     * t-statistic.  Degrees of freedom are approximated using the
     * Welch-Satterthwaite approximation.</p>
     * <p>
     * <strong>Examples:</strong></p>
     * <ol>
     * <li>To test the (2-sided) hypothesis <code>mean 1 = mean 2</code> at
     * the 95% level, use
     * <br><code>tTest(sampleStats1, sampleStats2, 0.05)</code></li>
     * <li>To test the (one-sided) hypothesis <code>mean 1 &lt; mean 2</code>
     * at the 99% level, first verify that the measured mean of
     * <code>sample 1</code> is less than the mean of <code>sample 2</code>
     * and then use
     * <br><code>tTest(sampleStats1, sampleStats2, 0.02)</code></li>
     * </ol>
     * <p>
     * <strong>Usage Note:</strong><br>
     * The validity of the test depends on the assumptions of the parametric
     * t-test procedure, as discussed here.</p>
     * <p>
     * <strong>Preconditions</strong>:</p>
     * <ul>
     * <li>The datasets described by the two StatisticalSummary instances must
     * each contain at least 2 observations.</li>
     * <li><code>0 &lt; alpha &lt; 0.5</code></li>
     * </ul>
     *
     * @param sampleStats1 StatisticalSummary describing sample data values
     * @param sampleStats2 StatisticalSummary describing sample data values
     * @param alpha significance level of the test
     * @return true if the null hypothesis can be rejected with
     *         confidence 1 - alpha
     * @throws IllegalArgumentException if the preconditions are not met
     * @throws MathException if an error occurs performing the test
     */
    boolean tTest(StatisticalSummary sampleStats1, StatisticalSummary sampleStats2, double alpha)
        throws IllegalArgumentException, MathException;
}
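
The t-statistic formulas documented in this interface can be tried out without the library. The following is a minimal, self-contained sketch and not the Commons Math implementation: the class `TStatisticSketch` and its helper names are hypothetical, the one-sample formula and the Welch-Satterthwaite expression are the standard textbook ones (the interface only names them), and the pooled estimate is taken to be the pooled variance without a square root. It computes statistics only; p-values would additionally need the t distribution's CDF.

```java
/** Hypothetical helper for illustration; not part of Commons Math. */
final class TStatisticSketch {

    private TStatisticSketch() { }

    /** Arithmetic mean of the sample. */
    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    /** Unbiased sample variance (divisor n - 1). */
    static double variance(double[] x) {
        double m = mean(x);
        double ss = 0.0;
        for (double v : x) {
            ss += (v - m) * (v - m);
        }
        return ss / (x.length - 1);
    }

    /** Enforces the documented precondition of at least 2 observations. */
    static void checkSample(double[] x) {
        if (x == null || x.length < 2) {
            throw new IllegalArgumentException("sample must contain at least 2 values");
        }
    }

    /** One-sample statistic: t = (mean - mu) / sqrt(var / n). */
    static double t(double mu, double[] observed) {
        checkSample(observed);
        return (mean(observed) - mu) / Math.sqrt(variance(observed) / observed.length);
    }

    /** Paired statistic: one-sample t on the signed differences, with mu = 0. */
    static double pairedT(double[] s1, double[] s2) {
        checkSample(s1);
        checkSample(s2);
        if (s1.length != s2.length) {
            throw new IllegalArgumentException("paired samples must have equal length");
        }
        double[] diff = new double[s1.length];
        for (int i = 0; i < s1.length; i++) {
            diff[i] = s1[i] - s2[i];
        }
        return t(0.0, diff);
    }

    /** Unequal-variance statistic: t = (m1 - m2) / sqrt(var1/n1 + var2/n2). */
    static double t(double[] s1, double[] s2) {
        checkSample(s1);
        checkSample(s2);
        return (mean(s1) - mean(s2))
            / Math.sqrt(variance(s1) / s1.length + variance(s2) / s2.length);
    }

    /** Pooled-variance statistic: t = (m1 - m2) / (sqrt(1/n1 + 1/n2) sqrt(var)). */
    static double homoscedasticT(double[] s1, double[] s2) {
        checkSample(s1);
        checkSample(s2);
        double n1 = s1.length;
        double n2 = s2.length;
        double pooled = ((n1 - 1) * variance(s1) + (n2 - 1) * variance(s2)) / (n1 + n2 - 2);
        return (mean(s1) - mean(s2)) / (Math.sqrt(1.0 / n1 + 1.0 / n2) * Math.sqrt(pooled));
    }

    /** Welch-Satterthwaite approximation to the degrees of freedom. */
    static double welchDegreesOfFreedom(double[] s1, double[] s2) {
        checkSample(s1);
        checkSample(s2);
        double a = variance(s1) / s1.length;
        double b = variance(s2) / s2.length;
        return (a + b) * (a + b)
            / (a * a / (s1.length - 1) + b * b / (s2.length - 1));
    }

    public static void main(String[] args) {
        double[] first = {1, 2, 3};
        double[] second = {2, 4, 6};
        System.out.println("one-sample t vs mu=0: " + t(0.0, first));
        System.out.println("paired t:             " + pairedT(second, first));
        System.out.println("Welch t:              " + t(first, second));
        System.out.println("pooled t:             " + homoscedasticT(first, second));
        System.out.println("Welch df:             " + welchDegreesOfFreedom(first, second));
    }
}
```

With equal sample sizes, as in `main`, the pooled and unequal-variance statistics coincide; they differ only in the degrees of freedom used for the p-value (`n1 + n2 - 2` versus the Welch-Satterthwaite value).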