T-test
1 Student’s T-test
William Sealy Gosset is credited with inventing the t-test while working for the Guinness brewery. He first described it in a paper published under the pseudonym “Student”, in order to protect the commercial interests of Guinness, and the idea was later refined and supported by the great statistician R. A. Fisher. Today, it is perhaps one of the most prevalent and basic tools in statistics, and its origin is a fascinating story.
1.1 Objectives
The question of the t-test
Data and assumptions
Graphing
Test and alternatives
Practice exercises
2 The question of the t-test
The t-test is typically used to compare the means of populations you are interested in, which you measure with samples. There are a few versions of the basic question.
Compare two independent samples
- Here you have measured a numeric variable and have two samples.
- Are the means of the two samples different (i.e. did the samples come from different populations)?
- Example: An experiment with a control and one treatment group.
Compare 1 sample to a known mean
- Here you have one sample which you wish to compare to a mean value.
- Did the sample come from a population exhibiting the known mean?
- The data are simply a single numeric vector, and the population mean for comparison.
Paired samples
- Here the individual observations comprising the 2 samples are not independent.
- Example: Before vs After experiments
- Another example: Measuring plots that are paired spatially
- For each of these examples, there is a unit, patient, or plot identification, that represents the relationship of each paired measure.
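The three versions of the question map onto three ways of calling R's t.test(). The sketch below uses simulated data; the variable names (control, treatment, before, after), the sample sizes, and the reference mean of 10 are illustrative assumptions, not values from this page.

```r
# Simulated data: every name and number here is an illustrative assumption.
set.seed(42)
control   <- rnorm(20, mean = 10, sd = 2)
treatment <- rnorm(20, mean = 12, sd = 2)

# 1. Two independent samples.
#    var.equal = TRUE requests the classic Student's test (pooled variance).
two_sample <- t.test(control, treatment, var.equal = TRUE)

# 2. One sample compared to a known mean, supplied through mu.
one_sample <- t.test(control, mu = 10)

# 3. Paired samples: before[i] and after[i] belong to the same unit,
#    so the two vectors must be in the same order and of equal length.
before <- rnorm(15, mean = 50, sd = 5)
after  <- before + rnorm(15, mean = 2, sd = 1)
paired <- t.test(before, after, paired = TRUE)

# Each call returns an object describing which test was run.
print(two_sample$method)
print(one_sample$method)
print(paired$method)
```

For the paired version, the order of observations carries the pairing, so sorting one vector independently of the other would silently break the test.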
3 Data and assumptions
The principal assumptions of the t-test are:
Gaussian distribution of observations WITHIN each sample
Homoscedasticity (our old friend) - i.e., the variance is equal in each sample
Independence of observations
3.1 Evaluating and testing the assumptions
- The t-test is thought to be somewhat robust to violation of assumptions
- To a certain extent, violations of the Gaussian distribution and homoscedasticity assumptions won’t bias your results.
- The assumption of independence of observations is always of high importance.
Testing assumptions
Gaussian distribution of observations WITHIN each sample
- Plot and evaluate a histogram (hist()) and a q-q plot (e.g., with qqplot())
- Use a statistical test evaluating whether data are Gaussian (e.g., shapiro.test())
- NB: test EACH SAMPLE SEPARATELY (this is sometimes confusing for beginners).
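As a minimal sketch of these checks, the code below inspects two simulated samples one at a time. The names group1 and group2 and their values are assumptions for illustration. The q-q plots here use qqnorm() with qqline(), which is the base R route to comparing one sample against a Gaussian distribution.

```r
# Simulated samples: names and values are illustrative assumptions.
set.seed(1)
group1 <- rnorm(30, mean = 5, sd = 1)
group2 <- rnorm(30, mean = 6, sd = 1)

# Graphical checks, one sample at a time (2 x 2 panel layout).
op <- par(mfrow = c(2, 2))
hist(group1, main = "group1")
qqnorm(group1, main = "group1 q-q plot"); qqline(group1)
hist(group2, main = "group2")
qqnorm(group2, main = "group2 q-q plot"); qqline(group2)
par(op)  # restore the previous plotting layout

# Shapiro-Wilk test on EACH sample separately, not on the pooled data.
# A small p-value is evidence against a Gaussian distribution.
sw1 <- shapiro.test(group1)
sw2 <- shapiro.test(group2)
print(sw1$p.value)
print(sw2$p.value)
```

Pooling the two samples before testing would mix two different means into one vector, which can look non-Gaussian even when each sample is perfectly Gaussian on its own.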
And if the data are guilty of violating the Gaussian assumption?
- The Mann-Whitney U-test does not require Gaussian data
- It is also called the Wilcoxon rank-sum test (wilcox.test() in R)
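When the Gaussian assumption looks doubtful, the rank-based alternative can be run with wilcox.test(). This is a sketch on simulated, deliberately skewed data; group1, group2, and the exponential rates are assumptions chosen for illustration.

```r
# Right-skewed simulated data: names and rates are illustrative assumptions.
set.seed(7)
group1 <- rexp(25, rate = 1)
group2 <- rexp(25, rate = 0.5)

# Mann-Whitney U-test, which R calls the Wilcoxon rank sum test.
# It compares the ranks of the observations rather than their means.
mw <- wilcox.test(group1, group2)

print(mw$method)
print(mw$p.value)
```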
Homoscedasticity assumption
- Examine graphically
# Bartlett's Test
bartlett.test(list(group1, group2))
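The sketch below puts the graphical check and Bartlett's test together on simulated data with deliberately unequal spread; group1, group2, and their standard deviations are illustrative assumptions. It also shows one common response to unequal variances, Welch's t-test, which is what t.test() runs by default and which does not assume equal variances.

```r
# Simulated samples with unequal variances: names and values are assumptions.
set.seed(3)
group1 <- rnorm(30, mean = 10, sd = 1)
group2 <- rnorm(30, mean = 10, sd = 3)

# Graphical check: compare the spread of the two boxes.
boxplot(list(group1 = group1, group2 = group2), ylab = "value")

# Bartlett's test: a small p-value is evidence that the variances differ.
bt <- bartlett.test(list(group1, group2))
print(bt$p.value)

# Welch's t-test: var.equal = FALSE is the default of t.test(),
# so the equal-variance assumption is not needed here.
welch <- t.test(group1, group2)
print(welch$method)
```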
Independence assumption
- Can’t really be directly tested without supporting information
- It’s up to you to ensure your analysis is being done on the right data
- In time series and spatial data, you can test for autocorrelation (out of scope here)
Typical output of the t.test() function
The t value; this can be positive or negative. The larger the absolute value of t, the stronger the evidence that the samples came from different populations.
The degrees of freedom; the number of independent data points available to estimate the t-statistic.
The 95% confidence interval; this gives a range of values that is likely to contain the true difference in means between the populations from which the samples were taken.
The P-value; the probability of observing a t value at least as extreme as this one if there were truly no difference in means.
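Each of these pieces can be pulled out of the object that t.test() returns. The following is a sketch on simulated data, where group1, group2, and their means are illustrative assumptions.

```r
# Simulated samples: names and values are illustrative assumptions.
set.seed(10)
group1 <- rnorm(20, mean = 10, sd = 2)
group2 <- rnorm(20, mean = 12, sd = 2)

res <- t.test(group1, group2, var.equal = TRUE)
print(res)       # the full printed summary

# Individual components of the result:
res$statistic    # the t value; its sign depends on the order of the groups
res$parameter    # the degrees of freedom (n1 + n2 - 2 for this version)
res$conf.int     # 95% confidence interval for the difference in means
res$p.value      # the P-value
```

Swapping the order of the two groups flips the sign of t and of the confidence interval, but leaves the P-value unchanged.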