One-way ANOVA & the GLM

Week 7

T-test

  • two samples
  • are they from two populations with different means?
  • or are they from the same population
    • (and sample differences are due to random chance?)
# A tibble: 12 × 2
       x g        
   <dbl> <fct>    
 1    90 control  
 2   115 control  
 3   108 control  
 4   110 control  
 5   120 control  
 6   104 control  
 7    97 treatment
 8   121 treatment
 9   120 treatment
10   106 treatment
11    79 treatment
12   153 treatment
          g        x
1   control 107.8333
2 treatment 112.6667

T-test

  • what is the probability of taking two samples of size N=6 from the same population and observing a difference in means as large as the one observed?
  • this is the probability of observing the data under H_{0}, the null hypothesis
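In R, this probability is the p-value from a two-sample t-test. A minimal sketch, assuming the 12-row tibble shown above is stored as dat (a hypothetical name):

t.test(x ~ g, data = dat, var.equal = TRUE) # classic two-sample t-test of H0: equal means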

ANOVA: N groups

  • H_{0}: each group was sampled from the same population
  • H_{1}: at least one group was sampled from a population with a different mean
  • p-value: what is the probability of observing differences between groups as large as the ones observed, if H_{0} is true?
    • if we sampled three groups of size N=4 from the same population?
# A tibble: 12 × 2
       x g         
   <dbl> <fct>     
 1    96 treatment1
 2    71 treatment1
 3    82 treatment1
 4   134 treatment1
 5   124 treatment2
 6   103 treatment2
 7   114 treatment2
 8   124 treatment2
 9   110 treatment3
10   131 treatment3
11   131 treatment3
12   133 treatment3
           g      x
1 treatment1  95.75
2 treatment2 116.25
3 treatment3 126.25
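Jumping ahead a little: the omnibus test for these data is one line of R. A minimal sketch, assuming the 12-row tibble shown above is stored as dat (a hypothetical name):

summary(aov(x ~ g, data = dat)) # omnibus F-test across the three treatment groups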

ANOVA

  • ANOVA stands for ANalysis Of VAriance
  • ANOVA is a statistical test that compares the means of two or more groups
  • many forms of ANOVA exist, but we will start with the simplest:
    • one-way between-subjects ANOVA
  • (read Navarro, chapter 14)

one-way between-subjects ANOVA

  • one-way: one independent variable
    • later we will see two-factor ANOVA and n-factor ANOVA
  • between-subjects: each participant contributes an observation in only one group
    • later we will see within-subjects ANOVA and mixed ANOVA

Omnibus F-test

  • ANOVA computes an “omnibus” F-statistic, which is a ratio of two variances
    • (omnibus means “overall”)
    • the numerator is the between-groups variance
    • the denominator is the within-groups variance
  • the omnibus F-statistic addresses the “overall” question:
    • “are the means of the groups all the same, or not?”

Omnibus F-test

  • F = \frac{\mathrm{BetweenVariance}}{\mathrm{WithinVariance}}
  • what is F going to be?
  • F-ratio far above 1.0: between-groups variance is larger than within-groups variance
  • F-ratio below 1.0: between-groups variance is smaller than within-groups variance

Omnibus F-test

  • the null hypothesis is that the means of all groups are equal
    • H_{0}: each group was sampled from the same population
    • H_{1}: at least one group was sampled from a different population (with a different population mean)
  • the p-value for the omnibus F-test is the probability of observing an F-statistic at least as large as the one computed, assuming that the null hypothesis is true

Distribution of F under H_{0}

  • under the null hypothesis, there are no differences between groups in the population
  • but random sampling results in differences between sample groups
  • the F-statistic is an overall (omnibus) measure of the differences between all groups
  • under the null hypothesis we expect the F-statistic to be close to 1.0 most of the time
  • but due to random sampling, under the null hypothesis, sometimes it will be larger
  • the p-value tells us how likely it is to observe a given F-statistic under H_{0}
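This is easy to see by simulation. A minimal sketch (the three-groups-of-six design, number of replications, and seed are illustrative choices, not from the slides):

set.seed(1) # for reproducibility
f.stats <- replicate(5000, {
  # three groups of n = 6, all sampled from the SAME normal population
  d <- data.frame(x = rnorm(18),
                  g = factor(rep(c("g1", "g2", "g3"), each = 6)))
  summary(aov(x ~ g, data = d))[[1]][["F value"]][1]
})
hist(f.stats, breaks = 50)                  # most F values sit near 1, with a long right tail
mean(f.stats > qf(.95, df1 = 2, df2 = 15))  # ~0.05 of them exceed the 5% critical value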

Omnibus F-test

  • following a significant omnibus F-test, we can perform follow-up tests to determine which groups differ from each other
  • not this week—we will cover follow-up tests next week
  • if the omnibus F-test is not significant, we should stop
    • the omnibus F-test protects us from making more Type I errors than we want
    • more about this next week

ANOVA Table/Formulas

  • \mathrm{SS_{b}} = sum of squares between
    • the squared deviation of each group mean from the grand mean, summed over groups (weighted by group size)
  • \mathrm{SS_{w}} = sum of squares within
    • the squared deviation of each observation from its own group mean, summed over all observations
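In symbols, for k groups with N_{j} observations in group j (N observations in total):

  • \mathrm{SS_{b}} = \sum_{j=1}^{k} N_{j} (\bar{Y}_{j} - \bar{Y})^{2}
  • \mathrm{SS_{w}} = \sum_{j=1}^{k} \sum_{i=1}^{N_{j}} (Y_{ij} - \bar{Y}_{j})^{2}
  • F = \frac{\mathrm{MS_{b}}}{\mathrm{MS_{w}}}, where \mathrm{MS_{b}} = \frac{\mathrm{SS_{b}}}{k-1} and \mathrm{MS_{w}} = \frac{\mathrm{SS_{w}}}{N-k}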

ANOVA Table/Formulas

  • read Navarro, chapter 14, for a worked example, going from the raw data to the ANOVA table
  • today, we will use R to compute the ANOVA table

An illustrative dataset

library(tidyverse)
load(url("https://www.gribblelab.org/2812/data/clinicaltrial.Rdata"))
(clin.trial <- tibble(clin.trial))
# A tibble: 18 × 3
   drug     therapy    mood.gain
   <fct>    <fct>          <dbl>
 1 placebo  no.therapy       0.5
 2 placebo  no.therapy       0.3
 3 placebo  no.therapy       0.1
 4 anxifree no.therapy       0.6
 5 anxifree no.therapy       0.4
 6 anxifree no.therapy       0.2
 7 joyzepam no.therapy       1.4
 8 joyzepam no.therapy       1.7
 9 joyzepam no.therapy       1.3
10 placebo  CBT              0.6
11 placebo  CBT              0.9
12 placebo  CBT              0.3
13 anxifree CBT              1.1
14 anxifree CBT              0.8
15 anxifree CBT              1.2
16 joyzepam CBT              1.8
17 joyzepam CBT              1.3
18 joyzepam CBT              1.4
  • DV is mood.gain
  • IV is drug
  • drug is a factor with 3 levels
    • placebo (n=6)
    • anxifree (n=6)
    • joyzepam (n=6)
  • any factors in your dataset must be set in R as a factor <fct> — this is critical for computing ANOVA in R
    • factor() will convert numbers or characters to a <fct> (see the sketch after this list)
  • for now let’s ignore the therapy factor in the dataset.
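A minimal sketch of such a conversion, in case drug had been read in as a character column (the level names are as they appear in the data):

clin.trial$drug <- factor(clin.trial$drug,
                          levels = c("placebo", "anxifree", "joyzepam"))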

An illustrative dataset

  • how many participants in each group?
xtabs( ~drug, clin.trial )
drug
 placebo anxifree joyzepam 
       6        6        6 
  • mean of each group?
aggregate( mood.gain ~ drug, clin.trial, mean )
      drug mood.gain
1  placebo 0.4500000
2 anxifree 0.7166667
3 joyzepam 1.4833333
  • sd of each group?
aggregate( mood.gain ~ drug, clin.trial, sd )
      drug mood.gain
1  placebo 0.2810694
2 anxifree 0.3920034
3 joyzepam 0.2136976

An illustrative dataset

  • or, using our dplyr/tidyverse skills:
clin.trial %>% 
    group_by(drug) %>% 
    summarise(n = n(),
              mean = mean(mood.gain),
              sd = sd(mood.gain))
# A tibble: 3 × 4
  drug         n  mean    sd
  <fct>    <int> <dbl> <dbl>
1 placebo      6 0.45  0.281
2 anxifree     6 0.717 0.392
3 joyzepam     6 1.48  0.214

An illustrative dataset

  • plotmeans() from the gplots package
library(gplots)
plotmeans( formula = mood.gain ~ drug, # plot mood.gain by drug
  data = clin.trial,                   # the data frame
  xlab = "Drug Administered",          # x-axis label
  ylab = "Mood Gain",                  # y-axis label
  n.label = FALSE                      # don't display sample size
 )

An illustrative dataset

or the ggplot way:

clin.trial %>% 
    group_by(drug) %>% 
    summarise(mood = mean(mood.gain),
              sd   = sd(mood.gain),
              N    = n(),
              se   = sd/sqrt(N),
              ci95 = se * qt(.975,N-1)) %>%
    ggplot(aes(x=drug,y=mood)) +
        geom_errorbar(aes(ymin  = mood-ci95, 
                          ymax  = mood+ci95),
                          width = 0.1) +
        geom_point(aes(x=drug,y=mood),
                   size = 3) + 
        geom_jitter(data=clin.trial, 
                    aes(x=drug,y=mood.gain),
                    width = 0.1,
                    color = "red",
                    alpha = 0.8) + 
        labs(x="Drug", y="Mood Gain",
             title="Clinical Trial Data",
             subtitle="(from the Navarro text)") + 
        theme_bw(base_size=16)

The ANOVA Omnibus F-test

  • are the differences between group means real, or due to random chance?
  • what is the probability of observing differences as big as this if the null hypothesis is true?
  • H_{0}: each group was sampled from population(s) with the same mean
  • H_{1}: it is not true that each group was sampled from population(s) with the same mean
  • Omnibus F-test: F = \frac{\mathrm{MS_{between}}}{\mathrm{MS_{within}}}
    • gives an overall measure of the difference between all group means together, relative to the variability within each group

ANOVA in R - aov()

my.anova <- aov( mood.gain ~ drug, data = clin.trial )
summary(my.anova) 
            Df Sum Sq Mean Sq F value   Pr(>F)    
drug         2  3.453  1.7267   18.61 8.65e-05 ***
Residuals   15  1.392  0.0928                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

  • p < 0.05, so we reject the null hypothesis
  • the groups were not all sampled from population(s) with the same mean
  • what is different from what? We don’t know yet; the omnibus F-test just tells us that there are differences somewhere
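As a sanity check, the p-value in the table is just the upper-tail area of the F(2, 15) distribution at the observed F-statistic:

pf(18.61, df1 = 2, df2 = 15, lower.tail = FALSE) # ~8.65e-05, matching Pr(>F) above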

ANOVA in R : follow-up tests

  • next week we will talk about post-hoc tests, which allow us to determine which groups are different from each other
  • an issue of great importance is Type-I Error rate
  • more on this next week

ANOVA Assumptions

  1. Normality of the residuals
  2. Homogeneity of variance across groups
  3. Independence of observations

1. Normality

  • residuals are assumed to be normally distributed
  • plot a histogram of the residuals
  • plot a Q-Q plot (quantile-quantile plot)
  • Shapiro-Wilk hypothesis test

1. Normality

hist(residuals(my.anova))              # histogram of the residuals

qqnorm(y=residuals(my.anova),pch=16)   # Q-Q plot: sample vs. normal quantiles
qqline(y=residuals(my.anova),lty=2)    # dashed reference line for a normal sample

1. Normality

shapiro.test(residuals(my.anova))

    Shapiro-Wilk normality test

data:  residuals(my.anova)
W = 0.96019, p-value = 0.6053
  • H_{0}—samples from population(s) with a normal distribution
  • H_{1}—not from a pop. with a normal distribution
  • p >> .05
  • we fail to reject the null hypothesis
  • normality assumption not violated

1. Normality

  • if the normality assumption is violated, we can still test for group differences
  • but we should use a non-parametric test instead of the standard ANOVA
  • Kruskal-Wallis test for non-parametric ANOVA
  • works on ranks of the data not the data itself
  • kruskal.test() in R
  • see Navarro, Ch. 14.10 for details about how it works

1. Normality

kruskal.test(mood.gain ~ drug, data = clin.trial)

    Kruskal-Wallis rank sum test

data:  mood.gain by drug
Kruskal-Wallis chi-squared = 12.076, df = 2, p-value = 0.002386
  • just like our regular ANOVA, p < .05, so we reject the null hypothesis that the groups were sampled from the same population(s) with the same mean(s)

2. Homogeneity of Variance

  • population standard deviation is the same for all groups
  • test using our sample data: residuals are assumed to have the same variance across groups
  • Levene’s test for homogeneity of variance
  • leveneTest() in the car package

2. Homogeneity of Variance

library(car) # you may need to install.packages("car") once
leveneTest(mood.gain ~ drug, data = clin.trial)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  2  1.4672 0.2618
      15               
  • H_{0}—samples are from population(s) with same sd
  • H_{1}—not from populations with same sd
  • p >> .05
  • we fail to reject the null hypothesis
  • homogeneity of variance assumption not violated

2. Homogeneity of Variance

  • if homogeneity of variance assumption is violated, we can still use ANOVA
  • use the Welch one-way test instead of the regular ANOVA F-test
  • oneway.test() in R
  • see Navarro, Ch. 14.8 for details

2. Homogeneity of Variance

oneway.test(mood.gain ~ drug, data = clin.trial)

    One-way analysis of means (not assuming equal variances)

data:  mood.gain and drug
F = 26.322, num df = 2.0000, denom df = 9.4932, p-value = 0.000134
  • just like our regular ANOVA, p < .05, so we reject the null hypothesis that the groups were sampled from the same population(s) with the same mean(s)

3. Independence of observations

  • observations are assumed to be independent of each other
  • e.g. via random sampling from each group
  • e.g. via random assignment to each group
  • no data-driven test for this assumption
  • based on your conceptual understanding of the experiment and the data

3. Independence of observations

  • an example of a clear violation:
  • repeated measures within each group
    • (e.g., if each person is measured twice, the two observations from that person are not independent)
    • we will cover repeated measures ANOVA at the end of the course

ANOVA as a linear model

  • ANOVA is a linear model of your data
  • just like regression / multiple regression:
  • Y_{ij} = \mu_{j} + \epsilon_{ij}
  • Y_{ij} is the i^{th} observation in the j^{th} group
  • \mu_{j} is the mean of the j^{th} group
  • \epsilon_{ij} is the residual for the i^{th} observation in the j^{th} group

ANOVA as a linear model: H_{1}

Y_{ij} = \mu_{j} + \epsilon_{ij}

  • each group was sampled from a population with its own mean, \mu_{j}
  • this is H_{1}, the alternate hypothesis
  • what is the null hypothesis H_{0}?

ANOVA as a linear model: H_{0}

Y_{ij} = \mu + \epsilon_{ij}

  • each group was sampled from population(s) with the same mean, \mu
  • this is H_{0}, the null hypothesis

ANOVA as a linear model: model comparison

  • Y_{ij} = \mu_{j} + \epsilon_{ij}
    • this is the full model H_{1}
    • different population mean \mu_{j} for each group j
  • Y_{ij} = \mu + \epsilon_{ij}
    • this is the restricted model H_{0}
    • same population mean \mu for all groups

ANOVA as a linear model: sample data

Group 1   Group 2   Group 3
   4         7         6
   5         4         9
   2         6         8
   1         3         5
   3         5         7
mean=3    mean=5    mean=7

ANOVA as a linear model: H_{1}

  • full model (H_{1})
  • Y_{ij} = \mu_{j} + \epsilon_{ij}
  • different population mean \mu_{j} for each group j
  • 3 parameters to estimate: \mu_{1}, \mu_{2}, \mu_{3}
  • best estimates: \bar{Y}_{1}, \bar{Y}_{2}, \bar{Y}_{3}

ANOVA as a linear model: H_{0}

  • restricted model (H_{0})
  • Y_{ij} = \mu + \epsilon_{ij}
  • same population mean \mu for all groups j
  • 1 parameter to estimate: \mu
  • the “grand mean” (average of all observations)
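A quick check in R, entering the toy data from the table above (the vector and level names are hypothetical):

y <- c(4, 5, 2, 1, 3,  7, 4, 6, 3, 5,  6, 9, 8, 5, 7) # all 15 observations
g <- factor(rep(c("g1", "g2", "g3"), each = 5))        # group labels
tapply(y, g, mean) # group means 3, 5, 7: estimates of each mu_j under H1
mean(y)            # grand mean 5: estimate of the single mu under H0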

ANOVA as a linear model: H_{1} vs H_{0}

  • Y_{ij} = \mu_{j} + \epsilon_{ij} : estimate 3 parameters
  • Y_{ij} = \mu + \epsilon_{ij} : estimate 1 parameter
  • is it worth it to estimate 3 parameters instead of 1?
  • we “pay” by moving 2 degrees of freedom from the \mathrm{MS_{w}} term to the \mathrm{MS_{b}} term
  • the increase in \mathrm{SS_{b}} had better be worth it!
  • just like in linear regression, the ANOVA F-test is a test of this trade-off
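In R this model comparison can be run directly. A minimal sketch using the clinical-trial data from earlier (the model names m0 and m1 are hypothetical):

m0 <- lm(mood.gain ~ 1,    data = clin.trial) # restricted model (H0): grand mean only
m1 <- lm(mood.gain ~ drug, data = clin.trial) # full model (H1): one mean per group
anova(m0, m1) # F-test of the trade-off; same F and p as the omnibus ANOVA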

Linear Models

  • Y_{ij} = \mu_{j} + \epsilon_{ij}
  • or recast in terms of \beta coefficients:
  • Y_{ij} = \beta_{0} + \beta_{j} + \epsilon_{ij}
  • grand-mean + effect-of-group + error
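This is exactly the model that lm() fits. A minimal sketch; note that with R's default treatment contrasts the intercept is the mean of the first factor level (placebo here) and each coefficient is another group's difference from it, rather than a literal grand-mean-plus-effects coding:

my.lm <- lm(mood.gain ~ drug, data = clin.trial)
coef(my.lm)  # (Intercept) = placebo mean (0.45); druganxifree and
             # drugjoyzepam are differences from it (~0.267 and ~1.033)
anova(my.lm) # reproduces the omnibus F-test from aov()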