t-tests and their applications 🍵

---

layout: true
  
<div class="my-footer">
<span>
<a href="https://psychmethods.github.io/coursenotes/" target="_blank">Methods in Psychological Research</a>
</span>
</div>

---

# t-tests and their applications

---

## Roadmap: Last Week 
.large[
- p-values
- hypothesis testing
- experimental designs
]

---

## Roadmap: This Week
.large[
- t-test logic
- one sample t-tests
- two sample t-tests
- confidence intervals
]

---

# Recall: Z-scores

---

## From Z to t

.pull-left[
Remember `$Z$`-scores? We use them to:
- Standardize values
- Compare sample means to populations (Ok, more like a null-hypothesized value)
- Calculate probabilities
]
--
.pull-right[

- Through this process, we learned several key principles about:
	- hypothesis testing, 
	- statistical power, the factors that influence it, and
	- the sample size `n` needed to achieve it.

]

---

## From Z to t

.pull-left[
There's a catch...
]
--
.pull-right[
For Z-scores we need:
- Population mean `$\mu$`
- Population standard deviation `$\sigma$`
- Sample mean `$\bar{X}$`
- Sample size (`n`)
]

---

## The real world problem

.pull-left[
.hand[What we usually have]:
- Sample mean `$\bar{X}$`
- Sample standard deviation (s)
- Sample size (n)
]

.pull-right[
.hand[What we often don't have:]
- Population mean `$\mu$`
- Population standard deviation `$\sigma$`
]

---

## Enter the t-statistic

We can modify our Z-score formula to work with what we actually have:

$$
`\begin{align*}
Z = \frac{\bar{X}- \mu}{\sigma/\sqrt{n}}
\end{align*}`
$$

---

## Enter the t-statistic

We can modify our Z-score formula to work with what we actually have:

$$
`\begin{align*}
Z &= \frac{\bar{X}- \mu}{\sigma/\sqrt{n}}
\end{align*}`
$$

$$
`\begin{align*}
Z_{modified} &= \frac{M- \mu_{0} }{s/\sqrt{n}}
\end{align*}`
$$

---

- The only change is using substitute an estimate of `$\sigma$` (*i.e.*,  `$\hat{\sigma}$` ) instead of `$\sigma$`
- Seems simple to replace the sample standard deviation (*s*), but this has big implications...

---

## Thinking Intuitively

- The original `$Z$` statistic modified M by subtracting a constant, 
  - then dividing by a constant.
- The only thing in the `$Z$` statistic that would vary over repeated samples is
  - M, the sample mean.
- This means that the distribution of `$Z$` has to be the
  - same shape as the distribution of M

---

## Thinking Intuitively
.pull-left[
- The modified `$Z$` statistic has a 
  - sample quantity in its denominator.
- That quantity varies over repeated samples along with M.
- So now, instead of only one thing varying, 
  - you have two.
- It turns out that, as `$n$` gets larger and larger, 
  - these modifications matter less and less, 
  - because `$s$` starts acting more and more like the constant that it is estimating.
]
.pull-right[
$$
`\begin{align*}
Z_{modified} &= \frac{M- \mu_{0} }{s/\sqrt{n}}
\end{align*}`
$$
]

---

## Thinking Intuitively

- In fact, it was known back around 1900 that, as `$n$` goes to infinity, 
  - the modified `$Z$` statistic's distribution got closer and closer to 
  - the distribution of the original `$Z$` statistic.
- What people didn't know was...
  - how to characterize the performance of the modified `$Z$` statistic at small sample sizes.
  
---

## Student's T
.pull-left[
- W.S. Gossett was a statistician working for the Guinness brewery,
  - when he derived the exact distribution of `$Z_{modified}$` under some specific conditions.
- This development was considered something of a landmark by the statistical community.
- However, due confidentiality and conflicts of interest, 
  - Gossett published his work under the pen name of "Student".

]

.pull-right[
<img src="data:image/png;base64,#../img/William_Sealy_Gosset.jpg" width="65%" style="display: block; margin: auto;" />

]
---

## Student's T

.pull-left[
- Thus, the modified `$Z$` statistic became known as
  - “Student's t statistic” in his honor.
- The distribution of the statistic became known as
  - “Student’s t distribution,” 
- This statistic and distribution has many applications 
  - beyond what we are reviewing here
  
$$
t = \frac{\bar{X}- \mu}{s/\sqrt{n}}
$$
]

]
---

## t-distribution facts

.pull-left[
- the t-distribution varies as a function of:
   - degrees of freedom (df)
- For the sake of simplicity, df will be defined as one less than the number of observations in the sample. 
- Df = n-1 (n: sample size)

.center.large[
`$t_{df} = \frac{z}{ \sqrt{\frac{ \chi^{2}_{df}}{df}}}$`
]
]
.pull-right[
<img src="data:image/png;base64,#01_roadmap_files/figure-html/ztplot-1.png" width="90%" style="display: block; margin: auto;" />

]

---

---

## Z vs T
<br>

|   | Z-Distribution | T-distribution |
|---|---|---|
| Do we know the population variance? | Yes | No. We substitute sample variance `$s^{2}$` for `$\sigma^{2}$`|
| shape | Bell shaped and symmetric | Bell shaped and symmetric, but fatter tail than z-distribution. |
| Mean  | 0  | 0 |
| Variance  | 1 | df/(df-2): proportionally larger than z |
| score   | `$Z = \frac{M- \mu_{0} }{\sigma/\sqrt{n}}$`  | `$t = \frac{M- \mu_{0} }{s/\sqrt{n}}$`  |

---

# Wrapping Up...
---

# Three t-test Applications

---

## Three t-test Applications

- One sample t-test
  - Used when we want to know whether a sample we collected come from a particular population with unknown mean `$\mu$`. 
  - (similar to what we did with z-test so far)
--

- Matched pair t-test
  - Used when the two samples of data were related or provided by the same participants 
  - (*e.g.*, pre- and post-test)
--
  
- Independent sample t-test
  - test the difference between the means of two independent groups 
  - (*e.g.*, treatment and control group)

---

## General Procedure
.large[
1. Decide what type of test we want to use
2. Decide what the null and alternative hypothesis is.
3. List what we have
4. Compute t-statistic
5. finding critical value in t-table
6. Compare t-statistic to t* 
  - (we can also calculate p and compare to `$\alpha$`)
7. Make decision: reject null or not, and draw conclusion
]
---

## One Sample t-test

- Used when we want to know whether a sample we collected come from a particular population with unknown mean `$\mu$`. 
--

- Example:
- We had a group of 28 psychology minors who took a quiz on 1990s music, but clearly had no idea about the era (sample mean: 46.21, sample sd: 6.73). 
- If the students were simply guessing—like they'd never even heard of Nirvana or the Spice Girls—
  - we would expect they’d score about 20 (OUT OF 100) by pure chance.
--

- Question:
  - Did the students just guess by chance, or do they know a little more than expected?

---

## Workflow

- `1`. Decide what type of test we want to use
  - We don't know population sd 
      - ∴  t-test
  - We only have one sample, and we want to know whether it is from a particular population.
      - ∴  one sample t-test
- `2`. Decide what the null and alternative hypothesis is.
  - Null: students are guessing. `$H_{0}: \mu=20$`  `$(\mu_0)$`
  - Alternative: students are not guessing. `$H_{1}: \mu\ne20$`  `$(\mu_1)$`

---

## Workflow

- `3`. List what we have
  - `$\bar{x}= 46.21$`
  - `$\mu=20$`
  - `$N=28$`
  - `$s=6.73$`
- `4.`  Compute t-statistic
  - `$t = \frac{M- \mu_{0} }{s/\sqrt{n}}$` = 
  - (46.21−20)/(6.73/ `$\sqrt{28}$`)= 20.61
  
---

## Workflow
  
- `5`. finding critical value in t-table
  - Df = n-1 = 27
  - Because we specified the test as two-tailed, 
     - a 95% CI will have a upper tail probability of 0.05/2= 0.025
  - t*=2.051
- `6`. Compare t-statistic to t* (we can also calculate p and compare to `$\alpha$`)
  - 20.61 > 2.051
- `7`. Make decision: reject null or not, and draw conclusion
   - We reject null hypothesis based on the results, the students are not guessing.

---

## One-sample t-test: does mean = X?

- e.g. Taylor Swift’s latest album, The Tortured Poets Department, is said to average 2.1 million streams per track each month.
**Swifties want to know if this album really lives up to the expectations.**

---

## One-sample t-test: Does The Tortured Poets Department Hit the Right Note?

- Null hypothesis, `$H_0$`:
    + Average monthly plays per track = 2.1 million
- Alternative hypothesis: `$H_1$`:
    + Average monthly plays per track `$\ne$` 2.1 million
- Tails: *two-tailed*
- Either *reject* or *do not reject* the null hypothesis

---

## One sample t-test; the data

.small.pull-left[

|Month     | Monthly.play.count|
|:---------|------------------:|
|January   |               2.90|
|February  |               2.99|
|March     |               2.48|
|April     |               1.48|
|May       |               2.71|
|June      |               4.17|
|July      |               3.74|
|August    |               3.04|
|September |               1.23|
|October   |               2.72|
|November  |               3.23|
|December  |               3.40|
]
.pull-right[

- mean = `$(2.9 + \dots + 3.40) / 12$` = 2.841
- Standard deviation = 0.837 million plays
- Hypothesized Mean = 2.1 million
]

---

## One-sample t-test; key assumptions

.pull-left-narrow[
- Observations are independent
- Observations are normally distributed
]
.pull-right-wide[
<img src="data:image/png;base64,#01_roadmap_files/figure-html/unnamed-chunk-4-1.png" width="90%" style="display: block; margin: auto;" />
]

---

## One-sample t-test; results
.pull-left[
- Test statistic:

`$t_{n-1} = t_{11} = \frac{\bar{x} - \mu_0}{s.d. / \sqrt{n}}$`

`$= \frac{2.84 - 2.10}{s.e.(M)} =$` 3.065

]
.pull-right[

<img src="data:image/png;base64,#01_roadmap_files/figure-html/unnamed-chunk-6-1.png" width="90%" style="display: block; margin: auto;" />
]

---

## One-sample t-test; results

---

## One-sample t-test; results

- Test statistic: `$t_{n-1} = t_{11} = \frac{\bar{x} - \mu_0} {s.d. / \sqrt{n}} = \frac{2.84 - 2.10}{s.e.(\bar{x})} =$` 3.065
- df = 11
- P = 0.01
- ***Reject*** `$H_0$`
- Evidence that Taylor’s monthly play count `$\ne$` 2.1 million.
---

## One-sample t-test; results

- Taylor’s average monthly play count per track is 2.84 million (95% CI: 2.30, 3.37).
- It is not equal to the hypothesized mean told to the Swifies of 2.1.
- t=3.07, df=11, p=0.01

---

## 3rd Example

- We wanted to test whether the volume of a shipment of lumber is less than usual:
  - `$\mu_{0} = 39000$` cubic feet
  
.pull-left-narrow[
- Classic `$R$`  syntax
  - t.test(y, mu = 0)
  - where x is the name of our variable of interest, and 
  - mu is set equal to the mean specified by the null hypothesis.
]

``` r
set.seed(0)
treeVolume <- c(rnorm(75, 
                      mean = 36500, 
                      sd = 2000))
t.test(treeVolume, mu = 39000) # Ho: mu = 39000
```

]

.center.footnote[Source Code: https://datascienceplus.com/t-tests/]

---

## Output

```
## 
## 	One Sample t-test
## 
## data:  treeVolume
## t = -12.288, df = 74, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 39000
## 95 percent confidence interval:
##  36033.60 36861.38
## sample estimates:
## mean of x 
##  36447.49
```

---

# Wrapping Up...
<!--
- source code: https://ggplot2tutor.com/tutorials/sampling_distributions
- source code: https://github.com/bioinformatics-core-shared-training/IntroductionToStats

# References

-->