class: center, middle, inverse, title-slide

.title[
# Sampling Distributions
🐱
]

.author[
### S. Mason Garrison
]

---
layout: true

<div class="my-footer">
  <span>
    <a href="https://psychmethods.github.io/coursenotes/" target="_blank">Methods in Psychological Research</a>
  </span>
</div>

---
class: middle

# Sampling Distributions

---
class: middle

# Road Map

- Sampling distributions: what and why
- The sampling distribution of the mean
- Laws that describe distribution shape
  - Law of Large Numbers
  - Central Limit Theorem
- Using sampling distributions for inference

---

# Sampling Distribution

- Simple Idea:
  - The distribution of a statistic
  - Compare it to the distribution of a given variable

--

- Formal Definition:
  - A sampling distribution is the distribution of values taken by a statistic across all possible samples of a given size, drawn from a particular population

---

## All statistics have a sampling distribution

- Mean, mode, third quartile, upper extreme, etc.
- Each statistic has its own sampling distribution
- Terminology:
  - `\(\mu_{x}\)`, `\(\sigma_{x}\)` = parameters of the raw score distribution
  - `\(\mu_{\bar{x}}\)`, `\(\sigma_{\bar{x}}\)` = parameters of the sampling distribution of that particular statistic

---

## Population Distributions are different

- Be careful: The population distribution describes the individuals that make up the population.
- A sampling distribution describes how a statistic varies across many samples from the population.

---

## Sampling distribution: Illustration

<img src="data:image/png;base64,#../img/distribution_flow_chart.png" width="95%" style="display: block; margin: auto;" />

---

# Example: Sampling Distribution of the Mean

- SAT Scores at WFU
  - `\(\mu_{x}\)` Population mean = 1500
  - `\(\sigma_{x}\)` Population standard deviation = 100

<br>

- Sampling distribution of the mean of 25 WFU student scores
  - `\(\mu_{\bar{x}} = \mu_{x} = 1500\)`
  - `\(\sigma_{\bar{x}} = \frac{\sigma_{x}}{\sqrt{n}} = \frac{100}{\sqrt{25}} = 20\)`

---

# General Equation

.pull-left-narrow[
- `\(\mu_{\bar{x}} = \mu_{x}\)`
- `\(\sigma_{\bar{x}} = \frac{\sigma_{x}}{\sqrt{n}}\)`
- `\(\sigma^2_{\bar{x}} = \frac{\sigma^2_{x}}{n}\)`
]

.pull-right-wide[
- Because the mean of the statistic `\(\bar{x}\)` is always equal to the mean `\(\mu\)` of the population
  - (that is, the sampling distribution of the mean is centered at `\(\mu\)`),
- we say the statistic `\(\bar{x}\)` is an unbiased estimator of the parameter `\(\mu\)`
- Note: in any particular sample, `\(\bar{x}\)` may fall above or below `\(\mu\)`
]

---

# Example: Sampling Distribution of the Mean

.pull-left[
- Population mean `\(\mu\)` = 25
- Population standard deviation `\(\sigma\)` = 10

<img src="data:image/png;base64,#sampling-distributions_files/figure-html/unnamed-chunk-3-1.png" width="90%" style="display: block; margin: auto;" />
]

.pull-right[
- SRS of size 10: `\(\bar{x}\)` = 26.91
- SRS of size 10: `\(\bar{x}\)` = 26.96
- SRS of size 10: `\(\bar{x}\)` = 24.49
- SRS of size 10: `\(\bar{x}\)` = 20.83

<img src="data:image/png;base64,#sampling-distributions_files/figure-html/unnamed-chunk-4-1.png" width="90%" style="display: block; margin: auto;" />
]

---

# Standard Deviation of the Sampling Distribution

- Because the standard deviation of the sampling distribution of `\(\bar{x}\)` is
`$$\sigma_{\bar{x}} = \frac{\sigma_{x}}{\sqrt{n}}$$`
- averages are less variable than individual observations, and
- averages are less variable than the results of small samples.
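---

# Checking the Formulas by Simulation

- A minimal simulation sketch (not part of the original course code): it assumes the Normal population from the previous example, with `\(\mu = 25\)` and `\(\sigma = 10\)`, draws many SRSs of size 10, and checks that the simulated sample means are centered at `\(\mu\)` with standard deviation close to `\(\sigma/\sqrt{n}\)`.

``` r
# Sketch: simulate the sampling distribution of the mean
# for SRSs of size n = 10 from a Normal(25, 10) population
set.seed(123)
n     <- 10      # sample size
reps  <- 10000   # number of simulated samples
xbars <- replicate(reps, mean(rnorm(n, mean = 25, sd = 10)))

mean(xbars)   # close to mu = 25
sd(xbars)     # close to sigma / sqrt(n) = 10 / sqrt(10), about 3.16
10 / sqrt(n)  # theoretical standard deviation of the sampling distribution
```

- Doubling `n` to 20 does not halve `sd(xbars)`; quadrupling it to 40 does, which previews the `\(\sqrt{n}\)` point on the next slide.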
---

# Standard Deviation of the Sampling Distribution

- Not only is the standard deviation of the distribution of `\(\bar{x}\)` smaller than the standard deviation of individual observations,
  - it gets smaller as we take larger samples.
- The results of large samples are less variable than the results of small samples.
- Although the standard deviation of the distribution of `\(\bar{x}\)` gets smaller,
  - it does so at the rate of `\(\sqrt{n}\)`, not `\(n\)`.
- To cut the sampling distribution’s standard deviation in half, for instance,
  - you must take a sample four times as large, not just twice as large.

---

# Wrapping Up...

---

# This Time...

- Laws that describe the shape of distributions

---

# Laws that describe the shape of distributions

- Law of Large Numbers
- Central Limit Theorem

---

# Law of Large Numbers

- Although a single statistic, like `\(\bar{x}\)`, is rarely identical to the population mean, we can still use it as a reasonable estimate of the population mean.
- Why?

--

- If we keep taking larger and larger samples, the statistic `\(\bar{x}\)` is guaranteed to get closer and closer to the parameter `\(\mu\)`

---

# Law of Large Numbers

.pull-left[
<img src="data:image/png;base64,#sampling-distributions_files/figure-html/unnamed-chunk-5-1.png" width="90%" style="display: block; margin: auto;" />
]

.pull-right[
- As the size of the sample increases, the sample mean gets closer to the population mean
- Note: This is a probabilistic statement, not a deterministic one
]

---

# Law of Large Numbers

- As the size of the sample increases, the sample mean gets closer to the population mean
- The sample mean is an unbiased estimator of the population mean
- The larger the sample size, the more likely the sample mean is to be close to the population mean
- Example: Rolling a die
  - The average of a large number of independent observations from the same population is likely to be close to the population mean
  - The average of a small number of independent observations from the same population is not likely to be close to the population mean
- Note: This is a probabilistic statement, not a deterministic one

---

# Law of Large Numbers in Action

<img src="data:image/png;base64,#sampling-distributions_files/figure-html/unnamed-chunk-6-1.png" width="95%" style="display: block; margin: auto;" />

---

# Central Limit Theorem

- Similarly, the Central Limit Theorem describes the shape of a sampling distribution.
- Draw an SRS of size `\(n\)` from any population with mean `\(\mu\)` and finite standard deviation `\(\sigma\)`.
- The central limit theorem says that when `\(n\)` is large, the sampling distribution of the sample mean `\(\bar{x}\)` is approximately Normal:
  - `\(\bar{x}\)` is distributed as `\(N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)\)`

---

# Central Limit Theorem

- This holds no matter what the shape of the raw score distribution is.
- The central limit theorem allows us to use normal probability calculations to answer questions about sample means from many observations,
  - even when the population distribution is not normal (e.g., skewed, uniform, bimodal)

---

# Reiterate

- With a simple random sample, the sampling distribution of the mean is (approximately) normal if:
  - The raw score distribution is normal; OR
  - The sample size is “large”
    - `\(n = 30\)` is usually large enough
- The central limit theorem does not apply when the sample size is small
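---

# CLT: A Quick Simulation

- A minimal sketch (not part of the original course code): it assumes a strongly right-skewed exponential population with mean 1 and standard deviation 1, and shows that the means of large SRSs from it are nonetheless approximately normally distributed, as the central limit theorem predicts.

``` r
# Sketch: sample means from a skewed (exponential) population
set.seed(456)
n     <- 40      # a "large" sample size
reps  <- 10000   # number of simulated samples
xbars <- replicate(reps, mean(rexp(n, rate = 1)))  # population mean = 1, sd = 1

hist(xbars, breaks = 50,
     main = "Means of SRSs (n = 40) from a right-skewed population")
mean(xbars)  # close to the population mean of 1
sd(xbars)    # close to 1 / sqrt(40), about 0.16
```

- The individual observations are far from normal, but their means pile up symmetrically around `\(\mu = 1\)`.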
---

# Example: CLT

- The distribution of household income in the U.S. is strongly right-skewed, with a mean of about $54,000 and a standard deviation of about $65,000.
- .question[If we take a random sample of 45 households, what is the probability that the mean household income for the sample is more than $60,000?]

--

- Because the sample size is large, we can use the central limit theorem to answer this question.

--

.pull-left[
- The sampling distribution of the mean is approximately normal with:
  - Mean = $54,000
  - Standard deviation = `\(\frac{65000}{\sqrt{45}} \approx\)` 9689.63
]

--

.pull-right[
- We want to find `\(P(\bar{x} > 60000)\)`
]

---

.pull-left-wide[
- We want to find `\(P(\bar{x} > 60000)\)`
- `\(Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{60000 - 54000}{65000/\sqrt{45}} \approx\)` 0.62

.medi[

``` r
round((60000 - 54000)/(65000/sqrt(45)),4)
```

```
## [1] 0.6192
```
]]

.pull-right-narrow.small[
> Btw... AI guessed: `\(\approx 0.41\)`
]

---

.pull-left-wide[
- `\(P(\bar{x} > 60000) = P(Z > 0.62)\)`
- `\(= 1 - P(Z < 0.62)\)`

.medi[

``` r
round(pnorm((60000 - 54000)/(65000/sqrt(45))),4)
```

```
## [1] 0.7321
```
]

- 1 - 0.7321 = 0.2679

.medi[

``` r
round(1-pnorm((60000 - 54000)/(65000/sqrt(45))),4)
```

```
## [1] 0.2679
```
]
]

.pull-right-narrow.small[
> AI guessed: 0.3409
]

---

# Example: CLT

.pull-left-narrow[
<img src="data:image/png;base64,#sampling-distributions_files/figure-html/unnamed-chunk-10-1.png" width="90%" style="display: block; margin: auto;" />
]

.pull-right-wide[
- Based on service records from the past year, the time (in hours) that a technician requires to complete preventative maintenance on an air conditioner follows a distribution that is strongly right-skewed, with its most likely outcomes close to 0. The mean time is `\(\mu = 1\)` hour and the standard deviation is `\(\sigma = 1\)` hour.
]

- .question[Your company will service an SRS of 70 air conditioners. You have budgeted 1.1 hours per unit. Will this be enough?]

--

The central limit theorem states that the sampling distribution of the mean time spent working on the 70 units has:

- Mean = 1 hour
- Standard deviation: `\(\sigma_{\bar{x}} = \frac{\sigma_{x}}{\sqrt{n}} = \frac{1}{\sqrt{70}} \approx\)` 0.12
- and is approximately normal because the sample size is large.

---

# Will this be enough?

.pull-left-wide[
- We want to find `\(P(\bar{x} > 1.1)\)`
- The sampling distribution of the mean time spent working is approximately `\(N(1, 0.12)\)` since `\(n = 70 \ge 30\)`.
]

.pull-right-narrow[
<img src="data:image/png;base64,#sampling-distributions_files/figure-html/clt_schematic-1.png" width="95%" style="display: block; margin: auto auto auto 0;" />
]

--

$$ Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{1.1 - 1}{\frac{1}{\sqrt{70}}} \approx 0.8367 $$

``` r
round((1.1 - 1)/(1/sqrt(70)),4)
```

```
## [1] 0.8367
```

---

$$ P(\bar{x} > 1.1) = P(Z > 0.8367) $$

$$ = 1 - P(Z < 0.8367) = 0.2014 $$

``` r
round(1-pnorm((1.1 - 1)/(1/sqrt(70))),4)
```

```
## [1] 0.2014
```

If you budget 1.1 hours per unit, there is a 20% chance the technicians will not complete the work within the budgeted time.

---

# Wrapping Up...

---

# Sampling distributions & statistical significance

---

# Statistical significance

.pull-left[
- We have looked carefully at the sampling distribution of a sample mean.
- However, any statistic we can calculate from a sample will have a sampling distribution.
- The sampling distribution of a statistic allows us to determine how likely a particular value of the statistic is.
- For example, here is the sampling distribution of the sample standard deviation and variance for samples of size 5 from a normal population with mean 0 and standard deviation 1.
]

.pull-right[
<img src="data:image/png;base64,#sampling-distributions_files/figure-html/unnamed-chunk-13-1.png" width="90%" height="20%" style="display: block; margin: auto;" />

<img src="data:image/png;base64,#sampling-distributions_files/figure-html/unnamed-chunk-14-1.png" width="90%" height="20%" style="display: block; margin: auto;" />
]
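---

# Simulating the Sampling Distribution of a Spread Statistic

- A minimal sketch of the kind of simulation behind plots like these (not the course's own code, and the cutoff at the end is an arbitrary illustration): draw 1,000 samples of size 5 from `\(N(0, 1)\)`, compute each sample's standard deviation and variance, and use the simulated distribution to ask how rare a particular value would be.

``` r
# Sketch: sampling distributions of the sample SD and variance, n = 5
set.seed(789)
samples    <- replicate(1000, rnorm(5, mean = 0, sd = 1), simplify = FALSE)
sample_sd  <- sapply(samples, sd)
sample_var <- sapply(samples, var)

hist(sample_sd,  breaks = 30, main = "Sample SDs (n = 5)")
hist(sample_var, breaks = 30, main = "Sample variances (n = 5)")

# How often does the sample variance exceed an (arbitrary) cutoff?
mean(sample_var > 4)
```

- Counting how many simulated statistics fall beyond a cutoff is exactly the logic used to call an observed value statistically significant, as the next slides discuss.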
---

# The sampling distribution of a sample statistic is determined by:

- the particular sample statistic we are interested in,
- the distribution of the population of individual values from which the sample statistic is computed, and
- the method by which samples are selected from the population.

---

# Sampling distributions and statistical significance

- The sampling distribution allows us to determine the probability of observing any particular value of the sample statistic in another such sample from the population.
- We said that an observed effect so large that it would rarely occur by chance is called statistically significant.

---

# Consider This…

- We may decide, based on our observed set of 1000 samples, that because we saw only 2 with variances above 200, observing a variance that large is a statistically significant event.

---

# Wrapping Up...
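---

# Appendix: Law of Large Numbers in Code

- A minimal sketch revisiting the Law of Large Numbers slides (not part of the original course code; the die-rolling setup is just an illustration): as more rolls accumulate, the running mean settles down near the population mean of 3.5.

``` r
# Sketch: running mean of die rolls converging toward 3.5
set.seed(101)
rolls        <- sample(1:6, size = 5000, replace = TRUE)
running_mean <- cumsum(rolls) / seq_along(rolls)

plot(running_mean, type = "l",
     xlab = "Number of rolls", ylab = "Mean of rolls so far")
abline(h = 3.5, lty = 2)  # population mean of a fair die
```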